Like our original Requests for Research (which resulted in several papers), we expect these problems to be a fun and meaningful way for new people to enter the field, as well as for practitioners to hone their skills (it’s also a great way to get a job at OpenAI). Many will require inventing new ideas. Please email us with questions or solutions you’d like us to publicize!
(Also, if you don’t have a deep learning background but want to learn to solve problems like these, please apply for our Fellowship program!)
If you’re not sure where to begin, here are some solved starter problems.
⭐ Train an LSTM to solve the `XOR` problem: that is, given a sequence of bits, determine its parity. The LSTM should consume the sequence, one bit at a time, and then output the correct answer at the sequence’s end. Test the two approaches below:
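Whichever sequence-length regime you test, the training data for this task takes only a few lines to generate. Below is a minimal sketch; `parity_batch` is a hypothetical helper name, not part of any library.

```python
import random

def parity_batch(n_seqs, seq_len, seed=0):
    """Generate (bit sequence, parity label) pairs for the XOR/parity task."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_seqs):
        bits = [rng.randint(0, 1) for _ in range(seq_len)]
        data.append((bits, sum(bits) % 2))  # parity = sum of bits mod 2
    return data
```

Feeding these pairs to an LSTM (one bit per timestep, reading the prediction at the final step) is left to your framework of choice.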
⭐ Implement a clone of the classic Snake game as a Gym environment, and solve it with a reinforcement learning algorithm of your choice. Tweet us videos of the agent playing. Were you able to train a policy that wins the game?
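A minimal sketch of what such an environment might look like, assuming the classic Gym `reset`/`step` interface (the observation format, reward values, and class name here are illustrative choices, not a prescribed design):

```python
import random
from collections import deque

class SnakeEnv:
    """Toy Snake environment with a Gym-style reset/step API.
    Actions: 0=up, 1=right, 2=down, 3=left. Reward: +1 food, -1 death."""

    def __init__(self, size=8, seed=0):
        self.size = size
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        mid = self.size // 2
        self.snake = deque([(mid, mid)])  # head is the left end of the deque
        self.done = False
        self._place_food()
        return self._obs()

    def _place_food(self):
        free = [(r, c) for r in range(self.size) for c in range(self.size)
                if (r, c) not in self.snake]
        self.food = self.rng.choice(free)

    def _obs(self):
        # simple observation: head position, food position, snake length
        return (self.snake[0], self.food, len(self.snake))

    def step(self, action):
        dr, dc = [(-1, 0), (0, 1), (1, 0), (0, -1)][action]
        head = (self.snake[0][0] + dr, self.snake[0][1] + dc)
        off_grid = not (0 <= head[0] < self.size and 0 <= head[1] < self.size)
        if off_grid or head in self.snake:   # crashed into wall or self
            self.done = True
            return self._obs(), -1.0, True, {}
        self.snake.appendleft(head)
        if head == self.food:
            self._place_food()
            reward = 1.0
        else:
            self.snake.pop()                 # no food: tail advances
            reward = 0.0
        return self._obs(), reward, self.done, {}
```

A real submission would add an image-like observation and a renderer, but this skeleton is enough to plug into most RL training loops.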
## Requests for Research
⭐⭐ Slitherin’. Implement and solve a multiplayer clone of the classic Snake game (see slither.io for inspiration) as a Gym environment.
⭐⭐⭐ Parameter Averaging in Distributed RL. Explore the effect of parameter averaging schemes on sample complexity and amount of communication in RL algorithms. While the simplest solution is to average the gradients from every worker on every update, you can save on communication bandwidth by independently updating workers and then infrequently averaging their parameters. In RL, this may have another benefit: at any given time we’ll have agents with different parameters, which could lead to better exploration behavior. Another possibility is to use algorithms like EASGD that bring parameters partly together on each update.
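The infrequent-averaging scheme can be sketched in a few lines. This is a toy illustration, not a distributed implementation: each "worker" is just a list of parameter arrays, and `sync_every` is a hypothetical knob for the communication interval.

```python
import numpy as np

def average_parameters(worker_params):
    """Elementwise average of each parameter array across workers.
    `worker_params`: one list of arrays per worker, all with matching shapes."""
    n = len(worker_params)
    return [sum(group) / n for group in zip(*worker_params)]

def maybe_sync(step, worker_params, sync_every=100):
    """Replace every worker's parameters with the average, but only
    every `sync_every` steps; otherwise workers keep diverging locally."""
    if step % sync_every == 0:
        avg = average_parameters(worker_params)
        return [[p.copy() for p in avg] for _ in worker_params]
    return worker_params
```

Between syncs, each worker would run its own rollouts and gradient updates, which is where the exploration benefit of temporarily different policies would come from.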
⭐⭐⭐ Transfer Learning Between Different Games via Generative Models. Proceed as follows:
⭐⭐⭐ Transformers with Linear Attention. The Transformer model uses soft attention with softmax. If we could instead use linear attention (which can be converted into an RNN that uses fast weights), we could use the resulting model for RL. Specifically, an RL rollout with a transformer over a huge context would be impractical, but running an RNN with fast weights would be very feasible. Your goal: take any language modeling task; train a transformer; then find a way to get the same bits per character/word using a linear-attention transformer with different hyperparameters, without increasing the total number of parameters by much. One caveat: this may turn out to be impossible. But one potentially helpful hint: transformers with linear attention likely require much higher-dimensional key/value vectors than softmax attention, which can be done without significantly increasing the number of parameters.
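The equivalence between (unnormalized) linear attention and an RNN with fast weights can be checked numerically. A sketch: the parallel form computes causal attention with a plain dot-product score, while the recurrent form accumulates a fast-weight matrix S from outer products and produces identical outputs one step at a time.

```python
import numpy as np

def linear_attention_parallel(Q, K, V):
    """Causal linear attention, batch form: out_t = sum_{s<=t} (q_t . k_s) v_s."""
    T = Q.shape[0]
    scores = (Q @ K.T) * np.tril(np.ones((T, T)))  # no softmax, causal mask
    return scores @ V

def linear_attention_recurrent(Q, K, V):
    """The same computation as an RNN whose state is a fast-weight matrix S."""
    S = np.zeros((V.shape[1], K.shape[1]))
    outs = []
    for q, k, v in zip(Q, K, V):
        S = S + np.outer(v, k)  # accumulate fast weights
        outs.append(S @ q)      # out_t = S_t q_t
    return np.array(outs)
```

This is the property that makes constant-memory RL rollouts possible: at inference time only S needs to be carried forward, not the whole context. (Practical variants also divide by a normalizer; that is omitted here for clarity.)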
⭐⭐⭐ Learned Data Augmentation. You could use a learned VAE of the data to perform “learned data augmentation”: first train a VAE on the input data, then transform each training point by encoding it to a latent space, applying a simple (e.g. Gaussian) perturbation in latent space, and decoding back to observed space. Could we use such an approach to obtain improved generalization? A potential benefit of such data augmentation is that it could include many nonlinear transformations, like viewpoint changes and changes in scene lighting. Can we approximate the set of transformations to which the label is invariant? Check out the existing work on this topic if you want a place to get started.
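The augmentation step itself is a one-liner once the VAE is trained. A minimal sketch, where `encode` and `decode` stand in for the trained VAE’s mean-encoder and decoder (both hypothetical callables supplied by you):

```python
import numpy as np

def latent_perturb_augment(x, encode, decode, sigma=0.1, rng=None):
    """Augment a batch by adding Gaussian noise in a learned latent space.
    encode: maps data to latent codes; decode: maps codes back to data space."""
    rng = np.random.default_rng() if rng is None else rng
    z = encode(x)                                   # to latent space
    z_noisy = z + sigma * rng.standard_normal(z.shape)  # Gaussian perturbation
    return decode(z_noisy)                          # back to observed space
```

The interesting experimental question is how `sigma` trades off label preservation against transformation diversity.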
⭐⭐⭐⭐ Regularization in Reinforcement Learning. Experimentally investigate (and qualitatively explain) the effect of different regularization methods on an RL algorithm of your choice. In supervised deep learning, regularization is extremely important for improving optimization and for preventing overfitting, with very successful methods like dropout, batch normalization, and L2 regularization. However, people haven’t benefited from regularization with reinforcement learning algorithms such as policy gradients and Q-learning. Incidentally, people generally use much smaller models in RL than in supervised learning, as large models perform worse, perhaps because they overfit to recent experience. To get started, here is a relevant but older theoretical study.
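As the simplest entry point, an L2 penalty can be bolted onto any scalar RL loss. A toy sketch (the function name and `coef` default are illustrative, and `base_loss` stands in for e.g. a policy-gradient surrogate from your algorithm of choice):

```python
import numpy as np

def l2_regularized_loss(base_loss, params, coef=1e-4):
    """Add an L2 penalty over all parameter arrays to a scalar RL loss."""
    penalty = sum(float(np.sum(p ** 2)) for p in params)
    return base_loss + coef * penalty
```

Dropout and batch normalization require touching the network itself rather than the loss, which is part of why their interaction with RL training dynamics is a more open experimental question.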
⭐⭐⭐⭐⭐ Automated Solutions of Olympiad Inequality Problems. Olympiad inequality problems are simple to express, but solving them often requires clever manipulations. Build a dataset of olympiad inequality problems and write a program that can solve a large fraction of them. It’s not clear whether machine learning will be useful here, but you could potentially use a learned policy to reduce the branching factor.
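One useful first tool when building such a dataset is a numeric sanity check: before attempting a symbolic proof, test a candidate inequality on many random positive inputs to filter out typos and false statements. A toy sketch (the helper name is hypothetical; this checks plausibility, it proves nothing):

```python
import random

def numerically_plausible(ineq, n_vars, trials=1000, seed=0):
    """Return True if `ineq` holds on `trials` random positive inputs.
    `ineq` takes a list of n_vars positive floats and returns a bool."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(0.01, 10.0) for _ in range(n_vars)]
        if not ineq(xs):
            return False  # found a numeric counterexample
    return True

# Example: AM-GM for three positive reals, (a+b+c)/3 >= (abc)^(1/3),
# with a tiny tolerance for floating-point error.
am_gm = lambda xs: sum(xs) / 3 >= (xs[0] * xs[1] * xs[2]) ** (1 / 3) - 1e-9
```

A solver could use the same check inside its search: any manipulation step that fails numerically can be pruned immediately, shrinking the branching factor before any learned policy is involved.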
Want to work on problems like these professionally? Apply to OpenAI!
Ilya Sutskever, Tim Salimans, Durk Kingma