Support for non-static data for reinforcement learning #713
Could you post a snippet? |
Along this line, I think what would be good is to have the PyTorch Lightning equivalent of the reinforcement learning examples in PyTorch or PyTorch Ignite: https://github.com/pytorch/examples/tree/master/reinforcement_learning Is this possible? |
I'm interested in this too. I'm thinking about trying to make it work using PyTorch's new IterableDataset for feeding data from a (prioritized) replay buffer. Edit: then I would roll out episodes (across a cluster) before each "epoch", which is just a fixed number of training steps between rollouts.
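Just a sketch of the idea, assuming a plain list of transitions as the buffer (ReplayDataset and the rollout step are placeholders, not an existing API):

```python
import random
import torch
from torch.utils.data import IterableDataset, DataLoader

class ReplayDataset(IterableDataset):
    """Iterable-style dataset that keeps sampling transitions from a replay buffer."""

    def __init__(self, buffer, samples_per_epoch=5000):
        self.buffer = buffer                      # e.g. list of (state, action, reward, next_state, done)
        self.samples_per_epoch = samples_per_epoch

    def __iter__(self):
        # an "epoch" is just a fixed number of sampled transitions between rollouts
        for _ in range(self.samples_per_epoch):
            yield random.choice(self.buffer)

# stand-in buffer; in practice it would be refilled by rolling out episodes before each epoch
buffer = [(torch.zeros(4), 0, 0.0, torch.zeros(4), False)] * 100
loader = DataLoader(ReplayDataset(buffer), batch_size=32)
```
|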
@colllin would you consider creating a PR? |
Hey guys, also really interested in using PyTorch Lightning for Reinforcement Learning. Not sure that the DataLoader is the best structure for RL though. Has anyone found a good way of incorporating DataLoaders for things like gym environments? |
I've been looking at PyTorch's built-in map-style and iterable-style datasets, and I think there might be a way of getting RL to work with them. Map-style might work for replay buffers, otherwise iterable-style would provide more flexibility in feeding data. I'll post code if I get something to work. |
I was trying to see if there was a good way to incorporate the DataLoader into the RL environment, but it doesn't seem to fit. Using it for a replay buffer sounds like a good idea. But what should you do if you are using Lightning for an RL agent that doesn't use a replay buffer? Should you just use a dummy DataLoader that isn't utilized? |
Was looking at something like this to use the DataLoader for simply retrieving the current state (rough sketch below, assuming the classic gym API where reset() returns just the observation):
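```python
import gym
import torch
from torch.utils.data import Dataset, DataLoader

class CurrentStateDataset(Dataset):
    """'Dummy' map-style dataset whose only job is to hand the current observation to Lightning."""

    def __init__(self, env, length=1000):
        self.env = env
        self.state = env.reset()
        self.length = length              # arbitrary: number of training_step calls per "epoch"

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # idx is ignored; just return whatever the environment currently shows
        return torch.as_tensor(self.state, dtype=torch.float32)

env = gym.make("CartPole-v0")
train_loader = DataLoader(CurrentStateDataset(env), batch_size=1)
```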
This would act as a "dummy" dataloader, giving Lightning everything it needs. However, this solution feels like trying to fit the project to the framework. Would it be possible to change the hard requirement of providing a dataloader to Lightning for systems like RL agents? |
Doesn't RL usually involve 'rollouts with the existing network' and then 'evaluation of the data' for learning? It seems kind of odd, even for RL, to have the 'next()' of the environment in the 'inner loop' of the learning. There does need to be a hook to switch back and forth between learning and 'rollouts', but it might be counterproductive to put the learning on a 'per frame' basis, where each pull of a sample from the dataloader 'runs' the environment. So I'm just saying that in the design of this, it's not about pulling one sample from 'the environment', it's about pulling a 'batch of data' from the environment; but there would be a benefit to having a standard way to connect the dataloader to the environment to pull batches (theoretically as small as single frames/samples). |
@AwokeKnowing I wish I were more up to speed on RL, but I haven't been doing much of it. I'd love to make sure Lightning supports it. Mind suggesting what needs to change to do that? Thanks. |
@AwokeKnowing are you saying that the dataset would have a reference to both the agent and the env, and then the iter/getitem function inside the dataset would collect a batch of transitions? |
I’ve been thinking about this a lot. Here’s how I would try to sort this out in a simple way:
In an on_before_epoch hook, roll out 5000 time steps in the environment using the current policy. Store the episodes in a replay buffer.
Then, for train_dataloader, sample batches from the replay_buffer. There are a few ways to accomplish this...
- probably the easiest for Lightning is to create a custom dataset with __len__ hard-coded to 5000 that returns random steps from your replay buffer (see the sketch after this list).
- an iterable-style dataset feels more appropriate, but I'm not sure how you then tell Lightning how many training steps to take per epoch. Perhaps you could also hard-code __len__ on your custom iterable dataset.
- if you need to train on complete episodes (e.g. for an RNN), you might want to instead create a dummy dataset that does nothing and pass your own collate_fn when instantiating the dataloader, so that you can sample an entire episode from your replay buffer. Maybe you can come up with something cleaner: a dataset that returns entire episodes and a dataloader batch_size of 1?
Then for an on-policy algorithm you would want to clear your replay buffer after each epoch. Above I use 5000 as a placeholder, but you would want to tune that value.
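A minimal sketch of the map-style option (on_epoch_start stands in for the before-epoch hook; rollout() is a hypothetical user-written function):

```python
import random
from torch.utils.data import Dataset

class ReplayBufferDataset(Dataset):
    """Map-style dataset with a hard-coded length; every index just draws a random stored step."""

    def __init__(self, replay_buffer, epoch_length=5000):
        self.replay_buffer = replay_buffer    # any container of transitions
        self.epoch_length = epoch_length      # 5000 is only a placeholder; tune per task

    def __len__(self):
        return self.epoch_length

    def __getitem__(self, idx):
        # idx is ignored: return a random step from the buffer
        return random.choice(self.replay_buffer)

# inside the LightningModule (rollout() is a hypothetical user-written function):
#     def on_epoch_start(self):
#         self.replay_buffer.extend(rollout(self.policy, n_steps=5000))
#
#     def train_dataloader(self):
#         return DataLoader(ReplayBufferDataset(self.replay_buffer), batch_size=32)
```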
|
@colllin I don't think any hardcoded value (5000) is appropriate, because in some tasks the samples ("frames") are a few floats (many gym tasks), a tiny matrix (chess/go), or 1024x1024 images. And some tasks (meta-learning) may require different amounts of samples per 'model update' step. So from Lightning's perspective, it cannot know how many rollouts to do; this has to be configured for each task.

What I am suggesting is something in between the DataLoader and the Environment, call it EnvDataManager. The EnvDataManager is configured with information about how to collect rollouts and feed them to the DataLoader. When the DataLoader requests data from the EnvDataManager (EDM) and the EDM decides it's time for a 'brain' update, the EDM updates the model used by the Environment, collects more samples (async), and begins feeding the DataLoader. The EDM would also know when to 'add' to the existing data vs 'replace' it with new data.

Note that by 'new model weights' I just mean access to the agent to run inference, to select an action to pass to the environment to get a new observation. However, typically in RL you don't run the latest 'agent' but a 'checkpoint', which you also pass to multiple 'rollout servers/processes'. You might even have a couple of different versions of the agent. Thus I said 'weights', and the EDM will keep copies of them as needed.

@djbyrne I think so, if I understood you correctly.

@williamFalcon I think Lightning is flexible enough to work with RL, but as it wraps common scenarios, I think the case of RL, where it's not a 'static data set', has good potential for wrapping so people don't write the same/similar custom code in all their RL projects to work with Lightning.
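A very rough interface sketch of that in-between piece (all names are made up; none of this exists in Lightning today):

```python
from abc import ABC, abstractmethod

class EnvDataManager(ABC):
    """Sits between the DataLoader and the environment(s); agnostic to the specific env or model."""

    @abstractmethod
    def update_weights(self, checkpoint):
        """Receive a new agent checkpoint to use for subsequent (possibly async) rollouts."""

    @abstractmethod
    def collect(self, n_samples, replace=False):
        """Roll out n_samples with the current checkpoint; either add to or replace the stored data."""

    @abstractmethod
    def __iter__(self):
        """Feed samples to the DataLoader like an iterable-style dataset."""
```
|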
I came across the Ptan RL library, which uses a class called ExperienceSource. This is essentially an iterator that keeps track of the environment and the weights of the current policy and rolls out batches of trajectory data. I think this is aligned with what you were describing, @AwokeKnowing. |
@djbyrne yes that's the general idea. Though the ExperienceSource there seems to include the parts about working with gym, DQN-specific concepts, etc. I think for PyTorch Lightning it would make more sense to have the ExperienceDataManager not know how to work with gym and specific buffers etc., but rather be focused on interfacing with the Lightning agent and Lightning dataset. Maybe a better name is DynamicDataset.

So concretely, on the Lightning side, we need to provide a class that 'looks like a dataset' (to the dataloader) but can also receive 'model checkpoints'. Then users could use a library like Ptan or their own, or just a couple of simple hand-coded methods to launch/run the 'gym'. But Lightning would give them an automatic flow of 'updated agents', and a clear place to feed the data. It seems a bit odd that a 'dataset' should have this functionality, but in RL the 'dataset' is very much 'alive', and changes in the model directly affect the data that is passed to the dataloader. There may be something we can learn from the Unity ML agents about where to separate the concerns.

So think of the simplest possible environment: it provides an observation of a number 1 to 10, the action is 1 or 0 to say whether it's over 5 or not, and the reward is 1 or 0. We need Lightning to think of the series of observations and rewards as a DynamicDataset, and we need Lightning to provide the agent checkpoints to the DynamicDataset so that it can continue to generate (unlimited) data.
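A tiny sketch of what that could look like for the toy env (DynamicDataset and update_agent are made-up names, not existing Lightning API; agent.act is a hypothetical policy call):

```python
import random
import torch
from torch.utils.data import IterableDataset

class DynamicDataset(IterableDataset):
    """Looks like a dataset to the DataLoader, but generates data with the latest agent checkpoint."""

    def __init__(self, samples_per_epoch=1000):
        self.agent = None                     # filled in via update_agent()
        self.samples_per_epoch = samples_per_epoch

    def update_agent(self, agent):
        # Lightning (or the user) would call this whenever a new checkpoint is available
        self.agent = agent

    def __iter__(self):
        for _ in range(self.samples_per_epoch):
            obs = float(random.randint(1, 10))     # observation: a number from 1 to 10
            action = self.agent.act(obs)           # hypothetical: agent returns 1 if it thinks obs > 5
            reward = 1.0 if action == int(obs > 5) else 0.0
            yield torch.tensor([obs]), torch.tensor(action), torch.tensor(reward)
```
|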
@AwokeKnowing yeah I agree that the EDM should not need to know about the specific env or buffer and should really just be an interface. If the Lightning model contained a function like env_step(), where the user provides the logic for carrying out a single step of their specific environment, then the EDM would just hold a reference to the PL model, which gives it access to the weights, forward and env_step. The EDM could then handle the rollout agnostic to the type of environment being used and provide the dataset interface for the dataloader. I wonder whether this is actually a problem that Lightning should be trying to solve, or whether it should be solely in the domain of the user?
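A sketch of roughly that shape (env_step and env_reset are not existing Lightning hooks, just methods the user would add to their LightningModule):

```python
from torch.utils.data import IterableDataset

class ExperienceDataManager(IterableDataset):
    """Env-agnostic rollout source: it only knows about the model's env_reset() and env_step()."""

    def __init__(self, model, steps_per_epoch=1000):
        self.model = model                    # a LightningModule defining env_reset() and env_step(state)
        self.steps_per_epoch = steps_per_epoch

    def __iter__(self):
        state = self.model.env_reset()
        for _ in range(self.steps_per_epoch):
            state, action, reward, next_state, done = self.model.env_step(state)
            yield state, action, reward, next_state, done
            state = self.model.env_reset() if done else next_state
```
|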
@djbyrne for the question of whether 'Lightning should solve it', the question is: is there some 'repetitive' code that all RL projects will be writing to wire these together? If so, I think yes, because the point of PyTorch Lightning is that I want to just write the logic (the model, and the code to interact with "minecraft" or my own env using the model), and I don't want to write the code to manage checkpoint agents and transform/pool my observations into a 'dataset'.

What would help is to actually use PL to do 10 similar and totally different RL projects, see what the repetitive code specific to RL 'data' is, and try to put that part in PL as a DynamicDataset. My expectation is that a common thread is managing the agent checkpoints and batching together observations in randomized buffers of sequences. It would be good to have something that you can hook an environment to and start 'filling up' a dataset. The difference between RL and other dynamic datasets (e.g. a webcam) is that the 'agent' affects the data.

So: standardize the way to plug the PL agent (checkpoints) into a dataset and the way samples are saved to disk or a DB as buffers, leaving the RL practitioner to write the code of the model and the code that directly pulls samples from the environment given a particular agent. And there would need to be a clear place to inject the logic of how to select data from the buffers to feed to the dataloader. |
I'd certainly be up for building some varied examples of RL projects with Lightning, to get a better idea of what works across the board. |
What would be the best approach for reinforcement learning problems where you need to interact with the environment for data? Maybe the DataLoader is too restrictive?