allow ending the episode for MaxStepsReached #4453
Merged
Proposed change(s)
This came up while trying to use MaxSteps on the match-3 sample. If multiple Agents in the scene use on-demand decisions, each Agent's step count is incremented whenever the Academy steps, even if the Agent isn't doing anything. For training, you can work around this by requesting a decision every step. For inference or heuristic mode, though, where you might want to show animations of the Agent "boards", the decision request frames get out of sync and you end up hitting MaxSteps too soon.
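To make the mismatch concrete, here is a toy Python model of the counting behavior described above. It is only a sketch: `ToyAgent`, `academy_step`, and `decisions_before_max_steps` are hypothetical names invented for illustration, not ML-Agents APIs, and the fixed `decision_interval` is an assumed stand-in for "the Agent is busy animating between decisions".

```python
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent; not the ML-Agents Agent class."""
    max_steps: int
    step_count: int = 0
    decisions: int = 0

    def academy_step(self, requests_decision: bool) -> bool:
        """Advance one academy step; return True once max_steps is reached."""
        self.step_count += 1      # counted on every academy step, even when idle
        if requests_decision:
            self.decisions += 1   # only some steps actually carry a decision
        return self.step_count >= self.max_steps


def decisions_before_max_steps(max_steps: int, decision_interval: int) -> int:
    """Count how many decisions fit in before the step limit is hit.

    decision_interval is an assumed simplification: the agent requests a
    decision on every Nth academy step and idles on the others.
    """
    agent = ToyAgent(max_steps=max_steps)
    step = 0
    while True:
        requests = step % decision_interval == 0
        step += 1
        if agent.academy_step(requests):
            return agent.decisions
```

In this model, an agent that requests a decision on every step gets `max_steps` decisions per episode, while one that decides only every fifth step gets a fifth as many before the limit ends the episode.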
I think this is the best solution that doesn't break existing behavior. A better approach would be to count only decision or action steps towards MaxSteps, but that's a breaking change. An alternative would be to make DoneReason public and accept it as an optional parameter to EndEpisode.
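The alternative mentioned here (an optional reason passed when ending the episode) could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the enum member names, the `BoardAgent` class, and the `on_decision` hook are hypothetical and do not reflect the actual C# API. The idea is that user code counts its own decisions and ends the episode with an explicit "max steps" reason.

```python
from enum import Enum, auto


class DoneReason(Enum):
    """Hypothetical reasons for ending an episode (member names are assumed)."""
    DONE_CALLED = auto()
    MAX_STEPS_REACHED = auto()


class BoardAgent:
    """Toy agent that tracks its own decision count; not an ML-Agents class."""

    def __init__(self, max_decisions: int):
        self.max_decisions = max_decisions
        self.decision_count = 0
        self.episodes = []  # reason recorded for each finished episode

    def end_episode(self, reason: DoneReason = DoneReason.DONE_CALLED) -> None:
        """End the episode, recording why it ended, and reset the counter."""
        self.episodes.append(reason)
        self.decision_count = 0

    def on_decision(self, solved: bool) -> None:
        """Called once per decision, however many academy steps have passed."""
        self.decision_count += 1
        if solved:
            self.end_episode()  # normal termination
        elif self.decision_count >= self.max_decisions:
            self.end_episode(DoneReason.MAX_STEPS_REACHED)  # user-defined limit
```

Because the limit is checked per decision rather than per academy step, idle frames spent on animation do not count towards it in this sketch.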
Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads etc.)
https://jira.unity3d.com/browse/MLA-1345
Types of change(s)
Checklist
Other comments