[Feature] More pythonic approach to dataloading #1139

Open
CarlosGomes98 opened this issue Apr 23, 2025 · 0 comments · May be fixed by #1138

CarlosGomes98 commented Apr 23, 2025

Hey Folks!

I've had a really good time playing with torchtitan so far :)

Looking into the code as it stands, the way the data loader is wrapped by next_batch is not very pythonic: it makes it awkward to iterate through the data loader with a plain for or while loop.

#1138 is my draft-suggestion of a way to pythonify this.

I think this will especially bear fruit in situations where we might want to iterate through a dataset from beginning to end, e.g., for a validation dataset.
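
To make the contrast concrete, here is a minimal sketch with toy data. The names `next_batch`, `batch_generator`, and `validation_data` below are hypothetical stand-ins for illustration; this is neither the actual torchtitan code nor the diff in #1138.

```python
from typing import Any, Iterable, Iterator


def next_batch(data_iterator: Iterator[Any]) -> Any:
    """Hypothetical 'pull one batch' helper, standing in for the current style."""
    return next(data_iterator)


def batch_generator(data_iterable: Iterable[Any]) -> Iterator[Any]:
    """Hypothetical generator wrapper: yields batches until the data runs out."""
    for batch in data_iterable:
        # Per-batch work (timing, device transfer, ...) would go here.
        yield batch


validation_data = [[1, 2], [3, 4], [5, 6]]  # toy stand-in for a dataloader

# Pull style: the caller has to handle StopIteration itself.
pulled = []
it = iter(validation_data)
while True:
    try:
        pulled.append(next_batch(it))
    except StopIteration:
        break

# Generator style: a plain for loop ends cleanly at the end of the dataset.
looped = [batch for batch in batch_generator(validation_data)]
```

Both loops visit the same batches; the generator version just lets the for loop handle the end-of-data case, which is what makes a full pass over a validation set straightforward.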

Note that the current change might break a few of the models under experimental, which would need to mirror it in their train.py.

Just wanted to start this discussion. I know it's in a particularly critical part of the architecture, so I understand there may be friction regarding changes to it.

@tianyu-l tianyu-l linked a pull request Apr 27, 2025 that will close this issue