What does the loss graph / time (or iterations) look like for a working training setup? #38
Unanswered · StevenSchrembeck asked this question in Q&A
-
I'm training a SemanticTransformer with the pre-made trainer, and the loss curve isn't promising: it falls rapidly from 6 to ~5, then stays there even 1000 iterations later. That may not be nearly enough iterations to tell, but I expected it to fall further, faster.
If you have a converging SemanticTransformer, what does your loss curve look like? Are you using an out-of-the-box dataset I could also test as a control?
Much appreciated!
-
Hey, what dataset are you using? And what architectural details (number of heads, depth, etc.) and feature-extraction details (pre-trained model, k-means clustering model)? I got things working reasonably well with LibriSpeech (loss falling to ~2 and outputs starting to move towards what you'd expect; more details here), which is quite a small dataset compared to the one used in AudioLM for speech.
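For reference, here's roughly the shape of the setup, following the pattern in the repo's README. The checkpoint paths, audio folder, and hyperparameters (`dim`, `depth`, batch size, step count) below are illustrative placeholders, not my exact configuration:

```python
import torch
from audiolm_pytorch import HubertWithKmeans, SemanticTransformer, SemanticTransformerTrainer

# Feature extraction: a pre-trained HuBERT checkpoint plus a fitted
# k-means model, which together discretize raw audio into semantic tokens.
wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert/hubert_base_ls960.pt',
    kmeans_path = './hubert/hubert_base_ls960_L9_km500.bin'
)

# The semantic transformer's vocabulary size must match the number of
# k-means clusters exposed by the feature extractor.
semantic_transformer = SemanticTransformer(
    num_semantic_tokens = wav2vec.codebook_size,
    dim = 1024,
    depth = 6
).cuda()

# The pre-made trainer handles tokenization, batching, and optimization
# over a folder of audio files (e.g. LibriSpeech).
trainer = SemanticTransformerTrainer(
    transformer = semantic_transformer,
    wav2vec = wav2vec,
    folder = '/path/to/librispeech',
    batch_size = 4,
    data_max_length = 320 * 32,
    num_train_steps = 100_000
)

trainer.train()
```

If your setup looks materially different from this, especially on the feature-extraction side, that could explain the plateau: with a mismatched or poorly fitted k-means model, the token targets are close to noise and the loss stalls after the transformer learns the marginal token distribution.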