[BUG] Training never starts on TFT/Progress bar not working #830
Comments
Hi, do you see any progress with the PyTorch Lightning progress bar, or is it not moving at all?
It's not moving at all.
I could not reproduce the issue (on version 0.17.1). Could you try running it on CPU instead of GPU? A sketch of how that could look is below.
It would also interest me whether you still get the issue in version 0.17.1.
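The original snippet was not captured here. A minimal sketch of forcing CPU training, assuming the `pl_trainer_kwargs` mechanism available since darts 0.17 (parameter values are illustrative, not from the thread):

```python
from darts.models import TFTModel

# Assumption: pl_trainer_kwargs is forwarded to the PyTorch Lightning Trainer;
# accelerator="cpu" forces CPU training in PL 1.5+.
model = TFTModel(
    input_chunk_length=200,
    output_chunk_length=200,
    pl_trainer_kwargs={"accelerator": "cpu"},
)
```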
Hi,
Could this be too many parameters? Does it usually work smoothly on a model this big? Or is it perhaps too many samples?
The model trains on batches, and the progress bar should by default get updated after every batch rather than after every epoch. This is customizable through the trainer's progress bar settings (see the sketch below). How large is your dataset? Do you have any memory issues?
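A hypothetical illustration of adjusting the refresh behavior; the `progress_bar_refresh_rate` flag is from PyTorch Lightning 1.x and is an assumption here, not something confirmed in the thread:

```python
from darts.models import TFTModel

# Assumption: pl_trainer_kwargs is passed through to the PL Trainer.
# progress_bar_refresh_rate=1 updates the bar after every batch (a PL 1.x flag,
# deprecated in later PL versions in favor of a TQDMProgressBar callback).
model = TFTModel(
    input_chunk_length=200,
    output_chunk_length=200,
    pl_trainer_kwargs={"progress_bar_refresh_rate": 1},
)
```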
It has over 31k samples for 22 features and a single target, all set for a 200-timestep lookback and a 200-timestep horizon. The dataset works perfectly and quickly with other PyTorch-based libraries, so I don't think there is a memory issue. Also, I have tried cutting down on features and samples and it still does not work.
31k training series may represent a huge number of training samples (the input/output subslices obtained from your series) by default. Could you try limiting the number of training samples by passing, e.g., max_samples_per_ts=1 to the fit() function and check whether the problem persists? This trivially limits the number of training samples by considering only the last (input, output) slice of each series; see the sketch below.
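A self-contained sketch of this suggestion on a small public dataset; the hyperparameters are placeholders, and `add_relative_index=True` is assumed so that TFT can run without explicit future covariates:

```python
from darts.datasets import AirPassengersDataset
from darts.models import TFTModel

series = AirPassengersDataset().load().astype("float32")

# add_relative_index=True lets TFT train without explicit future covariates.
model = TFTModel(input_chunk_length=24, output_chunk_length=12, add_relative_index=True)

# max_samples_per_ts=1 keeps only the most recent (input, output) slice per
# series, so the first epoch is tiny; a quick way to check whether the bar moves.
model.fit(series, max_samples_per_ts=1, epochs=1)
```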
It's composed of 31k timestamps, not series. As I stated, I have already tried slicing the features as well as the samples, and it does not seem to make a difference.
@strakehyr Do you see any training dataset length being printed when you launch the training (before it hangs)?
Can you still try to use max_samples_per_ts=1 in the fit() call and tell us whether the problem persists?
The following gets printed (sliced dataset):
When attempting this, it returns:
However, no training parameters are shown, no loss, no epochs, nothing. It just returns that.
I believe the problem is caused by something in the configuration, as I am trying to run the Air Passenger example (https://unit8co.github.io/darts/examples/13-TFT-examples.html?highlight=tft#Air-Passenger-Example) and I am running into the same behavior. Any idea what the root cause might be?
I think I found a solution to this issue. It seems to be an ipywidgets problem. Following the ipywidgets docs, running the following command from bash in the env with the darts package fixed it for me.
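The command itself was not captured in this thread; based on the ipywidgets installation docs, it was presumably the notebook extension activation (an assumption, not confirmed here):

```bash
# Presumed command from the ipywidgets docs (assumption, not from the thread):
jupyter nbextension enable --py widgetsnbextension
```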
Could anyone test this to confirm?
I use an IDE for my code, so I doubt this was the issue.
True, in that case could you try uninstalling ipywidgets?
Thank you, this completely solved it.
Small update: below should work as well and show the progress bar as intended.
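The snippet referenced here was not captured. Given the surrounding discussion, a plausible reading (an assumption, not verified) is that reinstalling or upgrading ipywidgets, rather than removing it, also resolves the hang:

```bash
# Assumption: upgrade ipywidgets instead of uninstalling it
pip install -U ipywidgets
```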
Describe the bug
I use a dataset composed of 20 features and a single target. All of the features are future covariates. I use the target's past as well as the features' history as past covariates. To the covariates, I add datetime attributes for year, month, day of week, hour, and holidays. The dataset has several years of hourly data; however, I tried cutting down the samples to check whether it made a difference. I am successfully using the same dataset with other models (not from Darts) and getting good results.
To Reproduce
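The reproduction code was not captured. A minimal, hypothetical sketch of a setup matching the description above; column names, file name, and hyperparameters are placeholders, not the reporter's actual code:

```python
import pandas as pd
from darts import TimeSeries
from darts.models import TFTModel
from darts.utils.timeseries_generation import datetime_attribute_timeseries

# Placeholder data loading; "timestamp" and "target" column names are assumptions.
df = pd.read_csv("data.csv", parse_dates=["timestamp"])
target = TimeSeries.from_dataframe(df, "timestamp", "target").astype("float32")
feature_cols = [c for c in df.columns if c not in ("timestamp", "target")]
covariates = TimeSeries.from_dataframe(df, "timestamp", feature_cols).astype("float32")

# Stack datetime attributes (year, month, day of week, hour) onto the covariates.
for attr in ["year", "month", "dayofweek", "hour"]:
    covariates = covariates.stack(
        datetime_attribute_timeseries(target.time_index, attribute=attr).astype("float32")
    )

model = TFTModel(input_chunk_length=200, output_chunk_length=200)
model.fit(target, past_covariates=covariates, future_covariates=covariates)
```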
Expected behavior
Training starts, but it gets stuck and never completes a single epoch.
System: