Neptune logger with a validation epoch end conflict due to the 'epoch' key added on the fly. #2946
Comments
Hi! Thanks for your contribution! Great first issue!
Thanks! Yeah, unfortunately, this might be a problem on their end since we have to track the epoch. In the meantime, can you post a Colab that replicates this issue? Thanks!
Hi, thanks for answering. I've been looking at the code and I'm not sure why you need that
* add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add val step arg to metrics * add step metrics * add step metrics
Thanks for the quick fix 👍
Hi everybody! First, thanks for this lib, it is very handy!
🐛 Bug
When using PyTorch Lightning in conjunction with the Neptune logger, this kind of error pops up every time an epoch ends:
neptune.api_exceptions.ChannelsValuesSendBatchError: Received batch errors sending channels' values to experiment SOC-114. Cause: Error(code=400, message='X-coordinates must be strictly increasing for channel: e4e2635d-b707-46fa-9a1b-996dd009790f. Invalid point: InputChannelValue(timestamp=2020-08-13T11:55:38.422Z, x=5.0, numericValue=2.0, textValue=null, image', type=None) (metricId: 'e4e2635d-b707-46fa-9a1b-996dd009790f', x: 5.0) Skipping 1 values.
The important part of this error is the following line:
X-coordinates must be strictly increasing
This is because, in `trainer/logging.py`, the `epoch` key is added on the fly on line 69.

But why does Neptune complain?
If you log all the timesteps (using `row_log_interval = 1`), then at the end of an epoch 2 calls are emitted to the logger: one for the training logs and one for the validation logs. Both of those have the same `step` value, which is the current training `step` value. Since the key `epoch` is duplicated in both those calls, Neptune receives the key `epoch` twice with the same `step` value, leading to the exception.

To Reproduce
launch training with:
Expected behaviour
Don't add the `epoch` key on the fly, which forces the logger to log it.
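
To make the conflict described above concrete, here is a minimal sketch that does not use Neptune or Lightning at all. `StrictChannel`, `ToyLogger`, `without_epoch` and `end_of_epoch` are hypothetical names invented for this illustration, and the metric values are made up. The toy channel raises on a repeated x value, whereas the real client reports the error and skips the value. Filtering out `epoch` is only one way to picture the expected behaviour; it is not necessarily the fix that was merged.

```python
class StrictChannel:
    """Toy stand-in for a metric channel that rejects non-increasing x values."""

    def __init__(self, name):
        self.name = name
        self.points = []  # list of (x, value) pairs accepted so far

    def send(self, x, value):
        # Same rule as in the error message: x must be strictly increasing.
        if self.points and x <= self.points[-1][0]:
            raise ValueError(
                f"X-coordinates must be strictly increasing for channel: "
                f"{self.name} (x: {x})"
            )
        self.points.append((x, value))


class ToyLogger:
    """Toy logger: one channel per metric key, x-coordinate is the step."""

    def __init__(self):
        self.channels = {}

    def log_metrics(self, metrics, step):
        for key, value in metrics.items():
            self.channels.setdefault(key, StrictChannel(key)).send(step, value)


def without_epoch(metrics):
    """Return a copy of the metrics dict with the 'epoch' key dropped."""
    return {k: v for k, v in metrics.items() if k != "epoch"}


def end_of_epoch(logger, step, epoch, strip_epoch=False):
    """Emit the two end-of-epoch calls (training, then validation) at one step."""
    train = {"train_loss": 0.3, "epoch": epoch}  # 'epoch' added on the fly
    val = {"val_loss": 0.4, "epoch": epoch}      # ... and added again here
    for metrics in (train, val):
        if strip_epoch:
            metrics = without_epoch(metrics)
        logger.log_metrics(metrics, step)


if __name__ == "__main__":
    try:
        end_of_epoch(ToyLogger(), step=5, epoch=2)
    except ValueError as err:
        print("rejected:", err)

    end_of_epoch(ToyLogger(), step=5, epoch=2, strip_epoch=True)
    print("accepted once 'epoch' is no longer sent twice at the same step")
```

In the first call, `epoch` reaches its channel twice with x = 5, so the second point is rejected, which mirrors the `x: 5.0` in the error above. In the second call each remaining key is sent only once per step, so nothing is rejected.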