Neptune logger with a validation epoch end conflict due to the 'epoch' key added on the fly. #2946

Closed
morgangiraud opened this issue Aug 13, 2020 · 4 comments · Fixed by #2986
Labels
bug Something isn't working help wanted Open to be worked on

Comments

@morgangiraud

morgangiraud commented Aug 13, 2020

Hi everybody! First thanks for this lib, it is very handy!

🐛 Bug

When using PyTorch Lightning with the Neptune logger, the following error appears every time an epoch ends:
neptune.api_exceptions.ChannelsValuesSendBatchError: Received batch errors sending channels' values to experiment SOC-114. Cause: Error(code=400, message='X-coordinates must be strictly increasing for channel: e4e2635d-b707-46fa-9a1b-996dd009790f. Invalid point: InputChannelValue(timestamp=2020-08-13T11:55:38.422Z, x=5.0, numericValue=2.0, textValue=null, image', type=None) (metricId: 'e4e2635d-b707-46fa-9a1b-996dd009790f', x: 5.0) Skipping 1 values.

The important part of this error is: X-coordinates must be strictly increasing

This is because, in trainer/logging.py, the epoch key is added on the fly on line 69:

scalar_metrics['epoch'] = self.current_epoch

But why does Neptune complain?

If you log every step (using row_log_interval=1), two calls are made to the logger at the end of an epoch: one for the training logs and one for the validation logs.
Both calls use the same step value, which is the current training step. Since both calls also carry the epoch key, Neptune receives epoch twice at the same step value, which leads to the exception.
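The mechanism described above can be sketched without Neptune or Lightning. The class below is a hypothetical toy stand-in (not Neptune's client) that rejects any point whose x value is not strictly greater than the previous one for the same channel. The step and epoch values are taken from the error message above (x=5.0, numericValue=2.0); the loss values are made up for illustration.

```python
class StrictChannelStore:
    """Toy stand-in for a backend that requires strictly increasing x per channel.

    Hypothetical class for illustration only; it is not part of Neptune.
    """

    def __init__(self):
        self.last_x = {}    # channel name -> last accepted x value
        self.rejected = []  # (channel, x, value) tuples that were refused

    def log_metrics(self, metrics, step):
        for name, value in metrics.items():
            previous = self.last_x.get(name)
            if previous is not None and step <= previous:
                # Mirrors the "X-coordinates must be strictly increasing" rejection.
                self.rejected.append((name, step, value))
                continue
            self.last_x[name] = step


store = StrictChannelStore()
global_step, current_epoch = 5, 2

# End of epoch: one call for the training logs and one for the validation logs,
# both at the same step and both carrying the injected 'epoch' key.
store.log_metrics({"train_loss": 0.31, "epoch": current_epoch}, step=global_step)
store.log_metrics({"val_loss": 0.42, "epoch": current_epoch}, step=global_step)

print(store.rejected)  # only the duplicated 'epoch' point is refused
```

The train_loss and val_loss channels each receive a single point, so only the epoch channel sees the same x value twice.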

To Reproduce

Launch training with:

  • Neptune logger
  • training logs
  • validation logs
  • row_log_interval=1

Expected behaviour

Lightning should not add the epoch key on the fly, since doing so forces the logger to log it.

@morgangiraud morgangiraud added bug Something isn't working help wanted Open to be worked on labels Aug 13, 2020
@github-actions
Contributor

Hi! Thanks for your contribution! Great first issue!

@williamFalcon
Contributor

thanks!

Yeah, unfortunately this might be a problem on their end, since we have to track the epoch.

In the meantime, can you post a colab that replicates this issue?

thanks!

@morgangiraud
Author

morgangiraud commented Aug 14, 2020

Hi,

Thanks for answering.

I've been looking at the code and I'm not sure why you need that epoch at that moment.
When I look at the following code, I see that the epoch key is added in only one branch of the if statement, so the code that runs after it can't rely on this key existing. What am I missing?

if "step" in scalar_metrics and step is None:
    step = scalar_metrics.pop("step")
else:
    # added metrics by Lightning for convenience
    scalar_metrics['epoch'] = self.current_epoch
    step = step if step is not None else self.global_step
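To make the point about the two branches concrete, here is a minimal standalone sketch of the snippet quoted above. The function name prepare_metrics and its arguments are hypothetical; only the body of the if/else is taken from the quoted code, with self.current_epoch and self.global_step passed in as plain parameters.

```python
def prepare_metrics(scalar_metrics, step, current_epoch, global_step):
    """Standalone copy of the quoted branch (hypothetical wrapper, for illustration)."""
    scalar_metrics = dict(scalar_metrics)  # copy so the caller's dict is not mutated
    if "step" in scalar_metrics and step is None:
        # The user supplied an explicit step: no 'epoch' key is added here.
        step = scalar_metrics.pop("step")
    else:
        # added metrics by Lightning for convenience
        scalar_metrics["epoch"] = current_epoch
        step = step if step is not None else global_step
    return scalar_metrics, step


with_step, step_a = prepare_metrics(
    {"loss": 0.3, "step": 7}, None, current_epoch=2, global_step=5
)
without_step, step_b = prepare_metrics(
    {"loss": 0.3}, None, current_epoch=2, global_step=5
)

print("epoch" in with_step, step_a)     # False 7
print("epoch" in without_step, step_b)  # True 5
```

Since the key is absent whenever the metrics carry their own step, any code that runs after this branch cannot assume that 'epoch' exists.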

williamFalcon added a commit that referenced this issue Aug 15, 2020
* add val step arg to metrics

* add step metrics
@morgangiraud
Author

Thanks for the quick fix 👍

ameliatqy pushed a commit to ameliatqy/pytorch-lightning that referenced this issue Aug 17, 2020
* add val step arg to metrics

* add step metrics