It seems that Argo retry does not work as expected when a workflow is triggered with Metaflow retry enabled.
Current Behaviour
Argo retry restarts failed tasks and resets the retry count to 0, overwriting previous artifacts. The downstream task uses the latest attempt as the input artifact, leading to failure due to missing artifacts. These artifacts were recorded in the overwritten attempt 0 but are absent in the latest attempt.
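The failure mode described above can be illustrated with a toy model. This is only a sketch: the dictionary stands in for the datastore, the helper names `write_attempt` and `read_latest` are hypothetical, and the count of three failed attempts is an assumption made for illustration. None of this is Metaflow's actual datastore API.

```python
# Toy datastore for one task: artifacts keyed by attempt number.
# Assumption: the original run made three attempts (0, 1, 2) and all failed
# before producing any artifacts.
attempts = {0: {}, 1: {}, 2: {}}


def write_attempt(store, attempt, artifacts):
    """Record artifacts for an attempt, replacing whatever was stored there."""
    store[attempt] = dict(artifacts)


def read_latest(store):
    """Return the artifacts of the highest-numbered attempt."""
    return store[max(store)]


# Argo retry restarts the task with the retry count reset to 0, so the
# successful rerun lands in attempt 0 and overwrites it.
write_attempt(attempts, 0, {"new_df": "dataframe"})

# The downstream step reads the latest attempt (2), which has no new_df.
latest = read_latest(attempts)
missing = "new_df" not in latest
```

In this model `missing` ends up true even though the rerun succeeded, which mirrors the missing-artifact error seen in the downstream step.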
Desired Behaviour
Ideally, hitting the Argo retry button should create a new attempt rather than overwriting the old attempt.
Solution Proposal
Infer the retry_count from the flow datastore (for example, S3). The retry_count will be the latest_done_attempt number + 1. This avoids overwriting artifacts on retry; that is, the restarted run adds its artifacts as a new attempt.
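A minimal sketch of that rule, assuming the attempt numbers already marked done for a task can be listed from the datastore. The function `next_attempt` and its input are hypothetical and only illustrate the "latest done attempt + 1" arithmetic; how the attempts would actually be listed from S3 is not shown here.

```python
def next_attempt(done_attempts):
    """Return the attempt number a retried task should write to.

    done_attempts: attempt numbers already recorded for the task
    (hypothetical input; listing them from the datastore is out of scope).
    """
    if not done_attempts:
        # Nothing recorded yet, so the first attempt is 0.
        return 0
    return max(done_attempts) + 1


# After three recorded attempts (0, 1, 2), a retry would use attempt 3
# instead of overwriting attempt 0.
retry_count = next_attempt([0, 1, 2])
```

With this numbering the downstream step, which reads the latest attempt, would pick up the artifacts written by the successful retry.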
Steps to reproduce:
Run an Argo workflow with Metaflow retry enabled that fails in all retries because of an error induced via an environment variable.
Apply a fix in the backend (or update the workflow with the right environment variable value), then hit the argo retry --node button. The node print_df now passes.
The end step fails with a missing artifact. new_df is an artifact from the parent step, created when the parent node succeeded on retry.
print(f"the new df is: {self.new_df}")
^^^^^^^^^^^
File "/app/metaflow/metaflow/flowspec.py", line 250, in __getattr__
raise AttributeError("Flow %s has no attribute '%s'" % (self.name, name))
AttributeError: Flow HelloWorld has no attribute 'new_df'