Skip to content

Allow CalcJob as an input node #6837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

agoscinski
Copy link
Contributor

Hello, I am not sure if this clashes with some aiida core principle but the tests pass and when trying it out I get the right result. See for this workflow

from aiida import load_profile
load_profile()
from aiida import orm, plugins
from aiida.engine import run

ArithmeticAddCalculation = plugins.CalculationFactory('core.arithmetic.add')
node = run(ArithmeticAddCalculation, code=orm.load_code("bash@localhost"), x=orm.Int(1), y=orm.Int(2))
node = run(ArithmeticAddCalculation, calcj=node, code=orm.load_code("bash@localhost"), x=orm.Int(1), y=orm.Int(2))

Output provenance
image (1)

Copy link

codecov bot commented Apr 24, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 78.29%. Comparing base (cfd2052) to head (f7c085c).

Files with missing lines Patch % Lines
src/aiida/orm/nodes/process/calculation/calcjob.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6837      +/-   ##
==========================================
- Coverage   78.29%   78.29%   -0.00%     
==========================================
  Files         566      566              
  Lines       42764    42768       +4     
==========================================
+ Hits        33479    33482       +3     
- Misses       9285     9286       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@agoscinski
Copy link
Contributor Author

agoscinski commented Apr 24, 2025

@unkcpz mentioned an example that this might break. This could allow to create cycles. Need to check this.

EDIT:
In discussion with @khsrali we avoid this concern by only allowing to use CalcJobs as input if they are sealed

@superstar54
Copy link
Member

In your example,

node = run(ArithmeticAddCalculation, code=orm.load_code("bash@localhost"), x=orm.Int(1), y=orm.Int(2))

The returned value is not a CalcJob node, but a dict with all outputs of the CalcJob. So you are not passing the CalcJob node to the calcj.

@khsrali
Copy link
Contributor

khsrali commented Apr 25, 2025

The returned value is not a CalcJob node, but a dict with all outputs of the CalcJob. So you are not passing the CalcJob node to the calcj.

You are right, in this particular example the output of the cacjob is passed.

However, the changes of this PR supports passing CalcJobNode itself as an input node.
I just tried for stashing, and it works.

@@ -132,7 +132,7 @@ def validate_outgoing(self, target, link_type, link_label):
:param link_label: the link label
:raise aiida.common.ModificationNotAllowed: if the source node (self) is sealed
"""
if self._node.is_sealed:
if self._node.is_sealed and link_type is not LinkType.INPUT_CALC:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had a discussion with @agoscinski about this.
To avoid cyclic graph, a concern of @unkcpz.
We think we could add a function validate_ingoing, that would allow this only if the source CalcJobNode is sealed! This way we the cyclic issue would resolve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just discovered validate_incoming already exists.
So I think this issue is over solved.

@agoscinski
Copy link
Contributor Author

In your example,

node = run(ArithmeticAddCalculation, code=orm.load_code("bash@localhost"), x=orm.Int(1), y=orm.Int(2))

The returned value is not a CalcJob node, but a dict with all outputs of the CalcJob. So you are not passing the CalcJob node to the calcj.

You are right, I adapted the snippet last minute when posting it here, just load some calcjob node instead

from aiida import load_profile
load_profile()
from aiida import orm, plugins
from aiida.engine import run

ArithmeticAddCalculation = plugins.CalculationFactory('core.arithmetic.add')
node = orm.load_node(<SOME_PK_FROM_A_CALCJOB>)
run(ArithmeticAddCalculation, calcj=node, code=orm.load_code("bash@localhost"), x=orm.Int(1), y=orm.Int(2))

@agoscinski agoscinski force-pushed the allow-cacljob-as-input branch from a60095c to 992e140 Compare April 25, 2025 09:47
Copy link
Contributor

@khsrali khsrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @agoscinski
Nice work 👍

@@ -132,7 +132,7 @@ def validate_outgoing(self, target, link_type, link_label):
:param link_label: the link label
:raise aiida.common.ModificationNotAllowed: if the source node (self) is sealed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of documentation:
We have to update the docstring, and also explain why this is not being an issue, and why we are making an exception.

@danielhollas
Copy link
Collaborator

I always thought it is one of the core principles that when you want to combine CalcJobs, you need to create a WorkChain. Same principle applies to calcfunctions/workfunctions. I think the cyclic nature was indeed the reason. I believe this was explained in the original AiiDA publication, might be worth checking that.

@superstar54
Copy link
Member

A few things might be worth thinking about:

  • If CalcJobNode is allowed to pass as input to another process, then people will start writing code that does not pass data between processes explicitly, and only pass the whole CalcJobNode as input (because it could be more convenient in some cases). This will make the provenance graph only have the logic, thus one always needs to check the source code to know what data is passed. Of course, the provenance is still there, but it is not obvious from the graph anymore.
  • When people start writing code like that, they will wonder why they can not pass CalcFunctionNode as input.

@khsrali
Copy link
Contributor

khsrali commented Apr 29, 2025

@danielhollas

Just had a look at the paper, you're right. They say ProcessNodes must always be connected via DataNodes to each other. Because otherwise a cycle like this can occur:

+------------------+
|   CalcJobNode A  |  <------------------------
+------------------+                          |
       |                                      |      
       |                                      |
       v                                      |                               
+------------------+                    +------------------+  
|   CalcJobNode B  |  ----------->      |    DataNode B    |
+------------------+                    +------------------+  

However, this scenario is never goign to be happen, because CalcJobNode A will always be sealed --and we have a check for that-- therefore it cannot accept another incoming links. We only allow an outgoing.

@superstar54

I can see the danger, but I'd say it's good to allow some flexibility. Of course it would be the responsibility of the user how they want to use it. One could also provide the same for CalcFunctionNode but we don't see the use case, yet.
This PR was opened due to a solid use case by StashCalculation #6764

@khsrali
Copy link
Contributor

khsrali commented Apr 29, 2025

Ok, There are a few errors that can be fixed. But I stop developing this PR.

Because in the mean time, I met @superstar54 in person. I just learned each CalcJob node, produces one and only one RemoteData node, that has the info of the only one remote output folder.

Therefore, I'd say for #6772 , this change is unnecessarily, and no longer justified.

In fact, using RemoteData has more and better information in the provenance graph of what is a actually happening:

image

@danielhollas
Copy link
Collaborator

danielhollas commented Apr 29, 2025

Thanks @khsrali.

I still think that a fundamental change like this should be discussed more robustly, either on Discourse or via AEP, not on a GitHub PR without a pre-existing issue. Some questions:

  • Better motivation. What would this enable? You mention stashing use case, how exactly would this be used. What are alternative solutions?
  • How would we teach this to users?

because CalcJobNode A will always be sealed --and we have a check for that-- therefore it cannot accept another incoming links. We only allow an outgoing.

I am not convinced by this argument. You're basically trading a static design principle for a runtime check. That means for example opening doors to race conditions, depending on whether given CalcJob finished in time.

I can see the danger, but I'd say it's good to allow some flexibility.

I strongly disagree with this blanket statement. :-) More flexibility is not always good.

EDIT: Sorry, I didn't see your last comment before I submitted this. :-)

@agoscinski
Copy link
Contributor Author

Thanks for the comments. This PR was mainly motivated to allow to pass a CalcJobNode for the StashCalculation in PR #6772. However @superstar54 pointed out that the CalcJob already outputs a RemoteData node that contains the path to the calcjob's working directory. This remote data node can be used in the StashCalculation to allow the user to pass the stash options as it one can pass them in the first place when submitting a CalcJob. So this feature is not needed anymore.

Thanks @danielhollas @superstar54 for the comments. I will close the PR now since discussing a theoretical feature seems not worse the time. Maybe in the future if a valid use case exists, this PR and the discussion can be picked up and continued. Please feel free to add any final comments or reopen this PR if needed.

@agoscinski agoscinski closed this Apr 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants