Initial draft of CodeTF v3 #44

Open: wants to merge 3 commits into base: main


drdavella
Member

This draft provides a starting point for the discussion about a new, remediation-focused version of CodeTF.

The previous version of CodeTF is oriented around batch changes to files. Our new architecture requires a format that is instead oriented around the remediation of individual findings, each of which may require changes to multiple files.

This version retains many of the metadata structures from v2 but reorganizes them to better reflect the realities of remediation.
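
To make the shift from file-oriented to finding-oriented concrete, here is a minimal sketch of regrouping a flat list of file changes by the finding each one remediates. The key names (`path`, `diff`, `finding_id`) and the function `group_changes_by_finding` are hypothetical, chosen only for illustration; they are not taken from the v2 or v3 schema.

```python
from collections import defaultdict
from typing import Dict, List


def group_changes_by_finding(file_changes: List[dict]) -> Dict[str, List[dict]]:
    """Regroup per-file changes so that each finding owns its list of file changes.

    Each input dict is assumed (hypothetically) to look like:
        {"path": str, "diff": str, "finding_id": str}
    """
    by_finding: Dict[str, List[dict]] = defaultdict(list)
    for change in file_changes:
        by_finding[change["finding_id"]].append(
            {"path": change["path"], "diff": change["diff"]}
        )
    return dict(by_finding)


# Hypothetical file-oriented input: finding F1 needs changes in two files.
example_changes = [
    {"path": "app/db.py", "diff": "<diff 1>", "finding_id": "F1"},
    {"path": "app/views.py", "diff": "<diff 2>", "finding_id": "F1"},
    {"path": "app/auth.py", "diff": "<diff 3>", "finding_id": "F2"},
]
grouped = group_changes_by_finding(example_changes)
```

In the grouped form, everything needed to remediate one finding sits under a single key, even when the fix spans several files.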

@nahsra
Contributor

nahsra commented Jan 15, 2025

It's hard for me to reason about a schema on its own. Can we make an example that shows all the fields in use? We could also use it to validate that the schema does what we want.

@@ -0,0 +1,181 @@
{
Contributor

Should this have this header?

"commandLine": "codemodder-python --verbose --dry-run --sonar /Users/example/sonar_juice-shop.json /Users/example/juice-shop",
"directory": "/Users/example/juice-shop",
"sarifs": []
},
Contributor

I would expect this to be a single fix. I thought we liked the idea of fix orchestration being done upstream. This implies there should be some orchestration here.

This means callers would have to change how they call codemods -- with a single input, but that significantly de-scopes the codemodder implementations.

"findingMetadata": {
"type": "object",
"description": "Metadata about the finding being addressed",
"properties": {
Contributor

I think it's worth pointing out that all of this findingMetadata should come from the tool itself, which raises the question: in the scope of a single issue being fixed, the caller already knows this, so do we need to send it back?

@drdavella
Member Author

Before I push additional changes, let me try to summarize the proposed general direction:

  • We want to return to the philosophy of codemods as "small, sharp tools" where each codemod has only a very small scope of responsibility. In this new iteration, that responsibility boils down to fixing a single finding
  • Because of this descoping of responsibility, CodeTF needs to represent very little information about the given finding itself. The caller is responsible for indicating which vulnerability is being addressed, and so already holds all of the relevant information. However, I would like to argue that representing, at minimum, finding ID, tool, and possibly also rule ID is useful as a record of the transformation that occurred and will potentially be useful for debugging
  • As another result of descoping, we expect that codemodder will be asked to fix only one issue at a time, which means we need to represent only one fix result in CodeTF. While I understand the argument and there are many advantages to this approach, I would also like to leave flexibility for batch processing, which means I would like to allow an array of fixes, even if in most cases it contains only a single result
  • We still need to communicate several other statuses in addition to successful fixes: failures (could be parsing, LLM errors, etc.), skipped (don't have a codemod/rule for the fix), and "declined" (refuse to fix because deemed not remediable or another reason). Each of these states should be able to convey high-level metadata about the reasoning, even if we no longer need more specific fields like failedFiles
  • Responsibility for any kind of fix patch composition lives outside the scope of codemodder itself
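
As a rough illustration of the direction summarized above, the following sketch models a result that carries minimal finding information, one of the four statuses, and a high-level reason for anything other than a successful fix, wrapped in an array that usually holds one entry. All class, field, and key names here (`FixStatus`, `FindingRef`, `FixResult`, `build_report`, `fixes`, `changedFiles`, and so on) are assumptions made for discussion, not the proposed schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class FixStatus(Enum):
    FIXED = "fixed"        # a fix was produced
    FAILED = "failed"      # e.g. parsing or LLM errors
    SKIPPED = "skipped"    # no codemod/rule available for the finding
    DECLINED = "declined"  # deliberately not fixed, e.g. deemed not remediable


@dataclass
class FindingRef:
    """Minimal record of the finding; the caller already holds the full details."""
    finding_id: str
    tool: str
    rule_id: Optional[str] = None


@dataclass
class FixResult:
    finding: FindingRef
    status: FixStatus
    reason: Optional[str] = None  # high-level explanation for non-fixed states
    changed_files: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Hypothetical rule: every non-fixed status must explain itself.
        if self.status is not FixStatus.FIXED and not self.reason:
            raise ValueError(f"a {self.status.value} result needs a reason")

    def to_dict(self) -> Dict[str, Any]:
        return {
            "finding": {
                "id": self.finding.finding_id,
                "tool": self.finding.tool,
                "ruleId": self.finding.rule_id,
            },
            "status": self.status.value,
            "reason": self.reason,
            "changedFiles": list(self.changed_files),
        }


def build_report(results: List[FixResult]) -> Dict[str, Any]:
    """Wrap results in an array so batch processing stays possible."""
    return {"fixes": [r.to_dict() for r in results]}


# Typical case: a single finding in, a single result out.
report = build_report([
    FixResult(FindingRef("finding-1", "sonar", "rule-x"), FixStatus.FIXED,
              changed_files=["app/db.py", "app/views.py"]),
])
```

Keeping `fixes` as a list costs little in the single-finding case and leaves room for batch callers; patch composition across results would still happen outside this structure.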


2 participants