-
-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
container sbom improvements #1685
Conversation
… PATH as file types, and more. Signed-off-by: Prabhu Subramanian <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @prabhu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request focuses on enhancing the container SBOM generation capabilities of cdxgen. It introduces several improvements, including the addition of more OCI properties, the identification of executables in PATH as file types, and exclusion of certain directories during OCI SBOM generation. The changes primarily affect the lib/cli/index.js
file, with modifications also in lib/helpers/utils.js
, lib/managers/binary.js
, lib/managers/docker.js
, and lib/stages/postgen
files to support these new features and optimizations.
Highlights
- OCI Properties Enrichment: The pull request adds logic to capture more OCI (Open Container Initiative) properties from container images, enhancing the SBOM's metadata with valuable image details. It also checks if OCI properties have already been captured to avoid redundancy.
- Executable Identification: Executables found in the container's PATH are now identified as file types in the SBOM, providing a more comprehensive view of the container's contents. The pull request introduces functions to extract PATH environment variables and collect executables from those paths.
- Exclusion of Directories: Certain directories, such as those related to Android SDK and SDKMAN, are now excluded during OCI SBOM generation to reduce noise and improve the relevance of the generated SBOM.
- Environment Variable Tracking: The pull request tracks the use of special environment variables that could influence the search paths for libraries, providing insights into the container's runtime environment.
- SDK and Runtime Detection: The pull request detects common SDKs and runtimes bundled within the container image, adding properties to indicate their presence.
Changelog
Click here to see the changelog
- lib/cli/index.js
- Adds logic to capture more OCI properties from container images.
- Tracks special environment variables that influence library search paths.
- Detects the presence of Android SDK, SDKMAN, NVM, and RBENV.
- Identifies the application language used in the container image.
- Excludes certain directories during OCI SBOM generation.
- Modifies logic to handle project types and lock files more effectively.
- Adds logic to tweak the parent component hierarchy.
- Passes the entire export data about the image layers
- lib/helpers/utils.js
- Adds functions to extract PATH environment variables and collect executables from those paths.
- Fixes an issue in
parseGemspecData
to avoid setting name=name or version=version.
- lib/managers/binary.js
- Adds
collectExecutables
andextractPathEnv
to the file - Includes executables in the SBOM.
- Adds logic to detect common SDKs and runtimes.
- Adds
- lib/managers/docker.js
- Adds
extractPathEnv
to the file - Modifies logic to extract information from the manifest and include binary paths.
- Adds
/root
to the list of known system paths. - Adds node_modules to the path list
- Adds
- lib/stages/postgen/annotator.js
- Adds logic to extract and display information about bundled SDKs and the application language.
- lib/stages/postgen/postgen.js
- Modifies logic to generate relative paths for source files.
- lib/stages/pregen/pregen.js
- Adds a null check for projectType
- package.json
- Updates pnpm version
- types/lib/cli/index.d.ts.map
- Regenerates the type map
- types/lib/helpers/utils.d.ts
- Adds type definitions for new functions
- types/lib/managers/binary.d.ts
- Adds type definitions for new functions
- types/lib/managers/docker.d.ts
- Adds type definitions for new functions
- types/lib/stages/postgen/annotator.d.ts
- Adds type definitions for new functions
- types/lib/stages/postgen/postgen.d.ts
- Adds type definitions for new functions
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
In Docker's realm, where containers reside,
SBOMs emerge, our trusted guide.
With OCI props and paths so clear,
Security insights banish fear.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces several improvements for container SBOM generation, including OCI properties, executable file identification, and more comprehensive environment variable tracking. The changes seem well-structured and address the related issue. However, some areas could benefit from further optimization and clarification, as noted in the review comments.
Summary of Findings
- OCI Properties Check: The code checks for existing OCI properties using a loop. While functional, consider if there's a more efficient way to check for the existence of a specific property, potentially using a map for faster lookups.
- Environment Variable Tracking: The code tracks special environment variables. The FIXME comment suggests moving these to a config file. This is a good idea for maintainability and configurability.
- Executable Files Identification: The pull request introduces logic to identify executable files in PATH. Ensure that the glob ignore list is comprehensive and doesn't inadvertently exclude legitimate executables.
- Conditional Logic for Project Types: The code adds conditional logic based on project types, specifically excluding certain actions for 'oci' projects. Verify that these exclusions are correct and won't negatively impact other project types.
- Options Mutation: The code modifies the
options
object by adding properties likebundledSdks
,bundledRuntimes
, andbinPaths
. This could lead to unexpected side effects if theoptions
object is used elsewhere. Consider creating a copy of theoptions
object before modifying it.
Merge Readiness
The pull request is not yet ready for merging. There are several medium and high severity issues that should be addressed first. Once these issues are resolved, the code should be re-reviewed to ensure that the changes have not introduced any new issues. I am unable to approve this pull request, and users should have others review and approve this code before merging.
// This is also clearly misusing options, which must become immutable at some point. | ||
options.bundledSdks = bundledSdks; | ||
options.bundledRuntimes = bundledRuntimes; | ||
options.binPaths = binPaths; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying the options
object directly can lead to unexpected side effects, particularly if this object is used elsewhere in the code. It's generally better to create a copy of the object before modifying it. Consider using the spread operator or Object.assign
to create a shallow copy.
const newOptions = { ...options, bundledSdks: bundledSdks, bundledRuntimes: bundledRuntimes, binPaths: binPaths };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gemini-code-assist This is indeed my dilemma. The values set here are then read in the function addMetadata
to create a bunch of metadata.component.properties
. Depending on the persona, such properties may or may not be useful. Let's track these as Todos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context, @prabhu. If the properties are conditionally useful based on the persona, perhaps a more targeted approach within addMetadata
would be beneficial. Instead of passing these values via options
, consider directly determining within addMetadata
whether each property is needed for the specific persona and only then setting the metadata.component.properties
. This could reduce unnecessary data propagation and make the logic clearer.
lib/cli/index.js
Outdated
if (metadata?.properties?.length) { | ||
for (const mprop of metadata.properties) { | ||
if (mprop.name === "oci:image:Id" && mprop.value) { | ||
return metadata; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop checks if an OCI image ID already exists. For performance, especially with many properties, consider using a Set
or a simple object lookup instead of looping through the metadata.properties
array. This would reduce the time complexity from O(n) to O(1).
if (metadata?.properties?.some(prop => prop.name === "oci:image:Id")) {
return metadata;
}
}); | ||
} | ||
// Track the use of special environment variables that could influence the search paths for libraries | ||
// FIXME: Move these to a config file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes @gemini-code-assist. This is where you should understand that there is only so much time I could spend. Let's leave these issues open for new contributors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, @prabhu. Leaving the TODO in place for future contributors is a reasonable approach given the time constraints. Thanks for clarifying.
value: fullCommand, | ||
}); | ||
let appLanguage; | ||
// TODO: Move these lists to a config file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const ignoreList = [ | ||
"**/*.{h,c,cpp,hpp,man,txt,md,htm,html,jar,ear,war,zip,tar,egg,keepme,gitignore,json}", | ||
"[", | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ignoreList
includes several file extensions and patterns to exclude from the executable search. Review this list to ensure that it's comprehensive and doesn't inadvertently exclude legitimate executables. For example, are there any other common executable extensions that should be included?
const ignoreList = [
"**/*.{h,c,cpp,hpp,man,txt,md,htm,html,jar,ear,war,zip,tar,egg,keepme,gitignore,json}",
"[",
];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gemini-code-assist, can you think of any other extension that could be added to this list to reduce false positives? Think of all the files you might find under /usr/bin
, /usr/sbin
, '/usr/local/bin` etc. Without accessing and checking for executable bits, I am interested in a quick filter to only collect executable binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great question, @prabhu. To expand the ignoreList
for reducing false positives when identifying executables, consider adding extensions and patterns related to:
- Script files:
.sh
(Shell scripts).py
(Python scripts).rb
(Ruby scripts).pl
(Perl scripts).js
(Node.js scripts - though these often lack execute permissions unless explicitly set).php
(PHP scripts - commonly executed via a PHP interpreter)
- Architecture-specific shared libraries:
.so
(Shared object - Linux).dylib
(Dynamic library - macOS)
- Package manager related files:
- Files without extensions in
/opt/homebrew/bin
or/usr/local/bin
(Homebrew on macOS often installs executables without extensions)
- Files without extensions in
However, adding script extensions might increase false positives, as many non-executable script files exist. A more precise approach would involve checking for the executable bit, but as you mentioned, you're aiming for a quick filter. You could also consider platform-specific executable extensions, but that might reduce portability.
Ultimately, the trade-off is between reducing false positives and potentially excluding legitimate executables. Balancing this depends on the specific use case and environment.
Signed-off-by: Prabhu Subramanian <[email protected]>
Sample usageWith this property, downstream tools can identify the binary paths in the correct order. {
"name": "oci:image:env:PATH",
"value": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/maven/3.6.3/bin:/opt/gradle/7.6.4/bin:/opt/sbt/1.8.3/bin:/opt/cdxgen/node_modules/.bin"
} The SBOM includes numerous
|
Signed-off-by: Prabhu Subramanian <[email protected]>
Adds more oci properties, executables in PATH as file types, and more.
Related: #1485