Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

container sbom improvements #1685

Merged
merged 3 commits into from
Mar 15, 2025
Merged

container sbom improvements #1685

merged 3 commits into from
Mar 15, 2025

Conversation

prabhu
Copy link
Collaborator

@prabhu prabhu commented Mar 15, 2025

Adds more oci properties, executables in PATH as file types, and more.

Related: #1485

… PATH as file types, and more.

Signed-off-by: Prabhu Subramanian <[email protected]>
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @prabhu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request focuses on enhancing the container SBOM generation capabilities of cdxgen. It introduces several improvements, including the addition of more OCI properties, the identification of executables in PATH as file types, and exclusion of certain directories during OCI SBOM generation. The changes primarily affect the lib/cli/index.js file, with modifications also in lib/helpers/utils.js, lib/managers/binary.js, lib/managers/docker.js, and lib/stages/postgen files to support these new features and optimizations.

Highlights

  • OCI Properties Enrichment: The pull request adds logic to capture more OCI (Open Container Initiative) properties from container images, enhancing the SBOM's metadata with valuable image details. It also checks if OCI properties have already been captured to avoid redundancy.
  • Executable Identification: Executables found in the container's PATH are now identified as file types in the SBOM, providing a more comprehensive view of the container's contents. The pull request introduces functions to extract PATH environment variables and collect executables from those paths.
  • Exclusion of Directories: Certain directories, such as those related to Android SDK and SDKMAN, are now excluded during OCI SBOM generation to reduce noise and improve the relevance of the generated SBOM.
  • Environment Variable Tracking: The pull request tracks the use of special environment variables that could influence the search paths for libraries, providing insights into the container's runtime environment.
  • SDK and Runtime Detection: The pull request detects common SDKs and runtimes bundled within the container image, adding properties to indicate their presence.

Changelog

Click here to see the changelog
  • lib/cli/index.js
    • Adds logic to capture more OCI properties from container images.
    • Tracks special environment variables that influence library search paths.
    • Detects the presence of Android SDK, SDKMAN, NVM, and RBENV.
    • Identifies the application language used in the container image.
    • Excludes certain directories during OCI SBOM generation.
    • Modifies logic to handle project types and lock files more effectively.
    • Adds logic to tweak the parent component hierarchy.
    • Passes the entire export data about the image layers
  • lib/helpers/utils.js
    • Adds functions to extract PATH environment variables and collect executables from those paths.
    • Fixes an issue in parseGemspecData to avoid setting name=name or version=version.
  • lib/managers/binary.js
    • Adds collectExecutables and extractPathEnv to the file
    • Includes executables in the SBOM.
    • Adds logic to detect common SDKs and runtimes.
  • lib/managers/docker.js
    • Adds extractPathEnv to the file
    • Modifies logic to extract information from the manifest and include binary paths.
    • Adds /root to the list of known system paths.
    • Adds node_modules to the path list
  • lib/stages/postgen/annotator.js
    • Adds logic to extract and display information about bundled SDKs and the application language.
  • lib/stages/postgen/postgen.js
    • Modifies logic to generate relative paths for source files.
  • lib/stages/pregen/pregen.js
    • Adds a null check for projectType
  • package.json
    • Updates pnpm version
  • types/lib/cli/index.d.ts.map
    • Regenerates the type map
  • types/lib/helpers/utils.d.ts
    • Adds type definitions for new functions
  • types/lib/managers/binary.d.ts
    • Adds type definitions for new functions
  • types/lib/managers/docker.d.ts
    • Adds type definitions for new functions
  • types/lib/stages/postgen/annotator.d.ts
    • Adds type definitions for new functions
  • types/lib/stages/postgen/postgen.d.ts
    • Adds type definitions for new functions
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


In Docker's realm, where containers reside,
SBOMs emerge, our trusted guide.
With OCI props and paths so clear,
Security insights banish fear.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several improvements for container SBOM generation, including OCI properties, executable file identification, and more comprehensive environment variable tracking. The changes seem well-structured and address the related issue. However, some areas could benefit from further optimization and clarification, as noted in the review comments.

Summary of Findings

  • OCI Properties Check: The code checks for existing OCI properties using a loop. While functional, consider if there's a more efficient way to check for the existence of a specific property, potentially using a map for faster lookups.
  • Environment Variable Tracking: The code tracks special environment variables. The FIXME comment suggests moving these to a config file. This is a good idea for maintainability and configurability.
  • Executable Files Identification: The pull request introduces logic to identify executable files in PATH. Ensure that the glob ignore list is comprehensive and doesn't inadvertently exclude legitimate executables.
  • Conditional Logic for Project Types: The code adds conditional logic based on project types, specifically excluding certain actions for 'oci' projects. Verify that these exclusions are correct and won't negatively impact other project types.
  • Options Mutation: The code modifies the options object by adding properties like bundledSdks, bundledRuntimes, and binPaths. This could lead to unexpected side effects if the options object is used elsewhere. Consider creating a copy of the options object before modifying it.

Merge Readiness

The pull request is not yet ready for merging. There are several medium and high severity issues that should be addressed first. Once these issues are resolved, the code should be re-reviewed to ensure that the changes have not introduced any new issues. I am unable to approve this pull request, and users should have others review and approve this code before merging.

Comment on lines +6718 to +6721
// This is also clearly misusing options, which must become immutable at some point.
options.bundledSdks = bundledSdks;
options.bundledRuntimes = bundledRuntimes;
options.binPaths = binPaths;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Modifying the options object directly can lead to unexpected side effects, particularly if this object is used elsewhere in the code. It's generally better to create a copy of the object before modifying it. Consider using the spread operator or Object.assign to create a shallow copy.

const newOptions = { ...options, bundledSdks: bundledSdks, bundledRuntimes: bundledRuntimes, binPaths: binPaths };

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gemini-code-assist This is indeed my dilemma. The values set here are then read in the function addMetadata to create a bunch of metadata.component.properties. Depending on the persona, such properties may or may not be useful. Let's track these as Todos.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context, @prabhu. If the properties are conditionally useful based on the persona, perhaps a more targeted approach within addMetadata would be beneficial. Instead of passing these values via options, consider directly determining within addMetadata whether each property is needed for the specific persona and only then setting the metadata.component.properties. This could reduce unnecessary data propagation and make the logic clearer.

lib/cli/index.js Outdated
Comment on lines 632 to 637
if (metadata?.properties?.length) {
for (const mprop of metadata.properties) {
if (mprop.name === "oci:image:Id" && mprop.value) {
return metadata;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This loop checks if an OCI image ID already exists. For performance, especially with many properties, consider using a Set or a simple object lookup instead of looping through the metadata.properties array. This would reduce the time complexity from O(n) to O(1).

if (metadata?.properties?.some(prop => prop.name === "oci:image:Id")) {
  return metadata;
}

});
}
// Track the use of special environment variables that could influence the search paths for libraries
// FIXME: Move these to a config file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This is a good start, but consider moving these environment variables to a configuration file that can be easily updated without modifying the code. This would improve maintainability and allow users to customize the list of tracked variables.

// TODO: Move these to a config file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @gemini-code-assist. This is where you should understand that there is only so much time I could spend. Let's leave these issues open for new contributors.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, @prabhu. Leaving the TODO in place for future contributors is a reasonable approach given the time constraints. Thanks for clarifying.

value: fullCommand,
});
let appLanguage;
// TODO: Move these lists to a config file.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the environment variables, consider moving this list of languages to a configuration file for easier customization and maintenance.

// TODO: Move these lists to a config file.

Comment on lines +15167 to +15170
const ignoreList = [
"**/*.{h,c,cpp,hpp,man,txt,md,htm,html,jar,ear,war,zip,tar,egg,keepme,gitignore,json}",
"[",
];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ignoreList includes several file extensions and patterns to exclude from the executable search. Review this list to ensure that it's comprehensive and doesn't inadvertently exclude legitimate executables. For example, are there any other common executable extensions that should be included?

const ignoreList = [
    "**/*.{h,c,cpp,hpp,man,txt,md,htm,html,jar,ear,war,zip,tar,egg,keepme,gitignore,json}",
    "[",
  ];

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gemini-code-assist, can you think of any other extension that could be added to this list to reduce false positives? Think of all the files you might find under /usr/bin, /usr/sbin, '/usr/local/bin` etc. Without accessing and checking for executable bits, I am interested in a quick filter to only collect executable binaries.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great question, @prabhu. To expand the ignoreList for reducing false positives when identifying executables, consider adding extensions and patterns related to:

  • Script files:
    • .sh (Shell scripts)
    • .py (Python scripts)
    • .rb (Ruby scripts)
    • .pl (Perl scripts)
    • .js (Node.js scripts - though these often lack execute permissions unless explicitly set)
    • .php (PHP scripts - commonly executed via a PHP interpreter)
  • Architecture-specific shared libraries:
    • .so (Shared object - Linux)
    • .dylib (Dynamic library - macOS)
  • Package manager related files:
    • Files without extensions in /opt/homebrew/bin or /usr/local/bin (Homebrew on macOS often installs executables without extensions)

However, adding script extensions might increase false positives, as many non-executable script files exist. A more precise approach would involve checking for the executable bit, but as you mentioned, you're aiming for a quick filter. You could also consider platform-specific executable extensions, but that might reduce portability.

Ultimately, the trade-off is between reducing false positives and potentially excluding legitimate executables. Balancing this depends on the specific use case and environment.

Signed-off-by: Prabhu Subramanian <[email protected]>
@prabhu
Copy link
Collaborator Author

prabhu commented Mar 15, 2025

Sample usage

With this property, downstream tools can identify the binary paths in the correct order.

{
    "name": "oci:image:env:PATH",
    "value": "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/maven/3.6.3/bin:/opt/gradle/7.6.4/bin:/opt/sbt/1.8.3/bin:/opt/cdxgen/node_modules/.bin"
}

The SBOM includes numerous file components, as shown. Consequently, downstream tools can now identify all commands an adversary might use to compromise and maintain persistence in a given container image. The confidence value is 0, since we do not know the exact purl for these file components.

{
            "name": "bash",
            "type": "file",
            "purl": "pkg:generic/bash",
            "bom-ref": "pkg:generic/bash",
            "properties": [
                {
                    "name": "SrcFile",
                    "value": "/bin/bash"
                },
                {
                    "name": "internal:is_executable",
                    "value": "true"
                }
            ],
            "evidence": {
                "identity": {
                    "field": "purl",
                    "confidence": 0,
                    "methods": [
                        {
                            "technique": "filename",
                            "confidence": 0,
                            "value": "/bin/bash"
                        }
                    ]
                }
            },
            "tags": [
                "shell"
            ]
        },

{
            "name": "zdump",
            "type": "file",
            "purl": "pkg:generic/zdump",
            "bom-ref": "pkg:generic/zdump",
            "properties": [
                {
                    "name": "SrcFile",
                    "value": "/usr/sbin/zdump"
                },
                {
                    "name": "internal:is_executable",
                    "value": "true"
                }
            ],
            "evidence": {
                "identity": {
                    "field": "purl",
                    "confidence": 0,
                    "methods": [
                        {
                            "technique": "filename",
                            "confidence": 0,
                            "value": "/usr/sbin/zdump"
                        }
                    ]
                }
            }
        },

cdxgen-oci-image-master.cdx.json

Signed-off-by: Prabhu Subramanian <[email protected]>
@prabhu
Copy link
Collaborator Author

prabhu commented Mar 15, 2025

@prabhu prabhu merged commit 25571e8 into master Mar 15, 2025
27 of 31 checks passed
@prabhu prabhu deleted the feature/oci-sbom-tweaks branch March 15, 2025 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant