Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable labels for IMEX Domain and Clique #965

Merged
merged 1 commit into from
Oct 10, 2024

Conversation

ArangoGutierrez
Copy link
Collaborator

No description provided.

@ArangoGutierrez ArangoGutierrez force-pushed the cnt/103 branch 2 times, most recently from 986db77 to 91aa9c4 Compare September 24, 2024 18:57
@ArangoGutierrez ArangoGutierrez marked this pull request as draft September 25, 2024 08:21
@ArangoGutierrez ArangoGutierrez marked this pull request as ready for review September 25, 2024 08:27
@ArangoGutierrez ArangoGutierrez force-pushed the cnt/103 branch 2 times, most recently from f0e1d1b to 97ea6a6 Compare September 25, 2024 15:45
Copy link
Contributor

@klueska klueska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me now. Thanks for your patience on all the back-and-forth.

- name: nvidia-imex-dir
type: DirectoryOrCreate
hostPath:
path: {{ include "nvidia-device-plugin.filepathJoin" (list "/" .Values.nvidiaDriverRoot "/etc/nvidia-imex") }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
path: {{ include "nvidia-device-plugin.filepathJoin" (list "/" .Values.nvidiaDriverRoot "/etc/nvidia-imex") }}
path: {{ clean ( join "/" ( list .Values.nvidiaDriverRoot "/etc/nvidia-imex" ) ) | quote }}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean, dropping the helper function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change going to do what we want? The normal join is not a filepath.Join() so I'm not sure if it will treat redundant / missing slashes correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did {{ clean ( join "/" ( list "/" .Values.nvidiaDriverRoot "/etc/nvidia-imex" ) ) | quote }}
tested with values:

  • null
  • /
  • /proc/nvidia
  • proc/nvidia
  • /proc/nvidia/

@@ -22,6 +22,8 @@ import (
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvpci"
"github.com/NVIDIA/go-nvml/pkg/nvml"

"github.com/google/uuid"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run make goimports.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait. Ignore me. I thought the other imports were local. I would have expected "github.com/google/uuid" to be before the NVIDIA imports though. Not a blocker.

@@ -99,3 +101,19 @@ func (d nvmlDevice) GetPCIClass() (uint32, error) {
}
return nvDevice.Class, nil
}

func (d nvmlDevice) GetFabricIDs() (string, string, error) {
gfInfo, ret := d.GetGpuFabricInfo()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not just:

Suggested change
gfInfo, ret := d.GetGpuFabricInfo()
info, ret := d.GetGpuFabricInfo()

the gf prefix is not needed in this context.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

@@ -86,6 +86,12 @@ func main() {
Value: "/etc/kubernetes/node-feature-discovery/features.d/gfd",
EnvVars: []string{"GFD_OUTPUT_FILE"},
},
&cli.StringFlag{
Name: "imex-nodes-config-file",
Usage: "Path to the IMEX ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Usage: "Path to the IMEX ",
Usage: "Path to the IMEX domain configuration file",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear -- the "nodes config file" is different than the IMEX config file (that is a different file in the same directory which we dont need to look at).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. as long as the usage is consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you like how it reads now (after recent push)

Comment on lines 110 to 111
ImexNodesConfigFile *string `json:"imexNodesConfigFile" yaml:"imexNodesConfigFile"`
MachineTypeFile *string `json:"machineTypeFile" yaml:"machineTypeFile"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ImexNodesConfigFile *string `json:"imexNodesConfigFile" yaml:"imexNodesConfigFile"`
MachineTypeFile *string `json:"machineTypeFile" yaml:"machineTypeFile"`
MachineTypeFile *string `json:"machineTypeFile" yaml:"machineTypeFile"`
// ImexNodesConfigFile is the path to a file containing the IMEX domain configuaration.
// Such a file contains the IP addresses of nodes that are part of the IMEX domain.
// Note that this is the absolute path to the file in the device plugin container.
ImexNodesConfigFile *string `json:"imexNodesConfigFile" yaml:"imexNodesConfigFile"`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear -- the "nodes config file" is different than the IMEX config file (that is a different file in the same directory which we dont need to look at).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Let's update this to use the right terminology.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll set it to

	// ImexNodesConfigFile is the path to a file containing the IP addresses of nodes
	// that are part of the IMEX domain.
	// Note that this is the absolute path to the file in the device plugin container.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed this change

)

func newImexLabeler(config *spec.Config, devices []resource.Device) (Labeler, error) {
if config.Flags.GFD.ImexNodesConfigFile == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check for the empty string?

Suggested change
if config.Flags.GFD.ImexNodesConfigFile == nil {
if config.Flags.GFD.ImexNodesConfigFile == nil || *config.Flags.GFD.ImexNodesConfigFile == "" {

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some final suggestions and questions.

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm good with this now.

The only questions I have (which can be done as follow ups) are:

  • The exact wording of the usage text and docstring for the config options
  • Whether a top-level IMEX config option makes sense since we're adding other Imex-related config options in Enable labels for IMEX Domain and Clique #965 for example.

The latter would affect the config-based API, but we can always address that in the next day or two.

@@ -86,6 +86,12 @@ func main() {
Value: "/etc/kubernetes/node-feature-discovery/features.d/gfd",
EnvVars: []string{"GFD_OUTPUT_FILE"},
},
&cli.StringFlag{
Name: "imex-nodes-config-file",
Usage: "Path to the IMEX domain nods IP list file",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Usage: "Path to the IMEX domain nods IP list file",
Usage: "Path to the IMEX nodes config file. This file contains a list of IP addresses of the nodes in the IMEX domain.",

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more explicitly, I agree. Done

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
@ArangoGutierrez ArangoGutierrez requested a review from elezar October 9, 2024 14:46
@ArangoGutierrez ArangoGutierrez merged commit 966d5b8 into NVIDIA:main Oct 10, 2024
8 checks passed
@ArangoGutierrez ArangoGutierrez deleted the cnt/103 branch October 10, 2024 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants