Enable labels for IMEX Domain and Clique #965
deployments/helm/nvidia-device-plugin/templates/daemonset-gfd.yml
This is looking good to me now. Thanks for your patience on all the back-and-forth.
- name: nvidia-imex-dir
type: DirectoryOrCreate
hostPath:
path: {{ include "nvidia-device-plugin.filepathJoin" (list "/" .Values.nvidiaDriverRoot "/etc/nvidia-imex") }}
Suggested change
- path: {{ include "nvidia-device-plugin.filepathJoin" (list "/" .Values.nvidiaDriverRoot "/etc/nvidia-imex") }}
+ path: {{ clean ( join "/" ( list .Values.nvidiaDriverRoot "/etc/nvidia-imex" ) ) | quote }}
You mean dropping the helper function?
Is this change going to do what we want? The normal join is not a filepath.Join(), so I'm not sure it will handle redundant or missing slashes correctly.
I went with {{ clean ( join "/" ( list "/" .Values.nvidiaDriverRoot "/etc/nvidia-imex" ) ) | quote }}
and tested it with the following values for nvidiaDriverRoot:
- null
- /
- /proc/nvidia
- proc/nvidia
- /proc/nvidia/
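The difference between the two expressions can be reproduced in Go. This is a sketch, not the chart's code: it assumes that Sprig's join skips nil list elements (so a null nvidiaDriverRoot contributes nothing) and that Sprig's clean behaves like Go's path.Clean. The helper joinClean and its leadingSlash parameter are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// joinClean mirrors the Helm expression
//
//	clean (join "/" (list "/" .Values.nvidiaDriverRoot "/etc/nvidia-imex"))
//
// Assumption: Sprig's join skips nil elements, modelled here by omitting a
// nil driverRoot. Assumption: Sprig's clean is equivalent to path.Clean.
// leadingSlash=false models the earlier suggestion without the "/" element.
func joinClean(driverRoot *string, leadingSlash bool) string {
	var parts []string
	if leadingSlash {
		parts = append(parts, "/")
	}
	if driverRoot != nil {
		parts = append(parts, *driverRoot)
	}
	parts = append(parts, "/etc/nvidia-imex")
	// strings.Join can produce runs of slashes; path.Clean collapses them.
	return path.Clean(strings.Join(parts, "/"))
}

func strPtr(s string) *string { return &s }

func main() {
	roots := []*string{nil, strPtr("/"), strPtr("/proc/nvidia"), strPtr("proc/nvidia"), strPtr("/proc/nvidia/")}
	for _, r := range roots {
		name := "null"
		if r != nil {
			name = *r
		}
		fmt.Printf("%s -> %s (without the leading \"/\": %s)\n", name, joinClean(r, true), joinClean(r, false))
	}
}
```

Under those assumptions, every tested value resolves to /etc/nvidia-imex or /proc/nvidia/etc/nvidia-imex when the leading "/" element is present, while without it the relative value proc/nvidia stays relative (proc/nvidia/etc/nvidia-imex).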
@@ -22,6 +22,8 @@ import (
	"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
	"github.com/NVIDIA/go-nvlib/pkg/nvpci"
	"github.com/NVIDIA/go-nvml/pkg/nvml"

	"github.com/google/uuid"
Run make goimports.
done
Oh wait. Ignore me. I thought the other imports were local. I would have expected "github.com/google/uuid" to be before the NVIDIA imports, though. Not a blocker.
internal/resource/nvml-device.go
@@ -99,3 +101,19 @@ func (d nvmlDevice) GetPCIClass() (uint32, error) {
	}
	return nvDevice.Class, nil
}

func (d nvmlDevice) GetFabricIDs() (string, string, error) {
	gfInfo, ret := d.GetGpuFabricInfo()
nit: why not just:
Suggested change
- gfInfo, ret := d.GetGpuFabricInfo()
+ info, ret := d.GetGpuFabricInfo()
The gf prefix is not needed in this context.
got it
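For context on what GetFabricIDs returns, here is a sketch of how two ID strings might be derived from GPU fabric info. It assumes, without confirmation from this thread, that the two strings are the IMEX domain (taken from the fabric's 16-byte cluster UUID) and the clique ID. The fabricInfo type, the fabricStateCompleted constant and its value, and the fabricIDs function are all hypothetical stand-ins, not go-nvml's API.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// fabricInfo is a hypothetical stand-in for the structure returned by
// GetGpuFabricInfo; only the fields used below are modelled.
type fabricInfo struct {
	ClusterUUID [16]uint8
	CliqueID    uint32
	State       uint8
}

// fabricStateCompleted is a hypothetical value meaning the GPU has
// finished registering with the fabric.
const fabricStateCompleted uint8 = 3

// fabricIDs returns (domain, clique): the cluster UUID in canonical
// 8-4-4-4-12 hex form and the clique ID in decimal.
func fabricIDs(info fabricInfo) (string, string, error) {
	if info.State != fabricStateCompleted {
		return "", "", errors.New("fabric registration not completed")
	}
	u := info.ClusterUUID
	// %x on a byte slice prints lowercase hex with no separators.
	domain := fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
	clique := strconv.FormatUint(uint64(info.CliqueID), 10)
	return domain, clique, nil
}

func main() {
	var info fabricInfo
	for i := range info.ClusterUUID {
		info.ClusterUUID[i] = uint8(i)
	}
	info.CliqueID = 7
	info.State = fabricStateCompleted
	domain, clique, err := fabricIDs(info)
	fmt.Println(domain, clique, err) // 00010203-0405-0607-0809-0a0b0c0d0e0f 7 <nil>
}
```

Refusing to produce IDs until registration completes avoids labeling a node with a placeholder domain.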
cmd/gpu-feature-discovery/main.go
@@ -86,6 +86,12 @@ func main() {
		Value:   "/etc/kubernetes/node-feature-discovery/features.d/gfd",
		EnvVars: []string{"GFD_OUTPUT_FILE"},
	},
	&cli.StringFlag{
		Name:  "imex-nodes-config-file",
		Usage: "Path to the IMEX ",
Suggested change
- Usage: "Path to the IMEX ",
+ Usage: "Path to the IMEX domain configuration file",
To be clear: the "nodes config file" is different from the IMEX config file (that is a different file in the same directory, which we don't need to look at).
OK, as long as the usage is consistent.
Do you like how it reads now (after the recent push)?
api/config/v1/flags.go
ImexNodesConfigFile *string `json:"imexNodesConfigFile" yaml:"imexNodesConfigFile"`
MachineTypeFile     *string `json:"machineTypeFile" yaml:"machineTypeFile"`
Suggested change
- ImexNodesConfigFile *string `json:"imexNodesConfigFile" yaml:"imexNodesConfigFile"`
- MachineTypeFile     *string `json:"machineTypeFile" yaml:"machineTypeFile"`
+ MachineTypeFile *string `json:"machineTypeFile" yaml:"machineTypeFile"`
+ // ImexNodesConfigFile is the path to a file containing the IMEX domain configuration.
+ // Such a file contains the IP addresses of nodes that are part of the IMEX domain.
+ // Note that this is the absolute path to the file in the device plugin container.
+ ImexNodesConfigFile *string `json:"imexNodesConfigFile" yaml:"imexNodesConfigFile"`
To be clear: the "nodes config file" is different from the IMEX config file (that is a different file in the same directory, which we don't need to look at).
Sure. Let's update this to use the right terminology.
I'll set it to:
// ImexNodesConfigFile is the path to a file containing the IP addresses of nodes
// that are part of the IMEX domain.
// Note that this is the absolute path to the file in the device plugin container.
pushed this change
internal/lm/fabric.go
)

func newImexLabeler(config *spec.Config, devices []resource.Device) (Labeler, error) {
	if config.Flags.GFD.ImexNodesConfigFile == nil {
Should we also check for the empty string?
Suggested change
- if config.Flags.GFD.ImexNodesConfigFile == nil {
+ if config.Flags.GFD.ImexNodesConfigFile == nil || *config.Flags.GFD.ImexNodesConfigFile == "" {
I have some final suggestions and questions.
I think I'm good with this now.
The only questions I have (which can be done as follow-ups) are:
- The exact wording of the usage text and docstring for the config options
- Whether a top-level IMEX config option makes sense, since we're adding other IMEX-related config options in Enable labels for IMEX Domain and Clique #965, for example.
The latter would affect the config-based API, but we can always address that in the next day or two.
cmd/gpu-feature-discovery/main.go
@@ -86,6 +86,12 @@ func main() {
		Value:   "/etc/kubernetes/node-feature-discovery/features.d/gfd",
		EnvVars: []string{"GFD_OUTPUT_FILE"},
	},
	&cli.StringFlag{
		Name:  "imex-nodes-config-file",
		Usage: "Path to the IMEX domain nods IP list file",
Suggested change
- Usage: "Path to the IMEX domain nods IP list file",
+ Usage: "Path to the IMEX nodes config file. This file contains a list of IP addresses of the nodes in the IMEX domain.",
More explicit, I agree. Done.
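Since the usage text now describes the file as a list of node IP addresses, here is a sketch of reading such a list. It assumes one address per line with blank lines ignored; that format is not spelled out in this thread, and parseImexNodes is a hypothetical helper. It reads from a string so the example is self-contained; a labeler would read the configured path instead.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// parseImexNodes returns the node IP addresses listed in an IMEX nodes
// config file body. Assumptions: one address per line, surrounding
// whitespace and blank lines ignored; any other line is an error.
func parseImexNodes(body string) ([]string, error) {
	var ips []string
	scanner := bufio.NewScanner(strings.NewReader(body))
	for lineNo := 1; scanner.Scan(); lineNo++ {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		ip := net.ParseIP(line)
		if ip == nil {
			return nil, fmt.Errorf("line %d: %q is not an IP address", lineNo, line)
		}
		ips = append(ips, ip.String())
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	return ips, nil
}

func main() {
	ips, err := parseImexNodes("10.0.0.1\n\n10.0.0.2\n")
	fmt.Println(ips, err) // [10.0.0.1 10.0.0.2] <nil>
}
```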
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
No description provided.