📖 Add proposal for MachinePool Machines #6088
@mboersma sorry for only getting to this now.
+1 for the direction. I have some questions about the implementation details; happy to discuss in person if that helps clarify my comments.
Did another round of review; no further comments from my side apart from the ones that are currently open.
Dropped a few more suggestions. lgtm overall, and I have no objections to proceeding.
lgtm pending resolution of the selector field naming comment
/lgtm
squash?
When a MachinePool Machine is deleted manually, the system will delete the corresponding provider-specific resource. The opposite is also true: when a provider-specific resource is deleted, the system will delete the corresponding MachinePool Machine. This happens by virtue of the infrastructureRef <-> ownerRef relationship.
In both cases, the MachinePool will notice the missing replica and create a new one in order to maintain the desired number of replicas. To scale down by removing a specific instance, that Machine should be given the "cluster.x-k8s.io/delete-machine" annotation and then the replicaCount on the MachinePool should be decremented.
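To make the paired deletion in the quoted proposal text concrete, here is a minimal sketch, not the actual Cluster API controller code. It models each Machine as pointing at its provider-specific resource (`infrastructure_ref`) and each provider resource as pointing back at its Machine (`owner_ref`). The class names, field names, and the `prune_orphans` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    infrastructure_ref: str  # name of the provider-specific resource (hypothetical field)


@dataclass(frozen=True)
class InfraMachine:
    name: str
    owner_ref: str  # name of the owning Machine (hypothetical field)


def prune_orphans(machines, infra_machines):
    """Drop whichever half of a pair has lost its counterpart.

    A Machine whose provider resource is gone is removed, and a provider
    resource whose owning Machine is gone is removed.
    """
    machine_names = {m.name for m in machines}
    infra_names = {i.name for i in infra_machines}
    kept_machines = [m for m in machines if m.infrastructure_ref in infra_names]
    kept_infra = [i for i in infra_machines if i.owner_ref in machine_names]
    return kept_machines, kept_infra
```

After pruning, the pool would see fewer replicas than desired and create replacements, as the proposal text describes.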
Who will decrement the replica count? Is it the common CAPI controller or the individual MachinePool controller?
This is actually the cluster-autoscaler case, so the annotation is applied to a specific Machine and then the replicaCount is decremented by an external actor: either a user or software like cluster-autoscaler.
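The scale-down flow discussed here can be sketched as follows. This is an illustrative sketch, not the controller's real logic: once an external actor has annotated a Machine and lowered the replica count, the pool has surplus Machines and should prefer the annotated ones for removal. The annotation key comes from the proposal text; the `pick_machines_to_delete` helper, the dict layout, and the tie-breaking order for un-annotated Machines (list order) are assumptions.

```python
DELETE_MACHINE_ANNOTATION = "cluster.x-k8s.io/delete-machine"


def pick_machines_to_delete(machines, desired_replicas):
    """Return the names of Machines to remove, annotated ones first.

    `machines` is a list of dicts with a "name" key and an optional
    "annotations" dict (a hypothetical shape for this sketch).
    """
    surplus = len(machines) - desired_replicas
    if surplus <= 0:
        return []
    # sorted() is stable, so Machines keep their original relative order
    # within the annotated and un-annotated groups.
    ordered = sorted(
        machines,
        key=lambda m: DELETE_MACHINE_ANNOTATION not in m.get("annotations", {}),
    )
    return [m["name"] for m in ordered[:surplus]]
```

For example, with three Machines where only the second carries the annotation, lowering the desired count from 3 to 2 selects that second Machine.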
#### Story U3
A cluster admin updates a MachinePool to a newer Kubernetes version and would like to configure the strategy for that deployment so that the MachinePool will progressively roll out the new version of the machines. They would like this operation to cordon and drain each node to minimize workload disruptions.
Again, I don't understand which component will own this. Will the CAPI MachinePool controller do this?
In this case the CAPI machinepool controller's existing logic should own everything. Behind this point is the idea that CAPZ has already implemented cordon + drain separately for its own MachinePool Machine implementation, but should be able to remove some of that duplicated logic now.
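As a rough illustration of the progressive rollout in Story U3, the sketch below builds an ordered plan that cordons, drains, and then replaces each Machine that is not yet on the target version. It is not the MachinePool controller's implementation: the `plan_rolling_upgrade` helper, the dict layout, and the one-Machine-at-a-time ordering are assumptions, and the proposal's actual strategy configuration is not shown in this thread.

```python
def plan_rolling_upgrade(machines, target_version):
    """Return an ordered list of (action, machine_name) steps.

    `machines` is a list of dicts with "name" and "version" keys
    (a hypothetical shape for this sketch). Machines already on the
    target version are left alone.
    """
    plan = []
    for machine in machines:
        if machine["version"] == target_version:
            continue
        # Cordon first so no new workloads land on the node, then drain
        # existing workloads before the Machine is replaced.
        plan.append(("cordon", machine["name"]))
        plan.append(("drain", machine["name"]))
        plan.append(("replace", machine["name"]))
    return plan
```

Keeping the plan as plain data makes the ordering easy to inspect; a real controller would execute each step and wait for it to complete before moving on.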
@shyamradhakrishnan I apologize for missing these comments for so long! Let me know what other questions you have.
I really like the idea of machine pools relying on selectors to identify machines.
However, I have some aspects I still need to properly grok:
- Coexistence of MachinePools with and without machines
- Creation of a new kind of infra machine (providerMachinePoolMachine)
- Ownership of Machine management assigned to providers
I will try to make another pass next week, but if it would help speed things up, I'm also available for a call to discuss the points above.
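For readers less familiar with the selector idea mentioned in this comment, here is a minimal sketch of equality-based label matching, the kind of check a pool could use to identify its Machines. The helper names and the `example.com/pool-name` label key are hypothetical; the proposal's actual selector field is not shown in this thread.

```python
def matches(selector, labels):
    """True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_machines(machines, selector):
    """Return the names of Machines whose labels satisfy the selector.

    `machines` is a list of dicts with "name" and "labels" keys
    (a hypothetical shape for this sketch).
    """
    return [m["name"] for m in machines if matches(selector, m.get("labels", {}))]
```

Note that an empty selector matches every Machine in this sketch, so a pool would need to set at least one distinguishing label.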
As agreed at the 11 May 2022 meeting, we are starting a one-week lazy consensus today. If there are no objections, it will merge on the 18th.
overall lgtm, I like the UX consistency + being able to reuse the same InfraMachine Kind revisions as those will make adoption easier for more providers
just a few minor comments from the re-read
A few nits from my side.
I'm unfortunately not that familiar with MachinePools so I have a hard time reviewing this proposal, but no objections from my side.
Would be great to get a review from some CAPA folks
(cc @richardcase @sedefsavas @pydctw)
/lgtm
/lgtm
Thx!
/approve
Will leave the hold until EOD in case anyone wants to take another look; we're past lazy consensus and there are no objections.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: CecileRobertMichon
/hold cancel
What this PR does / why we need it:
Adds a proposal for MachinePool Machines: MachinePools should be enhanced to own Machines that represent each of their replicas.
This should still be considered a draft or WIP, but any feedback or review would be highly appreciated!
There is an accompanying proof-of-concept implementation for Docker in #6089.
There is an accompanying implementation of MachinePool support in cluster-autoscaler in kubernetes/autoscaler#4676.
These "MachinePool Machines" will open up the following opportunities:
Which issue(s) this PR fixes:
Fixes #4063