Commit b41622f

Merge pull request kubernetes#11 from enxebre/machine-instance-lifecycle
Add machine-instance-lifecycle proposal
2 parents 361dd57 + c753b24 commit b41622f

1 file changed

+155
-0
lines changed

@@ -0,0 +1,155 @@
1+
---
2+
title: Machine/Instance lifecycle
3+
authors:
4+
- @enxebre
5+
reviewers:
6+
- @derekwaynecarr
7+
- @michaelgugino
8+
- @bison
9+
- @mrunalp
10+
approvers:
11+
- @derekwaynecarr
12+
- @michaelgugino
13+
- @bison
14+
- @mrunalp
15+
16+
creation-date: 2019-09-09
17+
last-updated: 2019-09-09
18+
status: implementable
19+
see-also:
20+
replaces:
21+
superseded-by:
22+
---
23+
24+
# Machine/Instance lifecycle
25+
26+
## Release Signoff Checklist
27+
28+
- [x] Enhancement is `implementable`
29+
- [ ] Design details are appropriately documented from clear requirements
30+
- [ ] Test plan is defined
31+
- [ ] Graduation criteria for dev preview, tech preview, GA
32+
- [ ] User-facing documentation is created in [openshift/docs]
33+
34+
35+
## Summary
36+
37+
Enable unified semantics across providers to represent, as phases, the lifecycle of instances backed by machine resources.
38+
39+
## Motivation
40+
41+
Provide a user experience for the machine API that is as consistent as possible across providers.
42+
43+
### Goals
44+
45+
- Provide semantics to convey the lifecycle of machines in a unified manner across any provider.
46+
47+
- Prevent actuators from creating more than one instance for a machine resource during its lifetime.
48+
49+
### Non-Goals
50+
51+
- Introduce breaking changes in the interface for actuators.
52+
53+
## Proposal
54+
55+
- As a user, I want to understand at a glance where a machine is in its lifecycle, regardless of the cloud.
56+
57+
- As a platform, I want to provide a user experience that is as consistent as possible across providers and to hide provider-specific details.
58+
59+
- As a dev, I want to enforce the machine API invariants and reduce the surface for arbitrary provider-specific decisions.
60+
61+
This proposal uses unified phases across all providers to convey the lifecycle of machines by leveraging the [existing API](https://github.com/openshift/cluster-api/blob/openshift-4.2-cluster-api-0.1.0/pkg/apis/machine/v1beta1/machine_types.go#L174).
62+
63+
### Implementation Details
64+
65+
#### Phases
66+
67+
The phase of a Machine is a simple, high-level summary of where the machine is in its lifecycle. This follows the same approach as [pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) and [upstream Cluster API](https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20190610-machine-states-preboot-bootstrapping.md).
68+
69+
MachineSets or other higher-level controllers might choose to use the signaled phases, for example to weigh machines when choosing which ones to scale down, or to ignore failed machines when satisfying the replica count.
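
As a rough illustration of how a higher-level controller could weigh phases when scaling down, the following sketch orders machines so that the least useful ones are removed first. The `machine` type, the `deletionPriority` weights, and `sortForScaleDown` are hypothetical names and values chosen for this example; the proposal does not prescribe any particular ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// machine is a hypothetical, minimal stand-in for a Machine resource.
type machine struct {
	Name  string
	Phase string
}

// deletionPriority returns a weight for a phase; higher means "delete first".
// The weights are an assumption made for this sketch only.
func deletionPriority(phase string) int {
	switch phase {
	case "Failed":
		return 3
	case "Provisioning":
		return 2
	case "Provisioned":
		return 1
	default: // Running and anything else is kept as long as possible.
		return 0
	}
}

// sortForScaleDown returns a copy of ms ordered by descending deletion priority.
// The sort is stable, so machines in the same phase keep their input order.
func sortForScaleDown(ms []machine) []machine {
	out := make([]machine, len(ms))
	copy(out, ms)
	sort.SliceStable(out, func(i, j int) bool {
		return deletionPriority(out[i].Phase) > deletionPriority(out[j].Phase)
	})
	return out
}

func main() {
	ms := []machine{
		{Name: "a", Phase: "Running"},
		{Name: "b", Phase: "Failed"},
		{Name: "c", Phase: "Provisioning"},
	}
	for _, m := range sortForScaleDown(ms) {
		fmt.Println(m.Name, m.Phase)
	}
}
```

A controller that ignores failed machines for replica counting could apply the same idea by filtering on `Phase` before counting.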
70+
71+
The phase will be exposed through additionalPrinterColumns so that kubectl users can understand the current state of the world for machines at a glance.
72+
73+
The machine controller will set the right phase when its communication flow with the actuator interface meets the following criteria:
74+
75+
##### Provisioning
76+
77+
- Exists() is False.
78+
79+
- Machine has **no** providerID/address.
80+
81+
##### Provisioned
82+
83+
- Exists() is True.
84+
85+
- Machine has **no** status.nodeRef.
86+
87+
##### Running
88+
89+
- Exists() is True.
90+
91+
- Machine has status.nodeRef.
92+
93+
##### Deleting
94+
95+
- Machine has a DeletionTimestamp.
96+
97+
##### Failed
98+
99+
- Create() returns a permanentError type, or Exists() is False and the machine has a providerID/address.
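
The criteria above can be sketched as a single decision function. This is a minimal sketch, not the machine controller's actual code: the `machineView` struct and `phaseFor` function are hypothetical, and the order in which the checks are evaluated (Deleting first, then Failed) is an assumption, since the proposal lists the criteria without stating precedence.

```go
package main

import "fmt"

// Phase names as listed in the proposal.
const (
	PhaseProvisioning = "Provisioning"
	PhaseProvisioned  = "Provisioned"
	PhaseRunning      = "Running"
	PhaseDeleting     = "Deleting"
	PhaseFailed       = "Failed"
)

// machineView is a hypothetical, flattened view of what the machine
// controller observes for one machine during a reconcile.
type machineView struct {
	HasDeletionTimestamp    bool // machine has a DeletionTimestamp
	InstanceExists          bool // result of the actuator's Exists()
	HasProviderIDOrAddress  bool // machine has a providerID/address
	HasNodeRef              bool // machine has status.nodeRef
	CreateFailedPermanently bool // Create() returned a permanentError type
}

// phaseFor maps the observed state to a phase.
// Check order is an assumption of this sketch.
func phaseFor(m machineView) string {
	switch {
	case m.HasDeletionTimestamp:
		return PhaseDeleting
	case m.CreateFailedPermanently:
		return PhaseFailed
	case !m.InstanceExists && m.HasProviderIDOrAddress:
		// The instance was created once and is now gone.
		return PhaseFailed
	case !m.InstanceExists:
		return PhaseProvisioning
	case m.HasNodeRef:
		return PhaseRunning
	default:
		return PhaseProvisioned
	}
}

func main() {
	fmt.Println(phaseFor(machineView{}))
	fmt.Println(phaseFor(machineView{InstanceExists: true, HasProviderIDOrAddress: true}))
	fmt.Println(phaseFor(machineView{InstanceExists: true, HasProviderIDOrAddress: true, HasNodeRef: true}))
}
```

Treating a missing instance with a recorded providerID/address as Failed is what keeps a machine from being backed by a second instance during its lifetime, which matches the second goal of the proposal.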
100+
101+
#### Actuator invariants
102+
103+
- Once the nodeRef is set, it must never be mutated/deleted.
104+
105+
- Once the providerID is set, it must never be mutated/deleted.
106+
107+
##### Create()
108+
109+
- It must set providerID and IP addresses.
110+
111+
- It must return a permanentError type for known unrecoverable errors, e.g. invalid cloud input or insufficient quota.
112+
113+
##### Update()
114+
115+
- It must not modify cloud infrastructure.
116+
117+
- It should reconcile machine.Status with cloud values.
118+
119+
#### Errors
120+
121+
- When a machine enters the "Failed" phase, machine.Status.ErrorMessage should be set.
122+
123+
- The machine.Status.ErrorMessage might be bubbled up to the machineSet.Status.ErrorMessage for easier visibility.
124+
125+
126+
#### [Implementation example](https://github.com/enxebre/cluster-api-provider-azure/commit/ef9a0dd68918eb6ca4af50765b07bcae4d309aaf)
127+
128+
### Risks and Mitigations
129+
130+
To avoid disrupting actuators, this proposal intentionally keeps the change surface small by not breaking the create()/exists()/update() actuator interface.
131+
132+
Once this is settled, we might consider simplifying the interface.
133+
134+
## Design Details
135+
136+
### Test Plan
137+
138+
Changes will be test-driven via the current e2e machine API suite: https://github.com/openshift/cluster-api-actuator-pkg/tree/8250b456dec7b2fb06c591738518de1265e84a2c/pkg/e2e
139+
140+
### Graduation Criteria
141+
142+
### Upgrade / Downgrade Strategy
143+
144+
Operators must revendor the latest openshift/cluster-api version. The actuator machine controller image is set in the machine-api-operator repo (https://github.com/openshift/machine-api-operator/blob/474e14e4965a8c5e6788417c851ccc7fad1acb3a/install/0000_30_machine-api-operator_01_images.configmap.yaml), so upgrades will be driven by the CVO, which will fetch the right image version as usual.
145+
146+
147+
### Version Skew Strategy
148+
149+
## Implementation History
150+
151+
## Drawbacks
152+
153+
## Alternatives
154+
155+
## Infrastructure Needed
