Changes for OWLS-83136 - Limit concurrent pod shutdowns during a cluster shrink #1892

Merged: 14 commits, Sep 4, 2020
10 changes: 10 additions & 0 deletions docs/domains/Domain.json
@@ -117,6 +117,11 @@
"description": "Customization affecting Kubernetes Service generated for this WebLogic cluster.",
"$ref": "#/definitions/KubernetesResource"
},
"maxConcurrentShutdown": {
"description": "The maximum number of WebLogic Server instances that will shut down in parallel for this cluster when it is being partially shut down by lowering its replica count. A value of 0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`, which defaults to 1.",
"type": "number",
"minimum": 0
},
"serverStartPolicy": {
"description": "The strategy for deciding whether to start a WebLogic Server instance. Legal values are NEVER, or IF_NEEDED. Defaults to IF_NEEDED. More info: https://oracle.github.io/weblogic-kubernetes-operator/userguide/managing-domains/domain-lifecycle/startup/#starting-and-stopping-servers.",
"type": "string",
@@ -354,6 +359,11 @@
"type": "number",
"minimum": 0
},
"maxClusterConcurrentShutdown": {
"description": "The default maximum number of WebLogic Server instances that a cluster will shut down in parallel when it is being partially shut down by lowering its replica count. You can override this default on a per cluster basis by setting the cluster\u0027s `maxConcurrentShutdown` field. A value of 0 means there is no limit. Defaults to 1.",
"type": "number",
"minimum": 0
},
"domainHomeInImage": {
"deprecated": "true",
"description": "Deprecated. Use `domainHomeSourceType` instead. Ignored if `domainHomeSourceType` is specified. True indicates that the domain home file system is present in the container image specified by the image field. False indicates that the domain home file system is located on a persistent volume. Defaults to unset.",
2 changes: 2 additions & 0 deletions docs/domains/Domain.md
@@ -34,6 +34,7 @@ The specification of the operation of the WebLogic domain. Required.
| `logHome` | string | The directory in a server's container in which to store the domain, Node Manager, server logs, server *.out, introspector .out, and optionally HTTP access log files if `httpAccessLogInLogHome` is true. Ignored if `logHomeEnabled` is false. |
| `logHomeEnabled` | Boolean | Specifies whether the log home folder is enabled. Defaults to true if `domainHomeSourceType` is PersistentVolume; false, otherwise. |
| `managedServers` | array of [Managed Server](#managed-server) | Lifecycle options for individual Managed Servers, including Java options, environment variables, additional Pod content, and the ability to explicitly start, stop, or restart a named server instance. The `serverName` field of each entry must match a Managed Server that already exists in the WebLogic domain configuration or that matches a dynamic cluster member based on the server template. |
| `maxClusterConcurrentShutdown` | number | The default maximum number of WebLogic Server instances that a cluster will shut down in parallel when it is being partially shut down by lowering its replica count. You can override this default on a per cluster basis by setting the cluster's `maxConcurrentShutdown` field. A value of 0 means there is no limit. Defaults to 1. |
| `maxClusterConcurrentStartup` | number | The maximum number of cluster member Managed Server instances that the operator will start in parallel for a given cluster, if `maxConcurrentStartup` is not specified for a specific cluster under the `clusters` field. A value of 0 means there is no configured limit. Defaults to 0. |
| `replicas` | number | The default number of cluster member Managed Server instances to start for each WebLogic cluster in the domain configuration, unless `replicas` is specified for that cluster under the `clusters` field. For each cluster, the operator will sort cluster member Managed Server names from the WebLogic domain configuration by normalizing any numbers in the Managed Server name and then sorting alphabetically. This is done so that server names such as "managed-server10" come after "managed-server9". The operator will then start Managed Servers from the sorted list, up to the `replicas` count, unless specific Managed Servers are specified as starting in their entry under the `managedServers` field. In that case, the specified Managed Servers will be started and then additional cluster members will be started, up to the `replicas` count, by finding further cluster members in the sorted list that are not already started. If cluster members are started because of their entries under `managedServers`, then a cluster may have more cluster members running than its `replicas` count. Defaults to 0. |
| `restartVersion` | string | Changes to this field cause the operator to restart WebLogic Server instances. More info: https://oracle.github.io/weblogic-kubernetes-operator/userguide/managing-domains/domain-lifecycle/startup/#restarting-servers. |
@@ -75,6 +76,7 @@ The current status of the operation of the WebLogic domain. Updated automaticall
| `allowReplicasBelowMinDynClusterSize` | Boolean | Specifies whether the number of running cluster members is allowed to drop below the minimum dynamic cluster size configured in the WebLogic domain configuration. Otherwise, the operator will ensure that the number of running cluster members is not less than the minimum dynamic cluster setting. This setting applies to dynamic clusters only. Defaults to true. |
| `clusterName` | string | The name of the cluster. This value must match the name of a WebLogic cluster already defined in the WebLogic domain configuration. Required. |
| `clusterService` | [Kubernetes Resource](#kubernetes-resource) | Customization affecting Kubernetes Service generated for this WebLogic cluster. |
| `maxConcurrentShutdown` | number | The maximum number of WebLogic Server instances that will shut down in parallel for this cluster when it is being partially shut down by lowering its replica count. A value of 0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`, which defaults to 1. |
| `maxConcurrentStartup` | number | The maximum number of Managed Servers instances that the operator will start in parallel for this cluster in response to a change in the `replicas` count. If more Managed Server instances must be started, the operator will wait until a Managed Server Pod is in the `Ready` state before starting the next Managed Server instance. A value of 0 means all Managed Server instances will start in parallel. Defaults to 0. |
| `maxUnavailable` | number | The maximum number of cluster members that can be temporarily unavailable. Defaults to 1. |
| `replicas` | number | The number of cluster member Managed Server instances to start for this WebLogic cluster. The operator will sort cluster member Managed Server names from the WebLogic domain configuration by normalizing any numbers in the Managed Server name and then sorting alphabetically. This is done so that server names such as "managed-server10" come after "managed-server9". The operator will then start Managed Server instances from the sorted list, up to the `replicas` count, unless specific Managed Servers are specified as starting in their entry under the `managedServers` field. In that case, the specified Managed Server instances will be started and then additional cluster members will be started, up to the `replicas` count, by finding further cluster members in the sorted list that are not already started. If cluster members are started because of their related entries under `managedServers`, then this cluster may have more cluster members running than its `replicas` count. Defaults to 0. |
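The two field descriptions above imply a precedence order: a cluster's `maxConcurrentShutdown` wins if set, otherwise the domain's `maxClusterConcurrentShutdown` applies, otherwise the limit is 1, and 0 means no limit. A minimal sketch of that resolution follows. The class and method names are hypothetical and are not taken from the operator; a `null` argument is an assumed way to model an unset field.

```java
// Hypothetical helper for illustration; names are not taken from the operator.
final class ShutdownLimitResolver {
    // Documented fallback when neither field is set.
    static final int DEFAULT_LIMIT = 1;

    private ShutdownLimitResolver() {
    }

    /**
     * Resolves the effective limit: the cluster's maxConcurrentShutdown if set,
     * otherwise the domain's maxClusterConcurrentShutdown, otherwise 1.
     * A null argument models an unset field (an assumption of this sketch).
     */
    static int effectiveLimit(Integer clusterLimit, Integer domainLimit) {
        Integer chosen = clusterLimit != null ? clusterLimit : domainLimit;
        int limit = chosen != null ? chosen : DEFAULT_LIMIT;
        if (limit < 0) {
            // The schema declares "minimum": 0.
            throw new IllegalArgumentException("limit must be >= 0, got " + limit);
        }
        return limit;
    }

    /** A limit of 0 is documented as "no limit". */
    static boolean isUnlimited(int limit) {
        return limit == 0;
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(null, null)); // prints 1
        System.out.println(effectiveLimit(null, 3));    // prints 3
        System.out.println(effectiveLimit(0, 3));       // prints 0
    }
}
```

Note that an explicit cluster value of 0 overrides a non-zero domain value, so a single cluster can opt out of the limit.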
10 changes: 10 additions & 0 deletions docs/domains/index.html
@@ -1038,6 +1038,11 @@
"description": "Customization affecting Kubernetes Service generated for this WebLogic cluster.",
"$ref": "#/definitions/KubernetesResource"
},
"maxConcurrentShutdown": {
"description": "The maximum number of WebLogic Server instances that will shut down in parallel for this cluster when it is being partially shut down by lowering its replica count. A value of 0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`, which defaults to 1.",
"type": "number",
"minimum": 0.0
},
"serverStartPolicy": {
"description": "The strategy for deciding whether to start a WebLogic Server instance. Legal values are NEVER, or IF_NEEDED. Defaults to IF_NEEDED. More info: https://oracle.github.io/weblogic-kubernetes-operator/userguide/managing-domains/domain-lifecycle/startup/#starting-and-stopping-servers.",
"type": "string",
@@ -1275,6 +1280,11 @@
"type": "number",
"minimum": 0.0
},
"maxClusterConcurrentShutdown": {
"description": "The default maximum number of WebLogic Server instances that a cluster will shut down in parallel when it is being partially shut down by lowering its replica count. You can override this default on a per cluster basis by setting the cluster\u0027s `maxConcurrentShutdown` field. A value of 0 means there is no limit. Defaults to 1.",
"type": "number",
"minimum": 0.0
},
"domainHomeInImage": {
"deprecated": "true",
"description": "Deprecated. Use `domainHomeSourceType` instead. Ignored if `domainHomeSourceType` is specified. True indicates that the domain home file system is present in the container image specified by the image field. False indicates that the domain home file system is located on a persistent volume. Defaults to unset.",
16 changes: 16 additions & 0 deletions kubernetes/crd/domain-crd.yaml
@@ -5127,6 +5127,14 @@ spec:
additionalProperties:
type: string
type: object
maxConcurrentShutdown:
description: The maximum number of WebLogic Server instances
that will shut down in parallel for this cluster when it is
being partially shut down by lowering its replica count. A
value of 0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`,
which defaults to 1.
type: number
minimum: 0.0
serverStartPolicy:
description: 'The strategy for deciding whether to start a WebLogic
Server instance. Legal values are NEVER, or IF_NEEDED. Defaults
@@ -5200,6 +5208,14 @@ spec:
a cluster may have more cluster members running than its `replicas`
count. Defaults to 0.
minimum: 0.0
maxClusterConcurrentShutdown:
type: number
description: The default maximum number of WebLogic Server instances
that a cluster will shut down in parallel when it is being partially
shut down by lowering its replica count. You can override this default
on a per cluster basis by setting the cluster's `maxConcurrentShutdown`
field. A value of 0 means there is no limit. Defaults to 1.
minimum: 0.0
domainHomeInImage:
type: boolean
description: Deprecated. Use `domainHomeSourceType` instead. Ignored
16 changes: 16 additions & 0 deletions kubernetes/crd/domain-v1beta1-crd.yaml
@@ -5114,6 +5114,14 @@ spec:
additionalProperties:
type: string
type: object
maxConcurrentShutdown:
description: The maximum number of WebLogic Server instances that
will shut down in parallel for this cluster when it is being
partially shut down by lowering its replica count. A value of
0 means there is no limit. Defaults to `spec.maxClusterConcurrentShutdown`,
which defaults to 1.
type: number
minimum: 0.0
serverStartPolicy:
description: 'The strategy for deciding whether to start a WebLogic
Server instance. Legal values are NEVER, or IF_NEEDED. Defaults
@@ -5185,6 +5193,14 @@ spec:
then a cluster may have more cluster members running than its `replicas`
count. Defaults to 0.
minimum: 0.0
maxClusterConcurrentShutdown:
type: number
description: The default maximum number of WebLogic Server instances
that a cluster will shut down in parallel when it is being partially
shut down by lowering its replica count. You can override this default
on a per cluster basis by setting the cluster's `maxConcurrentShutdown`
field. A value of 0 means there is no limit. Defaults to 1.
minimum: 0.0
domainHomeInImage:
type: boolean
description: Deprecated. Use `domainHomeSourceType` instead. Ignored
@@ -29,6 +29,7 @@ public interface KubernetesConstants {
boolean DEFAULT_INCLUDE_SERVER_OUT_IN_POD_LOG = true;
boolean DEFAULT_ALLOW_REPLICAS_BELOW_MIN_DYN_CLUSTER_SIZE = true;
int DEFAULT_MAX_CLUSTER_CONCURRENT_START_UP = 0;
int DEFAULT_MAX_CLUSTER_CONCURRENT_SHUTDOWN = 1;

String CONTAINER_NAME = "weblogic-server";

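With `DEFAULT_MAX_CLUSTER_CONCURRENT_SHUTDOWN = 1`, a cluster shrink stops one server at a time unless configured otherwise. The sketch below shows one way a limit could bound parallel shutdowns. It is a simplification that assumes fixed batches, which is not necessarily how the operator schedules pod deletions, and `ShutdownBatchPlanner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the operator's actual scheduling may differ.
final class ShutdownBatchPlanner {
    private ShutdownBatchPlanner() {
    }

    /**
     * Splits the servers to stop into groups of at most maxConcurrentShutdown.
     * A limit of 0 means no limit, so all servers land in a single group.
     */
    static List<List<String>> planBatches(List<String> servers, int maxConcurrentShutdown) {
        if (maxConcurrentShutdown < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        List<List<String>> batches = new ArrayList<>();
        if (servers.isEmpty()) {
            return batches;
        }
        int size = maxConcurrentShutdown == 0 ? servers.size() : maxConcurrentShutdown;
        for (int i = 0; i < servers.size(); i += size) {
            int end = Math.min(i + size, servers.size());
            // Copy the view so each batch is independent of the input list.
            batches.add(new ArrayList<>(servers.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> toStop = List.of("managed-server3", "managed-server4", "managed-server5");
        System.out.println(planBatches(toStop, 1).size()); // prints 3
        System.out.println(planBatches(toStop, 0).size()); // prints 1
    }
}
```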
@@ -50,6 +50,7 @@ public class AsyncRequestStep<T> extends Step implements RetryStrategyListener {
private final RequestParams requestParams;
private final CallFactory<T> factory;
private final int maxRetryCount;
private final RetryStrategy customRetryStrategy;
private final String fieldSelector;
private final String labelSelector;
private final String resourceVersion;
@@ -78,10 +79,40 @@ public AsyncRequestStep(
String fieldSelector,
String labelSelector,
String resourceVersion) {
this(next, requestParams, factory, null, helper, timeoutSeconds, maxRetryCount,
fieldSelector, labelSelector, resourceVersion);
}

/**
* Construct async step.
*
* @param next Next
* @param requestParams Request parameters
* @param factory Factory
* @param customRetryStrategy Custom retry strategy
* @param helper Client pool
* @param timeoutSeconds Timeout
* @param maxRetryCount Max retry count
* @param fieldSelector Field selector
* @param labelSelector Label selector
* @param resourceVersion Resource version
*/
public AsyncRequestStep(
ResponseStep<T> next,
RequestParams requestParams,
CallFactory<T> factory,
RetryStrategy customRetryStrategy,
ClientPool helper,
int timeoutSeconds,
int maxRetryCount,
String fieldSelector,
String labelSelector,
String resourceVersion) {
super(next);
this.helper = helper;
this.requestParams = requestParams;
this.factory = factory;
this.customRetryStrategy = customRetryStrategy;
this.timeoutSeconds = timeoutSeconds;
this.maxRetryCount = maxRetryCount;
this.fieldSelector = fieldSelector;
@@ -227,6 +258,9 @@ public NextAction apply(Packet packet) {

retry = oldResponse.getSpi(RetryStrategy.class);
}
if ((retry == null) && (customRetryStrategy != null)) {
retry = customRetryStrategy;
}

if (LOGGER.isFinerEnabled()) {
logAsyncRequest();
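The hunk in `apply` gives the custom strategy lower priority than a strategy carried over from a previous response. The selection order can be sketched in isolation as below. The `RetryStrategy` interface here is a hypothetical stand-in, and the final fallback to a default strategy is an assumption, since that part of `apply` is not shown in the diff.

```java
// Hypothetical stand-in types; not the operator's actual classes.
final class RetrySelection {
    /** Minimal placeholder for the operator's RetryStrategy interface. */
    interface RetryStrategy {
        String name();
    }

    private RetrySelection() {
    }

    /**
     * Picks the strategy carried over from a previous response if present,
     * otherwise the caller-supplied custom strategy, otherwise a fallback.
     */
    static RetryStrategy choose(RetryStrategy fromPreviousResponse,
                                RetryStrategy custom,
                                RetryStrategy fallback) {
        RetryStrategy retry = fromPreviousResponse;
        if (retry == null && custom != null) {
            retry = custom; // mirrors the check added in this change
        }
        if (retry == null) {
            retry = fallback; // assumption: a default applies when nothing else is set
        }
        return retry;
    }

    public static void main(String[] args) {
        RetryStrategy custom = () -> "custom";
        RetryStrategy fallback = () -> "default";
        System.out.println(choose(null, custom, fallback).name()); // prints custom
        System.out.println(choose(null, null, fallback).name());   // prints default
    }
}
```

Because the carried-over strategy wins, a retried request keeps the strategy instance from its first attempt rather than restarting with a fresh custom one.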
@@ -5,13 +5,15 @@

import oracle.kubernetes.operator.calls.CallFactory;
import oracle.kubernetes.operator.calls.RequestParams;
import oracle.kubernetes.operator.calls.RetryStrategy;
import oracle.kubernetes.operator.work.Step;

public interface AsyncRequestStepFactory {
<T> Step createRequestAsync(
ResponseStep<T> next,
RequestParams requestParams,
CallFactory<T> factory,
RetryStrategy retryStrategy,
ClientPool helper,
int timeoutSeconds,
int maxRetryCount,
@@ -48,6 +48,7 @@
import oracle.kubernetes.operator.calls.CallWrapper;
import oracle.kubernetes.operator.calls.CancellableCall;
import oracle.kubernetes.operator.calls.RequestParams;
import oracle.kubernetes.operator.calls.RetryStrategy;
import oracle.kubernetes.operator.calls.SynchronousCallDispatcher;
import oracle.kubernetes.operator.calls.SynchronousCallFactory;
import oracle.kubernetes.operator.work.Step;
@@ -230,6 +231,7 @@ public <T> T execute(
null,
callback));
private String fieldSelector;
private RetryStrategy retryStrategy;

/* Version */
private String labelSelector;
@@ -516,6 +518,11 @@ public CallBuilder withFieldSelector(String fieldSelector) {
return this;
}

public CallBuilder withRetryStrategy(RetryStrategy retryStrategy) {
this.retryStrategy = retryStrategy;
return this;
}

private void tuning(int limit, int timeoutSeconds, int maxRetryCount) {
this.limit = limit;
this.timeoutSeconds = timeoutSeconds;
@@ -1181,7 +1188,8 @@ public Step deletePodAsync(
V1DeleteOptions deleteOptions,
ResponseStep<V1Status> responseStep) {
return createRequestAsync(
- responseStep, new RequestParams("deletePod", namespace, name, deleteOptions, domainUid), deletePod);
+ responseStep, new RequestParams("deletePod", namespace, name, deleteOptions, domainUid),
+ deletePod, retryStrategy);
}

private Call patchPodAsync(
@@ -1836,6 +1844,7 @@ private <T> Step createRequestAsync(
next,
requestParams,
factory,
null,
helper,
timeoutSeconds,
maxRetryCount,
@@ -1844,6 +1853,21 @@ private <T> Step createRequestAsync(
resourceVersion);
}

private <T> Step createRequestAsync(
ResponseStep<T> next, RequestParams requestParams, CallFactory<T> factory, RetryStrategy retryStrategy) {
return STEP_FACTORY.createRequestAsync(
next,
requestParams,
factory,
retryStrategy,
helper,
timeoutSeconds,
maxRetryCount,
fieldSelector,
labelSelector,
resourceVersion);
}

private CancellableCall wrap(Call call) {
return new CallWrapper(call);
}
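In the `CallBuilder` hunks above, only `deletePodAsync` forwards the builder's `retryStrategy`; the existing three-argument `createRequestAsync` passes `null`. The following self-contained sketch shows that pattern with hypothetical stand-in types (`MiniCallBuilder`, `Request`, `readPod`); it does not reproduce the operator's actual classes or signatures.

```java
// Hypothetical miniature of the builder pattern in this change.
final class MiniCallBuilder {
    /** Placeholder for the operator's RetryStrategy interface. */
    interface RetryStrategy {
    }

    /** Records what would be handed to the step factory. */
    static final class Request {
        final String operation;
        final String namespace;
        final String name;
        final RetryStrategy retryStrategy;

        Request(String operation, String namespace, String name, RetryStrategy retryStrategy) {
            this.operation = operation;
            this.namespace = namespace;
            this.name = name;
            this.retryStrategy = retryStrategy;
        }
    }

    private RetryStrategy retryStrategy;

    /** Fluent setter, same shape as withRetryStrategy in the diff. */
    MiniCallBuilder withRetryStrategy(RetryStrategy retryStrategy) {
        this.retryStrategy = retryStrategy;
        return this;
    }

    /** Pod deletion forwards the configured strategy (may be null). */
    Request deletePod(String namespace, String name) {
        return new Request("deletePod", namespace, name, retryStrategy);
    }

    /** Other calls ignore the custom strategy and pass null. */
    Request readPod(String namespace, String name) {
        return new Request("readPod", namespace, name, null);
    }

    public static void main(String[] args) {
        RetryStrategy custom = new RetryStrategy() { };
        MiniCallBuilder builder = new MiniCallBuilder().withRetryStrategy(custom);
        System.out.println(builder.deletePod("ns", "pod-1").retryStrategy == custom); // prints true
        System.out.println(builder.readPod("ns", "pod-1").retryStrategy == null);     // prints true
    }
}
```

Keeping the strategy opt-in per call type leaves every other request on its existing retry behavior.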