
Commit bb4be0d

texasmichelle authored and Yixin Shi committed
Simple pipeline demo (kubeflow#322)
* Add simple pipeline demo
* Add hyperparameter tuning & GPU autoprovisioning; use pipelines v0.1.2
* Resolve lint issues
* Disable lint warning; correct SDK syntax that labels the name of the pipeline step
* Add postprocessing step: a basically empty step, just to show more than one step
* Add clarity to instructions
* Update pipelines install to release v0.1.2
* Add repo cloning with release versions; remove katib patch, use kubeflow v0.3.3, add PROJECT to env var override file, further clarification of instructions
1 parent 6814601 commit bb4be0d

7 files changed: +577 −0 lines changed

Diff for: demos/simple_pipeline/README.md

@@ -0,0 +1,133 @@
# Kubeflow demo - Simple pipeline

## Hyperparameter tuning and autoprovisioning GPU nodes

This folder contains a demonstration of Kubeflow capabilities, suitable for
presentation to public audiences.

This demo highlights the use of pipelines and hyperparameter tuning on a GKE
cluster with node autoprovisioning (NAP). A simple pipeline requests GPU
resources, which triggers node pool creation. This demo includes the following
steps:

1. [Set up your environment](#1-set-up-your-environment)
1. [Run a simple pipeline](#2-run-a-simple-pipeline)
1. [Perform hyperparameter tuning](#3-perform-hyperparameter-tuning)
1. [Run a better pipeline](#4-run-a-better-pipeline)
## 1. Set up your environment

Follow the instructions in
[demo_setup/README.md](https://github.com/kubeflow/examples/blob/master/demos/simple_pipeline/demo_setup/README.md)
to set up your environment and install Kubeflow with pipelines on an
autoprovisioning GKE cluster.

View the installed components in the GCP Console:
* In the [Kubernetes Engine](https://console.cloud.google.com/kubernetes)
  section, you will see a new cluster ${CLUSTER} with 3 `n1-standard-1` nodes.
* Under [Workloads](https://console.cloud.google.com/kubernetes/workload),
  you will see all the default Kubeflow and pipeline components.

Source the environment file and activate the conda environment for pipelines:

```
source kubeflow-demo-simple-pipeline.env
source activate kfp
```
## 2. Run a simple pipeline

Show the file `gpu-example-pipeline.py` as an example of a simple pipeline.

Compile it to create a `.tar.gz` file:

```
./gpu-example-pipeline.py
```
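To confirm that the compile step worked, you can list the contents of the
resulting archive, which should contain the compiled workflow definition that
the pipelines UI accepts:

```
tar -tzf gpu-example-pipeline.py.tar.gz
```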
View the pipelines UI locally by forwarding a port to the ml-pipeline-ui pod:

```
PIPELINES_POD=$(kubectl get po -l app=ml-pipeline-ui | \
  grep ml-pipeline-ui | \
  head -n 1 | \
  cut -d " " -f 1)
kubectl port-forward ${PIPELINES_POD} 8080:3000
```
In the browser, navigate to `localhost:8080` and create a new pipeline by
uploading `gpu-example-pipeline.py.tar.gz`. Select the pipeline and click
_Create experiment_. Use all suggested defaults.

View the effects of autoprovisioning by observing the number of nodes increase.

Select _Experiments_ from the left-hand side, then _Runs_. Click on the
experiment run to view the graph and watch it execute.

View the container logs for the training step and take note of the low
accuracy (~0.113).
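To pull those logs from the command line instead of the UI, one option is the
sketch below; it assumes the pipeline steps run as Argo workflow pods (as they
do in this Pipelines release), whose application container is named `main`:

```
# List the pods created by the pipeline run, then view the training step's logs.
kubectl get po -l workflows.argoproj.io/workflow
kubectl logs <training-pod-name> -c main
```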
## 3. Perform hyperparameter tuning

To determine which parameters result in higher accuracy, use Katib to execute
a Study, which defines a search space for performing training with a range of
different parameters.

Create a Study by applying an
[example file](https://github.com/kubeflow/katib/blob/master/examples/gpu-example.yaml)
to the cluster:

```
kubectl apply -f gpu-example-katib.yaml
```
This creates a StudyJob object. To view it:

```
kubectl get studyjob
kubectl describe studyjobs gpu-example
```
To view the Katib UI, connect to the modeldb-frontend pod:

```
KATIB_POD=$(kubectl get po -l app=modeldb,component=frontend | \
  grep modeldb-frontend | \
  head -n 1 | \
  cut -d " " -f 1)
kubectl port-forward ${KATIB_POD} 8081:3000
```
In the browser, navigate to `localhost:8081/katib` and click on the
gpu-example project. In the _Explore Visualizations_ section, select
_Optimizer_ in the _Group By_ dropdown, then click _Compare_.

While you're waiting, watch for autoprovisioning to occur by viewing the pods
in Pending status.
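One way to watch this from the command line: pods whose GPU request cannot yet
be satisfied sit in `Pending` until the new node pool comes up.

```
kubectl get pods --field-selector=status.phase=Pending
```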
View the creation of a new GPU node pool:

```
gcloud container node-pools list --cluster ${CLUSTER}
```

View the creation of new nodes:

```
kubectl get nodes
```
In the Katib UI, interact with the various graphs to determine which
combination of parameters results in the highest accuracy. Grouping by
optimizer type is one way to find consistently higher accuracies. Gather a set
of parameters to use in a new run of the pipeline.

## 4. Run a better pipeline

In the pipelines UI, clone the previous experiment run and update the
arguments to match the parameters for one of the runs with higher accuracies
from the Katib UI. Execute the pipeline and watch for the resulting accuracy,
which should be closer to 0.98.

Approximately 5 minutes after the last run completes, check the cluster nodes
to verify that the GPU nodes have disappeared.
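As a more targeted check, GKE labels GPU nodes with the accelerator type, so
an empty result from the following command confirms the autoprovisioned GPU
nodes are gone (assuming the cluster has no other GPU node pools):

```
kubectl get nodes -l cloud.google.com/gke-accelerator
```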

Diff for: demos/simple_pipeline/demo_setup/README.md

@@ -0,0 +1,183 @@
# Kubeflow demo - Simple pipeline

This repository contains a demonstration of Kubeflow capabilities, suitable
for presentation to public audiences.

The base demo includes the following steps:

1. [Set up your environment](#1-set-up-your-environment)
1. [Create a GKE cluster and install Kubeflow](#2-create-a-gke-cluster-and-install-kubeflow)
1. [Install pipelines on GKE](#3-install-pipelines-on-gke)
## 1. Set up your environment

Clone the [kubeflow/kubeflow](https://github.com/kubeflow/kubeflow) repo and
check out the
[`v0.3.3`](https://github.com/kubeflow/kubeflow/releases/tag/v0.3.3) release.
Clone the [kubeflow/pipelines](https://github.com/kubeflow/pipelines) repo and
check out the
[`0.1.2`](https://github.com/kubeflow/pipelines/releases/tag/0.1.2) release.
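For example, the cloning step might look like this (destination directories
are your choice and should match the repo paths expected by your environment
file):

```
git clone https://github.com/kubeflow/kubeflow.git
(cd kubeflow && git checkout v0.3.3)
git clone https://github.com/kubeflow/pipelines.git
(cd pipelines && git checkout 0.1.2)
```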
Ensure that the repo paths, project name, and other variables are set
correctly. When all overrides are set, source the environment file:

```
source kubeflow-demo-simple-pipeline.env
```
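The overrides referenced throughout this demo might look like the following
sketch; every value here is hypothetical, so substitute your own:

```
# Hypothetical override values; adjust to your environment.
export PROJECT=my-gcp-project
export DEMO_PROJECT=${PROJECT}
export CLUSTER=kubeflow-demo
export ZONE=us-central1-a
export NAMESPACE=kubeflow
export PIPELINES_REPO=${HOME}/pipelines
```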
Create a clean Python environment for installing Kubeflow Pipelines:

```
conda create --name kfp python=3.6
source activate kfp
```
Install the Kubeflow Pipelines SDK:

```
pip install https://storage.googleapis.com/ml-pipeline/release/0.1.2/kfp.tar.gz --upgrade
```
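As a quick smoke test, confirm the SDK imports cleanly inside the `kfp`
environment (`kfp.dsl` and `kfp.compiler` are the modules used to define and
compile pipelines):

```
python -c "import kfp.dsl, kfp.compiler; print('kfp SDK OK')"
```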
## 2. Create a GKE cluster and install Kubeflow

Creating a cluster with click-to-deploy does not yet support the installation
of pipelines, so it is not useful for demonstrating pipelines, but it is still
worth showing.

### Click-to-deploy

Generate a web app Client ID and Client Secret by following the instructions
[here](https://www.kubeflow.org/docs/started/getting-started-gke/#create-oauth-client-credentials).
Save these as environment variables for easy access.
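For example (placeholder values shown; ${CLIENT_ID} and ${CLIENT_SECRET} are
referenced again later in this guide):

```
export CLIENT_ID=<your-client-id>.apps.googleusercontent.com
export CLIENT_SECRET=<your-client-secret>
```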
51+
52+
In the browser, navigate to the
53+
[Click-to-deploy app](https://deploy.kubeflow.cloud/). Enter the project name,
54+
along with the Client ID and Client Secret previously generated. Select the
55+
desired ${ZONE} and latest version of Kubeflow, then click _Create Deployment_.
56+
57+
In the [GCP Console](https://console.cloud.google.com/kubernetes), navigate to the
58+
Kubernetes Engine panel to watch the cluster creation process. This results in a
59+
full cluster with Kubeflow installed.
### kfctl

While node autoprovisioning is in beta, it must be enabled manually. To create
a cluster with autoprovisioning, run the following commands, which will take
around 30 minutes:

```
gcloud container clusters create ${CLUSTER} \
  --project ${DEMO_PROJECT} \
  --zone ${ZONE} \
  --cluster-version 1.11.2-gke.9 \
  --num-nodes=8 \
  --scopes cloud-platform,compute-rw,storage-rw \
  --verbosity error

# Scale the cluster down to 3 nodes. The initial 8 is just to prevent master
# restarts due to upscaling. We cannot use 0 because the cluster autoscaler
# would then treat the cluster as unhealthy, and a few small non-GPU nodes
# are needed to handle system pods.
gcloud container clusters resize ${CLUSTER} \
  --project ${DEMO_PROJECT} \
  --zone ${ZONE} \
  --size=3 \
  --node-pool=default-pool

# Enable node autoprovisioning.
gcloud beta container clusters update ${CLUSTER} \
  --project ${DEMO_PROJECT} \
  --zone ${ZONE} \
  --enable-autoprovisioning \
  --max-cpu 20 \
  --max-memory 200 \
  --max-accelerator=type=nvidia-tesla-k80,count=8
```
Once the cluster has been created, install the GPU drivers:

```
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/daemonset.yaml
```
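You can watch the driver installer roll out as GPU nodes appear; this assumes
the daemonset name used by that manifest, `nvidia-driver-installer`, which it
creates in `kube-system`:

```
kubectl get daemonset nvidia-driver-installer --namespace kube-system
```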
Add RBAC permissions, which allow your user to install Kubeflow components on
the cluster:

```
kubectl create clusterrolebinding cluster-admin-binding-${USER} \
  --clusterrole cluster-admin \
  --user $(gcloud config get-value account)
```
Set up kubectl access:

```
kubectl create namespace kubeflow
./create_context.sh gke ${NAMESPACE}
```

Set up the OAuth environment variables ${CLIENT_ID} and ${CLIENT_SECRET} using
the instructions
[here](https://www.kubeflow.org/docs/started/getting-started-gke/#create-oauth-client-credentials),
then create the OAuth secret:

```
kubectl create secret generic kubeflow-oauth --from-literal=client_id=${CLIENT_ID} --from-literal=client_secret=${CLIENT_SECRET}
```
Create service accounts, add permissions, download credentials, and create secrets:

```
ADMIN_EMAIL=${CLUSTER}-admin@${PROJECT}.iam.gserviceaccount.com
USER_EMAIL=${CLUSTER}-user@${PROJECT}.iam.gserviceaccount.com
ADMIN_FILE=${HOME}/.ssh/${ADMIN_EMAIL}.json
USER_FILE=${HOME}/.ssh/${USER_EMAIL}.json

gcloud iam service-accounts create ${CLUSTER}-admin --display-name=${CLUSTER}-admin
gcloud iam service-accounts create ${CLUSTER}-user --display-name=${CLUSTER}-user

gcloud projects add-iam-policy-binding ${PROJECT} \
  --member=serviceAccount:${ADMIN_EMAIL} \
  --role=roles/storage.admin
gcloud projects add-iam-policy-binding ${PROJECT} \
  --member=serviceAccount:${USER_EMAIL} \
  --role=roles/storage.admin

gcloud iam service-accounts keys create ${ADMIN_FILE} \
  --project ${PROJECT} \
  --iam-account ${ADMIN_EMAIL}
gcloud iam service-accounts keys create ${USER_FILE} \
  --project ${PROJECT} \
  --iam-account ${USER_EMAIL}

kubectl create secret generic admin-gcp-sa \
  --from-file=admin-gcp-sa.json=${ADMIN_FILE}
kubectl create secret generic user-gcp-sa \
  --from-file=user-gcp-sa.json=${USER_FILE}
```
Install Kubeflow with the following commands:

```
kfctl init ${CLUSTER} --platform gcp
cd ${CLUSTER}
kfctl generate k8s
kfctl apply k8s
```
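Before moving on, it can help to confirm that the core components are starting
up (pod names vary by component and release):

```
kubectl get pods --namespace kubeflow
```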
## 3. Install pipelines on GKE

```
kubectl create clusterrolebinding sa-admin --clusterrole=cluster-admin --serviceaccount=kubeflow:pipeline-runner
cd ks_app
ks registry add ml-pipeline "${PIPELINES_REPO}/ml-pipeline"
ks pkg install ml-pipeline/ml-pipeline
ks generate ml-pipeline ml-pipeline
ks param set ml-pipeline namespace kubeflow
ks apply default -c ml-pipeline
```

View the installed components in the GCP Console. In the
[Kubernetes Engine](https://console.cloud.google.com/kubernetes)
section, you will see a new cluster ${CLUSTER}. Under
[Workloads](https://console.cloud.google.com/kubernetes/workload),
you will see all the default Kubeflow and pipeline components.

Diff for: demos/simple_pipeline/gpu-example-katib.yaml

@@ -0,0 +1,39 @@
apiVersion: "kubeflow.org/v1alpha1"
kind: StudyJob
metadata:
  namespace: kubeflow
  labels:
    controller-tools.k8s.io: "1.0"
  name: gpu-example
spec:
  studyName: gpu-example
  owner: crd
  optimizationtype: maximize
  objectivevaluename: Validation-accuracy
  optimizationgoal: 0.99
  metricsnames:
    - accuracy
  parameterconfigs:
    - name: --lr
      parametertype: double
      feasible:
        min: "0.01"
        max: "0.03"
    - name: --num-layers
      parametertype: int
      feasible:
        min: "2"
        max: "3"
    - name: --optimizer
      parametertype: categorical
      feasible:
        list:
          - sgd
          - adam
          - ftrl
  workerSpec:
    goTemplate:
      templatePath: "/worker-template/gpuWorkerTemplate.yaml"
  suggestionSpec:
    suggestionAlgorithm: "random"
    requestNumber: 3
