
Commit bb4be0d

texasmichelle authored and Yixin Shi committed
Simple pipeline demo (kubeflow#322)
* Add simple pipeline demo
* Add hyperparameter tuning & GPU autoprovisioning; use pipelines v0.1.2
* Resolve lint issues
* Disable lint warning; correct SDK syntax that labels the name of the pipeline step
* Add postprocessing step: a basically empty step, just to show more than one step
* Add clarity to instructions
* Update pipelines install to release v0.1.2
* Add repo cloning with release versions; remove katib patch, use kubeflow v0.3.3, add PROJECT to env var override file, further clarification of instructions
1 parent 6814601 commit bb4be0d

7 files changed: +577 −0 lines changed

Diff for: demos/simple_pipeline/README.md

@@ -0,0 +1,133 @@
# Kubeflow demo - Simple pipeline

## Hyperparameter tuning and autoprovisioning GPU nodes

This folder contains a demonstration of Kubeflow capabilities, suitable for
presentation to public audiences.

This demo highlights the use of pipelines and hyperparameter tuning on a GKE
cluster with node autoprovisioning (NAP). A simple pipeline requests GPU
resources, which triggers node pool creation. This demo includes the following
steps:

1. [Set up your environment](#1-set-up-your-environment)
1. [Run a simple pipeline](#2-run-a-simple-pipeline)
1. [Perform hyperparameter tuning](#3-perform-hyperparameter-tuning)
1. [Run a better pipeline](#4-run-a-better-pipeline)
## 1. Set up your environment

Follow the instructions in
[demo_setup/README.md](https://github.com/kubeflow/examples/blob/master/demos/simple_pipeline/demo_setup/README.md)
to set up your environment and install Kubeflow with pipelines on an
autoprovisioning GKE cluster.

View the installed components in the GCP Console:
* In the [Kubernetes Engine](https://console.cloud.google.com/kubernetes)
  section, you will see a new cluster ${CLUSTER} with 3 `n1-standard-1` nodes.
* Under [Workloads](https://console.cloud.google.com/kubernetes/workload),
  you will see all the default Kubeflow and pipeline components.

Source the environment file and activate the conda environment for pipelines:

```
source kubeflow-demo-simple-pipeline.env
source activate kfp
```
## 2. Run a simple pipeline

Show the file `gpu-example-pipeline.py` as an example of a simple pipeline.

Compile it to create a `.tar.gz` file:

```
./gpu-example-pipeline.py
```
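To confirm that the compile step worked, you can list the contents of the
resulting archive, which should contain the compiled workflow definition that
the pipelines UI accepts:

```
tar -tzf gpu-example-pipeline.py.tar.gz
```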
View the pipelines UI locally by forwarding a port to the ml-pipeline-ui pod:

```
PIPELINES_POD=$(kubectl get po -l app=ml-pipeline-ui | \
  grep ml-pipeline-ui | \
  head -n 1 | \
  cut -d " " -f 1)
kubectl port-forward ${PIPELINES_POD} 8080:3000
```
In the browser, navigate to `localhost:8080` and create a new pipeline by
uploading `gpu-example-pipeline.py.tar.gz`. Select the pipeline and click
_Create experiment_. Use all suggested defaults.

View the effects of autoprovisioning by observing the number of nodes increase.

Select _Experiments_ from the left-hand side, then _Runs_. Click on the
experiment run to view the graph and watch it execute.

View the container logs for the training step and take note of the low
accuracy (~0.113).
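To pull those logs from the command line instead of the UI, one option is the
sketch below; it assumes the pipeline steps run as Argo workflow pods (as they
do in this Pipelines release), whose application container is named `main`:

```
# List the pods created by the pipeline run, then view the training step's logs.
kubectl get po -l workflows.argoproj.io/workflow
kubectl logs <training-pod-name> -c main
```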
## 3. Perform hyperparameter tuning

To determine which parameters result in higher accuracy, use Katib to execute
a Study, which defines a search space for performing training with a range of
different parameters.

Create a Study by applying an
[example file](https://github.com/kubeflow/katib/blob/master/examples/gpu-example.yaml)
to the cluster:

```
kubectl apply -f gpu-example-katib.yaml
```
This creates a StudyJob object. To view it:

```
kubectl get studyjob
kubectl describe studyjobs gpu-example
```
To view the Katib UI, connect to the modeldb-frontend pod:

```
KATIB_POD=$(kubectl get po -l app=modeldb,component=frontend | \
  grep modeldb-frontend | \
  head -n 1 | \
  cut -d " " -f 1)
kubectl port-forward ${KATIB_POD} 8081:3000
```
In the browser, navigate to `localhost:8081/katib` and click on the
gpu-example project. In the _Explore Visualizations_ section, select
_Optimizer_ in the _Group By_ dropdown, then click _Compare_.

While you're waiting, watch for autoprovisioning to occur by viewing the pods
in Pending status.
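One way to watch this from the command line: pods whose GPU request cannot yet
be satisfied sit in `Pending` until the new node pool comes up.

```
kubectl get pods --field-selector=status.phase=Pending
```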
View the creation of a new GPU node pool:

```
gcloud container node-pools list --cluster ${CLUSTER}
```

View the creation of new nodes:

```
kubectl get nodes
```
In the Katib UI, interact with the various graphs to determine which
combination of parameters results in the highest accuracy. Grouping by
optimizer type is one way to find consistently higher accuracies. Gather a set
of parameters to use in a new run of the pipeline.

## 4. Run a better pipeline

In the pipelines UI, clone the previous experiment run and update the
arguments to match the parameters for one of the runs with higher accuracies
from the Katib UI. Execute the pipeline and watch for the resulting accuracy,
which should be closer to 0.98.

Approximately 5 minutes after the last run completes, check the cluster nodes
to verify that the GPU nodes have disappeared.
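As a more targeted check, GKE labels GPU nodes with the accelerator type, so
an empty result from the following command confirms the autoprovisioned GPU
nodes are gone (assuming the cluster has no other GPU node pools):

```
kubectl get nodes -l cloud.google.com/gke-accelerator
```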

Diff for: demos/simple_pipeline/demo_setup/README.md

@@ -0,0 +1,183 @@
# Kubeflow demo - Simple pipeline

This repository contains a demonstration of Kubeflow capabilities, suitable
for presentation to public audiences.

The base demo includes the following steps:

1. [Set up your environment](#1-set-up-your-environment)
1. [Create a GKE cluster and install Kubeflow](#2-create-a-gke-cluster-and-install-kubeflow)
1. [Install pipelines on GKE](#3-install-pipelines-on-gke)
## 1. Set up your environment

Clone the [kubeflow/kubeflow](https://github.com/kubeflow/kubeflow) repo and
check out the
[`v0.3.3`](https://github.com/kubeflow/kubeflow/releases/tag/v0.3.3) release.
Clone the [kubeflow/pipelines](https://github.com/kubeflow/pipelines) repo and
check out the
[`0.1.2`](https://github.com/kubeflow/pipelines/releases/tag/0.1.2) release.
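For example, the cloning step might look like this (destination directories
are your choice and should match the repo paths expected by your environment
file):

```
git clone https://github.com/kubeflow/kubeflow.git
(cd kubeflow && git checkout v0.3.3)
git clone https://github.com/kubeflow/pipelines.git
(cd pipelines && git checkout 0.1.2)
```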
Ensure that the repo paths, project name, and other variables are set
correctly. When all overrides are set, source the environment file:

```
source kubeflow-demo-simple-pipeline.env
```
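The overrides referenced throughout this demo might look like the following
sketch; every value here is hypothetical, so substitute your own:

```
# Hypothetical override values; adjust to your environment.
export PROJECT=my-gcp-project
export DEMO_PROJECT=${PROJECT}
export CLUSTER=kubeflow-demo
export ZONE=us-central1-a
export NAMESPACE=kubeflow
export PIPELINES_REPO=${HOME}/pipelines
```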
Create a clean Python environment for installing Kubeflow Pipelines:

```
conda create --name kfp python=3.6
source activate kfp
```
Install the Kubeflow Pipelines SDK:

```
pip install https://storage.googleapis.com/ml-pipeline/release/0.1.2/kfp.tar.gz --upgrade
```
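As a quick smoke test, confirm the SDK imports cleanly inside the `kfp`
environment (`kfp.dsl` and `kfp.compiler` are the modules used to define and
compile pipelines):

```
python -c "import kfp.dsl, kfp.compiler; print('kfp SDK OK')"
```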
## 2. Create a GKE cluster and install Kubeflow

Creating a cluster with click-to-deploy does not yet support the installation
of pipelines, so it is not useful for demonstrating pipelines, but it is still
worth showing.

### Click-to-deploy

Generate a web app Client ID and Client Secret by following the instructions
[here](https://www.kubeflow.org/docs/started/getting-started-gke/#create-oauth-client-credentials).
Save these as environment variables for easy access.
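For example (placeholder values shown; ${CLIENT_ID} and ${CLIENT_SECRET} are
referenced again later in this guide):

```
export CLIENT_ID=<your-client-id>.apps.googleusercontent.com
export CLIENT_SECRET=<your-client-secret>
```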
51+
52+
In the browser, navigate to the
53+
[Click-to-deploy app](https://deploy.kubeflow.cloud/). Enter the project name,
54+
along with the Client ID and Client Secret previously generated. Select the
55+
desired ${ZONE} and latest version of Kubeflow, then click _Create Deployment_.
56+
57+
In the [GCP Console](https://console.cloud.google.com/kubernetes), navigate to the
58+
Kubernetes Engine panel to watch the cluster creation process. This results in a
59+
full cluster with Kubeflow installed.
### kfctl

While node autoprovisioning is in beta, it must be enabled manually. To create
a cluster with autoprovisioning, run the following commands, which will take
around 30 minutes:

```
gcloud container clusters create ${CLUSTER} \
  --project ${DEMO_PROJECT} \
  --zone ${ZONE} \
  --cluster-version 1.11.2-gke.9 \
  --num-nodes=8 \
  --scopes cloud-platform,compute-rw,storage-rw \
  --verbosity error

# Scale the cluster down to 3 nodes. The initial 8 is just to prevent master
# restarts due to upscaling. We cannot use 0 because the cluster autoscaler
# would then treat the cluster as unhealthy, and a few small non-GPU nodes
# are needed to handle system pods.
gcloud container clusters resize ${CLUSTER} \
  --project ${DEMO_PROJECT} \
  --zone ${ZONE} \
  --size=3 \
  --node-pool=default-pool

# Enable node autoprovisioning.
gcloud beta container clusters update ${CLUSTER} \
  --project ${DEMO_PROJECT} \
  --zone ${ZONE} \
  --enable-autoprovisioning \
  --max-cpu 20 \
  --max-memory 200 \
  --max-accelerator=type=nvidia-tesla-k80,count=8
```
Once the cluster has been created, install the GPU drivers:

```
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/daemonset.yaml
```
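You can watch the driver installer roll out as GPU nodes appear; this assumes
the daemonset name used by that manifest, `nvidia-driver-installer`, which it
creates in `kube-system`:

```
kubectl get daemonset nvidia-driver-installer --namespace kube-system
```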
Add RBAC permissions, which allow your user to install Kubeflow components on
the cluster:

```
kubectl create clusterrolebinding cluster-admin-binding-${USER} \
  --clusterrole cluster-admin \
  --user $(gcloud config get-value account)
```
Set up kubectl access:

```
kubectl create namespace kubeflow
./create_context.sh gke ${NAMESPACE}
```

Set up the OAuth environment variables ${CLIENT_ID} and ${CLIENT_SECRET} using
the instructions
[here](https://www.kubeflow.org/docs/started/getting-started-gke/#create-oauth-client-credentials),
then create the OAuth secret:

```
kubectl create secret generic kubeflow-oauth --from-literal=client_id=${CLIENT_ID} --from-literal=client_secret=${CLIENT_SECRET}
```
Create service accounts, add permissions, download credentials, and create secrets:

```
ADMIN_EMAIL=${CLUSTER}-admin@${PROJECT}.iam.gserviceaccount.com
USER_EMAIL=${CLUSTER}-user@${PROJECT}.iam.gserviceaccount.com
ADMIN_FILE=${HOME}/.ssh/${ADMIN_EMAIL}.json
USER_FILE=${HOME}/.ssh/${USER_EMAIL}.json

gcloud iam service-accounts create ${CLUSTER}-admin --display-name=${CLUSTER}-admin
gcloud iam service-accounts create ${CLUSTER}-user --display-name=${CLUSTER}-user

gcloud projects add-iam-policy-binding ${PROJECT} \
  --member=serviceAccount:${ADMIN_EMAIL} \
  --role=roles/storage.admin
gcloud projects add-iam-policy-binding ${PROJECT} \
  --member=serviceAccount:${USER_EMAIL} \
  --role=roles/storage.admin

gcloud iam service-accounts keys create ${ADMIN_FILE} \
  --project ${PROJECT} \
  --iam-account ${ADMIN_EMAIL}
gcloud iam service-accounts keys create ${USER_FILE} \
  --project ${PROJECT} \
  --iam-account ${USER_EMAIL}

kubectl create secret generic admin-gcp-sa \
  --from-file=admin-gcp-sa.json=${ADMIN_FILE}
kubectl create secret generic user-gcp-sa \
  --from-file=user-gcp-sa.json=${USER_FILE}
```
Install Kubeflow with the following commands:

```
kfctl init ${CLUSTER} --platform gcp
cd ${CLUSTER}
kfctl generate k8s
kfctl apply k8s
```
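Before moving on, it can help to confirm that the core components are starting
up (pod names vary by component and release):

```
kubectl get pods --namespace kubeflow
```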
## 3. Install pipelines on GKE

```
kubectl create clusterrolebinding sa-admin --clusterrole=cluster-admin --serviceaccount=kubeflow:pipeline-runner
cd ks_app
ks registry add ml-pipeline "${PIPELINES_REPO}/ml-pipeline"
ks pkg install ml-pipeline/ml-pipeline
ks generate ml-pipeline ml-pipeline
ks param set ml-pipeline namespace kubeflow
ks apply default -c ml-pipeline
```

View the installed components in the GCP Console. In the
[Kubernetes Engine](https://console.cloud.google.com/kubernetes)
section, you will see a new cluster ${CLUSTER}. Under
[Workloads](https://console.cloud.google.com/kubernetes/workload),
you will see all the default Kubeflow and pipeline components.

Diff for: demos/simple_pipeline/gpu-example-katib.yaml

@@ -0,0 +1,39 @@
apiVersion: "kubeflow.org/v1alpha1"
kind: StudyJob
metadata:
  namespace: kubeflow
  labels:
    controller-tools.k8s.io: "1.0"
  name: gpu-example
spec:
  studyName: gpu-example
  owner: crd
  optimizationtype: maximize
  objectivevaluename: Validation-accuracy
  optimizationgoal: 0.99
  metricsnames:
    - accuracy
  parameterconfigs:
    - name: --lr
      parametertype: double
      feasible:
        min: "0.01"
        max: "0.03"
    - name: --num-layers
      parametertype: int
      feasible:
        min: "2"
        max: "3"
    - name: --optimizer
      parametertype: categorical
      feasible:
        list:
          - sgd
          - adam
          - ftrl
  workerSpec:
    goTemplate:
      templatePath: "/worker-template/gpuWorkerTemplate.yaml"
  suggestionSpec:
    suggestionAlgorithm: "random"
    requestNumber: 3
