Kubeflow is a machine learning toolkit for Kubernetes. The project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. The goal is to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures.

What does a Kubeflow deployment look like?

A Kubeflow deployment is a means of organizing loosely coupled microservices as a single unit and deploying them to a variety of locations, whether that's a laptop or the cloud. This codelab will walk you through creating your own Kubeflow deployment.

What you'll build

In this codelab, you're going to build a web app that summarizes GitHub issues using a trained model. It is based on the walkthrough provided in the Kubeflow Examples repo. Upon completion, your infrastructure will contain:

What you'll learn

What you'll need

This is an advanced codelab focused on Kubeflow. For more background and an introduction to the platform, see the Introduction to Kubeflow on Kubernetes codelab. Non-relevant concepts and code blocks are glossed over and provided for you to simply copy and paste.

Choose one of the following environments for running this codelab:

Cloud Shell

This link clones the Kubeflow Examples repo and places it in the ~/examples directory.

Download in Google Cloud Shell

Once you have the project files, check out the v0.5.1 release, which contains the resources you will need:

cd ${HOME}/examples/github_issue_summarization
export KUBEFLOW_TAG=0.5.1
git checkout v${KUBEFLOW_TAG}

Enable Boost Mode

In the Cloud Shell window, click on the Settings dropdown at the far right. Select Enable Boost Mode. This will provision a larger instance for your Cloud Shell session, resulting in speedier Docker builds. If you can't find this menu, ensure the main Navigation Menu is hidden by clicking the three lines at the top left of the screen, next to the Google Cloud Platform logo.

Local Linux or MacOS

This link downloads an archive of the Kubeflow examples repo. Unpacking the downloaded zip file will produce a root folder (examples-0.5.1) containing all of the official Kubeflow examples.

Download locally

Unzip and move the folder for consistency with the absolute paths in this codelab:

cd ${HOME}
export KUBEFLOW_TAG=0.5.1
unzip v${KUBEFLOW_TAG}.zip
mv examples-${KUBEFLOW_TAG} ${HOME}/examples

Set your GitHub token

This codelab uses many different files obtained from public repos on GitHub. To prevent rate-limiting, especially at events where many anonymized requests are sent to the GitHub APIs, set up an access token with no permissions. The token simply authorizes you as an individual rather than an anonymous user.

  1. Navigate to https://github.com/settings/tokens and generate a new token with no permissions.
  2. Save it somewhere safe. If you lose it, you will need to delete and create a new one.
  3. Set the GITHUB_TOKEN environment variable:
export GITHUB_TOKEN=<token>
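To confirm the token is being picked up, you can query GitHub's rate-limit endpoint; an authenticated request reports a much higher core quota than the anonymous default. This is a quick sanity check, not part of the official walkthrough:

```shell
# Check the current rate limit with the token attached; authenticated
# requests receive a higher quota than anonymous ones.
curl -s -H "Authorization: token ${GITHUB_TOKEN}" \
  https://api.github.com/rate_limit
```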

Installing pyyaml

Ensure that pyyaml is installed by running:

pip install -U --user pyyaml
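You can confirm the install from Python itself; a missing or broken install raises an ImportError instead of printing the version:

```shell
# Import the module and print its version to verify the install.
python -c 'import yaml; print(yaml.__version__)'
```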

Installing ksonnet

Set the correct version

To install on Cloud Shell or a local Linux machine, set these environment variables:

export KS_VER=0.13.1
export KS_BIN=ks_${KS_VER}_linux_amd64

To install on a Mac, set these environment variables:

export KS_VER=0.13.1
export KS_BIN=ks_${KS_VER}_darwin_amd64

Install ksonnet

Download and unpack the appropriate binary, then add it to your $PATH:

wget -O /tmp/$KS_BIN.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_BIN}.tar.gz

mkdir -p ${HOME}/bin
tar -xvf /tmp/${KS_BIN}.tar.gz -C ${HOME}/bin

export PATH=$PATH:${HOME}/bin/${KS_BIN}
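With the unpacked directory on your $PATH, you can confirm the binary resolves:

```shell
# Print the ksonnet client version; this fails if the binary
# directory was not added to $PATH.
ks version
```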

To familiarize yourself with ksonnet concepts, see this diagram.

Install kfctl

Download and unpack kfctl, the Kubeflow command-line tool, then add it to your $PATH:

wget -P /tmp https://github.com/kubeflow/kubeflow/releases/download/v${KUBEFLOW_TAG}/kfctl_v${KUBEFLOW_TAG}_linux.tar.gz
tar -xvf /tmp/kfctl_v${KUBEFLOW_TAG}_linux.tar.gz -C ${HOME}/bin
export PATH=$PATH:${HOME}/bin

kfctl allows you to install Kubeflow on an existing cluster or create one from scratch.
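To confirm the download, you can print the build information (this assumes ${HOME}/bin is on your $PATH and that this kfctl release supports the version subcommand):

```shell
# Show which kfctl build is resolved on the PATH.
kfctl version
```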

Set your GCP project ID

Store your GCP project ID and set the default compute zone for gcloud:

export PROJECT_ID=<gcp_project_id>
export ZONE=us-central1-a
gcloud config set project ${PROJECT_ID}
gcloud config set compute/zone ${ZONE}

Authorize Docker

Allow Docker access to your project's Container Registry:

gcloud auth configure-docker

Create a storage bucket

Create a Cloud Storage bucket for storing your trained model. The commands below derive a unique bucket name from your project ID, then issue the "mb" (make bucket) command:

export BUCKET_NAME=kubeflow-${PROJECT_ID}
gsutil mb gs://${BUCKET_NAME}
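If the mb command succeeded, listing the bucket should confirm it exists:

```shell
# Confirm the bucket exists; -b lists the bucket itself
# rather than its contents.
gsutil ls -b gs://${BUCKET_NAME}
```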

To create a managed Kubernetes cluster on Kubernetes Engine using kfctl, you will initialize an application directory, generate platform configuration files, and apply them to create the deployment.

To create an application directory with local config files and enable APIs for your project, run these commands:

cd ${HOME}
export KUBEFLOW_USERNAME=codelab-user
export KUBEFLOW_PASSWORD=password
export KFAPP=kubeflow-codelab
kfctl init ${KFAPP} --platform gcp --project ${PROJECT_ID} --use_basic_auth -V

This creates the file kubeflow-codelab/app.yaml, which defines a full, default Kubeflow installation.
To generate the files used to create the deployment, including a cluster and service accounts, run these commands:

cd ${KFAPP}
kfctl generate platform -V --zone ${ZONE}

This generates several new directories, each with customized files. To use the generated files to create all the objects in your project, run this command:

kfctl apply platform -V

When kfctl has exited with the message, "KUBECONFIG context kubeflow-codelab is created," verify the connection with this command:

kubectl cluster-info

Verify that the Kubernetes master IP reported here matches the Endpoint IP address shown for your cluster in the Google Cloud Platform Console.
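As a quick cross-check, you can ask GKE for the endpoint directly; this sketch assumes kfctl named the cluster after your ${KFAPP} value (kubeflow-codelab):

```shell
# Print the cluster's master endpoint IP as reported by GKE.
gcloud container clusters describe ${KFAPP} --zone ${ZONE} \
  --format='value(endpoint)'
```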

To add a default Kubeflow installation to the cluster you just built, first generate manifest files:

cd ${HOME}/${KFAPP}
kfctl generate k8s -V --zone ${ZONE}

Apply the generated manifests to the cluster:

kfctl apply k8s -V

When kfctl has exited with the message, "All components apply succeeded," continue below.

Add Seldon to the default installation

To add Seldon, ksonnet can help. ksonnet is a templating framework that lets you reuse common object definitions and customize them for your environment. You'll begin by referencing Kubeflow templates and applying environment-specific parameters. Once manifests have been generated specifically for your cluster, they can be applied like any other Kubernetes object using `kubectl`.

Run the following commands to install Seldon using ksonnet:

cd ${HOME}/${KFAPP}/ks_app
ks generate seldon seldon
ks apply default -c seldon

Congratulations! Your cluster now contains a Kubeflow installation with Seldon. You can view the components by running:

kubectl get pods

You should see output similar to this:

View the Kubeflow Central Dashboard

To view the UI, open a new tab in Cloud Shell and run this command to open a port to the ambassador service:

kubectl port-forward svc/ambassador 8080:80

In Cloud Shell, click on the Web Preview button and select "Preview on port 8080."

This will open a new browser tab showing a login screen, where you can enter the username and password you provided when you created the cluster ("codelab-user", "password"). Logging in brings you to the Kubeflow central dashboard.

In this section, you will create a component that trains a model.

In Cloud Shell, set the component parameters:

cd ${HOME}/${KFAPP}/ks_app
ks generate tf-job-simple-v1beta1 tfjob --name tfjob-issue-summarization
cp ${HOME}/examples/github_issue_summarization/ks_app/components/tfjob.jsonnet components/
ks param set tfjob gcpSecretName "user-gcp-sa"
ks param set tfjob gcpSecretFile "user-gcp-sa.json"
ks param set tfjob image "gcr.io/kubeflow-examples/tf-job-issue-summarization:v20180629-v0.1-2-g98ed4b4-dirty-182929"
ks param set tfjob input_data "gs://kubeflow-examples/github-issue-summarization-data/github_issues_sample.csv"
ks param set tfjob input_data_gcs_bucket "kubeflow-examples"
ks param set tfjob input_data_gcs_path "github-issue-summarization-data/github-issues.zip"
ks param set tfjob num_epochs "7"
ks param set tfjob output_model "/tmp/model.h5"
ks param set tfjob output_model_gcs_bucket "${BUCKET_NAME}"
ks param set tfjob output_model_gcs_path "github-issue-summarization-data"
ks param set tfjob sample_size "100000"

The training component tfjob is now configured to use a pre-built container image.
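Before launching, you can review all of the values you just set with ksonnet's parameter listing:

```shell
# Show every parameter currently configured on the tfjob component.
ks param list tfjob
```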

Launch training

Apply the component manifests to the cluster:

ks apply default -c tfjob

View the running job

View the resulting pods:

kubectl get pod -l tf-job-name=tfjob-issue-summarization

The training pod should look similar to this:

It can take a few minutes to pull the image and start the container. Once the "tfjob-issue-summarization-master" pod is running, tail the logs:

kubectl logs -f tfjob-issue-summarization-master-0

Inside the pod, you will see the download of source data (github-issues.zip) before training begins. Continue tailing the logs until the pod exits on its own and you find yourself back at the command prompt.
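If the job appears stuck, kubectl describe works on custom resources such as TFJob and surfaces its condition history along with recent events:

```shell
# Inspect the TFJob's status conditions and events.
kubectl describe tfjob tfjob-issue-summarization
```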

To verify that training completed successfully, check to make sure all three model files were uploaded to your Cloud Storage bucket:

gsutil ls -l gs://${BUCKET_NAME}/github-issue-summarization-data

In this section, you will create a component that serves a trained model.

Set serving image path

export SERVING_IMAGE=gcr.io/kubeflow-examples/issue-summarization-model:v20180718-g98ed4b4-codelab

Create the serving component

The serving component is configured to run a pre-built image. Using a Seldon ksonnet template, generate the serving component. Navigate back to the ksonnet app directory for Kubeflow, and issue the following commands:

cd ${HOME}/${KFAPP}/ks_app
ks generate seldon-serve-simple-v1alpha2 issue-summarization-model \
  --name=issue-summarization \
  --image=${SERVING_IMAGE}

Launch serving

Apply the component manifests to the cluster:

ks apply default -c issue-summarization-model

View the running pods

You will see several new pods appear:

kubectl get pods -l seldon-deployment-id=issue-summarization

Once the pod is running, tail the logs for one of the serving containers to verify that it is running on port 9000:

kubectl logs \
  $(kubectl get pods \
    -lseldon-deployment-id=issue-summarization \
    -o=jsonpath='{.items[0].metadata.name}')

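Optionally, you can exercise the prediction endpoint directly before wiring up the UI. The service name, port, and API path below come from the model URL used later in this codelab; the JSON payload shape (a Seldon "ndarray" holding one string) is an assumption about the model's input format:

```shell
# Forward the model service locally in the background.
kubectl port-forward svc/issue-summarization 8000:8000 &
PF_PID=$!
sleep 3

# Send one sample request; the payload shape is an assumption
# about the model's expected input.
curl -s -X POST http://localhost:8000/api/v0.1/predictions \
  -H 'Content-Type: application/json' \
  -d '{"data": {"ndarray": [["issue body text to summarize"]]}}'

# Stop the port-forward.
kill ${PF_PID}
```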
In this section, you will create a component that provides browser access to the serving component.

Set parameter values

cd ${HOME}/${KFAPP}/ks_app
ks generate deployed-service ui \
  --name issue-summarization-ui \
  --image gcr.io/kubeflow-examples/issue-summarization-ui:v20180629-v0.1-2-g98ed4b4-dirty-182929

cp ${HOME}/examples/github_issue_summarization/ks_app/components/ui.jsonnet components/

ks param set ui githubToken ${GITHUB_TOKEN}
ks param set ui modelUrl "http://issue-summarization.kubeflow.svc.cluster.local:8000/api/v0.1/predictions"

The UI component is now configured to use a pre-built container image which is made available in Container Registry (gcr.io).

(Optional) Create the UI image

The UI component is configured to use the pre-built image by default. If you would prefer to build and use your own image instead, continue with this step.

Switch to the docker directory and build the image for the UI:

cd ${HOME}/examples/github_issue_summarization/docker
docker build -t gcr.io/${PROJECT_ID}/issue-summarization-ui:latest .

After the image has been successfully built, store it in Container Registry:

docker push gcr.io/${PROJECT_ID}/issue-summarization-ui:latest

Update the component parameter with a link that points to the custom image:

cd ${HOME}/${KFAPP}/ks_app
ks param set ui image gcr.io/${PROJECT_ID}/issue-summarization-ui:latest

Launch the UI

Apply the component manifests to the cluster:

ks apply default -c ui

You should see an additional pod with the status ContainerCreating:

kubectl get pods -l app=issue-summarization-ui

Wait until the pod status is Running before proceeding to the next step.
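Rather than polling by hand, kubectl wait (available in kubectl 1.11 and later) can block until the pod is ready:

```shell
# Block until the UI pod reports Ready, for up to five minutes.
kubectl wait --for=condition=Ready pod \
  -l app=issue-summarization-ui --timeout=300s
```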

View the UI

To view the UI, navigate to the Kubeflow central dashboard. Add the text "issue-summarization/" to the end of the URL and press Enter (don't forget the trailing slash).

You should see something like this:

Click the Populate Random Issue button to fill in the large text box with a random issue summary. Then click the Generate Title button to view the machine generated title produced by your trained model.

View serving container logs

Tail the logs of one of the serving containers to verify that it is receiving a request from the UI and providing a prediction in response:

kubectl logs \
  $(kubectl get pods \
    -lseldon-deployment-id=issue-summarization \
    -o=jsonpath='{.items[0].metadata.name}')

Press the Generate Title button in the UI a few times to view the POST request. Since there are two serving containers, you might need to try a few times before you see the log entry.

Press Ctrl-C to return to the command prompt.

Destroy the cluster

In the GCP Console, navigate to Deployment Manager. Locate the "kubeflow-codelab" deployment and delete it. This will remove all related components, such as the cluster itself and any service accounts.

Removing the deployment "kubeflow-codelab-storage" deletes all persistent state as well. Keep it instead if you wish to create a new deployment that reuses the existing state.

Destroy images

These snippets will remove all versions of the training, serving, and UI images that were stored in your project:

export IMAGE=gcr.io/${PROJECT_ID}/tf-job-issue-summarization
for digest in $(gcloud container images list-tags \
  ${IMAGE} --limit=999999 \
  --format='get(digest)'); do
    gcloud container images delete -q --force-delete-tags "${IMAGE}@${digest}"
done

export IMAGE=gcr.io/${PROJECT_ID}/issue-summarization-model
for digest in $(gcloud container images list-tags \
  ${IMAGE} --limit=999999 \
  --format='get(digest)'); do
    gcloud container images delete -q --force-delete-tags "${IMAGE}@${digest}"
done

export IMAGE=gcr.io/${PROJECT_ID}/issue-summarization-ui
for digest in $(gcloud container images list-tags \
  ${IMAGE} --limit=999999 \
  --format='get(digest)'); do
    gcloud container images delete -q --force-delete-tags "${IMAGE}@${digest}"
done

Destroy the storage bucket

gsutil rm -r gs://${BUCKET_NAME}

Remove ksonnet

rm /tmp/${KS_BIN}.tar.gz
rm -rf ${HOME}/bin/${KS_BIN}

Remove sample code

rm -rf ${HOME}/examples

Remove GitHub token

Navigate to https://github.com/settings/tokens and remove the generated token.