Kubeflow is a machine learning toolkit for Kubernetes. The project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. The goal is to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures.
A Kubeflow deployment is a means of organizing loosely-coupled microservices as a single unit and deploying them to a variety of locations, whether that's a laptop or the cloud. This codelab will walk you through creating your own Kubeflow deployment.
In this codelab, you're going to build a web app that summarizes GitHub issues using a trained model. It is based on the walkthrough provided in the Kubeflow Examples repo. Upon completion, your infrastructure will contain:
This is an advanced codelab focused on Kubeflow. For more background and an introduction to the platform, see the Introduction to Kubeflow on Kubernetes codelab. Non-relevant concepts and code blocks are glossed over and provided for you to simply copy and paste.
Choose one of the following environments for running this codelab:
This link clones the Kubeflow Examples repo and places it in the ~/examples directory.
Once you have the project files, check out the v0.5.1 branch, which contains the resources you will need:
cd ${HOME}/examples/github_issue_summarization
export KUBEFLOW_TAG=0.5.1
git checkout v${KUBEFLOW_TAG}
In the Cloud Shell window, click on the Settings dropdown at the far right. Select Enable Boost Mode. This will provision a larger instance for your Cloud Shell session, resulting in speedier Docker builds. If you can't find this menu, ensure the main Navigation Menu is hidden by clicking the three lines at the top left of the screen, next to the Google Cloud Platform logo.
This link downloads an archive of the Kubeflow examples repo. Unpacking the downloaded zip file will produce a root folder (examples-0.5.1) containing all of the official Kubeflow examples.
Unzip and move the folder for consistency with the absolute paths in this codelab:
cd ${HOME}
export KUBEFLOW_TAG=0.5.1
unzip v${KUBEFLOW_TAG}.zip
mv examples-${KUBEFLOW_TAG} ${HOME}/examples
This codelab involves the use of many different files obtained from public repos on GitHub. To prevent rate-limiting, especially at events where a large number of anonymized requests are sent to the GitHub APIs, set up an access token with no permissions. This simply authorizes you as an individual rather than an anonymous user.
export GITHUB_TOKEN=<token>
Ensure that pyyaml is installed by running:
pip install -U --user pyyaml
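As a quick check that the install succeeded (not part of the original steps), confirm the module imports and print its version:

```shell
# Confirm pyyaml is importable and report its version
python -c "import yaml; print(yaml.__version__)"
```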
To install on Cloud Shell or a local Linux machine, set these environment variables:

export KS_VER=0.13.1
export KS_BIN=ks_${KS_VER}_linux_amd64
To install on a Mac, set these environment variables:

export KS_VER=0.13.1
export KS_BIN=ks_${KS_VER}_darwin_amd64
Download and unpack the appropriate binary, then add it to your $PATH:
wget -O /tmp/${KS_BIN}.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_BIN}.tar.gz
mkdir -p ${HOME}/bin
tar -xvf /tmp/${KS_BIN}.tar.gz -C ${HOME}/bin
export PATH=$PATH:${HOME}/bin/${KS_BIN}
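As a quick sanity check (not part of the original steps), you can confirm the binary is reachable on your $PATH and report its version:

```shell
# Confirm the ks binary is on the PATH and print its version
command -v ks
ks version
```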
To familiarize yourself with ksonnet concepts, see this diagram.
Download and unpack kfctl, the Kubeflow command-line tool, then add it to your $PATH:
wget -P /tmp https://github.com/kubeflow/kubeflow/releases/download/v${KUBEFLOW_TAG}/kfctl_v${KUBEFLOW_TAG}_linux.tar.gz
tar -xvf /tmp/kfctl_v${KUBEFLOW_TAG}_linux.tar.gz -C ${HOME}/bin
kfctl allows you to install Kubeflow on an existing cluster or create one from scratch.
Store the GCP Project ID and compute zone in environment variables, and set them as gcloud defaults:
export PROJECT_ID=<gcp_project_id>
export ZONE=us-central1-a
gcloud config set project ${PROJECT_ID}
gcloud config set compute/zone ${ZONE}
Allow Docker access to your project's Container Registry:
gcloud auth configure-docker
Create a Cloud Storage bucket for storing your trained model. The commands below derive a unique bucket name from your project ID, then issue the "mb" (make bucket) command:
export BUCKET_NAME=kubeflow-${PROJECT_ID}
gsutil mb gs://${BUCKET_NAME}
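If you want to confirm the bucket was created before moving on (an optional check), list the bucket itself with the -b flag:

```shell
# -b lists the bucket itself rather than its contents;
# the command fails if the bucket does not exist
gsutil ls -b gs://${BUCKET_NAME}
```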
To create a managed Kubernetes cluster on Kubernetes Engine using kfctl, we will walk through the following steps:
To create an application directory with local config files and enable APIs for your project, run these commands:
cd ${HOME}
export KUBEFLOW_USERNAME=codelab-user
export KUBEFLOW_PASSWORD=password
export KFAPP=kubeflow-codelab
kfctl init ${KFAPP} --platform gcp --project ${PROJECT_ID} --use_basic_auth -V
This creates the file kubeflow-codelab/app.yaml, which defines a full, default Kubeflow installation.
To generate the files used to create the deployment, including a cluster and service accounts, run these commands:
cd ${KFAPP}
kfctl generate platform -V --zone ${ZONE}
This generates several new directories, each with customized files. To use the generated files to create all the objects in your project, run this command:
kfctl apply platform -V
When kfctl has exited with the message, "KUBECONFIG context kubeflow-codelab is created," verify the connection with this command:
kubectl cluster-info
Verify that the Kubernetes master IP shown here matches the Endpoint address listed for your cluster in the Google Cloud Platform Console.
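If you prefer to compare the two addresses from the shell instead of the Console, you can ask GKE for the endpoint directly (this assumes the ${KFAPP} and ${ZONE} variables set earlier in this codelab):

```shell
# Master endpoint as reported by GKE
gcloud container clusters describe ${KFAPP} \
    --zone ${ZONE} --format='value(endpoint)'

# Master endpoint as reported by kubectl
kubectl cluster-info | head -n 1
```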
To add a default Kubeflow installation to the cluster you just built, first generate manifest files:
cd ${HOME}/${KFAPP}
kfctl generate k8s -V --zone ${ZONE}
Apply the generated manifests to the cluster:
kfctl apply k8s -V
When kfctl has exited with the message, "All components apply succeeded," continue below.
To add Seldon, ksonnet can help. ksonnet is a templating framework that lets you take common object definitions and customize them to your environment. You begin by referencing Kubeflow templates and applying environment-specific parameters. Once manifests have been generated specifically for your cluster, they can be applied like any other Kubernetes object using `kubectl`.
Run the following commands to install Seldon using ksonnet:
cd ${HOME}/${KFAPP}/ks_app
ks generate seldon seldon
ks apply default -c seldon
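To see what ksonnet produced, you can inspect the component's parameters and the manifests it renders. This is an optional sanity check, run from the same ks_app directory:

```shell
# List the parameters the seldon prototype was expanded with
ks param list seldon

# Render the manifests that `ks apply` sends to the cluster
ks show default -c seldon | head -n 20
```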
Congratulations! Your cluster now contains a Kubeflow installation with Seldon. You can view the components by running:
kubectl get pods
You should see output similar to this:
To view the UI, open a new tab in Cloud Shell and run this command to open a port to the ambassador service:
kubectl port-forward svc/ambassador 8080:80
In Cloud Shell, click on the Web Preview button and select "Preview on port 8080."
This will open a new browser tab showing a login form, where you can enter the username and password you provided when you created the cluster ("codelab-user", "password"). Logging in brings you to the Kubeflow central dashboard.
In this section, you will create a component that trains a model.
cd ${HOME}/${KFAPP}/ks_app
ks generate tf-job-simple-v1beta1 tfjob --name tfjob-issue-summarization
cp ${HOME}/examples/github_issue_summarization/ks_app/components/tfjob.jsonnet components/
ks param set tfjob gcpSecretName "user-gcp-sa"
ks param set tfjob gcpSecretFile "user-gcp-sa.json"
ks param set tfjob image "gcr.io/kubeflow-examples/tf-job-issue-summarization:v20180629-v0.1-2-g98ed4b4-dirty-182929"
ks param set tfjob input_data "gs://kubeflow-examples/github-issue-summarization-data/github_issues_sample.csv"
ks param set tfjob input_data_gcs_bucket "kubeflow-examples"
ks param set tfjob input_data_gcs_path "github-issue-summarization-data/github-issues.zip"
ks param set tfjob num_epochs "7"
ks param set tfjob output_model "/tmp/model.h5"
ks param set tfjob output_model_gcs_bucket "${BUCKET_NAME}"
ks param set tfjob output_model_gcs_path "github-issue-summarization-data"
ks param set tfjob sample_size "100000"
The training component tfjob is now configured to use a pre-built container image.
Apply the component manifests to the cluster:
ks apply default -c tfjob
View the resulting pods:
kubectl get pod -l tf-job-name=tfjob-issue-summarization
The training pod should look similar to this:
It can take a few minutes to pull the image and start the container. Once the "tfjob-issue-summarization-master" pod is running, tail the logs:
kubectl logs -f tfjob-issue-summarization-master-0
Inside the pod, you will see the download of source data (github-issues.zip) before training begins. Continue tailing the logs until the pod exits on its own and you find yourself back at the command prompt.
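If you would rather not keep the log stream open, you can watch the pod's status instead until it finishes (press Ctrl-C to stop watching):

```shell
# Watch the training pod; its status changes to Completed
# when training finishes
kubectl get pods -l tf-job-name=tfjob-issue-summarization -w
```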
To verify that training completed successfully, check to make sure all three model files were uploaded to your Cloud Storage bucket:
gsutil ls -l gs://${BUCKET_NAME}/github-issue-summarization-data
In this section, you will create a component that serves a trained model.
export SERVING_IMAGE=gcr.io/kubeflow-examples/issue-summarization-model:v20180718-g98ed4b4-codelab
The serving component is configured to run a pre-built image. Using a Seldon ksonnet template, generate the serving component. Navigate back to the ksonnet app directory for Kubeflow, and issue the following commands:
cd ${HOME}/${KFAPP}/ks_app ks generate seldon-serve-simple-v1alpha2 issue-summarization-model \ --name=issue-summarization \ --image=${SERVING_IMAGE} \ --replicas=2
Apply the component manifests to the cluster:
ks apply default -c issue-summarization-model
You will see several new pods appear:
kubectl get pods -l seldon-deployment-id=issue-summarization
Once the pod is running, tail the logs for one of the serving containers to verify that it is running on port 9000:
kubectl logs \ $(kubectl get pods \ -lseldon-deployment-id=issue-summarization \ -o=jsonpath='{.items[0].metadata.name}') \ issue-summarization
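You can also exercise the model endpoint directly, without the UI. Treat this as an optional sketch: the request body below follows the generic Seldon v0.1 prediction schema, and the sample issue text is made up, so adapt both to your deployment. The service name and port match the modelUrl configured for the UI component later in this codelab.

```shell
# In a separate Cloud Shell tab, forward the Seldon service port locally
kubectl port-forward svc/issue-summarization 8000:8000

# In another tab, send a sample issue body and print the raw prediction.
# The ndarray payload shape is an assumption based on the generic
# Seldon prediction schema.
curl -s -X POST http://localhost:8000/api/v0.1/predictions \
    -H 'Content-Type: application/json' \
    -d '{"data": {"ndarray": [["add support for parsing json data"]]}}'
```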
In this section, you will create a component that provides browser access to the serving component.
cd ${HOME}/${KFAPP}/ks_app
ks generate deployed-service ui \
    --name issue-summarization-ui \
    --image gcr.io/kubeflow-examples/issue-summarization-ui:v20180629-v0.1-2-g98ed4b4-dirty-182929
cp ${HOME}/examples/github_issue_summarization/ks_app/components/ui.jsonnet components/
ks param set ui githubToken ${GITHUB_TOKEN}
ks param set ui modelUrl "http://issue-summarization.kubeflow.svc.cluster.local:8000/api/v0.1/predictions"
The UI component is now configured to use a pre-built container image, which is made available in Container Registry (gcr.io). If you would prefer to build your own image instead, continue with the following optional steps.
Switch to the docker directory and build the image for the UI:
cd ${HOME}/examples/github_issue_summarization/docker
docker build -t gcr.io/${PROJECT_ID}/issue-summarization-ui:latest .
After the image has been successfully built, store it in Container Registry:
docker push gcr.io/${PROJECT_ID}/issue-summarization-ui:latest
Update the component parameter with a link that points to the custom image:
cd ${HOME}/${KFAPP}/ks_app
ks param set ui image gcr.io/${PROJECT_ID}/issue-summarization-ui:latest
Apply the component manifests to the cluster:
ks apply default -c ui
You should see an additional pod with the status ContainerCreating:
kubectl get pods -l app=issue-summarization-ui
Wait until the pod status is Running before proceeding to the next step.
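Rather than polling manually, you can block until the pod reports ready. This uses `kubectl wait`, which requires kubectl 1.11 or later:

```shell
# Block until the UI pod is Ready, or fail after 5 minutes
kubectl wait --for=condition=Ready pod \
    -l app=issue-summarization-ui --timeout=300s
```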
To view the UI, navigate to the Kubeflow central dashboard. Add the text "issue-summarization/" to the end of the URL and press Enter (don't forget the trailing slash).
You should see something like this:
Click the Populate Random Issue button to fill in the large text box with a random issue summary. Then click the Generate Title button to view the machine generated title produced by your trained model.
Tail the logs of one of the serving containers to verify that it is receiving a request from the UI and providing a prediction in response:
kubectl logs \ $(kubectl get pods \ -lseldon-deployment-id=issue-summarization \ -o=jsonpath='{.items[0].metadata.name}') \ issue-summarization
Press the Generate Title button in the UI a few times to view the POST request. Since there are two serving containers, you might need to try a few times before you see the log entry.
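Alternatively, you can tail the other replica by selecting the second pod returned by the label query (items[1] rather than items[0]):

```shell
# Tail the serving container in the second replica
kubectl logs \
    $(kubectl get pods \
        -l seldon-deployment-id=issue-summarization \
        -o=jsonpath='{.items[1].metadata.name}') \
    issue-summarization
```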
Press Ctrl-C to return to the command prompt.
In the GCP Console, navigate to Deployment Manager. Locate the "kubeflow-codelab" deployment and delete it. This will remove all related components, such as the cluster itself and any service accounts.
Remove the "kubeflow-codelab-storage" deployment as well to delete all persistent state. Retain it instead if you wish to create a new deployment that reuses the existing state.
These snippets will remove all versions of the training, serving, and UI images that were stored in your project:
for image in tf-job-issue-summarization issue-summarization-model issue-summarization-ui; do
  export IMAGE=gcr.io/${PROJECT_ID}/${image}
  for digest in $(gcloud container images list-tags \
      ${IMAGE} --limit=999999 \
      --format='get(digest)'); do
    gcloud container images delete -q --force-delete-tags "${IMAGE}@${digest}"
  done
done
gsutil rm -r gs://${BUCKET_NAME}
rm /tmp/${KS_BIN}.tar.gz
rm -rf ${HOME}/bin/${KS_BIN}
rm -rf ${HOME}/examples
Navigate to https://github.com/settings/tokens and remove the generated token.