Installation + Configuration script for ML-Sandbox.

Summary

This is a guide for installing the KServe stack locally using WMF tools and images. The install steps diverge from the official KServe quick_install script in order to run on WMF infrastructure. All upstream changes to YAML configs were first published in the KServe chart’s README for the deployment-charts repository. In deployment-charts/custom_deploy.d/istio/ml-serve there is the config.yaml that we apply in production.

Minikube

We are running a small cluster using Minikube, which can be installed with the following command:

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

To match production, set the Kubernetes version to v1.16.15:

# if needed
minikube stop
minikube delete

# start minikube
minikube config set memory 24576
minikube config set cpus 4
minikube start --kubernetes-version=v1.16.15
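
Once minikube is up, it can be worth confirming that the node really runs the requested version. The following is a minimal sketch, not part of the original setup: the helper names (versions_match, report_k8s_version) and the EXPECTED_K8S_VERSION variable are made up for illustration.

```shell
# Version requested via `minikube start --kubernetes-version=...` above.
EXPECTED_K8S_VERSION="v1.16.15"

# versions_match EXPECTED ACTUAL: succeeds only on an exact string match.
versions_match() {
  [ "$1" = "$2" ]
}

# report_k8s_version ACTUAL: print a one-line verdict for the given version
# and return non-zero on a mismatch.
report_k8s_version() {
  if versions_match "$EXPECTED_K8S_VERSION" "$1"; then
    echo "OK: cluster runs $1"
  else
    echo "MISMATCH: expected $EXPECTED_K8S_VERSION, got $1"
    return 1
  fi
}

# Example (needs a running cluster):
#   actual=$(minikube kubectl -- get nodes \
#     -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')
#   report_k8s_version "$actual"
```

The example reads the kubelet version of the first node, which on a single-node minikube cluster should be the version passed to minikube start.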

If you see an issue related to something like HOST_LOCK_JUJU, you can do the following hack:

sudo chown root:root /tmp/juju-mk*
sudo sysctl fs.protected_regular=0

You will also need to install kubectl, or you can use the one provided by minikube with an alias:

alias kubectl="minikube kubectl --"

Helm

First, install Helm 3. It is available in the WMF APT repository (https://wikitech.wikimedia.org/wiki/APT_repository) for Debian Buster; see https://apt-browser.toolforge.org/buster-wikimedia/main/:

sudo apt install helm

Also ensure that it is helm3:

helm version
version.BuildInfo{Version:"v3.7.1", GitCommit:"1d11fcb5d3f3bf00dbe6fe31b8412839a96b3dc4", GitTreeState:"clean", GoVersion:"go1.16.9"}
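
If you want to script this check, a small sketch along these lines may help. The function name is_helm3 is hypothetical; it assumes the short version string printed by helm version --short, which for Helm 3 looks like v3.7.1+g1d11fcb.

```shell
# is_helm3 VERSION: succeeds when the version string starts with "v3.".
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Example:
#   is_helm3 "$(helm version --short)" || echo "helm 3 is required" >&2
```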

Now download the deployment-charts repo and use its templates to create “dev” charts.

Depending on your use case for this setup, it may be more practical to use the official upstream charts instead of the WMF custom ones. The latter contain some extra bits (like Calico NetworkPolicy instances) that may not be supported out of the box. The helm template approach outlined here also does not take into account any values.yaml files.

#######################################
# Create dev charts via helm template #
#######################################
git clone "https://gerrit.wikimedia.org/r/operations/deployment-charts"
cd deployment-charts
helm template "charts/knative-serving" > dev-knative-serving.yaml
helm template "charts/kserve" > dev-kserve.yaml

There will be a number of references to “RELEASE-NAME” in the new YAML files, so we need to replace them with a name like “dev”:

# replace all references to "RELEASE-NAME" with "dev"
sed -i 's/RELEASE-NAME/dev/g' dev-knative-serving.yaml
sed -i 's/RELEASE-NAME/dev/g' dev-kserve.yaml
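
The replacement can be wrapped in a helper that also verifies nothing was missed. This is only a sketch under the assumption that GNU sed is available (as in the commands above); the names rename_release and PLACEHOLDER are invented for this example.

```shell
# Placeholder written by `helm template` when no release name is given,
# as targeted by the sed commands above.
PLACEHOLDER="RELEASE-NAME"

# rename_release FILE NAME: replace the placeholder in FILE with NAME,
# then fail if any occurrence is left behind.
rename_release() {
  sed -i "s/${PLACEHOLDER}/$2/g" "$1" || return 1
  if grep -q "$PLACEHOLDER" "$1"; then
    echo "placeholder still present in $1" >&2
    return 1
  fi
}

# Example:
#   rename_release dev-knative-serving.yaml dev
#   rename_release dev-kserve.yaml dev
```

As an alternative, helm template accepts the release name as its first positional argument (helm template dev charts/kserve), which should avoid the placeholder altogether; this guide keeps the sed approach.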

Istio

Istio is installed using the istioctl package, which is available in the WMF APT repository (https://wikitech.wikimedia.org/wiki/APT_repository) for Debian Buster; see https://apt-browser.toolforge.org/buster-wikimedia/main/. We want to install Istio 1.9.5 (istioctl: 1.9.5-1).

For Wikimedia servers and Cloud VPS instances, the repositories are automatically configured via Puppet. You can install it as follows:

sudo apt install istioctl -y

Now we need to create the istio-system namespace:

######################
# Istio Installation #
######################
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: istio-system
  labels:
    istio-injection: disabled
EOF

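Next, create a file called istio-minimal-operator.yaml in the directory above deployment-charts (the istioctl command further down refers to it as ../istio-minimal-operator.yaml):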

apiVersion: install.istio.io/v1beta1
kind: IstioOperator
spec:
  values:
    global:
      proxy:
        autoInject: disabled
      useMCP: false
      # The third-party-jwt is not enabled on all k8s.
      # See: https://istio.io/docs/ops/best-practices/security/#configure-third-party-service-account-tokens
      jwtPolicy: first-party-jwt

  meshConfig:
    accessLogFile: /dev/stdout

  addonComponents:
    pilot:
      enabled: true

  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
      - name: cluster-local-gateway
        enabled: true
        label:
          istio: cluster-local-gateway
          app: cluster-local-gateway
        k8s:
          service:
            type: ClusterIP
            ports:
            - port: 15020
              targetPort: 15021
              name: status-port
            - port: 80
              name: http2
              targetPort: 8080
            - port: 443
              name: https
              targetPort: 8443

Next you can apply the manifest using istioctl:

/usr/bin/istioctl-1.9.5 manifest apply -f ../istio-minimal-operator.yaml -y
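
Before moving on, you may want to wait until the Istio pods are ready. The sketch below is an illustrative helper, not part of the original procedure: wait_for_pods and the KUBECTL override are made-up names, and the 300s default timeout is an arbitrary assumption. It relies on kubectl wait, and on unquoted word splitting of $KUBECTL as done by sh and bash.

```shell
# KUBECTL can be overridden, e.g. KUBECTL="minikube kubectl --", or
# KUBECTL=echo for a dry run that only prints the arguments.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_pods NAMESPACE [TIMEOUT]: block until every pod in NAMESPACE
# reports the Ready condition, or until TIMEOUT (default 300s) expires.
wait_for_pods() {
  $KUBECTL wait --for=condition=Ready pods --all -n "$1" --timeout="${2:-300s}"
}

# Example (needs a running cluster):
#   wait_for_pods istio-system
```

The same helper should work for the knative-serving and kserve namespaces created later in this guide.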

Knative

We are currently running Knative Serving v0.18.1.

First, let’s create a namespace for knative-serving:

########################
# Knative Installation #
########################
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
  labels:
    serving.knative.dev/release: "v0.18.1"
EOF

Now let’s install the Knative serving-crds.yaml. The CRDs are copied from upstream: https://github.com/knative/serving/releases/download/v0.18.1/serving-crds.yaml

We have them included in our deployment-charts repo: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/knative-serving-crds/templates/crds.yaml

You can install using the following command (in the deployment-charts repo):

kubectl apply -f charts/knative-serving-crds/templates/crds.yaml

We can now apply the Knative “dev” chart that we generated using helm:

kubectl apply -f dev-knative-serving.yaml

Next, we update the config-deployment ConfigMap so that Knative uses the WMF queue sidecar image and skips tag resolution for the listed registries:

# update config-deployment to skip tag resolving
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  queueSidecarImage: docker-registry.wikimedia.org/knative-serving-queue:0.18.1-4
  registriesSkippingTagResolving: "kind.local,ko.local,dev.local,docker-registry.wikimedia.org,index.docker.io"
EOF
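
To double-check that the ConfigMap was updated, you can read the value back and test whether a given registry is in the list. This is a hedged sketch; registry_listed is a hypothetical helper name.

```shell
# registry_listed REGISTRY LIST: succeeds when REGISTRY is one of the
# comma-separated entries in LIST (exact match, no substring matches).
registry_listed() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example (needs a running cluster):
#   list=$(kubectl get configmap config-deployment -n knative-serving \
#     -o jsonpath='{.data.registriesSkippingTagResolving}')
#   registry_listed docker-registry.wikimedia.org "$list" \
#     && echo "tag resolution is skipped for the WMF registry"
```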

Images

Webhook: https://docker-registry.wikimedia.org/knative-serving-webhook/tags/
Queue: https://docker-registry.wikimedia.org/knative-serving-queue/tags/
Controller: https://docker-registry.wikimedia.org/knative-serving-controller/tags/
Autoscaler: https://docker-registry.wikimedia.org/knative-serving-autoscaler/tags/
Activator: https://docker-registry.wikimedia.org/knative-serving-activator/tags/
Net-istio webhook: https://docker-registry.wikimedia.org/knative-net-istio-webhook/tags/
Net-istio controller: https://docker-registry.wikimedia.org/knative-net-istio-controller/tags/

KServe

Let’s create the namespace kserve:

#######################
# KServe Installation #
#######################
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  labels:
    control-plane: kserve-controller-manager
    controller-tools.k8s.io: "1.0"
    istio-injection: disabled
  name: kserve
EOF

Now we can install the “dev” chart we created with helm template:

kubectl apply -f dev-kserve.yaml

This should install everything we need to run KServe; however, we still need to deal with the TLS certificate for the webhook. We will use the self-signed-ca hack outlined in the kserve repo: https://github.com/kserve/kserve/blob/master/hack/self-signed-ca.sh

First, delete the existing secrets:

# delete existing certs
kubectl delete secret kserve-webhook-server-cert -n kserve
kubectl delete secret kserve-webhook-server-secret -n kserve

Now copy that script and execute it:

curl -L https://raw.githubusercontent.com/kserve/kserve/master/hack/self-signed-ca.sh -o self-signed-ca.sh
chmod +x self-signed-ca.sh
./self-signed-ca.sh

Verify that you now have a new webhook-server-cert:

kubectl get secrets -n kserve
NAME                         TYPE                                  DATA   AGE
default-token-ccsk4          kubernetes.io/service-account-token   3      5d1h
kserve-webhook-server-cert   Opaque                                2      30s
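
The same check can be scripted by scanning the first column of the listing. The sketch below uses an invented helper name (secret_listed) and assumes the default tabular output shown above.

```shell
# secret_listed NAME: read `kubectl get secrets` output on stdin and
# succeed when NAME appears in the first column.
secret_listed() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example (needs a running cluster):
#   kubectl get secrets -n kserve | secret_listed kserve-webhook-server-cert \
#     || echo "webhook certificate secret is missing" >&2
```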

Lastly, let’s set up a namespace to deploy our inference services to:

kubectl create namespace kserve-test

Images

KServe agent: https://docker-registry.wikimedia.org/kserve-agent/tags/

KServe controller: https://docker-registry.wikimedia.org/kserve-controller/tags/

KServe storage-initializer: https://docker-registry.wikimedia.org/kserve-storage-initializer/tags/

Minio

This is an optional step for using minio for model storage in your development cluster. In production, we use Thanos Swift to store our model binaries; however, we can use something more ad hoc for local dev.

This will mostly follow the document here: https://github.com/kserve/website/blob/main/docs/modelserving/kafka/kafka.md

First we create a file called minio.yaml, with the following contents:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: minio
  name: minio
  namespace: kserve-test
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: minio
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - args:
        - server
        - /data
        env:
        - name: MINIO_ACCESS_KEY
          value: minio
        - name: MINIO_SECRET_KEY
          value: minio123
        image: minio/minio:RELEASE.2020-10-18T21-54-12Z
        imagePullPolicy: IfNotPresent
        name: minio
        ports:
        - containerPort: 9000
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: minio
  name: minio-service
spec:
  ports:
    - port: 9000
      protocol: TCP
      targetPort: 9000
  selector:
    app: minio
  type: ClusterIP

Next, you can install the minio test instance to your cluster:

kubectl apply -f minio.yaml -n kserve-test

Now we need to install the Minio client (mc):

curl -L https://dl.min.io/client/mc/release/linux-amd64/mc -o mc
chmod +x mc
./mc --help

Now we need to port-forward our minio test app in a different terminal window:

# Run port forwarding command in a different terminal
kubectl port-forward $(kubectl get pod -n kserve-test --selector="app=minio" --output jsonpath='{.items[0].metadata.name}') 9000:9000 -n kserve-test
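
The port-forward can take a moment to become usable, so a scripted setup may want to poll before running mc. The following is a generic retry sketch, not something from the original guide: the retry function is made up for illustration, and the example URL assumes MinIO's liveness endpoint (/minio/health/live) is reachable through the forwarded port.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping DELAY
# seconds between tries; fail after ATTEMPTS unsuccessful runs.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (run while the port-forward is active):
#   retry 10 1 curl -fs http://127.0.0.1:9000/minio/health/live
```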

Now let's add our test instance and create a bucket for model storage:

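# on newer mc releases, where "mc config host add" is deprecated, use instead:
#   ./mc alias set myminio http://127.0.0.1:9000 minio minio123
./mc config host add myminio http://127.0.0.1:9000 minio minio123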
./mc mb myminio/wmf-ml-models

Now we need to create an S3 secret for minio and attach it to a service account. Save the following as s3-secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: storage-secret
  annotations:
     serving.kserve.io/s3-endpoint: minio-service.kserve-test:9000 # replace with your s3 endpoint
     serving.kserve.io/s3-usehttps: "0" # by default 1, for testing with minio you need to set to 0
     serving.kserve.io/s3-verifyssl: "0"
     serving.kserve.io/s3-region: us-east-1
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa
secrets:
- name: storage-secret
---

We can then apply it as follows:

kubectl apply -f s3-secret.yaml -n kserve-test

You should be able to upload a model binary file as follows:

./mc cp model.bin myminio/wmf-ml-models/

You can use the model_upload.sh script to handle model uploads to minio. First you need to create an s3cmd config file called ~/.s3cfg:

# Setup endpoint
host_base = 127.0.0.1:9000
host_bucket = 127.0.0.1:9000
bucket_location = us-east-1
use_https = False

# Setup access keys
access_key = minio
secret_key = minio123

# Enable S3 v4 signature APIs
signature_v2 = False
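
A mistyped endpoint in this file is easy to miss, so a quick sanity check may be useful. The sketch below is an assumption-laden illustration: cfg_value and check_s3cfg are hypothetical helpers, and the check assumes that for a local endpoint like this one host_base and host_bucket are meant to be identical.

```shell
# cfg_value KEY FILE: print the value assigned to KEY in an s3cmd-style
# "key = value" file, with surrounding whitespace trimmed.
cfg_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p" "$2" | head -n 1
}

# check_s3cfg FILE: fail unless host_base is set and equals host_bucket.
check_s3cfg() {
  base=$(cfg_value host_base "$1")
  bucket=$(cfg_value host_bucket "$1")
  if [ -z "$base" ] || [ "$base" != "$bucket" ]; then
    echo "host_base ('$base') and host_bucket ('$bucket') differ" >&2
    return 1
  fi
}

# Example:
#   check_s3cfg ~/.s3cfg && echo "endpoint settings look consistent"
```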

Now you can download the model_upload script and use it on the ml-sandbox:

curl -L https://gitlab.wikimedia.org/accraze/ml-utils/-/raw/main/model_upload.sh -o model_upload.sh
chmod +x model_upload.sh
./model_upload.sh model.bin articlequality enwiki wmf-ml-models ~/.s3cfg

Finally, when you create an inference service, you can point it at the new minio bucket (s3://wmf-ml-models). Just make sure to set the serviceAccountName “sa” on the predictor whose container has a storage URI.

Example Inference Service spec:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  predictor:
    serviceAccountName: sa
    containers:
      - name: kfserving-container
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-editquality:2021-07-28-204847-production
        env:
          # TODO: https://phabricator.wikimedia.org/T284091
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/"
          - name: INFERENCE_NAME
            value: "enwiki-goodfaith"
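
Once the spec is applied, the service can be queried through the Istio ingress by sending the service hostname in the Host header. The sketch below only illustrates that flow: host_from_url is a made-up helper, the file name enwiki-goodfaith.yaml, the INGRESS_HOST and INGRESS_PORT variables, and the request payload are all assumptions that depend on your setup and model.

```shell
# host_from_url URL: strip the scheme and any path, leaving host[:port].
host_from_url() {
  printf '%s\n' "$1" | sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' -e 's|/.*$||'
}

# Example (needs a running cluster; values below are placeholders):
#   kubectl apply -f enwiki-goodfaith.yaml -n kserve-test
#   url=$(kubectl get inferenceservice enwiki-goodfaith -n kserve-test \
#     -o jsonpath='{.status.url}')
#   curl -H "Host: $(host_from_url "$url")" \
#     "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/enwiki-goodfaith:predict" \
#     -d '{"rev_id": 12345}'   # payload shape is an assumption
```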

Notes

Delete cluster

Sometimes you might need to destroy the cluster and rebuild. Here is a helpful command:

minikube delete --purge --all
minikube start --kubernetes-version=v1.16.15  --cpus 4 --memory 8192 --driver=docker --force