Machine Learning/LiftWing/ML-Sandbox/Usage-Examples

Examples of how to use the WMF ML-Sandbox environment to develop and test your inference service (isvc) code.

Example 1: Work With and Test The model.py File

When moving ORES models to LiftWing, model.py relies on revscoring to load the model, extract features, handle errors and return a prediction.
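To make that flow concrete, here is a minimal sketch of the four responsibilities (load, extract features, handle errors, predict). It is not the real model.py: `FakeScorer`, `SketchModel`, the `rev_id` request field and the toy features are all hypothetical stand-ins, chosen so the sketch runs without revscoring installed.

```python
# Illustrative stand-in only. Nothing here is revscoring or LiftWing API;
# every class, method and field name is hypothetical.


class FakeScorer:
    """Stand-in for a loaded model: scores a list of numeric features."""

    def score(self, feature_values):
        probability = min(1.0, sum(feature_values) / 10.0)
        return {"prediction": probability >= 0.5, "probability": probability}


class SketchModel:
    def __init__(self):
        self.model = None

    def load(self):
        # The real model.py would deserialize a model binary here.
        self.model = FakeScorer()

    def extract_features(self, request):
        rev_id = request.get("rev_id")
        if not isinstance(rev_id, int):
            raise ValueError("request must contain an integer 'rev_id'")
        # Toy features derived from the revision id, for illustration only.
        return [rev_id % 7, rev_id % 5]

    def predict(self, request):
        # Turn feature-extraction failures into an error payload
        # instead of letting the exception escape.
        try:
            features = self.extract_features(request)
        except ValueError as err:
            return {"error": str(err)}
        return self.model.score(features)


if __name__ == "__main__":
    sketch = SketchModel()
    sketch.load()
    print(sketch.predict({"rev_id": 12}))
    print(sketch.predict({}))
```

When editing the real file, it can help to keep these four stages separate in the same way, so that each one can be tested on its own inside the container.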

To edit and test the model.py file we need an environment where all revscoring dependencies have been installed.

For example, if we want to work with the editquality model.py, one of the easiest ways is to use the editquality docker image from the Wikimedia docker registry.

The steps below are what I would take to work with the editquality model.py:

1. Log into the ML sandbox

$ ssh ml-sandbox.machine-learning.eqiad1.wikimedia.cloud

2. Pull the editquality docker image

$ docker pull docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-editquality:stable

3. Run the docker container

$ docker run -it --entrypoint=/bin/bash docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-editquality:stable

4. Edit the model.py and save your new changes.

I usually use vim for this but feel free to use a text/code editor of your choice.

5. Test/Run model.py
```
# python3 model.py
```

Hacks I've found helpful:

I usually download the model binary and point to it locally within the load() method of model.py

Download model binary

# apt-get install wget -y && wget -O enwiki.damaging.gradient_boosting.model https://github.com/wikimedia/editquality/blob/master/models/enwiki.damaging.gradient_boosting.model?raw=true
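If wget is not available in the container, the same download can be sketched with the Python standard library. The URL is copied from the wget command above; the helper names `model_url` and `download_model` are hypothetical, and it is only an assumption that other model filenames follow the same URL pattern.

```python
import shutil
import urllib.request

# Base URL taken from the wget command above. That other model files
# live under the same path is an assumption.
BASE_URL = "https://github.com/wikimedia/editquality/blob/master/models"


def model_url(filename):
    """Build the raw-download URL for a model file name."""
    return f"{BASE_URL}/{filename}?raw=true"


def download_model(filename, dest=None):
    """Stream the model binary to disk and return the local path."""
    dest = dest or filename
    with urllib.request.urlopen(model_url(filename)) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    print(download_model("enwiki.damaging.gradient_boosting.model"))
```

Running the script inside the container should leave the file in the current directory, the same place the wget command writes it.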

Point to downloaded model in model.py

with open("enwiki.damaging.gradient_boosting.model") as f:
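To avoid hard-coding the local file name while experimenting, one option is to resolve the path before opening it. This is only a sketch: `resolve_model_path`, the `MODEL_PATH` environment variable and the `/mnt/models/model.bin` default are assumptions for illustration, not something model.py is known to define.

```python
import os

# Assumed default location of the model binary; adjust to whatever
# path your model.py actually opens.
DEFAULT_MODEL_PATH = "/mnt/models/model.bin"


def resolve_model_path(local_name="enwiki.damaging.gradient_boosting.model"):
    """Pick the model file to open, preferring explicit and local choices."""
    override = os.environ.get("MODEL_PATH")  # hypothetical variable name
    if override:
        return override
    if os.path.exists(local_name):
        # A binary downloaded into the working directory wins over the default.
        return local_name
    return DEFAULT_MODEL_PATH


if __name__ == "__main__":
    print(resolve_model_path())
```

Inside load(), the result would then be passed to open() in place of the literal file name, so the same file works with or without the downloaded binary.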

Our LiftWing images on the Wikimedia docker registry are built with Blubber. Blubber applies strict security restrictions: for example, you cannot easily download the model binary or install tools such as text/code editors from the internet. These restrictions are good for the production environment, but I find them a challenge in the dev environment, where I like to have full autonomy to tinker and test my ideas. To avoid them, in steps 2 and 3 above I pull a docker image that I created without the Blubber restrictions.
- Pull and run an editquality docker image that doesn't have Blubber restrictions
```
$ docker pull kevinbazira/kfserving-revscoring-model:v15

$ docker run -it --entrypoint=/bin/bash kevinbazira/kfserving-revscoring-model:v15
```
- Steps 1, 4, and 5 remain the same regardless of the image I am using.