User Guide
GTO lets you build an artifact registry or model registry out of your Git repository by creating annotated Git tags with a special format. To learn more about building a model registry, see the Get Started guide.
Finding the right artifact version
You may need to get a specific artifact version to a certain environment, most likely the latest one or the one currently assigned to a stage. Use gto show to find the Git reference (tag) you need.
Get the Git tag for the latest version:
$ gto show churn@latest --ref
churn@v3.1.1
Get the Git tag for the version in the prod stage:
$ gto show churn#prod --ref
churn@v3.0.0
GTO doesn't provide a way to deliver the artifacts, but you can use DVC or any other method to retrieve files from the repo. With DVC, you can use dvc get:
$ dvc get $REPO $ARTIFACT_PATH --rev $REVISION -o $OUTPUT_PATH
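The lookup and the download can be chained in a small shell function. This is a minimal sketch, assuming it runs inside a local clone of the registry repository with both gto and dvc installed; the function name fetch_artifact and its argument order are made up for illustration.

```shell
# Resolve a GTO reference (e.g. churn#prod or churn@latest) to a Git tag,
# then download the artifact file at that revision with DVC.
# Hypothetical arguments for this sketch:
#   $1 repo URL or path, $2 GTO reference, $3 artifact path, $4 output path
fetch_artifact() {
  fa_repo=$1
  fa_ref=$2
  fa_path=$3
  fa_out=$4

  # gto show ... --ref prints the Git tag, as in the examples above.
  fa_revision=$(gto show "$fa_ref" --ref) || return 1

  # Stop if nothing was resolved instead of downloading the default branch.
  [ -n "$fa_revision" ] || return 1

  dvc get "$fa_repo" "$fa_path" --rev "$fa_revision" -o "$fa_out"
}

# Usage (placeholders as in the dvc get example above):
#   fetch_artifact "$REPO" 'churn#prod' "$ARTIFACT_PATH" "$OUTPUT_PATH"
```

Failing early when no tag is resolved keeps the function from silently fetching an unintended revision.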
You can also use DVC with GTO to:
- Store large artifacts (models and data) and track pointers to them in your repo.
- Keep artifact metadata like the path or type (model or dataset). To see an example, check out the example-gto repo.
Acting on new registrations and assignments
A popular option to act on Git tags pushed to your repo is to set up CI/CD. To see an example, check out the workflow in the example-gto repo.
The workflow uses the GTO GitHub Action, which fetches all Git tags (to correctly interpret the Registry) and finds out the version of the artifact that was registered, the stage that was assigned, and annotation details such as path, type, description, etc., so you can use them in the next steps of the CI. Note that it finds these annotation details by reading dvc.yaml managed by DVC.
If you're working with GitLab or Bitbucket, feel free to create an issue asking for a similar action, or submit yours for us to add to the documentation.
Besides using CI/CD, another option is to configure webhooks that send HTTP requests to your server when Git tags are pushed to the remote.
Alternatively, you can configure your server to query your Git provider through its REST API to check whether anything changed. As an example, check out the GitHub REST API.
CI/CD workflow examples
This workflow will build a Docker image out of the model and push it to Docker Hub upon version registration:
# .github/workflows/build.yaml
on:
  push:
    tags:
      - '*'
jobs:
  act:
    name: Build a Docker image for new model versions
    runs-on: ubuntu-latest
    steps:
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        # set credentials to login to DockerHub
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: actions/checkout@v3
      - id: gto
        uses: iterative/gto-action@v2
      - uses: actions/setup-python@v2
      - name: Install dependencies
        run: |
          pip install --upgrade pip setuptools wheel
          pip install -r requirements.txt
      - if: steps.gto.outputs.event == 'registration'
        run: |
          # build docker image and push it to Dockerhub
          ...
This workflow will deploy a model to Heroku upon stage assignment:
# .github/workflows/deploy.yaml
on:
  push:
    tags:
      - '*'
# set credentials to run deployment and save its state to s3
env:
  HEROKU_API_KEY: ${{ secrets.HEROKU_API_KEY }}
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
jobs:
  act:
    name: Deploy a model upon stage assignment
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - id: gto
        uses: iterative/gto-action@v2
      - uses: actions/setup-python@v2
      - name: Install dependencies
        run: |
          pip install --upgrade pip setuptools wheel
          pip install -r requirements.txt
      - if: steps.gto.outputs.event == 'assignment'
        run: |
          # deploy your model here
          ...
Listing non-GTO Git tags
Sometimes your repository contains both model management tags generated by GTO and other Git tags that were created manually. If you have a lot of models in your repository, it can be helpful to filter out GTO tags and list only the manually created Git tags. You can do that with the following snippet, which uses standard Unix commands together with git, gto, and jq:
comm -23 <(git tag | sort) <(gto history --json | jq -r '.[].ref' | sort)
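The <( ... ) process substitution in this one-liner is a bash/zsh feature. For a plain POSIX shell, the same set difference can be sketched with temporary files. The helper name only_in_first is hypothetical and not part of GTO.

```shell
# Print the lines of the first list that do not appear in the second.
# Both arguments are newline-separated strings. comm requires sorted input,
# so each list is sorted into a temporary file first.
only_in_first() {
  oif_a=$(mktemp) || return 1
  oif_b=$(mktemp) || return 1
  printf '%s\n' "$1" | sort > "$oif_a"
  printf '%s\n' "$2" | sort > "$oif_b"
  # -2 hides lines unique to the second file, -3 hides lines common to both.
  comm -23 "$oif_a" "$oif_b"
  rm -f "$oif_a" "$oif_b"
}

# Usage, relying on the same gto and jq calls as the one-liner above:
#   only_in_first "$(git tag)" "$(gto history --json | jq -r '.[].ref')"
```

Passing the two lists as arguments keeps the comparison logic separate from the commands that produce them, so it can be tried on sample data without a repository.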
Configuring GTO
To configure GTO, use a .gto file in the root of your repo:
# .gto config file
stages: [dev, stage, prod] # list of allowed Stages
When allowed Stages are specified, GTO will check the commands you run and error out if you provide a value that doesn't exist in the config. Note that GTO applies the config from the workspace, so if you want to apply the config from the main branch, you need to check it out first with git checkout main.
Alternatively, you can use environment variables (note the GTO_ prefix):
$ GTO_EMOJIS=false gto show
Git tags format
You can work with GTO without knowing these conventions, since gto commands take care of everything for you.
All events have the standard formats of Git tags:
- {artifact_name}@{version_number}#{e} for version registration.
- {artifact_name}@{version_number}!#{e} for version deregistration.
- {artifact_name}#{stage}#{e} for stage assignment.
- {artifact_name}#{stage}!#{e} for stage unassignment.
- {artifact_name}@deprecated#{e} for artifact deprecation.
All of them share two parts:
- {artifact_name} prefix part.
- #{e} counter at the end that can be omitted (in "simple" Git tag format).
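To make the formats concrete, here is a rough sketch of a shell function that classifies a tag name by plain string matching. It is not GTO's own parser: the function name gto_tag_event is hypothetical, and the sketch assumes the #{e} counter consists of digits only and that artifact names contain neither @ nor #.

```shell
# Print the event type a Git tag name would represent under the formats
# listed above: registration, deregistration, assignment, unassignment,
# deprecation, or unknown.
gto_tag_event() {
  gte_tag=$1

  # Drop the optional counter: a trailing "#" followed by digits only.
  case $gte_tag in
    *'#'*)
      gte_suffix=${gte_tag##*"#"}
      case $gte_suffix in
        '' | *[!0-9]*) ;;                  # not a counter, keep the tag as is
        *) gte_tag=${gte_tag%"#"*} ;;
      esac
      ;;
  esac

  # Order matters: the more specific patterns come first.
  case $gte_tag in
    *@deprecated) echo deprecation ;;
    *@*'!')       echo deregistration ;;
    *@*)          echo registration ;;
    *'#'*'!')     echo unassignment ;;
    *'#'*)        echo assignment ;;
    *)            echo unknown ;;
  esac
}

# Usage:
#   gto_tag_event 'churn#prod#4'    # prints: assignment
#   gto_tag_event 'churn@v3.1.1'    # prints: registration
```

A stage whose name is purely numeric would be mistaken for a counter by this sketch, which is one reason to rely on gto commands for real parsing.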
Generally, the #{e} counter is used because Git doesn't allow creating two tags with the same name. If you want two Git tags that assign the dev stage to the model artifact without the counter (model#dev), you have to delete the old Git tag first. Consequently, you can't preserve the history of events that happened.
By default, #{e} is omitted for some events and kept for others. The defaults omit #{e} where it's rarely necessary, e.g. for version registrations and artifact deprecations.