User Guide

GTO lets you build an artifact registry or model registry out of your Git repository by creating annotated Git tags with a special format. To learn more about building a model registry, see the Get Started guide.

Finding the right artifact version

You may need to deliver a specific artifact version to a certain environment, most likely the latest version or the one currently assigned to a stage. Use gto show to find the Git reference (tag) you need.

Get the Git tag for the latest version:

$ gto show churn@latest --ref
churn@v3.1.1

Get the Git tag for the version in the prod stage:

$ gto show churn#prod --ref
churn@v3.0.0

GTO doesn't provide a way to deliver the artifacts, but you can use DVC or any other method to retrieve files from the repo. With DVC, you can use dvc get:

$ dvc get $REPO $ARTIFACT_PATH --rev $REVISION -o $OUTPUT_PATH
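
The two commands above can be glued together in a small script. The following is a minimal sketch, not part of GTO or DVC: the helper names (gto_ref, dvc_get_command, fetch_artifact) are hypothetical, and it assumes that both CLIs are on PATH and that gto show runs inside a clone of the registry repo.

```python
import subprocess
from typing import List


def gto_ref(selector: str) -> str:
    """Return the Git tag printed by `gto show <selector> --ref`.

    `selector` is something like "churn@latest" or "churn#prod".
    Assumes the current directory is a clone of the registry repo.
    """
    result = subprocess.run(
        ["gto", "show", selector, "--ref"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def dvc_get_command(
    repo: str, artifact_path: str, revision: str, output_path: str
) -> List[str]:
    """Build the `dvc get` argument list shown above (nothing is executed)."""
    return ["dvc", "get", repo, artifact_path, "--rev", revision, "-o", output_path]


def fetch_artifact(
    repo: str, selector: str, artifact_path: str, output_path: str
) -> str:
    """Resolve the selector to a Git tag, download the file, return the tag."""
    revision = gto_ref(selector)
    subprocess.run(
        dvc_get_command(repo, artifact_path, revision, output_path), check=True
    )
    return revision
```

For example, fetch_artifact(repo_url, "churn#prod", "models/churn.pkl", "churn.pkl") would download the file at whatever tag is currently assigned to prod; the artifact path here is only an illustration.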

Acting on new registrations and assignments

A popular way to act on Git tags pushed to your repo is to set up CI/CD. For an example, check out the workflow in the example-gto repo. The workflow uses the GTO GitHub Action, which fetches all Git tags (to correctly interpret the registry), determines which artifact version was registered or which stage was assigned, and collects annotation details such as path, type, and description so you can use them in later CI steps. Note that it finds these annotation details by reading the dvc.yaml file managed by DVC.

If you're working with GitLab or Bitbucket, feel free to create an issue asking for a similar action, or submit yours for us to add to the documentation.

Besides using CI/CD, another option is to configure webhooks that send HTTP requests to your server whenever Git tags are pushed to the remote.

Alternatively, your server can poll your Git provider through its REST API to check whether anything has changed. For an example, see the GitHub REST API.
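
For the webhook route, the receiving server mostly needs to pull the tag name out of the request body. Below is a minimal sketch that assumes a GitHub push event payload, in which ref holds the full reference (for example refs/tags/churn#prod#3) and deleted marks a removed reference. The function name pushed_tag is hypothetical, and other Git providers use different payloads.

```python
from typing import Optional

TAG_PREFIX = "refs/tags/"


def pushed_tag(payload: dict) -> Optional[str]:
    """Return the tag name from a push payload, or None if it is not a new tag.

    Assumes the GitHub push event shape: "ref" is the full Git reference
    and "deleted" is true when the reference was removed.
    """
    ref = payload.get("ref", "")
    if not ref.startswith(TAG_PREFIX):
        return None  # a branch push, not a tag
    if payload.get("deleted"):
        return None  # the tag was deleted rather than created
    return ref[len(TAG_PREFIX):]
```

The returned name can then be matched against the tag conventions described in "Git tags format" to decide whether to react.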

CI/CD workflow examples

This workflow will build a Docker image out of the model and push it to Docker Hub:

# .github/workflows/build.yaml
on:
  push:
    tags:
      - '*'

jobs:
  act:
    name: Build a Docker image for new model versions
    runs-on: ubuntu-latest
    steps:
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        # set credentials to log in to Docker Hub
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: actions/checkout@v3
      - id: gto
        uses: iterative/gto-action@v2
      - uses: actions/setup-python@v2
      - name: Install dependencies
        run: |
          pip install --upgrade pip setuptools wheel
          pip install -r requirements.txt
      - if: steps.gto.outputs.event == 'registration'
        run: |
          # build the Docker image and push it to Docker Hub
          ...

This workflow will deploy a model to Heroku upon stage assignment:

# .github/workflows/deploy.yaml
on:
  push:
    tags:
      - '*'

# set credentials to run deployment and save its state to s3
env:
  HEROKU_API_KEY: ${{ secrets.HEROKU_API_KEY }}
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

jobs:
  act:
    name: Deploy a model upon stage assignment
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - id: gto
        uses: iterative/gto-action@v2
      - uses: actions/setup-python@v2
      - name: Install dependencies
        run: |
          pip install --upgrade pip setuptools wheel
          pip install -r requirements.txt
      - if: steps.gto.outputs.event == 'assignment'
        run: |
          # deploy your model here
          ...

Listing non-GTO Git tags

Sometimes your repository contains both model management tags generated by GTO and other Git tags that were created manually. If you have a lot of models in your repository, it can be helpful to filter out the GTO tags and list only the manually created ones. You can do that with the following snippet, which combines standard Unix commands with git and gto (it relies on process substitution, so run it in bash or a compatible shell).

$ comm -23 <(git tag | sort) <(gto history --json | jq -r '.[].ref' | sort)
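
If bash or jq is not available, the same set difference can be sketched in Python. This is an illustrative helper, not part of GTO: the name non_gto_tags is hypothetical, and it assumes, as the jq filter above implies, that gto history --json prints a JSON array of objects with a ref key.

```python
import json
from typing import Iterable, List


def non_gto_tags(all_tags: Iterable[str], history_json: str) -> List[str]:
    """Return the tags in `all_tags` that GTO's history does not reference.

    `all_tags` is the output of `git tag` split into lines;
    `history_json` is the raw text printed by `gto history --json`.
    """
    gto_refs = {event["ref"] for event in json.loads(history_json)}
    return sorted(set(all_tags) - gto_refs)
```

The inputs can be collected with subprocess.run(["git", "tag"], ...) and subprocess.run(["gto", "history", "--json"], ...), passing the first output through str.splitlines().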

Configuring GTO

To configure GTO, create a .gto file in the root of your repo:

# .gto config file
stages: [dev, stage, prod] # list of allowed Stages

When allowed Stages are specified, GTO checks the commands you run and errors out if you provide a value that doesn't exist in the config. Note that GTO applies the config from the workspace, so if you want to apply the config from the main branch, you need to check it out first with git checkout main.

Alternatively, you can use environment variables (note the GTO_ prefix):

$ GTO_EMOJIS=false gto show

Git tags format

You can work with GTO without knowing these conventions, since gto commands take care of everything for you.

Each type of event has a standard Git tag format:

  • {artifact_name}@{version_number}#{e} for version registration.
  • {artifact_name}@{version_number}!#{e} for version deregistration.
  • {artifact_name}#{stage}#{e} for stage assignment.
  • {artifact_name}#{stage}!#{e} for stage unassignment.
  • {artifact_name}@deprecated#{e} for artifact deprecation.

All of them share two parts:

  1. {artifact_name} prefix part.
  2. #{e} counter at the end that can be omitted (in "simple" Git tag format).

Generally, the #{e} counter is used because Git doesn't allow two tags with the same name. If you wanted two Git tags that assign the dev stage to the model artifact without the counter (model#dev), you would have to delete the old Git tag first, and so you would lose the history of events that happened.

Whether the #{e} counter is included by default depends on the event type. The defaults omit #{e} where it is rarely necessary, e.g. for version registrations and artifact deprecations.
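
To make the conventions above concrete, here is a minimal sketch of a parser for the five tag formats. It is not GTO's own implementation: the names GtoEvent and parse_tag are hypothetical, and the regular expression assumes that artifact names, versions, and stages never contain @, # or !, and that the counter consists of digits only.

```python
import re
from typing import NamedTuple, Optional


class GtoEvent(NamedTuple):
    event: str               # registration, deregistration, assignment, ...
    name: str                # artifact name
    version: Optional[str]   # set for (de)registration
    stage: Optional[str]     # set for (un)assignment
    counter: Optional[int]   # the #{e} counter, None in the "simple" format


# Assumption: names, versions and stages contain none of "@", "#", "!".
TAG_RE = re.compile(
    r"^(?P<name>[^@#!]+)"
    r"(?:@(?P<version>[^@#!]+)|#(?P<stage>[^@#!]+))"
    r"(?P<negated>!)?"
    r"(?:#(?P<counter>\d+))?$"
)


def parse_tag(tag: str) -> Optional[GtoEvent]:
    """Classify a Git tag by the formats listed above; None if it matches none."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    name, version, stage = match["name"], match["version"], match["stage"]
    negated = match["negated"] is not None
    counter = int(match["counter"]) if match["counter"] else None
    if version == "deprecated":
        if negated:
            return None  # "@deprecated!" is not one of the listed formats
        return GtoEvent("deprecation", name, None, None, counter)
    if version is not None:
        event = "deregistration" if negated else "registration"
        return GtoEvent(event, name, version, None, counter)
    event = "unassignment" if negated else "assignment"
    return GtoEvent(event, name, None, stage, counter)
```

With this sketch, parse_tag("churn#prod#3") is classified as an assignment of the prod stage with counter 3, while a plain tag such as v1.0.0 yields None.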
