Pipelines
If you find yourself repeating a sequence of actions to get or update the results of your project, then you may already have a pipeline. For example, a data science workflow could involve:
- Gathering data for training and validation
- Extracting useful features from the training dataset
- (Re)training an ML model
- Evaluating the results against the validation set
DVC helps you define these stages in a standard YAML format (.dvc and dvc.yaml files), making your pipeline more manageable and consistent to reproduce.
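As a rough illustration, the four example steps above might be written as stages in a dvc.yaml file like the sketch below. The stage names, script paths, and data paths (src/prepare.py, data/raw, model.pkl, and so on) are hypothetical placeholders rather than part of any real project; only the stages, cmd, deps, outs, and metrics keys come from the dvc.yaml format itself.

```yaml
# Minimal sketch of a dvc.yaml pipeline.
# All script names and paths are assumptions made for illustration.
stages:
  prepare:
    # Gather data for training and validation
    cmd: python src/prepare.py data/raw data/prepared
    deps:
      - src/prepare.py
      - data/raw
    outs:
      - data/prepared
  featurize:
    # Extract useful features from the training dataset
    cmd: python src/featurize.py data/prepared data/features
    deps:
      - src/featurize.py
      - data/prepared
    outs:
      - data/features
  train:
    # (Re)train the ML model
    cmd: python src/train.py data/features model.pkl
    deps:
      - src/train.py
      - data/features
    outs:
      - model.pkl
  evaluate:
    # Evaluate the results against the validation set
    cmd: python src/evaluate.py model.pkl data/features metrics.json
    deps:
      - src/evaluate.py
      - model.pkl
      - data/features
    metrics:
      - metrics.json:
          cache: false
```

With a file along these lines in place, running dvc repro would execute the stages in dependency order and skip any stage whose dependencies have not changed since the last run.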
See Get Started: Data Pipelines for a hands-on introduction to this topic.