.dvc Files

You can use dvc add to track data files or directories located in your current workspace. Additionally, dvc import, dvc import-url, and dvc import-db let you bring data from external locations (other DVC repositories, URLs, or databases) into your project and start tracking it locally. See Data Versioning for more info.

Files ending with the .dvc extension ("dot DVC file") are created by these commands as data placeholders that can be versioned with Git. They contain the information needed to track the target data over time. Here's an example:

outs:
  - md5: a304afb96060aad90176268345e10355
    path: data.xml
    desc: Cats and dogs dataset
    remote: myremote

These files use the YAML 1.2 file format and a human-friendly schema, described below. We encourage you to get familiar with it so that you can modify, write, or generate .dvc files on your own.
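As an illustration of generating a .dvc file by hand, here is a minimal sketch that renders the text for a single tracked file. It uses only the Python standard library. The helper names file_md5 and make_dvc_text are hypothetical and are not part of DVC. The sketch assumes that, when hash: md5 is recorded, the md5 value is the MD5 of the file's raw bytes, which we believe matches recent DVC versions; compare the result against what dvc add produces before relying on it. It also assumes YAML 1.2 accepts JSON-style double-quoted strings for the string values.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's raw bytes, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_dvc_text(path, desc=None, remote=None):
    """Render .dvc text tracking one file (hypothetical helper, not part of DVC).

    Assumes `path` is relative to the .dvc file's location (the default wdir)
    and that the current working directory is that location.
    """
    lines = [
        "outs:",
        f"  - md5: {file_md5(path)}",
        f"    size: {os.path.getsize(path)}",
        "    hash: md5",
        # json.dumps yields a double-quoted string; we rely on YAML 1.2 accepting it.
        f"    path: {json.dumps(path)}",
    ]
    if desc is not None:
        lines.append(f"    desc: {json.dumps(desc)}")
    if remote is not None:
        lines.append(f"    remote: {json.dumps(remote)}")
    return "\n".join(lines) + "\n"
```

For example, make_dvc_text("data.xml", desc="Cats and dogs dataset", remote="myremote") produces an entry shaped like the example at the top of this page, plus the optional size and hash fields. Hashing in chunks keeps memory use flat for large data files.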

See also How to Merge Conflicts.

Specification

These are the fields that are accepted at the root level of the .dvc file schema:

Field | Description
outs | (Required) List of output entries (details below) that represent the files or directories tracked with DVC. Typically there is only one, but several can be added or combined manually.
deps | List of dependency entries (details below). Only present when dvc import, dvc import-url, or dvc import-db is used to generate this .dvc file. Typically there is only one, but several can be added manually.
wdir | Working directory for the outs and deps paths, relative to the .dvc file's location. Defaults to . (the file's location).
md5 | (Only for imports) MD5 hash of the .dvc file itself.

Comments can be added using YAML's # comment syntax.
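The relationship between wdir and the path fields is easy to get wrong, so here is a small sketch of how an entry's path could be resolved. Following the root fields table, path is relative to wdir, and wdir is relative to the directory containing the .dvc file. The functions resolve_entry_path and entry_paths are hypothetical. They take an already-parsed .dvc file as a plain dict and assume POSIX-style separators.

```python
import posixpath


def resolve_entry_path(dvc_file, entry_path, wdir="."):
    """Resolve an outs/deps `path` to a project-relative POSIX path.

    `path` is relative to `wdir`, and `wdir` is relative to the directory
    that contains the .dvc file (defaulting to ".", the file's location).
    """
    base = posixpath.dirname(dvc_file)
    return posixpath.normpath(posixpath.join(base, wdir, entry_path))


def entry_paths(dvc_file, doc):
    """Resolve every outs and deps path in a parsed .dvc mapping (a plain dict)."""
    if not doc.get("outs"):
        # outs is the only required root-level field.
        raise ValueError(f"{dvc_file}: 'outs' is required")
    wdir = doc.get("wdir", ".")
    return {
        kind: [resolve_entry_path(dvc_file, e["path"], wdir) for e in doc.get(kind, [])]
        for kind in ("outs", "deps")
    }
```

For instance, a file sub/data.xml.dvc with wdir: .. and path: data.xml refers to data.xml at the project root, not sub/data.xml.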

Output entries

The following subfields may be present under outs entries:

Field | Description
path | (Required) Path to the file or directory, relative to wdir (which defaults to the .dvc file's location).
hash | Hash algorithm for the file or directory being tracked with DVC (only md5 is currently supported).
md5 / etag / checksum | Hash value for the file or directory being tracked with DVC. MD5 is used for most locations (local file system and SSH); ETag for HTTP, S3, or Azure external outputs; and a special checksum for HDFS and WebHDFS.
version_id | Version ID native to the cloud provider. Used to track each file in the cloud if cloud versioning is enabled.
size | Size of the file or directory (sum of all files).
nfiles | If this output is a directory, the number of files inside it (recursive).
isexec | Whether this is an executable file. DVC preserves execute permissions upon dvc checkout and dvc pull. This has no effect on directories, nor on Windows in general.
cache | Whether or not this file or directory is cached (true by default). See the --no-commit option of dvc add.
remote | Name of the remote to use for pushing/fetching.
persist | Whether the output file or directory should remain in place while dvc repro runs (false by default: outputs are deleted when dvc repro starts).
push | Whether or not this file or directory, when previously cached, is uploaded to remote storage by dvc push (true by default).
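Several output fields are optional and take documented defaults. The push flag also only matters for outputs that are cached. The sketch below fills in those defaults and reads the hash value from an output entry given as a plain dict. The helpers effective_out, hash_value, and uploaded_by_push are hypothetical, and only the defaults stated in the table are encoded.

```python
# Defaults stated in the outs table; absent keys take these values.
OUT_DEFAULTS = {"cache": True, "persist": False, "push": True}
# An entry carries one of these, depending on where the data lives.
HASH_VALUE_FIELDS = ("md5", "etag", "checksum")


def effective_out(entry):
    """Return an output entry with the documented defaults filled in."""
    if "path" not in entry:
        raise ValueError("output entry is missing the required 'path' field")
    return {**OUT_DEFAULTS, **entry}


def hash_value(entry):
    """Return (field, value) for the first hash-value field found, else None."""
    for field in HASH_VALUE_FIELDS:
        if field in entry:
            return field, entry[field]
    return None


def uploaded_by_push(entry):
    """push only applies to previously cached outputs, so both flags must hold."""
    out = effective_out(entry)
    return out["cache"] and out["push"]
```

Merging the entry over the defaults (rather than the reverse) means any value written explicitly in the .dvc file wins.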

Dependency entries

The following subfields may be present under deps entries:

Field | Description
path | (Required) Path to the dependency, relative to wdir (which defaults to the .dvc file's location).
hash | Hash algorithm for the file or directory being tracked with DVC (only md5 is currently supported).
md5 / etag / checksum | Only in external dependencies created with dvc import-url: hash value of the imported file or directory. MD5 is used for local paths and SSH; ETag for HTTP, S3, GCS, and Azure; and a special checksum for HDFS and WebHDFS.
size | Size of the file or directory (sum of all files).
nfiles | If this dependency is a directory, the number of files inside it (recursive).
repo | Only in external dependencies created with dvc import: it can contain url, rev, rev_lock, config, and remote (detailed below).
db | Only in db dependencies created with dvc import-db: it can contain connection, file_format, query, and table (detailed below).
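The deps table ties the hash-value field of an import-url dependency to the kind of location imported from. The sketch below encodes that mapping by URL scheme. The function expected_hash_field is hypothetical. The exact scheme spellings (gs, azure, hdfs, webhdfs) are assumptions about how such URLs are written, and plain local paths are assumed to be POSIX-style (a Windows drive letter would be misread as a scheme).

```python
from urllib.parse import urlparse

# Scheme -> hash-value field, per the deps table. Scheme spellings are assumptions.
SCHEME_TO_HASH_FIELD = {
    "": "md5",  # plain local path
    "file": "md5",
    "ssh": "md5",
    "http": "etag",
    "https": "etag",
    "s3": "etag",
    "gs": "etag",
    "azure": "etag",
    "hdfs": "checksum",
    "webhdfs": "checksum",
}


def expected_hash_field(url):
    """Return which of md5 / etag / checksum an import-url dependency should carry."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SCHEME_TO_HASH_FIELD:
        raise ValueError(f"no hash field known for scheme {scheme!r}")
    return SCHEME_TO_HASH_FIELD[scheme]
```

Raising on unknown schemes, rather than defaulting to md5, makes a missing mapping visible instead of silently producing the wrong field.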

Dependency repo subfields:

Field | Description
url | URL of the Git repository containing the source DVC project.
rev | Only when dvc import --rev is used: the specific commit hash, branch or tag name, etc. (a Git revision) that the dependency was imported from.
rev_lock | Git commit hash of the external DVC repository at the time the dependency was imported or last updated (with dvc update).
config | When dvc import --config is used: path to a config file that will be merged with the config in the target repository. When both --remote and --remote-config are used: config options that will be merged with the config in the target repository. See the examples section in dvc import.
remote | Only when dvc import --remote or --remote-config is used: the name of the DVC remote to set as the default, or remote config options to merge with the default remote's config in the target repository. See the examples section in dvc import.
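The distinction between rev and rev_lock works as follows. rev is only recorded when --rev is passed, while rev_lock always pins the exact commit that was imported. The sketch below builds a repo dependency entry and then moves its lock to a newer commit, roughly as dvc update does when no new --rev is given. The helpers make_repo_dep and relock, and the URL https://github.com/example/registry used when trying them out, are hypothetical.

```python
def make_repo_dep(path, url, rev_lock, rev=None):
    """Build a deps entry for a `dvc import` (hypothetical helper)."""
    repo = {"url": url}
    if rev is not None:
        # rev is recorded only when --rev was passed to dvc import.
        repo["rev"] = rev
    # rev_lock always pins the exact commit that was imported.
    repo["rev_lock"] = rev_lock
    return {"path": path, "repo": repo}


def relock(dep, new_rev_lock):
    """Return a copy whose rev_lock points at a new commit.

    url and rev are carried over unchanged; the original entry is not mutated.
    """
    return {**dep, "repo": {**dep["repo"], "rev_lock": new_rev_lock}}
```

Keeping both fields lets a tracked branch or tag (rev) float while every checkout still reproduces the exact commit (rev_lock) until the next update.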

Dependency db subfields:

Field | Description
connection | Name of the connection to use. The connection has to be set in the config before use. See Database Connections.
query | SQL query to snapshot. It is only set if the --sql option was used with dvc import-db. dvc update uses this field to re-import.
table | Name of the database table to snapshot. It is only set if the --table option was used with dvc import-db. dvc update uses this field to re-import.
file_format | Export format to use. Currently, it can be set to either csv or json.
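Because query and table each come from a different dvc import-db option, a db entry normally carries exactly one of them, together with a connection and a file_format. The sketch below builds the db mapping and checks those constraints. The helper make_db_dep is hypothetical. The "exactly one of query or table" rule is our reading of the table, not a documented guarantee.

```python
FILE_FORMATS = {"csv", "json"}


def make_db_dep(connection, file_format, query=None, table=None):
    """Build the `db` mapping of an import-db dependency (hypothetical helper)."""
    # Assumption: a snapshot comes from either --sql (query) or --table, not both.
    if (query is None) == (table is None):
        raise ValueError("set exactly one of query or table")
    if file_format not in FILE_FORMATS:
        raise ValueError(f"file_format must be one of {sorted(FILE_FORMATS)}")
    db = {"connection": connection, "file_format": file_format}
    if query is not None:
        db["query"] = query
    else:
        db["table"] = table
    return {"db": db}
```

Validating up front mirrors how dvc update relies on whichever of query or table is present to re-import the snapshot.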