data status
Show changes to the files and directories tracked by DVC in the workspace.
For the status of data pipelines, see dvc status.
Synopsis
usage: dvc data status [-h] [-q | -v]
                       [--granular] [--unchanged]
                       [--untracked-files [{no,all}]]
                       [--json]
                       [--not-in-remote] [--no-remote-refresh]
Description
The data status command displays the state of the workspace and the changes with respect to the last Git commit (HEAD). It shows which changes have been committed to DVC, which have not, which files are tracked by neither DVC nor Git, and which files are missing from the cache.
data status can be used as a companion to git status. When used together, this pair of commands shows the status of all paths in a repository.
The dvc data status command only outputs information; it won't modify or change anything in your workspace. It's good practice to check the state of your repository before running dvc commit or git commit so that you don't accidentally commit something you didn't mean to.
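One way to make this check routine is a small shell helper that prints both reports back to back before you commit. The sketch below is illustrative only: the function name repo_status is hypothetical, and it relies on nothing beyond the dvc data status and git status commands described on this page.

```shell
# Hypothetical helper (not part of DVC): show DVC data status and
# Git status together so the whole repository state is visible at once.
# Usage: repo_status            -> default reports
#        repo_status --granular -> extra flags are passed to dvc data status
repo_status() {
    echo "== dvc data status =="
    # Stop early if DVC reports an error (for example, not a DVC repository).
    dvc data status "$@" || return 1
    echo "== git status =="
    git status
}
```

Neither command modifies the workspace, so running the helper before every commit should be safe.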
Example output might look like this:
$ dvc data status
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/features/
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
dvc data status displays changes in multiple categories:
- Not in cache indicates that there are file hashes in .dvc or dvc.lock files, but the corresponding cache files are missing. This may happen after cloning a DVC repository but before using dvc pull (or dvc fetch) to download the data, or after using dvc gc.
- Not in remote indicates that there are file hashes in .dvc or dvc.lock files, but the corresponding remote files are missing. This may happen after adding new files to DVC but before using dvc push to upload the data, or after using dvc gc -c.
- DVC committed changes are new, modified, or deleted tracked files or directories that have been committed to DVC. These may be ready for committing to Git.
- DVC uncommitted changes are new, modified, or deleted tracked files or directories that have not been committed to DVC yet. You can dvc add or dvc commit these.
- Untracked files have not been added to DVC (nor Git). Only shown if the --untracked-files flag is used.
- Unchanged files have no modifications. Only shown if the --unchanged flag is used.
Individual changes to files inside tracked directories are not shown by default, but this can be enabled with the --granular flag.
Options
- --granular - show granular file-level changes inside DVC-tracked directories. Note that some granular changes may be reported as unknown because DVC tracks directory-level hash values.
- --untracked-files - show files that are not being tracked by DVC and Git.
- --unchanged - show unchanged DVC-tracked files.
- --not-in-remote - show files that are missing from the remote.
- --no-remote-refresh - use cached remote index (don't check remote). Only has an effect along with --not-in-remote.
- --json - print the command's output in easily parsable JSON format instead of human-readable output.
- -h, --help - print the usage/help message and exit.
- -q, --quiet - do not write anything to standard output.
- -v, --verbose - display detailed tracing information.
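The options can be combined. As a sketch, the first hypothetical helper below asks for the most detailed human-readable report, and the second saves the --json output to a file for later inspection. The function names and the default file name status.json are assumptions made for illustration. The structure of the JSON is not described on this page, so the sketch does not depend on it.

```shell
# Hypothetical helpers (not part of DVC).

# Most detailed human-readable report: file-level changes inside tracked
# directories, unchanged files, and files tracked by neither DVC nor Git.
# "all" is one of the two values listed for --untracked-files in the synopsis.
full_data_status() {
    dvc data status --granular --unchanged --untracked-files=all "$@"
}

# Save the machine-readable report to a file.
# The default file name is an arbitrary choice.
save_data_status() {
    out_file="${1:-status.json}"
    dvc data status --json > "$out_file"
}
```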
Examples
$ dvc data status
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/features/
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
This shows that data/data.xml is missing from the cache; data/features/, a directory, has changes that are tracked by DVC but not yet committed to Git; and the file model.pkl has been deleted from the workspace.
The data/features/ directory is modified, but there are no further details about what changed inside. The --granular option can provide more information on that.
Example: Full repository status (including Git)
$ dvc data status
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/features/
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: dvc.lock
no changes added to commit (use "git add" and/or "git commit -a")
dvc data status and git status combined show you the full status of a repository. dvc data status shows you changes to DVC data, and git status shows changes to the corresponding dvc.lock or .dvc files (as well as unrelated changes to your Git repository).
Example: Granular output
Following on from the above example, using --granular will show file-level information for the changes:
$ dvc data status --granular
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
added: data/features/foo
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
Now there's more information in DVC committed changes regarding the changes in data/features. The output shows that a new file was added to data/features: data/features/foo.
Example: Remote status
$ dvc data status --not-in-remote
Not in cache:
(use "dvc fetch <file>..." to download files)
data/data.xml
Not in remote:
(use "dvc push <file>..." to upload files)
data/data.xml
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/features/
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
(use "dvc checkout <file>..." to discard changes)
deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
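According to the option descriptions, --no-remote-refresh makes this check use the cached remote index rather than checking the remote. As a sketch, the hypothetical helper below switches between the two modes. The function name and its "cached" argument are assumptions for illustration only.

```shell
# Hypothetical helper (not part of DVC).
# Usage: remote_status          -> check the remote for missing files
#        remote_status cached   -> reuse the cached remote index instead
remote_status() {
    if [ "${1:-}" = "cached" ]; then
        # --no-remote-refresh only has an effect together with --not-in-remote.
        dvc data status --not-in-remote --no-remote-refresh
    else
        dvc data status --not-in-remote
    fi
}
```

Files listed under Not in remote can then be uploaded with dvc push, as the hint in the output suggests.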