How to Run DVC on Windows
Different issues can arise when running DVC on Microsoft Windows, mainly involving system performance. Some, for example, have to do with NTFS file system characteristics and Windows built-in security mechanisms. Below are some workarounds that can help avoid these potential problems:
POSIX-like command line experience
The regular Command Prompt (cmd) in Windows will most likely not help you use DVC effectively, nor help you follow the examples in our docs. Here are some alternatives:
- WSL 2 with Windows Terminal supports the most CLI features (e.g. \ line continuation). It has good performance (and can even access GPUs).
- The full Cmder console emulator is another good option. It combines several useful tools like ConEmu terminal and Git for Windows (Git Bash), among other shells.
- Anaconda Prompt is also popular.
- Install an actual Linux distro (e.g. Ubuntu) on a virtual machine, or in an HD partition (dual boot).
Line endings
If you are using Windows and you are working with people or environments (e.g. production) that are not, you’ll probably run into line-ending issues at some point. This is because Windows uses both a carriage-return character and a linefeed character for newlines in its files (CRLF), whereas macOS and Linux systems use only the linefeed character (LF).
DVC uses content-based checksums for your pipeline dependencies. Depending on your Git configuration (see the core.autocrlf and core.eol config options), DVC might therefore see Git-tracked files as changed, triggering pipeline reproduction on dvc repro on one system and not on another. We strongly recommend sticking with LF line endings when doing cross-platform work.
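To see why line endings matter for content-based checksums, here is a minimal Python sketch. It is an illustration only, not DVC's internal code: the `content_hash` helper and the sample script bytes are hypothetical, and MD5 is used simply as an example of a content hash.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest of the raw bytes (MD5 chosen only for illustration)."""
    return hashlib.md5(data).hexdigest()


# The same two-line script, saved with LF and with CRLF line endings.
lf_version = b"import pandas\nprint('train')\n"
crlf_version = lf_version.replace(b"\n", b"\r\n")

# The bytes differ, so any content-based checksum differs too.
print(content_hash(lf_version) == content_hash(crlf_version))  # False

# Normalizing CRLF back to LF makes the hashes match again.
normalized = crlf_version.replace(b"\r\n", b"\n")
print(content_hash(normalized) == content_hash(lf_version))  # True
```

The text a person reads is identical in both versions; only the invisible newline bytes change, which is enough to make a dependency look modified.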
Configure your editor to use LF line endings
Many editors on Windows will use CRLF line endings by default or even replace existing LF with CRLF. It is recommended that you configure your editor to always stick to LF line endings.
For VS Code, add
{
  "files.eol": "\n"
}
to your global settings.json or to your project's .vscode/settings.json.
Set up LF Line Endings with .gitattributes
To improve DVC compatibility on Windows, use a .gitattributes file with the eol attribute to configure line endings.
Add the following line to your .gitattributes:
* text=auto eol=lf
This configuration tells Git to detect text files automatically and to use LF line endings for them regardless of the platform. Files that Git detects as binary are left untouched.
Configure Git for LF Line Endings
Set core.autocrlf to false and core.eol to lf:
$ git config --global core.autocrlf false
$ git config --global core.eol lf
Now Git will handle line endings consistently.
Use a pre-commit hook to check and fix line endings
Add this to your .pre-commit-config.yaml (under the repo entry for pre-commit/pre-commit-hooks, which provides this hook)
hooks:
  - id: mixed-line-ending
    args: [--fix=lf]
to make pre-commit check and automatically replace all line endings with LF.
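Outside of pre-commit, a quick way to see which files currently contain CRLF line endings is a small script like the one below. This is a hedged sketch: `has_crlf`, `find_crlf_files`, and the default file patterns are hypothetical names and choices made for illustration, and the script only reports files rather than modifying them.

```python
from pathlib import Path


def has_crlf(path: Path) -> bool:
    """Return True if the file contains at least one CRLF sequence."""
    return b"\r\n" in path.read_bytes()


def find_crlf_files(root: Path, patterns=("*.py", "*.yaml", "*.yml", "*.txt")):
    """Yield files under root that match the patterns and contain CRLF."""
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if path.is_file() and has_crlf(path):
                yield path
```

For example, `list(find_crlf_files(Path(".")))` returns the matching files below the current directory. Adjust the patterns to the text files your pipeline actually depends on.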
Enable symbolic links
Symlinks are one of the possible file link types that DVC can use for optimization purposes. They're available on Windows, but the Create symbolic links user privilege is needed. It's granted to the Administrators group by default, so running dvc in an admin terminal is a good option for occasional usage. For regular users, it can be granted using the Local policy settings.
This is done automatically by DVC's Windows installer, but you may want to do it manually after any other installation method (choco, conda, pip).
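To check whether the current terminal session is allowed to create symlinks, you could run a small probe like the following. This is a sketch under simple assumptions: `can_create_symlinks` is a hypothetical helper, and it only tests a temporary directory, so results may differ on other drives or file systems.

```python
import os
import tempfile


def can_create_symlinks() -> bool:
    """Try to create a symlink in a temporary directory and report success."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as f:
            f.write("data")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # On Windows this typically means the privilege is missing.
            return False
        return os.path.islink(link)


if __name__ == "__main__":
    print("symlinks available:", can_create_symlinks())
```

If this prints `False` in a regular terminal but `True` in an admin terminal, the user privilege described above is most likely the missing piece.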
Whitelist in Windows Security
Windows 10 includes the Windows Security antivirus. If you want to avoid antivirus scans on specific folders or files to improve performance, whitelist them in Windows Security as per this guide. For example, you can whitelist the DVC binary files to speed up DVC processes.
Enable long folder/file paths
DVC commands (e.g. dvc pull, dvc repro) may fail when the folder path is longer than 260 characters. This may happen with the error [Errno 2] No such file or directory. Starting in Windows 10 (version 1607), path length limitations have been removed from common file and directory functions. However, you must opt in to the new behavior. You can explicitly enable long paths either by editing Group Policy or by editing registry keys following this guide.
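To find out whether a project is close to the limit, the sketch below lists paths that reach a given length and, on Windows, reads the LongPathsEnabled registry value. The helper names `find_long_paths` and `long_paths_enabled` are hypothetical, and the check is approximate: it compares absolute path lengths against 260 and does not model every detail of how Windows counts path characters.

```python
import os
import sys

MAX_PATH = 260  # classic Windows limit, including the terminating null


def find_long_paths(root: str, limit: int = MAX_PATH):
    """Yield absolute paths under root whose length reaches the limit."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                yield full


def long_paths_enabled():
    """Return the LongPathsEnabled registry flag, or None if unavailable."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")
            return bool(value)
    except OSError:
        return None
```

Running `list(find_long_paths("."))` from the project root gives a rough list of candidates to shorten if enabling long paths is not an option.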
Fix or disable Search Indexing
Search Indexing can also slow down file I/O operations on Windows. Try fixing or disabling this feature if you don't need it.
Disable short-file name generation
With NTFS, you may want to disable 8dot3 short-file name generation as per this article. Doing so is important for performance when a single directory contains over 300K files.
Avoid directories with a large number of files
The performance of NTFS degrades while handling large volumes of files in a directory, as explained in this issue.
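A simple way to spot such directories is to count the files directly inside each one. The following is a minimal sketch: `crowded_directories` is a hypothetical helper, and the default threshold of 100,000 files is an arbitrary assumption for illustration, not a documented NTFS limit.

```python
import os


def crowded_directories(root: str, threshold: int = 100_000):
    """Return (directory, file_count) pairs with more than `threshold` files."""
    crowded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > threshold:
            crowded.append((dirpath, len(filenames)))
    # Largest directories first.
    return sorted(crowded, key=lambda item: item[1], reverse=True)
```

Directories reported here are candidates for splitting into subdirectories, for instance by a prefix of the file name.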
Enabling paging with less
By default, DVC tries to use Less as the pager for the output of dvc dag. However, Windows doesn't have the less command available. Fortunately, there is an easy way of installing it via Chocolatey. After installing Chocolatey, run:
$ choco install less
less can be installed in other ways, just make sure it's available in the command line environment where you run dvc. (This usually means adding the directory where less is installed to the PATH environment variable.)
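To confirm that less is visible from the environment where you run dvc, a short check such as this one can help. It is a sketch only: `find_pager` is a hypothetical helper built on Python's standard `shutil.which`, and it says nothing about how DVC itself locates the pager.

```python
import shutil


def find_pager(name: str = "less"):
    """Return the full path of the pager executable, or None if not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_pager()
    if location:
        print(f"less found at {location}")
    else:
        print("less is not on PATH in this environment")
```

Run it from the same terminal you use for dvc, since PATH can differ between shells.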