dvc.api.read()
Returns the contents of a tracked file.
This is similar to the dvc get command in our CLI.
def read(path: str,
         repo: str = None,
         rev: str = None,
         remote: str = None,
         remote_config: dict = None,
         config: dict = None,
         mode: str = "r",
         encoding: str = None)
Usage
import dvc.api

modelpkl = dvc.api.read(
    'model.pkl',
    repo='https://github.com/iterative/example-get-started',
    mode='rb'
)
Description
This function wraps dvc.api.open() to provide a simple way to return the complete contents of a file tracked in a DVC project. The file can be tracked by DVC (as an output) or by Git.
The returned contents can be a string (str) or a bytes object. They are loaded directly into memory (without using any disk space).
The type returned depends on the mode used. For more details, please refer to Python's open() built-in, which is used under the hood.
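Because read() wraps an open-then-read pattern and follows the built-in open(), its return types can be illustrated without DVC at all. The sketch below is only an approximation under stated assumptions: read_all is a hypothetical helper (not part of dvc.api), and a local temporary file stands in for a tracked file. It shows that a text mode yields a str while a binary mode yields bytes, with the whole file held in memory.

```python
import os
import tempfile


def read_all(path, mode="r", encoding=None):
    """Hypothetical stand-in for dvc.api.read(): open the file and
    return its complete contents as a single in-memory object."""
    with open(path, mode=mode, encoding=encoding) as fd:
        return fd.read()


# A local temporary file plays the role of a tracked file (assumption).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello, DVC\n")
    path = tmp.name

text = read_all(path, mode="r", encoding="utf-8")  # text mode: decoded str
raw = read_all(path, mode="rb")                    # binary mode: undecoded bytes

os.remove(path)

print(type(text).__name__, type(raw).__name__)  # prints: str bytes
```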
Parameters
- path (required) - location and file name of the target to read, relative to the root of the project (repo).
- repo - specifies the location of the DVC project. It can be a URL or a file system path. Both HTTP and SSH protocols are supported for online Git repos (e.g. [user@]server:project.git). Default: The current project is used (the current working directory tree is walked up to find it).
- rev - Git commit (any revision such as a branch or tag name, commit hash, or experiment name). If repo is not a Git repo, this option is ignored. Default: None (the current working tree will be used).
- remote - name of the DVC remote to look for the target data. Default: The default remote of repo is used if a remote argument is not given. For local projects, the cache is tried before the default remote.
- remote_config - dictionary of options to pass to the DVC remote. This can be used, for example, to provide credentials to the remote.
- config - config dictionary to pass to the DVC project. This is merged with the existing project config and can be used, for example, to add an entirely new remote.
- mode - specifies the mode in which the file is opened. Defaults to "r" (read). Mirrors the namesake parameter in built-in open().
- encoding - codec used to decode the file contents to a string. This should only be used in text mode. Defaults to "utf-8". Mirrors the namesake parameter in built-in open().
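Since mode and encoding mirror the built-in open(), their interaction can be checked with open() directly. This is a plain-Python sketch, not a DVC call: the throwaway file and its contents are made up for illustration. It decodes the same bytes with two different codecs and confirms that a binary mode refuses an encoding argument, which is why encoding only makes sense in text mode.

```python
import os
import tempfile

# Write some non-ASCII text as UTF-8 bytes to a throwaway file (illustrative data).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write("café".encode("utf-8"))
    path = tmp.name

# Text mode: the codec turns the stored bytes into a str.
with open(path, mode="r", encoding="utf-8") as fd:
    as_utf8 = fd.read()    # -> "café"
with open(path, mode="r", encoding="latin-1") as fd:
    as_latin1 = fd.read()  # wrong codec still decodes, but to "cafÃ©"

# Binary mode: built-in open() rejects any encoding argument.
try:
    open(path, mode="rb", encoding="utf-8")
    binary_rejects_encoding = False
except ValueError:
    binary_rejects_encoding = True

os.remove(path)
```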
Exceptions
- dvc.exceptions.FileMissingError - file in path is missing from repo.
- dvc.exceptions.PathMissingError - path cannot be found in repo.
- dvc.exceptions.NoRemoteError - no remote is found.
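A common pattern is to fall back to a default when the target is missing while letting other failures surface. The sketch below is hedged: read_or_default and read_text are hypothetical helpers, and the reader function and the exception types to catch are passed in as arguments so the pattern can run here with built-in open() and FileNotFoundError. With DVC installed, you would presumably pass dvc.api.read along with dvc.exceptions.FileMissingError and dvc.exceptions.PathMissingError instead.

```python
import os
import tempfile


def read_or_default(reader, path, default, missing=(FileNotFoundError,), **kwargs):
    """Hypothetical helper: call reader(path, **kwargs) and return default
    if one of the `missing` exception types is raised. Any other
    exception propagates unchanged."""
    try:
        return reader(path, **kwargs)
    except missing:
        return default


def read_text(path, **kwargs):
    # Local stand-in for dvc.api.read(): returns the whole file at once.
    with open(path, **kwargs) as fd:
        return fd.read()


with tempfile.TemporaryDirectory() as workdir:
    present = os.path.join(workdir, "params.yaml")
    with open(present, "w", encoding="utf-8") as fd:
        fd.write("lr: 0.1\n")

    found = read_or_default(read_text, present, default="", encoding="utf-8")
    fallback = read_or_default(
        read_text, os.path.join(workdir, "missing.yaml"), default="", encoding="utf-8"
    )
```

In this design, NoRemoteError would deliberately be left out of the caught types, since a missing remote usually signals a configuration problem rather than an absent file, and silently returning a default would hide it.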
Example: Load data from a DVC repository
Any file tracked in a DVC project (and stored remotely) can be loaded directly in your Python code with this API. For example, let's say that you want to load and deserialize a binary model from a repo on GitHub:
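import pickle
import dvc.api

# Read model.pkl from the example repo as raw bytes ('rb' mode)
data = dvc.api.read(
    'model.pkl',
    repo='https://github.com/iterative/example-get-started',
    mode='rb'
)
# Deserialize the bytes back into a Python object
model = pickle.loads(data)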
We're using 'rb' mode here for compatibility with pickle.loads().
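pickle.loads() accepts only a bytes-like object, which is why the example reads in 'rb' mode. The self-contained sketch below reproduces the round trip locally under two assumptions: a made-up dictionary stands in for the model, and built-in open() stands in for dvc.api.read(). It also shows that handing pickle a str instead of bytes fails with a TypeError.

```python
import os
import pickle
import tempfile

# Stand-in "model" (assumption): any picklable object works for this illustration.
model_obj = {"weights": [0.5, -1.25], "bias": 0.1}

with tempfile.NamedTemporaryFile("wb", suffix=".pkl", delete=False) as tmp:
    pickle.dump(model_obj, tmp)
    path = tmp.name

# Reading in binary mode yields bytes, which pickle.loads() accepts.
with open(path, "rb") as fd:
    data = fd.read()
model = pickle.loads(data)

# The same data as a str is rejected: pickle requires a bytes-like object.
try:
    pickle.loads(data.decode("latin-1"))
    str_rejected = False
except TypeError:
    str_rejected = True

os.remove(path)
```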