File resource

FileDataWrapper

FileDataWrapper(
    path,
    name,
    usage=None,
    derived_from=None,
    inputs=None,
    type=None,
)

Bases: DataWrapper

Deprecated. Wrap columnar data and its metadata in a local file.

Wrappers are deprecated. Use Dataset and FileResource instead.

Vectice stores only metadata (data about your dataset), which you communicate through a DataWrapper. Vectice does not store the dataset itself.

This DataWrapper wraps data that you have stored in a local file.

from vectice import DatasetType, FileDataWrapper, connect

my_project = connect(...)  # (1)
my_phase = my_project.phase(...)  # (2)
my_iter = my_phase.iteration()  # (3)

my_iter.step_my_data = FileDataWrapper(
    path="my/file/path",
    name="My origin dataset name",
    type=DatasetType.ORIGIN,
)
  1. See connection.
  2. See phases.
  3. See iterations.

Note that these three concepts are distinct, even if easily conflated:

  • Where the data is stored
  • The format at rest (in storage)
  • The format when loaded in a running python program

Notably, the statistics collectors provided by Vectice operate only on the last of these (the in-memory format), and only when the data is loaded as a pandas dataframe.
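
The three concepts in the list above can be told apart in a few lines of code. The following is a minimal sketch using only the Python standard library; the file name and sample rows are hypothetical and nothing here calls Vectice. Note that the in-memory result is a list of dicts, not a pandas dataframe, so under the rule stated above the statistics collectors would not apply to it.

```python
import csv
import tempfile
from pathlib import Path

# 1. Where the data is stored: a file on the local disk
#    (a temporary directory here; the file name is hypothetical).
storage_dir = Path(tempfile.mkdtemp())
csv_path = storage_dir / "iris_sample.csv"

# 2. The format at rest: CSV text.
csv_path.write_text("sepal_length,species\n5.1,setosa\n7.0,versicolor\n")

# 3. The format when loaded in a running program: a list of dicts.
#    csv.DictReader keeps every value as a string.
with csv_path.open(newline="") as handle:
    rows = list(csv.DictReader(handle))

print(csv_path.suffix)                  # format at rest
print(type(rows).__name__, len(rows))   # in-memory representation
```

Loading the same file with pandas instead would change only the third concept; the storage location and the format at rest stay the same.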

Parameters:

Name          Type                       Description                                               Default
path          str                        The path of the file to wrap.                             required
name          str                        The name of the DataWrapper (local to Vectice).           required
usage         DatasetSourceUsage | None  The usage of the dataset.                                 None
derived_from  list[int] | None           The list of dataset ids to derive this new dataset from.  None
inputs        list[int] | None           Deprecated. Use derived_from instead.                     None
type          DatasetType | None         The type of the dataset.                                  None

Examples:

The following example shows how to wrap a CSV file called iris.csv in the current directory:

>>> from vectice import FileDataWrapper, DatasetSourceUsage, DatasetType
>>> iris_trainset = FileDataWrapper(
...     path="iris.csv",
...     name="training dataset",
...     type=DatasetType.MODELING,
...     usage=DatasetSourceUsage.TRAINING,
...     derived_from=[64, 128],
... )
2023/02/01 15:43:59 INFO vectice.models.datasource.datawrapper.file_data_wrapper: File: iris.csv wrapped successfully.
Source code in src/vectice/models/datasource/datawrapper/file_data_wrapper.py
@deprecate(
    parameter="inputs",
    warn_at="23.1",
    fail_at="23.2",
    remove_at="23.3",
    reason="The 'inputs' parameter is renamed 'derived_from'. "
    "Using 'inputs' will raise an error in v{fail_at}. "
    "The parameter will be removed in v{remove_at}.",
)
def __init__(
    self,
    path: str,
    name: str,
    usage: DatasetSourceUsage | None = None,
    derived_from: list[int] | None = None,
    inputs: list[int] | None = None,
    type: DatasetType | None = None,
):
    """Initialize a file data wrapper.

    Parameters:
        path: The path of the file to wrap.
        name: The name of the DataWrapper (local to Vectice).
        usage: The usage of the dataset.
        derived_from: The list of dataset ids to derive this new dataset from.
        inputs: Deprecated. Use `derived_from` instead.
        type: The type of the dataset.

    Examples:
        The following example shows how to wrap a CSV file
        called `iris.csv` in the current directory:

        >>> from vectice import FileDataWrapper, DatasetSourceUsage, DatasetType
        >>> iris_trainset = FileDataWrapper(
        ...     path="iris.csv",
        ...     name="training dataset",
        ...     type=DatasetType.MODELING,
        ...     usage=DatasetSourceUsage.TRAINING,
        ...     derived_from=[64, 128],
        ... )
        2023/02/01 15:43:59 INFO vectice.models.datasource.datawrapper.file_data_wrapper: File: iris.csv wrapped successfully.
    """
    if not derived_from and inputs:
        derived_from = inputs

    self.path = path
    super().__init__(name=name, type=type, usage=usage, derived_from=derived_from)
    _logger.info(f"File: {path} wrapped successfully.")
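
The fallback from `inputs` to `derived_from` at the end of `__init__` can be looked at in isolation. The sketch below copies that condition into a standalone function; the name `resolve_derived_from` is hypothetical and not part of Vectice. It assumes nothing beyond the two lines shown in the source above.

```python
from __future__ import annotations


def resolve_derived_from(
    derived_from: list[int] | None = None,
    inputs: list[int] | None = None,
) -> list[int] | None:
    """Mirror the fallback in __init__: use `inputs` only when `derived_from` is falsy."""
    if not derived_from and inputs:
        derived_from = inputs
    return derived_from


# `derived_from` wins whenever it is non-empty.
print(resolve_derived_from(derived_from=[64, 128], inputs=[1]))
# The deprecated `inputs` is used when `derived_from` is missing...
print(resolve_derived_from(inputs=[1, 2]))
# ...and also when it is an empty list, because the check is on truthiness.
print(resolve_derived_from(derived_from=[], inputs=[1, 2]))
```

Because the condition tests truthiness rather than `is None`, an explicitly empty `derived_from` list is treated the same as an omitted one.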