Datasets

Dataset

attachments

attachments(attachments)

Attach a file or files to the dataset.

Parameters:

attachments (TAttachment, required)
    The filename or filenames to attach to the dataset.

clean staticmethod

clean(
    resource,
    name=None,
    derived_from=None,
    properties=None,
    attachments=None,
)

Create a clean dataset.

Examples:

from vectice import Dataset, FileResource

dataset = Dataset.clean(
    name="my clean dataset",
    resource=FileResource(paths="clean_dataset.csv"),
)

Parameters:

resource (Resource, required)
    The resource for the clean dataset.

name (str | None, default: None)
    The name of the dataset.

derived_from (list[TBaseDerivedFrom | Dataset] | TBaseDerivedFrom | Dataset | None, default: None)
    The dataset versions (or ids) from which this dataset is derived.

properties (dict[str, str | int] | list[Property] | Property | None, default: None)
    The dataset's properties, for example the dict {"folds": 32}.

attachments (str | list[str] | None, default: None)
    The file path or paths to attach to the iteration along with the dataset.
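
The attachments parameter accepts a single path, a list of paths, or None. As a minimal sketch of handling those three shapes in calling code, the helper below turns any of them into a plain list. The name normalize_attachments is hypothetical and not part of vectice; it only illustrates the documented type.

```python
from __future__ import annotations


def normalize_attachments(attachments: str | list[str] | None) -> list[str]:
    """Return attachments as a list of paths (hypothetical helper, not a vectice API)."""
    if attachments is None:
        # No attachments were given.
        return []
    if isinstance(attachments, str):
        # A single path becomes a one-element list.
        return [attachments]
    # Copy the list so the caller's list is not modified later.
    return list(attachments)
```

A helper like this can be useful when the same paths also need to be checked or logged before they are passed on as attachments.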

derived_from

derived_from()

The dataset versions from which this dataset is derived.

Returns:

list[str]
    The dataset versions from which this dataset is derived.

latest_version_id

latest_version_id(value)

Set the id of the latest version of this dataset.

Parameters:

value (str, required)
    The id of the latest version of this dataset.

modeling staticmethod

modeling(
    training_resource,
    testing_resource,
    validation_resource=None,
    name=None,
    properties=None,
    attachments=None,
    derived_from=None,
)

Create a modeling dataset.

Examples:

from vectice import Dataset, FileResource

dataset = Dataset.modeling(
    name="my modeling dataset",
    training_resource=FileResource(paths="training_dataset.csv"),
    testing_resource=FileResource(paths="testing_dataset.csv"),
    validation_resource=FileResource(paths="validation_dataset.csv"),
)

Parameters:

training_resource (Resource, required)
    The resource for the training set.

testing_resource (Resource, required)
    The resource for the testing set.

validation_resource (Resource | None, default: None)
    The resource for the validation set (optional).

name (str | None, default: None)
    The name of the dataset.

properties (dict[str, str | int] | list[Property] | Property | None, default: None)
    The dataset's properties, for example the dict {"folds": 32}.

attachments (str | list[str] | None, default: None)
    The file path or paths to attach to the iteration along with the dataset.

derived_from (list[TBaseDerivedFrom | Dataset] | TBaseDerivedFrom | Dataset | None, default: None)
    The dataset versions (or ids) from which this dataset is derived.
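
A modeling dataset needs separate training and testing resources, plus an optional validation resource. The sketch below shows one way to produce the three CSV files named in the example from a single source file, using only the standard library. The helper split_csv, its split ratios, and the seed are assumptions made for illustration; they are not part of vectice.

```python
import csv
import random
from pathlib import Path


def split_csv(source, out_dir, train=0.7, test=0.2, seed=0):
    """Split one CSV with a header row into three files (hypothetical helper)."""
    with open(source, newline="") as handle:
        header, *rows = list(csv.reader(handle))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_test = int(len(rows) * test)
    parts = {
        "training_dataset.csv": rows[:n_train],
        "testing_dataset.csv": rows[n_train:n_train + n_test],
        # Whatever remains becomes the validation set.
        "validation_dataset.csv": rows[n_train + n_test:],
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for filename, part in parts.items():
        path = out_dir / filename
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(part)
        paths[filename] = str(path)
    return paths
```

The returned paths could then be passed to FileResource(paths=...) for the training, testing, and validation resources, as in the example for this method.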

name

name(name)

Set the dataset's name.

Parameters:

name (str, required)
    The name of the dataset.

origin staticmethod

origin(
    resource, name=None, properties=None, attachments=None
)

Create an origin dataset.

Examples:

from vectice import Dataset, FileResource

dataset = Dataset.origin(
    name="my origin dataset",
    resource=FileResource(paths="origin_dataset.csv"),
)

Parameters:

resource (Resource, required)
    The resource for the origin dataset.

name (str | None, default: None)
    The name of the dataset.

properties (dict[str, str | int] | list[Property] | Property | None, default: None)
    The dataset's properties, for example the dict {"folds": 32}.

attachments (str | list[str] | None, default: None)
    The file path or paths to attach to the iteration along with the dataset.

properties

properties(properties)

Set the dataset's properties.

Parameters:

properties (dict[str, str | int] | list[Property] | Property | None, required)
    The properties of the dataset.
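
When properties are given as a dict, the documented type is dict[str, str | int]. As a rough sketch of checking a dict against that shape before setting it, the helper below raises on anything else. The name check_properties is hypothetical, and rejecting bool values is a choice made in this sketch, not documented vectice behavior.

```python
from __future__ import annotations


def check_properties(properties: dict) -> dict[str, str | int]:
    """Validate a properties dict (hypothetical helper, not a vectice API)."""
    for key, value in properties.items():
        if not isinstance(key, str):
            raise TypeError(f"property name {key!r} is not a str")
        # bool is a subclass of int in Python, so it is rejected explicitly here.
        if isinstance(value, bool) or not isinstance(value, (str, int)):
            raise TypeError(f"property {key!r} has unsupported value {value!r}")
    return properties
```

For example, {"folds": 32} passes this check, while a float value such as {"ratio": 0.2} raises TypeError and would need to be converted to a str or int first.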

resource

resource()

The dataset's resource.

Returns:

Resource | tuple[Resource, Resource, Resource | None]
    The dataset's resource.
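
Because the return type is either a single Resource or a three-element tuple, calling code has to handle both shapes. The sketch below labels each case. StubResource is a stand-in class used only to keep the example runnable, describe_resources is a hypothetical helper, and the tuple order (training, testing, validation) is an assumption based on the argument order of modeling().

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StubResource:
    """Stand-in for a vectice Resource, used only to make this sketch runnable."""

    path: str


def describe_resources(resource) -> dict[str, StubResource | None]:
    """Label the value returned by resource() (hypothetical helper)."""
    if isinstance(resource, tuple):
        # Assumed order: training, testing, validation (validation may be None).
        training, testing, validation = resource
        return {"training": training, "testing": testing, "validation": validation}
    # Any non-tuple value is treated as a single resource.
    return {"resource": resource}
```

Checking for a tuple first keeps the single-resource case as the fallback, so the helper does not need to know the concrete Resource subclass.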

type

type()

The dataset's type.

Returns:

DatasetType
    The dataset's type.