
Datasets

Dataset

attachments property writable

attachments

The file attachment objects or paths associated with the dataset.

Returns:

Type Description
List[Union[Table, str]] | None

A list of attachments, where each attachment is one of:

  • Table A formatted table object containing structured data
  • str A string representation of an attachment (e.g., a file path)

The value is None if no attachments are present.
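Because the property can hold either kind of attachment, or be None, code that reads it usually branches on the item type. A minimal sketch, assuming a hypothetical stand-in `Table` class and a hypothetical helper `split_attachments` (neither is part of the documented API):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


# Hypothetical stand-in for the library's Table type, used only for illustration.
@dataclass
class Table:
    name: str


def split_attachments(
    attachments: Optional[List[Union[Table, str]]],
) -> Tuple[List[Table], List[str]]:
    """Separate table attachments from path attachments; None means no attachments."""
    tables: List[Table] = []
    paths: List[str] = []
    for item in attachments or []:
        if isinstance(item, Table):
            tables.append(item)
        else:
            # Any non-Table entry is treated as a string (e.g., a file path).
            paths.append(item)
    return tables, paths
```

Treating None like an empty list keeps callers from special-casing datasets that have no attachments.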

derived_from property

derived_from

The dataset versions from which this dataset is derived.

Returns:

Type Description
list[str]

The dataset versions from which this dataset is derived.

latest_version_id property writable

latest_version_id

The id of the latest version of this dataset.

Returns:

Type Description
str | None

The id of the latest version of this dataset.

name property writable

name

The dataset's name.

Returns:

Type Description
str

The dataset's name.

properties property writable

properties

The dataset's properties.

Returns:

Type Description
list[Property] | None

The dataset's properties.

resource property

resource

The dataset's resource.

Returns:

Type Description
Resource | tuple[Resource, Resource, Resource | None]

The dataset's resource.
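The return type depends on how the dataset was created: a single Resource, or a three-item tuple whose last item may be None. The sketch below flattens either form into a list. It assumes a hypothetical stand-in `Resource` class and a hypothetical helper `list_resources`, and it assumes the tuple is ordered (training, testing, validation), mirroring the parameter order of `Dataset.modeling`; that ordering is an assumption, not confirmed here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


# Hypothetical stand-in for a resource; the real Resource class is richer.
@dataclass
class Resource:
    path: str


# Assumed order: (training, testing, validation-or-None).
ModelingResources = Tuple[Resource, Resource, Optional[Resource]]


def list_resources(resource: Union[Resource, ModelingResources]) -> List[Resource]:
    """Flatten the resource property into a list, skipping a missing validation set."""
    if isinstance(resource, tuple):
        training, testing, validation = resource
        return [r for r in (training, testing, validation) if r is not None]
    # A single resource.
    return [resource]
```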

type property

type

The dataset's type.

Returns:

Type Description
DatasetType

The dataset's type.

clean staticmethod

clean(
    resource,
    name=None,
    derived_from=None,
    properties=None,
    attachments=None,
)

Create a clean dataset.

Examples:

from vectice import Dataset, FileResource

dataset = Dataset.clean(
    name="my clean dataset",
    resource=FileResource(paths="clean_dataset.csv"),
)

Parameters:

Name Type Description Default
resource Resource

The resource for the clean dataset.

required
name str | None

The name of the dataset.

None
derived_from list[TBaseDerivedFrom | Dataset] | TBaseDerivedFrom | Dataset | None

The dataset versions (or their IDs) from which this dataset is derived, given as a single item or a list.

None
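Since `derived_from` accepts a single item, a list, or None, it is convenient to normalize it to a list before use. A minimal sketch, assuming for simplicity that every entry is a plain version ID string (the documented type also allows `Dataset` objects) and using a hypothetical helper `normalize_derived_from`:

```python
from typing import List, Optional, Union

# Simplified, hypothetical alias: entries are represented here as version ID strings.
DerivedFrom = Optional[Union[str, List[str]]]


def normalize_derived_from(derived_from: DerivedFrom) -> List[str]:
    """Accept a single entry, a list of entries, or None, and always return a list."""
    if derived_from is None:
        return []
    if isinstance(derived_from, list):
        # Copy so callers cannot mutate the caller's original list through the result.
        return list(derived_from)
    return [derived_from]
```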
properties dict[str, str | int] | list[Property] | Property | None

The dataset's properties, for example a dict such as {"folds": 32}.

None
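The `properties` argument accepts three shapes. The sketch below shows one way to reduce all of them to a single list. It uses a hypothetical stand-in `Property` dataclass with `key` and `value` fields and a hypothetical helper `normalize_properties`; the real `Property` class may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


# Hypothetical stand-in for the library's Property type.
@dataclass
class Property:
    key: str
    value: Union[str, int]


PropertiesArg = Optional[Union[Dict[str, Union[str, int]], List[Property], Property]]


def normalize_properties(properties: PropertiesArg) -> List[Property]:
    """Convert any accepted form of the properties argument into a list of Property."""
    if properties is None:
        return []
    if isinstance(properties, Property):
        return [properties]
    if isinstance(properties, dict):
        # Each dict entry becomes one key/value property, preserving insertion order.
        return [Property(key, value) for key, value in properties.items()]
    return list(properties)
```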
attachments str | list[str] | None

The file paths that will be attached to the iteration along with the dataset.

None

modeling staticmethod

modeling(
    training_resource,
    testing_resource,
    validation_resource=None,
    name=None,
    properties=None,
    attachments=None,
    derived_from=None,
)

Create a modeling dataset.

Examples:

from vectice import Dataset, FileResource

dataset = Dataset.modeling(
    name="my modeling dataset",
    training_resource=FileResource(paths="training_dataset.csv"),
    testing_resource=FileResource(paths="testing_dataset.csv"),
    validation_resource=FileResource(paths="validation_dataset.csv"),
)

Parameters:

Name Type Description Default
training_resource Resource

The resource for the training set (for modeling datasets).

required
testing_resource Resource

The resource for the testing set (for modeling datasets).

required
validation_resource Resource | None

The resource for the validation set (optional, for modeling datasets).

None
name str | None

The name of the dataset.

None
properties dict[str, str | int] | list[Property] | Property | None

The dataset's properties, for example a dict such as {"folds": 32}.

None
attachments str | list[str] | None

The file paths that will be attached to the iteration along with the dataset.

None
derived_from list[TBaseDerivedFrom | Dataset] | TBaseDerivedFrom | Dataset | None

The dataset versions (or their IDs) from which this dataset is derived, given as a single item or a list.

None

origin staticmethod

origin(
    resource, name=None, properties=None, attachments=None
)

Create an origin dataset.

Examples:

from vectice import Dataset, FileResource

dataset = Dataset.origin(
    name="my origin dataset",
    resource=FileResource(paths="origin_dataset.csv"),
)

Parameters:

Name Type Description Default
resource Resource

The resource for the origin dataset.

required
name str | None

The name of the dataset.

None
properties dict[str, str | int] | list[Property] | Property | None

The dataset's properties, for example a dict such as {"folds": 32}.

None
attachments str | list[str] | None

The file paths that will be attached to the iteration along with the dataset.

None