Metadata

metadata

Column

Column(name, data_type, stats=None)

Model a column of a dataset.

Parameters:

name (str, required): The name of the column.
data_type (str, required): The type of the data contained in the column.
stats (list[StatValue] | None, default None): Additional statistics about the column.

Source code in src/vectice/models/resource/metadata/column_metadata.py
def __init__(
    self,
    name: str,
    data_type: str,
    stats: list[StatValue] | None = None,
):
    """Initialize a column.

    Parameters:
        name: The name of the column.
        data_type: The type of the data contained in the column.
        stats: Additional statistics about the column.
    """
    self.name = name
    self.data_type = data_type
    self.stats = stats

DBColumn

DBColumn(
    name,
    data_type,
    is_unique=None,
    nullable=None,
    is_private_key=None,
    is_foreign_key=None,
    stats=None,
)

Bases: Column

Model a column of a dataset, like a database column.

Parameters:

name (str, required): The name of the column.
data_type (str, required): The type of the data contained in the column.
is_unique (bool | None, default None): If the column uniquely defines a record.
nullable (bool | None, default None): If the column can contain null values.
is_private_key (bool | None, default None): If the column uniquely defines a record, individually or with other columns (can be null).
is_foreign_key (bool | None, default None): If the column refers to another one, individually or with other columns (cannot be null).
stats (list[StatValue] | None, default None): Additional statistics about the column.

Source code in src/vectice/models/resource/metadata/column_metadata.py
def __init__(
    self,
    name: str,
    data_type: str,
    is_unique: bool | None = None,
    nullable: bool | None = None,
    is_private_key: bool | None = None,
    is_foreign_key: bool | None = None,
    stats: list[StatValue] | None = None,
):
    """Initialize a column.

    Parameters:
        name: The name of the column.
        data_type: The type of the data contained in the column.
        is_unique: If the column uniquely defines a record.
        nullable: If the column can contain null values.
        is_private_key: If the column uniquely defines a record,
            individually or with other columns (can be null).
        is_foreign_key: If the column refers to another one,
            individually or with other columns (cannot be null).
        stats: Additional statistics about the column.
    """
    super().__init__(name, data_type, stats)
    self.is_unique = is_unique
    self.nullable = nullable
    self.is_private_key = is_private_key
    self.is_foreign_key = is_foreign_key
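
The constraint flags can be illustrated with a short sketch. The classes below are local stand-ins that copy the constructors shown above so the snippet runs on its own; the table and column names are hypothetical. Every flag defaults to `None`, so a flag that is not passed stays unset rather than becoming `False`.

```python
from __future__ import annotations


class Column:
    """Local stand-in mirroring the Column constructor."""

    def __init__(self, name: str, data_type: str, stats: list | None = None):
        self.name = name
        self.data_type = data_type
        self.stats = stats


class DBColumn(Column):
    """Local stand-in mirroring the DBColumn constructor shown above."""

    def __init__(
        self,
        name: str,
        data_type: str,
        is_unique: bool | None = None,
        nullable: bool | None = None,
        is_private_key: bool | None = None,
        is_foreign_key: bool | None = None,
        stats: list | None = None,
    ):
        super().__init__(name, data_type, stats)
        self.is_unique = is_unique
        self.nullable = nullable
        self.is_private_key = is_private_key
        self.is_foreign_key = is_foreign_key


# Hypothetical columns of an "orders" table.
order_id = DBColumn("order_id", "INTEGER", is_unique=True, nullable=False, is_private_key=True)
customer_id = DBColumn("customer_id", "INTEGER", is_foreign_key=True)
table_columns = [order_id, customer_id]

# Columns flagged as keys, and columns whose nullability was never stated.
key_columns = [c.name for c in table_columns if c.is_private_key]
unset_nullability = [c.name for c in table_columns if c.nullable is None]
print(key_columns, unset_nullability)
```

Testing `c.nullable is None` separately from `c.nullable is False` keeps "not specified" distinct from "explicitly not nullable".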

DBMetadata

DBMetadata(dbs, size, usage=None, origin=None)

Bases: Metadata

Describes the metadata of a dataset that comes from a database.

Parameters:

dbs (list[MetadataDB], required): The list of databases.
size (int, required): The size of the metadata.
usage (DatasetSourceUsage | None, default None): The usage of the metadata.
origin (str | None, default None): The origin of the metadata.

Source code in src/vectice/models/resource/metadata/db_metadata.py
def __init__(
    self,
    dbs: list[MetadataDB],
    size: int,
    usage: DatasetSourceUsage | None = None,
    origin: str | None = None,
):
    """Initialize a DBMetadata instance.

    Parameters:
        dbs: The list of databases.
        size: The size of the metadata.
        usage: The usage of the metadata.
        origin: The origin of the metadata.
    """
    super().__init__(size=size, type=DatasetSourceType.DB, usage=usage, origin=origin)
    self.dbs = dbs

DatasetSourceOrigin

Bases: Enum

Enumeration that defines where datasets come from.

BIGQUERY class-attribute

BIGQUERY = 'BIGQUERY'

BigQuery storage.

GCS class-attribute

GCS = 'GCS'

Google Cloud Storage.

LOCAL class-attribute

LOCAL = 'LOCAL'

Local storage.

OTHER class-attribute

OTHER = 'OTHER'

Other storage.

REDSHIFT class-attribute

REDSHIFT = 'REDSHIFT'

Redshift storage.

S3 class-attribute

S3 = 'S3'

S3 storage.

SNOWFLAKE class-attribute

SNOWFLAKE = 'SNOWFLAKE'

Snowflake storage.
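
Since every member's value equals its name, standard `Enum` lookup by value works as sketched below. The enum is redeclared locally so the snippet is self-contained. `parse_origin` is a hypothetical helper, and falling back to `OTHER` for unrecognised strings is an assumption made for illustration, not documented library behaviour.

```python
from enum import Enum


class DatasetSourceOrigin(Enum):
    """Local copy of the members listed above."""

    BIGQUERY = "BIGQUERY"
    GCS = "GCS"
    LOCAL = "LOCAL"
    OTHER = "OTHER"
    REDSHIFT = "REDSHIFT"
    S3 = "S3"
    SNOWFLAKE = "SNOWFLAKE"


def parse_origin(raw: str) -> DatasetSourceOrigin:
    """Hypothetical helper: map free text to a member, defaulting to OTHER."""
    try:
        return DatasetSourceOrigin(raw.strip().upper())
    except ValueError:
        return DatasetSourceOrigin.OTHER


# Lookup by value returns the member itself.
origin = DatasetSourceOrigin("S3")

# .value yields the plain string form of a member.
origin_as_str = parse_origin("snowflake").value
print(origin, origin_as_str)
```

The `origin` parameter of the metadata classes on this page is typed `str | None`, so `.value` is one way to obtain a plain string from a member; whether the library expects exactly these strings is not stated here.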

DatasetSourceType

Bases: Enum

Enumeration that defines the type of datasets.

DB class-attribute

DB = 'DB'

DB source.

FILES class-attribute

FILES = 'FILES'

Files source.

DatasetSourceUsage

Bases: Enum

Enumeration that defines what datasets are used for.

TESTING class-attribute

TESTING = 'TESTING'

For testing datasets.

TRAINING class-attribute

TRAINING = 'TRAINING'

For training datasets.

VALIDATION class-attribute

VALIDATION = 'VALIDATION'

For validation datasets.

DatasetType

Bases: Enum

Enumeration that defines what shape datasets are in.

CLEAN class-attribute

CLEAN = 'CLEAN'

Clean shape.

MODELING class-attribute

MODELING = 'MODELING'

Modeling shape.

ORIGIN class-attribute

ORIGIN = 'ORIGIN'

Raw/origin shape.

VALIDATION class-attribute

VALIDATION = 'VALIDATION'

Validation shape.

File

File(
    name,
    size,
    fingerprint,
    created_date=None,
    updated_date=None,
    uri=None,
    columns=None,
    dataframe=None,
)

Describe a dataset file.

Parameters:

name (str, required): The name of the file.
size (int, required): The size of the file.
fingerprint (str, required): The hash of the file.
created_date (str | None, default None): The date of creation of the file.
updated_date (str | None, default None): The date of last update of the file.
uri (str | None, default None): The URI of the file.
columns (list[Column] | None, default None): The columns coming from the dataframe with the statistics.
dataframe (DataFrame | None, default None): A pandas dataframe which will capture the file's metadata.

Source code in src/vectice/models/resource/metadata/files_metadata.py
def __init__(
    self,
    name: str,
    size: int,
    fingerprint: str,
    created_date: str | None = None,
    updated_date: str | None = None,
    uri: str | None = None,
    columns: list[Column] | None = None,
    dataframe: DataFrame | None = None,
):
    """Initialize a file.

    Parameters:
        name: The name of the file.
        size: The size of the file.
        fingerprint: The hash of the file.
        created_date: The date of creation of the file.
        updated_date: The date of last update of the file.
        uri: The uri of the file.
        columns: The columns coming from the dataframe with the statistics.
        dataframe: A pandas dataframe which will capture the file's metadata.
    """
    self.name = name
    self.size = size
    self.fingerprint = fingerprint
    self.created_date = created_date
    self.updated_date = updated_date
    self.uri = uri
    self.columns = columns
    self._dataframe = dataframe
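
One possible way to fill the three required arguments for a local file is sketched below. The `File` class is a local stand-in copying the constructor above (type hints for `columns` and `dataframe` are dropped to avoid extra imports), and `describe_file` is a hypothetical helper. Using SHA-256 for `fingerprint` and bytes for `size` are assumptions: the reference only says "the hash of the file" and "the size of the file".

```python
from __future__ import annotations

import hashlib
import os
import tempfile


class File:
    """Local stand-in mirroring the File constructor shown above."""

    def __init__(
        self,
        name: str,
        size: int,
        fingerprint: str,
        created_date: str | None = None,
        updated_date: str | None = None,
        uri: str | None = None,
        columns=None,
        dataframe=None,
    ):
        self.name = name
        self.size = size
        self.fingerprint = fingerprint
        self.created_date = created_date
        self.updated_date = updated_date
        self.uri = uri
        self.columns = columns
        self._dataframe = dataframe


def describe_file(path: str) -> File:
    """Hypothetical helper: derive name, size and hash from a local path."""
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()  # assumed hash choice
    return File(
        name=os.path.basename(path),
        size=os.path.getsize(path),  # assumed to be a byte count
        fingerprint=digest,
        uri=path,
    )


# Write a tiny throwaway CSV so the example has something to describe.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "train.csv")
    with open(csv_path, "wb") as handle:
        handle.write(b"a,b\n1,2\n")
    described = describe_file(csv_path)

print(described.name, described.size, described.fingerprint[:8])
```

Reading the whole file into memory is fine for a small example; for large files the hash would normally be updated in chunks.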

FilesMetadata

FilesMetadata(files, size, usage=None, origin=None)

Bases: Metadata

The metadata of a set of files.

Parameters:

files (list[File], required): The list of files of the dataset.
size (int, required): The size of the set of files.
usage (DatasetSourceUsage | None, default None): The usage of the dataset.
origin (str | None, default None): Where the dataset files come from.

Source code in src/vectice/models/resource/metadata/files_metadata.py
def __init__(
    self,
    files: list[File],
    size: int,
    usage: DatasetSourceUsage | None = None,
    origin: str | None = None,
):
    """Initialize a FilesMetadata instance.

    Parameters:
        files: The list of files of the dataset.
        size: The size of the set of files.
        usage: The usage of the dataset.
        origin: Where the dataset files come from.
    """
    super().__init__(size=size, type=DatasetSourceType.FILES, origin=origin, usage=usage)
    self.files = files
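
A short sketch of assembling a `FilesMetadata` from several files. All classes are local stand-ins copying the constructors on this page, with `File` trimmed to its three required arguments. Computing `size` as the sum of the file sizes is an assumption about what "the size of the set of files" means, and the file names, sizes and fingerprints are made up.

```python
from __future__ import annotations

from enum import Enum


class DatasetSourceType(Enum):
    DB = "DB"
    FILES = "FILES"


class DatasetSourceUsage(Enum):
    TESTING = "TESTING"
    TRAINING = "TRAINING"
    VALIDATION = "VALIDATION"


class Metadata:
    """Local stand-in mirroring the Metadata constructor."""

    def __init__(self, size, type, usage=None, origin=None):
        self.size = size
        self.type = type
        self.origin = origin
        self.usage = usage


class File:
    """Trimmed stand-in: only the three required File arguments."""

    def __init__(self, name: str, size: int, fingerprint: str):
        self.name = name
        self.size = size
        self.fingerprint = fingerprint


class FilesMetadata(Metadata):
    """Local stand-in mirroring the FilesMetadata constructor shown above."""

    def __init__(self, files, size, usage=None, origin=None):
        super().__init__(size=size, type=DatasetSourceType.FILES, origin=origin, usage=usage)
        self.files = files


# Hypothetical files with made-up sizes and fingerprints.
files = [File("train.csv", 2048, "abc123"), File("labels.csv", 512, "def456")]

metadata = FilesMetadata(
    files=files,
    size=sum(f.size for f in files),  # assumption: total size is the sum
    usage=DatasetSourceUsage.TRAINING,
    origin="LOCAL",
)
print(metadata.type, metadata.size, metadata.usage)
```

The caller never passes `type`: as the constructor above shows, `FilesMetadata` fixes it to `DatasetSourceType.FILES`.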

Metadata

Metadata(size, type, usage=None, origin=None)

This class describes the metadata of a dataset.

Parameters:

size (int, required): The size of the file.
type (DatasetSourceType, required): The type of file.
usage (DatasetSourceUsage | None, default None): The usage made of the data.
origin (str | None, default None): The origin of the data.

Source code in src/vectice/models/resource/metadata/base.py
def __init__(
    self,
    size: int,
    type: DatasetSourceType,
    usage: DatasetSourceUsage | None = None,
    origin: str | None = None,
):
    """Initialize a metadata instance.

    Parameters:
        size: The size of the file.
        type: The type of file.
        usage: The usage made of the data.
        origin: The origin of the data.
    """
    self.size = size
    self.type = type
    self.origin = origin
    self.usage = usage

MetadataDB

MetadataDB(name, columns, rows_number, size=None)

Parameters:

name (str, required): The name of the table.
columns (list[DBColumn], required): The columns that compose the table.
rows_number (int, required): The number of rows of the table.
size (int | None, default None): The size of the table.

Source code in src/vectice/models/resource/metadata/db_metadata.py
def __init__(self, name: str, columns: list[DBColumn], rows_number: int, size: int | None = None):
    """Initialize a MetadataDB instance.

    Parameters:
        name: The name of the table.
        columns: The columns that compose the table.
        rows_number: The number of rows of the table.
        size: The size of the table.
    """
    self.name = name
    self.size = size
    self.rows_number = rows_number
    self.columns = columns
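
Going by its parameter descriptions, each `MetadataDB` describes one table, and `DBMetadata` collects a list of them. The sketch below shows how the two might be combined. All classes are local stand-ins copying the constructors on this page, with `DBColumn` trimmed to name and type. The table names and figures are made up, and summing the table sizes to get the overall `size` is an assumption.

```python
from __future__ import annotations

from enum import Enum


class DatasetSourceType(Enum):
    DB = "DB"
    FILES = "FILES"


class Metadata:
    """Local stand-in mirroring the Metadata constructor."""

    def __init__(self, size, type, usage=None, origin=None):
        self.size = size
        self.type = type
        self.origin = origin
        self.usage = usage


class DBColumn:
    """Trimmed stand-in: only name and data_type are kept."""

    def __init__(self, name: str, data_type: str):
        self.name = name
        self.data_type = data_type


class MetadataDB:
    """Local stand-in mirroring the MetadataDB constructor shown above."""

    def __init__(self, name: str, columns: list, rows_number: int, size: int | None = None):
        self.name = name
        self.size = size
        self.rows_number = rows_number
        self.columns = columns


class DBMetadata(Metadata):
    """Local stand-in mirroring the DBMetadata constructor."""

    def __init__(self, dbs, size, usage=None, origin=None):
        super().__init__(size=size, type=DatasetSourceType.DB, usage=usage, origin=origin)
        self.dbs = dbs


# Hypothetical tables with made-up row counts and sizes.
orders = MetadataDB(
    "orders",
    [DBColumn("order_id", "INTEGER"), DBColumn("total", "NUMERIC")],
    rows_number=1200,
    size=65536,
)
customers = MetadataDB("customers", [DBColumn("customer_id", "INTEGER")], rows_number=300, size=16384)
tables = [orders, customers]

# size is optional per table, so treat a missing value as 0 when summing.
total_size = sum(table.size or 0 for table in tables)
total_rows = sum(table.rows_number for table in tables)

metadata = DBMetadata(dbs=tables, size=total_size, origin="SNOWFLAKE")
print(metadata.type, metadata.size, total_rows)
```

Per-table `size` is optional while `DBMetadata.size` is required, which is why the sum above falls back to 0 for tables with no recorded size.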