Metadata

metadata

Column

Column(name, data_type, stats=None, category_type=None)

Model a column of a dataset.

Parameters:

Name Type Description Default
name str

The name of the column.

required
data_type str

The type of the data contained in the column.

required
stats BooleanStat | TextStat | NumericalStat | DateStat | None

Additional statistics about the column.

None
category_type ColumnCategoryType | None

Column category type.

None

DBColumn

DBColumn(
    name,
    data_type,
    is_unique=None,
    nullable=None,
    is_private_key=None,
    is_foreign_key=None,
    stats=None,
)

Bases: Column

Model a column of a dataset, like a database column.

Parameters:

Name Type Description Default
name str

The name of the column.

required
data_type str

The type of the data contained in the column.

required
is_unique bool | None

If the column uniquely defines a record.

None
nullable bool | None

If the column can contain null values.

None
is_private_key bool | None

If the column uniquely defines a record, individually or with other columns (can be null).

None
is_foreign_key bool | None

If the column refers to another one, individually or with other columns (cannot be null).

None
stats BooleanStat | TextStat | NumericalStat | DateStat | None

Additional statistics about the column.

None
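
To make the two signatures above concrete, the following sketch mirrors them with plain stand-in classes. These are not the library's own classes, only an illustration of how the documented parameters and the `Column` base relate. `stats` and `category_type` are kept as opaque values because their types are not described in this section, and the `orders` columns are hypothetical.

```python
from typing import Any, Optional


class Column:
    """Stand-in mirroring the documented Column signature (not the real class)."""

    def __init__(
        self,
        name: str,
        data_type: str,
        stats: Any = None,
        category_type: Any = None,
    ):
        self.name = name
        self.data_type = data_type
        self.stats = stats
        self.category_type = category_type


class DBColumn(Column):
    """Stand-in mirroring the documented DBColumn signature (not the real class)."""

    def __init__(
        self,
        name: str,
        data_type: str,
        is_unique: Optional[bool] = None,
        nullable: Optional[bool] = None,
        is_private_key: Optional[bool] = None,
        is_foreign_key: Optional[bool] = None,
        stats: Any = None,
    ):
        # DBColumn's documented signature has no category_type argument,
        # so the base class default (None) is used.
        super().__init__(name, data_type, stats=stats)
        self.is_unique = is_unique
        self.nullable = nullable
        self.is_private_key = is_private_key
        self.is_foreign_key = is_foreign_key


# Hypothetical columns of an "orders" table.
order_id = DBColumn("order_id", "int", is_unique=True, nullable=False, is_private_key=True)
customer_id = DBColumn("customer_id", "int", is_foreign_key=True)
```

Flags left unset stay `None`, which matches the documented defaults and distinguishes "unknown" from an explicit `False`.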

DBMetadata

DBMetadata(dbs, size, usage=None, origin=None)

Bases: Metadata

Describes the metadata of a dataset that comes from a database.

Parameters:

Name Type Description Default
dbs list[MetadataDB]

The list of databases.

required
size int | None

The size of the metadata.

required
usage DatasetSourceUsage | None

The usage of the metadata.

None
origin str | None

The origin of the metadata.

None

DatasetSourceOrigin

Bases: Enum

Enumeration that defines where datasets come from.

BIGQUERY class-attribute instance-attribute

BIGQUERY = 'BIGQUERY'

BigQuery storage.

GCS class-attribute instance-attribute

GCS = 'GCS'

Google Cloud Storage.

LOCAL class-attribute instance-attribute

LOCAL = 'LOCAL'

Local storage.

OTHER class-attribute instance-attribute

OTHER = 'OTHER'

Other storage.

REDSHIFT class-attribute instance-attribute

REDSHIFT = 'REDSHIFT'

Redshift storage.

S3 class-attribute instance-attribute

S3 = 'S3'

S3 storage.

SNOWFLAKE class-attribute instance-attribute

SNOWFLAKE = 'SNOWFLAKE'

Snowflake storage.
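
The `origin` parameters of the metadata classes on this page are typed `str | None`, so one option is to pass a member's string value. The sketch below uses a stand-in copy of the enumeration (not imported from the library) and a hypothetical helper, `normalize_origin`, that maps free-form text onto a documented value. Whether the library also accepts the enum member itself is not stated in this section.

```python
from enum import Enum
from typing import Optional


class DatasetSourceOrigin(Enum):
    """Stand-in copy of the documented members (not imported from the library)."""

    BIGQUERY = "BIGQUERY"
    GCS = "GCS"
    LOCAL = "LOCAL"
    OTHER = "OTHER"
    REDSHIFT = "REDSHIFT"
    S3 = "S3"
    SNOWFLAKE = "SNOWFLAKE"


def normalize_origin(raw: Optional[str]) -> Optional[str]:
    """Hypothetical helper: map free-form text to a documented member value."""
    if raw is None:
        return None
    try:
        # Enum lookup by value raises ValueError for unknown strings.
        return DatasetSourceOrigin(raw.strip().upper()).value
    except ValueError:
        return DatasetSourceOrigin.OTHER.value
```

Falling back to `OTHER` for unrecognized input is a design choice of this sketch; raising an error instead would be equally reasonable.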

DatasetSourceType

Bases: Enum

Enumeration that defines the type of datasets.

DB class-attribute instance-attribute

DB = 'DB'

DB source.

FILES class-attribute instance-attribute

FILES = 'FILES'

Files source.

DatasetSourceUsage

Bases: Enum

Enumeration that defines what datasets are used for.

TESTING class-attribute instance-attribute

TESTING = 'TESTING'

For testing datasets.

TRAINING class-attribute instance-attribute

TRAINING = 'TRAINING'

For training datasets.

VALIDATION class-attribute instance-attribute

VALIDATION = 'VALIDATION'

For validation datasets.

DatasetType

Bases: Enum

Enumeration that defines what shape datasets are in.

CLEAN class-attribute instance-attribute

CLEAN = 'CLEAN'

Clean shape.

MODELING class-attribute instance-attribute

MODELING = 'MODELING'

Modeling shape.

ORIGIN class-attribute instance-attribute

ORIGIN = 'ORIGIN'

Raw/origin shape.

UNKNOWN class-attribute instance-attribute

UNKNOWN = 'UNKNOWN'

Unknown shape.

VALIDATION class-attribute instance-attribute

VALIDATION = 'VALIDATION'

Validation shape.

File

File(
    name,
    size=None,
    fingerprint=None,
    created_date=None,
    updated_date=None,
    uri=None,
    columns=None,
    dataframe=None,
    content_type=None,
    extra_metadata=None,
    display_name=None,
    capture_schema_only=False,
)

Bases: Source

Describe a dataset file.

Parameters:

Name Type Description Default
name str

The name of the file.

required
size int | None

The size of the file.

None
fingerprint str | None

The hash of the file.

None
created_date str | None

The date of creation of the file.

None
updated_date str | None

The date of last update of the file.

None
uri str | None

The URI of the file.

None
columns list[Column] | None

The columns coming from the dataframe with the statistics.

None
dataframe Optional

A dataframe that lets Vectice optionally compute more metadata about this resource, such as column statistics, size, number of rows, and number of columns. Supports Pandas and Spark.

None
content_type Optional

HTTP 'Content-Type' header for this file.

None
extra_metadata Optional

Extra metadata to be captured.

None
display_name Optional

Name that will be shown in the Web App.

None
capture_schema_only Optional

A boolean parameter indicating whether to capture only the schema or both the schema and column statistics of the dataframes.

False
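
When no dataframe is supplied, values such as `size`, `fingerprint`, and `updated_date` have to come from the caller. The following sketch shows one way to collect them for a local file using only the standard library. The helper name `describe_local_file` is hypothetical, and the choices of SHA-256 for the hash, bytes for the size, and ISO 8601 for the date string are assumptions, since this section does not specify them.

```python
import hashlib
import os
from datetime import datetime, timezone


def describe_local_file(path: str) -> dict:
    """Collect values matching File's name, size, fingerprint, and updated_date."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    info = os.stat(path)
    updated = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": os.path.basename(path),
        "size": info.st_size,  # bytes (assumed unit)
        "fingerprint": digest.hexdigest(),  # SHA-256 is an assumption
        "updated_date": updated.isoformat(),  # ISO 8601 is an assumption
    }
```

The returned dictionary uses the documented parameter names, so it could be unpacked into a constructor call with `**`, provided the real class accepts these value formats.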

FilesMetadata

FilesMetadata(files, size=None, usage=None, origin=None)

Bases: Metadata

The metadata of a set of files.

Parameters:

Name Type Description Default
files list[File]

The list of files of the dataset.

required
size int | None

The size of the set of files.

None
usage DatasetSourceUsage | None

The usage of the dataset.

None
origin str | None

Where the dataset files come from.

None
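
This section does not say how `size` is derived when it is left as `None`. If a caller wants to pass it explicitly, a minimal sketch is to sum the per-file sizes that are known. The helper name `total_size` is hypothetical, and treating "no known sizes" as `None` rather than `0` is a choice made for this example.

```python
from typing import Iterable, Optional


def total_size(file_sizes: Iterable[Optional[int]]) -> Optional[int]:
    """Sum known per-file sizes; return None when no file reports a size."""
    known = [size for size in file_sizes if size is not None]
    return sum(known) if known else None
```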

Metadata

Metadata(type, size=None, usage=None, origin=None)

This class describes the metadata of a dataset.

Parameters:

Name Type Description Default
size int | None

The size of the dataset.

None
type DatasetSourceType

The type of dataset source.

required
usage DatasetSourceUsage | None

The usage made of the data.

None
origin str | None

The origin of the data.

None

MetadataDB

MetadataDB(
    name,
    columns,
    rows_number=None,
    size=None,
    updated_date=None,
    created_date=None,
    uri=None,
    dataframe=None,
    extra_metadata=None,
    display_name=None,
    capture_schema_only=False,
    type=TableType.UNKNOWN,
)

Bases: Source

Parameters:

Name Type Description Default
name str

The name of the table.

required
columns list[DBColumn] | None

The columns that compose the table.

required
rows_number int | None

The number of rows in the table.

None
size int | None

The size of the table.

None
updated_date str | None

The date of last update of the table.

None
created_date str | None

The creation date of the table.

None
uri str | None

The URI of the table.

None
dataframe Optional

A dataframe that lets Vectice optionally compute more metadata about this resource, such as column statistics, size, number of rows, and number of columns. Supports Pandas and Spark.

None
extra_metadata Optional

Extra metadata to be captured.

None
display_name Optional

Name that will be shown in the Web App.

None
capture_schema_only Optional

A boolean parameter indicating whether to capture only the schema or both the schema and column statistics of the dataframes.

False
type Optional

The table type.

UNKNOWN
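
As an illustration of where the `name`, `columns`, and `rows_number` values might come from, the sketch below reads them from a SQLite table with the standard-library `sqlite3` module. The helper name `describe_table` and the `orders` table are hypothetical, and the result is a plain dictionary keyed by the documented parameter names rather than an instance of the library's classes.

```python
import sqlite3


def describe_table(conn: sqlite3.Connection, table: str) -> dict:
    """Collect values matching MetadataDB's name, columns, and rows_number."""
    # PRAGMA statements cannot be parameterized, so restrict the name first.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    columns = [
        {
            "name": name,
            "data_type": col_type,
            "nullable": not notnull,
            "is_private_key": pk > 0,
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]
    (rows_number,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return {"name": table, "columns": columns, "rows_number": rows_number}


# Hypothetical in-memory table used to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY NOT NULL, note TEXT)")
conn.execute("INSERT INTO orders (note) VALUES ('first'), ('second')")
orders = describe_table(conn, "orders")
```

Each entry in `columns` uses the `DBColumn` parameter names, so the dictionaries could be turned into column objects before being passed on.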