S3 resource

S3Resource

S3Resource(
    uris,
    dataframes=None,
    s3_client=None,
    capture_schema_only=False,
    columns_description=None,
)

Bases: Resource

AWS S3 resource reference wrapper.

This resource wraps references to AWS S3 URIs, such as files or folders stored in AWS S3, with optional metadata and versioning. Pass it as an argument to your Vectice Dataset wrapper before logging the dataset to an iteration.

from vectice import S3Resource

s3_resource = S3Resource(
    uris="s3://<bucket_name>/<file_path_inside_bucket>",
)
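Since `uris` accepts either a single string or a list of strings, it can help to check each URI before building the resource. The following is a minimal sketch using only the Python standard library; `split_s3_uri` is a hypothetical helper written for illustration and is not part of the Vectice API, and the bucket and file names are placeholders.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket, key); raise ValueError if it is malformed.

    Hypothetical helper for illustration only, not a Vectice function.
    This is a basic shape check and does not contact S3.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # The path is either empty or starts with "/"; drop that single separator.
    return parsed.netloc, parsed.path[1:]


# Placeholder URIs; replace with your own bucket and file paths.
uris = [
    "s3://my-bucket/data/train.csv",
    "s3://my-bucket/data/test.csv",
]
checked = [split_s3_uri(u) for u in uris]
```

Once every URI passes the check, the same list can be passed as `uris=uris` to `S3Resource`.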

Vectice does not store your actual dataset. Instead, it stores your dataset's metadata, which is information about your dataset. These details are captured by resources tailored to the specific environment in use, such as local files (FileResource), BigQuery, S3, and GCS.
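For S3, this kind of metadata (file size, update dates) can be read from object listings without downloading the data itself, which is what the optional `s3_client` parameter is for. The sketch below illustrates the idea and is not Vectice's implementation. It assumes the client is a boto3 S3 client, whose `list_objects_v2` paginator returns pages containing `Key`, `Size`, and `LastModified` entries; `list_object_metadata` and its `limit` default (chosen here to mirror the 5000-file cap documented for `s3_client`) are hypothetical.

```python
from typing import Any, Dict, List


def list_object_metadata(
    s3_client: Any, bucket: str, prefix: str, limit: int = 5000
) -> List[Dict[str, Any]]:
    """Collect key, size, and last-modified time for up to `limit` objects.

    Hypothetical helper for illustration; `s3_client` is assumed to be a
    boto3 S3 client (or any object exposing the same paginator interface).
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    found: List[Dict[str, Any]] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            found.append(
                {"key": obj["Key"], "size": obj["Size"], "updated": obj["LastModified"]}
            )
            if len(found) >= limit:
                return found
    return found
```

With boto3 installed, a client created with `boto3.client("s3")` could be passed both to this helper and, as `s3_client=...`, to `S3Resource`.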

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| uris | str \| list[str] | The URIs of the resources to get. Each should follow the pattern 's3://<bucket_name>/<file_path_inside_bucket>'. | required |
| dataframes | Optional | The dataframes that let Vectice optionally compute more metadata about this resource, such as column statistics. Supports pandas and Spark. | None |
| s3_client | Optional | The Amazon S3 client used to optionally retrieve file size, creation date, and updated date (used for auto-versioning) for up to 5000 files. | None |
| capture_schema_only | Optional | A boolean indicating whether to capture only the schema, or both the schema and column statistics, of the dataframes. | False |
| columns_description | Optional | A dictionary, or a path to a CSV file, mapping each column name to a description. The dictionary should follow the format { "column_name": "Description", ... }. | None |
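As an illustration of the dictionary form of `columns_description`, the sketch below builds a mapping and checks it against a list of column names before it is passed to the resource. The column names, descriptions, and the `check_descriptions` helper are all hypothetical and not part of the Vectice API.

```python
from typing import Dict, List, Tuple


def check_descriptions(
    columns: List[str], descriptions: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (columns without a description, described names that are not columns).

    Hypothetical helper for illustration only.
    """
    missing = [c for c in columns if c not in descriptions]
    unknown = [name for name in descriptions if name not in columns]
    return missing, unknown


# Example mapping in the { "column_name": "Description", ... } format.
columns_description = {
    "age": "Customer age in years",
    "country": "Country of residence",
}

# Placeholder column names; with a pandas dataframe these could come from list(df.columns).
dataframe_columns = ["age", "country", "signup_date"]
missing, unknown = check_descriptions(dataframe_columns, columns_description)
```

Here `missing` lists the columns that still need a description, and `unknown` would flag misspelled keys. The dictionary itself can then be passed as `columns_description=columns_description` alongside `dataframes`.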