S3 resource

S3DataWrapper

S3DataWrapper(
    s3_client,
    bucket_name,
    resource_path,
    name,
    usage=None,
    derived_from=None,
    inputs=None,
    type=None,
)

Bases: DataWrapper

Deprecated. Wrap columnar data and its metadata in AWS S3.

Wrappers are deprecated. Use Dataset and S3Resource instead.

Vectice stores only metadata (data about your dataset), which you communicate through a DataWrapper. Vectice does not store your actual dataset.

This DataWrapper wraps data that you have stored in AWS S3. You assign it to a step.

from vectice import DatasetType, S3DataWrapper, connect
from boto3.session import Session

s3_session = Session(  # (1)
    aws_access_key_id="...",
    aws_secret_access_key="...",
    region_name="us-east-1",
)
s3_client = s3_session.client(service_name="s3")  # (2)

my_project = connect(...)  # (3)
my_phase = my_project.phase(...)  # (4)
my_iter = my_phase.iteration()  # (5)

my_iter.step_my_data = S3DataWrapper(
    s3_client,
    bucket_name="my_bucket",
    resource_path="my_resource_path",
    name="My origin dataset name",
    type=DatasetType.ORIGIN,
)
  1. See boto3 sessions.
  2. See boto3 session client.
  3. See connection.
  4. See phases.
  5. See iterations.

Note that these three concepts are distinct, even if easily conflated:

  • Where the data is stored
  • The format at rest (in storage)
  • The format when loaded in a running Python program

Notably, the statistics collectors provided by Vectice operate only on the last of these (the in-memory format), and only when the data is loaded as a pandas dataframe.
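
As a minimal sketch of that distinction, the snippet below separates the format at rest (CSV-encoded bytes) from the format in a running Python program (a list of dictionaries). The sample bytes and the helper rows_from_csv are hypothetical and not part of Vectice. With boto3, such bytes would typically come from s3_client.get_object(Bucket=..., Key=...)["Body"].read(); a literal stands in here so the sketch runs without AWS access.

```python
from __future__ import annotations

import csv
import io

# Format at rest: CSV-encoded bytes, as they might sit in an S3 object.
# This literal is illustrative sample data, not a real dataset.
raw_bytes = b"id,price\n1,9.5\n2,12.0\n"


def rows_from_csv(raw: bytes) -> list[dict[str, str]]:
    """Decode CSV bytes into a list of row dictionaries (hypothetical helper)."""
    text = raw.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Format when loaded: a plain Python list of dictionaries.
# All values are still strings; nothing here is a pandas dataframe.
rows = rows_from_csv(raw_bytes)
print(type(raw_bytes).__name__, "->", type(rows).__name__, "with", len(rows), "rows")
```

The same bytes could equally be loaded into another in-memory structure. Where they are stored and how they are encoded does not determine what the running program holds, and since this result is not a pandas dataframe, the statistics collectors would presumably not operate on it.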

Parameters:

Name           Type                       Description                                            Default
s3_client      Client                     The client used to interact with Amazon S3.            required
bucket_name    str                        The name of the bucket to get data from.               required
resource_path  str                        The paths of the resources to get.                     required
name           str                        The name of the DataWrapper (local to Vectice).        required
usage          DatasetSourceUsage | None  The usage of the dataset.                              None
derived_from   list[int] | None           The list of dataset ids to create a new dataset from.  None
inputs         list[int] | None           Deprecated. Use derived_from instead.                  None
type           DatasetType | None         The type of the dataset.                               None

Source code in src/vectice/models/datasource/datawrapper/s3_data_wrapper.py
@deprecate(
    parameter="inputs",
    warn_at="23.1",
    fail_at="23.2",
    remove_at="23.3",
    reason="The 'inputs' parameter is renamed 'derived_from'. "
    "Using 'inputs' will raise an error in v{fail_at}. "
    "The parameter will be removed in v{remove_at}.",
)
def __init__(
    self,
    s3_client: Client,
    bucket_name: str,
    resource_path: str,
    name: str,
    usage: DatasetSourceUsage | None = None,
    derived_from: list[int] | None = None,
    inputs: list[int] | None = None,
    type: DatasetType | None = None,
):
    """Initialize an S3 data wrapper.

    Parameters:
        s3_client: The client used to interact with Amazon S3.
        bucket_name: The name of the bucket to get data from.
        resource_path: The paths of the resources to get.
        name: The name of the DataWrapper (local to Vectice).
        usage: The usage of the dataset.
        derived_from: The list of dataset ids to create a new dataset from.
        inputs: Deprecated. Use `derived_from` instead.
        type: The type of the dataset.
    """
    if not derived_from and inputs:
        derived_from = inputs

    self.s3_client = s3_client
    self.bucket_name = bucket_name
    self.resource_path = resource_path
    super().__init__(name=name, type=type, usage=usage, derived_from=derived_from)
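
The fallback from inputs to derived_from in the constructor above can be sketched in isolation. The class WrapperSketch below is a hypothetical stand-in, not the Vectice class, and the plain warnings.warn call is an assumed substitute for the deprecate decorator, whose implementation is not shown here.

```python
from __future__ import annotations

import warnings


class WrapperSketch:
    """Hypothetical stand-in that mimics only the inputs/derived_from handling."""

    def __init__(
        self,
        name: str,
        derived_from: list[int] | None = None,
        inputs: list[int] | None = None,
    ):
        # Assumed substitute for the @deprecate decorator: warn when the old name is used.
        if inputs is not None:
            warnings.warn(
                "'inputs' is deprecated; use 'derived_from' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        # Same condition as the source: 'inputs' is used only when
        # 'derived_from' is missing or empty.
        if not derived_from and inputs:
            derived_from = inputs
        self.name = name
        self.derived_from = derived_from


with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    legacy = WrapperSketch("legacy call", inputs=[1, 2])
    both = WrapperSketch("both given", derived_from=[3], inputs=[4])

print(legacy.derived_from, both.derived_from)
```

Because the check is "not derived_from", an empty list counts as missing, so derived_from=[] combined with a non-empty inputs also falls back to inputs. When both are non-empty, derived_from takes precedence.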