S3 resource
S3DataWrapper
S3DataWrapper(
    s3_client,
    bucket_name,
    resource_path,
    name,
    usage=None,
    derived_from=None,
    inputs=None,
    type=None,
)
Bases: DataWrapper
Deprecated. Wrap columnar data and its metadata in AWS S3.
Wrappers are deprecated. Instead, use Dataset and S3Resource.
Vectice stores only metadata (data about your dataset), which you communicate through a DataWrapper. Your actual dataset is not stored by Vectice.
This DataWrapper wraps data that you have stored in AWS S3. You assign it to a step.
from vectice import DatasetType, S3DataWrapper, connect
from boto3.session import Session

s3_session = Session(  # (1)
    aws_access_key_id="...",
    aws_secret_access_key="...",
    region_name="us-east-1",
)
s3_client = s3_session.client(service_name="s3")  # (2)

my_project = connect(...)  # (3)
my_phase = my_project.phase(...)  # (4)
my_iter = my_phase.iteration()  # (5)

my_iter.step_my_data = S3DataWrapper(
    s3_client,
    bucket_name="my_bucket",
    resource_path="my_resource_path",
    name="My origin dataset name",
    type=DatasetType.ORIGIN,
)
1. See boto3 sessions.
2. See boto3 session client.
3. See connection.
4. See phases.
5. See iterations.
Note that these three concepts are distinct, even if easily conflated:
- Where the data is stored
- The format at rest (in storage)
- The format when loaded in a running Python program
Notably, the statistics collectors provided by Vectice operate only on the last of these, and only when the data is loaded as a pandas DataFrame.
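The three concepts can be kept apart with a small sketch. It is illustrative only and does not call Vectice or AWS: the bucket name, resource path, and CSV content are hypothetical, and the bytes literal stands in for an object body that would normally be downloaded from S3.

```python
import io

import pandas as pd

# 1. Where the data is stored: a hypothetical S3 location (never contacted here).
bucket_name = "my_bucket"
resource_path = "my_resource_path/data.csv"

# 2. The format at rest: CSV bytes, standing in for the stored object body.
csv_bytes_at_rest = b"id,amount\n1,10.5\n2,20.0\n"

# 3. The format when loaded: a pandas DataFrame in the running Python program.
df = pd.read_csv(io.BytesIO(csv_bytes_at_rest))

print(type(df).__name__, df.shape)  # DataFrame (2, 2)
```

Only the object in step 3 is a pandas DataFrame, which is the form the paragraph above names as the input to the statistics collectors.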
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s3_client | Client | The client used to interact with Amazon S3. | required |
bucket_name | str | The name of the bucket to get data from. | required |
resource_path | str | The paths of the resources to get. | required |
name | str | The name of the DataWrapper (local to Vectice). | required |
usage | DatasetSourceUsage \| None | The usage of the dataset. | None |
derived_from | list[int] \| None | The list of dataset ids to create a new dataset from. | None |
inputs | list[int] \| None | Deprecated. Use | None |
type | DatasetType \| None | The type of the dataset. | None |
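Because the location is passed as separate bucket_name and resource_path values, a location held as a single `s3://` URI has to be split first. The helper below is a minimal sketch under that assumption; `split_s3_uri` is a hypothetical name and is not part of Vectice or boto3.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket_name, resource_path).

    Hypothetical helper for illustration; not part of Vectice or boto3.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The URI path starts with '/', which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, resource_path = split_s3_uri("s3://my_bucket/my_resource_path")
print(bucket_name, resource_path)  # my_bucket my_resource_path
```

The two returned values correspond to the bucket_name and resource_path arguments in the table above.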
Source code in src/vectice/models/datasource/datawrapper/s3_data_wrapper.py