S3 resource

S3Resource

S3Resource(s3_client, bucket_name, resource_path)

Bases: Resource

Wrap columnar data and its metadata in AWS S3.

Vectice stores only metadata (data about your dataset), which is communicated through a resource. Your actual dataset is not stored by Vectice.

This resource wraps data that you have stored in AWS S3. You assign it to a step.

from vectice import S3Resource
from boto3.session import Session

s3_session = Session(  # (1)
    aws_access_key_id="...",
    aws_secret_access_key="...",
    region_name="us-east-1",
)
s3_client = s3_session.client(service_name="s3")  # (2)

s3_resource = S3Resource(
    s3_client,
    bucket_name="my_bucket",
    resource_path="my_resource_path",
)
  1. See boto3 sessions.
  2. See boto3 session client.

Note that these three concepts are distinct, even if easily conflated:

  • Where the data is stored
  • The format at rest (in storage)
  • The format when loaded in a running Python program

Notably, the statistics collectors provided by Vectice operate only on the last of these (the in-memory format), and only when the data is loaded as a pandas DataFrame.
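To make the distinction concrete, here is a minimal sketch of moving from the storage location, through the format at rest, to the in-memory format. It assumes the object is a CSV file and that pandas is installed. The helper name `load_csv_as_dataframe` and its `key` argument are hypothetical and not part of Vectice; only the boto3 `get_object` call and `pandas.read_csv` are existing APIs.

```python
import io

import pandas as pd


def load_csv_as_dataframe(s3_client, bucket_name: str, key: str) -> pd.DataFrame:
    """Hypothetical helper: fetch one CSV object from S3 and parse it with pandas."""
    # Where the data is stored: an object in an S3 bucket.
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    # Format at rest: raw CSV bytes (an assumption for this sketch).
    raw_bytes = response["Body"].read()
    # Format when loaded: a pandas DataFrame held in memory.
    return pd.read_csv(io.BytesIO(raw_bytes))
```

The same object could just as well be loaded into another in-memory structure, in which case, per the note above, the statistics collectors would not apply.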

Parameters:

Name           Type    Description                                  Default
s3_client      Client  The client used to interact with Amazon S3.  required
bucket_name    str     The name of the bucket to get data from.     required
resource_path  str     The paths of the resources to get.           required

Source code in src/vectice/models/resource/s3_resource.py
def __init__(
    self,
    s3_client: Client,
    bucket_name: str,
    resource_path: str,
):
    """Initialize an S3 resource.

    Parameters:
        s3_client: The client used to interact with Amazon s3.
        bucket_name: The name of the bucket to get data from.
        resource_path: The paths of the resources to get.
    """
    super().__init__()
    self.s3_client = s3_client
    self.bucket_name = bucket_name
    self.resource_path = resource_path