GCS resource
GCSResource
GCSResource(gcs_client, bucket_name, resource_paths)
Bases: Resource
Wrap columnar data and its metadata in GCS.
Vectice stores only metadata (data about your dataset), which is communicated through a resource. Vectice does not store the dataset itself.
This resource wraps data that you have stored in Google Cloud Storage. You assign it to a step.
from vectice import GCSResource
from google.cloud.storage import Client
my_service_account_file = "MY_SERVICE_ACCOUNT_JSON_PATH" # (1)
gcs_client = Client.from_service_account_json(json_credentials_path=my_service_account_file) # (2)
gcs_resource = GCSResource(
    gcs_client,
    bucket_name="my_bucket",
    resource_paths="my_folder/my_filename",
)
- (1) See Service account credentials.
- (2) See GCS docs.
Note that these three concepts are distinct, even if easily conflated:
- Where the data is stored
- The format at rest (in storage)
- The format when loaded in a running Python program
Notably, the statistics collectors provided by Vectice operate only on the last of these (the in-memory format), and only when the data is loaded as a pandas dataframe.
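The distinction between the three concepts can be made concrete with a minimal sketch that uses only the Python standard library. It does not call Vectice or GCS; the URI, the sample bytes, and the helper names `load_as_rows` and `load_as_columns` are illustrative assumptions. The same bytes at rest are loaded into two different in-memory shapes.

```python
import csv
import io

# Where the data is stored: a hypothetical GCS object URI (never accessed here).
STORAGE_URI = "gs://my_bucket/my_folder/my_filename"

# The format at rest: CSV-encoded bytes, as they might sit in the bucket.
AT_REST = b"id,price\n1,9.5\n2,12.0\n"


def load_as_rows(raw: bytes) -> list[dict[str, str]]:
    """Load the bytes as a list of row dictionaries (one in-memory format)."""
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


def load_as_columns(raw: bytes) -> dict[str, list[str]]:
    """Load the same bytes as a column-oriented dictionary (another in-memory format)."""
    rows = load_as_rows(raw)
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


rows = load_as_rows(AT_REST)
columns = load_as_columns(AT_REST)
```

Neither shape here is a pandas dataframe, so by the rule stated above the Vectice statistics collectors would not apply to either of them.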
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gcs_client | Client | The google.cloud.storage Client used to access the bucket. | required |
bucket_name | str | The name of the bucket to get data from. | required |
resource_paths | str \| list[str] | The paths of the resources to get. | required |
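Because `resource_paths` accepts either a single string or a list of strings, calling code often benefits from treating both cases uniformly. The following is a minimal sketch of one way to do that; the helpers `normalize_resource_paths` and `to_gs_uris` are hypothetical names introduced for illustration and are not part of Vectice or its implementation.

```python
from typing import Union


def normalize_resource_paths(resource_paths: Union[str, list[str]]) -> list[str]:
    """Return the paths as a list, wrapping a single string in a list."""
    if isinstance(resource_paths, str):
        return [resource_paths]
    return list(resource_paths)


def to_gs_uris(bucket_name: str, resource_paths: Union[str, list[str]]) -> list[str]:
    """Build a gs://bucket/path URI for each path, for display or logging only."""
    return [
        f"gs://{bucket_name}/{path.lstrip('/')}"
        for path in normalize_resource_paths(resource_paths)
    ]


single = to_gs_uris("my_bucket", "my_folder/my_filename")
several = to_gs_uris("my_bucket", ["my_folder/a.csv", "my_folder/b.csv"])
```

The file names `a.csv` and `b.csv` are placeholders; either form of the argument could be passed to `GCSResource` as `resource_paths`, per the table above.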
Source code in src/vectice/models/resource/gcs_resource.py