GCS resource
GcsDataWrapper
GcsDataWrapper(
    gcs_client,
    bucket_name,
    resource_paths,
    name,
    usage=None,
    derived_from=None,
    inputs=None,
    type=None,
)
Bases: DataWrapper
Deprecated. Wrap columnar data and its metadata in GCS.
Wrappers are deprecated. Instead, use Dataset and GCSResource.
Vectice stores only metadata (data about your dataset), which you communicate through a DataWrapper. Vectice does not store the dataset itself.
This DataWrapper wraps data that you have stored in Google Cloud Storage. You assign it to a step.
from vectice import DatasetType, GcsDataWrapper, connect
from google.cloud.storage import Client

my_service_account_file = "MY_SERVICE_ACCOUNT_JSON_PATH"  # (1)
gcs_client = Client.from_service_account_json(json_credentials_path=my_service_account_file)  # (2)
my_project = connect(...)  # (3)
my_phase = my_project.phase(...)  # (4)
my_iter = my_phase.iteration()  # (5)

my_iter.step_my_data = GcsDataWrapper(
    gcs_client,
    bucket_name="my_bucket",
    resource_paths="my_folder/my_filename",
    name="My origin dataset name",
    type=DatasetType.ORIGIN,
)
1. See Service account credentials.
2. See GCS docs.
3. See connection.
4. See phases.
5. See iterations.
Note that these three concepts are distinct, even if easily conflated:
- Where the data is stored
- The format at rest (in storage)
- The format when loaded in a running Python program
Notably, the statistics collectors provided by Vectice operate only on the last of these (the format when loaded), and only when the data is loaded as a pandas DataFrame.
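The distinction between the three concepts can be made concrete with a small sketch. It uses only the standard library and does not call Vectice or Google Cloud Storage. The URI, the CSV content, and the variable names are assumptions chosen for illustration. The same bytes at rest end up as two different in-memory representations depending on how they are loaded.

```python
import csv
import io

# 1. Where the data is stored: a hypothetical GCS location (never contacted here).
storage_location = "gs://my_bucket/my_folder/my_filename"

# 2. The format at rest: CSV bytes, as they might sit in the stored object.
bytes_at_rest = b"id,price\n1,9.5\n2,12.0\n"

# 3. The format when loaded: the same bytes can become different Python objects.
text = bytes_at_rest.decode("utf-8")
rows_as_lists = list(csv.reader(io.StringIO(text)))      # list of lists, header included
rows_as_dicts = list(csv.DictReader(io.StringIO(text)))  # list of dicts keyed by header

print(storage_location)
print(rows_as_lists)
print(rows_as_dicts)
```

Neither of these in-memory forms is a pandas DataFrame, so, per the note above, the statistics collectors would not apply to data loaded this way.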
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gcs_client | Client | The | required |
bucket_name | str | The name of the bucket to get data from. | required |
resource_paths | str \| list[str] | The paths of the resources to get. | required |
name | str | The name of the DataWrapper (local to Vectice). | required |
usage | DatasetSourceUsage \| None | The usage of the dataset. | None |
derived_from | list[int] \| None | The list of dataset ids to create a new dataset from. | None |
inputs | list[int] \| None | Deprecated. Use | None |
type | DatasetType \| None | The type of the dataset. | None |
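Because resource_paths accepts either a single string or a list of strings, calling code often normalizes it before use. The following is a minimal sketch under stated assumptions: to_gcs_uris is a hypothetical helper, not part of Vectice or the Google Cloud client, and it only shows how a bucket name and one or more paths map onto standard gs:// URIs. It makes no claim about how GcsDataWrapper handles these values internally.

```python
from __future__ import annotations


def to_gcs_uris(bucket_name: str, resource_paths: str | list[str]) -> list[str]:
    """Hypothetical helper (not part of Vectice): build gs:// URIs for one or more paths."""
    # Accept a single path or a list of paths, mirroring the parameter's type.
    paths = [resource_paths] if isinstance(resource_paths, str) else list(resource_paths)
    # Strip leading slashes so the URI has exactly one "/" after the bucket name.
    return [f"gs://{bucket_name}/{path.lstrip('/')}" for path in paths]


single = to_gcs_uris("my_bucket", "my_folder/my_filename")
several = to_gcs_uris("my_bucket", ["my_folder/a.csv", "my_folder/b.csv"])
print(single)
print(several)
```

Returning a list in both cases lets downstream code iterate without checking the input type again.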
Source code in src/vectice/models/datasource/datawrapper/gcs_data_wrapper.py