Base resource
Resource
Resource(
    paths,
    dataframes=None,
    capture_schema_only=False,
    columns_description=None,
)
Base class for resources.
Use Resource subclasses to assign datasets to steps. The
Vectice library supports a handful of common cases. Additional
cases are generally easy to supply by deriving from this base
class. In particular, subclasses must override this class's
abstract methods (_build_metadata(), _fetch_data()).
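As a rough illustration of what overriding abstract methods means in practice, the sketch below uses Python's standard `abc` module with a hypothetical stand-in class, `SketchResource`. It is not Vectice's `Resource`, and whether Vectice enforces the contract with `abc` is an assumption. `Incomplete`, `Complete`, and `can_instantiate` are names invented for this example.

```python
from abc import ABC, abstractmethod


class SketchResource(ABC):
    """Hypothetical stand-in for a base class with two abstract hooks."""

    def __init__(self, paths):
        self._paths = paths

    @abstractmethod
    def _build_metadata(self):
        """Describe the data (files, sizes, origin)."""

    @abstractmethod
    def _fetch_data(self):
        """Return the data itself."""


class Incomplete(SketchResource):
    # Only one of the two hooks is overridden.
    def _build_metadata(self):
        return {"files": []}


class Complete(Incomplete):
    # Overriding the second hook makes the class instantiable.
    def _fetch_data(self):
        return {}


def can_instantiate(cls) -> bool:
    """Return True if cls can be constructed, False if abc rejects it."""
    try:
        cls(paths="data/")
        return True
    except TypeError:
        return False
```

With `abc`, a subclass that leaves any abstract method unimplemented fails at construction time with a `TypeError`, so a missing override surfaces early rather than when the method is first called.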
Examples:
To create a custom resource class, inherit from Resource,
and implement the _build_metadata() and _fetch_data() methods:
from vectice import Resource, DatasetSourceOrigin, FilesMetadata


class MyResource(Resource):
    _origin = "Data source name"

    def __init__(
        self,
        paths: str | list[str],
    ):
        super().__init__(paths=paths)

    def _build_metadata(self) -> FilesMetadata:  # (1)
        files = ...  # fetch file list from your custom storage
        total_size = ...  # compute the total size of the files listed in self._paths
        return FilesMetadata(
            size=total_size,
            origin=self._origin,
            files=files,
            usage=self.usage,
        )

    def _fetch_data(self) -> dict[str, bytes]:
        files_data = {}
        for file in self.metadata.files:
            file_contents = ...  # fetch file contents from your custom storage
            files_data[file.name] = file_contents
        return files_data
(1) Return FilesMetadata for data stored in files, or DBMetadata for data stored in a database.
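The `...` placeholders in the example above depend on the storage being wrapped. As a minimal sketch for the simplest case, files on local disk, the helpers below use only the standard library. `iter_local_files`, `list_local_files`, and `read_local_files` are hypothetical names, not part of Vectice, and turning their results into the file entries that `FilesMetadata` expects is left to the caller.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator


def iter_local_files(paths: str | list[str]) -> Iterator[Path]:
    """Yield every regular file named by, or found under, the given paths."""
    if isinstance(paths, str):
        paths = [paths]
    for raw in paths:
        path = Path(raw)
        # A directory contributes all files below it; a file contributes itself.
        candidates = sorted(path.rglob("*")) if path.is_dir() else [path]
        for candidate in candidates:
            if candidate.is_file():
                yield candidate


def list_local_files(paths: str | list[str]) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) pairs, e.g. for building metadata."""
    return [(f.name, f.stat().st_size) for f in iter_local_files(paths)]


def read_local_files(paths: str | list[str]) -> dict[str, bytes]:
    """Return file contents keyed by file name, e.g. for fetching data."""
    # Keys are bare file names, so same-named files in different
    # directories overwrite each other; use full paths if that matters.
    return {f.name: f.read_bytes() for f in iter_local_files(paths)}
```

Inside a subclass, `_build_metadata()` could sum the sizes from `list_local_files(self._paths)` to get the total size, and `_fetch_data()` could return the result of `read_local_files(self._paths)`.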
metadata
metadata(value)
Set the resource's metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
value | Metadata | The metadata to set. | required |
usage
usage(value)
Set the resource's usage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
value | DatasetSourceUsage | The usage to set. | required |