File resource
FileResource
FileResource(
paths,
dataframes=None,
capture_schema_only=False,
columns_description=None,
)
Bases: Resource
Wrap columnar data and its metadata in a local file.
This resource wraps data that you have stored in a local file with optional metadata and versioning. You pass it as an argument of your Vectice Dataset wrapper before logging it to an iteration.
from vectice import FileResource
resource = FileResource(paths="my/file/path")
Vectice does not store your actual dataset. Instead, it stores metadata that describes your dataset. This metadata is captured by a resource tailored to the environment in use, such as local files (FileResource), BigQuery, S3, or GCS, among others.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
paths | str \| list[str] | The paths of the files to wrap. | required |
dataframes | Optional | The dataframes that let Vectice compute additional metadata about this resource, such as column statistics. Pandas and Spark dataframes are supported. | None |
capture_schema_only | Optional | A boolean indicating whether to capture only the schema of the dataframes, or both the schema and the column statistics. | False |
columns_description | Optional | A dictionary, or the path to a CSV file, that maps each column name to a description. The dictionary should follow the format { "column_name": "Description", ... } | None |
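To illustrate the dictionary format expected by columns_description, the following minimal sketch builds such a dictionary and checks its keys against the header of a CSV file, using only the Python standard library. The helper undescribed_columns, the sample file, and the column names are hypothetical and are not part of Vectice.

```python
import csv
import tempfile
from pathlib import Path


def undescribed_columns(csv_path, columns_description):
    """Return the header columns of csv_path that have no description entry."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle))
    return [name for name in header if name not in columns_description]


# Hypothetical sample file, written to a temporary directory for this sketch.
sample_path = Path(tempfile.mkdtemp()) / "iris.csv"
sample_path.write_text("sepal_length,sepal_width,species\n5.1,3.5,setosa\n")

# Dictionary in the documented {"column_name": "Description"} format.
columns_description = {
    "sepal_length": "Sepal length in centimetres",
    "species": "Iris species label",
}

# Columns present in the file but missing from the dictionary.
missing = undescribed_columns(sample_path, columns_description)
print(missing)
```

A dictionary checked this way could then be passed as FileResource(paths=str(sample_path), columns_description=columns_description).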
Examples:
The following example shows how to wrap a CSV file called iris.csv in the current directory:
from vectice import FileResource
iris_trainset = FileResource(paths="iris.csv")
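Because paths also accepts a list of strings, several files could be wrapped in a single resource. The sketch below is one possible way to do this. The file names and contents are hypothetical, and the vectice import is guarded so that the snippet still runs where the package is not installed.

```python
import tempfile
from pathlib import Path

# Hypothetical sample files, written to a temporary directory for this sketch.
data_dir = Path(tempfile.mkdtemp())
train_path = data_dir / "iris_train.csv"
test_path = data_dir / "iris_test.csv"
train_path.write_text("sepal_length,species\n5.1,setosa\n")
test_path.write_text("sepal_length,species\n6.3,virginica\n")

# The paths parameter is documented as str | list[str], so pass a list of strings.
paths = [str(train_path), str(test_path)]

try:
    from vectice import FileResource
except ImportError:
    FileResource = None  # vectice is not installed in this environment

if FileResource is not None:
    # Wrap both files in one resource.
    iris_files = FileResource(paths=paths)
```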