DataFrame resource
DFResource
DFResource(
dataframes,
capture_schema_only=False,
columns_description=None,
)
Bases: Resource
Wrap in-memory DataFrames into a Dataset Resource for metadata extraction.
This resource is intended for DataFrames that exist only in memory (e.g., Pandas, Spark, or H2O) and are not associated with external file paths or storage locations. It enables Vectice to extract schema and optional column-level statistics from these DataFrames, which can be logged as part of a Dataset version.
Unlike other resource types (e.g., FileResource, S3Resource, BigQueryResource), DFResource does not carry path or source metadata; its sole purpose is to wrap raw DataFrames.
You typically use it when your data is generated on the fly, transformed in memory, or has no accessible source path.
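import pandas as pd
from vectice import DFResource, Dataset
df = pd.DataFrame({"age": [34, 51]})  # illustrative in-memory DataFrame
my_df_resource = DFResource(dataframes=df)  # wrap the DataFrame
Dataset.clean(name="my_dataset", resource=my_df_resource)  # declare a clean dataset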
Vectice does not store your actual dataset. Instead, it stores your dataset's metadata, that is, information about your dataset. This metadata is captured by resources tailored to the environment in use, such as local files (FileResource), BigQuery, S3, or GCS.
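When only the column schema should be recorded, the constructor's `capture_schema_only` flag can be set to `True`. The following is a minimal sketch, not an official recipe: `build_schema_only_resource` is a hypothetical helper name, and it assumes `vectice` is installed in the environment where the function is called.

```python
from typing import Any, Sequence


def build_schema_only_resource(frames: Sequence[Any]) -> Any:
    """Wrap several in-memory DataFrames in one DFResource, schema only.

    Hypothetical helper for illustration; not part of the Vectice API.
    """
    if not frames:
        raise ValueError("expected at least one DataFrame")

    # Imported lazily so this sketch can be loaded without vectice installed.
    from vectice import DFResource

    # capture_schema_only=True records the schema and skips column statistics.
    return DFResource(dataframes=list(frames), capture_schema_only=True)
```

A call such as `build_schema_only_resource([train_df, test_df])` would pass both DataFrames as a list, which the `dataframes` parameter accepts alongside a single DataFrame; `train_df` and `test_df` are placeholder names.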
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataframes | TDataFrameType \| list[TDataFrameType] | The DataFrames from which Vectice computes metadata about this resource, such as the column schema and, optionally, statistics. Supports Pandas, Spark, and H2O. | required |
| capture_schema_only | Optional | A boolean indicating whether to capture only the schema or both the schema and column statistics of the DataFrames. | False |
| columns_description | Optional | A dictionary or a path to a CSV file that maps each column name to a description. The dictionary should follow the format { "column_name": "Description", ... }. | None |
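Because `columns_description` maps column names to descriptions, it can help to compare the dictionary against the DataFrame's columns before building the resource. The sketch below is an illustration only: `check_descriptions`, the column names, and the descriptions are all hypothetical, and the source does not say how Vectice treats keys that match no column.

```python
from typing import Dict, Iterable, List, Tuple


def check_descriptions(
    columns: Iterable[str], descriptions: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (columns without a description, keys that match no column)."""
    column_list = list(columns)
    column_set = set(column_list)
    undocumented = [name for name in column_list if name not in descriptions]
    unknown = [name for name in descriptions if name not in column_set]
    return undocumented, unknown


# Illustrative column names and descriptions (assumptions, not Vectice data).
columns = ["age", "city", "income"]
descriptions = {
    "age": "Customer age in years",
    "city": "City of residence",
    "zip": "Postal code",
}

undocumented, unknown = check_descriptions(columns, descriptions)
print("Columns without a description:", undocumented)
print("Descriptions for unknown columns:", unknown)
```

With a Pandas DataFrame `df`, `df.columns` can be passed as `columns`; once both lists come back empty, the dictionary can be supplied as `DFResource(dataframes=df, columns_description=descriptions)`.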