DatabricksTable resource
DatabricksTableResource
DatabricksTableResource(
    paths,
    dataframes=None,
    spark_client=None,
    capture_schema_only=False,
    columns_description=None,
)
Bases: Resource
Databricks table resource reference wrapper.
This resource wraps paths to tables stored in Databricks, along with optional metadata and versioning. You pass it as an argument to your Vectice Dataset wrapper before logging the dataset to an iteration.
from vectice import DatabricksTableResource

db_resource = DatabricksTableResource(
    spark_client=spark,
    paths="my_table",
)
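End to end, the flow described above might look like the following sketch. It assumes an active SparkSession named spark and the usual Vectice connection flow; the API token, the phase ID "PHA-XXXX", and the dataset name are placeholders, not values from this page.

from vectice import Dataset, DatabricksTableResource, connect

db_resource = DatabricksTableResource(
    spark_client=spark,  # active SparkSession (provided on Databricks)
    paths="my_table",
)

# Wrap the resource in a Vectice Dataset before logging it.
dataset = Dataset.origin(name="raw_table", resource=db_resource)

# Placeholder token and phase ID; replace with your own.
iteration = (
    connect(api_token="YOUR_API_TOKEN")
    .phase("PHA-XXXX")
    .create_or_get_current_iteration()
)
iteration.log(dataset)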
Vectice does not store your actual dataset. Instead, it stores the dataset's metadata: information that describes the dataset. These details are captured by resources tailored to the specific environment in use, such as local files (FileResource), BigQuery, S3, GCS, and so on.
Known limitations
- Resource size is not captured when you pass a specific version, for example my_table@v2, as illustrated below.
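For illustration, a pinned version is passed directly in the path; per the limitation above, resource size is not captured in this case. The table name below is a placeholder.

db_resource = DatabricksTableResource(
    spark_client=spark,    # active SparkSession
    paths="my_table@v2",   # version pinned with the @ suffix
)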
Parameters:

Name | Type | Description | Default |
---|---|---|---|
paths | str \| list[str] | The paths used to retrieve the tables: either the table name, the table location, or the full table path for Spark Connect. | required |
dataframes | Optional | Dataframes that allow Vectice to optionally compute additional metadata about this resource, such as column statistics (Pandas and Spark are supported). | None |
spark_client | Optional | The Spark session that allows Vectice to capture the table metadata. | None |
capture_schema_only | Optional | Whether to capture only the schema, or both the schema and the column statistics of the dataframes. | False |
columns_description | Optional | A dictionary, or a path to a CSV file, mapping column names to descriptions. The dictionary should follow the format { "column_name": "Description", ... }. | None |
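To make the parameters concrete, here is a minimal sketch that combines them. It assumes an active SparkSession named spark and a table named my_table; the column names and descriptions are illustrative only.

from vectice import DatabricksTableResource

df = spark.table("my_table")  # dataframe Vectice can use for column statistics

db_resource = DatabricksTableResource(
    paths="my_table",
    dataframes=df,              # enables column statistics (Pandas or Spark)
    spark_client=spark,         # enables table metadata capture
    capture_schema_only=False,  # capture both schema and column statistics
    columns_description={       # illustrative column descriptions
        "id": "Primary key",
        "amount": "Order amount in USD",
    },
)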