SparkDatasource
A SparkDatasource is a Datasource that connects to a Spark cluster and provides access to Spark DataFrames.
add_dataframe_asset
Adds a DataFrame DataAsset to this SparkDatasource object.

Parameters
  name: The name of the DataFrame asset. This can be any arbitrary string.
  dataframe: The Spark DataFrame containing the data for this DataFrame data asset.
  batch_metadata: An arbitrary user-defined dictionary with string keys that will be inherited by any batches created from the asset.

Returns
  DataFrameAsset: The DataFrameAsset that has been added to this datasource.
delete_asset
Removes the DataAsset referred to by name from the internal list of available DataAsset objects.

Parameters
  name: The name of the DataAsset to be deleted.
get_asset
Returns the DataAsset referred to by name.

Parameters
  name: The name of the DataAsset sought.

Returns
  great_expectations.datasource.fluent.interfaces._DataAssetT: The named DataAsset, if it exists; otherwise an exception is raised.
class great_expectations.datasource.fluent.SparkDatasource(
*,
type: Literal['spark'] = 'spark',
name: str,
id: Optional[uuid.UUID] = None,
assets: List[great_expectations.datasource.fluent.spark_datasource.DataFrameAsset] = [],
spark_config: Optional[Dict[pydantic.v1.types.StrictStr,
Union[pydantic.v1.types.StrictStr,
pydantic.v1.types.StrictInt,
pydantic.v1.types.StrictFloat,
pydantic.v1.types.StrictBool]]] = None,
force_reuse_spark_context: bool = True,
persist: bool = True
)
Methods
add_dataframe_asset
add_dataframe_asset(
name: str,
batch_metadata: Optional[BatchMetadata] = None
) → DataFrameAsset
delete_asset
delete_asset(
name: str
) → None
get_asset
get_asset(
name: str
) → great_expectations.datasource.fluent.interfaces._DataAssetT