Version: 1.3.3

SparkDatasource

class great_expectations.datasource.fluent.SparkDatasource(*, type: Literal['spark'] = 'spark', name: str, id: Optional[uuid.UUID] = None, assets: List[great_expectations.datasource.fluent.spark_datasource.DataFrameAsset] = [], spark_config: Optional[Dict[pydantic.v1.types.StrictStr, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool]]] = None, force_reuse_spark_context: bool = True, persist: bool = True)

A SparkDatasource is a Datasource that connects to a Spark cluster and provides access to Spark DataFrames.

Methods

add_dataframe_asset(name: str, batch_metadata: Optional[BatchMetadata] = None) → DataFrameAsset

Adds a DataFrame DataAsset to this SparkDatasource object.

Parameters
  • name – The name of the DataFrame asset. This can be any arbitrary string.

  • dataframe – The Spark DataFrame containing the data for this DataFrame data asset. This parameter does not appear in the method signature above.

  • batch_metadata – An arbitrary user-defined dictionary with string keys that is inherited by any batches created from the asset.

Returns

The DataFrameAsset that has been added to this datasource.
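
As a minimal sketch of the call described above, the helper below adds a DataFrame asset to an existing SparkDatasource. The helper name `register_orders_asset`, the asset name "orders", and the example metadata keys are hypothetical; only `add_dataframe_asset` and its `name` and `batch_metadata` parameters come from the signature above. The datasource is passed in rather than created here, so the sketch makes no assumption about how the data context or Spark session is set up.

```python
from typing import Any, Dict, Optional


def register_orders_asset(
    datasource: Any,
    batch_metadata: Optional[Dict[str, Any]] = None,
) -> Any:
    """Add a DataFrame asset named "orders" (a hypothetical name) to `datasource`.

    `datasource` is expected to be a SparkDatasource; it is typed as Any so the
    sketch does not depend on a particular installation or configuration.
    """
    # batch_metadata is optional; batches created from the asset inherit it.
    return datasource.add_dataframe_asset(
        name="orders",
        batch_metadata=batch_metadata,
    )
```

A call such as `register_orders_asset(datasource, {"team": "analytics"})` would return the newly added asset.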

delete_asset(name: str) → None

Removes the DataAsset referred to by name from the internal list of available DataAsset objects.

Parameters

name – Name of the DataAsset to be deleted.
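
One way `delete_asset` might be used is to replace an asset under the same name. The sketch below assumes that an asset with the given name already exists on the datasource; the helper name `replace_asset` is hypothetical, and the behavior of `delete_asset` for a missing name is not covered by this reference.

```python
from typing import Any, Dict, Optional


def replace_asset(
    datasource: Any,
    name: str,
    batch_metadata: Optional[Dict[str, Any]] = None,
) -> Any:
    """Delete the asset called `name`, then add a fresh DataFrame asset with that name."""
    # Assumption: an asset called `name` is currently registered on the datasource.
    datasource.delete_asset(name)
    # Re-create the asset under the same name and return it.
    return datasource.add_dataframe_asset(name=name, batch_metadata=batch_metadata)
```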

get_asset(name: str) → great_expectations.datasource.fluent.interfaces._DataAssetT

Returns the DataAsset referred to by name.

Parameters

name – Name of the DataAsset sought.

Returns

_DataAssetT – The DataAsset with the given name, if it exists; otherwise, an exception is raised.
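
Because `get_asset` raises when the name is unknown, callers that prefer a None result can wrap the lookup. The sketch below assumes the exception is a `LookupError`; this reference only states that an exception is raised, so the exception type should be checked against the installed version. The helper name `find_asset` is hypothetical.

```python
from typing import Any, Optional


def find_asset(datasource: Any, name: str) -> Optional[Any]:
    """Return the asset called `name`, or None if the lookup fails."""
    try:
        return datasource.get_asset(name)
    except LookupError:
        # Assumption: a missing asset surfaces as LookupError. The reference
        # above does not name the exception type.
        return None
```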