Version: 1.3.7

SparkDatasource

Signature

class great_expectations.datasource.fluent.SparkDatasource(*, type: Literal['spark'] = 'spark', name: str, id: Optional[uuid.UUID] = None, assets: List[great_expectations.datasource.fluent.spark_datasource.DataFrameAsset] = [], spark_config: Optional[Dict[pydantic.v1.types.StrictStr, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool]]] = None, force_reuse_spark_context: bool = True, persist: bool = True)

A SparkDatasource is a Datasource that connects to a Spark cluster and provides access to Spark DataFrames.

Methods

Signature

add_dataframe_asset(name: str, batch_metadata: Optional[BatchMetadata] = None) → DataFrameAsset

Adds a DataFrame DataAsset to this SparkDatasource object.

Parameters

| Name | Description |
| --- | --- |
| name | The name of the DataFrame asset. This can be any arbitrary string. |
| batch_metadata | Optional. Arbitrary metadata that will be attached to batches produced by this asset. |

Returns

| Type | Description |
| --- | --- |
| DataFrameAsset | The DataFrameAsset that has been added to this datasource. |

Signature

delete_asset(name: str) → None

Removes the DataAsset referred to by name from the internal list of available DataAsset objects.

Parameters

| Name | Description |
| --- | --- |
| name | The name of the DataAsset to be deleted. |

Signature

get_asset(name: str) → great_expectations.datasource.fluent.interfaces._DataAssetT

Returns the DataAsset referred to by name.

Parameters

| Name | Description |
| --- | --- |
| name | The name of the DataAsset sought. |

Returns

| Type | Description |
| --- | --- |
| great_expectations.datasource.fluent.interfaces._DataAssetT | The DataAsset, if an asset with the given name exists; otherwise, an exception is raised. |