Version: 1.3.12

SparkDatasource

Signature

class great_expectations.datasource.fluent.SparkDatasource(
*,
type: Literal['spark'] = 'spark',
name: str,
id: Optional[uuid.UUID] = None,
assets: List[great_expectations.datasource.fluent.spark_datasource.DataFrameAsset] = [],
spark_config: Optional[Dict[pydantic.v1.types.StrictStr,
Union[pydantic.v1.types.StrictStr,
pydantic.v1.types.StrictInt,
pydantic.v1.types.StrictFloat,
pydantic.v1.types.StrictBool]]] = None,
force_reuse_spark_context: bool = True,
persist: bool = True
)

A SparkDatasource is a Datasource that connects to a Spark cluster and provides access to Spark DataFrames.

Methods

add_dataframe_asset

Signature

add_dataframe_asset(
name: str,
batch_metadata: Optional[BatchMetadata] = None
) → DataFrameAsset

Adds a DataFrame DataAsset to this SparkDatasource object.

Parameters

name: The name of the DataFrame asset. This can be any arbitrary string.

dataframe: The Spark DataFrame containing the data for this DataFrame data asset.

batch_metadata: An arbitrary user-defined dictionary with string keys that will be inherited by any batches created from the asset.

Returns

DataFrameAsset: The DataFrameAsset that has been added to this datasource.

delete_asset

Signature

delete_asset(
name: str
) → None

Removes the DataAsset referred to by name from the datasource's internal list of available DataAsset objects.

Parameters

name: The name of the DataAsset to be deleted.

get_asset

Signature

get_asset(
name: str
) → great_expectations.datasource.fluent.interfaces._DataAssetT

Returns the DataAsset referred to by name.

Parameters

name: The name of the DataAsset sought.

Returns

great_expectations.datasource.fluent.interfaces._DataAssetT: The DataAsset with the given name, if it exists; otherwise an exception is raised.