SparkDatasource
class great_expectations.datasource.fluent.SparkDatasource(*, type: Literal['spark'] = 'spark', name: str, id: Optional[uuid.UUID] = None, assets: List[great_expectations.datasource.fluent.spark_datasource.DataFrameAsset] = [], spark_config: Optional[Dict[pydantic.v1.types.StrictStr, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool]]] = None, force_reuse_spark_context: bool = True, persist: bool = True)
A SparkDatasource is a Datasource that connects to a Spark cluster and provides access to Spark DataFrames.
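The spark_config annotation in the signature above allows only string keys and scalar values (string, integer, float, or boolean). As a rough illustration, the sketch below checks a candidate dictionary against that shape before it is handed to the datasource. The helper check_spark_config is hypothetical and not part of great_expectations. It only approximates the strict pydantic types, and the configuration keys are ordinary Spark properties chosen for the example.

```python
from typing import Any, List, Mapping

# Scalar types permitted for spark_config values per the class signature.
_ALLOWED_VALUE_TYPES = (str, int, float, bool)


def check_spark_config(config: Mapping[Any, Any]) -> List[str]:
    """Return a list of problems; an empty list means the dict fits the annotation.

    Hypothetical helper for illustration only, not a great_expectations API.
    """
    problems = []
    for key, value in config.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        if not isinstance(value, _ALLOWED_VALUE_TYPES):
            problems.append(
                f"value for {key!r} has unsupported type {type(value).__name__}"
            )
    return problems


# Example configuration: every value is a str, int, float, or bool.
spark_config = {
    "spark.app.name": "gx_demo",
    "spark.executor.memory": "2g",
    "spark.sql.shuffle.partitions": 8,
    "spark.ui.enabled": False,
}

print(check_spark_config(spark_config))
# A list or None value does not fit the annotation and is reported.
print(check_spark_config({"spark.jars": ["a.jar"]}))
```

Nested structures such as lists or dictionaries fall outside the annotated type, so values of that kind would need to be encoded as strings first.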
add_dataframe_asset(name: str, batch_metadata: Optional[BatchMetadata] = None) → DataFrameAsset
Adds a DataFrame DataAsset to this SparkDatasource object.
- Parameters
name – The name of the DataFrame asset. This can be any arbitrary string.
dataframe – The Spark DataFrame containing the data for this DataFrame data asset.
batch_metadata – An arbitrary user-defined dictionary with string keys that is inherited by any batches created from the asset.
- Returns
The DataFrameAsset that has been added to this datasource.
delete_asset(name: str) → None
Removes the DataAsset referred to by name from the internal list of available DataAsset objects.
- Parameters
name – The name of the DataAsset to delete.
get_asset(name: str) → great_expectations.datasource.fluent.interfaces._DataAssetT
Returns the DataAsset referred to by name.
- Parameters
name – The name of the DataAsset sought.
- Returns
_DataAssetT – The named DataAsset object, if it exists; otherwise, an exception is raised.
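The three methods above can be combined into a "get or add" pattern, sketched below under stated assumptions. The helper get_or_add_dataframe_asset is hypothetical and calls only the documented get_asset, add_dataframe_asset, and delete_asset methods. It assumes that get_asset signals a missing asset with LookupError, which this reference does not confirm (it says only that an exception is raised). InMemoryDatasource is a stand-in written for this example so that the sketch runs without a Spark cluster; it is not a great_expectations class.

```python
from typing import Any, Dict, Optional


def get_or_add_dataframe_asset(
    datasource: Any,
    name: str,
    batch_metadata: Optional[Dict[str, Any]] = None,
) -> Any:
    """Return the asset called `name`, adding it first if it is missing.

    Hypothetical helper. Catching LookupError is an assumption about which
    exception get_asset raises for an unknown name.
    """
    try:
        return datasource.get_asset(name)
    except LookupError:
        return datasource.add_dataframe_asset(name=name, batch_metadata=batch_metadata)


class InMemoryDatasource:
    """Stand-in exposing the three documented methods (not a real datasource)."""

    def __init__(self) -> None:
        self._assets: Dict[str, Dict[str, Any]] = {}

    def add_dataframe_asset(self, name, batch_metadata=None):
        # Store a plain dict in place of a real DataFrameAsset.
        asset = {"name": name, "batch_metadata": batch_metadata or {}}
        self._assets[name] = asset
        return asset

    def get_asset(self, name):
        try:
            return self._assets[name]
        except KeyError:
            raise LookupError(f"no asset named {name!r}") from None

    def delete_asset(self, name):
        del self._assets[name]


datasource = InMemoryDatasource()
first = get_or_add_dataframe_asset(datasource, "orders", {"team": "analytics"})
second = get_or_add_dataframe_asset(datasource, "orders")  # reuses the existing asset
print(first is second)
datasource.delete_asset("orders")  # the name can now be registered again
```

With a real SparkDatasource, the same helper would receive the datasource object in place of the stand-in, provided the exception assumption holds.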