SparkDatasource
class great_expectations.datasource.fluent.SparkDatasource(*, type: Literal['spark'] = 'spark', name: str, id: Optional[uuid.UUID] = None, assets: List[great_expectations.datasource.fluent.spark_datasource.DataFrameAsset] = [], spark_config: Optional[Dict[pydantic.v1.types.StrictStr, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool]]] = None, force_reuse_spark_context: bool = True, persist: bool = True)
add_dataframe_asset(name: str, dataframe: Optional[_SparkDataFrameT] = None, batch_metadata: Optional[BatchMetadata] = None) → DataFrameAsset
Adds a DataFrame DataAsset to this SparkDatasource object.
- Parameters
name – The name of the DataFrame asset. This can be any arbitrary string.
dataframe – The Spark DataFrame containing the data for this DataFrame data asset.
batch_metadata – An arbitrary user defined dictionary with string keys which will get inherited by any batches created from the asset.
Deprecated since version 0.16.15: The "dataframe" argument is no longer part of the "SparkDatasource.add_dataframe_asset()" method call; instead, "dataframe" is the required argument to the "DataFrameAsset.build_batch_request()" method.
- Returns
The DataFrameAsset that has been added to this datasource.