great_expectations.datasource.data_connector.runtime_data_connector

Module Contents

Classes

RuntimeDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, batch_identifiers: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

A DataConnector that allows users to specify a Batch’s data directly using a RuntimeBatchRequest that contains either an in-memory Pandas or Spark DataFrame, a filesystem or S3 path, or an arbitrary SQL query.

great_expectations.datasource.data_connector.runtime_data_connector.logger
great_expectations.datasource.data_connector.runtime_data_connector.DEFAULT_DELIMITER :str = -
class great_expectations.datasource.data_connector.runtime_data_connector.RuntimeDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, batch_identifiers: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.data_connector.DataConnector

A DataConnector that allows users to specify a Batch’s data directly using a RuntimeBatchRequest that contains either an in-memory Pandas or Spark DataFrame, a filesystem or S3 path, or an arbitrary SQL query.

Parameters
  • name (str) – The name of this DataConnector

  • datasource_name (str) – The name of the Datasource that contains it

  • execution_engine (ExecutionEngine) – An ExecutionEngine

  • batch_identifiers (list) – A list of keys that must be defined in the batch_identifiers dict of RuntimeBatchRequest

  • batch_spec_passthrough (dict) – A dictionary with keys that will be added directly to batch_spec
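
As a rough illustration of how these parameters relate to a request, the sketch below writes a connector configuration and a matching request as plain Python dictionaries. The names my_datasource, my_runtime_connector, my_asset, and run_id, as well as the path and passthrough values, are hypothetical. The dictionaries only mirror the constructor arguments listed above and the request fields assumed here; they do not replace the library's own classes.

```python
# Hypothetical connector configuration mirroring the constructor parameters.
connector_config = {
    "class_name": "RuntimeDataConnector",
    "name": "my_runtime_connector",
    "datasource_name": "my_datasource",
    "batch_identifiers": ["run_id"],  # keys a request may use to label its batch
    "batch_spec_passthrough": {"reader_method": "read_csv"},  # illustrative value
}

# Request-shaped dictionary; the field names are assumptions for illustration.
runtime_batch_request = {
    "datasource_name": "my_datasource",
    "data_connector_name": "my_runtime_connector",
    "data_asset_name": "my_asset",
    "runtime_parameters": {"path": "data/example.csv"},
    "batch_identifiers": {"run_id": "2021-03-01"},
}

# The request's identifier keys should line up with the configured list.
unknown_keys = set(runtime_batch_request["batch_identifiers"]) - set(
    connector_config["batch_identifiers"]
)
```

Here unknown_keys is empty because the request uses only the configured run_id key.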

_refresh_data_references_cache(self)
_get_data_reference_list(self, data_asset_name: Optional[str] = None)

List objects in the cache to create a list of data_references. If data_asset_name is passed in, the method returns all data_references for the named data_asset. Otherwise, it returns a list of all data_references for all data_assets in the cache.

_get_data_reference_list_from_cache_by_data_asset_name(self, data_asset_name: str)

Fetch data_references corresponding to data_asset_name from the cache.

get_data_reference_list_count(self)

Get number of data_references corresponding to all data_asset_names in cache. In cases where the RuntimeDataConnector has been passed a BatchRequest with the same data_asset_name but different batch_identifiers, it is possible to have more than one data_reference for a data_asset.
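
The listing and counting behavior described above can be sketched with a toy cache. The nested-dictionary layout (data_asset_name, then data_reference, then a list of batch definitions) and the helper names are assumptions made for illustration, not the connector's actual internals.

```python
from typing import Dict, List, Optional

# Toy cache: data_asset_name -> data_reference -> list of batch definitions.
# This layout is assumed for the sketch.
cache: Dict[str, Dict[str, List[dict]]] = {
    "asset_a": {
        "2021-03-01": [{"batch_identifiers": {"run_id": "2021-03-01"}}],
        "2021-03-02": [{"batch_identifiers": {"run_id": "2021-03-02"}}],
    },
    "asset_b": {
        "2021-03-01": [{"batch_identifiers": {"run_id": "2021-03-01"}}],
    },
}


def data_reference_list(
    cache: Dict[str, Dict[str, List[dict]]], data_asset_name: Optional[str] = None
) -> List[str]:
    """Return references for one asset, or for every asset when no name is given."""
    if data_asset_name is not None:
        return list(cache.get(data_asset_name, {}).keys())
    return [ref for refs in cache.values() for ref in refs]


def data_reference_count(cache: Dict[str, Dict[str, List[dict]]]) -> int:
    """Count references across all assets; one asset may hold several."""
    return sum(len(refs) for refs in cache.values())
```

In this toy cache, asset_a holds two data_references because it was registered with two different batch_identifiers values.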

get_unmatched_data_references(self)
get_available_data_asset_names(self)

Please see the note in _get_batch_definition_list_from_batch_request().

get_batch_data_and_metadata(self, batch_definition: BatchDefinition, runtime_parameters: dict)

Uses batch_definition to retrieve batch_data and batch_markers by building a batch_spec from batch_definition, then using execution_engine to return batch_data and batch_markers.

Parameters

batch_definition (BatchDefinition) – required batch_definition parameter for retrieval

get_batch_definition_list_from_batch_request(self, batch_request: RuntimeBatchRequest)
_get_batch_definition_list_from_batch_request(self, batch_request: BatchRequest)

<Will> 202103. The following behavior of the _data_references_cache follows a pattern that we are using for other data_connectors, including variations of FilePathDataConnector. When a BatchRequest contains batch_data that is passed in as an in-memory dataframe, the cache will contain the names of all data_assets (and data_references) that have been passed into the RuntimeDataConnector in this session, even though technically only the most recent batch_data is available. This can be misleading. However, letting the RuntimeDataConnector keep a record of all data_assets (and data_references) that have been passed in supports the proposed behavior of RuntimeBatchRequest, which will allow paths and queries to be passed in as part of the BatchRequest. Therefore this behavior will be revisited when the design of RuntimeBatchRequest and related classes is complete.

_update_data_references_cache(self, data_asset_name: str, batch_definition_list: List, batch_identifiers: IDDict)
_self_check_fetch_batch(self, pretty_print, example_data_reference, data_asset_name)

Helper function for self_check() to retrieve batch using example_data_reference and data_asset_name, all while printing helpful messages. First 5 rows of batch_data are printed by default.

Parameters
  • pretty_print (bool) – print to console?

  • example_data_reference (Any) – data_reference to retrieve

  • data_asset_name (str) – data_asset_name to retrieve

_generate_batch_spec_parameters_from_batch_definition(self, batch_definition: BatchDefinition)
build_batch_spec(self, batch_definition: BatchDefinition, runtime_parameters: dict)

Builds batch_spec from batch_definition by generating batch_spec params and adding any batch_spec_passthrough params.

Parameters

batch_definition (BatchDefinition) – required batch_definition parameter for retrieval

Returns

BatchSpec object built from BatchDefinition
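
The two steps named above (generate parameters from the batch definition, then add passthrough parameters) can be sketched as a dictionary merge. This is a simplified model under stated assumptions: the function name is hypothetical, the batch definition is a plain dict, and merging runtime_parameters last is a guess at how they might be attached rather than documented behavior.

```python
from typing import Optional


def build_batch_spec_sketch(
    batch_definition: dict,
    runtime_parameters: dict,
    batch_spec_passthrough: Optional[dict] = None,
) -> dict:
    """Toy version of the build steps; returns a plain dict, not a BatchSpec."""
    # Step 1: derive base parameters from the batch definition.
    batch_spec = {"data_asset_name": batch_definition["data_asset_name"]}
    # Step 2: add any passthrough keys directly to the spec.
    batch_spec.update(batch_spec_passthrough or {})
    # Step 3 (assumed): attach the runtime parameters such as a path or query.
    batch_spec.update(runtime_parameters)
    return batch_spec


spec = build_batch_spec_sketch(
    {"data_asset_name": "my_asset"},
    {"path": "data/example.csv"},
    {"reader_method": "read_csv"},
)
```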

static _get_data_reference_name(batch_identifiers: IDDict)
static _validate_runtime_parameters(runtime_parameters: Union[dict, type(None)])
_validate_batch_request(self, batch_request: BatchRequestBase)

Validates batch_request by checking:
  1. whether the configured datasource_name matches batch_request’s datasource_name

  2. whether the current data_connector_name matches batch_request’s data_connector_name

Parameters

batch_request (BatchRequest) – batch_request to validate
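
The two checks listed above can be sketched as follows. The function name, the dict-based request, and the use of ValueError are assumptions for illustration; the library may raise its own exception types.

```python
def validate_batch_request_sketch(
    batch_request: dict, datasource_name: str, data_connector_name: str
) -> None:
    """Raise ValueError if the request targets a different datasource or connector."""
    # Check 1: the request must name the datasource this connector belongs to.
    if batch_request.get("datasource_name") != datasource_name:
        raise ValueError(
            f"datasource_name {batch_request.get('datasource_name')!r} "
            f"does not match {datasource_name!r}"
        )
    # Check 2: the request must name this data connector.
    if batch_request.get("data_connector_name") != data_connector_name:
        raise ValueError(
            f"data_connector_name {batch_request.get('data_connector_name')!r} "
            f"does not match {data_connector_name!r}"
        )
```

A request whose two names match passes silently; any mismatch raises before a batch is built.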

_validate_batch_identifiers(self, batch_identifiers: dict)
_validate_batch_identifiers_configuration(self, batch_identifiers: List[str])
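
The page gives no description for the two batch_identifiers validators. As a hedged sketch of what such a check might look like, the hypothetical function below assumes that every key in a request's batch_identifiers must appear in the connector's configured batch_identifiers list. Both the subset rule and the ValueError are assumptions.

```python
from typing import List, Optional


def validate_batch_identifiers_sketch(
    configured_keys: List[str], batch_identifiers: Optional[dict]
) -> None:
    """Raise ValueError when a request uses identifier keys that were not configured."""
    requested = set(batch_identifiers or {})
    unknown = requested - set(configured_keys)
    if unknown:
        raise ValueError(
            f"batch_identifiers keys {sorted(unknown)} are not configured; "
            f"configured keys: {sorted(configured_keys)}"
        )
```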