great_expectations.datasource.data_connector

Package Contents

Classes

ConfiguredAssetFilePathDataConnector(name: str, datasource_name: str, assets: dict, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

The ConfiguredAssetFilePathDataConnector is one of two classes (InferredAssetFilePathDataConnector being the other) designed for connecting to filesystem-like data.

ConfiguredAssetFilesystemDataConnector(name: str, datasource_name: str, base_directory: str, assets: dict, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, glob_directive: str = '**/*', sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Extension of ConfiguredAssetFilePathDataConnector used to connect to a filesystem

ConfiguredAssetS3DataConnector(name: str, datasource_name: str, bucket: str, assets: dict, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = '', delimiter: Optional[str] = '/', max_keys: Optional[int] = 1000, boto3_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None)

Extension of ConfiguredAssetFilePathDataConnector used to connect to S3

ConfiguredAssetSqlDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, assets: Optional[Dict[str, dict]] = None, batch_spec_passthrough: Optional[dict] = None)

A DataConnector that requires explicit listing of SQL tables you want to connect to.

DataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, batch_spec_passthrough: Optional[dict] = None)

DataConnectors produce identifying information, called “batch_spec”, that ExecutionEngines can use to get individual batches of data.

FilePathDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Base class for DataConnectors that are designed for connecting to filesystem-like data, which can include files on disk, but also S3 and GCS.

InferredAssetFilePathDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

The InferredAssetFilePathDataConnector is one of two classes (ConfiguredAssetFilePathDataConnector being the other) designed for connecting to filesystem-like data.

InferredAssetFilesystemDataConnector(name: str, datasource_name: str, base_directory: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, glob_directive: Optional[str] = '*', sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Extension of InferredAssetFilePathDataConnector used to connect to data on a filesystem.

InferredAssetS3DataConnector(name: str, datasource_name: str, bucket: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = '', delimiter: Optional[str] = '/', max_keys: Optional[int] = 1000, boto3_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None)

Extension of InferredAssetFilePathDataConnector used to connect to S3

InferredAssetSqlDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, data_asset_name_prefix: Optional[str] = '', data_asset_name_suffix: Optional[str] = '', include_schema_name: Optional[bool] = False, splitter_method: Optional[str] = None, splitter_kwargs: Optional[dict] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, excluded_tables: Optional[list] = None, included_tables: Optional[list] = None, skip_inapplicable_tables: Optional[bool] = True, introspection_directives: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None)

A DataConnector that infers data_asset names by introspecting a SQL database

RuntimeDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, batch_identifiers: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

A DataConnector that allows users to specify a Batch’s data directly using a RuntimeBatchRequest that contains either an in-memory Pandas or Spark DataFrame, a filesystem or S3 path, or an arbitrary SQL query.

class great_expectations.datasource.data_connector.ConfiguredAssetFilePathDataConnector(name: str, datasource_name: str, assets: dict, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.file_path_data_connector.FilePathDataConnector

The ConfiguredAssetFilePathDataConnector is one of two classes (InferredAssetFilePathDataConnector being the other) designed for connecting to filesystem-like data. This includes files on disk, as well as object stores such as S3.

A ConfiguredAssetFilePathDataConnector requires an explicit listing of each DataAsset you want to connect to. This allows more fine-tuning, but also requires more setup.

Note: ConfiguredAssetFilePathDataConnector is not meant to be used on its own, but extended. Currently ConfiguredAssetFilesystemDataConnector and ConfiguredAssetS3DataConnector are subclasses of ConfiguredAssetFilePathDataConnector.

property assets(self)
_build_assets_from_config(self, config: Dict[str, dict])
_build_asset_from_config(self, config: dict)
get_available_data_asset_names(self)

Return the list of asset names known by this DataConnector.

Returns

A list of available names

_refresh_data_references_cache(self)
_get_data_reference_list(self, data_asset_name: Optional[str] = None)

List objects in the underlying data store to create a list of data_references. This method is used to refresh the cache.

get_data_reference_list_count(self)

Returns the number of data_references known by this DataConnector by looping over all data_asset_names in _data_references_cache.

Returns

number of data_references known by this DataConnector.

get_unmatched_data_references(self)

Returns the list of data_references unmatched by configuration by looping through items in _data_references_cache and returning data_references that do not have an associated data_asset.

Returns

list of data_references that are not matched by configuration.
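The two cache-based methods above can be sketched roughly as follows. The cache layout used here (data_asset_name mapped to data_reference mapped to a list of batch definitions, with None marking a reference that nothing matched) and both helper names are assumptions for illustration, not the library's internals.

```python
from typing import Dict, List, Optional

# Hypothetical cache layout: asset name -> data_reference -> batch definitions (or None).
Cache = Dict[str, Dict[str, Optional[List[dict]]]]


def count_data_references(cache: Cache) -> int:
    """Count every data_reference by looping over all data_asset_names."""
    return sum(len(references) for references in cache.values())


def unmatched_data_references(cache: Cache) -> List[str]:
    """Collect the data_references that have no batch definitions attached."""
    return [
        reference
        for references in cache.values()
        for reference, batch_definitions in references.items()
        if batch_definitions is None
    ]


# Made-up cache contents for the demonstration.
cache: Cache = {
    "alpha": {
        "alpha/2021-01.csv": [{"batch_identifiers": {"month": "2021-01"}}],
        "alpha/readme.txt": None,
    },
    "beta": {"beta/2021-01.csv": [{"batch_identifiers": {"month": "2021-01"}}]},
}

print(count_data_references(cache))
print(unmatched_data_references(cache))
```

In this sketch the count includes unmatched references, since it simply counts keys in the cache; whether the real method does the same is not stated above.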

_get_batch_definition_list_from_cache(self)
_get_full_file_path(self, path: str, data_asset_name: Optional[str] = None)
_get_regex_config(self, data_asset_name: Optional[str] = None)
_get_asset(self, data_asset_name: str)
abstract _get_data_reference_list_for_asset(self, asset: Optional[Asset])
abstract _get_full_file_path_for_asset(self, path: str, asset: Optional[Asset])
build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

BatchSpec built from batch_definition

class great_expectations.datasource.data_connector.ConfiguredAssetFilesystemDataConnector(name: str, datasource_name: str, base_directory: str, assets: dict, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, glob_directive: str = '**/*', sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.ConfiguredAssetFilePathDataConnector

Extension of ConfiguredAssetFilePathDataConnector used to connect to a filesystem

The ConfiguredAssetFilesystemDataConnector is one of two classes (InferredAssetFilesystemDataConnector being the other one) designed for connecting to data on a filesystem. It connects to assets defined by the assets configuration.

A ConfiguredAssetFilesystemDataConnector requires an explicit listing of each DataAsset you want to connect to. This allows more fine-tuning, but also requires more setup.
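To make the "explicit listing" concrete, here is a minimal sketch of keyword arguments for this constructor. The top-level keys come from the signature above; all values are made up, and the "pattern" and "group_names" keys inside default_regex are an assumption about that dict's shape rather than something this page specifies. The sketch only builds and checks a plain dict and does not instantiate the class.

```python
# Constructor parameters taken from the signature documented above.
SIGNATURE_PARAMETERS = {
    "name", "datasource_name", "base_directory", "assets", "execution_engine",
    "default_regex", "glob_directive", "sorters", "batch_spec_passthrough",
}

# Hypothetical values: every name, path, and pattern below is made up.
connector_kwargs = {
    "name": "my_filesystem_connector",
    "datasource_name": "my_datasource",
    "base_directory": "data",
    "glob_directive": "**/*",
    # Assumed shape of default_regex: a pattern plus names for its groups.
    "default_regex": {
        "pattern": r"(\d{4})-(\d{2})\.csv",
        "group_names": ["year", "month"],
    },
    # Each DataAsset is listed explicitly; nothing is inferred from file names.
    "assets": {
        "taxi_trips": {},
        "weather": {},
    },
}

# Sanity check: the sketch only uses parameters the signature declares.
unknown_parameters = set(connector_kwargs) - SIGNATURE_PARAMETERS
print(sorted(connector_kwargs["assets"]))
print(unknown_parameters)
```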

_get_data_reference_list_for_asset(self, asset: Optional[Asset])
_get_full_file_path_for_asset(self, path: str, asset: Optional[Asset] = None)
property base_directory(self)

Accessor method for base_directory. If directory is a relative path, interpret it as relative to the root directory. If it is absolute, then keep as-is.
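The relative-versus-absolute rule described for base_directory can be sketched as below. The function name and the root_directory parameter are hypothetical stand-ins for whatever root the connector resolves against; PurePosixPath is used only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def resolve_base_directory(base_directory: str, root_directory: str) -> PurePosixPath:
    """Return base_directory unchanged if absolute, else join it onto root_directory."""
    path = PurePosixPath(base_directory)
    if path.is_absolute():
        return path
    return PurePosixPath(root_directory) / path


# Hypothetical directories for the demonstration.
print(resolve_base_directory("data/taxi", "/srv/project"))
print(resolve_base_directory("/mnt/shared/taxi", "/srv/project"))
```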

class great_expectations.datasource.data_connector.ConfiguredAssetS3DataConnector(name: str, datasource_name: str, bucket: str, assets: dict, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = '', delimiter: Optional[str] = '/', max_keys: Optional[int] = 1000, boto3_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.ConfiguredAssetFilePathDataConnector

Extension of ConfiguredAssetFilePathDataConnector used to connect to S3

DataConnectors produce identifying information, called “batch_spec” that ExecutionEngines can use to get individual batches of data. They add flexibility in how to obtain data such as with time-based partitioning, downsampling, or other techniques appropriate for the Datasource.

The ConfiguredAssetS3DataConnector is one of two classes (InferredAssetS3DataConnector being the other one) designed for connecting to data on S3.

A ConfiguredAssetS3DataConnector requires an explicit listing of each DataAsset you want to connect to. This allows more fine-tuning, but also requires more setup.

build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

BatchSpec built from batch_definition

_get_data_reference_list_for_asset(self, asset: Optional[Asset])
_get_full_file_path(self, path: str, data_asset_name: Optional[str] = None)
class great_expectations.datasource.data_connector.ConfiguredAssetSqlDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, assets: Optional[Dict[str, dict]] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.data_connector.DataConnector

A DataConnector that requires explicit listing of SQL tables you want to connect to.

Parameters
  • name (str) – The name of this DataConnector

  • datasource_name (str) – The name of the Datasource that contains it

  • execution_engine (ExecutionEngine) – An ExecutionEngine

  • assets (dict) – dictionary that maps each data_asset name to its data_asset configuration

  • batch_spec_passthrough (dict) – dictionary with keys that will be added directly to batch_spec

property assets(self)
add_data_asset(self, name: str, config: dict)

Add data_asset to DataConnector using data_asset name as key, and data_asset configuration as value.

_get_batch_identifiers_list_from_data_asset_config(self, data_asset_name, data_asset_config)
_refresh_data_references_cache(self)
_get_column_names_from_splitter_kwargs(self, splitter_kwargs)
get_available_data_asset_names(self)

Return the list of asset names known by this DataConnector.

Returns

A list of available names

get_unmatched_data_references(self)

Returns the list of data_references unmatched by configuration by looping through items in _data_references_cache and returning data_references that do not have an associated data_asset.

Returns

list of data_references that are not matched by configuration.

get_batch_definition_list_from_batch_request(self, batch_request: BatchRequest)
_get_data_reference_list_from_cache_by_data_asset_name(self, data_asset_name: str)

Fetch data_references corresponding to data_asset_name from the cache.

_map_data_reference_to_batch_definition_list(self, data_reference, data_asset_name: Optional[str] = None)
build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

BatchSpec built from batch_definition

_generate_batch_spec_parameters_from_batch_definition(self, batch_definition: BatchDefinition)
Build BatchSpec parameters from batch_definition with the following components:
  1. data_asset_name from batch_definition

  2. batch_identifiers from batch_definition

  3. data_asset from data_connector

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

dict built from batch_definition

_split_on_whole_table(self, table_name: str)

‘Split’ by returning the whole table

Note: the table_name parameter is required to keep the signature of this method consistent with other methods.

_split_on_column_value(self, table_name: str, column_name: str)

Split using the values in the named column

_split_on_converted_datetime(self, table_name: str, column_name: str, date_format_string: str = '%Y-%m-%d')

Convert the values in the named column to the given date_format, and split on that

_split_on_divided_integer(self, table_name: str, column_name: str, divisor: int)

Divide the values in the named column by divisor, and split on that

_split_on_mod_integer(self, table_name: str, column_name: str, mod: int)

Take the values in the named column modulo mod, and split on that

_split_on_multi_column_values(self, table_name: str, column_names: List[str])

Split on the joint values in the named columns
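The splitter descriptions above can be compared side by side on in-memory rows. This is a plain-Python sketch of the grouping each method describes, not the SQL the connector generates; the row data, the split_rows helper, and the use of integer division for the "divided integer" case are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Hashable, List

Row = Dict[str, object]


def split_rows(rows: List[Row], key: Callable[[Row], Hashable]) -> Dict[Hashable, List[int]]:
    """Group row ids by the value of key(row); each group stands for one batch."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["id"])
    return dict(groups)


# Made-up rows standing in for a SQL table.
rows: List[Row] = [
    {"id": 3, "vendor": "a", "ts": datetime(2021, 1, 1, 9)},
    {"id": 7, "vendor": "b", "ts": datetime(2021, 1, 1, 17)},
    {"id": 13, "vendor": "a", "ts": datetime(2021, 1, 2, 8)},
]

by_column_value = split_rows(rows, lambda r: r["vendor"])
by_converted_datetime = split_rows(rows, lambda r: r["ts"].strftime("%Y-%m-%d"))
by_divided_integer = split_rows(rows, lambda r: r["id"] // 10)  # divisor = 10
by_mod_integer = split_rows(rows, lambda r: r["id"] % 10)       # mod = 10
by_multi_column = split_rows(rows, lambda r: (r["vendor"], r["ts"].strftime("%Y-%m-%d")))

print(by_divided_integer)
print(by_mod_integer)
```

Note how dividing by 10 groups ids 3 and 7 together, while taking them modulo 10 groups ids 3 and 13 together.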

_split_on_hashed_column(self, table_name: str, column_name: str, hash_digits: int)

Note: this method is experimental. It does not work with all SQL dialects.

class great_expectations.datasource.data_connector.DataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, batch_spec_passthrough: Optional[dict] = None)

DataConnectors produce identifying information, called “batch_spec” that ExecutionEngines can use to get individual batches of data. They add flexibility in how to obtain data such as with time-based partitioning, downsampling, or other techniques appropriate for the Datasource.

For example, a DataConnector could produce a SQL query that logically represents “rows in the Events table with a timestamp on February 7, 2012,” which a SqlAlchemyDatasource could use to materialize a SqlAlchemyDataset corresponding to that batch of data and ready for validation.

A batch is a sample from a data asset, sliced according to a particular rule. For example, an hourly slice of the Events table or “most recent users records.”

A Batch is the primary unit of validation in the Great Expectations DataContext. Batches include metadata that identifies how they were constructed: the same “batch_spec” assembled by the data connector. While not every Datasource will enable re-fetching a specific batch of data, GE can store snapshots of batches or store metadata from an external data version control system.
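The "rows in the Events table with a timestamp on February 7, 2012" example can be illustrated with the standard-library sqlite3 module. The table layout, column names, and sample rows are made up, and the query is only one possible way to express that batch; it does not show what a DataConnector actually emits.

```python
import sqlite3

# In-memory database with a hypothetical events table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE events (id INTEGER, timestamp TEXT)")
connection.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2012-02-06 23:59:00"),
        (2, "2012-02-07 00:00:00"),
        (3, "2012-02-07 18:30:00"),
        (4, "2012-02-08 00:00:00"),
    ],
)

# A query that logically represents the batch for February 7, 2012.
# ISO-formatted timestamps compare correctly as text.
batch_query = (
    "SELECT id FROM events "
    "WHERE timestamp >= '2012-02-07' AND timestamp < '2012-02-08' "
    "ORDER BY id"
)
batch_ids = [row[0] for row in connection.execute(batch_query)]
print(batch_ids)
```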

property batch_spec_passthrough(self)
property name(self)
property datasource_name(self)
property data_context_root_directory(self)
get_batch_data_and_metadata(self, batch_definition: BatchDefinition)

Uses batch_definition to retrieve batch_data and batch_markers by building a batch_spec from batch_definition, then using execution_engine to return batch_data and batch_markers

Parameters

batch_definition (BatchDefinition) – required batch_definition parameter for retrieval

build_batch_spec(self, batch_definition: BatchDefinition)

Builds batch_spec from batch_definition by generating batch_spec params and adding any pass_through params

Parameters

batch_definition (BatchDefinition) – required batch_definition parameter for retrieval

Returns

BatchSpec object built from BatchDefinition
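As a rough picture of "generating batch_spec params and adding any pass_through params", the sketch below merges two plain dicts. The function name, the example keys ("path", "reader_options"), and the choice to let pass-through values win on a key conflict are all assumptions; the real method returns a BatchSpec object rather than a dict.

```python
from typing import Optional


def merge_batch_spec(generated_parameters: dict, batch_spec_passthrough: Optional[dict] = None) -> dict:
    """Combine generated parameters with optional pass-through keys."""
    batch_spec = dict(generated_parameters)  # copy so the input is not mutated
    batch_spec.update(batch_spec_passthrough or {})
    return batch_spec


# Hypothetical parameters for the demonstration.
generated = {"path": "data/alpha/2021-01.csv"}
passthrough = {"reader_options": {"sep": "|"}}

print(merge_batch_spec(generated, passthrough))
print(merge_batch_spec(generated))
```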

abstract _refresh_data_references_cache(self)
abstract _get_data_reference_list(self, data_asset_name: Optional[str] = None)

List objects in the underlying data store to create a list of data_references. This method is used to refresh the cache by classes that extend this base DataConnector class

Parameters

data_asset_name (str) – optional data_asset_name to retrieve more specific results

abstract _get_data_reference_list_from_cache_by_data_asset_name(self, data_asset_name: str)

Fetch data_references corresponding to data_asset_name from the cache.

abstract get_data_reference_list_count(self)
abstract get_unmatched_data_references(self)
abstract get_available_data_asset_names(self)

Return the list of asset names known by this data connector.

Returns

A list of available names

abstract get_batch_definition_list_from_batch_request(self, batch_request: BatchRequest)
abstract _map_data_reference_to_batch_definition_list(self, data_reference: Any, data_asset_name: Optional[str] = None)
abstract _map_batch_definition_to_data_reference(self, batch_definition: BatchDefinition)
abstract _generate_batch_spec_parameters_from_batch_definition(self, batch_definition: BatchDefinition)
self_check(self, pretty_print=True, max_examples=3)

Checks the configuration of the current DataConnector by doing the following:

  1. refresh or create data_reference_cache

  2. print batch_definition_count and example_data_references for each data_asset_name

  3. also print unmatched data_references, and allow the user to modify the regex or glob configuration if necessary

  4. select a random data_reference and attempt to retrieve and print the first few rows to user

When used as part of the test_yaml_config() workflow, the user will be able to know if the data_connector is properly configured, and if the associated execution_engine can properly retrieve data using the configuration.

Parameters
  • pretty_print (bool) – should the output be printed?

  • max_examples (int) – how many data_references should be printed?
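Steps 2 and 3 of self_check() amount to a summary capped at max_examples entries per list. The sketch below builds such a summary from plain lists; the report keys, the helper name, and the assumption of one batch definition per data_reference are illustrative only and are not taken from the library.

```python
from typing import Dict, List


def summarize_references(
    references_by_asset: Dict[str, List[str]],
    unmatched: List[str],
    max_examples: int = 3,
) -> dict:
    """Build a per-asset summary plus a summary of unmatched data_references."""
    return {
        "data_assets": {
            name: {
                # Assumes one batch definition per data_reference.
                "batch_definition_count": len(references),
                "example_data_references": sorted(references)[:max_examples],
            }
            for name, references in references_by_asset.items()
        },
        "unmatched_data_reference_count": len(unmatched),
        "example_unmatched_data_references": sorted(unmatched)[:max_examples],
    }


# Made-up data_references for the demonstration.
report = summarize_references(
    {"alpha": ["alpha/a.csv", "alpha/b.csv", "alpha/c.csv"], "beta": ["beta/a.csv"]},
    unmatched=["notes.txt"],
    max_examples=2,
)
print(report)
```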

_self_check_fetch_batch(self, pretty_print: bool, example_data_reference: Any, data_asset_name: str)

Helper function for self_check() to retrieve batch using example_data_reference and data_asset_name, all while printing helpful messages. First 5 rows of batch_data are printed by default.

Parameters
  • pretty_print (bool) – print to console?

  • example_data_reference (Any) – data_reference to retrieve

  • data_asset_name (str) – data_asset_name to retrieve

_validate_batch_request(self, batch_request: BatchRequest)
Validate batch_request by checking:
  1. if configured datasource_name matches batch_request’s datasource_name

  2. if current data_connector_name matches batch_request’s data_connector_name

Parameters

batch_request (BatchRequest) – batch_request to validate
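The two checks listed for _validate_batch_request() can be sketched with a small stand-in class. BatchRequestSketch, the function name, and the choice of ValueError are hypothetical; the exception type the library raises is not stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchRequestSketch:
    """Simplified stand-in for a BatchRequest (hypothetical)."""
    datasource_name: str
    data_connector_name: str
    data_asset_name: str


def validate_batch_request(
    request: BatchRequestSketch, datasource_name: str, data_connector_name: str
) -> None:
    """Raise ValueError if the request targets another datasource or connector."""
    if request.datasource_name != datasource_name:
        raise ValueError(
            f"datasource_name {request.datasource_name!r} does not match {datasource_name!r}"
        )
    if request.data_connector_name != data_connector_name:
        raise ValueError(
            f"data_connector_name {request.data_connector_name!r} does not match {data_connector_name!r}"
        )


good_request = BatchRequestSketch("my_datasource", "my_connector", "alpha")
bad_request = BatchRequestSketch("my_datasource", "other_connector", "alpha")

validate_batch_request(good_request, "my_datasource", "my_connector")  # passes silently
try:
    validate_batch_request(bad_request, "my_datasource", "my_connector")
except ValueError as error:
    print(error)
```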

class great_expectations.datasource.data_connector.FilePathDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.data_connector.DataConnector

Base class for DataConnectors that are designed for connecting to filesystem-like data, which can include files on disk, but also S3 and GCS.

Note: FilePathDataConnector is not meant to be used on its own, but extended. Currently ConfiguredAssetFilePathDataConnector and InferredAssetFilePathDataConnector are subclasses of FilePathDataConnector.

property sorters(self)
_get_data_reference_list_from_cache_by_data_asset_name(self, data_asset_name: str)

Fetch data_references corresponding to data_asset_name from the cache.

get_batch_definition_list_from_batch_request(self, batch_request: BatchRequest)

Retrieve batch_definitions that match batch_request.

First retrieves all batch_definitions that match batch_request
  • if batch_request also has a batch_filter, then select batch_definitions that match batch_filter.

  • if data_connector has sorters configured, then sort the batch_definition list before returning.

Parameters

batch_request (BatchRequest) – BatchRequest (containing previously validated attributes) to process

Returns

A list of BatchDefinition objects that match BatchRequest
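The match, filter, and sort sequence described above can be sketched over plain dicts. The dict layout, the callable batch_filter, and the representation of sorters as (identifier name, descending) pairs are assumptions chosen to keep the example short; they do not mirror the library's sorter or filter classes.

```python
from typing import Callable, List, Optional, Tuple


def select_batch_definitions(
    batch_definitions: List[dict],
    data_asset_name: str,
    batch_filter: Optional[Callable[[dict], bool]] = None,
    sorters: Optional[List[Tuple[str, bool]]] = None,
) -> List[dict]:
    """Match on asset name, apply the optional filter, then apply the optional sorters."""
    selected = [d for d in batch_definitions if d["data_asset_name"] == data_asset_name]
    if batch_filter is not None:
        selected = [d for d in selected if batch_filter(d["batch_identifiers"])]
    # Apply sorters last-to-first; list.sort is stable, so earlier sorters take priority.
    for key, descending in reversed(sorters or []):
        selected.sort(key=lambda d, key=key: d["batch_identifiers"][key], reverse=descending)
    return selected


# Made-up batch definitions for the demonstration.
definitions = [
    {"data_asset_name": "alpha", "batch_identifiers": {"year": "2020", "month": "01"}},
    {"data_asset_name": "alpha", "batch_identifiers": {"year": "2021", "month": "01"}},
    {"data_asset_name": "alpha", "batch_identifiers": {"year": "2021", "month": "02"}},
    {"data_asset_name": "beta", "batch_identifiers": {"year": "2021", "month": "01"}},
]

result = select_batch_definitions(
    definitions,
    "alpha",
    batch_filter=lambda identifiers: identifiers["year"] == "2021",
    sorters=[("month", True)],  # newest month first
)
print([d["batch_identifiers"]["month"] for d in result])
```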

_get_batch_definition_list_from_batch_request(self, batch_request: BatchRequestBase)

Retrieve batch_definitions that match batch_request.

First retrieves all batch_definitions that match batch_request
  • if batch_request also has a batch_filter, then select batch_definitions that match batch_filter.

  • if data_connector has sorters configured, then sort the batch_definition list before returning.

Parameters

batch_request (BatchRequestBase) – BatchRequestBase (BatchRequest without attribute validation) to process

Returns

A list of BatchDefinition objects that match BatchRequest

_sort_batch_definition_list(self, batch_definition_list: List[BatchDefinition])

Use the configured sorters to sort the batch_definition list

Parameters

batch_definition_list (list) – list of batch_definitions to sort

Returns

sorted list of batch_definitions

_map_data_reference_to_batch_definition_list(self, data_reference: str, data_asset_name: str = None)
_map_batch_definition_to_data_reference(self, batch_definition: BatchDefinition)
build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

BatchSpec built from batch_definition

_generate_batch_spec_parameters_from_batch_definition(self, batch_definition: BatchDefinition)
_validate_batch_request(self, batch_request: BatchRequestBase)
Validate batch_request by checking:
  1. if configured datasource_name matches batch_request’s datasource_name

  2. if current data_connector_name matches batch_request’s data_connector_name

Parameters

batch_request (BatchRequestBase) – batch_request to validate

_validate_sorters_configuration(self, data_asset_name: Optional[str] = None)
abstract _get_batch_definition_list_from_cache(self)
abstract _get_regex_config(self, data_asset_name: Optional[str] = None)
abstract _get_full_file_path(self, path: str, data_asset_name: Optional[str] = None)
class great_expectations.datasource.data_connector.InferredAssetFilePathDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.FilePathDataConnector

The InferredAssetFilePathDataConnector is one of two classes (ConfiguredAssetFilePathDataConnector being the other one) designed for connecting to filesystem-like data. This includes files on disk, as well as object stores such as S3.

InferredAssetFilePathDataConnector is a base class that operates on file paths and determines the data_asset_name implicitly (e.g., through the combination of the regular expression pattern and group names).

Note: InferredAssetFilePathDataConnector is not meant to be used on its own, but extended. Currently InferredAssetFilesystemDataConnector and InferredAssetS3DataConnector are subclasses of InferredAssetFilePathDataConnector.
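A small sketch of how a data_asset_name might be determined implicitly from a pattern and its group names is shown below. The file names, the pattern, the infer_assets helper, and the use of re.fullmatch are assumptions for illustration; the sketch also assumes one group is named data_asset_name and treats the remaining groups as batch identifiers.

```python
import re
from typing import Dict, List, Tuple


def infer_assets(
    paths: List[str], pattern: str, group_names: List[str]
) -> Tuple[Dict[str, List[dict]], List[str]]:
    """Map each matching path to an asset name and the remaining named groups."""
    regex = re.compile(pattern)
    assets: Dict[str, List[dict]] = {}
    unmatched: List[str] = []
    for path in paths:
        match = regex.fullmatch(path)
        if match is None:
            unmatched.append(path)
            continue
        groups = dict(zip(group_names, match.groups()))
        asset_name = groups.pop("data_asset_name")
        assets.setdefault(asset_name, []).append(groups)
    return assets, unmatched


# Made-up file names and pattern for the demonstration.
assets, unmatched = infer_assets(
    ["yellow_2021-01.csv", "yellow_2021-02.csv", "green_2021-01.csv", "README.md"],
    pattern=r"(.+)_(\d{4})-(\d{2})\.csv",
    group_names=["data_asset_name", "year", "month"],
)
print(assets)
print(unmatched)
```

Paths that do not match the pattern end up in the unmatched list, which parallels what get_unmatched_data_references() is described as returning.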

_refresh_data_references_cache(self)

Refreshes the data_reference cache.

get_data_reference_list_count(self)

Returns the number of data_references known by this DataConnector by looping over all data_asset_names in _data_references_cache.

Returns

number of data_references known by this DataConnector

get_unmatched_data_references(self)

Returns the list of data_references unmatched by configuration by looping through items in _data_references_cache and returning data_references that do not have an associated data_asset.

Returns

list of data_references that are not matched by configuration.

get_available_data_asset_names(self)

Return the list of asset names known by this DataConnector

Returns

A list of available names

build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

BatchSpec built from batch_definition

_get_batch_definition_list_from_cache(self)
_get_regex_config(self, data_asset_name: Optional[str] = None)
class great_expectations.datasource.data_connector.InferredAssetFilesystemDataConnector(name: str, datasource_name: str, base_directory: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, glob_directive: Optional[str] = '*', sorters: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.InferredAssetFilePathDataConnector

Extension of InferredAssetFilePathDataConnector used to connect to data on a filesystem.

The InferredAssetFilesystemDataConnector is one of two classes (ConfiguredAssetFilesystemDataConnector being the other one) designed for connecting to data on a filesystem. It connects to assets inferred from directory and file name by default_regex and glob_directive.

The InferredAssetFilesystemDataConnector operates on file paths and determines the data_asset_name implicitly (e.g., through the combination of the regular expression pattern and group names).

_get_data_reference_list(self, data_asset_name: Optional[str] = None)

List objects in the underlying data store to create a list of data_references.

This method is used to refresh the cache.

_get_full_file_path(self, path: str, data_asset_name: Optional[str] = None)
property base_directory(self)

Accessor method for base_directory. If directory is a relative path, interpret it as relative to the root directory. If it is absolute, then keep as-is.

class great_expectations.datasource.data_connector.InferredAssetS3DataConnector(name: str, datasource_name: str, bucket: str, execution_engine: Optional[ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = '', delimiter: Optional[str] = '/', max_keys: Optional[int] = 1000, boto3_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.InferredAssetFilePathDataConnector

Extension of InferredAssetFilePathDataConnector used to connect to S3

The InferredAssetS3DataConnector is one of two classes (ConfiguredAssetS3DataConnector being the other one) designed for connecting to filesystem-like data, more specifically files on S3. It connects to assets inferred from bucket, prefix, and file name by default_regex.

The InferredAssetS3DataConnector operates on S3 buckets and determines the data_asset_name implicitly (e.g., through the combination of the regular expression pattern and group names).

build_batch_spec(self, batch_definition: BatchDefinition)

Build BatchSpec from batch_definition by calling DataConnector’s build_batch_spec function.

Parameters

batch_definition (BatchDefinition) – to be used to build batch_spec

Returns

BatchSpec built from batch_definition

_get_data_reference_list(self, data_asset_name: Optional[str] = None)

List objects in the underlying data store to create a list of data_references.

This method is used to refresh the cache.

_get_full_file_path(self, path: str, data_asset_name: Optional[str] = None)
class great_expectations.datasource.data_connector.InferredAssetSqlDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, data_asset_name_prefix: Optional[str] = '', data_asset_name_suffix: Optional[str] = '', include_schema_name: Optional[bool] = False, splitter_method: Optional[str] = None, splitter_kwargs: Optional[dict] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, excluded_tables: Optional[list] = None, included_tables: Optional[list] = None, skip_inapplicable_tables: Optional[bool] = True, introspection_directives: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.ConfiguredAssetSqlDataConnector

A DataConnector that infers data_asset names by introspecting a SQL database

property assets(self)
_refresh_data_references_cache(self)
_refresh_introspected_assets_cache(self, data_asset_name_prefix: str = None, data_asset_name_suffix: str = None, include_schema_name: bool = False, splitter_method: str = None, splitter_kwargs: dict = None, sampling_method: str = None, sampling_kwargs: dict = None, excluded_tables: List = None, included_tables: List = None, skip_inapplicable_tables: bool = True)
_introspect_db(self, schema_name: str = None, ignore_information_schemas_and_system_tables: bool = True, information_schemas: List[str] = ['INFORMATION_SCHEMA', 'information_schema', 'performance_schema', 'sys', 'mysql'], system_tables: List[str] = ['sqlite_master'], include_views=True)
class great_expectations.datasource.data_connector.RuntimeDataConnector(name: str, datasource_name: str, execution_engine: Optional[ExecutionEngine] = None, batch_identifiers: Optional[list] = None, batch_spec_passthrough: Optional[dict] = None)

Bases: great_expectations.datasource.data_connector.data_connector.DataConnector

A DataConnector that allows users to specify a Batch’s data directly using a RuntimeBatchRequest that contains either an in-memory Pandas or Spark DataFrame, a filesystem or S3 path, or an arbitrary SQL query

Parameters
  • name (str) – The name of this DataConnector

  • datasource_name (str) – The name of the Datasource that contains it

  • execution_engine (ExecutionEngine) – An ExecutionEngine

  • batch_identifiers (list) – a list of keys that must be defined in the batch_identifiers dict of RuntimeBatchRequest

  • batch_spec_passthrough (dict) – dictionary with keys that will be added directly to batch_spec

_refresh_data_references_cache(self)
_get_data_reference_list(self, data_asset_name: Optional[str] = None)

List objects in the cache to create a list of data_references. If data_asset_name is passed in, method will return all data_references for the named data_asset. If no data_asset_name is passed in, will return a list of all data_references for all data_assets in the cache.

_get_data_reference_list_from_cache_by_data_asset_name(self, data_asset_name: str)

Fetch data_references corresponding to data_asset_name from the cache.

get_data_reference_list_count(self)

Get number of data_references corresponding to all data_asset_names in cache. In cases where the RuntimeDataConnector has been passed a BatchRequest with the same data_asset_name but different batch_identifiers, it is possible to have more than one data_reference for a data_asset.

get_unmatched_data_references(self)
get_available_data_asset_names(self)

Please see the note in _get_batch_definition_list_from_batch_request().

get_batch_data_and_metadata(self, batch_definition: BatchDefinition, runtime_parameters: dict)

Uses batch_definition to retrieve batch_data and batch_markers by building a batch_spec from batch_definition, then using execution_engine to return batch_data and batch_markers

Parameters

batch_definition (BatchDefinition) – required batch_definition parameter for retrieval

get_batch_definition_list_from_batch_request(self, batch_request: RuntimeBatchRequest)
_get_batch_definition_list_from_batch_request(self, batch_request: BatchRequest)

<Will> 202103. The following behavior of the _data_references_cache follows a pattern that we are using for other data_connectors, including variations of FilePathDataConnector. When a BatchRequest contains batch_data that is passed in as an in-memory dataframe, the cache will contain the names of all data_assets (and data_references) that have been passed into the RuntimeDataConnector in this session, even though technically only the most recent batch_data is available. This can be misleading. However, allowing the RuntimeDataConnector to keep a record of all data_assets (and data_references) that have been passed in will allow for the proposed behavior of RuntimeBatchRequest, which will allow for paths and queries to be passed in as part of the BatchRequest. Therefore this behavior will be revisited when the design of RuntimeBatchRequest and related classes is complete.

_update_data_references_cache(self, data_asset_name: str, batch_definition_list: List, batch_identifiers: IDDict)
_self_check_fetch_batch(self, pretty_print, example_data_reference, data_asset_name)

Helper function for self_check() to retrieve batch using example_data_reference and data_asset_name, all while printing helpful messages. First 5 rows of batch_data are printed by default.

Parameters
  • pretty_print (bool) – print to console?

  • example_data_reference (Any) – data_reference to retrieve

  • data_asset_name (str) – data_asset_name to retrieve

_generate_batch_spec_parameters_from_batch_definition(self, batch_definition: BatchDefinition)
build_batch_spec(self, batch_definition: BatchDefinition, runtime_parameters: dict)

Builds batch_spec from batch_definition by generating batch_spec params and adding any pass_through params

Parameters

batch_definition (BatchDefinition) – required batch_definition parameter for retrieval

Returns

BatchSpec object built from BatchDefinition

static _get_data_reference_name(batch_identifiers: IDDict)
static _validate_runtime_parameters(runtime_parameters: Union[dict, type(None)])
_validate_batch_request(self, batch_request: BatchRequestBase)
Validate batch_request by checking:
  1. if configured datasource_name matches batch_request’s datasource_name

  2. if current data_connector_name matches batch_request’s data_connector_name

Parameters

batch_request (BatchRequestBase) – batch_request to validate

_validate_batch_identifiers(self, batch_identifiers: dict)
_validate_batch_identifiers_configuration(self, batch_identifiers: List[str])