great_expectations.datasource.data_connector.util

Module Contents

Functions

batch_definition_matches_batch_request(batch_definition: BatchDefinition, batch_request: BatchRequestBase)

map_data_reference_string_to_batch_definition_list_using_regex(datasource_name: str, data_connector_name: str, data_reference: str, regex_pattern: str, group_names: List[str], data_asset_name: Optional[str] = None)

convert_data_reference_string_to_batch_identifiers_using_regex(data_reference: str, regex_pattern: str, group_names: List[str])

map_batch_definition_to_data_reference_string_using_regex(batch_definition: BatchDefinition, regex_pattern: str, group_names: List[str])

convert_batch_identifiers_to_data_reference_string_using_regex(batch_identifiers: IDDict, regex_pattern: str, group_names: List[str], data_asset_name: Optional[str] = None)

_invert_regex_to_data_reference_template(regex_pattern: str, group_names: List[str])

Create a string template based on a regex and corresponding list of group names.

normalize_directory_path(dir_path: str, root_directory_path: Optional[str] = None)

get_filesystem_one_level_directory_glob_path_list(base_directory_path: str, glob_directive: str)

List file names, relative to base_directory_path one level deep, with expansion specified by glob_directive.

list_s3_keys(s3, query_options: dict, iterator_dict: dict, recursive: bool = False)

For InferredAssetS3DataConnector, we take bucket and prefix and search for files using RegEx at and below the level specified by that bucket and prefix.

build_sorters_from_config(config_list: List[Dict[str, Any]])

_build_sorter_from_config(sorter_config: Dict[str, Any])

Build a Sorter using the provided configuration and return the newly-built Sorter.

great_expectations.datasource.data_connector.util.logger
great_expectations.datasource.data_connector.util.pyspark
great_expectations.datasource.data_connector.util.DEFAULT_DATA_ASSET_NAME: str = DEFAULT_ASSET_NAME
great_expectations.datasource.data_connector.util.batch_definition_matches_batch_request(batch_definition: BatchDefinition, batch_request: BatchRequestBase) → bool
great_expectations.datasource.data_connector.util.map_data_reference_string_to_batch_definition_list_using_regex(datasource_name: str, data_connector_name: str, data_reference: str, regex_pattern: str, group_names: List[str], data_asset_name: Optional[str] = None) → Optional[List[BatchDefinition]]
great_expectations.datasource.data_connector.util.convert_data_reference_string_to_batch_identifiers_using_regex(data_reference: str, regex_pattern: str, group_names: List[str]) → Optional[Tuple[str, IDDict]]
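The core idea behind the regex-based conversion can be sketched in plain Python: match the pattern against a data reference and zip the capture groups with the configured group names. The function below is an illustrative standalone re-implementation, not the library's actual code; the name `extract_batch_identifiers` and the fallback asset name are assumptions for the example.

```python
import re
from typing import List, Optional, Tuple

def extract_batch_identifiers(
    data_reference: str,
    regex_pattern: str,
    group_names: List[str],
) -> Optional[Tuple[str, dict]]:
    # Hypothetical sketch of the regex-to-batch-identifiers idea.
    match = re.match(regex_pattern, data_reference)
    if match is None:
        # The data reference does not belong to this connector.
        return None
    # Pair each capture group with its configured name.
    batch_identifiers = dict(zip(group_names, match.groups()))
    # By convention, a "data_asset_name" group (when present) names the asset;
    # the fallback constant here is illustrative only.
    data_asset_name = batch_identifiers.get("data_asset_name", "DEFAULT_ASSET_NAME")
    return data_asset_name, batch_identifiers

result = extract_batch_identifiers(
    data_reference="user_logs_20200101_250.csv",
    regex_pattern=r"^(.+)_(\d+)_(\d+)\.csv$",
    group_names=["name", "timestamp", "price"],
)
# result[1] == {"name": "user_logs", "timestamp": "20200101", "price": "250"}
```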
great_expectations.datasource.data_connector.util.map_batch_definition_to_data_reference_string_using_regex(batch_definition: BatchDefinition, regex_pattern: str, group_names: List[str]) → str
great_expectations.datasource.data_connector.util.convert_batch_identifiers_to_data_reference_string_using_regex(batch_identifiers: IDDict, regex_pattern: str, group_names: List[str], data_asset_name: Optional[str] = None) → str
great_expectations.datasource.data_connector.util._invert_regex_to_data_reference_template(regex_pattern: str, group_names: List[str]) → str

Create a string template based on a regex and corresponding list of group names.

For example:

    filepath_template = _invert_regex_to_data_reference_template(
        regex_pattern=r"^(.+)_(\d+)_(\d+)\.csv$",
        group_names=["name", "timestamp", "price"],
    )
    filepath_template
    >> "{name}_{timestamp}_{price}.csv"

Such templates are useful because they can be populated using string substitution:

    filepath_template.format(**{
        "name": "user_logs",
        "timestamp": "20200101",
        "price": "250",
    })
    >> "user_logs_20200101_250.csv"

NOTE Abe 20201017: This method is almost certainly still brittle. I haven’t exhaustively mapped the OPCODES in sre_constants

great_expectations.datasource.data_connector.util.normalize_directory_path(dir_path: str, root_directory_path: Optional[str] = None) → str
great_expectations.datasource.data_connector.util.get_filesystem_one_level_directory_glob_path_list(base_directory_path: str, glob_directive: str) → List[str]

List file names, relative to base_directory_path one level deep, with expansion specified by glob_directive.

:param base_directory_path: base directory path, relative to which file paths will be collected
:param glob_directive: glob expansion directive
:returns: list of relative file paths
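A minimal sketch of one-level glob listing, assuming only the standard library; the real utility may differ in edge cases (e.g. path normalization), and the function name here is hypothetical.

```python
import os
from glob import glob
from typing import List

def one_level_glob(base_directory_path: str, glob_directive: str) -> List[str]:
    # Join the base directory with the glob directive, expand it,
    # and report each hit relative to the base directory.
    query = os.path.join(base_directory_path, glob_directive)
    return sorted(
        os.path.relpath(path, start=base_directory_path) for path in glob(query)
    )
```

For example, `one_level_glob("/data", "*.csv")` lists CSV files directly inside `/data`, returned as paths relative to it.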

great_expectations.datasource.data_connector.util.list_s3_keys(s3, query_options: dict, iterator_dict: dict, recursive: bool = False) → str

For InferredAssetS3DataConnector, we take bucket and prefix and search for files using RegEx at and below the level specified by that bucket and prefix. For ConfiguredAssetS3DataConnector, however, we take bucket and prefix and search for files using RegEx only at the level specified by that bucket and prefix. This restriction is needed for the ConfiguredAssetS3DataConnector because paths on S3 comprise not only the leaf file name but the full path, including both the prefix and the file name. Otherwise, when multiple data assets share levels of a directory tree, matching files to data assets would not be possible due to path ambiguity.

:param s3: s3 client connection
:param query_options: s3 query attributes ("Bucket", "Prefix", "Delimiter", "MaxKeys")
:param iterator_dict: dictionary to manage "NextContinuationToken" (if "IsTruncated" is returned from S3)
:param recursive: True for InferredAssetS3DataConnector and False for ConfiguredAssetS3DataConnector (see above)
:return: string-valued key representing a file path on S3 (full prefix and leaf file name)
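The continuation-token pagination described above can be sketched as follows. This is an illustrative generator, not the library's implementation: it assumes only that `s3` exposes `list_objects_v2` with boto3's response shape (`Contents`, `IsTruncated`, `NextContinuationToken`), and the function name is hypothetical.

```python
from typing import Iterator

def iter_s3_keys(s3, query_options: dict, recursive: bool = False) -> Iterator[str]:
    # Copy so the caller's query_options dict is not mutated.
    query_options = dict(query_options)
    if not recursive:
        # A Delimiter stops S3 from descending past the prefix level,
        # matching the ConfiguredAssetS3DataConnector behavior described above.
        query_options.setdefault("Delimiter", "/")
    while True:
        response = s3.list_objects_v2(**query_options)
        for item in response.get("Contents", []):
            yield item["Key"]
        if not response.get("IsTruncated"):
            break
        # Resume the listing where the previous page stopped.
        query_options["ContinuationToken"] = response["NextContinuationToken"]
```

With a real boto3 client, `iter_s3_keys(boto3.client("s3"), {"Bucket": "my-bucket", "Prefix": "data/"}, recursive=True)` would yield every key under the prefix, transparently following continuation tokens.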

great_expectations.datasource.data_connector.util.build_sorters_from_config(config_list: List[Dict[str, Any]]) → Optional[dict]
great_expectations.datasource.data_connector.util._build_sorter_from_config(sorter_config: Dict[str, Any]) → Sorter

Build a Sorter using the provided configuration and return the newly-built Sorter.