great_expectations.util

Module Contents

Functions

pluralize(singular_ge_noun)

Pluralizes a Great Expectations singular noun

singularize(plural_ge_noun)

Singularizes a Great Expectations plural noun

underscore(word: str)

Borrowed from inflection.underscore

profile(func: Callable = None)

measure_execution_time(func: Callable = None)

get_project_distribution()

get_currently_executing_function()

get_currently_executing_function_call_arguments(include_module_name: bool = False, include_caller_names: bool = False, **kwargs)

include_module_name (bool) – If True, the module name is determined and included in the output dictionary (default is False)

verify_dynamic_loading_support(module_name: str, package_name: str = None)

module_name – a possibly-relative name of a module

import_library_module(module_name: str)

module_name – a fully-qualified name of a module (e.g., “great_expectations.dataset.sqlalchemy_dataset”)

is_library_loadable(library_name: str)

load_class(class_name: str, module_name: str)

_convert_to_dataset_class(df, dataset_class, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

_load_and_convert_to_dataset_class(df, class_name, module_name, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset.

gen_directory_tree_str(startpath)

Print the structure of a directory as a tree.

lint_code(code: str)

Lint strings of code passed in. Optional dependency “black” must be installed.

filter_properties_dict(properties: dict, keep_fields: Optional[list] = None, delete_fields: Optional[list] = None, clean_nulls: Optional[bool] = True, clean_falsy: Optional[bool] = False, keep_falsy_numerics: Optional[bool] = True, inplace: Optional[bool] = False)

Filter the entries of the source dictionary according to directives concerning the existing keys and values.

is_numeric(value: Any)

is_int(value: Any)

is_float(value: Any)

is_parseable_date(value: Any, fuzzy: bool = False)

get_context()

is_sane_slack_webhook(url: str)

Really basic sanity checking.

is_list_of_strings(_list)

generate_library_json_from_registered_expectations()

Generate the JSON object used to populate the public gallery

delete_blank_lines(text: str)

great_expectations.util.logger
great_expectations.util.SINGULAR_TO_PLURAL_LOOKUP_DICT
great_expectations.util.PLURAL_TO_SINGULAR_LOOKUP_DICT
great_expectations.util.pluralize(singular_ge_noun)

Pluralizes a Great Expectations singular noun

great_expectations.util.singularize(plural_ge_noun)

Singularizes a Great Expectations plural noun

great_expectations.util.underscore(word: str) → str

Borrowed from inflection.underscore. Makes an underscored, lowercase form from the expression in the string.

Example:

>>> underscore("DeviceType")
'device_type'

As a rule of thumb you can think of underscore() as the inverse of camelize(), though there are cases where that does not hold:

>>> camelize(underscore("IOError"))
'IoError'
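
The two doctest results above follow from a pair of regular-expression substitutions. The sketch below mirrors the approach of the inflection package's underscore(); it is written as an illustration rather than copied from great_expectations, so the name underscore_sketch and the exact patterns should be read as assumptions.

```python
import re


def underscore_sketch(word: str) -> str:
    """Convert a CamelCase expression to lowercase snake_case (illustrative only)."""
    # Split an acronym from a following capitalized word: "IOError" -> "IO_Error".
    word = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", word)
    # Split a lowercase letter or digit from a following capital: "DeviceType" -> "Device_Type".
    word = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", word)
    # Treat hyphens as word separators as well.
    word = word.replace("-", "_")
    return word.lower()


print(underscore_sketch("DeviceType"))  # device_type
print(underscore_sketch("IOError"))     # io_error
```

Once the acronym "IO" has been lowercased to "io", a camelize step has no way of knowing it should be restored to "IO", which is why the round trip yields 'IoError'.
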
great_expectations.util.profile(func: Callable = None) → Callable
great_expectations.util.measure_execution_time(func: Callable = None) → Callable
great_expectations.util.get_project_distribution() → Optional[Distribution]
great_expectations.util.get_currently_executing_function() → Callable
great_expectations.util.get_currently_executing_function_call_arguments(include_module_name: bool = False, include_caller_names: bool = False, **kwargs) → dict
Parameters
  • include_module_name (bool) – If True, the module name is determined and included in the output dictionary (default is False)

  • include_caller_names (bool) – If True, arguments such as “self” and “cls”, if present, are included in the output dictionary (default is False)

  • kwargs

Returns

dict – Output dictionary, consisting of call arguments as attribute “name: value” pairs.

Example usage:

    # Gather the call arguments of the present function (include the "module_name" and add the "class_name"),
    # filter out the Falsy values, and set the instance "_config" variable equal to the resulting dictionary.
    self._config = get_currently_executing_function_call_arguments(
        include_module_name=True,
        **{
            "class_name": self.__class__.__name__,
        },
    )
    filter_properties_dict(properties=self._config, clean_falsy=True, inplace=True)
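
One way a helper can read its caller's arguments is through frame inspection. The following is a minimal sketch of that idea using only the standard inspect module; gather_call_arguments and ExampleStore are hypothetical names, module-name handling is omitted, and the library's own implementation may work differently.

```python
import inspect
from typing import Any, Dict


def gather_call_arguments(include_caller_names: bool = False, **kwargs: Any) -> Dict[str, Any]:
    """Return the arguments of the function that called this helper (illustrative only)."""
    caller_frame = inspect.currentframe().f_back
    arg_info = inspect.getargvalues(caller_frame)
    # Named parameters of the caller, with their current values.
    call_args = {name: arg_info.locals[name] for name in arg_info.args}
    # Entries captured by the caller's own **kwargs parameter, if it has one.
    if arg_info.keywords:
        call_args.update(arg_info.locals[arg_info.keywords])
    if not include_caller_names:
        call_args.pop("self", None)
        call_args.pop("cls", None)
    # Extra entries supplied explicitly, such as "class_name".
    call_args.update(kwargs)
    return call_args


class ExampleStore:
    def __init__(self, name: str, limit: int = 10):
        self._config = gather_call_arguments(class_name=self.__class__.__name__)


print(ExampleStore("orders")._config)
# {'name': 'orders', 'limit': 10, 'class_name': 'ExampleStore'}
```

The helper has to be called before the caller rebinds its parameters, because the frame exposes current local values, not the values originally passed in.
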

great_expectations.util.verify_dynamic_loading_support(module_name: str, package_name: str = None) → None
Parameters
  • module_name – a possibly-relative name of a module

  • package_name – the name of a package, to which the given module belongs

great_expectations.util.import_library_module(module_name: str) → Optional[ModuleType]
Parameters

module_name – a fully-qualified name of a module (e.g., “great_expectations.dataset.sqlalchemy_dataset”)

Returns

the imported module object, or None if the module cannot be imported
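
The Optional[ModuleType] return type is consistent with a thin wrapper around importlib that swallows import failures. A minimal sketch of that pattern follows, extended to a loadability check and a class lookup by name. The names try_import_module, is_loadable and load_named_class are hypothetical, and the library's own error handling may differ.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


def try_import_module(module_name: str) -> Optional[ModuleType]:
    """Import a module by fully-qualified name; return None if it is unavailable."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def is_loadable(module_name: str) -> bool:
    """Report whether the named module can be imported in this environment."""
    return try_import_module(module_name) is not None


def load_named_class(class_name: str, module_name: str) -> Any:
    """Look up an attribute (typically a class) on a dynamically imported module."""
    module = try_import_module(module_name)
    if module is None:
        raise ImportError(f"Module {module_name!r} could not be imported.")
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise ImportError(f"{module_name!r} has no attribute {class_name!r}.") from None


print(is_loadable("json"))  # True
print(is_loadable("no_such_module_for_this_example"))  # False
print(load_named_class("OrderedDict", "collections").__name__)  # OrderedDict
```
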

great_expectations.util.is_library_loadable(library_name: str) → bool
great_expectations.util.load_class(class_name: str, module_name: str)
great_expectations.util._convert_to_dataset_class(df, dataset_class, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

Parameters
  • df – the DataFrame object to convert

  • dataset_class – the class to which to convert the existing DataFrame

  • expectation_suite – the expectation suite that should be attached to the resulting dataset

  • profiler – the profiler to use to generate baseline expectations, if any

Returns

A new Dataset object

great_expectations.util._load_and_convert_to_dataset_class(df, class_name, module_name, expectation_suite=None, profiler=None)

Convert a (pandas) dataframe to a great_expectations dataset, with (optional) expectation_suite

Parameters
  • df – the DataFrame object to convert

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • expectation_suite – the expectation suite that should be attached to the resulting dataset

  • profiler – the profiler to use to generate baseline expectations, if any

Returns

A new Dataset object

great_expectations.util.read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • accessor_func (Callable) – functions to transform the json object in the file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset or ordered dict of great_expectations datasets, if multiple worksheets are imported

great_expectations.util.read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

Parameters
  • pandas_df (Pandas df) – Pandas data frame

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (profiler class) – The profiler that should be run on the dataset to establish a baseline expectation suite.

Returns

great_expectations dataset

great_expectations.util.read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.util.validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset. Validate can accept an optional data_asset_name to apply, data_context to use to fetch an expectation_suite if one is not provided, and data_asset_class_name/data_asset_module_name or data_asset_class to use to provide custom expectations.

Parameters
  • data_asset – the asset to validate

  • expectation_suite – the suite to use, or None to fetch one using a DataContext

  • data_asset_name – the name of the data asset to use

  • expectation_suite_name – the name of the expectation_suite to use

  • data_context – data context to use to fetch an expectation suite, or the path from which to obtain one

  • data_asset_class_name – the name of the DataAsset class to load dynamically

  • data_asset_module_name – the name of the module from which to dynamically load the DataAsset class

  • data_asset_class – a class to use; overrides data_asset_class_name/data_asset_module_name if provided

  • *args

  • **kwargs

Returns:

great_expectations.util.gen_directory_tree_str(startpath)

Print the structure of a directory as a tree:

Ex:

    project_dir0/
        AAA/
        BBB/
            aaa.txt
            bbb.txt

Note: files and directories are sorted alphabetically, so that this method can be used for testing.
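
A tree string of this shape can be built with os.walk. The following is a minimal sketch under the assumption of four-space indentation per level. directory_tree_str is a hypothetical name and not the library's implementation.

```python
import os
import tempfile


def directory_tree_str(startpath: str) -> str:
    """Return the directory structure under startpath as an indented tree string."""
    startpath = os.path.normpath(startpath)
    lines = []
    for root, dirs, files in os.walk(startpath):
        dirs.sort()  # in-place sort makes os.walk descend in alphabetical order
        rel = os.path.relpath(root, startpath)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        lines.append(f"{' ' * 4 * level}{os.path.basename(root)}/")
        for name in sorted(files):
            lines.append(f"{' ' * 4 * (level + 1)}{name}")
    return "\n".join(lines) + "\n"


# Build a small throwaway layout and render it.
with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "project_dir0")
    os.makedirs(os.path.join(project, "AAA"))
    os.makedirs(os.path.join(project, "BBB"))
    for name in ("bbb.txt", "aaa.txt"):
        open(os.path.join(project, "BBB", name), "w").close()
    tree = directory_tree_str(project)

print(tree)
```

Sorting both directories and files makes the output deterministic across file systems, which is what allows the string to be compared against a fixed expected value in tests.
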

great_expectations.util.lint_code(code: str) → str

Lint strings of code passed in. Optional dependency “black” must be installed.

great_expectations.util.filter_properties_dict(properties: dict, keep_fields: Optional[list] = None, delete_fields: Optional[list] = None, clean_nulls: Optional[bool] = True, clean_falsy: Optional[bool] = False, keep_falsy_numerics: Optional[bool] = True, inplace: Optional[bool] = False) → Optional[dict]

Filter the entries of the source dictionary according to directives concerning the existing keys and values.

Parameters
  • properties – source dictionary to be filtered according to the supplied filtering directives

  • keep_fields – list of keys that must be retained, with the understanding that all other entries will be deleted

  • delete_fields – list of keys that must be deleted, with the understanding that all other entries will be retained

  • clean_nulls – If True, then in addition to other filtering directives, delete entries whose values are None

  • clean_falsy – If True, then in addition to other filtering directives, delete entries whose values are Falsy. (If the "clean_falsy" argument is specified as "True", then "clean_nulls" is assumed to be "True" as well.)

  • inplace – If True, then modify the source properties dictionary; otherwise, make a copy for filtering purposes

  • keep_falsy_numerics – If True, then in addition to other filtering directives, do not delete zero-valued numerics

Returns

The (possibly) filtered properties dictionary (or None if no entries remain after filtering is performed)
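
The interaction of these directives can be sketched for a flat dictionary as follows. This is a simplified illustration of the documented semantics, not the library's code. filter_dict_sketch is a hypothetical name, and two behaviors are assumptions: keep_fields and delete_fields are treated as mutually exclusive, and booleans are not counted as numerics.

```python
import copy
from numbers import Number
from typing import Any, Dict, List, Optional


def filter_dict_sketch(
    properties: Dict[str, Any],
    keep_fields: Optional[List[str]] = None,
    delete_fields: Optional[List[str]] = None,
    clean_nulls: bool = True,
    clean_falsy: bool = False,
    keep_falsy_numerics: bool = True,
    inplace: bool = False,
) -> Optional[Dict[str, Any]]:
    if keep_fields and delete_fields:
        # Assumption: the two key-based directives cannot be combined.
        raise ValueError("Specify keep_fields or delete_fields, not both.")
    target = properties if inplace else copy.deepcopy(properties)
    # clean_falsy implies clean_nulls, as the parameter description states.
    clean_nulls = clean_nulls or clean_falsy

    def should_drop(key: str, value: Any) -> bool:
        if keep_fields and key not in keep_fields:
            return True
        if delete_fields and key in delete_fields:
            return True
        if value is None:
            return clean_nulls
        if clean_falsy and not value:
            # Assumption: False is not treated as a zero-valued numeric.
            numeric = isinstance(value, Number) and not isinstance(value, bool)
            return not (keep_falsy_numerics and numeric)
        return False

    for key in [k for k, v in target.items() if should_drop(k, v)]:
        del target[key]
    # Return None when nothing remains after filtering.
    return target or None


sample = {"a": 1, "b": None, "c": 0, "d": "", "e": []}
print(filter_dict_sketch(sample, clean_falsy=True))  # {'a': 1, 'c': 0}
```

With the defaults, only the None-valued entry is removed; the zero survives clean_falsy=True because keep_falsy_numerics defaults to True.
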

great_expectations.util.is_numeric(value: Any) → bool
great_expectations.util.is_int(value: Any) → bool
great_expectations.util.is_float(value: Any) → bool
great_expectations.util.is_parseable_date(value: Any, fuzzy: bool = False) → bool
great_expectations.util.get_context()
great_expectations.util.is_sane_slack_webhook(url: str) → bool

Really basic sanity checking.

great_expectations.util.is_list_of_strings(_list) → bool
great_expectations.util.generate_library_json_from_registered_expectations()

Generate the JSON object used to populate the public gallery

great_expectations.util.delete_blank_lines(text: str) → str