great_expectations

Subpackages

Package Contents

Classes

DataContext(context_root_dir=None, runtime_environment=None)

A DataContext represents a Great Expectations project. It organizes storage and access for expectation suites, datasources, notification settings, and data fixtures.

Functions

get_versions()

Get version information or return default if unable to do so.

from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

measure_execution_time(func)

read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset. Validate can accept an optional data_asset_name to apply, and a data_context to use to fetch an expectation_suite if one is not provided.

great_expectations.get_versions()

Get version information or return default if unable to do so.

great_expectations.__version__
class great_expectations.DataContext(context_root_dir=None, runtime_environment=None)

Bases: great_expectations.data_context.data_context.BaseDataContext

A DataContext represents a Great Expectations project. It organizes storage and access for expectation suites, datasources, notification settings, and data fixtures.

The DataContext is configured via a yml file stored in a directory called great_expectations; the configuration file as well as managed expectation suites should be stored in version control.

Use the create classmethod to create a new empty config, or instantiate the DataContext by passing the path to an existing data context root directory.

DataContexts use data sources you’re already familiar with. BatchKwargGenerators help introspect data stores and data execution frameworks (such as airflow, Nifi, dbt, or dagster) to describe and produce batches of data ready for analysis. This enables fetching, validation, profiling, and documentation of your data in a way that is meaningful within your existing infrastructure and work environment.

DataContexts use a datasource-based namespace, where each accessible type of data has a three-part normalized data_asset_name, consisting of datasource/generator/data_asset_name.

  • The datasource actually connects to a source of materialized data and returns Great Expectations DataAssets connected to a compute environment and ready for validation.

  • The BatchKwargGenerator knows how to introspect datasources and produce identifying “batch_kwargs” that define particular slices of data.

  • The data_asset_name is a specific name – often a table name or other name familiar to users – that batch kwargs generators can slice into batches.
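As a pure-Python illustration of the three-part namespace (this helper is hypothetical, not part of the great_expectations API), a fully normalized name can be decomposed like so:

```python
# Illustrative sketch only: shows how a normalized name of the form
# datasource/generator/data_asset_name can be split into its parts.
def split_data_asset_name(normalized_name: str) -> dict:
    """Split 'datasource/generator/data_asset_name' into its three parts."""
    parts = normalized_name.split("/")
    if len(parts) != 3:
        raise ValueError("expected datasource/generator/data_asset_name")
    datasource, generator, asset = parts
    return {
        "datasource": datasource,
        "generator": generator,
        "data_asset_name": asset,
    }

print(split_data_asset_name("my_postgres/default/public.events"))
```

When the datasource or generator is unambiguous, the DataContext can infer it, which is why shorter names often suffice in simple projects (see below).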

An expectation suite is a collection of expectations ready to be applied to a batch of data. Since in many projects it is useful to have different expectations evaluate in different contexts (profiling vs. testing; warning vs. error; high vs. low compute; ML model or dashboard), suites provide a namespace option for selecting which expectations a DataContext returns.

In many simple projects, the datasource or batch kwargs generator name may be omitted and the DataContext will infer the correct name when there is no ambiguity.

Similarly, if no expectation suite name is provided, the DataContext will assume the name “default”.

classmethod create(cls, project_root_dir=None, usage_statistics_enabled=True, runtime_environment=None)

Build a new great_expectations directory and DataContext object in the provided project_root_dir.

create will create a new "great_expectations" directory in the provided folder, provided one does not already exist. Then, it will initialize a new DataContext in that folder and write the resulting config.

Parameters
  • project_root_dir – path to the root directory in which to create a new great_expectations directory

  • runtime_environment – a dictionary of config variables that override both those set in config_variables.yml and the environment

Returns

DataContext

classmethod all_uncommitted_directories_exist(cls, ge_dir)

Check if all uncommitted directories exist.

classmethod config_variables_yml_exist(cls, ge_dir)

Check if config_variables.yml exists.

classmethod write_config_variables_template_to_disk(cls, uncommitted_dir)
classmethod write_project_template_to_disk(cls, ge_dir, usage_statistics_enabled=True)
classmethod scaffold_directories(cls, base_dir)

Safely create GE directories for a new project.

classmethod scaffold_custom_data_docs(cls, plugins_dir)

Copy custom data docs templates.

classmethod scaffold_notebooks(cls, base_dir)

Copy template notebooks into the notebooks directory for a project.

_load_project_config(self)

Reads the project configuration from the project configuration file. The file may contain ${SOME_VARIABLE} variables - see self._project_config_with_variables_substituted for how these are substituted.

Returns

the configuration object read from the file
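The ${SOME_VARIABLE} substitution described above can be sketched with the standard library's string.Template; great_expectations performs this internally, so the snippet below is a simplified stand-in rather than the library's own code:

```python
# Sketch of ${SOME_VARIABLE} substitution using only the standard library.
from string import Template

# A config fragment with variables, as might appear in great_expectations.yml.
raw_config = "postgresql://${DB_USER}:${DB_PASSWORD}@localhost/ge"

# Values typically sourced from config_variables.yml or the environment.
variables = {"DB_USER": "analyst", "DB_PASSWORD": "s3cret"}

substituted = Template(raw_config).substitute(variables)
print(substituted)  # postgresql://analyst:s3cret@localhost/ge
```

Keeping secrets in config_variables.yml (which stays out of version control) while committing the templated great_expectations.yml is the workflow this substitution enables.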

list_checkpoints(self)

List checkpoints. (Experimental)

get_checkpoint(self, checkpoint_name: str)

Load a checkpoint. (Experimental)

_list_ymls_in_checkpoints_directory(self)
_save_project_config(self)

Save the current project to disk.

add_store(self, store_name, store_config)

Add a new Store to the DataContext and (for convenience) return the instantiated Store object.

Parameters
  • store_name (str) – a key for the new Store in self._stores

  • store_config (dict) – a config for the Store to add

Returns

store (Store)

add_datasource(self, name, **kwargs)

Add a new datasource to the data context, with configuration provided as kwargs.

Parameters
  • name (str) – the name for the new datasource to add

  • initialize (bool) – if False, add the datasource to the config, but do not initialize it, for example if a user needs to debug database connectivity

  • kwargs (keyword arguments) – the configuration for the new datasource

Returns

datasource (Datasource)

classmethod find_context_root_dir(cls)
classmethod get_ge_config_version(cls, context_root_dir=None)
classmethod set_ge_config_version(cls, config_version, context_root_dir=None, validate_config_version=True)
classmethod find_context_yml_file(cls, search_start_dir=None)

Search for the yml file starting here and moving upward.

classmethod does_config_exist_on_disk(cls, context_root_dir)

Return True if the great_expectations.yml exists on disk.

classmethod is_project_initialized(cls, ge_dir)

Return True if the project is initialized.

To be considered initialized, all of the following must be true:

  • all project directories exist (including uncommitted directories)

  • a valid great_expectations.yml is on disk

  • a config_variables.yml is on disk

  • the project has at least one datasource

  • the project has at least one suite
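A hedged sketch of the on-disk portion of these checks, using pathlib. The file layout assumed here (great_expectations.yml at the project root, uncommitted/config_variables.yml) is typical but an assumption for illustration; the real is_project_initialized also verifies that at least one datasource and one suite exist:

```python
# Illustrative only: checks a subset of the initialization conditions
# listed above. The layout (great_expectations.yml, uncommitted/) is
# an assumption, not a guaranteed contract of the library.
import tempfile
from pathlib import Path

def looks_initialized(ge_dir: str) -> bool:
    root = Path(ge_dir)
    return (
        (root / "great_expectations.yml").is_file()
        and (root / "uncommitted" / "config_variables.yml").is_file()
    )

with tempfile.TemporaryDirectory() as d:
    print(looks_initialized(d))  # False: nothing scaffolded yet
```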

classmethod does_project_have_a_datasource_in_config_file(cls, ge_dir)
classmethod _does_context_have_at_least_one_datasource(cls, ge_dir)
classmethod _does_context_have_at_least_one_suite(cls, ge_dir)
classmethod _attempt_context_instantiation(cls, ge_dir)
static _validate_checkpoint(checkpoint: dict, checkpoint_name: str)
great_expectations.from_pandas(pandas_df, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None)

Read a Pandas data frame and return a great_expectations dataset.

Parameters
  • pandas_df (Pandas df) – Pandas data frame

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (profiler class) – The profiler that should be run on the dataset to establish a baseline expectation suite.

Returns

great_expectations dataset
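The class_name/module_name pair describes dynamic class loading. Conceptually this is the standard importlib pattern; the helper below is a generic simplification, not great_expectations' internal loader, and it is demonstrated with a standard-library class so it runs without great_expectations installed:

```python
# Generic sketch of resolving a class from (module_name, class_name),
# the pattern the parameters above describe. Not the library's loader.
import importlib

def load_class(class_name: str, module_name: str):
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Demonstrate with a stdlib class; from_pandas would instead load, e.g.,
# 'PandasDataset' from 'great_expectations.dataset'.
OrderedDict = load_class("OrderedDict", "collections")
print(OrderedDict([("a", 1)]))
```

Passing dataset_class directly simply skips this lookup, which is useful when the class is not importable by name alone.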

great_expectations.measure_execution_time(func) → Callable
great_expectations.read_csv(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_csv and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.read_excel(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_excel and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset or ordered dict of great_expectations datasets, if multiple worksheets are imported

great_expectations.read_feather(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_feather and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.read_json(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, accessor_func=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_json and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • accessor_func (Callable) – function to transform the JSON object in the file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset
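An accessor_func is a callable that pulls the tabular records out of a nested JSON payload before the data frame is built. The payload shape below is invented for the example; the point is only the shape of the callable:

```python
# Sketch of an accessor_func: receives the parsed JSON object and
# returns the portion that should become rows. The 'results'/'records'
# nesting here is a made-up example, not a great_expectations convention.
import json

def accessor_func(obj):
    # Navigate to the list of records inside a wrapper object.
    return obj["results"]["records"]

payload = json.loads('{"results": {"records": [{"x": 1}, {"x": 2}]}}')
records = accessor_func(payload)
print(records)  # [{'x': 1}, {'x': 2}]
```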

great_expectations.read_parquet(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_parquet and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.read_pickle(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_pickle and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.read_table(filename, class_name='PandasDataset', module_name='great_expectations.dataset', dataset_class=None, expectation_suite=None, profiler=None, *args, **kwargs)

Read a file using Pandas read_table and return a great_expectations dataset.

Parameters
  • filename (string) – path to file to read

  • class_name (str) – class to which to convert resulting Pandas df

  • module_name (str) – dataset module from which to try to dynamically load the relevant module

  • dataset_class (Dataset) – If specified, the class to which to convert the resulting Dataset object; if not specified, try to load the class named via the class_name and module_name parameters

  • expectation_suite (string) – path to great_expectations expectation suite file

  • profiler (Profiler class) – profiler to use when creating the dataset (default is None)

Returns

great_expectations dataset

great_expectations.validate(data_asset, expectation_suite=None, data_asset_name=None, expectation_suite_name=None, data_context=None, data_asset_class_name=None, data_asset_module_name='great_expectations.dataset', data_asset_class=None, *args, **kwargs)

Validate the provided data asset. Validate can accept an optional data_asset_name to apply, data_context to use to fetch an expectation_suite if one is not provided, and data_asset_class_name/data_asset_module_name or data_asset_class to use to provide custom expectations.

Parameters
  • data_asset – the asset to validate

  • expectation_suite – the suite to use, or None to fetch one using a DataContext

  • data_asset_name – the name of the data asset to use

  • expectation_suite_name – the name of the expectation_suite to use

  • data_context – data context to use to fetch an expectation suite, or the path from which to obtain one

  • data_asset_class_name – the name of a class to dynamically load a DataAsset class

  • data_asset_module_name – the name of the module to dynamically load a DataAsset class

  • data_asset_class – a class to use; overrides data_asset_class_name/data_asset_module_name if provided

  • *args

  • **kwargs

Returns

great_expectations.rtd_url_ge_version