great_expectations.validator.validator

Module Contents

Classes

Validator(execution_engine, interactive_evaluation=True, expectation_suite=None, expectation_suite_name=None, data_context=None, batches=None, **kwargs)

BridgeValidator(batch, expectation_suite, expectation_engine=None, **kwargs)

This is currently helping bridge APIs

Functions

_calc_validation_statistics(validation_results)

Calculate summary statistics for the validation results and return ValidationStatistics.

great_expectations.validator.validator.logger
class great_expectations.validator.validator.Validator(execution_engine, interactive_evaluation=True, expectation_suite=None, expectation_suite_name=None, data_context=None, batches=None, **kwargs)
__dir__(self)

This custom magic method enables tab completion of expectations on Validator objects. It also allows users to call pandas.DataFrame methods on Validator objects.
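The tab-completion mechanism described above can be sketched without Great Expectations installed. The class below is a hypothetical stand-in, not the library's Validator: SketchValidator, _AVAILABLE and _run are assumed names, and the returned dictionary is only a placeholder for a real validation result.

```python
from functools import partial


class SketchValidator:
    """Hypothetical stand-in that exposes expectation names dynamically."""

    # Assumed registry of expectation names; a real validator would obtain
    # this list from the library's expectation registry.
    _AVAILABLE = (
        "expect_column_to_exist",
        "expect_column_values_to_not_be_null",
    )

    def __dir__(self):
        # Merge the ordinary attributes with the dynamic expectation names so
        # that dir() and interactive tab completion can see them.
        return sorted(set(super().__dir__()) | set(self._AVAILABLE))

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        if name in self._AVAILABLE:
            return partial(self._run, name)
        raise AttributeError(name)

    def _run(self, name, **kwargs):
        # Placeholder result; nothing is actually validated in this sketch.
        return {"expectation_type": name, "kwargs": kwargs}


validator = SketchValidator()
result = validator.expect_column_to_exist(column="id")
```

Because __getattr__ is consulted only after regular lookup fails, ordinary attributes and methods keep their normal behaviour; only unknown names are routed to the expectation lookup.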

property expose_dataframe_methods(self)
__getattr__(self, name)
validate_expectation(self, name)

Given the name of an Expectation, obtains the class-first Expectation implementation and uses the Expectation’s validate method to obtain a validation result. Also adds in the runtime configuration.

Args:

name (str): The name of the Expectation being validated

Returns:

The Expectation’s validation result

property execution_engine(self)

Returns the execution engine being used by the validator at the given time

list_available_expectation_types(self)

Returns a list of all expectations available to the validator

build_metric_dependency_graph(self, graph: ValidationGraph, child_node: MetricConfiguration, configuration: ExpectationConfiguration, execution_engine: ExecutionEngine, parent_node: Optional[MetricConfiguration] = None, runtime_configuration: Optional[dict] = None)

Obtains domain and value keys for metrics and adds these metrics to the validation graph until all metrics have been added.

graph_validate(self, configurations: List[ExpectationConfiguration], metrics: dict = None, runtime_configuration: dict = None)

Obtains validation dependencies for each metric using the implementation of their associated expectation, then proceeds to add these dependencies to the validation graph, supply readily available metric implementations to fulfill current metric requirements, and validate these metrics.

Args:

batches (Dict[str, Batch]): A Dictionary of batches and their corresponding names that will be used for Expectation Validation.

configurations (List[ExpectationConfiguration]): A list of needed Expectation Configurations that will be used to supply domain and values for metrics.

execution_engine (ExecutionEngine): An Execution Engine that will be used for extraction of metrics from the registry.

metrics (dict): A list of currently registered metrics in the registry.

runtime_configuration (dict): A dictionary of runtime keyword arguments, controlling semantics such as the result_format.

Returns:

A list of Validations, validating that all necessary metrics are available.

resolve_validation_graph(self, graph, metrics, runtime_configuration=None)
_parse_validation_graph(self, validation_graph, metrics)

Given validation graph, returns the ready and needed metrics necessary for validation using a traversal of validation graph (a graph structure of metric ids) edges
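The ready/needed split described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the edge layout (metric id, dependency id), the function names parse_graph and resolve_all, the compute callback, and the metric names are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Assumed edge layout: (metric_id, dependency_id). A dependency_id of None
# means the metric has no prerequisites.
Edge = Tuple[str, Optional[str]]


def parse_graph(edges: List[Edge], metrics: Dict[str, object]) -> Tuple[Set[str], Set[str]]:
    """Split unresolved metrics into (ready, needed) sets."""
    maybe_ready: Set[str] = set()
    unmet: Set[str] = set()
    for metric_id, dependency_id in edges:
        if metric_id in metrics:
            continue  # already resolved, nothing to do
        if dependency_id is None or dependency_id in metrics:
            maybe_ready.add(metric_id)
        else:
            unmet.add(metric_id)
    # A metric with at least one unmet dependency is not ready yet.
    return maybe_ready - unmet, unmet


def resolve_all(edges: List[Edge], compute: Callable[[str, Dict[str, object]], object]) -> Dict[str, object]:
    """Resolve metrics in dependency order using a hypothetical callback."""
    metrics: Dict[str, object] = {}
    while True:
        ready, needed = parse_graph(edges, metrics)
        if not ready:
            if needed:
                raise ValueError(f"unresolvable metrics: {sorted(needed)}")
            return metrics
        for metric_id in sorted(ready):
            metrics[metric_id] = compute(metric_id, metrics)


# Illustrative metric names only.
edges = [
    ("row_count", None),
    ("null_count", None),
    ("null_fraction", "row_count"),
    ("null_fraction", "null_count"),
]
order = []


def compute(metric_id, resolved):
    order.append(metric_id)
    return f"value of {metric_id}"


resolved = resolve_all(edges, compute)
```

In this sketch null_fraction is computed only after both of its dependencies appear in the metrics dictionary, which mirrors the idea of repeatedly traversing the graph until no metric remains unresolved.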

_resolve_metrics(self, execution_engine: ExecutionEngine, metrics_to_resolve: Iterable[MetricConfiguration], metrics: Dict, runtime_configuration: dict = None)

A means of accessing the Execution Engine’s resolve_metrics method, where missing metric configurations are resolved

_initialize_expectations(self, expectation_suite=None, expectation_suite_name=None)

Instantiates _expectation_suite as empty by default or with a specified expectation config. In addition, this always sets the default_expectation_args to:

include_config: False, catch_exceptions: False, result_format: ‘BASIC’

By default, initializes data_asset_type to the name of the implementing class, but subclasses that have interoperable semantics (e.g. Dataset) may override that parameter to clarify their interoperability.

Parameters
  • expectation_suite (json) – A JSON-serializable expectation config. If None, creates a default _expectation_suite with an empty list of expectations and the key data_asset_name set to data_asset_name.

  • expectation_suite_name (string) – The name to assign to the expectation_suite.expectation_suite_name

Returns

None

append_expectation(self, expectation_config)

This method is a thin wrapper for ExpectationSuite.append_expectation

find_expectation_indexes(self, expectation_configuration: ExpectationConfiguration, match_type: str = 'domain')

This method is a thin wrapper for ExpectationSuite.find_expectation_indexes

find_expectations(self, expectation_configuration: ExpectationConfiguration, match_type: str = 'domain')

This method is a thin wrapper for ExpectationSuite.find_expectations()

remove_expectation(self, expectation_configuration: ExpectationConfiguration, match_type: str = 'domain', remove_multiple_matches: bool = False)

This method is a thin wrapper for ExpectationSuite.remove()

set_config_value(self, key, value)

Setter for config value

get_config_value(self, key)

Getter for config value

property batches(self)

Getter for batches

property active_batch(self)

Getter for active batch

property active_batch_spec(self)

Getter for active batch’s batch_spec

property active_batch_id(self)

Getter for active batch id

property active_batch_markers(self)

Getter for active batch’s batch markers

property active_batch_definition(self)

Getter for the active batch’s batch definition

discard_failing_expectations(self)

Removes any expectations from the validator where the validation has failed

get_default_expectation_arguments(self)

Fetch default expectation arguments for this data_asset

Returns

A dictionary containing all the current default expectation arguments for a data_asset

Ex:

{
    "include_config" : True,
    "catch_exceptions" : False,
    "result_format" : 'BASIC'
}

See also

set_default_expectation_argument

property default_expectation_args(self)

A getter for default Expectation arguments

set_default_expectation_argument(self, argument, value)

Set a default expectation argument for this data_asset

Parameters
  • argument (string) – The argument to be replaced

  • value – The new value to use for replacement

Returns

None

See also

get_default_expectation_arguments
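As a rough illustration of how the getter and setter for default expectation arguments relate, the sketch below keeps the defaults in a plain dictionary. DefaultArgsSketch is a hypothetical class, the initial values are copied from the example dictionary above, and returning a copy from the getter is a design choice of this sketch rather than documented library behaviour.

```python
import copy


class DefaultArgsSketch:
    """Hypothetical holder for default expectation arguments."""

    def __init__(self):
        # Initial values copied from the example dictionary in this section.
        self._default_expectation_args = {
            "include_config": True,
            "catch_exceptions": False,
            "result_format": "BASIC",
        }

    def get_default_expectation_arguments(self):
        # Return a copy so callers cannot change the defaults by accident
        # (an assumption of this sketch).
        return copy.deepcopy(self._default_expectation_args)

    def set_default_expectation_argument(self, argument, value):
        # Replace a single default; other defaults are left untouched.
        self._default_expectation_args[argument] = value


asset = DefaultArgsSketch()
asset.set_default_expectation_argument("result_format", "SUMMARY")
defaults = asset.get_default_expectation_arguments()
```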

get_expectations_config(self, discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False)

Returns an expectation configuration, with options to discard failed expectations and to discard or include different result aspects, such as exceptions and result format.

get_expectation_suite(self, discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False, suppress_logging=False)

Returns _expectation_config as a JSON object, performing some cleaning along the way.

Parameters
  • discard_failed_expectations (boolean) – Only include expectations with success_on_last_run=True in the exported config. Defaults to True.

  • discard_result_format_kwargs (boolean) – In returned expectation objects, suppress the result_format parameter. Defaults to True.

  • discard_include_config_kwargs (boolean) – In returned expectation objects, suppress the include_config parameter. Defaults to True.

  • discard_catch_exceptions_kwargs (boolean) – In returned expectation objects, suppress the catch_exceptions parameter. Defaults to True.

  • suppress_warnings (boolean) – If true, do not include warnings in logging information about the operation.

  • suppress_logging (boolean) – If true, do not create a log entry (useful when using get_expectation_suite programmatically)

Returns

An expectation suite.

Note

get_expectation_suite does not affect the underlying expectation suite at all. The returned suite is a copy of _expectation_suite, not the original object.
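The copy-then-clean behaviour described above might look roughly like the following sketch. It operates on plain dictionaries rather than real ExpectationSuite objects; the function name export_suite, the dictionary layout, and the single discard_kwargs parameter (which collapses the three discard_*_kwargs flags into one) are assumptions for illustration.

```python
import copy

DISCARDABLE_KWARGS = ("result_format", "include_config", "catch_exceptions")


def export_suite(suite, discard_failed_expectations=True, discard_kwargs=DISCARDABLE_KWARGS):
    """Return a cleaned deep copy of `suite`; the input is never modified."""
    exported = copy.deepcopy(suite)
    kept = []
    for expectation in exported["expectations"]:
        # Keep only expectations with success_on_last_run=True when asked to
        # discard failures.
        if discard_failed_expectations and expectation.get("success_on_last_run") is not True:
            continue
        # Suppress the requested keyword arguments in the exported copy.
        for key in discard_kwargs:
            expectation["kwargs"].pop(key, None)
        kept.append(expectation)
    exported["expectations"] = kept
    return exported


suite = {
    "expectation_suite_name": "demo",
    "expectations": [
        {
            "expectation_type": "expect_column_to_exist",
            "kwargs": {"column": "id", "result_format": "BASIC"},
            "success_on_last_run": True,
        },
        {
            "expectation_type": "expect_column_values_to_be_unique",
            "kwargs": {"column": "id"},
            "success_on_last_run": False,
        },
    ],
}
exported = export_suite(suite)
```

Working on a deep copy is what keeps the original suite intact: the failing expectation and the result_format kwarg disappear from the exported copy only.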

save_expectation_suite(self, filepath=None, discard_failed_expectations=True, discard_result_format_kwargs=True, discard_include_config_kwargs=True, discard_catch_exceptions_kwargs=True, suppress_warnings=False)

Writes _expectation_config to a JSON file.

Writes the DataAsset’s expectation config to the specified JSON filepath. Failing expectations can be excluded from the JSON expectations config with discard_failed_expectations. The kwarg key-value pairs result_format, include_config, and catch_exceptions are optionally excluded from the JSON expectations config.

Parameters
  • filepath (string) – The location and name to write the JSON config file to.

  • discard_failed_expectations (boolean) – If True, excludes expectations that do not return success = True. If False, all expectations are written to the JSON config file.

  • discard_result_format_kwargs (boolean) – If True, the result_format attribute for each expectation is not written to the JSON config file.

  • discard_include_config_kwargs (boolean) – If True, the include_config attribute for each expectation is not written to the JSON config file.

  • discard_catch_exceptions_kwargs (boolean) – If True, the catch_exceptions attribute for each expectation is not written to the JSON config file.

  • suppress_warnings (boolean) – If True, all warnings raised by Great Expectations, as a result of dropped expectations, are suppressed.

validate(self, expectation_suite=None, run_id=None, data_context=None, evaluation_parameters=None, catch_exceptions=True, result_format=None, only_return_failures=False, run_name=None, run_time=None)

Generates a JSON-formatted report describing the outcome of all expectations.

Use the default expectation_suite=None to validate the expectations config associated with the DataAsset.

Parameters
  • expectation_suite (json or None) – If None, uses the expectations config generated with the DataAsset during the current session. If a JSON file, validates those expectations.

  • run_name (str) – Used to identify this validation result as part of a collection of validations. See DataContext for more information.

  • data_context (DataContext) – A datacontext object to use as part of validation for binding evaluation parameters and registering validation results.

  • evaluation_parameters (dict or None) – If None, uses the evaluation_parameters from the expectation_suite provided or as part of the data_asset. If a dict, uses the evaluation parameters in the dictionary.

  • catch_exceptions (boolean) – If True, exceptions raised by tests will not end validation and will be described in the returned report.

  • result_format (string or None) – If None, uses the default value (‘BASIC’ or as specified). If string, the returned expectation output follows the specified format (‘BOOLEAN_ONLY’, ‘BASIC’, etc.).

  • only_return_failures (boolean) – If True, expectation results are only returned when success = False

Returns

A JSON-formatted dictionary containing a list of the validation results. An example of the returned format:

{
  "results": [
    {
      "unexpected_list": [unexpected_value_1, unexpected_value_2],
      "expectation_type": "expect_*",
      "kwargs": {
        "column": "Column_Name",
        "result_format": "SUMMARY"
      },
      "success": true,
      "raised_exception": false,
      "exception_traceback": null
    },
    {
      ... (Second expectation results)
    },
    ... (More expectations results)
  ],
  "success": true,
  "statistics": {
    "evaluated_expectations": n,
    "successful_expectations": m,
    "unsuccessful_expectations": n - m,
    "success_percent": m / n
  }
}

Notes

A warning is issued if the configuration object was built with a different version of Great Expectations than the current environment, or if no version was found in the configuration file.

Raises

AttributeError - if 'catch_exceptions'=None and an expectation throws an AttributeError
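The interaction between catch_exceptions and only_return_failures can be sketched with plain callables. This is a simplified illustration, not the library's validate method: run_checks, the (name, callable) pairs, and the check names are hypothetical, and the result entries carry only a subset of the keys shown in the example report above.

```python
import traceback


def run_checks(checks, catch_exceptions=True, only_return_failures=False):
    """Run (name, callable) pairs and build a small report dictionary."""
    results = []
    for name, check in checks:
        try:
            success = bool(check())
            raised, trace = False, None
        except Exception:
            if not catch_exceptions:
                raise  # let the error end validation
            # Describe the error in the report instead of stopping.
            success, raised, trace = False, True, traceback.format_exc()
        results.append(
            {
                "expectation_type": name,
                "success": success,
                "raised_exception": raised,
                "exception_traceback": trace,
            }
        )
    return {
        # Optionally hide successful results from the returned list.
        "results": [r for r in results if not (only_return_failures and r["success"])],
        # Overall success still reflects every check that was evaluated.
        "success": all(r["success"] for r in results),
    }


values = [1, 2, None]
checks = [
    ("no_nulls", lambda: all(v is not None for v in values)),
    ("all_positive", lambda: all(v > 0 for v in values)),  # raises TypeError on None
    ("three_values", lambda: len(values) == 3),
]
report = run_checks(checks, only_return_failures=True)
```

With catch_exceptions=True the TypeError raised by the second check is recorded in the report and the third check still runs; with catch_exceptions=False the same error propagates to the caller.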

get_evaluation_parameter(self, parameter_name, default_value=None)

Get an evaluation parameter value that has been stored in meta.

Parameters
  • parameter_name (string) – The name of the parameter to retrieve.

  • default_value (any) – The default value to be returned if the parameter is not found.

Returns

The current value of the evaluation parameter.

set_evaluation_parameter(self, parameter_name, parameter_value)

Provide a value to be stored in the data_asset evaluation_parameters object and used to evaluate parameterized expectations.

Parameters
  • parameter_name (string) – The name of the kwarg to be replaced at evaluation time

  • parameter_value (any) – The value to be used
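A minimal sketch of how stored evaluation parameters could be looked up and substituted into expectation kwargs at evaluation time follows. EvaluationParametersSketch and its bind method are hypothetical, and the {"$PARAMETER": name} placeholder layout is an assumption of this sketch, not something this page specifies.

```python
class EvaluationParametersSketch:
    """Hypothetical store for evaluation parameters."""

    def __init__(self):
        self._evaluation_parameters = {}

    def set_evaluation_parameter(self, parameter_name, parameter_value):
        self._evaluation_parameters[parameter_name] = parameter_value

    def get_evaluation_parameter(self, parameter_name, default_value=None):
        # Fall back to default_value when the parameter was never stored.
        return self._evaluation_parameters.get(parameter_name, default_value)

    def bind(self, kwargs):
        """Return a copy of kwargs with placeholders replaced by stored values."""
        bound = {}
        for key, value in kwargs.items():
            # Assumed placeholder layout: {"$PARAMETER": "<parameter name>"}.
            if isinstance(value, dict) and "$PARAMETER" in value:
                name = value["$PARAMETER"]
                if name not in self._evaluation_parameters:
                    raise KeyError(f"no evaluation parameter named {name!r}")
                bound[key] = self._evaluation_parameters[name]
            else:
                bound[key] = value
        return bound


store = EvaluationParametersSketch()
store.set_evaluation_parameter("max_rows", 100)
bound = store.bind({"max_value": {"$PARAMETER": "max_rows"}, "min_value": 0})
missing = store.get_evaluation_parameter("unknown", default_value=-1)
```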

add_citation(self, comment, batch_spec=None, batch_markers=None, batch_definition=None, citation_date=None)

Adds a citation to an existing Expectation Suite within the validator

property expectation_suite_name(self)

Gets the current expectation_suite name of this data_asset as stored in the expectations configuration.

test_expectation_function(self, function, *args, **kwargs)

Test a generic expectation function

Parameters
  • function (func) – The function to be tested. (Must be a valid expectation function.)

  • *args – Positional arguments to be passed to the function

  • **kwargs – Keyword arguments to be passed to the function

Returns

A JSON-serializable expectation result object.

Notes

This function is a thin layer to allow quick testing of new expectation functions, without having to define custom classes, etc. To use developed expectations from the command-line tool, you will still need to define custom classes, etc.

Check out How to create custom Expectations for more information.

great_expectations.validator.validator.ValidationStatistics
great_expectations.validator.validator._calc_validation_statistics(validation_results)

Calculate summary statistics for the validation results and return ValidationStatistics.
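A sketch of the summary calculation, using the four statistics fields shown in the example report of validate, is given below. StatisticsSketch and calc_statistics are hypothetical names; success_percent follows the m / n form of that example, and returning None when nothing was evaluated is a choice made for this sketch.

```python
from collections import namedtuple

# Field names taken from the "statistics" block of the example report.
StatisticsSketch = namedtuple(
    "StatisticsSketch",
    [
        "evaluated_expectations",
        "successful_expectations",
        "unsuccessful_expectations",
        "success_percent",
    ],
)


def calc_statistics(validation_results):
    """Summarise a list of result dictionaries that each carry a 'success' key."""
    evaluated = len(validation_results)
    successful = sum(1 for result in validation_results if result["success"])
    unsuccessful = evaluated - successful
    # Avoid dividing by zero when no expectations were evaluated.
    success_percent = successful / evaluated if evaluated else None
    return StatisticsSketch(evaluated, successful, unsuccessful, success_percent)


stats = calc_statistics(
    [{"success": True}, {"success": False}, {"success": True}, {"success": True}]
)
```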

class great_expectations.validator.validator.BridgeValidator(batch, expectation_suite, expectation_engine=None, **kwargs)

This is currently helping bridge APIs

get_dataset(self)

Bridges between Execution Engines in providing access to the batch data. Validates that Dataset classes contain the proper type of data (e.g. a Pandas Dataset does not contain SqlAlchemy data).