Skip to main content

Evaluation Parameters

Often, the specific parameters associated with an Expectation will be derived from upstream steps in a processing pipeline. For example, we may want to expect_table_row_count_to_equal a value stored in a previous step.

Great Expectations makes it possible to use "Evaluation Parameters" to accomplish that goal. We declare Expectations using parameters that need to be provided at validation time. During interactive development, we can even provide a temporary value that should be used during the initial evaluation of the Expectation.

my_df.expect_table_row_count_to_equal(    value={"$PARAMETER": "upstream_row_count", "$PARAMETER.upstream_row_count": 10},    result_format={'result_format': 'BOOLEAN_ONLY'})

This will return {'success': True}.

More typically, when validating Expectations, you can provide Evaluation Parameters that are only available at runtime:

my_df.validate(    expectation_suite=my_dag_step_config,     evaluation_parameters={"upstream_row_count": upstream_row_count})

Evaluation Parameter expressions#

In many cases, Evaluation Parameters are most useful when they can allow a range of values. For example, we might want to specify that a new table's row count should be between 90 - 110 % of an upstream table's row count (or a count from a previous run). Evaluation parameters support basic arithmetic expressions to accomplish that goal:

my_df.expect_table_row_count_to_be_between(    min_value={"$PARAMETER": "trunc(upstream_row_count * 0.9)"},    max_value={"$PARAMETER": "trunc(upstream_row_count * 1.1)"},     result_format={'result_format': 'BOOLEAN_ONLY'})

This will return {'success': True}.

Evaluation Parameters are not limited to simple values, for example you could include a list as a parameter value:

my_df.expect_column_values_to_be_in_set(    "my_column",     value_set={"$PARAMETER": "runtime_values"})my_df.validate(    evaluation_parameters={"runtime_values": [1, 2, 3]})

However, it is not possible to mix complex values with arithmetic expressions.

Storing Evaluation Parameters#

Data Context Evaluation Parameter Store#

A Data Context can automatically identify and store Evaluation Parameters that are referenced in other Expectation Suites. The Evaluation Earameter Store uses a URN schema for identifying dependencies between Expectation Suites.

The Data Context-recognized URN must begin with the string urn:great_expectations. Valid URNs must have one of the following structures to be recognized by the Great Expectations Data Context:

urn:great_expectations:validations:<expectation_suite_name>:<metric_name>urn:great_expectations:validations:<expectation_suite_name>:<metric_name>:<metric_kwargs_id>

Replace names in <> with the desired name. For example:

urn:great_expectations:validations:dickens_data:expect_column_proportion_of_unique_values_to_be_between.result.observed_value:column=Title

Storing Parameters in an Expectation Suite#

You can also store parameter values in a special dictionary called evaluation_parameters that is stored in the expectation_suite to be available to multiple Expectations or while declaring additional Expectations:

my_df.set_evaluation_parameter("upstream_row_count", 10)my_df.get_evaluation_parameter("upstream_row_count")

If a parameter has been stored, then it does not need to be provided for a new expectation to be declared:

my_df.set_evaluation_parameter("upstream_row_count", 10)my_df.expect_table_row_count_to_be_between(    max_value={"$PARAMETER": "upstream_row_count"})