Create an Expectation
An Expectation is a verifiable assertion about your data. Expectations make implicit assumptions about your data explicit, and they provide a flexible, declarative language for describing expected behavior. They can help you better understand your data and help you improve data quality.
Prerequisites
Procedure
- Choose an Expectation to create.
GX comes with many built-in Expectations to cover your data quality needs. You can find a catalog of these Expectations in the Expectation Gallery. When browsing the Expectation Gallery, you can filter the available Expectations by the data quality issue they address and by the Data Sources they support. There is also a search bar that lets you filter Expectations by matching text in their name or description.
In your code, you will find the classes for Expectations in the expectations module:
from great_expectations import expectations as gxe
- Determine the Expectation's required parameters.
To determine the parameters your Expectation uses to evaluate data, reference the Expectation's entry in the Expectation Gallery. Under the Args section you will find a list of parameters that are necessary for the Expectation to be evaluated, along with a description of the value that should be provided for each.
Parameters that indicate a column, list of columns, table, Data Source, or severity must be provided when the Expectation is created. All other parameters can be set when the Expectation is created or be assigned a dictionary lookup that will allow them to be set at runtime.
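The rule above can be illustrated with a small check. This is only a sketch: the helper names and the set of parameter names are hypothetical and chosen for illustration, and the lookup form it tests for (a dictionary keyed by $PARAMETER) is the one described under runtime parameters in the Create step. It is not part of the GX API.

```python
from typing import Any

# Assumption: an illustrative subset of parameter names that must be given
# concrete values when the Expectation is created. Not an official GX list.
CREATION_TIME_ONLY = {"column", "column_list", "severity"}


def is_lookup(value: Any) -> bool:
    """Return True if the value is a runtime lookup of the form {"$PARAMETER": key}."""
    return isinstance(value, dict) and "$PARAMETER" in value


def invalid_lookups(kwargs: dict) -> list:
    """List the creation-time-only parameters that were given a runtime lookup."""
    return sorted(
        name
        for name, value in kwargs.items()
        if name in CREATION_TIME_ONLY and is_lookup(value)
    )


# min_value may be deferred to runtime, but column may not.
ok_kwargs = {"column": "fare", "min_value": {"$PARAMETER": "fare_min"}}
bad_kwargs = {"column": {"$PARAMETER": "which_column"}, "min_value": 1}

print(invalid_lookups(ok_kwargs))   # []
print(invalid_lookups(bad_kwargs))  # ['column']
```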
- Optional. Determine the Expectation's other parameters.
In addition to the parameters that are required for an Expectation to evaluate data, Expectations also support some optional parameters. In the Expectation Gallery these are found under each Expectation's Other Parameters section.
These parameters are:
- meta: A dictionary of user-supplied metadata to store with an Expectation. This dictionary can be used to add notes about the purpose and intended use of an Expectation.
- mostly: A special argument that allows for fuzzy validation based on the proportion of successfully validated rows, expressed as a value between 0 and 1. If the proportion of passing rows is at least the value set in the mostly parameter, the Expectation will return a success value of true.
- severity: Indicates the impact of the Expectation failing. Accepted values are critical, warning, or info. Defaults to critical if not explicitly set. You can trigger Actions based on severity levels, or you can condition your data pipeline with the get_maximum_severity_failure helper method in the ExpectationSuiteValidationResult class. Note that if an Expectation fails to execute, the failure will be recorded as critical, regardless of the Expectation configuration, to bring your attention to the fact that your data is not being tested as intended.
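To make the semantics of mostly and severity concrete, here is a minimal pure-Python sketch. It assumes mostly is a fraction between 0 and 1 and that severities rank info < warning < critical. The function names are hypothetical stand-ins for illustration. They are not how GX implements these features, and the second function is not the actual get_maximum_severity_failure method.

```python
from typing import Iterable, List, Optional

# Severity levels named in the text, ordered from least to most severe
# (the ordering itself is an assumption made for this sketch).
SEVERITY_ORDER = ["info", "warning", "critical"]


def passes_with_mostly(row_results: List[bool], mostly: float = 1.0) -> bool:
    """Return True when the fraction of passing rows is at least `mostly`."""
    if not row_results:
        return True  # assumption: with no rows, nothing has failed
    fraction_passing = sum(row_results) / len(row_results)
    return fraction_passing >= mostly


def most_severe_failure(failed_severities: Iterable[str]) -> Optional[str]:
    """Return the most severe level among failed Expectations, or None."""
    failed = list(failed_severities)
    if not failed:
        return None
    return max(failed, key=SEVERITY_ORDER.index)


# 19 of 20 rows pass, so 95% of rows validated successfully.
rows = [True] * 19 + [False]
print(passes_with_mostly(rows, mostly=0.95))  # True
print(passes_with_mostly(rows, mostly=0.99))  # False

# A pipeline could branch on the worst failure it saw.
print(most_severe_failure(["info", "warning"]))  # warning
```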
- Create the Expectation.
Using the Expectation class you picked and the parameters you determined when referencing the Expectation Gallery, you can create your Expectation.
Preset parameters
In this example the ExpectColumnMaxToBeBetween Expectation is created and all of its parameters are defined in advance, while leaving strict_min and strict_max at their default values:
preset_expectation = gx.expectations.ExpectColumnMaxToBeBetween(
    column="passenger_count", min_value=1, max_value=6, severity="warning"
)
Runtime parameters
Runtime parameters are provided by passing a dictionary to the expectation_parameters argument of a Checkpoint's run() method.
To indicate which key in the expectation_parameters dictionary corresponds to a given parameter in an Expectation, you define a lookup as the value of the parameter when the Expectation is created. This is done by passing in a dictionary with the key $PARAMETER when the Expectation is created. The value associated with the $PARAMETER key is the lookup used to find the parameter in the runtime dictionary.
In this example, ExpectColumnMaxToBeBetween is created for both the passenger_count and the fare fields, and the values for min_value and max_value in each Expectation will be passed in at runtime. To differentiate between the parameters for each Expectation, a more specific key is set for finding each parameter in the runtime expectation_parameters dictionary:
passenger_expectation = gx.expectations.ExpectColumnMaxToBeBetween(
    column="passenger_count",
    min_value={"$PARAMETER": "expect_passenger_max_to_be_above"},
    max_value={"$PARAMETER": "expect_passenger_max_to_be_below"},
)
fare_expectation = gx.expectations.ExpectColumnMaxToBeBetween(
    column="fare",
    min_value={"$PARAMETER": "expect_fare_max_to_be_above"},
    max_value={"$PARAMETER": "expect_fare_max_to_be_below"},
)
The runtime expectation_parameters dictionary for the above example would look like:
runtime_expectation_parameters = {
    "expect_passenger_max_to_be_above": 4,
    "expect_passenger_max_to_be_below": 6,
    "expect_fare_max_to_be_above": 10.00,
    "expect_fare_max_to_be_below": 500.00,
}
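The lookup mechanism described above can be pictured as a simple substitution step. The following is a minimal sketch in plain Python, not GX's implementation: resolve_parameters is a hypothetical helper, and the Expectation's arguments are modeled as an ordinary dictionary so the example runs without GX installed.

```python
from typing import Any, Dict

PARAMETER_KEY = "$PARAMETER"


def resolve_parameters(kwargs: Dict[str, Any], runtime: Dict[str, Any]) -> Dict[str, Any]:
    """Replace each {"$PARAMETER": key} value with runtime[key]."""
    resolved = {}
    for name, value in kwargs.items():
        if isinstance(value, dict) and PARAMETER_KEY in value:
            lookup = value[PARAMETER_KEY]
            if lookup not in runtime:
                raise KeyError(f"No runtime value supplied for lookup {lookup!r}")
            resolved[name] = runtime[lookup]
        else:
            # Values set when the Expectation was created pass through unchanged.
            resolved[name] = value
    return resolved


# Arguments of the fare Expectation from the example, modeled as a dict.
fare_kwargs = {
    "column": "fare",
    "min_value": {"$PARAMETER": "expect_fare_max_to_be_above"},
    "max_value": {"$PARAMETER": "expect_fare_max_to_be_below"},
}

runtime_expectation_parameters = {
    "expect_passenger_max_to_be_above": 4,
    "expect_passenger_max_to_be_below": 6,
    "expect_fare_max_to_be_above": 10.00,
    "expect_fare_max_to_be_below": 500.00,
}

resolved_fare = resolve_parameters(fare_kwargs, runtime_expectation_parameters)
print(resolved_fare)
# {'column': 'fare', 'min_value': 10.0, 'max_value': 500.0}
```

Because each Expectation uses its own lookup keys, one runtime dictionary can serve both the passenger and fare Expectations without their values colliding. In GX itself, the dictionary is handed to a Checkpoint's run() method through the expectation_parameters argument, as described above.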
Sample code
import great_expectations as gx

context = gx.get_context()

# All Expectations are found in the `gx.expectations` module.
# This Expectation has all values set in advance:
preset_expectation = gx.expectations.ExpectColumnMaxToBeBetween(
    column="passenger_count", min_value=1, max_value=6, severity="warning"
)

# In this case, two Expectations are created that will be passed
# parameters at runtime, and unique lookups are defined for each
# Expectation's parameters.
passenger_expectation = gx.expectations.ExpectColumnMaxToBeBetween(
    column="passenger_count",
    min_value={"$PARAMETER": "expect_passenger_max_to_be_above"},
    max_value={"$PARAMETER": "expect_passenger_max_to_be_below"},
)
fare_expectation = gx.expectations.ExpectColumnMaxToBeBetween(
    column="fare",
    min_value={"$PARAMETER": "expect_fare_max_to_be_above"},
    max_value={"$PARAMETER": "expect_fare_max_to_be_below"},
)

# A dictionary containing the parameters for both of the above
# Expectations would look like:
runtime_expectation_parameters = {
    "expect_passenger_max_to_be_above": 4,
    "expect_passenger_max_to_be_below": 6,
    "expect_fare_max_to_be_above": 10.00,
    "expect_fare_max_to_be_below": 500.00,
}