Use SQL to define a custom Expectation
Among the available Expectations, the UnexpectedRowsExpectation
is designed to facilitate the execution of SQL queries as the core logic for an Expectation. By default, UnexpectedRowsExpectation
considers validation successful when no rows are returned by the provided SQL query.
Prerequisites
- Python version 3.9 to 3.12.
- An installation of GX Core.
- A preconfigured Data Context.
- Recommended. A preconfigured Data Source and Data Asset connected to your data for testing your customized Expectation.
Procedure
- Instructions
- Sample code
-
Determine your custom SQL query.
The
UnexpectedRowsExpectation
class takes anunexpected_rows_query
attribute, which is a SQL or Spark-SQL query that returns a selection of rows from the Batch of data being validated. By default, rows that are returned have failed the validation check.The custom SQL query should be written in the SQL dialect your database uses, except that it can also contain the special
{batch}
named query. When the Expectation is evaluated, the{batch}
keyword will be replaced with the Batch of data that is configured for your Data Asset.In this example, the custom query will select any rows where the passenger count is greater than
6
or less than0
:Pythonmy_query = """
SELECT
*
FROM
{batch}
WHERE
passenger_count > 6 or passenger_count < 0
""" -
Customize how the Expectation renders in Data Docs.
As with other Expectations, the
description
attribute contains the text describing the Expectation when your results are rendered into Data Docs. You can format thedescription
string with Markdown syntax:Pythonmy_description = "There should be no more than **6** passengers."
-
Create a new Expectation using the
UnexpectedRowsExpectation
class and your parameters.The class name
UnexpectedRowsExpectation
describes the functionality of the Expectation: it finds rows with unexpected values. When you create your Expectation, you can use a name that is more indicative of your specific use case. In this example, the customized Expectation will be used to find invalid passenger counts in taxi trip data:Pythonexpect_passenger_count_to_be_legal = gx.expectations.UnexpectedRowsExpectation(
unexpected_rows_query=my_query,
description=my_description,
) -
Use your custom SQL Expectation.
Now that you've created a custom SQL Expectation, you can add it to an Expectation Suite and validate it like any other Expectation.
import great_expectations as gx
# Define your custom SQL query.
my_query = """
SELECT
*
FROM
{batch}
WHERE
passenger_count > 6 or passenger_count < 0
"""
# Customize how the Expectation renders in Data Docs.
my_description = "There should be no more than **6** passengers."
# Create an Expectation using the UnexpectedRowsExpectation class and your parameters.
expect_passenger_count_to_be_legal = gx.expectations.UnexpectedRowsExpectation(
unexpected_rows_query=my_query,
description=my_description,
)
# Test the Expectation.
context = gx.get_context()
data_source_name = "my_sql_data_source"
data_asset_name = "my_data_asset"
batch_definition_name = "my_batch_definition"
batch = (
context.data_sources.get(data_source_name)
.get_asset(data_asset_name)
.get_batch_definition(batch_definition_name)
.get_batch()
)
batch.validate(expect_passenger_count_to_be_legal)