Skip to main content
Version: 1.3.0

Use SQL to define a custom Expectation

Among the available Expectations, the UnexpectedRowsExpectation is designed to facilitate the execution of SQL or Spark-SQL queries as the core logic for an Expectation. By default, UnexpectedRowsExpectation considers validation successful when no rows are returned by the provided SQL query.

Like any other Expectation, you can instantiate the UnexpectedRowsExpectation directly. You can also customize an UnexpectedRowsExpectation in essentially the same manner as you would define a custom Expectation, by subclassing UnexpectedRowsExpectation and providing customized default attributes and text for Data Docs. However, there are some caveats around the UnexpectedRowsExpectation's unexpected_rows_query attribute that deserve further detail.

Prerequisites

Procedure

  1. Create a new Expectation class that inherits the UnexpectedRowsExpectation class.

    The class name UnexpectedRowsExpectation describes the functionality of the Expectation: it finds rows with unexpected values. When you create a customized Expectation class you can provide a class name that is more indicative of your specific use case. In this example, the customized subclass of UnexpectedRowsExpectation will be used to find invalid passenger counts in taxi trip data:

    Python
    class ExpectPassengerCountToBeLegal(gx.expectations.UnexpectedRowsExpectation):
  2. Override the Expectation's unexpected_rows_query attribute.

    The unexpected_rows_query attribute is a SQL or Spark-SQL query that returns a selection of rows from the Batch of data being validated. By default, rows that are returned have failed the validation check.

    The unexpected_rows_query should be written in standard SQL or Spark-SQL syntax, except that it can also contain the special {batch} named query. When the Expectation is evaluated, the {batch} keyword will be replaced with the Batch of data that is configured for your Data Asset.

    In this example, unexpected_rows_query will select any rows where the passenger count is greater than 6 or less than 0. These rows will fail validation for this Expectation:

    Python
    class ExpectPassengerCountToBeLegal(gx.expectations.UnexpectedRowsExpectation):
    unexpected_rows_query: str = (
    "SELECT * FROM {batch} WHERE passenger_count > 6 or passenger_count < 0"
    )
  3. Customize the rendering of the new Expectation when displayed in Data Docs.

    As with other Expectations, the description attribute contains the text describing the customized Expectation when your results are rendered into Data Docs. It can be set when an Expectation class is defined or edited as an attribute of an Expectation instance. You can format the description string with Markdown syntax:

    Python
    class ExpectPassengerCountToBeLegal(gx.expectations.UnexpectedRowsExpectation):
    unexpected_rows_query: str = (
    "SELECT * FROM {batch} WHERE passenger_count > 6 or passenger_count < 0"
    )
    description: str = "There should be no more than **6** passengers."
  4. Use the customized subclass as an Expectation.

    Once the customized Expectation subclass has been defined, instances of it can be created, added to Expectation Suites, and validated just like any other Expectation class:

    Python
    expectation = ExpectPassengerCountToBeLegal()