Version: 1.9.0

Restrict an Expectation to specific rows

By default, Expectations apply to the entire dataset retrieved in a Batch. However, there are instances when an Expectation may not be relevant for every row. Validating every row could lead to false positives or false negatives in the Validation Results.

For example, you might define an Expectation that a column indicating the country of origin for a product should not be null. If this Expectation is only applicable when the product is an import, applying it to every row in the Batch could result in many false negatives when the country of origin column is null for products produced locally.

To address this issue, GX Core allows you to restrict Expectations to apply to only a subset of the data retrieved in a Batch.

Create an Expectation with row conditions

To restrict an Expectation to a subset of the data retrieved in a Batch, use the row_condition argument. The row_condition argument takes a boolean expression built with Python objects. Rows will be validated for the Expectation when the row_condition expression evaluates to True. Conversely, if the row_condition evaluates to False, the corresponding row will not be validated for the Expectation.
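The filtering behavior described above can be sketched in plain Python. This is only an illustration of the semantics, not the GX Core API: the `rows` data, the column names, and the `validate_not_null` helper are all hypothetical, and the row condition is modeled as an ordinary function rather than a `Column` expression.

```python
# Plain-Python illustration of row-condition semantics; this is NOT the GX API.
# The rows, column names, and helper below are hypothetical.
rows = [
    {"product": "tea", "is_import": True, "country_of_origin": "IN"},
    {"product": "bread", "is_import": False, "country_of_origin": None},
    {"product": "coffee", "is_import": True, "country_of_origin": None},
]


def validate_not_null(rows, column, row_condition=None):
    """Return the rows that fail a not-null check.

    Only rows for which row_condition returns True are checked;
    when row_condition is None, every row is checked.
    """
    in_scope = [r for r in rows if row_condition is None or row_condition(r)]
    return [r for r in in_scope if r[column] is None]


# Without a row condition, the locally produced bread is flagged as well.
unconditional_failures = validate_not_null(rows, "country_of_origin")

# With a row condition, only imports are checked, so only coffee is flagged.
conditional_failures = validate_not_null(
    rows, "country_of_origin", row_condition=lambda r: r["is_import"]
)

print([r["product"] for r in unconditional_failures])  # ['bread', 'coffee']
print([r["product"] for r in conditional_failures])    # ['coffee']
```

Rows excluded by the condition are neither passed nor failed; they are simply left out of the check.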

Prerequisites

Procedure

  1. Determine the row_condition expression.

    To support complex business use cases, logical clauses can be combined with AND / OR relationships within the row_condition argument.

    Python
    from great_expectations.expectations.row_conditions import Column

    # Create condition statements with column references and Python comparisons.
    statement_1 = Column("tenure") > 2
    statement_2 = Column("salary") <= 50000
    statement_3 = Column("department") == "Sales"

    # Combine condition statements with an AND relationship into condition blocks.
    block_1 = statement_1 & statement_2
    block_2 = statement_3

    # Combine condition blocks with OR.
    row_condition = block_1 | block_2

    Here are some examples of how to create common patterns in row conditions:

    • A and B.

      Python
      # Two condition statements within a single condition block.

      statement_1 = Column("A") == "a"
      statement_2 = Column("B") == "b"

      block_1 = statement_1 & statement_2

      row_condition = block_1
    • A or B.

      Python
      # Two condition statements, each in its own condition block.

      statement_1 = Column("A") == "a"
      statement_2 = Column("B") == "b"

      block_1 = statement_1
      block_2 = statement_2

      row_condition = block_1 | block_2
    • (A and B) or (C and D).

      Python
      # Two condition statements in one condition block and two statements in another block.

      statement_1 = Column("A") == "a"
      statement_2 = Column("B") == "b"
      statement_3 = Column("C") == "c"
      statement_4 = Column("D") == "d"

      block_1 = statement_1 & statement_2
      block_2 = statement_3 & statement_4

      row_condition = block_1 | block_2
    • A and (B or C). This pattern is not supported verbatim, but you can achieve the same result with (A and B) or (A and C).

      Python
      # Statement A appears in both condition blocks, paired once with B and once with C.

      statement_1 = Column("A") == "a"
      statement_2 = Column("B") == "b"
      statement_3 = Column("C") == "c"

      block_1 = statement_1 & statement_2
      block_2 = statement_1 & statement_3

      row_condition = block_1 | block_2

    The following comparison operators are supported: ==, !=, >, <, >=, <=, is_in, is_not_in, is_null, is_not_null. Here are some examples of using different kinds of operators:

    Python
    from datetime import datetime, timezone

    # Single value comparisons: ==, !=, >, <, >=, <=
    statement_1 = Column("count") == 1
    statement_2 = Column("date") > datetime(year=2025, month=1, day=31, tzinfo=timezone.utc)

    # Set comparisons: is_in, is_not_in
    statement_3 = Column("department").is_in(["sales", "finance"])

    # Nullity checks: is_null, is_not_null
    statement_4 = Column("name").is_null()
  2. Configure the Expectation.

    Python
    import great_expectations as gx

    # Add the `row_condition` parameter alongside the Expectation's other arguments.
    expectation = gx.expectations.ExpectColumnValuesToBeBetween(
        column="bonus", min_value=5000, max_value=10000, row_condition=row_condition
    )
  3. Optional. Configure additional variations of the Expectation.

    Expectations that have different row conditions are treated as unique, even if they are of the same type, apply to the same column, and belong to the same Expectation Suite. This allows you to validate your data through multiple lenses.

    For instance, the following code establishes an Expectation that the value in the cycle_type column is either unicycle, bicycle, or tricycle.

    Python
    expectation_without_row_conditions = (
        gx.expectations.ExpectColumnDistinctValuesToBeInSet(
            column="cycle_type", value_set=["unicycle", "bicycle", "tricycle"]
        )
    )

    By contrast, the following code creates an Expectation that the value of the cycle_type column is unicycle when the item has one wheel.

    Python
    expectation_with_row_conditions = gx.expectations.ExpectColumnValuesToBeInSet(
        column="cycle_type",
        value_set=["unicycle"],
        row_condition=(Column("wheels") == 1),
    )

Now you can add your Expectations to an Expectation Suite.
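If you rewrite a nested condition such as A and (B or C) into the (A and B) or (A and C) form shown in step 1, you may want to confirm that the two forms select the same rows. The following sketch checks the equivalence over every combination of truth values using plain Python booleans. The function names `nested` and `flattened` are hypothetical, and nothing here calls GX Core.

```python
from itertools import product


def nested(a, b, c):
    # The pattern as you would naturally write it: A and (B or C).
    return a and (b or c)


def flattened(a, b, c):
    # The equivalent "OR of AND blocks" form: (A and B) or (A and C).
    return (a and b) or (a and c)


# Collect every truth assignment on which the two forms disagree.
mismatches = [
    combo
    for combo in product([False, True], repeat=3)
    if nested(*combo) != flattened(*combo)
]

print(mismatches)  # [] because the two forms agree on all 8 combinations
```

The same exhaustive check can be adapted to other rewrites before you translate them into condition statements and blocks.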

Row conditions in Data Docs

If an Expectation has row conditions, this will be indicated in the Data Docs. Each Expectation with row conditions is prefaced with if row_condition, then values must ... as illustrated in the following example:

if PClass=="1st", then values must belong to this set: 1.

If the row_condition is a complex expression, it will be divided into several components to enhance readability.

Scope and limitations

Keep the following in mind when working with row conditions:

  • An Expectation can have up to 100 condition statements grouped in any number of condition blocks.
  • The following Expectations do not accept the row_condition argument:
    • expect_column_to_exist
    • expect_query_results_to_match_comparison
    • expect_table_columns_to_match_ordered_list
    • expect_table_columns_to_match_set
    • expect_table_column_count_to_be_between
    • expect_table_column_count_to_equal
    • unexpected_rows_expectation