Version: 0.18.9

Create example cases for a Custom Expectation

This guide will help you add example cases to document and test the behavior of your Expectation (a verifiable assertion about data).

Prerequisites

  • A Custom Expectation, such as the one built in our guide on how to create Custom Column Aggregate Expectations.

Example cases in Great Expectations serve a dual purpose:

  • First, they help the users of the Expectation understand its logic by providing examples of input data that the Expectation will evaluate.
  • Second, they provide test cases that the Great Expectations testing framework can execute automatically.

If you decide to contribute your Expectation, its entry in the Expectations Gallery will render these examples.

We will explain the structure of these tests using the Custom Expectation (an extension of the `Expectation` class, developed outside of the Great Expectations library) implemented in our guide on how to create Custom Column Aggregate Expectations.

Decide which tests you want to implement

Expectations can have a robust variety of possible applications. We want to create tests that demonstrate (and verify) the capabilities and limitations of our Custom Expectation.

Details

What kind of tests can I create? These tests can include examples intended to pass, fail, or error out, and expected results can be as open-ended as {"success": False}, or as granular as:

{
    "success": True,
    "expectation_config": {
        "expectation_type": "expect_column_value_z_scores_to_be_less_than",
        "kwargs": {
            "column": "a",
            "mostly": 0.9,
            "threshold": 4,
            "double_sided": True,
        },
        "meta": {},
    },
    "result": {
        "element_count": 6,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "partial_unexpected_list": [],
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_percent_total": 0.0,
        "unexpected_percent_nonmissing": 0.0,
    },
    "exception_info": {
        "raised_exception": False,
        "exception_traceback": None,
        "exception_message": None,
    },
}

At a minimum, we want to create tests that show what our Custom Expectation will and will not do.

These basic positive and negative example cases are the minimum amount of test coverage required for a Custom Expectation to be accepted into the Great Expectations codebase at an Experimental level.

To begin with, let's implement those two basic tests: one positive example case, and one negative example case.

Define your data

Search for examples = [] in the template file you are modifying for your new Custom Expectation.

We're going to populate examples with a list of example cases.

Details

What is an example case? Each example is a dictionary with two keys (a minimal skeleton follows this list):

  • data: defines the input data of the example as a table/dataframe.
  • tests: a list of test cases that use the data defined above as input to validate against.
    • title: a descriptive name for the test case; it must not contain spaces.
    • include_in_gallery: set it to True if you want this test case to be visible in the Expectations Gallery as an example (true for most test cases).
    • in: contains exactly the parameters that you want to pass in to the Expectation. "in": {"column": "x", "min_value": 4} would be equivalent to expect_column_max_to_be_between_custom(column="x", min_value=4)
    • out: indicates the results the test requires from the ValidationResult in order to pass.
    • exact_match_out: if you set exact_match_out=False, you don't need to include all the elements of the result object, only the ones that are important to test, such as {"success": True}.
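
As a quick illustration, here is a minimal skeleton of a single example case; the column name, test title, and parameter values are placeholders rather than part of any template:

Python
examples = [
    {
        # Input table: one column named "x" (placeholder name)
        "data": {"x": [1, 2, 3, 4, 5]},
        "tests": [
            {
                "title": "my_positive_test",  # hypothetical title; no spaces allowed
                "exact_match_out": False,  # only check the keys listed in "out"
                "include_in_gallery": True,  # render this case in the Gallery
                "in": {"column": "x"},  # kwargs passed to the Expectation
                "out": {"success": True},  # expected subset of the ValidationResult
            },
        ],
    },
]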

In our example, data will have two columns, "x" and "y", each with five rows. If you define multiple columns, make sure that they have the same number of rows. When possible, include test data and tests that include null values (None in the Python test definition).

"data": {"x": [1, 2, 3, 4, 5], "y": [0, -1, -2, 4, None]},

When you define data in your examples, Great Expectations will usually infer the column types. Sometimes you need to specify the precise type of the columns for each backend. In that case, use the schemas attribute (at the same level as data and tests in the dictionary):

"schemas": {
"spark": {
"x": "IntegerType",
},
"sqlite": {
"x": "INTEGER",
},
info

While Pandas is fairly flexible in typing, Spark and many SQL dialects are much more strict.

You may find you wish to use data that is incompatible with a given backend, or write different individual tests for different backends. To do this, you can use the only_for attribute, which accepts a list containing pandas, spark, sqlite, a SQL dialect, or a combination of any of the above:

"only_for": ["spark", "pandas"]

Passing this attribute at the same level as data, tests, and schemas will tell Great Expectations to instantiate the data specified in that example only for the given backends, ensuring you don't encounter backend-related data errors before your Custom Expectation can even be tested.

Passing this attribute within a test (at the same level as title, in, out, etc.) will execute that individual test only for that specified backend.
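
For instance, here is a hedged sketch of a test case that runs only against Pandas; the title and parameters are illustrative:

Python
{
    "title": "pandas_only_test",  # hypothetical test name
    "only_for": ["pandas"],  # execute this individual test only on Pandas
    "exact_match_out": False,
    "include_in_gallery": False,
    "in": {"column": "x"},
    "out": {"success": True},
},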

Define your tests

In our example, tests will be a list containing dictionaries defining each test.

You will need to:

  1. Title your tests (title)
  2. Define the input for your tests (in)
  3. Decide how precisely you want to test the output of your tests (exact_match_out)
  4. Define the expected output for your tests (out)

If you are interested in contributing your Custom Expectation back to Great Expectations, you will also need to decide if you want these tests publicly displayed to demonstrate the functionality of your Custom Expectation (include_in_gallery).

Python
examples = [
    {
        "data": {"x": [1, 2, 3, 4, 5], "y": [0, -1, -2, 4, None]},
        "only_for": ["pandas", "spark", "sqlite", "postgresql"],
        "tests": [
            {
                "title": "basic_positive_test",
                "exact_match_out": False,
                "include_in_gallery": True,
                "in": {
                    "column": "x",
                    "min_value": 4,
                    "strict_min": True,
                    "max_value": 5,
                    "strict_max": False,
                },
                "out": {"success": True},
            },
            {
                "title": "basic_negative_test",
                "exact_match_out": False,
                "include_in_gallery": True,
                "in": {
                    "column": "y",
                    "min_value": -2,
                    "strict_min": False,
                    "max_value": 3,
                    "strict_max": True,
                },
                "out": {"success": False},
            },
        ],
    }
]
note

The optional only_for and suppress_test_for keys can be specified at the top-level (next to data and tests) or within specific tests (next to title, and so on).

Allowed backends include: "bigquery", "mssql", "mysql", "pandas", "postgresql", "redshift", "snowflake", "spark", "sqlite", "trino".
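
For example, here is a hedged sketch of a test case that is skipped on Spark but runs everywhere else the example's data is instantiated; the title and parameters are illustrative:

Python
{
    "title": "skip_on_spark_test",  # hypothetical test name
    "suppress_test_for": ["spark"],  # skip this individual test on Spark
    "exact_match_out": False,
    "include_in_gallery": False,
    "in": {"column": "x"},
    "out": {"success": True},
},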

Details

Can I test for errors? Yes! If you would like to define an example case illustrating when your Custom Expectation should throw an error, you can pass an empty out key, and include an error key defining a traceback_substring.

For example:
"out": {}, "error": { "traceback_substring" : "TypeError: Column values, min_value, and max_value must either be None or of the same type." }

Verify your tests

If you now run your file, print_diagnostic_checklist() will attempt to execute these example cases.
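
For example, the Custom Expectation templates place a hook like the following at the bottom of the file; this sketch assumes the class name from the Custom Column Aggregate Expectations guide referenced above:

Python
# Assumes your file defines ExpectColumnMaxToBeBetweenCustom,
# the Custom Expectation built in the guide this page builds on.
if __name__ == "__main__":
    ExpectColumnMaxToBeBetweenCustom().print_diagnostic_checklist()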

If the tests are correctly defined, and the rest of the logic in your Custom Expectation is already complete, you will see the following in your Diagnostic Checklist:

✔ Has at least one positive and negative example case, and all test cases pass

Congratulations!
🎉 You've successfully created example cases & tests for a Custom Expectation! 🎉

Contribution (Optional)

This guide will leave you with test coverage sufficient for contribution back to Great Expectations at an Experimental level.

If you're interested in having your contribution accepted at a Beta level, these tests will need to pass for all supported backends (Pandas, Spark, & SQLAlchemy).

For full acceptance into the Great Expectations codebase at a Production level, we require a more robust test suite. If you believe your Custom Expectation is otherwise ready for contribution at a Production level, please submit a Pull Request, and we will work with you to ensure adequate testing.

note

For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.

To view the full script used in this page, see it on GitHub: