Version: 0.18.21

Create Expectations interactively with Python

To Validate data, you must first define a set of Expectations for that data to be Validated against. In this guide, you'll learn how to create Expectations and interactively edit them with feedback from Validating each one against a Batch of data. Validating your Expectations as you define them lets you quickly determine whether the Expectations are suitable for your data and identify where changes might be necessary.

Does this process edit my data?

No. The interactive method used to create and edit Expectations does not edit or alter the Batch data.

Prerequisites

Great Expectations installed in a Python environment
A Filesystem Data Context for your Expectations
A Data Source from which to request a Batch of data for introspection

Import the Great Expectations module and instantiate a Data Context

For this guide we will be working with Python code in a Jupyter Notebook. Jupyter is included with GX and lets us easily edit code and immediately see the results of our changes.

Run the following code to import Great Expectations and instantiate a Data Context:

Python
import great_expectations as gx

context = gx.get_context()

Data Contexts and persisting data

If you're using an Ephemeral Data Context, your configurations will not persist beyond the current Python session. However, if you're using a Filesystem or Cloud Data Context, they do persist. The get_context() method returns the first Cloud or Filesystem Data Context it can find. If a Cloud or Filesystem Data Context has not been configured or cannot be found, it provides an Ephemeral Data Context. For more information about the get_context() method, see Instantiate a Data Context.

Use an existing Data Asset to create a Batch Request

Run the following code to retrieve a previously configured Data Asset from the Data Context you initialized and create a Batch Request that identifies the Batch of data you'll use to validate your Expectations:

Python
data_asset = context.get_datasource("my_datasource").get_asset("my_data_asset")
batch_request = data_asset.build_batch_request()

Limit the Batches returned by a Batch Request

You can provide a dictionary as the options parameter of build_batch_request() to limit the Batches returned by a Batch Request. If you leave the options parameter empty, your Batch Request will include all the Batches configured in the corresponding Data Asset. For more information about Batch Requests, see How to request data from a Data Asset.

Create a Validator

When you use a Validator to interactively create your Expectations, the Validator needs two parameters. One parameter identifies the Batch that contains the data that is used to Validate the Expectations. The second parameter provides a name for the combined list of Expectations you create.

Working outside a Jupyter Notebook

If you're using a Jupyter Notebook, the result of the last expression in a cell is displayed automatically when you run the cell. If you're using a different interpreter, you might need to explicitly print these results to view them. For example:

Python
print(validator.head())

Optional. Run the following command if you haven't created an Expectation Suite:

Python
context.add_or_update_expectation_suite("my_expectation_suite")
# Optional. Run the following assertion to verify the Expectation Suite was created:
# assert "my_expectation_suite" in context.list_expectation_suite_names()

Run the following command to create a Validator:

Python
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="my_expectation_suite",
)
validator.head()

Use the Validator to create and run an Expectation

The Validator provides access to all the available Expectations as methods. When you run an expect_*() method from the Validator, the Validator adds the specified Expectation to its Expectation Suite (or edits the existing Expectation, if one already applies), and then runs that Expectation against the Batch of data identified by the Batch Request that was supplied when the Validator was created.

Python
validator.expect_column_values_to_not_be_null(column="vendor_id")

Since we are working in a Jupyter Notebook, the results of the Validation are displayed after we run an expect_*() method. We can examine those results to determine if the Expectation needs to be edited.

Working outside a Jupyter Notebook

If you are not working in a Jupyter Notebook, you may need to explicitly print your results:

Python
expectation_validation_result = validator.expect_column_values_to_not_be_null(
    column="vendor_id"
)
print(expectation_validation_result)

Edit Expectations or create additional Expectations (Optional)

If you choose to edit an Expectation after you've viewed the Validation Results that were returned when it was created, you can do so by running the validator.expect_*() method with different parameters than you supplied previously. You can also have the Validator run an entirely different expect_*() method and create additional Expectations. All the Expectations that you create are stored in a list in the Validator's in-memory configuration.

What if I want to use the same Expectation more than once?

GX takes into account certain parameters when determining if an Expectation is being added to the list or if an existing Expectation should be edited. For example, if you created an Expectation with a method such as expect_column_*(), you could later edit it by providing the same column parameter when running the expect_column_*() method a second time, along with different values for any other parameters. However, if you run the same expect_column_*() method and provide a different column parameter, you create an additional instance of the Expectation for the new column value, rather than overwriting the Expectation you defined with the first column value.
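The rule described above can be pictured with a small toy model in plain Python. This is not how GX implements it; it is a simplified sketch that assumes an Expectation is identified only by its method name and its column parameter. The function add_or_update and the column names fare and tip are hypothetical.

```python
from typing import Any


def add_or_update(
    suite: list[dict[str, Any]], expectation_type: str, **kwargs: Any
) -> None:
    """Replace the entry with the same type and column, or append a new one."""
    new_entry = {"expectation_type": expectation_type, "kwargs": kwargs}
    for index, entry in enumerate(suite):
        same_type = entry["expectation_type"] == expectation_type
        same_column = entry["kwargs"].get("column") == kwargs.get("column")
        if same_type and same_column:
            suite[index] = new_entry  # same column: edit the existing entry
            return
    suite.append(new_entry)  # different column: add another instance


suite: list[dict[str, Any]] = []
add_or_update(suite, "expect_column_values_to_be_between", column="fare", min_value=0, max_value=100)
# Same method and column with new bounds: the first entry is overwritten.
add_or_update(suite, "expect_column_values_to_be_between", column="fare", min_value=0, max_value=500)
# Same method with a different column: a second entry is added.
add_or_update(suite, "expect_column_values_to_be_between", column="tip", min_value=0, max_value=50)

print(len(suite))
```

In this model the list ends up with one entry for fare, holding the most recent bounds, and one for tip.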

Save your Expectations for future use (Optional)

The Expectations you create with the interactive method are saved in an Expectation Suite on the Validator object. Validators do not persist outside the current Python session, so these Expectations are lost unless you save them to your Data Context. Discarding them can be fine if you are using a Validator for quick data validation and exploration, but in most cases you'll want to reuse your newly created Expectation Suite in future Python sessions.

To keep your Expectations for future use, save them to your Data Context. A Filesystem or Cloud Data Context persists outside the current Python session, so saving the Expectation Suite in your Data Context's Expectations Store ensures you can access it in the future:

Python
validator.save_expectation_suite(discard_failed_expectations=False)

Ephemeral Data Contexts and persistence

Ephemeral Data Contexts don't persist beyond the current Python session. If you're working with an Ephemeral Data Context, you'll need to convert it to a Filesystem Data Context using the Data Context's convert_to_file_context() method. Otherwise, your saved configurations won't be available in future Python sessions as the Data Context itself is no longer available.

Next steps

Now that you have created and saved an Expectation Suite, you can Validate your data.
