Create Expectations interactively with Python
To Validate data, you must first define a set of Expectations for that data to be Validated against. In this guide, you'll learn how to create Expectations and interactively edit them with feedback from Validating each against a Batch of data. Validating your Expectations as you define them lets you quickly determine whether the Expectations are suitable for your data and identify where changes might be necessary.
The interactive method used to create and edit Expectations does not edit or alter the Batch data.
Prerequisites
- Great Expectations installed in a Python environment
- A Filesystem Data Context for your Expectations
- A Data Source from which to request a Batch of data for introspection
Import the Great Expectations module and instantiate a Data Context
For this guide we will be working with Python code in a Jupyter Notebook. Jupyter is included with GX and lets us easily edit code and immediately see the results of our changes.
Run the following code to import Great Expectations and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
If you're using an Ephemeral Data Context, your configurations will not persist beyond the current Python session. However, if you're using a Filesystem or Cloud Data Context, they do persist. The get_context() method returns the first Cloud or Filesystem Data Context it can find. If a Cloud or Filesystem Data Context has not been configured or cannot be found, it provides an Ephemeral Data Context. For more information about the get_context() method, see Instantiate a Data Context.
Use an existing Data Asset to create a Batch Request
Run the following code to retrieve a previously configured Data Asset from the Data Context you initialized and create a Batch Request that identifies the Batch of data you'll use to validate your Expectations:
data_asset = context.get_datasource("my_datasource").get_asset("my_data_asset")
batch_request = data_asset.build_batch_request()
You can provide a dictionary as the options parameter of build_batch_request() to limit the Batches returned by a Batch Request. If you leave the options parameter empty, your Batch Request will include all the Batches configured in the corresponding Data Asset. For more information about Batch Requests, see How to request data from a Data Asset.
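The narrowing effect of the options dictionary can be pictured with a minimal pure-Python sketch. It does not call GX: `AVAILABLE_BATCHES`, `select_batches`, and the `year` and `month` keys are hypothetical names chosen for illustration. In GX itself, the valid option keys depend on how your Data Asset is configured.

```python
# Toy model only: this is NOT the GX implementation of build_batch_request().
# Hypothetical stand-in for the Batches configured in a Data Asset,
# where each Batch is described by its identifying metadata.
AVAILABLE_BATCHES = [
    {"year": "2019", "month": "01"},
    {"year": "2019", "month": "02"},
    {"year": "2020", "month": "01"},
]


def select_batches(batches, options=None):
    """Return the batches whose metadata matches every key in `options`.

    An empty or missing `options` dictionary applies no filter,
    so every batch is returned.
    """
    options = options or {}
    return [
        batch
        for batch in batches
        if all(batch.get(key) == value for key, value in options.items())
    ]


all_batches = select_batches(AVAILABLE_BATCHES)
only_2019 = select_batches(AVAILABLE_BATCHES, {"year": "2019"})
one_batch = select_batches(AVAILABLE_BATCHES, {"year": "2020", "month": "01"})

print(len(all_batches), len(only_2019), len(one_batch))  # prints: 3 2 1
```

The more keys you supply, the fewer Batches match, which is the same intuition that applies when you pass options to a real Batch Request.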
Create a Validator
When you use a Validator to interactively create your Expectations, the Validator needs two parameters. One parameter identifies the Batch that contains the data that is used to Validate the Expectations. The second parameter provides a name for the combined list of Expectations you create.
- Optional. Run the following command if you haven't created an Expectation Suite:
context.add_or_update_expectation_suite("my_expectation_suite")
# Optional. Verify that the Expectation Suite was created.
assert "my_expectation_suite" in context.list_expectation_suite_names()
- Run the following command to create a Validator:
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="my_expectation_suite",
)
validator.head()
If you're using a Jupyter Notebook, the results of the code you run in a cell are displayed automatically when the cell finishes running. If you're using a different interpreter, you might need to explicitly print these results to view them. For example:
print(validator.head())
Use the Validator to create and run an Expectation
The Validator provides access to all the available Expectations as methods. When you run an expect_*() method from the Validator, the Validator adds the specified Expectation to the Expectation Suite in its configuration (or edits an existing Expectation in the Expectation Suite, if applicable), and then runs that Expectation against the data that was provided when the Validator was initialized with a Batch Request.
validator.expect_column_values_to_not_be_null(column="vendor_id")
Because we are working in an interactive environment such as a Jupyter Notebook, the results of the Validation are displayed after we run an expect_*() method. We can examine those results to determine if the Expectation needs to be edited.
If you are not working in an interactive environment, for example if you are running a script, you may need to explicitly print your results:
expectation_validation_result = validator.expect_column_values_to_not_be_null(
    column="vendor_id"
)
print(expectation_validation_result)
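To show the kind of information worth looking for in a Validation Result, the sketch below reimplements a not-null check in plain Python over a few hand-written rows. It is a simplified stand-in, not GX code: `toy_expect_column_values_to_not_be_null` and the sample rows are hypothetical, and the returned dictionary only loosely mirrors a real result (a success flag plus counts of unexpected values), which carries more fields.

```python
# Toy model only: a simplified imitation of a not-null Expectation result.
def toy_expect_column_values_to_not_be_null(rows, column):
    """Check that no row has None in `column` and summarize the outcome."""
    values = [row.get(column) for row in rows]
    element_count = len(values)
    unexpected_count = sum(1 for value in values if value is None)
    # Avoid dividing by zero when there are no rows to check.
    unexpected_percent = (
        100.0 * unexpected_count / element_count if element_count else 0.0
    )
    return {
        "success": unexpected_count == 0,
        "result": {
            "element_count": element_count,
            "unexpected_count": unexpected_count,
            "unexpected_percent": unexpected_percent,
        },
    }


# Hypothetical sample data: one of four rows is missing its vendor_id.
sample_rows = [
    {"vendor_id": 1},
    {"vendor_id": None},
    {"vendor_id": 2},
    {"vendor_id": 2},
]

outcome = toy_expect_column_values_to_not_be_null(sample_rows, "vendor_id")
print(outcome["success"])                       # prints: False
print(outcome["result"]["unexpected_count"])    # prints: 1
print(outcome["result"]["unexpected_percent"])  # prints: 25.0
```

A failed result like this one is the signal to either fix the data or loosen the Expectation by running it again with different parameters.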
Edit Expectations or create additional Expectations (Optional)
If you choose to edit an Expectation after you've viewed the Validation Results that were returned when it was created, you can do so by running the validator.expect_*() method with different parameters than you supplied previously. You can also have the Validator run an entirely different expect_*() method and create additional Expectations. All the Expectations that you create are stored in a list in the Validator's in-memory configuration.
GX takes into account certain parameters when determining whether an Expectation is being added to the list or an existing Expectation should be edited. For example, if you created an Expectation with a method such as expect_column_*(), you could later edit it by providing the same column parameter when running the expect_column_*() method a second time, with different values for any other parameters. However, if you ran the same expect_column_*() method and provided a different column parameter, you would create an additional instance of the Expectation for the new column value, rather than overwrite the Expectation you defined with the first column value.
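The add-versus-edit rule described above can be modeled with a small pure-Python sketch. This is a toy illustration and not how GX stores Expectations internally: the `ToyExpectationList` class, its methods, and the `fare` and `tip` column names are hypothetical. It assumes, as a simplification, that an Expectation is identified by its method name together with its column parameter.

```python
# Toy model only: NOT the GX implementation.
class ToyExpectationList:
    """Stores expectations keyed by (expectation type, column)."""

    def __init__(self):
        self._expectations = {}

    def add_or_update(self, expectation_type, column, **kwargs):
        """Add a new expectation, or overwrite the one with the same key."""
        key = (expectation_type, column)
        action = "updated" if key in self._expectations else "added"
        self._expectations[key] = kwargs
        return action

    def get(self, expectation_type, column):
        """Return the stored parameters for one expectation."""
        return self._expectations[(expectation_type, column)]

    def __len__(self):
        return len(self._expectations)


suite = ToyExpectationList()
kind = "expect_column_values_to_be_between"

# Same column twice: the second call edits the first expectation.
first = suite.add_or_update(kind, "fare", min_value=0, max_value=100)
second = suite.add_or_update(kind, "fare", min_value=0, max_value=500)
# Different column: a separate expectation is created.
third = suite.add_or_update(kind, "tip", min_value=0, max_value=50)

print(first, second, third, len(suite))  # prints: added updated added 2
```

After the three calls the list holds two entries: the `fare` entry with the widened range from the second call, and a separate `tip` entry.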
Save your Expectations for future use (Optional)
The Expectations you create with the interactive method are saved in an Expectation Suite on the Validator object. Validators do not persist outside the current Python session, so these Expectations will not be kept unless you save them to your Data Context. This can be ideal if you are using a Validator for quick data validation and exploration, but in most cases you'll want to reuse your newly created Expectation Suite in future Python sessions.
To keep your Expectations for future use, save them to your Data Context. A Filesystem or Cloud Data Context persists outside the current Python session, so saving the Expectation Suite in your Data Context's Expectations Store ensures you can access it in the future:
validator.save_expectation_suite(discard_failed_expectations=False)
Ephemeral Data Contexts don't persist beyond the current Python session. If you're working with an Ephemeral Data Context, you'll need to convert it to a Filesystem Data Context using the Data Context's convert_to_file_context() method. Otherwise, your saved configurations won't be available in future Python sessions because the Data Context itself will no longer be available.
Next steps
Now that you have created and saved an Expectation Suite, you can Validate your data.