Version: 0.18.9

Configure Expectation Stores

An Expectation Store is a connector that stores and retrieves Expectation Suites, which are collections of verifiable assertions about data.

By default, new Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your gx/ folder. Use the information provided here to configure a store for your Expectations.

Amazon S3

Use the information provided here to configure a new storage location for Expectations in Amazon S3.

Prerequisites

Install boto3 with pip

Python interacts with AWS through the boto3 library. Great Expectations makes use of this library in the background when working with AWS. Although you won't use boto3 directly, you'll need to install it in your virtual environment.

Run one of the following pip commands to install boto3 in your virtual environment:

Terminal command
python -m pip install boto3

or

Terminal command
python3 -m pip install boto3

To set up boto3 with AWS, and use boto3 within Python, see the Boto3 documentation.
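
Before continuing, it can help to confirm that boto3 is visible to the same interpreter that runs Great Expectations. The following is a minimal sketch that uses only the standard library; the helper name is_installed is hypothetical and not part of Great Expectations or boto3.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("boto3"):
    print("boto3 is available in this environment")
else:
    print("boto3 is missing; run: python -m pip install boto3")
```

Run the script with the interpreter from your virtual environment so that the result reflects that environment and not a system-wide Python installation.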

Verify your AWS credentials

Run the following command in the AWS CLI to verify that your AWS credentials are properly configured:

Terminal command
aws sts get-caller-identity

When your credentials are properly configured, your UserId, Account, and Arn are returned. If your credentials are not configured correctly, an error message appears. If you received an error message, or you couldn't verify your credentials, see Configuring the AWS CLI.
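
The same check can be made from Python, which confirms that boto3 resolves the same credentials as the AWS CLI. This is a sketch under the assumption that your credentials come from the default boto3 credential chain; the function names summarize_identity and check_default_credentials are hypothetical helpers.

```python
def summarize_identity(sts_client) -> dict:
    """Call STS GetCallerIdentity and keep the three fields the CLI reports."""
    response = sts_client.get_caller_identity()
    return {key: response[key] for key in ("UserId", "Account", "Arn")}


def check_default_credentials() -> dict:
    """Build an STS client from the default credential chain and query it."""
    import boto3  # imported here so summarize_identity has no hard dependency

    return summarize_identity(boto3.client("sts"))
```

Calling check_default_credentials() makes a network request to AWS. If the credentials are not configured, boto3 raises an exception instead of returning the identity fields.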

Identify your Data Context Expectations Store

Your Expectation Store configuration is in your Data Context.

The following section in your Data Context great_expectations.yml file tells Great Expectations to look for Expectations in a Store named expectations_store:

File contents: great_expectations.yml
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

expectations_store_name: expectations_store

The default base_directory for expectations_store is expectations/.
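
The relationship between expectations_store_name and the stores section can be sketched in Python. The example assumes the YAML file has already been parsed into a dictionary (the loading step is left out), and active_expectations_store is a hypothetical helper, not a Great Expectations API.

```python
def active_expectations_store(config: dict) -> dict:
    """Return the stores entry that expectations_store_name points to."""
    name = config["expectations_store_name"]
    stores = config.get("stores", {})
    if name not in stores:
        raise KeyError(f"expectations_store_name refers to undefined store: {name}")
    return stores[name]


# The default configuration shown above, written as a parsed dictionary.
default_config = {
    "stores": {
        "expectations_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        }
    },
    "expectations_store_name": "expectations_store",
}

store = active_expectations_store(default_config)
print(store["store_backend"]["class_name"])  # TupleFilesystemStoreBackend
```

The KeyError branch covers the case where the name and the stores section drift apart, for example after a Store is renamed in only one place.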

Update your configuration file to include a new Store for Expectations

To manually add an Expectations Store to your configuration, add the following configuration to the stores section of your great_expectations.yml file:

File contents: great_expectations.yml
stores:
  expectations_S3_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>' # Bucket and prefix in combination must be unique across all stores

expectations_store_name: expectations_S3_store

The store_backend settings differ from the defaults so that the Store works with S3: class_name is set to TupleS3StoreBackend, bucket is the name of your S3 bucket, and prefix is the folder in your S3 bucket where Expectations are located.

The following example shows the additional options that are available to customize TupleS3StoreBackend:

File contents: great_expectations.yml
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>' # Bucket and prefix in combination must be unique across all stores
  boto3_options:
    endpoint_url: ${S3_ENDPOINT} # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
    region_name: '<your_aws_region_name>'

In the examples above, the Store name is expectations_S3_store. If you use a different Store name, you must also update the value of the expectations_store_name key to match it. For example:

File contents: great_expectations.yml
expectations_store_name: expectations_S3_store

When you update the expectations_store_name key value, Great Expectations uses the new Store for Expectations.

Add the following code to great_expectations.yml to configure the IAM user:

File contents: great_expectations.yml
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    aws_access_key_id: ${AWS_ACCESS_KEY_ID} # Uses the AWS_ACCESS_KEY_ID environment variable to get aws_access_key_id.
    aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY} # Uses the AWS_SECRET_ACCESS_KEY environment variable.
    aws_session_token: ${AWS_SESSION_TOKEN} # Uses the AWS_SESSION_TOKEN environment variable.
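
Because the configuration refers to values with the ${NAME} syntax, a quick pre-flight check can report which referenced environment variables are not set. The sketch below is not how Great Expectations resolves these references internally; missing_env_vars is a hypothetical helper, and the sample text uses the standard AWS variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN.

```python
import os
import re

# Matches references such as ${AWS_ACCESS_KEY_ID} and captures the name.
ENV_REFERENCE = re.compile(r"\$\{(\w+)\}")


def missing_env_vars(config_text: str, environ=os.environ) -> list:
    """Return referenced variable names that are absent from environ."""
    referenced = ENV_REFERENCE.findall(config_text)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(referenced) if name not in environ]


sample = """
boto3_options:
  aws_access_key_id: ${AWS_ACCESS_KEY_ID}
  aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY}
  aws_session_token: ${AWS_SESSION_TOKEN}
"""

# A plain dictionary stands in for the real environment in this example.
print(missing_env_vars(sample, {"AWS_ACCESS_KEY_ID": "example"}))
# ['AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN']
```

The check only looks at environment variables, so a value supplied some other way (for example, through a configuration variables file) would be reported as missing.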

Add the following code to great_expectations.yml to configure the IAM Assume Role:

File contents: great_expectations.yml
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>' # Bucket and prefix in combination must be unique across all stores
  boto3_options:
    assume_role_arn: '<your_role_to_assume>'
    region_name: '<your_aws_region_name>'
    assume_role_duration: session_duration_in_seconds

Caution

If you're also storing Validation Results or Data Docs in S3, make sure that the prefix values are disjoint and that one is not a substring of the other.
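
The substring rule can be checked mechanically before you save the configuration. The following sketch assumes all of the Stores share one bucket; the Store names and prefixes in the example are hypothetical, and find_conflicts is not a Great Expectations function.

```python
from itertools import combinations


def prefixes_conflict(first: str, second: str) -> bool:
    """Return True if the prefixes are equal or one contains the other."""
    return first in second or second in first


def find_conflicts(prefixes: dict) -> list:
    """Return pairs of store names whose prefixes conflict."""
    return [
        (name_a, name_b)
        for (name_a, prefix_a), (name_b, prefix_b) in combinations(prefixes.items(), 2)
        if prefixes_conflict(prefix_a, prefix_b)
    ]


# Hypothetical store names and prefixes within a single bucket.
example = {
    "expectations_S3_store": "expectations",
    "validations_S3_store": "validations",
    "data_docs_site": "expectations/docs",
}
print(find_conflicts(example))  # [('expectations_S3_store', 'data_docs_site')]
```

An empty prefix is reported as a conflict with every other prefix, because the empty string is a substring of any string.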

Copy existing Expectation JSON files to the S3 bucket (Optional)

If you are converting an existing local Great Expectations deployment to one that works in AWS, you might have Expectations saved that you want to transfer to your S3 bucket.

Run the following aws s3 sync command to copy Expectations into Amazon S3:

Terminal command
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'

The base_directory is set to expectations/ by default.

In the following example, the Expectations exp1 and exp2 are copied to Amazon S3 and a confirmation message is returned:

Terminal output
upload: ./exp1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp1.json
upload: ./exp2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp2.json
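
If you prefer to drive the copy from a script, the command can be assembled in Python and passed to subprocess. This is a sketch, not part of Great Expectations: build_sync_command and run_sync are hypothetical helpers, and the bucket and folder names are placeholders. The --dryrun flag of aws s3 sync lists the operations without performing them.

```python
import subprocess


def build_sync_command(base_directory: str, bucket: str, prefix: str, dry_run: bool = True) -> list:
    """Build the argument list for an aws s3 sync call."""
    destination = f"s3://{bucket}/{prefix.strip('/')}"
    command = ["aws", "s3", "sync", base_directory, destination]
    if dry_run:
        command.append("--dryrun")  # preview the uploads without copying
    return command


def run_sync(command: list) -> None:
    """Run the command and raise CalledProcessError if the AWS CLI fails."""
    subprocess.run(command, check=True)


# Placeholder bucket and folder names; replace them with your own values.
preview = build_sync_command("expectations/", "my-example-bucket", "my-example-folder")
print(" ".join(preview))
# aws s3 sync expectations/ s3://my-example-bucket/my-example-folder --dryrun
```

Passing the arguments as a list avoids shell quoting problems. Call run_sync(preview) to see what would be uploaded, then rebuild the command with dry_run=False to perform the copy.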

Confirm Expectation Suite availability

If you copied your existing Expectation Suites to the S3 bucket, run the following Python code to confirm that Great Expectations can find them:

Python
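import great_expectations as gx

# Load the Data Context, which reads the Store settings from great_expectations.yml.
context = gx.get_context()

# Print the names of the Expectation Suites found in the active Expectations Store.
print(context.list_expectation_suite_names())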

The Expectation Suites you copied to S3 are returned as a list. Expectation Suites that weren't copied to the new Store aren't listed.
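
To find out which local files did not make it into the new Store, you can compare the local directory with the returned list. The sketch below assumes a flat base_directory in which each file name without the .json extension is the suite name; suites_not_listed is a hypothetical helper.

```python
from pathlib import Path


def suites_not_listed(base_directory: str, listed_names: list) -> list:
    """Return local suite names (file stems) that are absent from listed_names."""
    local_names = {path.stem for path in Path(base_directory).glob("*.json")}
    return sorted(local_names - set(listed_names))


# Example usage, assuming `context` was created with gx.get_context():
# missing = suites_not_listed("gx/expectations", context.list_expectation_suite_names())
# print(missing)
```

An empty result means that every local JSON file has a matching suite name in the Store. Suites kept in nested subdirectories are outside the scope of this sketch.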