By default, Validation results are stored in JSON format in the uncommitted/validations/ subdirectory of your great_expectations/ folder. Because Validations may include examples of data (which could be sensitive or regulated), they should not be committed to a source control system. This guide will help you configure a new storage location for Validations in Amazon S3.
Prerequisites: This how-to guide assumes you have:
Configure boto3 to connect to the Amazon S3 bucket where Validation results will be stored.
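Before going further, it can help to check that boto3 can actually reach the bucket. The sketch below shows one way to do that. It assumes boto3 is installed and that credentials are configured through one of boto3's standard mechanisms (environment variables, ~/.aws/credentials, or an instance role). The helper name bucket_is_reachable is hypothetical; it relies only on the S3 client's head_bucket call.

```python
import sys


def bucket_is_reachable(s3_client, bucket: str) -> bool:
    """Return True if the current credentials can reach `bucket`.

    Hypothetical helper, not part of Great Expectations. head_bucket raises
    ClientError (for example 403 Forbidden or 404 Not Found) when the bucket
    does not exist or access is denied; that case is reported as False.
    """
    try:
        s3_client.head_bucket(Bucket=bucket)
    except s3_client.exceptions.ClientError as err:
        print(f"Cannot access bucket {bucket!r}: {err}", file=sys.stderr)
        return False
    return True
```

With boto3 installed, you might call it as bucket_is_reachable(boto3.client("s3"), "<your_s3_bucket_name>"). Passing the client in, rather than creating it inside the function, keeps the check easy to reuse with a client configured for a specific profile or region.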
Identify your Data Context Validations Store
Look for the following section in your Data Context's great_expectations.yml configuration file:
validations_store_name: validations_store

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
The configuration file tells Great Expectations to look for Validations in a store called validations_store. It also creates a validations_store that is backed by a Filesystem and will store Validations under the base_directory uncommitted/validations/ (the default).
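The lookup described above is an indirection: validations_store_name names a key under stores, and only that store is used for Validation results. The sketch below illustrates that resolution in plain Python on a dict that mirrors the YAML section. It is an illustration of the configuration's structure, not Great Expectations' internal code, and the function name active_validations_backend is hypothetical.

```python
def active_validations_backend(config: dict) -> dict:
    """Return the store_backend block of the store named by validations_store_name.

    Hypothetical helper that mirrors how the config is wired, not GE internals.
    """
    store_name = config["validations_store_name"]
    try:
        store = config["stores"][store_name]
    except KeyError:
        raise KeyError(f"validations_store_name points at undefined store {store_name!r}") from None
    return store["store_backend"]


# Dict equivalent of the default YAML section shown above.
default_config = {
    "validations_store_name": "validations_store",
    "stores": {
        "validations_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "uncommitted/validations/",
            },
        },
    },
}

backend = active_validations_backend(default_config)
print(backend["class_name"], backend["base_directory"])
# prints: TupleFilesystemStoreBackend uncommitted/validations/
```

This is why switching to S3 only requires pointing validations_store_name at a new store: other stores can stay defined without being used for Validation results.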
Update your configuration file to include a new store for Validation results on S3.
In the example below, the new store's name is set to validations_S3_store, but it can be any name you like. We also need to make some changes to the store_backend settings: class_name will be set to TupleS3StoreBackend, bucket will be set to the name of your S3 bucket, and prefix will be set to the folder in your S3 bucket where Validation results will be located.
If you are also storing Expectations in S3 (How to configure an Expectation store to use Amazon S3) or Data Docs in S3 (How to host and share Data Docs on Amazon S3), make sure the prefix values are disjoint, that is, no prefix is a substring of another.
validations_store_name: validations_S3_store

stores:
  validations_S3_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'
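The prefix rule for stores that share a bucket can be checked mechanically. The sketch below flags every pair of stores whose prefixes are equal or where one is a substring of the other. The function name overlapping_prefixes and the example prefixes are hypothetical; substitute the prefix values from your own Expectations, Validations, and Data Docs configuration.

```python
from __future__ import annotations

from itertools import combinations


def overlapping_prefixes(prefixes: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of store names whose S3 prefixes are not disjoint.

    Two prefixes clash if they are equal or one is a substring of the other.
    """
    clashes = []
    for (name_a, pre_a), (name_b, pre_b) in combinations(prefixes.items(), 2):
        if pre_a in pre_b or pre_b in pre_a:
            clashes.append((name_a, name_b))
    return clashes


# Hypothetical prefixes: the Data Docs prefix sits inside the Validations prefix.
example_prefixes = {
    "expectations": "ge/expectations",
    "validations": "ge/validations",
    "data_docs": "ge/validations/docs",
}
print(overlapping_prefixes(example_prefixes))
# prints: [('validations', 'data_docs')]
```

Moving the Data Docs prefix to something like ge/data_docs would make the result an empty list.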
Copy existing Validation results to the S3 bucket (this step is optional).
One way to copy Validations into Amazon S3 is by using the aws s3 sync command. As mentioned earlier, the base_directory is set to uncommitted/validations/ by default. In the example below, two Validation results, Validation1 and Validation2, are copied to Amazon S3. Your output should look something like this:
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'

upload: uncommitted/validations/val1/val1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val1/val1.json
upload: uncommitted/validations/val2/val2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val2/val2.json
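If the AWS CLI is not available, a similar copy can be sketched with boto3. The version below is an assumption-laden sketch, not an equivalent of aws s3 sync: it uploads only .json files (the format Validation results are stored in), keeps each file's path relative to base_directory under the given prefix, and re-uploads every file rather than skipping unchanged ones. The function names plan_uploads and upload are hypothetical; upload_file is the standard boto3 S3 client method.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath


def plan_uploads(base_directory: str, prefix: str) -> list[tuple[Path, str]]:
    """List (local_file, s3_key) pairs for every .json file under base_directory.

    Keys keep the path relative to base_directory, placed under `prefix`.
    """
    base = Path(base_directory)
    plan = []
    for path in sorted(base.rglob("*.json")):
        # S3 keys always use "/" regardless of the local OS separator.
        relative = PurePosixPath(*path.relative_to(base).parts)
        key = str(PurePosixPath(prefix.strip("/")) / relative)
        plan.append((path, key))
    return plan


def upload(s3_client, bucket: str, plan: list[tuple[Path, str]]) -> None:
    """Upload each planned file; unlike sync, unchanged files are not skipped."""
    for local_path, key in plan:
        s3_client.upload_file(str(local_path), bucket, key)
```

Splitting the key computation from the upload lets you print the plan and review the destination keys before anything is written to the bucket, for example with upload(boto3.client("s3"), "<your_s3_bucket_name>", plan_uploads("uncommitted/validations/", "<your_s3_bucket_folder_name>")).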
Confirm that the new Validations store has been added by running great_expectations --v3-api store list.
Notice the output contains two Validations Stores: the original validations_store on the local filesystem and the validations_S3_store we just configured. This is OK, since Great Expectations will look for Validation results in the S3 bucket as long as we set validations_store_name to validations_S3_store.
great_expectations --v3-api store list

- name: validations_store
  class_name: ValidationsStore
  store_backend:
    class_name: TupleFilesystemStoreBackend
    base_directory: uncommitted/validations/

- name: validations_S3_store
  class_name: ValidationsStore
  store_backend:
    class_name: TupleS3StoreBackend
    bucket: '<your_s3_bucket_name>'
    prefix: '<your_s3_bucket_folder_name>'
Confirm that the Validations store has been correctly configured.
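One way to confirm that Validation results are landing in the bucket is to list the JSON objects under the configured prefix. The sketch below assumes a boto3 S3 client and uses the list_objects_v2 paginator so that prefixes holding more than 1,000 objects are fully listed. The function name list_validation_keys is hypothetical.

```python
def list_validation_keys(s3_client, bucket: str, prefix: str) -> list[str]:
    """Return keys of JSON objects stored under `prefix` in `bucket`.

    Hypothetical helper; the paginator follows continuation tokens for us.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".json"):
                keys.append(obj["Key"])
    return keys
```

Calling it before and after a run, for example list_validation_keys(boto3.client("s3"), "<your_s3_bucket_name>", "<your_s3_bucket_folder_name>"), should show new keys appearing once results are written to the S3 store.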