
How to configure a Validation Result store in GCS

By default, Validation Results are stored in JSON format in the uncommitted/validations/ subdirectory of your great_expectations/ folder. Because Validation Results may include examples of data (which could be sensitive or regulated), they should not be committed to a source control system. This guide will help you configure a new storage location for Validation Results in a Google Cloud Storage (GCS) bucket.

Prerequisites: This how-to guide assumes you have:

Steps

1. Configure your GCP credentials

Check that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Validation Results will be stored.

The Google Cloud Platform documentation describes how to verify your authentication for the Google Cloud API, which includes:

  1. Creating a Google Cloud Platform (GCP) service account,
  2. Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable,
  3. Verifying authentication by running a simple Google Cloud Storage client library script.
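Before relying on the Google Cloud script, a quick local sanity check can rule out the most common problem: GOOGLE_APPLICATION_CREDENTIALS being unset or pointing at a missing file. The sketch below is hypothetical helper code, not part of Great Expectations. The optional second function assumes the google-cloud-storage package is installed and uses its Client.list_blobs call to confirm that the credentials can actually read your bucket.

```python
import os
from pathlib import Path
from typing import List


def credentials_file_from_env() -> Path:
    """Return the key file named by GOOGLE_APPLICATION_CREDENTIALS.

    Raises RuntimeError if the variable is unset or does not point to a file.
    """
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path


def list_some_objects(project: str, bucket: str, max_results: int = 5) -> List[str]:
    """Authenticate with the default credentials and list a few object names.

    Requires google-cloud-storage; it is imported lazily so that the check
    above works even when the package is not installed.
    """
    from google.cloud import storage

    client = storage.Client(project=project)
    return [blob.name for blob in client.list_blobs(bucket, max_results=max_results)]
```

For example, call credentials_file_from_env() first, then list_some_objects("<YOUR GCP PROJECT NAME>", "<YOUR GCS BUCKET NAME>"). An authentication or permission error from the second call means the service account cannot yet read the bucket.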

2. Identify your Data Context Validation Results Store

As with other Stores, you can find your Validation Results Store through your Data Context. In your great_expectations.yml, look for the following lines. The configuration tells Great Expectations to look for Validation Results in a Store called validations_store. The base_directory for validations_store is set to uncommitted/validations/ by default.

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

validations_store_name: validations_store
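To see how the two settings relate, here is a minimal sketch that treats the configuration as an already-parsed Python dict (for example, the result of loading great_expectations.yml with a YAML parser) and resolves which Store holds Validation Results. The helper name is hypothetical; it only mirrors the lookup described above and is not Great Expectations' own code.

```python
from typing import Any, Dict, Tuple


def active_validations_store(config: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (name, store config) of the Store named by validations_store_name."""
    name = config["validations_store_name"]
    stores = config.get("stores", {})
    if name not in stores:
        raise KeyError(f"validations_store_name refers to undefined store {name!r}")
    return name, stores[name]


# The default configuration shown above, expressed as a Python dict.
default_config = {
    "stores": {
        "validations_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "uncommitted/validations/",
            },
        },
    },
    "validations_store_name": "validations_store",
}

name, store = active_validations_store(default_config)
print(name, store["store_backend"]["base_directory"])
# validations_store uncommitted/validations/
```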

3. Update your configuration file to include a new Store for Validation Results on GCS

Add a new Store for Validation Results under stores in great_expectations.yml. In this example it is named validations_GCS_store, but you can use any name you like. The store_backend settings also need to change: set class_name to TupleGCSStoreBackend, project to your GCP project, bucket to the name of your GCS bucket, and prefix to the folder on GCS where Validation Result files will be stored. Finally, set validations_store_name to the name of the new Store.

stores:
  validations_GCS_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <YOUR GCP PROJECT NAME>
      bucket: <YOUR GCS BUCKET NAME>
      prefix: <YOUR GCS PREFIX NAME>

validations_store_name: validations_GCS_store
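The same change can be expressed programmatically. The sketch below assumes the configuration is held as a plain dict (as in a parsed great_expectations.yml) and returns a copy with the GCS-backed Store added and made active. The function name and the example project, bucket, and prefix values are hypothetical.

```python
import copy
from typing import Any, Dict


def with_gcs_validations_store(
    config: Dict[str, Any], store_name: str, project: str, bucket: str, prefix: str
) -> Dict[str, Any]:
    """Return a copy of config with a GCS-backed ValidationsStore made active."""
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    new_config.setdefault("stores", {})[store_name] = {
        "class_name": "ValidationsStore",
        "store_backend": {
            "class_name": "TupleGCSStoreBackend",
            "project": project,
            "bucket": bucket,
            "prefix": prefix,
        },
    }
    # Point Great Expectations at the new Store.
    new_config["validations_store_name"] = store_name
    return new_config


updated = with_gcs_validations_store(
    {"stores": {}, "validations_store_name": "validations_store"},
    "validations_GCS_store", "my-project", "my-bucket", "validations",
)
print(updated["validations_store_name"])
# validations_GCS_store
```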
Danger: If you are also storing Expectations or Data Docs in GCS, make sure that the prefix values are disjoint and that no prefix is a substring of another.
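Because this rule is easy to break as configurations grow, a small check can help. The sketch below simply applies the substring rule stated above to every pair of prefixes; the store names and prefix values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_prefixes(prefixes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of store names whose GCS prefixes are not disjoint.

    Two prefixes conflict if they are equal or one is a substring of the other.
    """
    conflicts = []
    for (name_a, a), (name_b, b) in combinations(sorted(prefixes.items()), 2):
        if a in b or b in a:
            conflicts.append((name_a, name_b))
    return conflicts


print(overlapping_prefixes({
    "expectations_GCS_store": "ge/expectations",
    "validations_GCS_store": "ge/validations",
    "data_docs_GCS_site": "ge/data_docs",
}))
# []
print(overlapping_prefixes({
    "validations_GCS_store": "validations",
    "data_docs_GCS_site": "validations/docs",
}))
# [('data_docs_GCS_site', 'validations_GCS_store')]
```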

4. Copy existing Validation Results to the GCS bucket (optional)

One way to copy Validation Results into GCS is the gsutil cp command, which is part of the Google Cloud SDK. In the example below, two Validation Results, validation_1 and validation_2, are copied to the GCS bucket. Other ways to copy Validation Results, such as the Cloud Storage browser in the Google Cloud Console, are described in the Google Cloud documentation.

gsutil cp uncommitted/validations/my_expectation_suite/validation_1.json gs://<YOUR GCS BUCKET NAME>/<YOUR GCS PREFIX NAME>/validation_1.json
gsutil cp uncommitted/validations/my_expectation_suite/validation_2.json gs://<YOUR GCS BUCKET NAME>/<YOUR GCS PREFIX NAME>/validation_2.json
Operation completed over 2 objects
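If you prefer to script the copy, the sketch below uses the google-cloud-storage Python client instead of gsutil. It is hypothetical helper code, not part of Great Expectations. plan_uploads keeps each file's path relative to uncommitted/validations/ under the prefix. This is an assumption: the gsutil example above places files directly under the prefix. upload then performs the transfer with your default credentials.

```python
from pathlib import Path
from typing import List, Tuple


def plan_uploads(local_root: str, prefix: str) -> List[Tuple[Path, str]]:
    """Pair each local Validation Result JSON file with a GCS object name.

    The object name is the prefix followed by the file's path relative to
    local_root, so the GCS layout mirrors the local directory tree.
    """
    root = Path(local_root)
    prefix = prefix.strip("/")
    return [
        (path, f"{prefix}/{path.relative_to(root).as_posix()}")
        for path in sorted(root.rglob("*.json"))
    ]


def upload(plan: List[Tuple[Path, str]], project: str, bucket_name: str) -> None:
    """Upload the planned files; requires the google-cloud-storage package."""
    from google.cloud import storage  # imported lazily; only needed for uploads

    bucket = storage.Client(project=project).bucket(bucket_name)
    for path, object_name in plan:
        bucket.blob(object_name).upload_from_filename(str(path))
```

For example: upload(plan_uploads("uncommitted/validations", "<YOUR GCS PREFIX NAME>"), "<YOUR GCP PROJECT NAME>", "<YOUR GCS BUCKET NAME>"). Inspecting the plan before calling upload lets you confirm the object names first.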

5. Confirm that the new Validation Results Store has been added by running

great_expectations store list

Only active Stores are listed. Great Expectations will look for Validation Results in GCS as long as validations_store_name is set to validations_GCS_store; once it is, you can remove the validations_store configuration if you no longer need it.

- name: validations_GCS_store
  class_name: ValidationsStore
  store_backend:
    class_name: TupleGCSStoreBackend
    project: <YOUR GCP PROJECT NAME>
    bucket: <YOUR GCS BUCKET NAME>
    prefix: <YOUR GCS PREFIX NAME>
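The idea of an active Store can be illustrated with a small sketch over a parsed configuration dict. Here a Store counts as active when a top-level key ending in _store_name (such as validations_store_name) refers to it. This is a hypothetical approximation of the CLI's behavior, not Great Expectations' own code. It shows why the old validations_store drops out of the list once validations_store_name points elsewhere.

```python
from typing import Any, Dict, List


def active_store_names(config: Dict[str, Any]) -> List[str]:
    """Return names of Stores referenced by a top-level *_store_name key."""
    stores = config.get("stores", {})
    referenced = {
        value
        for key, value in config.items()
        if key.endswith("_store_name") and value in stores
    }
    return sorted(referenced)


config = {
    "stores": {
        "expectations_store": {"class_name": "ExpectationsStore"},
        "validations_store": {"class_name": "ValidationsStore"},
        "validations_GCS_store": {"class_name": "ValidationsStore"},
    },
    "expectations_store_name": "expectations_store",
    "validations_store_name": "validations_GCS_store",
}
print(active_store_names(config))
# ['expectations_store', 'validations_GCS_store']
```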

6. Confirm that the Validation Results Store has been correctly configured

Run a Checkpoint to store results in the new Validation Results Store on GCS, then visualize the results by rebuilding Data Docs.
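This step can be sketched in Python as well. The sketch assumes several things: a Great Expectations version (for example, a 0.15.x release) whose Data Context provides run_checkpoint(checkpoint_name=...) and build_data_docs(); an already-configured Checkpoint; and that the Checkpoint keeps the default StoreValidationResultAction, so results are written to the active Validation Results Store. run_and_rebuild_docs and my_checkpoint are hypothetical names.

```python
from typing import Any


def run_and_rebuild_docs(context: Any, checkpoint_name: str) -> bool:
    """Run a Checkpoint, rebuild Data Docs, and report whether validation passed.

    context is expected to be a Great Expectations Data Context exposing
    run_checkpoint() and build_data_docs().
    """
    # With the default StoreValidationResultAction, the Checkpoint run writes
    # its Validation Results to the active store (validations_GCS_store).
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    # Rebuild Data Docs so the new results can be viewed.
    context.build_data_docs()
    return bool(result.success)
```

For example, with context = great_expectations.get_context(), calling run_and_rebuild_docs(context, "my_checkpoint") should leave new JSON files under your GCS prefix, which you can confirm in the Cloud Storage browser.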

Additional Notes

To view the full script used on this page, see it on GitHub: