Optional - Customize your deployment

At this point, you have your first working local deployment of Great Expectations. You’ve also been introduced to the foundational concepts in the library: Data Contexts, Datasources, Expectations, Profilers, Data Docs, Validation, and Checkpoints.

Congratulations! You’re off to a very good start.

The next step is to customize your deployment by upgrading specific components. Data Contexts make this modular, so you can add or swap out one component at a time. Most of these changes are quick, incremental steps, so you can move from a basic demo deployment to a full production deployment at your own pace, confident that your Data Context will continue to work at every step along the way.

This final section of the tutorial presents clear options for upgrading your deployment. For specific implementation steps, see the linked How-to guides.

Components

Here’s an overview of the components of a typical Great Expectations deployment:

Options for storing Great Expectations configuration

The simplest way to manage your Great Expectations configuration is usually by committing great_expectations/great_expectations.yml to Git. However, it’s not usually a good idea to commit credentials to source control. In some situations, you might need to deploy without access to source control (or maybe even a file system).

Here’s how to handle each of those cases:
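
As an illustration of the first case, keeping credentials out of the committed file, the sketch below uses only the Python standard library to show the general idea of `${...}` placeholders that are filled in at load time. Great Expectations has its own substitution mechanism for great_expectations.yml; this is not that implementation. The helper name `resolve_placeholders`, the connection string, and the variable names are all hypothetical.

```python
from string import Template


def resolve_placeholders(raw_config: str, variables: dict) -> str:
    """Replace ${name} placeholders in raw_config with values from variables.

    Raises KeyError when a placeholder has no value, so a missing secret
    fails loudly instead of passing the literal placeholder downstream.
    """
    return Template(raw_config).substitute(variables)


# Hypothetical config fragment: safe to commit because it holds no secret.
committed_fragment = (
    "connection_string: "
    "postgresql://${DB_USER}:${DB_PASSWORD}@db.example.com/warehouse"
)

# In practice these values would come from environment variables or an
# uncommitted file; they are hard-coded here only to keep the sketch runnable.
secrets = {"DB_USER": "analyst", "DB_PASSWORD": "s3cret"}

resolved = resolve_placeholders(committed_fragment, secrets)
print(resolved)
```

The committed file carries only the placeholder names, so the same configuration can be reused across environments that supply different values.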

Options for storing Expectations

Many teams find it convenient to store Expectations in Git. Essentially, this approach treats Expectations like test fixtures: they live adjacent to code and are stored within version control. Git acts as a collaboration tool and source of record.

Alternatively, you can treat Expectations like configs, and store them in a blob store. Finally, you can store them in a database.
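
To make the "test fixture" comparison concrete, here is a minimal sketch of writing a suite-like document as deterministic JSON, so that changes show up as small, reviewable diffs in Git. The dictionary layout is an assumption modelled on the JSON files Great Expectations writes for Expectation Suites; the suite name, column name, and the helper `serialize_suite` are hypothetical.

```python
import json


def serialize_suite(suite: dict) -> str:
    """Serialize a suite-like dict with sorted keys and fixed indentation.

    Stable ordering means two people saving the same suite produce
    byte-identical files, which keeps version-control diffs meaningful.
    """
    return json.dumps(suite, indent=2, sort_keys=True) + "\n"


# Hypothetical suite: one Expectation that a column contains no nulls.
suite = {
    "expectation_suite_name": "taxi.demo",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "passenger_count"},
        }
    ],
}

text = serialize_suite(suite)
print(text)
```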

Options for storing Validation Results

By default, Validation Results are stored locally, in an uncommitted directory. This is great for individual work, but not good for collaboration. The most common pattern is to use a cloud-based blob store such as S3, GCS, or Azure Blob Storage. You can also store Validation Results in a database.
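
The shape of such a store configuration can be sketched as a plain Python dictionary mirroring the YAML you would place under `stores` in great_expectations.yml. The class names `ValidationsStore` and `TupleS3StoreBackend` are given here as assumptions; confirm them against the How-to guide for your version. The bucket, prefix, store key, and the helper `s3_validations_store` are hypothetical.

```python
def s3_validations_store(bucket: str, prefix: str) -> dict:
    """Build a store entry that points Validation Results at an S3 bucket.

    The class names are assumptions about the Great Expectations config
    schema; verify them before using this in a real deployment.
    """
    return {
        "class_name": "ValidationsStore",
        "store_backend": {
            "class_name": "TupleS3StoreBackend",
            "bucket": bucket,
            "prefix": prefix,
        },
    }


# Hypothetical bucket and prefix, keyed by a hypothetical store name.
stores = {
    "validations_S3_store": s3_validations_store("my-ge-bucket", "validations"),
}
print(stores)
```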

Options for customizing generated notebooks

Great Expectations generates and provides notebooks as interactive development environments for Expectation Suites. You might want to customize parts of the notebooks to add company-specific documentation, or change the code sections to suit your use-cases.

Deployment Patterns

Connecting to Data

Great Expectations allows you to connect to data in a wide variety of sources, and the list is constantly getting longer. If you have an idea for a source not listed here, please speak up in the public discussion forum.

Options for hosting Data Docs

By default, Data Docs are stored locally, in an uncommitted directory. This is great for individual work, but not good for collaboration. A better pattern is usually to deploy to a cloud-based blob store (S3, GCS, or Azure Blob Storage), configured to share a static website.

Additional Checkpoints and Actions

Most teams will want to configure various Checkpoints and Validation Actions as part of their deployment. There are two primary patterns for deploying Checkpoints. In the first, Checkpoints are executed during data processing (e.g. as a task within Airflow), where they can control program flow. In the second, Checkpoints are executed against materialized data. Great Expectations supports both patterns. There are also some rare instances where you may want to validate data without using a Checkpoint.
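
The first pattern, a Checkpoint controlling program flow, can be sketched without any Great Expectations calls. In the sketch below a plain dictionary with a `success` key stands in for the result of running a Checkpoint; that shape is an assumption, and `ValidationFailed`, `gate_on_validation`, and `run_pipeline` are hypothetical names.

```python
class ValidationFailed(Exception):
    """Raised to halt a pipeline when validation does not pass."""


def gate_on_validation(result: dict) -> None:
    """Stop the pipeline unless the validation result reports success.

    A missing "success" key is treated as a failure, so an unexpected
    result shape cannot silently let bad data through.
    """
    if not result.get("success", False):
        raise ValidationFailed("Validation failed; halting downstream tasks.")


def run_pipeline(validation_result: dict) -> list:
    """Toy pipeline: extract, validate, then load only if validation passed."""
    completed = ["extract"]
    gate_on_validation(validation_result)  # raises before "load" on failure
    completed.append("load")
    return completed


print(run_pipeline({"success": True}))
```

In an orchestrator such as Airflow, the raised exception would typically mark the task as failed, which prevents dependent tasks from running.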