Great Expectations overview
This overview is intended for new users of Great Expectations (GX) and for anyone who wants to understand its components and primary workflows. It doesn’t require an in-depth understanding of the code that governs GX processes and interactions, so it is a good place to start before moving to more advanced topics.
What is GX
GX is a Python library that provides a framework for describing the acceptable state of data and then validating that the data meets those criteria.
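The idea of describing an acceptable state of data and then validating against it can be illustrated with a small, library-free sketch. This is a conceptual toy only: `ToyExpectation` and `toy_validate` are hypothetical names invented for this illustration and are not part of the GX API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToyExpectation:
    """A hypothetical stand-in for an Expectation: a named rule for one column."""
    description: str
    column: str
    predicate: Callable[[Any], bool]


def toy_validate(rows, expectations):
    """Check every row against every rule and report the values that fail."""
    results = []
    for exp in expectations:
        unexpected = [
            row[exp.column] for row in rows if not exp.predicate(row[exp.column])
        ]
        results.append(
            {
                "expectation": exp.description,
                "success": not unexpected,
                "unexpected_values": unexpected,
            }
        )
    return results


# Made-up sample data, used only to exercise the sketch.
rows = [{"age": 34}, {"age": 51}, {"age": -2}]

expectations = [
    ToyExpectation("age is not null", "age", lambda v: v is not None),
    ToyExpectation(
        "age is between 0 and 120", "age", lambda v: v is not None and 0 <= v <= 120
    ),
]

results = toy_validate(rows, expectations)
for result in results:
    print(result)
```

In this sketch the rules are declared separately from the data they are checked against, which is the same separation GX draws between describing standards and validating data.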
GX core components
When working with GX you use the following four core components to access, store, and manage underlying objects and processes:
- Data Context: Manages the settings and metadata for a GX project, and provides an entry point to the GX Python API.
- Data Sources: Connects to your source data, and organizes retrieved data for future use.
- Expectations: Identifies the standards to which your data should conform.
- Checkpoints: Validates a set of Expectations against a specific set of data.
Data Context
A Data Context manages the settings and metadata for a GX project. In Python, the Data Context object serves as the entry point for the GX API and manages various classes to limit the objects you need to directly manage yourself. A Data Context contains all the metadata used by GX, the configurations for GX objects, and the output from validating data.
The following are the available Data Context types:
- Ephemeral Data Context: Exists in memory, and does not persist beyond the current Python session.
- File Data Context: Exists as a folder and configuration files. Its contents persist between Python sessions.
- Cloud Data Context: Persists between Python sessions, and also serves as the entry point for Great Expectations Cloud.
For more information, see Configure Data Contexts.
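As a minimal sketch of how a Data Context of each type might be requested in Python, the helper below wraps `great_expectations.get_context`. It assumes a recent GX release in which `get_context` accepts the `mode` and `project_root_dir` arguments; the helper name `open_context` is hypothetical and introduced only for this illustration. A Cloud Data Context also needs GX Cloud credentials, which are not shown here.

```python
def open_context(mode="ephemeral", project_root_dir=None):
    """Return a GX Data Context of the requested type.

    Hypothetical helper for illustration; assumes a recent GX release
    where get_context() accepts `mode` and `project_root_dir`.
    """
    if mode not in {"ephemeral", "file", "cloud"}:
        raise ValueError(f"Unknown Data Context mode: {mode!r}")

    # Imported here so the argument check above works without GX installed.
    import great_expectations as gx

    if mode == "file":
        # A File Data Context lives in a project folder and persists
        # between Python sessions.
        return gx.get_context(mode="file", project_root_dir=project_root_dir)

    # "ephemeral" exists only in memory; "cloud" requires GX Cloud
    # credentials to be available to GX.
    return gx.get_context(mode=mode)
```

With GX installed, calling `open_context()` would return an in-memory context, and `open_context("file", project_root_dir="./my_project")` would create or load a file-backed one. The path is a placeholder.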
The GX API
A Data Context object in Python provides methods for configuring and interacting with GX. These methods, together with the objects and additional methods accessed through them, make up the GX public API.
For more information, see The GX API reference.
Stores
Stores contain the metadata GX uses. This includes configurations for GX objects, information that is recorded when GX validates data, and credentials used for accessing data sources or remote environments. GX uses one Store for each type of metadata, and the Data Context contains the settings that tell GX where that Store should reside and how to access it.
For more information, see Configure your GX environment.
Data Docs
Data Docs are human-readable documentation generated by GX. Data Docs describe the standards that you expect your data to conform to, and the results of validating your data against those standards. The Data Context manages the storage and retrieval of this information.
You can configure where your Data Docs are hosted. Unlike Stores, you can define configurations for multiple Data Docs sites. You can also specify what information each Data Docs site provides, allowing you to format and provide different Data Docs for different use cases.
For more information, see Host and share Data Docs.