Setting up Great Expectations includes installing Great Expectations and initializing your deployment. Optionally, you can customize the configuration of some components, such as Stores, Data Docs, and Plugins.
After you've completed the setup for your production deployment, you can access all Great Expectations features from your Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components). Also, your Stores (connectors that store and retrieve information about metadata in Great Expectations) and Data Docs (human-readable documentation generated from Great Expectations metadata detailing Expectations, Validation Results, etc.) will be optimized for your business requirements.
To set up Datasources (which provide a standard API for accessing and interacting with data from a wide variety of source systems), Expectation Suites (collections of verifiable assertions about data), and Checkpoints (the primary means for validating data in a production deployment of Great Expectations), see the specific topics for these components.
If you don't want to manage your own configurations and infrastructure, then Great Expectations Cloud might be the solution. If you're interested in participating in the Great Expectations Cloud Beta program, or you want to receive progress updates, sign up for the Beta program.
Windows support for the open source Python version of GX is currently unavailable. If you’re using GX in a Windows environment, you might experience errors or performance issues.
- Completion of the Quickstart guide.
- A supported version of Python. GX supports Python versions 3.7 to 3.10.
- pip (the package installer for Python).
- An internet connection.
- A web browser (for Jupyter Notebooks).
- A virtual environment. Recommended for your project workspace.
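The Python version prerequisite above can be checked before installing. The following is a minimal sketch that uses only the standard library; the helper name `python_is_supported` and the two constants are illustrative and are not part of Great Expectations. The range is taken from the prerequisite list (3.7 to 3.10).

```python
import sys

# Supported range taken from the prerequisites: Python 3.7 to 3.10.
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True when the major.minor version falls in the supported range.

    This is a hypothetical helper for illustration, not a GX function.
    """
    major_minor = (version_info[0], version_info[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


# Report the interpreter that is currently running this script.
print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Run the script with the interpreter from the virtual environment that you plan to use for your project, so that the check reflects the environment where GX will be installed.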
1. Install Great Expectations
Run the following pip command in a terminal to install Great Expectations and its dependencies:
pip install great_expectations
If you experience difficulty with the installation, see Supporting Resources (resources external to the Great Expectations code base that Great Expectations utilizes).
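One way to confirm that the installation succeeded is to check that the package can be imported in the current environment. The following is a sketch rather than an official verification step; the helper name `gx_is_installed` is hypothetical.

```python
import importlib.util


def gx_is_installed():
    """Return True if the great_expectations package can be found.

    Hypothetical helper for illustration; it only looks up the package
    and does not import it.
    """
    return importlib.util.find_spec("great_expectations") is not None


if gx_is_installed():
    import great_expectations as gx

    # The package exposes its version string as __version__.
    print("Great Expectations version:", gx.__version__)
else:
    print("great_expectations is not importable in this environment.")
```

If the package is reported as not importable, the pip command was probably run in a different environment than the one executing the script.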
2. Initialize a Data Context
Your Data Context contains your Great Expectations project, and it is the entry point for configuring and interacting with Great Expectations. The Data Context manages various classes and helps limit the number of objects you need to manage to get Great Expectations working.
Run the following Python code to initialize and retrieve your Data Context:
import great_expectations as gx
context = gx.get_context()
To configure your Data Context, see Data Context.
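To see what kind of Data Context the call above returned, you can print its class name. The following is a minimal sketch; the function name `describe_context` is hypothetical, and the class name that is printed is an assumption that depends on your GX version and on whether a project already exists in the working directory.

```python
import importlib.util


def describe_context():
    """Return the class name of the Data Context, or None if GX is missing.

    Hypothetical helper for illustration. It uses only gx.get_context(),
    the same call shown in the step above.
    """
    if importlib.util.find_spec("great_expectations") is None:
        return None
    import great_expectations as gx

    context = gx.get_context()
    return type(context).__name__


print("Data Context type:", describe_context())
```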
3. Optional configurations
After you've initialized your Data Context, you can start using Great Expectations. However, a few components, such as Stores, Data Docs, and Plugins, are configured by default to operate locally. You can reconfigure them to use hosted services if that better suits your use case.
Stores are the locations where your Data Context stores information about your Expectations (verifiable assertions about data), your Validation Results (generated when data is validated against an Expectation or Expectation Suite), and your Metrics (computed attributes of data, such as the mean of a column). By default, these are stored locally. To reconfigure a Store to work with a specific backend, see Stores.
Data Docs provide human-readable renderings of your Expectation Suites and Validation Results, and they are built locally by default. To host and share Data Docs differently, see Data Docs.
Python files are treated as Plugins (extensions of Great Expectations' components and/or functionality) when they are in the plugins directory of your project, which is created automatically when you initialize your Data Context, and they can be used to extend Great Expectations. If you have Custom Expectations (extensions of the `Expectation` class, developed outside of the Great Expectations library) or other extensions that you want to use as Plugins with Great Expectations, add them to the