Great Expectations Quickstart
Use this quickstart to install GX, connect to sample data, build your first Expectation, validate your data, and review the validation results. This is a great place to start if you're new to GX and aren't sure if it's the right solution for you or your organization.
This quickstart introduces you to the open source Python version of GX. A Cloud interface will soon be available to simplify collaboration between data teams and domain experts.
If you're interested in participating in the Great Expectations Cloud Beta program, or you want to receive progress updates, sign up for the Beta program.
Windows support for the open source Python version of GX is currently unavailable. If you’re using GX in a Windows environment, you might experience errors or performance issues.
- Python versions 3.8 to 3.10. See Python downloads.
- An internet browser
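The Python version requirement above can be checked before you install anything. The following is a minimal sketch that uses only the standard library. The helper name `is_supported_python` is hypothetical and not part of GX, and the 3.8 to 3.10 bounds are copied from the prerequisite list above.

```python
import sys

# Bounds taken from the prerequisite list above (Python 3.8 to 3.10, inclusive).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 10)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is inside the documented range.

    Hypothetical helper for illustration; it is not part of Great Expectations.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "inside" if is_supported_python() else "outside"
    print(f"Python {current} is {status} the range documented for this quickstart.")
```

Run the script with the interpreter from the virtual environment you plan to use, so that the check reflects the environment GX will be installed into.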
Run the following command in an empty base directory inside a Python virtual environment:
pip install great_expectations
It can take several minutes for the installation to complete. Jupyter Notebook is included with Great Expectations, and it lets you edit code and view the results of code runs.
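After the installation finishes, you may want to confirm that the package is visible to the active environment. The sketch below is one way to do that with the standard library `importlib.metadata` module. The helper name `installed_version` is hypothetical, and the sketch assumes the distribution is registered under the same name used in the `pip install` command above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is not found.

    Hypothetical helper for illustration; it is not part of Great Expectations.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution name matches the name passed to pip above.
    version = installed_version("great_expectations")
    if version is None:
        print("great_expectations was not found in this environment.")
    else:
        print(f"great_expectations {version} is installed.")
```

If the package is reported as missing, check that the virtual environment used for `pip install` is the one currently activated.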
Open Jupyter Notebook or Terminal and then run the following command to import the
import great_expectations as gx
Create a DataContext
Run the following command to import the existing
context = gx.get_context()
Connect to Data
Run the following command to connect to existing .csv data stored in the
validator = context.sources.pandas_default.read_csv(
The example code uses the default Data Context Datasource for Pandas to access the .csv data in the file at the specified
Run the following command to create two Expectations. The first Expectation uses domain knowledge (the pickup_datetime shouldn't be null), and the second Expectation uses auto=True to detect a range of values in the
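To make the two kinds of Expectation more concrete, here is a plain-Python sketch of the ideas behind them: a not-null check that encodes domain knowledge, and a range check whose bounds are inferred from observed data. This is not the GX API and does not claim to reproduce how auto=True works internally. The helper names, the sample rows, and the column name `example_numeric_column` are all hypothetical.

```python
def check_not_null(rows, column):
    """Return True when every row has a non-empty value for `column`."""
    return all(row.get(column) not in (None, "") for row in rows)


def infer_range(rows, column):
    """Infer (min, max) from the non-null values observed in `column`."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return min(values), max(values)


def check_between(rows, column, min_value, max_value):
    """Return True when every non-null value lies in [min_value, max_value]."""
    return all(
        min_value <= row[column] <= max_value
        for row in rows
        if row.get(column) is not None
    )


# Hypothetical sample rows; `example_numeric_column` is an assumed column name.
sample = [
    {"pickup_datetime": "2019-01-15 03:36:12", "example_numeric_column": 1},
    {"pickup_datetime": "2019-01-25 18:20:32", "example_numeric_column": 6},
    {"pickup_datetime": "2019-01-05 06:47:31", "example_numeric_column": 2},
]

# Domain-knowledge check: the pickup time should never be missing.
not_null_ok = check_not_null(sample, "pickup_datetime")

# Data-driven check: derive the bounds from the sample, then reuse them later.
low, high = infer_range(sample, "example_numeric_column")

# A later batch that violates both checks.
new_batch = [{"pickup_datetime": None, "example_numeric_column": 9}]
new_batch_not_null_ok = check_not_null(new_batch, "pickup_datetime")
new_batch_in_range = check_between(new_batch, "example_numeric_column", low, high)
```

The split mirrors the distinction in the text: the first check is written down by someone who knows the data, while the second takes its bounds from the data itself and is then applied to later batches.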
Run the following command to define a Checkpoint and examine the data to determine if it matches the defined Expectations:
checkpoint = gx.checkpoint.SimpleCheckpoint(
Run the following command to return the Validation results:
checkpoint_result = checkpoint.run()
Run the following command to view an HTML representation of the Validation results:
validation_result_identifier = checkpoint_result.list_validation_result_identifiers()
If you're ready to continue your Great Expectations journey, the following topics can help you implement a tailored solution for your specific environment and business requirements:
Install GX in a specific environment and connect to a source data system:
- How to install Great Expectations locally
- How to set up GX to work with data on AWS S3
- How to set up GX to work with data in Azure Blob Storage
- How to set up GX to work with data on GCS
- How to set up GX to work with SQL databases
- How to instantiate a Data Context on an EMR Spark Cluster
- How to use Great Expectations in Databricks
Initialize, instantiate, and save a Data Context: