
Review and next steps


Review

In this tutorial we've taken you through the four steps you need to perform in order to use Great Expectations.

Let's review each of these steps and take a look at the important concepts and features we used.


Step 1: Setup

You installed Great Expectations and initialized your Data Context.

  • Data Context: The folder structure that contains the entirety of your Great Expectations project. It is also the entry point for accessing all the primary methods for creating elements of your project, configuring those elements, and working with the metadata for your project.
  • CLI: The Command Line Interface for Great Expectations. The CLI provides helpful utilities for deploying and configuring Data Contexts, as well as a few other convenience methods.
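If you prefer working in Python rather than the CLI, a minimal sketch of this step might look like the following. This assumes a reasonably recent Great Expectations release and that the project folder has already been scaffolded with `great_expectations init`:

```python
# Setup sketch -- assumes Great Expectations is installed, e.g.:
#   pip install great_expectations
#   great_expectations init   # scaffolds the great_expectations/ project folder
import great_expectations as gx

# The Data Context is the entry point to the project: Datasources,
# Expectation Suites, Checkpoints, and Data Docs are all reached through it.
context = gx.get_context()
print(context.list_datasources())  # empty until Step 2 adds a Datasource
```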

Step 2: Connect to Data

You created and configured your Datasource.

  • Datasource: An object that brings together a way of interacting with data (an Execution Engine) and a way of accessing that data (a Data Connector). Datasources are used to obtain Batches for Validators, Expectation Suites, and Profilers.
  • Jupyter Notebooks: These notebooks are launched by some processes in the CLI. They provide useful boilerplate code for everything from configuring a new Datasource to building an Expectation Suite to running a Checkpoint.
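As a rough illustration of the same step in Python (using the fluent Pandas Datasource API available in newer releases rather than the CLI-generated notebooks, and illustrative names such as `taxi_datasource`), configuring a Datasource and requesting a Batch might look like this:

```python
import great_expectations as gx

context = gx.get_context()

# A Datasource pairs an Execution Engine (pandas here) with a way of
# accessing data; the fluent API wires up the Data Connector for us.
datasource = context.sources.add_pandas(name="taxi_datasource")

# Register one CSV file as a Data Asset, then build a Batch Request
# that a Validator, Profiler, or Checkpoint can use to obtain a Batch.
asset = datasource.add_csv_asset(
    name="yellow_tripdata_sample",
    filepath_or_buffer="data/yellow_tripdata_sample_2019-01.csv",
)
batch_request = asset.build_batch_request()
```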

Step 3: Create Expectations

You used the automatic Profiler to build an Expectation Suite.

  • Expectation Suite: A collection of Expectations.
  • Expectation: A verifiable assertion about data. Great Expectations is a framework for defining Expectations and running them against your data. In the tutorial's example, we asserted that NYC taxi rides should have a minimum of one passenger. When we ran that Expectation against our second set of data, Great Expectations reported that some records in the new data indicated a ride with zero passengers, failing to meet this Expectation.
  • Profiler: A tool that automatically generates Expectations from a Batch of data.
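A condensed Python sketch of this step follows. Names such as `taxi_suite` are illustrative, `context` and `batch_request` are assumed from the previous sketches, and the tutorial itself drives this through CLI-generated notebooks:

```python
from great_expectations.profile.user_configurable_profiler import (
    UserConfigurableProfiler,
)

# Create (or update) an empty Expectation Suite, then get a Validator that
# binds it to a Batch of data obtained from the Batch Request built in Step 2.
context.add_or_update_expectation_suite(expectation_suite_name="taxi_suite")
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="taxi_suite",
)

# A hand-written Expectation: every taxi ride has at least one passenger.
validator.expect_column_values_to_be_between(
    column="passenger_count", min_value=1
)

# Or let a Profiler generate a first-draft suite from the Batch automatically.
profiler = UserConfigurableProfiler(profile_dataset=validator)
suite = profiler.build_suite()

# Persist the suite so a Checkpoint can run it later.
validator.save_expectation_suite(discard_failed_expectations=False)
```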

Step 4: Validate Data

You created a Checkpoint which you used to validate new data. You then viewed the Validation Results in Data Docs.

  • Checkpoint: An object that uses a Validator to run an Expectation Suite against a Batch of data. Running a Checkpoint produces Validation Results for the data it was run on.
  • Validation Results: A report generated by running an Expectation Suite against a Batch of data. Validation Results are stored as JSON and rendered as Data Docs.
  • Data Docs: Human readable documentation that describes Expectations for data and its Validation Results. Data Docs can be generated both from Expectation Suites (describing our Expectations for the data) and from Validation Results (describing whether the data meets those Expectations).
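Continuing the earlier sketches, the final step might look like this in Python (again with illustrative names, and assuming `context` and `batch_request` from the previous steps):

```python
# A Checkpoint pairs a Batch Request with an Expectation Suite and runs it,
# producing Validation Results that Data Docs can render.
checkpoint = context.add_or_update_checkpoint(
    name="taxi_checkpoint",
    validations=[
        {
            "batch_request": batch_request,
            "expectation_suite_name": "taxi_suite",
        }
    ],
)

results = checkpoint.run()
print(results.success)  # False if any Expectation in the suite failed

# Open the human-readable Data Docs rendered from these Validation Results.
context.open_data_docs()
```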

Going forward

Your specific use case will no doubt differ from that of our tutorial. However, the four steps you'll need to perform to get Great Expectations working for you will be the same: Setup, Connect to Data, Create Expectations, and Validate Data. That's all there is to it! As long as you can perform these four steps, you can have Great Expectations validating your data for you.

For those who only need to know the basics in order to make Great Expectations work, our documentation includes Core Concepts for each step.

For those who prefer working from examples, we have "How to" guides that show working examples of configuring Great Expectations objects for specific use cases.

Finally, we have Deployment Patterns. These show you how to perform all four steps for implementing Great Expectations with a specific environment and data type.