How to install Great Expectations on a Spark EMR Cluster

This guide will help you install Great Expectations on a Spark EMR cluster.

Steps

1. Install Great Expectations

This guide demonstrates the recommended path for instantiating a Data Context without a full configuration directory and without using the Great Expectations Command Line Interface (CLI). To install the package, run the following in a cell of your EMR Spark notebook:

sc.install_pypi_package("great_expectations")

2. Configure a Data Context in code

Follow the steps for creating an in-code Data Context in "How to instantiate a Data Context without a yml file".

Here is Python code that instantiates and configures a Data Context in code for an EMR Spark cluster. Copy this snippet into a cell in your EMR Spark notebook or use the other examples to customize your configuration. Execute the snippet to instantiate a Data Context in memory.
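A minimal sketch of such a configuration follows. It assumes a legacy (pre-1.0) Great Expectations release in which `BaseDataContext`, `DataContextConfig`, `DatasourceConfig`, and `S3StoreBackendDefaults` are importable from the modules shown; newer releases may expose a different API. The helper name `build_in_memory_context`, the datasource name `my_spark_datasource`, and the bucket name are placeholders for illustration, not names required by the library.

```python
def build_in_memory_context(default_bucket_name, datasource_name="my_spark_datasource"):
    """Build a Data Context held in memory, with no great_expectations.yml file."""
    # Imported inside the function so the cell can be defined even if the
    # package has not finished installing in the notebook session yet.
    # Assumption: a legacy (pre-1.0) release that still exposes these classes.
    from great_expectations.data_context import BaseDataContext
    from great_expectations.data_context.types.base import (
        DataContextConfig,
        DatasourceConfig,
        S3StoreBackendDefaults,
    )

    data_context_config = DataContextConfig(
        # One Spark-backed datasource; its name is a placeholder.
        datasources={
            datasource_name: DatasourceConfig(
                class_name="SparkDFDatasource",
                batch_kwargs_generators={},
            )
        },
        # Use S3-backed defaults for the stores, all in the given bucket.
        store_backend_defaults=S3StoreBackendDefaults(
            default_bucket_name=default_bucket_name
        ),
    )
    return BaseDataContext(project_config=data_context_config)
```

Calling `context = build_in_memory_context("my_default_bucket")` in the same cell, with the bucket name replaced by an S3 bucket your cluster can write to, would give you the `context` object used in the rest of this guide.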

Then copy the following code snippet into a cell in your EMR Spark notebook, run it and verify that no error is displayed:

context.list_datasources()

🚀🚀 Congratulations! 🚀🚀 You successfully installed Great Expectations.