Install GX
GX Core is a Python library. Follow the instructions in this guide to install GX Core in your local Python environment, or as a notebook-scoped library in hosted environments such as EMR Spark clusters.
Prerequisites
- Python version 3.10 to 3.13
- Recommended. A Python virtual environment
- Internet access
- Permissions to download and install packages in your environment
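The version prerequisite can be checked from Python before installing anything. The following is a minimal sketch that uses only the standard library. The helper names `is_supported_python` and `in_virtual_environment` are illustrative and not part of GX, and the 3.10 to 3.13 bounds are copied from the list above.

```python
import sys

# Supported range taken from the prerequisites above (inclusive).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 13)


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) version falls within the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


def in_virtual_environment():
    """Return True if the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Supported Python version:", is_supported_python())
print("Virtual environment active:", in_virtual_environment())
```

The virtual environment check only detects environments created with `venv` or a recent `virtualenv`. Other environment managers may need a different test.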
Install the GX Python library
- Local
- Hosted environment
Because GX Core is a Python library, you can install it in a local Python environment and access GX functionality from Python scripts.
Installation and setup
- Optional. Activate your virtual environment.
  If you created a virtual environment for your GX Python installation, browse to the folder that contains your virtual environment and run the following command to activate it:
  Terminal input
    source my_venv/bin/activate
- Ensure you have the latest version of pip:
  Terminal input
    python -m ensurepip --upgrade
- Install the GX Core library:
  Terminal input
    pip install great_expectations
- Verify that GX installed successfully.
  In Python, run the following code:
  Python
    import great_expectations as gx
    print(gx.__version__)
  If GX was installed correctly, the version number of the installed GX library will be printed.
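The verification step can also be performed without importing the library, which may be convenient in setup scripts. This is a minimal sketch, assuming the package is registered under the distribution name `great_expectations` used in the `pip install` command above. `installed_version` is a hypothetical helper, not a GX function.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


gx_version = installed_version("great_expectations")
if gx_version is None:
    print("great_expectations is not installed in this environment.")
else:
    print(f"great_expectations {gx_version} is installed.")
```

Reading the package metadata confirms only that the distribution is present. Importing `great_expectations` as shown in the steps above remains the stronger check, since it also exercises the library's dependencies.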
Hosted environments such as EMR Spark do not provide a filesystem on which to install GX Core. Instead, you install GX Core in memory from the Python-style notebooks available on those platforms.
Use the information provided here to install GX on an EMR Spark cluster and instantiate a Data Context without a full configuration directory.
Additional prerequisites
- An EMR Spark cluster.
- Access to the EMR Spark notebook.
Installation and setup
- To install GX on your EMR Spark cluster, copy this code snippet into a cell in your EMR Spark notebook and then run it:
  Python
    sc.install_pypi_package("great_expectations")
- Create an in-code Data Context. See Instantiate an Ephemeral Data Context.
- Copy the Python code at the end of How to instantiate an Ephemeral Data Context into a cell in your EMR Spark notebook, or use the other examples to customize your configuration. The code instantiates and configures a Data Context for an EMR Spark cluster.
- Execute the cell with your Data Context initialization and configuration.
- Run the following command to verify that GX was installed and your in-memory Data Context was instantiated successfully:
  Python
    context.list_datasources()
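The Data Context steps above can be sketched as a single notebook cell. This is a minimal sketch, assuming a GX version in which `gx.get_context` accepts `mode="ephemeral"`. The linked guide remains the reference for any EMR-specific configuration. The wrapper `make_ephemeral_context` is a hypothetical helper that returns `None` when GX cannot be imported.

```python
def make_ephemeral_context():
    """Create an in-memory Data Context, or return None if GX is not installed."""
    try:
        import great_expectations as gx
    except ImportError:
        return None
    # Assumption: this GX version supports mode="ephemeral", which keeps the
    # Data Context in memory instead of writing a configuration directory.
    return gx.get_context(mode="ephemeral")


context = make_ephemeral_context()
if context is None:
    print("great_expectations is not installed; run the install cell first.")
else:
    # The concrete class name varies by GX version, so it is only printed here.
    print(type(context).__name__)
```

Because nothing is written to disk, a context created this way lasts only as long as the notebook session and must be recreated in each new session.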