Connect to a Data Source
This page explains how to create your Great Expectations (GX) Python environment, install GX locally, and configure the dependencies necessary to access Data Assets stored on Amazon S3, Google Cloud Storage (GCS), Microsoft Azure Blob Storage, or SQL databases. GX uses the term Data Asset when referring to data in its original format, and the term Data Source when referring to the storage location for Data Assets.
- Amazon S3
- Microsoft Azure Blob Storage
- Google Cloud Storage
- SQL databases
Amazon S3
Create your GX Python environment, install Great Expectations locally, and then configure the necessary dependencies to access data stored on Amazon S3.
Prerequisites
- An installation of Python 3.8 to 3.11. To download and install Python, see Python downloads.
- The ability to install Python modules with pip
- The AWS CLI. See Installing or updating the latest version of the AWS CLI.
- AWS credentials. See Configuring the AWS CLI.
Ensure your AWS CLI version is the most recent
You can verify that the AWS CLI has been installed by running the command:
aws --version
If this command does not return the version information of the AWS CLI, you may need to install the AWS CLI or otherwise troubleshoot your current installation. For detailed guidance, see Amazon's documentation on how to install the AWS CLI.
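If you prefer to script this check, the following is a minimal sketch in Python that uses only the standard library. The helper name aws_cli_version is hypothetical and is not part of GX or the AWS CLI; it looks for the aws executable on your PATH and returns whatever aws --version prints.

```python
import shutil
import subprocess
from typing import Optional


def aws_cli_version() -> Optional[str]:
    """Return the text printed by `aws --version`, or None if the CLI is unavailable.

    Hypothetical helper for illustration; not part of GX or the AWS CLI.
    """
    # shutil.which searches PATH for the executable, much as the shell would.
    aws_path = shutil.which("aws")
    if aws_path is None:
        return None
    try:
        result = subprocess.run(
            [aws_path, "--version"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # The executable exists but could not be run successfully.
        return None
    # Assumption: depending on the CLI release, the version string may be
    # written to stdout or to stderr, so fall back to stderr if stdout is empty.
    return (result.stdout or result.stderr).strip() or None


if __name__ == "__main__":
    version_text = aws_cli_version()
    print(version_text if version_text else "AWS CLI not found or not working")
```

A return value of None corresponds to the troubleshooting case described above.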
Ensure your AWS credentials are correctly configured
You can verify that your AWS credentials are properly configured by running the command:
aws sts get-caller-identity
If your credentials are properly configured, this command returns your UserId, Account, and Arn. If it returns an error instead, you may need to configure or troubleshoot your credentials. For detailed guidance, see Amazon's documentation on configuring the AWS CLI.
Check your Python version
You can check your version of Python by running:
python --version
GX currently supports Python versions 3.8 to 3.11.
python or python3
Depending on your installation and configuration of Python 3, you may find that executing Python commands from the terminal by calling python doesn't work as desired. If a command using python does not work, try using python3.
Instead of:
python --version
Try:
python3 --version
If this produces the desired result, simply replace python with python3 in our example terminal commands.
If this does not work, you may need to look into your Python 3 installation or configuration.
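The version check can also be done from inside the interpreter itself, which sidesteps the python versus python3 question. The sketch below assumes the supported range stated in this guide (3.8 to 3.11); the function name is_supported_python is hypothetical.

```python
import sys

# Supported range as stated in this guide, compared on (major, minor) only.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version falls within the supported range."""
    major_minor = (version_info[0], version_info[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if is_supported_python():
        print("Python " + current + " is within the supported range")
    else:
        print("Python " + current + " is outside the supported range 3.8 to 3.11")
```

Run the script with the same interpreter you intend to use for GX, for example python check_version.py or python3 check_version.py (the file name is only an example).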
Create a Python virtual environment
As a best practice, we recommend using a virtual environment to partition your GX installation from any other Python projects that may exist on the same system. This ensures that there will not be dependency conflicts between the GX installation and other Python projects.
Once we have confirmed that Python 3 is installed locally, we can create a virtual environment with venv.
venv?
We have chosen to use venv for virtual environments in this guide because it is included with Python 3. You are not limited to using venv, and can just as easily install Great Expectations into virtual environments with tools such as virtualenv, pyenv, etc.
We will create our virtual environment by running:
python -m venv my_venv
This command will create a new directory called my_venv. Our virtual environment will be located in this directory.
In order to activate the virtual environment we will run:
source my_venv/bin/activate
my_venv?
You can name your virtual environment anything you like. Simply replace my_venv in the examples above with the name that you would like to use.
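To confirm that activation took effect before installing anything, one option is to ask the interpreter whether it is running inside a virtual environment. This is a minimal sketch, not part of GX; it relies on the standard behavior that, inside an environment created by venv, sys.prefix differs from sys.base_prefix. The function name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Virtual environment active: " + sys.prefix)
    else:
        print("No virtual environment is active")
```

If this reports that no virtual environment is active, run the activation command again in the same terminal session.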
Install GX with optional dependencies for S3
To install Great Expectations with the optional dependencies needed to work with AWS S3, run the following pip command from the terminal:
python -m pip install 'great_expectations[s3]'
This will install Great Expectations and the boto3 package. GX uses boto3 to access S3.
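A quick way to confirm that the optional dependency landed in the active environment is to check whether the module can be located, without importing it. The sketch below uses only the standard library; the helper name is_module_available is hypothetical.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no top-level module of that name exists
    # on the current interpreter's import path.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    for module_name in ("great_expectations", "boto3"):
        status = "found" if is_module_available(module_name) else "missing"
        print(module_name + ": " + status)
```

If either module is reported as missing, check that the virtual environment is active and rerun the pip command.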
Verify that GX has been installed correctly
You can verify that GX installed successfully with the CLI command:
great_expectations --version
The output you receive if GX was successfully installed will be:
great_expectations, version 0.18.9
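The installed version can also be read from package metadata in Python, which is useful in scripts or CI checks. This is a sketch using the standard library's importlib.metadata (available from Python 3.8); the helper name installed_version is hypothetical, and the version you see will depend on what pip installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    gx_version = installed_version("great_expectations")
    if gx_version is None:
        print("great_expectations is not installed in this environment")
    else:
        print("great_expectations, version " + gx_version)
```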
Next steps
Now that you have installed GX with the necessary dependencies for working with S3, you are ready to initialize your Data Context. The Data Context is the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components. It will contain your configurations for GX components, as well as provide you with access to GX's Python API.
To create a Data Context, see Instantiate a Data Context.
Microsoft Azure Blob Storage
Create your GX Python environment, install Great Expectations locally, and then configure the necessary dependencies to access data stored on Microsoft Azure Blob Storage.
Prerequisites
- An installation of Python 3.8 to 3.11. To download and install Python, see Python downloads.
- The ability to install Python modules with pip
- An Azure Storage account. A connection string is required to complete the setup.