How to connect to in-memory data using Pandas
In this guide we will demonstrate how to connect to an in-memory Pandas DataFrame. Pandas can read many types of data into its DataFrame class; in our example we will use data originating in a Parquet file.
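Pandas is not limited to Parquet. The following is a minimal sketch: it parses the same small, hypothetical set of records once from a CSV string and once from a JSON string. Both reads are in memory, because io.StringIO wraps each string in a file-like object. The column names and values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample records, serialized in two formats that Pandas can parse.
csv_text = "vendor_id,passenger_count,fare_amount\n1,1,7.5\n2,3,12.0\n"
json_text = (
    '[{"vendor_id": 1, "passenger_count": 1, "fare_amount": 7.5},'
    ' {"vendor_id": 2, "passenger_count": 3, "fare_amount": 12.0}]'
)

# io.StringIO makes each string behave like an open file, so nothing touches the disk.
df_from_csv = pd.read_csv(io.StringIO(csv_text))
df_from_json = pd.read_json(io.StringIO(json_text))

# Each reader returns an ordinary DataFrame with the same columns.
print(df_from_csv.shape, df_from_json.shape)
```

Whichever reader you use, the result is a regular DataFrame. You can hand it to Great Expectations in the same way as the Parquet-based DataFrame used in this guide.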
- A Great Expectations instance. See Install Great Expectations locally.
- A Data Context.
- Access to data that can be read into a Pandas DataFrame.
1. Import the Great Expectations module and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Create a Datasource
To access our in-memory data, we will create a Pandas Datasource:
datasource = context.sources.add_pandas(name="my_pandas_datasource")
3. Read your source data into a Pandas DataFrame
For this example, we will read a Parquet file into a Pandas DataFrame, which we will then use in the rest of this guide.
The code to create this DataFrame is:
import pandas as pd
dataframe = pd.read_parquet(
    "path/to/your_data.parquet"  # placeholder: replace with the path or URL of your Parquet file
)
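pd.read_parquet needs a Parquet engine such as pyarrow or fastparquet to be installed. You may want to follow the remaining steps without a Parquet file. In that case, one option is to build a small DataFrame directly in memory, as in the sketch below. The column names and values are hypothetical and do not reflect the schema of the guide's taxi data.

```python
import pandas as pd

# Hypothetical stand-in data: these columns and values are illustrative only
# and are not taken from the guide's Parquet file.
dataframe = pd.DataFrame(
    {
        "passenger_count": [1, 2, 1, 4],
        "trip_distance": [1.2, 3.4, 0.8, 5.1],
        "fare_amount": [6.5, 14.0, 5.0, 19.5],
    }
)

# Quick sanity checks on the shape and column types before creating a Data Asset.
print(dataframe.shape)
print(dataframe.dtypes)
```

Because the variable is named dataframe, it can replace the Parquet-based DataFrame in the following steps unchanged.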
4. Add a Data Asset to the Datasource
A Pandas DataFrame Data Asset can be defined with two elements:
name: The name by which the Data Asset will be referenced in the future
dataframe: A Pandas DataFrame containing the data
We will use the dataframe from the previous step as the corresponding parameter's value. For the name parameter, we will define a name in advance by storing it in a Python variable:
name = "taxi_dataframe"
Now that we have the name and dataframe for our Data Asset, we can create the Data Asset with the code:
data_asset = datasource.add_dataframe_asset(name=name)
For dataframe Data Assets, the dataframe is always specified as the argument of exactly one API method. Here, we pass it when building a Batch Request:
my_batch_request = data_asset.build_batch_request(dataframe=dataframe)
Now that you have connected to your data, you may want to look into:
- How to request Data from a Data Asset
- How to create Expectations while interactively evaluating a set of data
- How to use the Onboarding Data Assistant to evaluate data and create Expectations
For more information on Pandas read methods, please refer to the official Pandas Input/Output documentation.