Version: 0.18.9

Connect to in-memory Data Assets

Use the information provided here to connect to an in-memory pandas or Spark DataFrame. Great Expectations (GX) uses the term Data Asset when referring to data in its original format, and the term Data Source when referring to the storage location for Data Assets.

pandas

pandas can read many types of data into its DataFrame class, but the following examples use data originating in a parquet file.

Prerequisites

  • Access to data that can be read into a Pandas DataFrame

Import the Great Expectations module and instantiate a Data Context

Run the following Python code to import GX and instantiate a Data Context:

Python
import great_expectations as gx

context = gx.get_context()

Create a Data Source

Run the following Python code to create a Pandas Data Source:

Python
datasource = context.sources.add_pandas(name="my_pandas_datasource")

Read your data into a Pandas DataFrame

In the following example, a parquet file is read into a Pandas DataFrame that will be used in subsequent code examples.

Run the following Python code to create the Pandas DataFrame:

Python
import pandas as pd

dataframe = pd.read_parquet(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-11.parquet"
)
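
If the remote parquet file is unavailable, any in-memory DataFrame can take its place in the later steps. The following is a minimal sketch that builds one from CSV text held in a string. The column names and values are made up for illustration and are not taken from the taxi dataset.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
csv_text = """pickup_zone,passenger_count,fare_amount
Midtown,1,12.5
Harlem,2,8.0
SoHo,1,15.25
"""

# read_csv accepts any file-like object, so no file on disk is required.
sample_dataframe = pd.read_csv(io.StringIO(csv_text))

print(sample_dataframe.head())
```

A DataFrame built this way can be passed wherever the dataframe variable is used in the remaining examples.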

Add a Data Asset to the Data Source

The following information is required when you create a Pandas DataFrame Data Asset:

  • name: The Data Asset name.

  • dataframe: The Pandas DataFrame containing the data for the Data Asset.

The DataFrame you created previously is the value you'll enter for the dataframe parameter.

  1. Run the following Python code to define the name parameter and store it as a Python variable:

    Python
    name = "taxi_dataframe"
  2. Run the following Python code to create the Data Asset:

    Python
    data_asset = datasource.add_dataframe_asset(name=name)

    In this workflow, the DataFrame is not attached when the Data Asset is created. For DataFrame Data Assets, it is always supplied as the dataframe argument of one API method, build_batch_request. For example:

    Python
    my_batch_request = data_asset.build_batch_request(dataframe=dataframe)
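
The steps above can be gathered into a single helper, sketched below. The function name connect_dataframe and its default argument values are hypothetical and are not part of GX. The body uses only the three calls shown on this page (context.sources.add_pandas, add_dataframe_asset, and build_batch_request) and assumes the 0.18 API.

```python
def connect_dataframe(
    context,
    dataframe,
    datasource_name="my_pandas_datasource",
    asset_name="taxi_dataframe",
):
    """Register an in-memory DataFrame with GX and return a Batch Request.

    Hypothetical convenience wrapper; it is not provided by GX itself.
    """
    # Create the pandas Data Source on the supplied Data Context.
    datasource = context.sources.add_pandas(name=datasource_name)

    # Create the DataFrame Data Asset; only its name is given at this point.
    data_asset = datasource.add_dataframe_asset(name=asset_name)

    # The DataFrame itself is supplied when the Batch Request is built.
    return data_asset.build_batch_request(dataframe=dataframe)
```

With the context and dataframe variables from the earlier steps, calling connect_dataframe(context, dataframe) should produce the same Batch Request as running the three statements individually.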

Next steps

For more information on Pandas read methods, see the Pandas Input/Output documentation.