Version: 0.18.21

Connect to in-memory Data Assets

Use the information provided here to connect to an in-memory pandas or Spark DataFrame. Great Expectations (GX) uses the term Data Asset when referring to data in its original format, and the term Data Source when referring to the storage location for Data Assets.


pandas

pandas can read many types of data into its DataFrame class, but the following examples use data originating in a parquet file.

Prerequisites

A Great Expectations instance. See Install Great Expectations with Data Source dependencies.
A Data Context.
Access to data that can be read into a Pandas DataFrame.

Import the Great Expectations module and instantiate a Data Context

Run the following Python code to import GX and instantiate a Data Context:

Python
import great_expectations as gx

context = gx.get_context()

Create a Data Source

Run the following Python code to create a Pandas Data Source:

Python
datasource = context.sources.add_pandas(name="my_pandas_datasource")

Read your data into a Pandas DataFrame

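In the following example, a parquet file is read into a Pandas DataFrame that will be used in subsequent code examples. Reading parquet with pandas requires a parquet engine such as pyarrow or fastparquet to be installed.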

Run the following Python code to create the Pandas DataFrame:

Python
import pandas as pd

dataframe = pd.read_parquet(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-11.parquet"
)

Add a Data Asset to the Data Source

The following information is required when you create a Pandas DataFrame Data Asset:

name: The Data Asset name.
dataframe: The Pandas DataFrame containing the data for the Data Asset.

The DataFrame you created previously is the value you'll enter for the dataframe parameter.

Run the following Python code to define the name parameter and store it as a Python variable:
Python
```
name = "taxi_dataframe"
```
Run the following Python code to create the Data Asset:
Python
```
data_asset = datasource.add_dataframe_asset(name=name)
```
For DataFrame Data Assets, the DataFrame is always specified as the dataframe argument of a single API method, build_batch_request. For example:
Python
```
my_batch_request = data_asset.build_batch_request(dataframe=dataframe)
```
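The pandas steps above can be gathered into one helper. This is a minimal sketch rather than an official GX utility: the function name `dataframe_batch_request` and its parameter names are illustrative, and it uses only the calls already shown in this section (`context.sources.add_pandas`, `add_dataframe_asset`, and `build_batch_request`).

```python
def dataframe_batch_request(context, dataframe, datasource_name, asset_name):
    """Register an in-memory pandas DataFrame with GX and return a Batch Request.

    `context` is a GX Data Context (for example, from gx.get_context());
    `dataframe` is the pandas DataFrame to connect to.
    The helper name and parameters are hypothetical, not part of GX.
    """
    # Create the pandas Data Source that will own the Data Asset.
    datasource = context.sources.add_pandas(name=datasource_name)
    # The Data Asset is created with a name only.
    data_asset = datasource.add_dataframe_asset(name=asset_name)
    # The DataFrame itself is supplied when the Batch Request is built.
    return data_asset.build_batch_request(dataframe=dataframe)
```

With the objects from the earlier steps, the call would be `my_batch_request = dataframe_batch_request(context, dataframe, "my_pandas_datasource", "taxi_dataframe")`.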

Next steps

For more information on Pandas read methods, see the Pandas Input/Output documentation.

Spark

Connect to in-memory Data Assets using Spark.

Prerequisites

A Great Expectations instance. See Install Great Expectations with Data Source dependencies.
A Data Context.
Access to data that can be read into a Spark DataFrame.
An active Spark Context.

Import the Great Expectations module and instantiate a Data Context

Run the following Python code to import GX and instantiate a Data Context:

Python
import great_expectations as gx

context = gx.get_context()

Create a Data Source

Run the following Python code to create a Spark Data Source:

Python
datasource = context.sources.add_spark("my_spark_datasource")

Read your data into a Spark DataFrame

In the following example, a small pandas DataFrame is converted into a Spark DataFrame that is used in subsequent code examples.

Run the following Python code to create the Spark DataFrame:

Python
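import pandas as pd
from pyspark.sql import SparkSession

# Reuse the active Spark session, or start one if none exists.
spark = SparkSession.builder.getOrCreate()

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [100, 200, 300, 400, 500, 600],
        "c": ["one", "two", "three", "four", "five", "six"],
    },
    index=[10, 20, 30, 40, 50, 60],
)

dataframe = spark.createDataFrame(data=df)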
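If pandas is not otherwise needed, the intermediate pandas DataFrame can be skipped. The following is a minimal sketch, assuming `spark` is an active SparkSession; the names `ROWS`, `COLUMNS`, and `make_spark_dataframe` are illustrative and not part of GX or Spark. The pandas index in the example above is not carried over by `createDataFrame`, so the rows here contain only columns a, b, and c.

```python
# Same values as the pandas example, one tuple per row (columns a, b, c).
ROWS = [
    (1, 100, "one"),
    (2, 200, "two"),
    (3, 300, "three"),
    (4, 400, "four"),
    (5, 500, "five"),
    (6, 600, "six"),
]
COLUMNS = ["a", "b", "c"]


def make_spark_dataframe(spark):
    """Build the example Spark DataFrame directly from Python tuples.

    `spark` is assumed to be an active SparkSession; the helper itself
    is hypothetical and only wraps SparkSession.createDataFrame.
    """
    # A list of column names as `schema` lets Spark infer the column types.
    return spark.createDataFrame(ROWS, schema=COLUMNS)
```

Calling `dataframe = make_spark_dataframe(spark)` yields a DataFrame that can be used in the remaining steps in the same way.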

Add a Data Asset to the Data Source

The following information is required when you create a Spark DataFrame Data Asset:

name: The Data Asset name.
dataframe: The Spark DataFrame containing the data for the Data Asset.

The DataFrame you created previously is the value you'll enter for the dataframe parameter.

Run the following Python code to define the name parameter and store it as a Python variable:
Python
```
name = "my_df_asset"
```
Run the following Python code to create the Data Asset:
Python
```
data_asset = datasource.add_dataframe_asset(name=name)
```
For DataFrame Data Assets, the DataFrame is always specified as the dataframe argument of a single API method, build_batch_request. For example:
Python
```
my_batch_request = data_asset.build_batch_request(dataframe=dataframe)
```
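One way to confirm that a Batch Request resolves to the expected data is to load it through a Validator and inspect the first rows. This is a minimal sketch, assuming the GX 0.18 methods `context.add_or_update_expectation_suite`, `context.get_validator`, and `Validator.head`; the helper name `preview_batch` and the default suite name are illustrative.

```python
def preview_batch(context, batch_request, suite_name="preview_suite", n_rows=5):
    """Return the first rows of the batch described by `batch_request`.

    `context` is a GX Data Context. The helper name and the default
    suite name are hypothetical, chosen only for this sketch.
    """
    # A Validator is tied to an Expectation Suite, so make sure one exists.
    context.add_or_update_expectation_suite(expectation_suite_name=suite_name)
    validator = context.get_validator(
        batch_request=batch_request,
        expectation_suite_name=suite_name,
    )
    # head() fetches a small sample of the batch for inspection.
    return validator.head(n_rows=n_rows)
```

For example, `preview_batch(context, my_batch_request)` should show the first five rows of the DataFrame. The same helper applies to a Batch Request built from a pandas DataFrame.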

Next steps

For more information on Spark read methods, see the Spark Input/Output documentation.
