Version: 0.18.21

Request data from a Data Asset

Learn how to request data from a Data Asset that belongs to a Data Source defined with one of the context.sources.add_* methods.

Prerequisites

An installation of GX
A Data Source with a configured Data Asset

Import GX and instantiate a Data Context

Run the following Python code to import GX and instantiate a Data Context:

Python
```
import great_expectations as gx

context = gx.get_context()
```

Retrieve your Data Asset

If you already have an instance of your Data Asset stored in a Python variable, you do not need to retrieve it again. Otherwise, use your Data Context's get_datasource(...) method to retrieve a previously defined Data Source, and then use that Data Source's get_asset(...) method to retrieve a previously defined Data Asset.

In this example we will use a previously defined Data Source named my_datasource and a previously defined Data Asset named my_asset.

Python
```
my_asset = context.get_datasource("my_datasource").get_asset("my_asset")
```

Build an `options` dictionary for your Batch Request (Optional)

An options dictionary can be used to limit the Batches returned by a Batch Request. Omitting the options dictionary will result in all available Batches being returned.

The structure of the options dictionary will depend on the type of Data Asset being used. The valid keys for the options dictionary can be found by checking the Data Asset's batch_request_options property.

Python
```
print(my_asset.batch_request_options)
```

The batch_request_options property is a tuple that contains all the valid keys that can be used to limit the Batches returned in a Batch Request.

You can create a dictionary whose keys come from the batch_request_options tuple and whose values specify the Batch or Batches your Batch Request should return. Then pass this dictionary as the options parameter when you build your Batch Request.
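The selection rule described above can be modeled in plain Python. This is a sketch, not GX code: build_options and matching_batches are hypothetical helpers, the valid keys ("year", "month") stand in for whatever your Data Asset's batch_request_options actually contains, and the list of Batch identifiers is invented for illustration. It shows that only listed keys are accepted and that an empty options dictionary matches every Batch.

```python
def build_options(batch_request_options, **values):
    """Build an options dict, rejecting keys not in batch_request_options (hypothetical helper)."""
    unknown = sorted(set(values) - set(batch_request_options))
    if unknown:
        raise ValueError(f"Invalid option keys {unknown}; valid keys are {batch_request_options}")
    return dict(values)


def matching_batches(batch_identifiers, options):
    """Return the identifiers that agree with every key/value pair in options.

    An empty options dict matches every Batch, mirroring what happens when
    options are omitted from a Batch Request.
    """
    return [
        identifier
        for identifier in batch_identifiers
        if all(identifier.get(key) == value for key, value in options.items())
    ]


# Hypothetical stand-ins for my_asset.batch_request_options and the available Batches.
valid_keys = ("year", "month")
available = [
    {"year": "2019", "month": "01"},
    {"year": "2019", "month": "02"},
    {"year": "2020", "month": "01"},
]

options = build_options(valid_keys, year="2019")
print(matching_batches(available, options))  # the two 2019 Batches
print(len(matching_batches(available, {})))  # 3: no options means every Batch
```

With a real Data Asset, you would pass the resulting dictionary as my_asset.build_batch_request(options=options).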

Build your Batch Request

Use the build_batch_request(...) method of your Data Asset to generate a Batch Request.

Python
```
# With no options, the Batch Request refers to all available Batches
my_batch_request = my_asset.build_batch_request()
```

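For dataframe Data Assets, specify the dataframe as an argument in exactly one API method call. In the following example, it is passed as the dataframe parameter of build_batch_request(...):

Python
```
# dataframe: a pandas or Spark DataFrame you have already loaded
my_batch_request = my_asset.build_batch_request(dataframe=dataframe)
```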
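The "exactly one API method" rule can be expressed as a small check. This is a hypothetical helper written for illustration, not part of GX. It assumes the two places a dataframe could be given are the call that defines the Data Asset (asset_dataframe) and build_batch_request(...) (request_dataframe); both parameter names are made up. Plain lists stand in for DataFrames so the sketch runs with only the standard library.

```python
def resolve_dataframe(asset_dataframe=None, request_dataframe=None):
    """Return the single dataframe supplied (hypothetical helper).

    asset_dataframe: a dataframe given when the Data Asset was defined.
    request_dataframe: a dataframe given to build_batch_request(...).
    Raises ValueError unless exactly one of them is provided.
    """
    supplied = [df for df in (asset_dataframe, request_dataframe) if df is not None]
    if len(supplied) != 1:
        raise ValueError(
            f"Specify the dataframe in exactly one API method call; got {len(supplied)}"
        )
    return supplied[0]


rows = [{"vendor_id": 1}, {"vendor_id": 2}]  # stand-in for a pandas DataFrame
print(resolve_dataframe(request_dataframe=rows))
```

The check uses "is not None" rather than truthiness because real DataFrames do not support a plain boolean test.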

Extract a Batch from a Batch Request (Optional)

You can slice the data in a Data Asset into Batches and then use a Batch Request to select a specific subset of records for building Metrics, Validations, and Profiles. In the following example, the data is sliced and filtered by the values of a column, but you can also slice and filter data by other parameters, such as a time or date.

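Run the following code to add a Table Asset that covers an entire table in a SQL Data Source. In this snippet, datasource is a previously defined SQL Data Source and my_table_name is the name of a table in its database:
Python
```
my_asset = datasource.add_table_asset(name="my_asset", table_name=my_table_name)
```
Run the following code to define the column to slice on. Each distinct value of vendor_id becomes a separate Batch:
Python
```
my_asset.add_splitter_column_value("vendor_id")
```

Run the following code to build a Batch Request that returns only the Batch where vendor_id is 1:

Python
```
my_batch_request = my_asset.build_batch_request(options={"vendor_id": 1})
```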
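To see what slicing on a column value does conceptually, the following plain-Python model groups rows into one Batch per distinct vendor_id and then applies an options dictionary. It is only a sketch: split_on_column_value and select_batches are hypothetical helpers and the rows are toy data, whereas the real splitter works on the database table rather than on an in-memory list.

```python
from collections import defaultdict


def split_on_column_value(rows, column):
    """Group rows into one Batch per distinct value of column (hypothetical helper)."""
    batches = defaultdict(list)
    for row in rows:
        batches[row[column]].append(row)
    return dict(batches)


def select_batches(batches, column, options):
    """Keep only the Batches whose column value matches options; empty options keep all."""
    if column not in options:
        return batches
    value = options[column]
    return {value: batches[value]} if value in batches else {}


# Toy rows standing in for a SQL table.
rows = [
    {"vendor_id": 1, "passenger_count": 2},
    {"vendor_id": 2, "passenger_count": 1},
    {"vendor_id": 1, "passenger_count": 4},
]

batches = split_on_column_value(rows, "vendor_id")
print(sorted(batches))  # [1, 2]: one Batch per vendor_id
print(select_batches(batches, "vendor_id", {"vendor_id": 1}))
```

Requesting a value that has no rows returns no Batches, which is one reason to verify the returned Batches before using them.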

Verify that the correct Batches were returned

The get_batch_list_from_batch_request(...) method will return a list of the Batches a given Batch Request refers to.

Python
```
batches = my_asset.get_batch_list_from_batch_request(my_batch_request)
```

Because Batch definitions are quite verbose, it is easiest to determine what data the Batch Request will return by printing just the batch_spec of each Batch.

Python
```
for batch in batches:
    print(batch.batch_spec)
```
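When you know how many Batches a request should match, you can turn the printed check into an assertion. The sketch below uses types.SimpleNamespace objects as hypothetical stand-ins for GX Batches, modeling only the batch_spec attribute with invented contents; check_batches is also a hypothetical helper. With real Batches, you would pass it the list returned by get_batch_list_from_batch_request(...).

```python
from types import SimpleNamespace


def check_batches(batches, expected_count):
    """Print each batch_spec and confirm the expected number of Batches was returned."""
    specs = [batch.batch_spec for batch in batches]
    for spec in specs:
        print(spec)
    if len(specs) != expected_count:
        raise AssertionError(f"Expected {expected_count} Batch(es), got {len(specs)}")
    return specs


# Hypothetical stand-in: real GX Batches carry different batch_spec contents.
batches = [SimpleNamespace(batch_spec={"table_name": "my_table", "vendor_id": 1})]
check_batches(batches, expected_count=1)
```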

Next steps

Now that you have retrieved data from a Data Asset, you may be interested in creating Expectations about your data:

How to create Expectations while interactively evaluating a set of data
