How to request data from a Data Asset
This guide demonstrates how you can request data from a Datasource that has been defined with the
- An installation of GX
- A Datasource with a configured Data Asset
1. Import GX and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Retrieve your Data Asset
If you already have an instance of your Data Asset stored in a Python variable, you do not need to retrieve it again. If you do not, you can instantiate a previously defined Datasource with your Data Context's get_datasource(...) method. Likewise, a Datasource's get_asset(...) method will instantiate a previously defined Data Asset.
In this example we will use a previously defined Datasource named my_datasource and a previously defined Data Asset named my_asset.
my_asset = context.get_datasource("my_datasource").get_asset("my_asset")
3. (Optional) Build an options dictionary for your Batch Request
An options dictionary can be used to limit the Batches returned by a Batch Request. Omitting the options dictionary will result in all available Batches being returned.
The structure of the options dictionary will depend on the type of Data Asset being used. The valid keys for the options dictionary can be found by checking the Data Asset's batch_request_options property. The batch_request_options property is a tuple that contains all the valid keys that can be used to limit the Batches returned in a Batch Request.
You can create a dictionary of keys pulled from the batch_request_options tuple and values that you want to use to specify the Batch or Batches your Batch Request should return, then pass this dictionary in as the options parameter when you build your Batch Request.
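As a minimal sketch of the step above, the helper below builds an options dictionary and rejects any key that is not in the tuple of valid keys. The helper name build_options and the example keys ("year", "month") are assumptions made for illustration only. They are not part of GX, and the real keys depend on how your Data Asset was defined.

```python
def build_options(valid_keys, **requested):
    """Build an options dictionary, rejecting keys the Data Asset does not accept.

    build_options is a hypothetical helper for this guide, not a GX function.
    """
    unknown = sorted(set(requested) - set(valid_keys))
    if unknown:
        raise KeyError(
            f"Invalid batch request option(s): {unknown}; "
            f"valid keys: {tuple(valid_keys)}"
        )
    return dict(requested)


# Assumed example value; in practice, read it from my_asset.batch_request_options.
valid_keys = ("year", "month")

options = build_options(valid_keys, year="2019", month="01")

# With a real Data Asset, the dictionary would then be passed as the
# options parameter when building the Batch Request:
# my_batch_request = my_asset.build_batch_request(options=options)
```

Checking the keys up front is optional, but it surfaces a misspelled key before the Batch Request is built.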
4. Build your Batch Request
We will use the build_batch_request(...) method of our Data Asset to generate a Batch Request.
my_batch_request = my_asset.build_batch_request()
For dataframe Data Assets, the dataframe is always specified as the argument of exactly one API method:
my_batch_request = my_asset.build_batch_request(dataframe=dataframe)
5. Verify that the correct Batches were returned
The get_batch_list_from_batch_request(...) method will return a list of the Batches a given Batch Request refers to.
batches = my_asset.get_batch_list_from_batch_request(my_batch_request)
Because Batch definitions are quite verbose, it is easiest to determine what data the Batch Request will return by printing just the batch_spec of each Batch.
for batch in batches:
    # Print only the batch_spec, which identifies the data each Batch covers.
    print(batch.batch_spec)
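If you would rather check the result programmatically than read printed output, one possible approach is to collect each batch_spec into a list and compare it against what you expect. The sketch below assumes only that each Batch exposes a batch_spec attribute, as described above. The function summarize_batches, the stand-in objects, and the file paths are hypothetical, and are used here so the example runs without a configured Datasource.

```python
from types import SimpleNamespace


def summarize_batches(batches):
    """Return the batch_spec of each Batch, in the order the Batches were given."""
    return [batch.batch_spec for batch in batches]


# Stand-ins for real Batch objects (hypothetical data for illustration only).
fake_batches = [
    SimpleNamespace(batch_spec={"path": "data/2019-01.csv"}),
    SimpleNamespace(batch_spec={"path": "data/2019-02.csv"}),
]

specs = summarize_batches(fake_batches)

# Example check: confirm the number of Batches matches what you expected.
assert len(specs) == 2
```

With a real Data Asset, you would pass the list returned by get_batch_list_from_batch_request(...) in place of fake_batches.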
Now that you have retrieved data from a Data Asset, you may be interested in creating Expectations about your data.