Version: 0.18.9

Request data from a Data Asset

Learn how you can request data from a Data Source that has been defined with the context.sources.add_* method.

Prerequisites

    Import GX and instantiate a Data Context

    Run the following Python code to import GX and instantiate a Data Context:

    Python
    import great_expectations as gx

    context = gx.get_context()

    Retrieve your Data Asset

    If you already have an instance of your Data Asset stored in a Python variable, you do not need to retrieve it again. If you do not, you can retrieve a previously defined Data Source with your Data Context's get_datasource(...) method. Likewise, a Data Source's get_asset(...) method retrieves a previously defined Data Asset.

    In this example we will use a previously defined Data Source named my_datasource and a previously defined Data Asset named my_asset.

    Python
    my_asset = context.get_datasource("my_datasource").get_asset("my_asset")

    Build an options dictionary for your Batch Request (Optional)

    An options dictionary can be used to limit the Batches returned by a Batch Request. Omitting the options dictionary will result in all available Batches being returned.
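To make this filtering behavior concrete, here is a minimal pure-Python model. It is not GX code: each Batch is represented only by a hypothetical dictionary of identifying values, and select_batches is a hypothetical helper that keeps the Batches whose identifiers match every key in the options dictionary. Passing no options (or an empty dictionary) returns every Batch.

```python
from typing import Any, Dict, List, Optional

# Hypothetical model (not the GX API): a Batch is described only by its identifying values.
Batch = Dict[str, Any]


def select_batches(batches: List[Batch], options: Optional[Dict[str, Any]] = None) -> List[Batch]:
    """Return the Batches whose identifiers match every key/value in options.

    An omitted (None) or empty options dictionary selects all Batches.
    """
    if not options:
        return list(batches)
    return [
        batch
        for batch in batches
        if all(batch.get(key) == value for key, value in options.items())
    ]


# Hypothetical monthly Batches, for illustration only.
monthly_batches = [
    {"year": "2023", "month": "11"},
    {"year": "2023", "month": "12"},
    {"year": "2024", "month": "01"},
]

print(len(select_batches(monthly_batches)))  # no options: every Batch is selected
print(select_batches(monthly_batches, {"year": "2023"}))  # only the 2023 Batches
```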

    The structure of the options dictionary will depend on the type of Data Asset being used. The valid keys for the options dictionary can be found by checking the Data Asset's batch_request_options property.

    Python
    print(my_asset.batch_request_options)

    The batch_request_options property is a tuple that contains all the valid keys that can be used to limit the Batches returned in a Batch Request.

    You can create a dictionary of keys pulled from the batch_request_options tuple and values that you want to use to specify the Batch or Batches your Batch Request should return, then pass this dictionary in as the options parameter when you build your Batch Request.
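The sketch below shows one way to build that dictionary defensively. It is plain Python rather than a GX API: make_options is a hypothetical helper, and the ("year", "month") tuple is a stand-in for whatever my_asset.batch_request_options actually returns for your Data Asset. The helper rejects keys that are not in the tuple, so a typo is caught before the Batch Request is built.

```python
from typing import Any, Dict, Tuple


def make_options(valid_keys: Tuple[str, ...], **values: Any) -> Dict[str, Any]:
    """Build an options dictionary, rejecting keys the Data Asset does not support."""
    unknown = sorted(set(values) - set(valid_keys))
    if unknown:
        raise KeyError(f"Unsupported option keys {unknown}; valid keys are {valid_keys}")
    return dict(values)


# Hypothetical stand-in for my_asset.batch_request_options.
valid_keys = ("year", "month")

options = make_options(valid_keys, year="2023", month="12")
print(options)

# With a real Data Asset, the dictionary would then be passed as the options parameter:
# my_batch_request = my_asset.build_batch_request(options=options)
```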

    Build your Batch Request

    Use the build_batch_request(...) method of your Data Asset to generate a Batch Request.

    Python
    my_batch_request = my_asset.build_batch_request()

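    For dataframe Data Assets, the dataframe must be passed to exactly one API method. Here it is supplied to build_batch_request(...) through the dataframe parameter:

    Python
    # dataframe is a pandas or Spark DataFrame that you have already loaded
    my_batch_request = my_asset.build_batch_request(dataframe=dataframe)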

    Extract a Batch from a Batch Request (Optional)

    You can split the data in a Data Asset into Batches and then request only a specific selection of records to build Metrics, Validations, and Profiles. In the following example, the data is split by the values of a column and filtered to a single value, but you can also split and filter data by other criteria such as time or date.

    1. Run the following code to add a Table Asset covering an entire table in a SQL Data Source, where datasource is your SQL Data Source and my_table_name is the name of the table:

      Python
      table_asset = datasource.add_table_asset(name="my_asset", table_name=my_table_name)
    2. Run the following code to define the column to split on:

      Python
      table_asset.add_splitter_column_value("vendor_id")
    3. Run the following code to build a Batch Request that returns only the Batch where vendor_id is 1:

      Python
      my_batch_request = table_asset.build_batch_request(options={"vendor_id": 1})
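Conceptually, a column-value splitter turns one table into one Batch per distinct value of the chosen column, and the options dictionary then picks out a single Batch. The pure-Python model below illustrates that idea on a few hypothetical in-memory rows. It is not how GX executes the split internally, and split_by_column_value is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def split_by_column_value(rows: List[Row], column: str) -> Dict[Any, List[Row]]:
    """Group rows into one 'Batch' per distinct value of `column` (conceptual model only)."""
    batches: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        batches[row[column]].append(row)
    return dict(batches)


# Hypothetical rows, for illustration only.
rows = [
    {"vendor_id": 1, "fare": 12.5},
    {"vendor_id": 2, "fare": 8.0},
    {"vendor_id": 1, "fare": 20.0},
]

batches = split_by_column_value(rows, "vendor_id")
print(sorted(batches))  # the distinct vendor_id values, one Batch each

# Analogue of requesting options={"vendor_id": 1}: keep only that Batch.
vendor_1_batch = batches[1]
print(vendor_1_batch)
```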

    Verify that the correct Batches were returned

    The get_batch_list_from_batch_request(...) method will return a list of the Batches a given Batch Request refers to.

    Python
    batches = my_asset.get_batch_list_from_batch_request(my_batch_request)

    Because Batch definitions are quite verbose, it is easiest to determine what data the Batch Request will return by printing just the batch_spec of each Batch.

    Python
    for batch in batches:
        print(batch.batch_spec)
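If you check the same Batch Request repeatedly, a small helper can collect the batch_spec values and fail fast when the number of Batches is not what you expect. This is a hypothetical sketch: summarize_batches is not part of GX, and it relies only on each Batch exposing a batch_spec attribute, as in the loop above. The usage example uses types.SimpleNamespace stand-ins with made-up specs so the sketch runs without a Data Source.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def summarize_batches(batches: Iterable[Any], expected_count: Optional[int] = None) -> List[Any]:
    """Print and return each Batch's batch_spec, optionally checking the Batch count."""
    specs = [batch.batch_spec for batch in batches]
    if expected_count is not None and len(specs) != expected_count:
        raise ValueError(f"Expected {expected_count} Batch(es), got {len(specs)}")
    for spec in specs:
        print(spec)
    return specs


# Hypothetical stand-ins for the result of get_batch_list_from_batch_request.
fake_batches = [
    SimpleNamespace(batch_spec={"table_name": "my_table", "vendor_id": 1}),
]

specs = summarize_batches(fake_batches, expected_count=1)
```

With a real Data Asset you would pass the list returned by get_batch_list_from_batch_request(...) instead of the stand-ins, for example summarize_batches(batches, expected_count=1).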

    Next steps

    Now that you have retrieved data from a Data Asset, you may be interested in creating Expectations about your data: