How to dynamically load evaluation parameters from a database

This guide will help you create an Expectation that loads part of its Expectation Configuration from a database at runtime. Using a dynamic Evaluation Parameter makes it possible to maintain part of an Expectation Suite in a shared database.

Prerequisites - This how-to guide assumes you have already:

Steps

  1. Add a new SqlAlchemy Query Store to your context

    A SqlAlchemy Query Store acts as a bridge that can query a SqlAlchemy-connected database and return the result of the query to be available for an evaluation parameter.

    Find the stores section in your great_expectations.yml file, and add the following configuration for a new store called “my_query_store”. You can add and reference multiple Query Stores with different names.

    my_query_store:
      class_name: SqlAlchemyQueryStore
      credentials: ${rds_movies_db}
      queries:
        current_genre_ids: "SELECT id FROM genres;"  # The query name and value can be replaced with your desired query
    

    Ensure you have added valid credentials to the config-variables.yml file:

    rds_movies_db:
      drivername: postgresql
      host: <<hostname>>  # Update with your hostname
      port: 5432
      username: testuser  # Update with your username
      password: <<password>>  # Update with your password
      database: testdb  # Update with your database
    
  2. In a notebook, get a test batch of data to use for validation.

    import great_expectations as ge
    context = ge.DataContext()
    
    
    batch_kwargs = {
        "datasource": "movies_db",
        "table": "genres_movies"
    }
    expectation_suite_name = "genres_movies.fkey"
    context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)
    batch = context.get_batch(
        batch_kwargs=batch_kwargs,
        expectation_suite_name=expectation_suite_name
    )
    
  3. Define an expectation that relies on a dynamic query

    Great Expectations recognizes several types of Evaluation Parameters that can use advanced features provided by the Data Context. To dynamically load data, we will be using a store-style URN, which starts with “urn:great_expectations:stores”. The next component of the URN is the name of the store we configured above (my_query_store), and the final component is the name of the query we defined above (current_genre_ids):

    batch.expect_column_values_to_be_in_set(
        column="genre_id",
        value_set={"$PARAMETER": "urn:great_expectations:stores:my_query_store:current_genre_ids"}
    )
    

    The SqlAlchemyQueryStore that you configured above will execute the defined query and return the results as the value of the value_set parameter to evaluate your expectation:

    {
      "meta": {
        "substituted_parameters": {
          "value_set": [
            1,
            2,
            3,
            4,
            5,
            6,
            7,
            8,
            9,
            10,
            11,
            12,
            13,
            14,
            15,
            16,
            17,
            18
          ]
        }
      },
      "result": {
        "element_count": 2891,
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "unexpected_percent_nonmissing": 0.0,
        "partial_unexpected_list": []
      },
      "success": true,
      "exception_info": null
    }
    

Comments