A Metric is a computed attribute of data such as the mean of a column.
Features and promises
Metrics are values derived from one or more Batches (selections of records from a Data Asset) that can be used to evaluate Expectations (verifiable assertions about data) or to summarize the result of Validation (the act of applying an Expectation Suite to a Batch). It can be helpful to think of a Metric as the answer to a question. A Metric could be a statistic, such as the minimum value of a column, or a more complex object, such as a histogram. Metrics are a core part of Validating data.
Relationship to other objects
Metrics are generated as part of running Expectations against a Batch (and can be referenced as such). For example, if you have an Expectation that the mean of a column falls within a certain range, the mean of the column must first be computed to see if its value is as expected. Generating Metrics involves logic specific to the Execution Engine (a system capable of processing data to compute Metrics). These Metrics can be included in Validation Results (generated when data is Validated against an Expectation or Expectation Suite), depending on the result_format configured for them. In-memory Validation Results can in turn be accessed by Actions, including the StoreValidationResultAction, which stores them in the Validation Results Store (a connector to store and retrieve information about objects generated when data is Validated against an Expectation Suite). Therefore, Metrics from previously run Expectation Suites can also be referenced by accessing the stored Validation Results that contain them.
Metrics are generated in accordance with the requirements of an Expectation when an Expectation is evaluated. This includes Expectations that are evaluated as part of the interactive process for creating Expectations and the Profiler process for creating Expectations.
Past Metrics can also be accessed by some Expectations through Evaluation Parameters. However, when you are creating Expectations there may not be past Metrics to provide. In these cases, it is possible to define a temporary value that the Evaluation Parameter can use in place of the missing past Metric.
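The fallback described above can be sketched in plain Python. This is only an illustration of the idea, not the Great Expectations API: the function resolve_evaluation_parameter, the parameter name upstream_row_count, and the dictionary shapes are all hypothetical.

```python
from typing import Any, Mapping


def resolve_evaluation_parameter(
    name: str,
    past_metrics: Mapping[str, Any],
    temporary_values: Mapping[str, Any],
) -> Any:
    """Prefer a stored past Metric; otherwise use the temporary stand-in value."""
    if name in past_metrics:
        return past_metrics[name]
    if name in temporary_values:
        return temporary_values[name]
    raise KeyError(f"No past Metric or temporary value for {name!r}")


# Hypothetical values: while authoring, no past Metric exists yet.
temporary_values = {"upstream_row_count": 10}
first_run = resolve_evaluation_parameter("upstream_row_count", {}, temporary_values)

# Once an earlier Validation has produced the Metric, it takes precedence.
later_run = resolve_evaluation_parameter(
    "upstream_row_count", {"upstream_row_count": 1250}, temporary_values
)
```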
Metrics are core to the Validation of data
When an Expectation is evaluated, Great Expectations collects all the Metrics requested by the Expectation and provides them to the Expectation's validation logic. Most validation is done by comparing values from a column or columns to a Metric associated with the Expectation being evaluated.
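As a rough illustration of that flow, the sketch below first computes the Metrics an Expectation asks for and then hands them to its validation logic. It is a simplified model rather than the library's implementation: METRIC_FUNCTIONS, compute_requested_metrics, and validate_mean_between are hypothetical names, and the data is a plain Python list instead of a Batch.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical lookup table from Metric name to the function that computes it.
METRIC_FUNCTIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "column.mean": mean,
    "column.min": min,
    "column.max": max,
}


def compute_requested_metrics(
    metric_names: List[str], column_values: Sequence[float]
) -> Dict[str, float]:
    """Compute every Metric the Expectation requested before validating."""
    return {name: METRIC_FUNCTIONS[name](column_values) for name in metric_names}


def validate_mean_between(
    metrics: Dict[str, float], min_value: float, max_value: float
) -> dict:
    """Validation logic only compares already-computed Metrics to the bounds."""
    observed = metrics["column.mean"]
    return {
        "success": min_value <= observed <= max_value,
        "result": {"observed_value": observed},
    }


ages = [22, 38, 26, 35]  # made-up column values
metrics = compute_requested_metrics(["column.mean"], ages)
outcome = validate_mean_between(metrics, min_value=20, max_value=40)
```

Separating the two steps is what allows a computed Metric to be reused or reported independently of the pass/fail outcome.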
Past Metrics are available to other Expectations and Data Docs
An Expectation can also expose Metrics, such as the observed value of a useful statistic, via an Expectation Validation Result, where Data Docs (human-readable documentation generated from Great Expectations metadata detailing Expectations, Validation Results, etc.) or other Expectations can use them. This is done through an Action (to which the Expectation's Validation Result has been passed) that saves them to a Metric Store. The Action in question is the StoreMetricsAction. You can view the implementation of this Action on our GitHub.
How to access
Validation Results can expose Metrics that are defined by specific Expectations that have been validated, called "Expectation Defined Metrics." To access those values, you address the Metric as a dot-delimited string that identifies the value, such as expect_column_values_to_be_between.result.unexpected_percent. These Metrics may be stored in a Metric Store.
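To make the dot-delimited addressing concrete, here is a minimal sketch that walks such a string through a dictionary-shaped Validation Result. The helper resolve_metric_path and the hand-written example_result are assumptions made for illustration; the dictionary is a simplified stand-in and was not captured from a real run.

```python
from functools import reduce
from typing import Any, Mapping


def resolve_metric_path(validation_result: Mapping[str, Any], metric_name: str) -> Any:
    """Look up a dot-delimited Metric name in a dict-shaped Validation Result."""
    expectation_type, *path = metric_name.split(".")
    actual_type = validation_result["expectation_config"]["expectation_type"]
    if actual_type != expectation_type:
        raise KeyError(f"Result is for {actual_type!r}, not {expectation_type!r}")
    # Follow the remaining keys one level at a time, e.g. "result" -> "unexpected_percent".
    return reduce(lambda node, key: node[key], path, validation_result)


# Hypothetical, simplified result for a single Expectation.
example_result = {
    "expectation_config": {
        "expectation_type": "expect_column_values_to_be_between",
        "kwargs": {"column": "Age", "min_value": 0, "max_value": 100},
    },
    "success": False,
    "result": {"unexpected_count": 5, "unexpected_percent": 2.5},
}

value = resolve_metric_path(
    example_result, "expect_column_values_to_be_between.result.unexpected_percent"
)
```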
metric_kwargs_id is a string representation of the Metric Kwargs that can be used as a database key. In simple cases it is easily readable, such as column=Age, but when there are multiple keys and values, or complex values, it will most likely be an MD5 hash of the key/value pairs. It can also be None when no kwargs are required to identify the Metric.
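The three cases above (no kwargs, a single simple kwarg, and anything more complex) can be sketched as follows. This is one possible way to derive such a key and is not taken from the Great Expectations source: the function make_metric_kwargs_id, the single-kwarg rule for readability, and the JSON canonicalization are all assumptions.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def make_metric_kwargs_id(metric_kwargs: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Build a database-friendly key from Metric Kwargs (illustrative only)."""
    if not metric_kwargs:
        return None  # no kwargs are needed to identify the Metric
    if len(metric_kwargs) == 1:
        key, value = next(iter(metric_kwargs.items()))
        if isinstance(value, (str, int, float)):
            return f"{key}={value}"  # readable form for the simple case
    # Sort keys so the same kwargs always hash to the same id.
    canonical = json.dumps(metric_kwargs, sort_keys=True, default=str)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


simple_id = make_metric_kwargs_id({"column": "Age"})
complex_id = make_metric_kwargs_id({"column": "Sex", "value_set": ["male", "female"]})
empty_id = make_metric_kwargs_id({})
```

Sorting the keys before hashing keeps the id stable regardless of the order in which the kwargs were supplied, which matters if the id is used as a lookup key.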
The following examples demonstrate how Metrics are defined:
res = df.expect_column_values_to_be_in_set(
See the How to configure a MetricsStore guide for more information.
How to create
Metrics are produced using logic specific to the Execution Engine associated with the Datasource that provides the data for the Batch Request(s) that the Metric is calculated for. That logic is defined in a MetricProvider. When a MetricProvider class is first encountered, Great Expectations will register the Metric and any methods that it defines as able to produce Metrics. The registered metric will then be able to be used with
Configuration of Metrics is applied when they are defined as part of an Expectation.
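The register-on-first-encounter behavior can be illustrated with a small, self-contained pattern. The sketch below is not the Great Expectations MetricProvider implementation: SketchMetricProvider, METRIC_REGISTRY, and the use of __init_subclass__ are assumptions chosen to show how defining a class can make its Metric available by name.

```python
from typing import Callable, Dict, Sequence

# Hypothetical registry mapping a Metric name to the function that produces it.
METRIC_REGISTRY: Dict[str, Callable[[Sequence[float]], float]] = {}


class SketchMetricProvider:
    """Stand-in base class: each subclass registers itself when it is defined."""

    metric_name: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once, when Python first encounters the subclass definition.
        METRIC_REGISTRY[cls.metric_name] = cls.compute

    @staticmethod
    def compute(values: Sequence[float]) -> float:
        raise NotImplementedError


class ColumnMax(SketchMetricProvider):
    metric_name = "column.max"

    @staticmethod
    def compute(values: Sequence[float]) -> float:
        return max(values)


# After the class definition, the Metric can be looked up by name.
observed_max = METRIC_REGISTRY["column.max"]([3, 9, 4])
```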
Metrics naming conventions
Metrics can have any name. However, for the "core" Great Expectations Metrics, we use the following conventions:
- For aggregate Metrics, such as the mean value of a column, we use the domain and name of the statistic, such as
- For map Metrics, which produce values for individual records or rows, we define the domain using the prefix "column_values" and use several consistent suffixes to provide related Metrics. For example, for the Metric that defines whether specific column values fall into an expected set, several related Metrics are defined:
  - column_values.in_set.unexpected_count provides the total number of unexpected values in the domain.
  - column_values.in_set.unexpected_values provides a sample of unexpected_values; "result_format" is one of its value_keys to determine how many values should be returned.
  - column_values.in_set.unexpected_rows provides full rows for which the value in the domain column was unexpected.
  - column_values.in_set.unexpected_value_counts provides a count of how many times each unexpected value occurred.
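To show how the related map Metrics fit together, the following sketch computes three of them for a single column held in a plain list. The function in_set_metrics, its sample_size parameter, and the sample data are hypothetical; unexpected_rows is omitted because the example has no full rows to return.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Sequence


def in_set_metrics(
    values: Sequence[Any], value_set: Iterable[Any], sample_size: int = 20
) -> Dict[str, Any]:
    """Compute related 'column_values.in_set.*' Metrics for one column."""
    allowed = set(value_set)
    unexpected = [v for v in values if v not in allowed]
    prefix = "column_values.in_set"
    return {
        f"{prefix}.unexpected_count": len(unexpected),
        # Only a sample is returned; sample_size stands in for the result_format setting.
        f"{prefix}.unexpected_values": unexpected[:sample_size],
        f"{prefix}.unexpected_value_counts": dict(Counter(unexpected)),
    }


column = ["male", "female", "unknown", "male", "n/a", "unknown"]  # made-up data
related = in_set_metrics(column, ["male", "female"])
```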