Automating data quality rules
With GX Cloud, you can automatically generate data quality rules to more quickly achieve test coverage for your data. This page provides an overview of the following options:
- Automating Anomaly Detection rules as part of adding a new Data Asset.
- Generating personalized AI-recommended rules for an existing Data Asset.
Anomaly Detection
When you add a new Data Asset, GX Cloud by default generates Expectations to detect anomalies in the following:
- Schema
- Volume
- Completeness
- Uniqueness (coming soon)
Schema
To detect schema anomalies, GX Cloud automatically generates an "expect table columns to match set" rule, using the Data Asset's initial columns as the set to match. If columns are added to, removed from, or renamed in the Data Asset, this Expectation fails.
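The schema check can be pictured with a small plain-Python sketch. It only illustrates the comparison logic and is not GX Cloud's implementation; the function name `columns_match_set` and the sample column names are hypothetical.

```python
def columns_match_set(observed_columns, expected_columns):
    """Return True when the observed column names equal the expected set.

    Hypothetical helper for illustration; column order is ignored.
    """
    return set(observed_columns) == set(expected_columns)


# Hypothetical columns captured when the Data Asset was first added.
initial_columns = ["id", "email", "created_at"]

# Same columns in a different order: the check passes.
unchanged = columns_match_set(["email", "id", "created_at"], initial_columns)

# A dropped column changes the number of columns: the check fails.
dropped = columns_match_set(["id", "email"], initial_columns)

# A renamed column changes the names: the check fails.
renamed = columns_match_set(["id", "user_email", "created_at"], initial_columns)

print(unchanged, dropped, renamed)
```

In this sketch, any difference between the two sets of names is enough to fail the check, which mirrors the behavior described above.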
Volume
To detect non-increasing volume, GX Cloud automatically generates an "expect table row count to be between" rule with dynamic parameters that test whether the current validation run has more rows than the previous run. If the row count shrinks or stays the same between runs, this Expectation fails.
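As a rough sketch of the volume check, the dynamic bounds can be thought of as being recomputed from the previous run before each validation. The code below is plain Python for illustration only: the helper names are hypothetical, and deriving the lower bound as "previous count plus one" with no upper bound is an assumption about how "more rows than the previous run" could be expressed as a between rule.

```python
def dynamic_row_count_bounds(previous_row_count):
    """Derive (min_value, max_value) from the previous run's row count.

    Assumption: the lower bound sits one row above the previous count
    and there is no upper bound (None).
    """
    return previous_row_count + 1, None


def row_count_is_between(row_count, min_value, max_value):
    """Return True when row_count lies within the inclusive bounds."""
    if min_value is not None and row_count < min_value:
        return False
    if max_value is not None and row_count > max_value:
        return False
    return True


# Hypothetical previous run with 1,000 rows.
min_value, max_value = dynamic_row_count_bounds(1000)

grew = row_count_is_between(1050, min_value, max_value)      # more rows
unchanged = row_count_is_between(1000, min_value, max_value)  # same rows
shrank = row_count_is_between(950, min_value, max_value)      # fewer rows

print(grew, unchanged, shrank)
```

Only the run whose row count grew passes here; an unchanged or smaller row count fails, matching the description above.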
Completeness
To detect completeness anomalies, GX Cloud automatically generates, for every column, an "expect column proportion of non-null values to be between" rule whose thresholds depend on the column's initial proportion of non-null values:
- If a column initially has no null values, GX Cloud generates a rule to test that the column continues to have no null values.
- If a column initially has all null values, GX Cloud generates a rule to test that the column continues to have all null values.
- If a column starts with a mix of null and non-null values, GX Cloud generates a rule with dynamic parameters to test that the proportions stay close to the average of the last 5 Validation runs.
For a column that started with all null values or no null values, any change in the proportions causes its generated completeness Expectation to fail. For a column that started with a mix of null and non-null values, the generated completeness Expectation passes when the proportions change only slightly and fails when the change is drastic.
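The three completeness cases can be sketched in plain Python as follows. This is an illustration of the decision logic only, not GX Cloud's implementation. The function name, its parameters, and in particular the `tolerance` value are hypothetical: the text above does not state how close to the five-run average a proportion must stay.

```python
from statistics import mean


def completeness_passes(initial_non_null, current_non_null,
                        recent_non_null, tolerance=0.1):
    """Decide whether a column's completeness check passes.

    initial_non_null: proportion of non-null values when the asset was added.
    current_non_null: proportion of non-null values in the current run.
    recent_non_null:  proportions from earlier runs, oldest first.
    tolerance:        hypothetical allowed distance from the recent average.
    """
    if initial_non_null == 1.0:
        # Column started with no nulls: it must still have no nulls.
        return current_non_null == 1.0
    if initial_non_null == 0.0:
        # Column started entirely null: it must still be entirely null.
        return current_non_null == 0.0
    # Mixed column: compare against the average of the last 5 runs,
    # falling back to the initial proportion when there is no history.
    history = recent_non_null[-5:]
    baseline = mean(history) if history else initial_non_null
    return abs(current_non_null - baseline) <= tolerance


history = [0.80, 0.79, 0.81, 0.80, 0.80]  # hypothetical earlier runs

print(completeness_passes(1.0, 0.99, []))       # a null appeared
print(completeness_passes(0.0, 0.0, []))        # still entirely null
print(completeness_passes(0.8, 0.82, history))  # small drift
print(completeness_passes(0.8, 0.40, history))  # drastic drop
```

With the assumed tolerance of 0.1, the first and last calls fail and the middle two pass. A different tolerance would move the boundary between a small change and a drastic one.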
Personalized recommendations with ExpectAI (Beta)
ExpectAI (Beta) performs deep analysis on a given Data Asset to set Expectations based on patterns in the data. These AI-recommended data quality rules are sometimes based on anomalies detected in the data, so they may fail on the first validation run, bringing potential problems to your attention.