Automating data quality rules
With GX Cloud, you can automatically generate data quality rules to more quickly achieve test coverage for your data. This page provides an overview of the following options:
- Automating standard rules as part of adding a new Data Asset.
- Generating personalized AI-recommended rules for an existing Data Asset.
Monitoring common issues
When you add a new Data Asset, GX Cloud by default generates Expectations to test the following common data quality issues:
- Schema
- Volume
- Completeness
- Uniqueness (coming soon)
Schema
To detect schema changes, we automatically generate a rule to expect table columns to match set using the Data Asset’s initial columns as the set to match. If the number or names of columns in the Data Asset change, this Expectation will fail.
Volume
To detect non-increasing volume, we automatically generate a rule to expect table row count to be between with dynamic parameters that test that the current validation run has more rows than the previous run. If the row count shrinks or stays the same between runs, this Expectation will fail.
Completeness
To detect completeness changes, we automatically generate rules for every column to expect column values to not be null and/or expect column values to be null. The Expectation(s) and parameters for a column depend on the column's initial null percentage.
- If a column initially has no null values, GX generates one completeness Expectation to test that the column continues to have 100% non-null values.
- If a column initially has all null values, GX generates one completeness Expectation to test that the column continues to have 100% null values.
- If a column starts with a mix of null and non-null values, GX generates two completeness Expectations with dynamic parameters to test that the null percentage stays close to the average of the last 5 Validation runs.
If the null percentage changes at all for a column that started with all null values or no null values, its generated completeness Expectation will fail. If the null percentage changes a bit for a column that started with a mix of null and non-null values, its generated completeness Expectations will pass; if the change is drastic, one of the generated completeness Expectations will fail. Which one fails depends on whether the null percentage increased or decreased.
Personalizing recommendations with ExpectAI Beta
ExpectAI (BETA) performs deep analysis on a given Data Asset to set Expectations based on patterns in the data. These AI-recommended data quality rules are sometimes based on anomalies detected in the data, so they may fail on the first validation to bring your attention to potential problems.