modellerUpdated 2026-05-02

Data Quality Rules

What this covers

Data quality rules let you declare constraints on physical source columns and automatically detect violations after each aggregate refresh. This page explains the five rule types, when to use them, how violations are recorded, and the block_on_failure option.

Why data quality rules matter

Aggregates are only useful if the underlying source data is clean. A fact table with NULL transaction amounts produces misleading subtotals. Duplicate keys on a dimension corrupt join cardinalities. A date column accepting free text silently breaks time-variant measure calculations. By the time a data analyst notices that "March revenue" doubled overnight, the corrupt aggregate has already been served to a dashboard.

Data quality rules close this loop. They run automatically after each refresh and write violation counts back to the aggregate metadata. When violations breach a threshold, a model alert is raised. If block_on_failure is enabled, the alert is escalated to error severity and clearly signals that downstream data should not be trusted.

Rule types

Rule typeWhat it checksConfigured by
not_nullCount of NULL values in the columnNo extra config needed
uniqueCount of duplicate values in the columnNo extra config needed
rangeRows where the value falls outside [min, max]rule_config: { "min": 0, "max": 100 }
regexRows where the column does not match a regular expressionrule_config: { "pattern": "^\\d{4}-\\d{2}-\\d{2}$" }
custom_sqlA hand-written SQL query that must return the violation count as its first columnrule_config: { "sql": "SELECT COUNT(*) FROM my_table WHERE status NOT IN ('active','closed')" }

custom_sql considerations

Targets

A rule targets a specific object within the model:

target_typeMeaning
columnA specific ModelColumn row; the rule's target_id is the column UUID
dimensionThe source column of a dimension; target_id is the dimension UUID
measureThe source column of a standard measure; target_id is the measure UUID

Severity levels

SeverityEffect on model alerts
infoNo model alert raised
warnWarning alert raised
errorError alert raised

When block_on_failure is enabled, any violation produces an error-severity alert regardless of the rule's severity setting.

block_on_failure

When block_on_failure is set to true on a rule:

Use block_on_failure for rules where a single violation means the data is definitively wrong and the aggregate should not be served. Examples: a primary key column on the fact table that must be unique, a date column that must never be NULL.

Automatic validation

Rules run automatically after each full aggregate refresh. You can also trigger validation manually using the Run Checks button in the Data Quality panel.

After validation completes:

Violation history

Each rule keeps a violation history. Click the expand button on a rule row to see past violation events with timestamps and sample values. To clear stale violation history, use the Clear button on the rule row.

Common patterns

Fact table NOT NULL guard. Create a not_null rule targeting the fact table's primary key column with severity=error and block_on_failure=true. Any NULL in that column means the CTAS joined incorrectly and the aggregate should not be trusted.

Dimension code uniqueness. Create a unique rule targeting the natural key of a dimension table. A duplicate key means two rows will join to the same fact record, inflating measures.

Date format guard. For a string-typed date column, create a regex rule with a date pattern. Out-of-format values will cause time-variant measure calculations to fail silently.

Business range check. For a unit_price column, create a range rule with min=0 and max=99999. Negative or absurdly large prices indicate upstream ETL issues.

Limits

Related