Data Quality
This functionality checks data against metrics such as completeness, uniqueness, validity, NULLs, and consistency, or against any custom rule you need, such as string length.
Masthead uses the Dataplex API, a native Google Cloud service, to run data quality rules, which are essentially SQL queries.
Masthead needs no permissions for Data Quality: it does not request permissions, nor does it read or edit clients' data. Read-only permissions are granted to Dataplex, a Google Cloud native service, while Masthead only helps create and manage the rules that live in Dataplex.
Data Quality can be enabled with one click: activate the Cloud Dataplex API in the connected Google Cloud project. The link to do this is provided in the Masthead UI, under the Data Quality tab. Rules must then be created manually for each table you want to monitor. Once the rules are created and scheduled, they run automatically.
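Masthead creates and schedules rules in Dataplex for you, so no code is needed. To make the result concrete, the sketch below builds what we assume a Dataplex `dataScans.create` REST request looks like for a BigQuery table with two rules and a cron schedule. The field names (`data.resource`, `dataQualitySpec.rules`, `nonNullExpectation`, `uniquenessExpectation`, `executionSpec.trigger.schedule.cron`) reflect our reading of the Dataplex REST API and should be checked against its reference; the function name and all project, dataset, table, and column values are hypothetical. The API itself can also be enabled from the command line with `gcloud services enable dataplex.googleapis.com`.

```python
import json


def build_data_quality_scan(project: str, location: str, dataset: str,
                            table: str, column: str, scan_id: str,
                            cron: str) -> tuple[str, dict]:
    """Hypothetical helper: return (url, body) for a Dataplex data quality scan.

    Assumes the Dataplex v1 REST shape; sending the request (with an
    authenticated POST) is out of scope for this sketch.
    """
    url = (f"https://dataplex.googleapis.com/v1/projects/{project}"
           f"/locations/{location}/dataScans?dataScanId={scan_id}")
    body = {
        "data": {
            # Full resource name of the monitored BigQuery table
            "resource": (f"//bigquery.googleapis.com/projects/{project}"
                         f"/datasets/{dataset}/tables/{table}"),
        },
        "dataQualitySpec": {
            "rules": [
                # Pre-set rule types: NULL check and uniqueness on one column
                {"column": column, "dimension": "COMPLETENESS",
                 "nonNullExpectation": {}},
                {"column": column, "dimension": "UNIQUENESS",
                 "uniquenessExpectation": {}},
            ],
        },
        # Cron schedule: after creation the scan runs automatically
        "executionSpec": {"trigger": {"schedule": {"cron": cron}}},
    }
    return url, body


url, body = build_data_quality_scan(
    "my-project", "us-central1", "analytics", "users", "user_id",
    "users-dq", "0 6 * * *",
)
print(url)
print(json.dumps(body, indent=2))
```

Because the scan lives in Dataplex rather than in Masthead, a rule defined this way is an ordinary Dataplex resource.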
Which checks to run depends on each business and the metrics that matter to it. Data Quality rules are scheduled SQL queries. There are pre-set rules for aspects such as completeness, accuracy, consistency, validity, uniqueness, and null checks.
However, any metric within any dimension can be monitored through a custom SQL rule.
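Since every rule boils down to a SQL query, the idea can be shown with a small local example. The sketch below uses Python's built-in `sqlite3` as a stand-in for BigQuery (an assumption made only so the example runs anywhere) and expresses a NULL check, a uniqueness check, and a custom string-length rule as queries that count violating rows. The table, columns, and rule names are hypothetical, and this is not how Dataplex encodes its pre-set rules internally.

```python
import sqlite3

# Hypothetical rules: each query counts rows that violate the rule,
# so a result of 0 means the rule passes.
RULES = {
    "email_not_null": "SELECT COUNT(*) FROM users WHERE email IS NULL",
    "id_unique": (
        "SELECT COUNT(*) FROM "
        "(SELECT id FROM users GROUP BY id HAVING COUNT(*) > 1)"
    ),
    # Custom rule: country codes must be exactly two characters long
    "country_len_2": "SELECT COUNT(*) FROM users WHERE LENGTH(country) <> 2",
}


def run_rules(conn: sqlite3.Connection) -> dict[str, bool]:
    """Run every rule and report whether it passed (no violating rows)."""
    return {name: conn.execute(sql).fetchone()[0] == 0
            for name, sql in RULES.items()}


# In-memory sample table standing in for a monitored BigQuery table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@example.com", "US"), (2, None, "DE"), (2, "c@example.com", "FR")],
)
print(run_rules(conn))
```

With this sample data, `email_not_null` and `id_unique` fail (one NULL email, a duplicated id of 2) while `country_len_2` passes.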
Dataplex (with Masthead Data Quality) is a SQL-based solution, whereas Masthead Monitors are based on logs and metadata. Unlike Dataplex, Masthead Monitors provide automated anomaly detection across the entire data platform, identifying anomalies in time-series tables and errors in pipelines in real time. Masthead Monitors also do not increase cloud costs, because they do not run SQL queries.
At Masthead, we believe Dataplex is the optimal solution for Google Cloud users to implement data quality checks. The benefits include:
It's 100x cheaper. You pay only for SQL execution, avoiding the 100x additional cost of purchasing another SQL-first solution on top of the compute costs for the queries it executes.
The safest approach. Data is not exposed to a third-party vendor, which also removes the need to spend time with security and legal teams to onboard a third-party tool.
No additional vendor lock-in.
Can we sample the data?
Yes. To make data quality checks more cost-efficient, an easy-to-use setting in the UI lets you limit the scope of the check and sample the data queried. You can also filter the data as in any SQL query, based on data dimensions.
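In Masthead this is a UI setting, but it helps to see what sampling and filtering amount to on the Dataplex side. The sketch below assumes the Dataplex data quality spec accepts a `samplingPercent` and a `rowFilter` (a SQL `WHERE`-clause expression without the `WHERE` keyword); those field names are our understanding of the REST API and should be verified. The helper name and the `event_date` column are hypothetical.

```python
from typing import Optional


def with_scope(quality_spec: dict, sampling_percent: float,
               row_filter: Optional[str] = None) -> dict:
    """Hypothetical helper: return a copy of a data quality spec limited in scope.

    Assumes Dataplex's `samplingPercent` and `rowFilter` fields.
    """
    if not 0 < sampling_percent <= 100:
        raise ValueError("sampling_percent must be in (0, 100]")
    scoped = dict(quality_spec)  # shallow copy; the input spec is left unchanged
    scoped["samplingPercent"] = sampling_percent
    if row_filter:
        # Filter like any SQL query, e.g. only recent partitions
        scoped["rowFilter"] = row_filter
    return scoped


base_spec = {"rules": [{"column": "user_id", "dimension": "COMPLETENESS",
                        "nonNullExpectation": {}}]}
scoped_spec = with_scope(base_spec, 10.0, "event_date >= '2024-01-01'")
print(scoped_spec)
```

Sampling 10% of the rows and filtering to recent dates both shrink the data each rule scans, which is what lowers the cost of a check.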
How much does it cost to run data quality checks?
Masthead itself does not charge for running data quality checks. Dataplex is priced on a pay-as-you-go basis, and the cost of running a data quality check is based on the Dataplex processing price. Each check costs the equivalent of the data it processes in BigQuery, at the chosen data location and under the pricing plan selected for the particular Google Cloud project. Put simply, customers pay only for the execution of rules.
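Because the cost scales with the data a rule processes, a back-of-the-envelope estimate is simple arithmetic. The sketch below is a simplified model under stated assumptions: cost is treated as strictly proportional to bytes processed, and the price per TiB is a made-up illustrative number, not an actual Dataplex or BigQuery rate (look up the current price for your location and pricing plan).

```python
def estimate_check_cost(table_bytes: int, price_per_tib: float,
                        sampling_percent: float = 100.0) -> float:
    """Simplified estimate: cost proportional to the bytes a check processes."""
    processed_bytes = table_bytes * sampling_percent / 100
    return processed_bytes / 2**40 * price_per_tib


TABLE_BYTES = 2 * 2**40      # a 2 TiB table
HYPOTHETICAL_PRICE = 5.0     # illustrative price per TiB, NOT a real rate

full = estimate_check_cost(TABLE_BYTES, HYPOTHETICAL_PRICE)
sampled = estimate_check_cost(TABLE_BYTES, HYPOTHETICAL_PRICE,
                              sampling_percent=10)
print(f"full scan: {full:.2f}, 10% sample: {sampled:.2f}")
```

Under these assumptions a full scan of the 2 TiB table costs 10 units per run and a 10% sample costs 1, which is why sampling matters for rules scheduled to run often.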
After creating a rule in Masthead, will it appear in Dataplex in Google Cloud?
Yes. Once a rule is created in Masthead, it is available in the Dataplex UI, and vice versa. Rules created in Masthead can be modified or deleted in Dataplex at any time.