
What's Masthead Data?

Masthead is an end-to-end data reliability solution for Google Cloud users that monitors data pipelines and data products in Google BigQuery.

Masthead automates table-health and pipeline-health checks out of the box across the entire warehouse, at scale, without ever reading clients’ data. It processes logs and metadata to detect issues with freshness, volume, schema changes, or pipeline errors. You no longer have to dig through logs to debug: Masthead surfaces and alerts on any malfunction in your Google Cloud project.
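Masthead does not publish its detection logic, but the idea of catching schema changes from metadata alone, without reading any rows, can be illustrated with a minimal sketch. It assumes each table's schema is available as a hypothetical `{column: type}` snapshot taken at two points in time; `diff_schemas` and `SchemaChange` are illustrative names, not a Masthead or Google Cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaChange:
    kind: str        # "added", "removed", or "type_changed"
    column: str
    detail: str = ""


def diff_schemas(old: dict[str, str], new: dict[str, str]) -> list[SchemaChange]:
    """Compare two hypothetical {column_name: type} metadata snapshots."""
    changes: list[SchemaChange] = []
    for col in sorted(new.keys() - old.keys()):
        changes.append(SchemaChange("added", col, new[col]))
    for col in sorted(old.keys() - new.keys()):
        changes.append(SchemaChange("removed", col, old[col]))
    for col in sorted(old.keys() & new.keys()):
        if old[col] != new[col]:
            changes.append(SchemaChange("type_changed", col, f"{old[col]} -> {new[col]}"))
    return changes


# Hypothetical snapshots of the same table before and after a deploy.
old = {"user_id": "INT64", "email": "STRING", "signup_ts": "TIMESTAMP"}
new = {"user_id": "STRING", "signup_ts": "TIMESTAMP", "country": "STRING"}
for change in diff_schemas(old, new):
    print(change.kind, change.column, change.detail)
# added country STRING
# removed email STRING
# type_changed user_id INT64 -> STRING
```

Because only column names and types are compared, the check works from metadata and never needs access to the table's contents.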

The goal is to build a platform that keeps data engineers informed whenever pipelines or data fail.

Masthead data trust framework


In today’s complex data environments, effective metadata management is essential for finding and accessing the data or products teams need. Masthead supports this with a column-level lineage data dictionary for Google BigQuery and Looker, improving data discoverability and accessibility across the organization.

The platform makes it easier to navigate and search large datasets for relevant information, and it speeds up root cause analysis of issues in pipelines or data.
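Once column-level lineage is known, tracing an issue back toward its root cause amounts to a graph traversal. The minimal sketch below assumes lineage is available as a plain mapping from each column to the columns it is derived from; the table and column names are hypothetical, and the breadth-first walk is one reasonable choice, not Masthead's documented method.

```python
from collections import deque

# Hypothetical column-level lineage: each key is a column, and its value lists
# the columns it is directly derived from (its upstream sources).
Lineage = dict[str, list[str]]


def upstream_columns(lineage: Lineage, column: str) -> list[str]:
    """Return every column the given column depends on, nearest first (BFS).

    The `seen` set prevents infinite loops if the lineage graph has a cycle.
    """
    seen: set[str] = {column}
    order: list[str] = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


lineage: Lineage = {
    "looker.revenue_dashboard.total_revenue": ["mart.orders_daily.revenue"],
    "mart.orders_daily.revenue": ["staging.orders.amount", "staging.fx_rates.rate"],
    "staging.orders.amount": ["raw.orders.amount_cents"],
}
print(upstream_columns(lineage, "looker.revenue_dashboard.total_revenue"))
# ['mart.orders_daily.revenue', 'staging.orders.amount',
#  'staging.fx_rates.rate', 'raw.orders.amount_cents']
```

Ordering the results nearest-first means the most likely culprits (direct inputs) are checked before distant raw sources.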

Systems break, and so does data. A system that monitors every time-series table, every pipeline, and every model running in the data environment provides comprehensive visibility across the entire system at scale. Imagine managing 15,000 tables, views, and external tables in Google BigQuery as part of everyday data warehouse scaling.

Using logs, Masthead provides insight into freshness, volume, and schema changes for every table in Google BigQuery and across the Google Cloud project, acting like a cardiogram for the platform.
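As an illustration of how freshness and volume can be judged from log-derived signals alone, here is a minimal sketch. It assumes the last write time and per-load row counts for a table have already been extracted from logs; the function names, the 1.5x grace factor, and the 3-standard-deviation threshold are hypothetical choices, not Masthead's actual thresholds.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, pstdev


def is_stale(last_write: datetime, expected_every: timedelta, now: datetime,
             grace: float = 1.5) -> bool:
    """Return True if the table has gone longer than `grace` times its
    expected update cadence without a write."""
    return now - last_write > expected_every * grace


def is_volume_anomaly(history: list[int], latest: int, threshold: float = 3.0) -> bool:
    """Return True if the latest load's row count is more than `threshold`
    population standard deviations from the mean of earlier loads."""
    if len(history) < 2:
        return False  # too little history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu  # perfectly stable history: any change stands out
    return abs(latest - mu) / sigma > threshold


# Hypothetical values for a table that is expected to load hourly.
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc), timedelta(hours=1), now))  # True
print(is_volume_anomaly([1000, 1020, 980, 1010, 990], latest=120))  # True
```

A fixed z-score is the simplest possible volume model; production systems often account for seasonality (for example, weekday vs weekend loads), which this sketch deliberately ignores.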

Pipelines and models fail for many reasons, including improper setup, retry errors, SQL errors, and permission issues. The challenge lies in identifying when something fails, what failed, and who can fix it. With the multitude of tools data engineers use today for data ingestion, transformation, and distribution, achieving observability across the entire ecosystem has become increasingly difficult.

Masthead provides observability for pipelines and models without requiring integration with any ETL or data modeling tools. Within 15 minutes of deployment, it starts alerting data teams to pipeline or system failures in real time, so teams can confirm that every job runs as expected.
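To show how log data alone can answer "what failed and who can fix it", here is a minimal sketch that groups failed jobs by error reason and by the principal that ran them. The `JobRecord` shape and the example service accounts are hypothetical simplifications of a parsed job log entry; `invalidQuery` and `accessDenied` are real BigQuery error reason codes, used here only as sample values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    # Hypothetical, simplified shape of a job event parsed from logs.
    job_id: str
    principal: str                # who ran the job, i.e. who can likely fix it
    destination_table: str
    error_reason: Optional[str]   # None when the job succeeded


def summarize_failures(jobs: list[JobRecord]) -> dict[tuple[str, str], list[str]]:
    """Group failed job IDs by (error_reason, principal)."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for job in jobs:
        if job.error_reason is not None:
            groups[(job.error_reason, job.principal)].append(job.job_id)
    return dict(groups)


jobs = [
    JobRecord("job_1", "dbt-runner@example.com", "mart.orders_daily", None),
    JobRecord("job_2", "dbt-runner@example.com", "mart.orders_daily", "invalidQuery"),
    JobRecord("job_3", "fivetran@example.com", "raw.orders", "accessDenied"),
    JobRecord("job_4", "dbt-runner@example.com", "mart.users", "invalidQuery"),
]
for (reason, who), ids in summarize_failures(jobs).items():
    print(reason, who, ids)
# invalidQuery dbt-runner@example.com ['job_2', 'job_4']
# accessDenied fivetran@example.com ['job_3']
```

Grouping by principal rather than by tool is what makes this approach integration-free: any ETL or modeling tool shows up in the logs as the identity it runs under.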

Crucial business metrics, such as active users per app version or the string length of hashed credit card numbers, require constant monitoring. Because every business has unique metrics and use cases, you can monitor them by setting custom rules. The challenge is enabling this monitoring without compromising data privacy by exposing sensitive business information to third parties.

Masthead prioritizes client data privacy and security, using Google Cloud Dataplex to create and run rule-based checks without ever accessing client data.
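One way a rule can protect privacy is to return only an aggregate, never row values. The minimal sketch below builds a hypothetical BigQuery query for the hashed-card example: a SHA-256 hex digest is always 64 characters, so any other length signals a problem. The function name, table, and column are illustrative assumptions; a check of this shape could be expressed as a SQL-based rule in a Dataplex data quality scan, but this is not Masthead's or Dataplex's actual rule format.

```python
import re

# Conservative allow-lists so user input cannot inject SQL into the query.
_COLUMN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_TABLE = re.compile(r"[A-Za-z0-9_.-]+")  # e.g. project-id.dataset.table


def fixed_length_rule_sql(table: str, column: str, expected_length: int) -> str:
    """Build an aggregate-only BigQuery query that returns a single violation
    count, never the column values themselves. NULLs are not counted, because
    COUNTIF ignores rows where the condition evaluates to NULL."""
    if not _TABLE.fullmatch(table) or not _COLUMN.fullmatch(column):
        raise ValueError("unexpected characters in table or column name")
    if expected_length <= 0:
        raise ValueError("expected_length must be positive")
    return (
        f"SELECT COUNTIF(LENGTH({column}) != {expected_length}) AS violations\n"
        f"FROM `{table}`"
    )


# Hypothetical table holding SHA-256 hex digests of card numbers.
print(fixed_length_rule_sql("my-project.payments.transactions", "card_hash", 64))
```

Since the query runs in the client's own project and yields a single number, a monitoring service only needs to see that count to raise an alert.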

Managing cloud costs has become a top priority for businesses increasingly reliant on cloud services, yet many still struggle to keep those costs under control. Data teams frequently run experiments that affect both compute and storage costs.

Masthead provides detailed cost analysis for every running pipeline and every stored table. This gives a precise picture of the costs associated with each data product or technology in your Google BigQuery environment, such as dbt, Looker, and Fivetran. As a result, teams can quickly identify and eliminate orphaned processes, immediately reducing compute costs, and accurately attribute the cost of delivering any data product.
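The attribution and orphan-detection ideas can be sketched in a few lines. This assumes per-job bytes billed are already known (in BigQuery they appear as `total_bytes_billed` in `INFORMATION_SCHEMA.JOBS`) and that each job can be tagged with the tool that issued it, for example via job labels; the `JobCost` shape, tool tags, and table names are hypothetical. The on-demand price is passed in as `usd_per_tib` because it depends on region and pricing model, so the value in the example is illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass

TIB = 2 ** 40  # bytes in one tebibyte


@dataclass
class JobCost:
    tool: str               # hypothetical tag, e.g. "dbt", "looker", "fivetran"
    bytes_billed: int
    destination_table: str  # table the job writes to


def cost_by_tool(jobs: list[JobCost], usd_per_tib: float) -> dict[str, float]:
    """Sum on-demand query cost per tool at the given (assumed) price."""
    totals: dict[str, float] = defaultdict(float)
    for job in jobs:
        totals[job.tool] += job.bytes_billed / TIB * usd_per_tib
    return dict(totals)


def orphan_tables(jobs: list[JobCost], tables_read_recently: set[str]) -> set[str]:
    """Tables that pipelines keep writing but nobody has read recently."""
    written = {job.destination_table for job in jobs}
    return written - tables_read_recently


jobs = [
    JobCost("dbt", 2 * TIB, "mart.orders_daily"),
    JobCost("looker", TIB // 2, "mart.orders_daily"),
    JobCost("fivetran", TIB, "raw.orders"),
    JobCost("dbt", 0, "mart.legacy_export"),
]
print(cost_by_tool(jobs, usd_per_tib=6.25))
# {'dbt': 12.5, 'looker': 3.125, 'fivetran': 6.25}
print(orphan_tables(jobs, {"mart.orders_daily", "raw.orders"}))
# {'mart.legacy_export'}
```

This sketch covers query compute only; storage cost per table would need a separate calculation from table storage metadata.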