What is Masthead Data?

Masthead is an end-to-end data reliability solution for Google Cloud users. It monitors data pipelines and data products along the whole path from Cloud SQL through Google BigQuery to Looker.

Why Masthead Data?

Masthead automates table-health and pipeline-health checks out of the box, across the entire warehouse and at scale, without ever reading clients' data. We process logs and metadata to detect issues with freshness, volume, schema changes, and pipeline errors. You no longer have to dig through logs to debug: Masthead surfaces any malfunction in your Google Cloud project and alerts you about it.

Our goal is to build a platform that keeps data engineers on top of things when pipelines or data fail.

Data anomaly detection

Systems break, and so does data. A system that monitors every time-series table, every pipeline, and every model running in your data environment gives you comprehensive visibility across the entire platform at scale. Imagine managing 15,000 tables, views, and external tables in Google BigQuery as part of your warehouse's everyday growth.

Using logs, Masthead provides insight into freshness, volume, and schema changes for every table in Google BigQuery and across your Google Cloud project, acting like a cardiogram for the platform.
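To make the freshness and volume idea concrete, here is a minimal Python sketch that works only on metadata (a last-modified timestamp and a history of row counts) and never touches row contents. It is an illustration, not Masthead's implementation: the `TableMetadata` record, the 24-hour freshness window, and the z-score threshold of 3 are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


@dataclass
class TableMetadata:
    """Hypothetical per-table metadata record; no row contents are read."""
    name: str
    last_modified: datetime
    row_count_history: list[int]  # one count per load, oldest first, latest last


def check_freshness(meta: TableMetadata, now: datetime, max_age: timedelta) -> bool:
    """Return True if the table was updated within max_age of `now`."""
    return now - meta.last_modified <= max_age


def check_volume(meta: TableMetadata, z_threshold: float = 3.0) -> bool:
    """Return True if the latest row count lies within z_threshold sample
    standard deviations of the earlier counts (a simple z-score test)."""
    *history, latest = meta.row_count_history
    if len(history) < 2:
        return True  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest == mu  # perfectly stable history: any change is anomalous
    return abs(latest - mu) / sigma <= z_threshold


now = datetime(2024, 1, 10, 12, tzinfo=timezone.utc)
orders = TableMetadata(
    "sales.orders",
    last_modified=datetime(2024, 1, 8, tzinfo=timezone.utc),
    row_count_history=[1000, 1020, 990, 1010, 120],
)
print(check_freshness(orders, now, timedelta(hours=24)))  # False: last update 2.5 days ago
print(check_volume(orders))  # False: 120 rows is far below the ~1005 average
```

A z-score is only one possible volume test; seasonal tables would need a model that accounts for weekly or daily patterns.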

Pipeline and model observability

Pipelines and models fail for many reasons, such as improper setup, retry errors, SQL errors, and permission issues. The challenge lies in identifying what is failing and when, as well as who can fix it. With the multitude of tools data engineers use today for data ingestion, transformation, and distribution, achieving observability across the entire ecosystem has become increasingly difficult.

At Masthead, we provide observability for pipelines and models without requiring integration with any ETL or data modeling tools. Starting within 15 minutes of deploying Masthead, we alert data teams to pipeline and system failures in real time, helping ensure that every job runs as expected.

Data quality

Some business metrics need constant monitoring, such as active users per app version or the string length of hashed credit card numbers. Every business has its own metrics and use cases, which can be monitored by defining custom rules. The challenge is enabling this monitoring without compromising data privacy by exposing sensitive business information to third parties.

At Masthead, we place the utmost importance on our clients' data privacy and security. That is why we use Google Cloud's Dataplex to create and run rule-based checks without ever accessing client data.

Metadata management & lineage

In today's complex data environments, effective metadata management is essential for finding and accessing the data and data products you need. Masthead supports this with a data dictionary and column-level lineage for Google BigQuery and Looker, improving data discoverability and accessibility across the organization.

The goal is to make it easy to navigate and search large datasets for relevant information, and to speed up root cause analysis when pipelines or data break.
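As a rough illustration of how column-level lineage speeds up root cause analysis, the Python sketch below walks a lineage graph upstream from a broken column and lists every column it depends on, nearest first. It is not Masthead's implementation: the graph shape (each column mapped to the columns it is derived from) and all table and column names are hypothetical.

```python
from collections import deque

# Hypothetical lineage format: column -> list of columns it is derived from.
LineageGraph = dict[str, list[str]]


def upstream_columns(graph: LineageGraph, column: str) -> list[str]:
    """Breadth-first walk from `column` to every column it depends on,
    nearest dependencies first. The `seen` set guards against cycles."""
    seen = {column}
    order: list[str] = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


lineage = {
    "looker.revenue_dashboard.total_revenue": ["analytics.orders.amount"],
    "analytics.orders.amount": ["raw.orders.price", "raw.orders.quantity"],
    "raw.orders.price": ["cloudsql.shop.orders.price"],
}
print(upstream_columns(lineage, "looker.revenue_dashboard.total_revenue"))
# ['analytics.orders.amount', 'raw.orders.price', 'raw.orders.quantity', 'cloudsql.shop.orders.price']
```

Breadth-first order puts the closest suspects first, which is usually where an investigation should start.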

Compute cost management

Cloud cost management has become a top priority for businesses increasingly reliant on cloud services, yet many still struggle to manage their costs effectively. Data teams frequently run experiments that impact both compute and storage costs.

At Masthead, we provide detailed cost analyses for every running pipeline, job, or model. This gives a precise picture of the cost of each solution running in your Google BigQuery environment, including dbt, Fivetran, and BigQuery Data Transfer Service. As a result, teams can quickly find and eliminate orphaned processes, immediately reducing compute costs, and accurately attribute the cost of delivering each data product.
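The cost-attribution idea can be sketched in a few lines of Python: sum billed bytes per pipeline, convert them to money, and flag pipelines whose output tables nobody reads. This is an assumption-laden illustration, not Masthead's method. The `JobRecord` shape, the pipeline names, the "no downstream readers" definition of an orphan, and the $5 per TiB rate are hypothetical; the rate is an input parameter, not a quoted price.

```python
from collections import defaultdict
from dataclasses import dataclass

TIB = 2 ** 40  # bytes in one tebibyte


@dataclass
class JobRecord:
    """Hypothetical job-log entry: which pipeline ran it, what it wrote, bytes billed."""
    pipeline: str  # e.g. a dbt model, Fivetran connector, or transfer config
    destination_table: str
    bytes_billed: int


def cost_by_pipeline(jobs: list[JobRecord], usd_per_tib: float) -> dict[str, float]:
    """Sum on-demand compute cost per pipeline; the price per TiB is an input."""
    totals: dict[str, float] = defaultdict(float)
    for job in jobs:
        totals[job.pipeline] += job.bytes_billed / TIB * usd_per_tib
    return dict(totals)


def orphan_pipelines(jobs: list[JobRecord], tables_read: set[str]) -> set[str]:
    """Pipelines none of whose destination tables appear in `tables_read`."""
    outputs: dict[str, set[str]] = defaultdict(set)
    for job in jobs:
        outputs[job.pipeline].add(job.destination_table)
    return {p for p, tables in outputs.items() if not tables & tables_read}


jobs = [
    JobRecord("dbt:orders_model", "analytics.orders", 2 * TIB),
    JobRecord("dbt:orders_model", "analytics.orders", 1 * TIB),
    JobRecord("fivetran:legacy_crm", "raw.crm_contacts", TIB // 2),
]
print(cost_by_pipeline(jobs, usd_per_tib=5.0))
# {'dbt:orders_model': 15.0, 'fivetran:legacy_crm': 2.5}
print(orphan_pipelines(jobs, tables_read={"analytics.orders"}))
# {'fivetran:legacy_crm'}
```

Here `fivetran:legacy_crm` costs money but feeds nothing downstream, which makes it a candidate for removal.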
