Pipeline and Model Observability

As the number of data pipelines in an organization grows, so does the complexity of creating, managing, deploying, and operating them. Diagnosing, debugging, rolling back, or even identifying the owner of a pipeline likewise becomes harder when issues arise in production. In addition, complex SQL workflows for data transformation in BigQuery are prone to many different types of errors.

Masthead monitors every job executed in connected GCP projects and provides real-time alerts for any pipeline or model error. Each alert includes a detailed description of the issue, along with other relevant information about the failed job, to aid quick troubleshooting.
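This monitoring is log-based rather than agent-based. The sketch below shows one way failed BigQuery jobs can be surfaced from Cloud Audit Logs using the google-cloud-logging Python client; the project ID is a placeholder, and the snippet illustrates the kind of log signals involved, not Masthead's actual implementation.

```python
# Illustrative sketch of log-based job monitoring, not Masthead's implementation.
# Assumes the google-cloud-logging client library; "my-gcp-project" is a placeholder.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="my-gcp-project")

# BigQuery emits a job-completed audit log entry for every job; failed jobs
# log at ERROR severity with the error attached to jobStatus.
log_filter = (
    'resource.type="bigquery_resource" '
    'AND protoPayload.methodName="jobservice.jobcompleted" '
    'AND severity>=ERROR'
)

for entry in client.list_entries(filter_=log_filter, max_results=20):
    job_status = (
        entry.payload.get("serviceData", {})
        .get("jobCompletedEvent", {})
        .get("job", {})
        .get("jobStatus", {})
    )
    print(entry.timestamp, job_status.get("error", {}).get("message"))
```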

Error examples include infrastructure-related issues, such as access denials during pipeline processing or exceeded deadlines while globbing file patterns, and data-related issues, such as missing staging files in Google Cloud Storage. Common data transformation errors range from 'unrecognized file formats' and 'missing required files' to 'syntax errors'.

These and many more types of errors can break your continuous data import and transformation.
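For any single failed job, the same error details are exposed by the BigQuery API itself. A minimal sketch, assuming the google-cloud-bigquery Python client and placeholder project, job ID, and location:

```python
# Minimal sketch: reading the error details of one failed BigQuery job.
# "my-gcp-project", the job ID, and the location are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
job = client.get_job("job_abc123", location="US")

if job.error_result:
    # error_result carries a reason code such as "accessDenied", "notFound",
    # or "invalid" (e.g. syntax or file-format errors), plus a readable message.
    print(job.job_id, job.error_result["reason"], job.error_result["message"])
```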

In the Incident tab, you will find:

  1. The name of the destination table or project where the incident occurred.

  2. The project location of the job.

  3. The service account that executed the job.

  4. The key.

  5. Details of the error.

  6. The time of the error and the source code of the pipeline or model, which you can view by clicking the incident card (see the example record below).
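Below is a hypothetical incident record illustrating these fields; the field names and values are placeholders, not Masthead's actual alert schema.

```python
# Hypothetical incident record mirroring the fields above; all field names
# and values are illustrative placeholders, not Masthead's actual schema.
incident = {
    "destination": "my-gcp-project.analytics.daily_revenue",  # table or project
    "location": "US",                                         # job location
    "service_account": "etl-runner@my-gcp-project.iam.gserviceaccount.com",
    "key": "job_abc123",
    "error_details": "Access Denied: Table analytics.daily_revenue ...",
    "error_time": "2024-05-01T06:15:42Z",
    "source_code": "INSERT INTO analytics.daily_revenue SELECT ...",
}
```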

Commonly asked questions

Does the user need to implement any sort of SDK to receive alerts about these types of anomalies?

No. Masthead notifies users about these types of errors by monitoring logs.

How quickly are pipeline errors reported?

Masthead notifies users about pipeline and data model errors within seconds after their occurrence.
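For context, Cloud Logging can route matching log entries to Pub/Sub in near real time, which is what makes second-level latency achievable for log-based alerting. Below is a hedged sketch of consuming such entries, assuming a pre-configured log sink and placeholder names; it illustrates the general mechanism, not Masthead's internals.

```python
# Sketch of near-real-time consumption of exported log entries. Assumes a
# pre-configured Cloud Logging sink that routes BigQuery error entries to a
# Pub/Sub topic; the project and subscription names are placeholders.
import json
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-gcp-project", "bq-error-logs-sub")

def on_message(message: pubsub_v1.subscriber.message.Message) -> None:
    entry = json.loads(message.data)  # the exported LogEntry as JSON
    print(entry.get("timestamp"), entry.get("severity"))
    message.ack()

streaming_pull = subscriber.subscribe(subscription, callback=on_message)
with subscriber:
    try:
        streaming_pull.result(timeout=30)  # listen for 30 seconds, then stop
    except TimeoutError:
        streaming_pull.cancel()
```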

Does Masthead catch errors produced by dbt, Fivetran, and Python scripts?

Masthead alerts users about pipeline errors regardless of the solution used; it does not require direct integration with any of your data stack solutions.
