Data Anomaly Detection

This feature uses ML, together with the logs and metadata of your Google BigQuery (GBQ) project, to detect anomalies or outliers in time-series data.

What data do you query?

None. Masthead neither requests permission to access clients' data nor reads or edits it.

How long does it take to deliver the first insights (catch first anomalies)?

Up to 6 hours, depending on the number of tables in GBQ. During deployment, Masthead parses retrospective logs to learn the update patterns of every time-series table in GBQ.

What metrics are included in the data anomaly detection?

Freshness – how recently a table was updated. Masthead infers the update frequency of each table from GCP logs automatically.

Volume – volume of data received per insert and per aggregate step.

Schema changes – any schema changes that affected the tables.

Do you need to enable it manually?

No. Masthead parses retrospective logs and detects time-series tables automatically. It does not need any input from users.

Anomaly detection. Freshness

Our automated freshness monitoring system tracks how often a table is updated and notifies you when the latest update becomes outdated by creating an incident in the Masthead UI and sending an alert to the Masthead Slack channel.

Masthead analyzes one month's worth of retrospective logs to inform its ML model, ensuring that the freshness metric is applied to all time series tables within 5-6 hours of Masthead's deployment.

Freshness anomalies are detected in real-time. As Masthead examines log patterns, any deviation from the expected data ingestion schedule immediately triggers an alert about the anomaly.
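The idea of an expected ingestion schedule can be illustrated with a minimal sketch. It is not Masthead's actual ML model: it assumes the update timestamps of a table have already been extracted from logs, takes the median gap between updates as the expected cadence, and flags the table once the time since the last update exceeds that cadence by a chosen tolerance. The names expected_interval, is_stale, and tolerance are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def expected_interval(update_times):
    """Return the median gap between consecutive updates as a timedelta."""
    ordered = sorted(update_times)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two update events to infer a cadence")
    return timedelta(seconds=median(gaps))


def is_stale(update_times, now, tolerance=1.5):
    """True when the time since the last update exceeds tolerance x cadence."""
    overdue = now - max(update_times)
    return overdue > tolerance * expected_interval(update_times)


# Hypothetical table that was updated hourly; the last update is at 23:00.
start = datetime(2024, 1, 1)
hourly = [start + timedelta(hours=h) for h in range(24)]
print(is_stale(hourly, datetime(2024, 1, 2, 0, 10)))  # 70 minutes late: False
print(is_stale(hourly, datetime(2024, 1, 2, 2, 0)))   # 3 hours late: True
```

A median is used here because it is not skewed by a single unusually long gap in the history; a production model would also need to handle weekly or irregular schedules.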

In the incident tab, you will find:

  1. The name of the table where the incident occurred.

  2. Location of the table: its project and dataset.

  3. A graph indicating when the table failed to update, marked in red. Each blue bar represents an update event for the table.

  4. Below each incident, there is a display showing the duration of the missed updates and the expected update frequency based on past patterns.

Anomaly detection. Volume

Our automated volume anomaly detection examines variations in the number of rows within a table and provides real-time alerts for unexpected data volume changes. This includes significant additions or deletions of data or any unusual patterns in row changes.

Masthead analyzes one month's worth of retrospective logs to inform its ML model, ensuring that the volume metric is applied to all time-series tables within 5-6 hours of Masthead's deployment.

Volume anomalies are detected in real-time. When Masthead analyzes log patterns, any deviation from the expected data range during ingestion instantly triggers an anomaly alert, both in the Masthead interface and on Slack.

In the example above, you can see a table consistently adding rows and a sudden drop below the expected range during two ingestions.
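A deviation from the expected range can be sketched as follows. This is an illustration under stated assumptions, not Masthead's model: it assumes the number of rows added per ingestion is already known from logs and treats the mean plus or minus k standard deviations of past ingestions as the expected range. The names expected_range, is_volume_anomaly, and k are hypothetical.

```python
from statistics import mean, stdev


def expected_range(row_counts, k=3.0):
    """Return (low, high) as mean +/- k sample standard deviations."""
    center = mean(row_counts)
    spread = stdev(row_counts)  # requires at least two observations
    return center - k * spread, center + k * spread


def is_volume_anomaly(history, new_count, k=3.0):
    """True when new_count falls outside the range learned from history."""
    low, high = expected_range(history, k)
    return not (low <= new_count <= high)


# Hypothetical rows added per ingestion over the look-back period.
history = [1000, 1020, 980, 1010, 990, 1005, 995]
print(is_volume_anomaly(history, 1012))  # within the usual range: False
print(is_volume_anomaly(history, 400))   # sudden drop: True
```

A fixed band like this ignores trend and seasonality, so a table that grows steadily would need a model that accounts for that growth.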

In the incident tab, you will find:

  1. The name of the table where the incident occurred.

  2. Location of the table: its project and dataset.

  3. A graph indicating when the table failed to update, marked in red. Each blue bar represents an update event for the table.

  4. Below each incident, there is a display showing the duration of the missed updates and the expected update frequency based on past patterns.

Anomaly detection. Schema changes

Our automated schema monitoring system detects any alteration in the structure of a table or view, such as added or removed columns, changed column data types, or a mismatched column count during insertion.

In the example above, you can see that the pipeline was ingesting 8 columns in a table, whereas it was supposed to have 9.
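The kinds of changes listed above can be expressed as a simple comparison of two schemas. The sketch below is illustrative only: it assumes the expected and observed schemas have already been reconstructed as mappings of column name to data type (how Masthead derives them from logs is not covered here), and the function name diff_schema and the column names are hypothetical.

```python
def diff_schema(expected, observed):
    """Compare two {column_name: data_type} mappings and report differences."""
    expected_cols, observed_cols = set(expected), set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "retyped": sorted(
            name
            for name in expected_cols & observed_cols
            if expected[name] != observed[name]
        ),
    }


# Hypothetical table expected to have 9 columns while the pipeline sends 8.
expected = {f"col_{i}": "STRING" for i in range(1, 10)}
observed = {name: dtype for name, dtype in expected.items() if name != "col_9"}
print(len(expected), len(observed))      # 9 8
print(diff_schema(expected, observed))   # "removed" lists col_9
```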

In the incident tab, you will find:

  1. The name of the table where the incident occurred.

  2. Location of the table: its project and dataset.

  3. The service account that made the change.

  4. Details of the change.

  5. Time of the error and source code of the change.

Commonly asked questions

What is the look-back period during onboarding?

We use 4 weeks of retrospective logs available in the audit log.

Do you access the schema table?

No. Masthead collects the necessary data points from logs only.

Which tables can be monitored in the data warehouse using Masthead's anomaly detection?

All tables that were updated on a regular cadence during the month before Masthead's deployment are monitored automatically.
