Data Anomaly Detection
This functionality helps to detect anomalies or outliers in time series data by accessing logs and metadata of Google BigQuery and utilizing machine learning.
None. Masthead does not request permissions, nor does it read or edit clients' data.
Up to 6 hours, depending on the number of tables in BigQuery. During deployment, Masthead parses retrospective logs to learn the update patterns of every time-series table within BigQuery.
Freshness – the recency of a table update. Masthead automatically identifies the frequency of each table update by using GCP logs.
Volume – the volume of data received per insert and per aggregation step.
Data Quality – custom data property changes. Masthead analyzes the results of regularly scheduled data scans to detect anomalies.
No. For Freshness and Volume, Masthead parses retrospective logs and detects time series automatically. Only Data Quality anomaly detection requires the configuration of custom data scans.
Our automated freshness monitoring system tracks the frequency of updates on a table and notifies you when an expected update is missed by creating an incident in the Masthead UI and sending an alert to the Slack channel.
Masthead analyzes one month's worth of retrospective logs to inform its ML model, ensuring that the freshness metric is applied to all time series tables within 5-6 hours of Masthead's deployment.
Freshness anomalies are detected in real-time. As Masthead examines log patterns, any deviation from the expected data ingestion schedule immediately triggers an alert about the anomaly.
In the Incident tab, you will find:
The name of the table where the incident occurred.
The location of the table: its project and dataset.
A graph indicating when the table failed to update, marked in red. Each blue bar represents an update event for the table.
Below each incident, there is a display showing the duration of the missed updates and the expected update frequency based on past patterns.
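Masthead's actual model is not public, but the underlying idea of frequency-based freshness detection can be illustrated with a conceptual BigQuery SQL sketch. The `update_log` table and its columns below are hypothetical placeholders, not Masthead's schema; the 2x-median threshold is likewise an illustrative assumption.

```sql
-- Conceptual sketch only: flag a table as stale when the time since its
-- last update exceeds twice its median update interval.
-- `my_project.monitoring.update_log` and its columns are hypothetical.
WITH intervals AS (
  SELECT
    table_name,
    TIMESTAMP_DIFF(
      update_time,
      LAG(update_time) OVER (PARTITION BY table_name ORDER BY update_time),
      MINUTE
    ) AS minutes_between_updates
  FROM `my_project.monitoring.update_log`
),
expected AS (
  SELECT
    table_name,
    -- APPROX_QUANTILES(x, 2)[OFFSET(1)] is the median interval
    APPROX_QUANTILES(minutes_between_updates, 2)[OFFSET(1)] AS median_interval
  FROM intervals
  WHERE minutes_between_updates IS NOT NULL
  GROUP BY table_name
)
SELECT
  l.table_name,
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(l.update_time), MINUTE)
    AS minutes_since_update,
  e.median_interval
FROM `my_project.monitoring.update_log` AS l
JOIN expected AS e USING (table_name)
GROUP BY l.table_name, e.median_interval
HAVING minutes_since_update > 2 * e.median_interval;
```

A production system would additionally account for weekday/weekend seasonality and irregular schedules, which is where the ML model goes beyond a fixed threshold like this.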
Our automated volume anomaly detection examines variations in the number of rows within a table and provides real-time alerts for unexpected data volume changes. This includes significant additions or deletions of data or any unusual patterns in row changes.
Masthead analyzes one month's worth of retrospective logs to inform its ML model, ensuring that the volume metric is applied to all time series tables within 5-6 hours of Masthead's deployment.
Volume anomalies are detected in real-time. When Masthead analyzes log patterns, any deviation from the expected data range during ingestion instantly triggers an anomaly alert, both in the Masthead interface and on Slack.
In the example above, you can see a table consistently adding rows and a sudden drop below the expected range during two ingestions.
In the Incident tab, you will find:
The name of the table where the incident occurred.
The location of the table: its project and dataset.
A graph indicating when the table failed to update, marked in red. Each blue bar represents an update event for the table.
Below each incident, there is a display showing the duration of the missed updates and the expected update frequency based on past patterns.
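The idea of range-based volume detection can also be sketched in BigQuery SQL. This is not Masthead's implementation; the `insert_log` table, its columns, and the three-standard-deviation threshold are all illustrative assumptions.

```sql
-- Conceptual sketch only: flag ingestions whose inserted row count falls
-- more than 3 standard deviations from that table's historical average.
-- `my_project.monitoring.insert_log` and its columns are hypothetical.
WITH stats AS (
  SELECT
    table_name,
    AVG(rows_inserted)    AS avg_rows,
    STDDEV(rows_inserted) AS stddev_rows
  FROM `my_project.monitoring.insert_log`
  GROUP BY table_name
)
SELECT
  l.table_name,
  l.insert_time,
  l.rows_inserted
FROM `my_project.monitoring.insert_log` AS l
JOIN stats AS s USING (table_name)
WHERE s.stddev_rows > 0
  AND ABS(l.rows_inserted - s.avg_rows) > 3 * s.stddev_rows;
```

In practice an ML model learns a per-table expected range rather than a single global threshold, so both sudden drops and unusual spikes are caught relative to each table's own pattern.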
This section outlines the requirements for integrating custom data quality scans. By leveraging custom data quality rules, you can implement specific validation queries tailored to your business needs. The results of these checks are stored in dedicated tables within your project, providing a clear record of data quality status over time.
Quality Checks Results Storage
When deploying custom data quality scans, the results must be stored in a dedicated table for each table being checked. This structure facilitates organized tracking and analysis of quality metrics.
Table Naming: You can choose an arbitrary name for the quality check table.
Required Label: It's crucial to apply the following label to the data quality tables:
Location: when you have data stored in multiple cloud regions, create a table for each region. Masthead will read the tables across all your regions.
Schema: The schema is defined by you, but the following fields are required:
table reference (REQUIRED) – The full identifier of the table being checked, e.g. project_name.dataset_name.table_name.
scan name (OPTIONAL) – The name of the scan run. Not required if there is only one scan per table.
rule name (REQUIRED) – The name of the quality rule.
value (REQUIRED) – The metric value, for example a count of events.
timestamp (REQUIRED) – The check execution time.
Example scan query:
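As a hypothetical illustration only, a scan query could count NULL customer IDs and write one row matching the field layout described above. Every table name, column name, and value below is a placeholder; your actual schema and rules are defined by you.

```sql
-- Hypothetical example: a quality rule counting NULL customer IDs and
-- appending the result to a quality check table. All names are placeholders.
INSERT INTO `my_project.data_quality.orders_checks`
  (table_reference, scan_name, rule_name, value, timestamp)
SELECT
  'my_project.sales.orders' AS table_reference,
  'daily_orders_scan'       AS scan_name,
  'null_customer_id_count'  AS rule_name,
  COUNT(*)                  AS value,
  CURRENT_TIMESTAMP()       AS timestamp
FROM `my_project.sales.orders`
WHERE customer_id IS NULL;
```

Scheduling such a query (for example via BigQuery scheduled queries or your orchestrator) on any cadence is sufficient; Masthead reads the results table on its own sync schedule.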
Masthead Access Permissions
Service Account: The service account principal used for this integration is:
masthead-quality-checks@masthead-prod.iam.gserviceaccount.com
Required Roles: The service account requires the following access permissions (guide) to read data from the table:
BigQuery Data Viewer (roles/bigquery.dataViewer)
Scans and Data Synchronization Schedule
Data quality checks can be executed on any schedule on your side. Masthead will determine the optimal frequency to sync the data scan results and run analysis.
Integration Steps
Implement Data Quality Rules: orchestrate the quality check rules and save their results into the results table for Masthead.
Share the scan results schema mapping with Masthead.
Monitoring & Alerting: Masthead will analyze the scan results, identify thresholds and triggers for the incidents, and send notifications via your notification service integration.
What is the look-back period during onboarding?
Masthead uses 4 weeks of retrospective logs available in the audit log.
Do you access the table schema?
No, to collect the necessary data points, Masthead uses only logs.
Which tables can be monitored in the data warehouse using Masthead's anomaly detection?
All tables that were updated on a regular cadence during the month prior to Masthead's deployment will be monitored automatically.