Insights
insights
table
insights
tableThe core insights data table that provides actionable recommendations for cost optimization and data management.
Description
An alpha version of the core insights data table for Masthead's customers. This table consolidates recommendations for cost optimization by identifying unused and dead-end tables, along with potential actions to reduce storage costs and improve data management efficiency.
Schema
category
STRING
Category of the insight (currently only 'Cost'; other categories may be added in the future)
type
STRING
Type of insight; valid values include 'Dead end', 'Unused', etc.
subtype
STRING
Specific subtype of the insight (e.g., 'Dead end table', 'Unused table')
tenant
STRING
Account/tenant identifier
project_id
STRING
Google Cloud project ID
target_resource
STRING
Name of the target resource
last_updated_time
TIMESTAMP
When this insight was last updated
Insight Types
Cost Category
Dead End
Leaf dead end table
- Tables that are not used by any downstream processes and are at the end of the data lineageDead end table
- Tables that have no downstream dependencies and may be candidates for removalDead End pipelines
- Pipelines that do not produce any useful output and are not used by any downstream processes. (Not yet implemented)
Unused Tables
Unused table
- Tables that haven't been accessed or queried within the configured time window
Switching Storage Classes
Storage billing model
- Datasets that can be moved to a more cost-effective storage billing model based on their update patterns and data compression. (Not yet implemented)
Compute Model Adjustments
Compute Model per pipeline
- Recommendations for adjusting compute model assignment based on resources usage. (Not yet implemented)
Overview object
The overview
field contains detailed information about each insight:
{
"resource_type": "TABLE|VIEW",
"resource_name": "project.dataset.table_name",
"num_bytes": 1234567890,
"potential_savings": null
}
Operations object
The operations
field contains recommended actions:
[
{
"action": "remove",
"resource_type": "TABLE|VIEW",
"resource_name": "project.dataset.table_name"
}
]
Table usage examples
Find dead end tables
-- Get all dead end tables with their details
SELECT
subtype,
project_id,
target_resource,
INT64(overview.num_bytes) AS num_bytes,
STRING(overview.resource_type) AS resource_type,
last_updated_time
FROM `masthead-prod`.account_name.insights
WHERE category = 'Cost'
AND type = 'Dead end'
ORDER BY num_bytes DESC;
Find unused tables
-- Get all unused tables sorted by size
SELECT
project_id,
target_resource,
INT64(overview.num_bytes) AS num_bytes,
STRING(overview.resource_type) AS resource_type,
last_updated_time
FROM `masthead-prod`.account_name.insights
WHERE category = 'Cost'
AND type = 'Unused'
ORDER BY num_bytes DESC;
Calculate potential storage savings
-- Calculate total storage used by dead end and unused tables
SELECT
type,
COUNT(*) AS resource_count,
SUM(SAFE.INT64(overview.num_bytes)) AS total_bytes,
ROUND(SUM(SAFE.INT64(overview.num_bytes)) / POW(1024, 3), 2) AS total_gb
FROM `masthead-prod`.account_name.insights
WHERE category = 'Cost'
AND subtype IN ('Dead end table', 'Unused table')
AND overview.num_bytes IS NOT NULL
GROUP BY type
ORDER BY total_bytes DESC;
Get insights by project
-- View insights grouped by project
SELECT
project_id,
type,
COUNT(*) AS insight_count,
SUM(SAFE.INT64(overview.num_bytes)) AS total_bytes
FROM `masthead-prod`.account_name.insights
WHERE category = 'Cost'
AND subtype IN ('Dead end table', 'Unused table')
AND overview.num_bytes IS NOT NULL
GROUP BY project_id, type
ORDER BY total_bytes DESC;
Recent insights
-- Get insights updated in the last 7 days
SELECT
category,
type,
subtype,
project_id,
target_resource,
last_updated_time
FROM `masthead-prod`.account_name.insights
WHERE DATE(last_updated_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY last_updated_time DESC;
Limitations
Alpha Version: This is an early alpha version and the schema may change as we gather feedback.
Data Freshness: Insights are updated daily based on usage patterns and lineage analysis.
Data Retention: Insights are based on activity within the account's configured lookback window (30 days by default).
False Positives: Some tables marked as "unused" may be accessed through external tools not visible to Masthead's monitoring.
Best Practices
Review Before Action: Always manually verify insights before taking action on production data.
Test Removals: Consider moving tables to a staging area before permanent deletion.
Check Dependencies: Cross-reference with the lineage table to understand data dependencies.
Monitor Trends: Regular review of insights can help identify data management patterns and opportunities.
Coordinate with Teams: Ensure data owners are aware before removing resources they may depend on.
Last updated