Insights
insights table
The core insights data table that provides actionable recommendations for cost optimization and data management.
Description
The core insights data table for Masthead’s customers consolidates recommendations for cost optimization by identifying unused and dead-end tables, along with potential actions to reduce storage costs and improve data management efficiency.
Schema
| Column name | Data type | Value |
|---|---|---|
category | STRING | Category of the insight (currently only ‘Cost’; other categories may be added in the future) |
type | STRING | Type of insight; valid values include ‘Dead end’, ‘Unused’, etc. |
subtype | STRING | Specific subtype of the insight (e.g., ‘Dead end table’, ‘Unused table’) |
tenant | STRING | Account/tenant identifier |
project_id | STRING | Google Cloud project ID |
target_resource | STRING | Name of the target resource |
last_updated_time | TIMESTAMP | When this insight was last updated |
overview | JSON | JSON object with insight details (see details per insight type) |
operations | JSON | Array of recommended actions to address the insight (see details per insight type) |
Insight Types
Cost Category
Dead End
Leaf dead end table- Tables that are not used by any downstream processes and are at the end of the data lineageDead end table- Tables that have no downstream dependencies and may be candidates for removal
JSON columns schema
Overview:
{ "resource_type": "TABLE|VIEW", "resource_name": "project.dataset.table", "num_bytes": 123456, "creation_time": "2025-01-01T00:00:00", "last_modified_time": "2025-01-15T00:00:00", "cost_30d": 10.50, "savings_30d": 10.50}Operations:
[ { "action": "drop", "resource_type": "TABLE|VIEW", "resource_name": "project.dataset.table" }]Dead end pipeline- Pipelines that do not produce any useful output and are not used by any downstream processes. (Not yet implemented)
Unused Tables
Unused table- Tables that haven’t been accessed or queried within the configured time window
JSON columns schema
Overview:
{ "resource_type": "TABLE|VIEW", "resource_name": "project.dataset.table", "num_bytes": 123456, "creation_time": "2025-01-01T00:00:00", "last_modified_time": "2025-01-15T00:00:00", "cost_30d": 10.50, "savings_30d": 10.50}Operations:
[ { "action": "drop", "resource_type": "TABLE|VIEW", "resource_name": "project.dataset.table" }]Storage Billing Model
Storage billing model- Datasets that can be moved to a more cost-effective storage billing model based on their update patterns and data compression.
JSON columns schema
Overview:
{ "resource_type": "dataset", "resource_name": "project.dataset", "storage_billing_model": "LOGICAL", "location": "us-central1", "cost_30d": 100.00, "savings_30d": 20.00}Operations:
[ { "action": "alter", "resource_type": "dataset", "resource_name": "project.dataset", "recommended_storage_billing_model": "PHYSICAL" }]Compute Model
Compute model per pipeline- Recommendations for adjusting compute model assignment based on resources usage. (Not yet implemented)
Usage Examples
Find Dead End Tables
-- Get all dead end tables with their detailsSELECT subtype, project_id, target_resource, INT64(overview.num_bytes) AS num_bytes, STRING(overview.resource_type) AS resource_type, last_updated_timeFROM `masthead-prod`.DATASET_NAME.insightsWHERE category = 'Cost' AND type = 'Dead end'ORDER BY num_bytes DESC;-- Get all dead end tables with their detailsSELECT subtype, project_id, target_resource, INT64(overview.num_bytes) AS num_bytes, STRING(overview.resource_type) AS resource_type, last_updated_timeFROM `masthead-prod`.DATASET_NAME.insightsWHERE category = 'Cost' AND type = 'Dead end'ORDER BY num_bytes DESC;Find Unused Tables
-- Get all unused tables sorted by sizeSELECT project_id, target_resource, INT64(overview.num_bytes) AS num_bytes, STRING(overview.resource_type) AS resource_type, last_updated_timeFROM `masthead-prod`.DATASET_NAME.insightsWHERE category = 'Cost' AND type = 'Unused'ORDER BY num_bytes DESC;Calculate Potential Storage Savings
-- Calculate total storage used by dead end and unused tablesSELECT type, COUNT(*) AS resource_count, SUM(SAFE.INT64(overview.num_bytes)) AS total_bytes, ROUND(SUM(SAFE.INT64(overview.num_bytes)) / POW(1024, 3), 2) AS total_gb, ROUND(SUM(overview.cost_30d), 2) AS monthly_cost_usd, ROUND(SUM(overview.savings_30d), 2) AS monthly_savings_usdFROM `masthead-prod`.DATASET_NAME.insightsWHERE category = 'Cost' AND subtype IN ('Dead end table', 'Unused table') AND overview.num_bytes IS NOT NULLGROUP BY typeORDER BY total_bytes DESC;Get Insights by Project
-- View insights grouped by projectSELECT project_id, type, COUNT(*) AS insight_count, SUM(SAFE.INT64(overview.num_bytes)) AS total_bytesFROM `masthead-prod`.DATASET_NAME.insightsWHERE category = 'Cost' AND subtype IN ('Dead end table', 'Unused table') AND overview.num_bytes IS NOT NULLGROUP BY project_id, typeORDER BY total_bytes DESC;Storage Billing Model Recommendations
-- Get a DDL statement to alter the storage billing model for recommended datasetsSELECT *, CONCAT('ALTER SCHEMA `', SAFE.STRING(operations[0].resource_name), '` SET OPTIONS(storage_billing_model = "', SAFE.STRING(operations[0].recommended_storage_billing_model),'");') AS ddlFROM `masthead-prod`.DATASET_NAME.insightsWHERE type = 'Storage billing model'ORDER BY FLOAT64(overview.savings_30d) DESCLimitations
- Preview Version: This is an early preview version and the schema may change as we gather feedback.
- Data Freshness: Insights are updated daily based on usage patterns and lineage analysis.
- Data Retention: Insights include the most recent data, based on the activity within the account’s configured lookback window (30 days by default).
- False Positives: see details in Dead-end and Unused Tables section.
Best Practices
- Review Before Action: Always manually verify insights before taking action on production data.
- Test Removals: Consider moving tables to a staging area before permanent deletion.
- Check Dependencies: Cross-reference with the lineage table to understand data dependencies.
- Monitor Trends: Regular review of insights can help identify data management patterns and opportunities.
- Coordinate with Teams: Ensure data owners are aware before removing resources they may depend on.