Skip to content

Insights

insights table

The core insights data table that provides actionable recommendations for cost optimization and data management.

Description

The core insights data table for Masthead’s customers consolidates recommendations for cost optimization by identifying unused and dead-end tables, along with potential actions to reduce storage costs and improve data management efficiency.

Schema

Column nameData typeValue
categorySTRINGCategory of the insight (currently only ‘Cost’; other categories may be added in the future)
typeSTRINGType of insight; valid values include ‘Dead end’, ‘Unused’, etc.
subtypeSTRINGSpecific subtype of the insight (e.g., ‘Dead end table’, ‘Unused table’)
tenantSTRINGAccount/tenant identifier
project_idSTRINGGoogle Cloud project ID
target_resourceSTRINGName of the target resource
last_updated_timeTIMESTAMPWhen this insight was last updated
overviewJSONJSON object with insight details (see details per insight type)
operationsJSONArray of recommended actions to address the insight (see details per insight type)

Insight Types

Cost Category

Dead End
  • Leaf dead end table - Tables that are not used by any downstream processes and are at the end of the data lineage
  • Dead end table - Tables that have no downstream dependencies and may be candidates for removal
JSON columns schema

Overview:

{
"resource_type": "TABLE|VIEW",
"resource_name": "project.dataset.table",
"num_bytes": 123456,
"creation_time": "2025-01-01T00:00:00",
"last_modified_time": "2025-01-15T00:00:00",
"cost_30d": 10.50,
"savings_30d": 10.50
}

Operations:

[
{
"action": "drop",
"resource_type": "TABLE|VIEW",
"resource_name": "project.dataset.table"
}
]
  • Dead end pipeline - Pipelines that do not produce any useful output and are not used by any downstream processes. (Not yet implemented)
Unused Tables
  • Unused table - Tables that haven’t been accessed or queried within the configured time window
JSON columns schema

Overview:

{
"resource_type": "TABLE|VIEW",
"resource_name": "project.dataset.table",
"num_bytes": 123456,
"creation_time": "2025-01-01T00:00:00",
"last_modified_time": "2025-01-15T00:00:00",
"cost_30d": 10.50,
"savings_30d": 10.50
}

Operations:

[
{
"action": "drop",
"resource_type": "TABLE|VIEW",
"resource_name": "project.dataset.table"
}
]
Storage Billing Model
  • Storage billing model - Datasets that can be moved to a more cost-effective storage billing model based on their update patterns and data compression.
JSON columns schema

Overview:

{
"resource_type": "dataset",
"resource_name": "project.dataset",
"storage_billing_model": "LOGICAL",
"location": "us-central1",
"cost_30d": 100.00,
"savings_30d": 20.00
}

Operations:

[
{
"action": "alter",
"resource_type": "dataset",
"resource_name": "project.dataset",
"recommended_storage_billing_model": "PHYSICAL"
}
]
Compute Model
  • Compute model per pipeline - Recommendations for adjusting compute model assignment based on resources usage. (Not yet implemented)

Usage Examples

Find Dead End Tables

-- Get all dead end tables with their details
SELECT
subtype,
project_id,
target_resource,
INT64(overview.num_bytes) AS num_bytes,
STRING(overview.resource_type) AS resource_type,
last_updated_time
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
AND type = 'Dead end'
ORDER BY num_bytes DESC;
-- Get all dead end tables with their details
SELECT
subtype,
project_id,
target_resource,
INT64(overview.num_bytes) AS num_bytes,
STRING(overview.resource_type) AS resource_type,
last_updated_time
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
AND type = 'Dead end'
ORDER BY num_bytes DESC;

Find Unused Tables

-- Get all unused tables sorted by size
SELECT
project_id,
target_resource,
INT64(overview.num_bytes) AS num_bytes,
STRING(overview.resource_type) AS resource_type,
last_updated_time
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
AND type = 'Unused'
ORDER BY num_bytes DESC;

Calculate Potential Storage Savings

-- Calculate total storage used by dead end and unused tables
SELECT
type,
COUNT(*) AS resource_count,
SUM(SAFE.INT64(overview.num_bytes)) AS total_bytes,
ROUND(SUM(SAFE.INT64(overview.num_bytes)) / POW(1024, 3), 2) AS total_gb,
ROUND(SUM(overview.cost_30d), 2) AS monthly_cost_usd,
ROUND(SUM(overview.savings_30d), 2) AS monthly_savings_usd
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
AND subtype IN ('Dead end table', 'Unused table')
AND overview.num_bytes IS NOT NULL
GROUP BY type
ORDER BY total_bytes DESC;

Get Insights by Project

-- View insights grouped by project
SELECT
project_id,
type,
COUNT(*) AS insight_count,
SUM(SAFE.INT64(overview.num_bytes)) AS total_bytes
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
AND subtype IN ('Dead end table', 'Unused table')
AND overview.num_bytes IS NOT NULL
GROUP BY project_id, type
ORDER BY total_bytes DESC;

Storage Billing Model Recommendations

-- Get a DDL statement to alter the storage billing model for recommended datasets
SELECT
*,
CONCAT('ALTER SCHEMA `', SAFE.STRING(operations[0].resource_name), '` SET OPTIONS(storage_billing_model = "', SAFE.STRING(operations[0].recommended_storage_billing_model),'");') AS ddl
FROM `masthead-prod`.DATASET_NAME.insights
WHERE type = 'Storage billing model'
ORDER BY
FLOAT64(overview.savings_30d) DESC

Limitations

  • Preview Version: This is an early preview version and the schema may change as we gather feedback.
  • Data Freshness: Insights are updated daily based on usage patterns and lineage analysis.
  • Data Retention: Insights include the most recent data, based on the activity within the account’s configured lookback window (30 days by default).
  • False Positives: see details in Dead-end and Unused Tables section.

Best Practices

  1. Review Before Action: Always manually verify insights before taking action on production data.
  2. Test Removals: Consider moving tables to a staging area before permanent deletion.
  3. Check Dependencies: Cross-reference with the lineage table to understand data dependencies.
  4. Monitor Trends: Regular review of insights can help identify data management patterns and opportunities.
  5. Coordinate with Teams: Ensure data owners are aware before removing resources they may depend on.