insights table

`insights` table

The core insights data table that provides actionable recommendations for cost optimization and data management.

Description

The core insights data table for Masthead’s customers consolidates recommendations for cost optimization by identifying unused and dead-end tables, along with potential actions to reduce storage costs and improve data management efficiency.

Schema

Column name	Data type	Value
`category`	`STRING`	Category of the insight. Currently, this field supports only ‘Cost’. Future releases support additional categories.
`type`	`STRING`	Type of insight; valid values include ‘Dead end’, ‘Unused’, etc.
`subtype`	`STRING`	Specific subtype of the insight, such as ‘Dead end table’ or ‘Unused table’
`tenant`	`STRING`	Account/tenant identifier
`project_id`	`STRING`	Google Cloud project ID
`target_resource`	`STRING`	Name of the target resource
`last_updated_time`	`TIMESTAMP`	When this insight was last updated
`overview`	`JSON`	JSON object with insight details. For details, see the insight types section.
`operations`	`JSON`	Array of recommended actions to address the insight. For details, see the insight types section.

Insight types

Cost category

Dead end

Leaf dead end table - Tables that aren’t used by any downstream processes and are at the end of the data lineage
Dead end table - Tables that have no downstream dependencies and may be candidates for removal

JSON columns schema

Overview:

{
  "resource_type": "TABLE|VIEW",
  "resource_name": "project.dataset.table",
  "num_bytes": 123456,
  "creation_time": "2025-01-01T00:00:00",
  "last_modified_time": "2025-01-15T00:00:00",
  "cost_30d": 10.5,
  "savings_30d": 10.5
}

Operations:

[
  {
    "action": "drop",
    "resource_type": "TABLE|VIEW",
    "resource_name": "project.dataset.table"
  }
]

Dead end pipeline - Pipelines that don’t produce any useful output and that downstream processes don’t use. This feature isn’t yet implemented.

Unused tables

Unused table - Tables that users haven’t accessed or queried within the configured time window

JSON columns schema

Overview:

{
  "resource_type": "TABLE|VIEW",
  "resource_name": "project.dataset.table",
  "num_bytes": 123456,
  "creation_time": "2025-01-01T00:00:00",
  "last_modified_time": "2025-01-15T00:00:00",
  "cost_30d": 10.5,
  "savings_30d": 10.5
}

Operations:

[
  {
    "action": "drop",
    "resource_type": "TABLE|VIEW",
    "resource_name": "project.dataset.table"
  }
]

Storage billing model

Storage billing model - Datasets that you can move to a more cost-effective storage billing model based on their update patterns and data compression.

JSON columns schema

Overview:

{
  "resource_type": "dataset",
  "resource_name": "project.dataset",
  "storage_billing_model": "LOGICAL",
  "location": "us-central1",
  "cost_30d": 100.0,
  "savings_30d": 20.0
}

Operations:

[
  {
    "action": "alter",
    "resource_type": "dataset",
    "resource_name": "project.dataset",
    "recommended_storage_billing_model": "PHYSICAL"
  }
]

Compute model

Compute model per pipeline - Recommendations for adjusting compute model assignment based on resource usage. This feature isn’t yet implemented.

Usage examples

Find dead end tables

-- Get all dead end tables with their details
SELECT
  subtype,
  project_id,
  target_resource,
  INT64(overview.num_bytes) AS num_bytes,
  STRING(overview.resource_type) AS resource_type,
  last_updated_time
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
  AND type = 'Dead end'
ORDER BY num_bytes DESC;

-- Get all dead end tables with their details
SELECT
  subtype,
  project_id,
  target_resource,
  INT64(overview.num_bytes) AS num_bytes,
  STRING(overview.resource_type) AS resource_type,
  last_updated_time
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
  AND type = 'Dead end'
ORDER BY num_bytes DESC;

Find unused tables

-- Get all unused tables sorted by size
SELECT
  project_id,
  target_resource,
  INT64(overview.num_bytes) AS num_bytes,
  STRING(overview.resource_type) AS resource_type,
  last_updated_time
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
  AND type = 'Unused'
ORDER BY num_bytes DESC;

Calculate potential storage savings

-- Calculate total storage used by dead end and unused tables
SELECT
  type,
  COUNT(*) AS resource_count,
  SUM(SAFE.INT64(overview.num_bytes)) AS total_bytes,
  ROUND(SUM(SAFE.INT64(overview.num_bytes)) / POW(1024, 3), 2) AS total_gb,
  ROUND(SUM(overview.cost_30d), 2) AS monthly_cost_usd,
  ROUND(SUM(overview.savings_30d), 2) AS monthly_savings_usd
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
  AND subtype IN ('Dead end table', 'Unused table')
  AND overview.num_bytes IS NOT NULL
GROUP BY type
ORDER BY total_bytes DESC;

Get insights by project

-- View insights grouped by project
SELECT
  project_id,
  type,
  COUNT(*) AS insight_count,
  SUM(SAFE.INT64(overview.num_bytes)) AS total_bytes
FROM `masthead-prod`.DATASET_NAME.insights
WHERE category = 'Cost'
  AND subtype IN ('Dead end table', 'Unused table')
  AND overview.num_bytes IS NOT NULL
GROUP BY project_id, type
ORDER BY total_bytes DESC;

Storage billing model recommendations

-- Get a DDL statement to alter the storage billing model for recommended datasets
SELECT
  *,
  CONCAT('ALTER SCHEMA `', SAFE.STRING(operations[0].resource_name), '` SET OPTIONS(storage_billing_model = "', SAFE.STRING(operations[0].recommended_storage_billing_model),'");') AS ddl
FROM `masthead-prod`.DATASET_NAME.insights
WHERE type = 'Storage billing model'
ORDER BY
  FLOAT64(overview.savings_30d) DESC

AI agent skills

To automate the analysis of cost optimization recommendations programmatically using AI agents, such as Google Antigravity or Claude Code, you can configure pre-built agent skills. These skills help agents query and interpret this insights table to recommend standard or enterprise compute models, alter dataset billing options, or identify dead-end tables.

For a list of available FinOps agent skills, see the AI agent skills guide.

Limitations

Preview Version: This is an early preview version and the schema may change based on feedback.
Data Freshness: Masthead updates insights daily based on usage patterns and lineage analysis.
Data Retention: Insights include the most recent data from the account’s configured lookback window, which defaults to 30 days.
False Positives: see details in Dead-end and Unused Tables section.

Best practices

Review Before Action: Always manually verify insights before taking action on production data.
Test Removals: Consider moving tables to a staging area before permanent deletion.
Check Dependencies: Cross-reference with the lineage table to understand data dependencies.
Monitor Trends: Regular review of insights can help identify data management patterns and opportunities.
Coordinate with Teams: Ensure data owners are aware before removing resources they may depend on.