Storage Costs

Storage Cost Insights

Masthead collects and analyzes BigQuery resource metadata and usage logs to estimate the cost of your data assets, and shows recommendations for optimizing spending.

Figure: Storage cost insights and savings recommendations

The Storage Cost Insights page shows the following aggregated, estimated metrics per dataset:

  • storage size: the most recent data point,

  • storage cost: an estimate based on the dataset's storage usage over the last 30 days.

Masthead analyzes each of these metrics and offers a set of recommendations for reducing the total storage bill.

Alternative Billing Model for a Dataset

BigQuery provides two storage billing models, logical and physical, which let you choose how storage is billed based on the properties of the data and its operation patterns. Masthead analyzes dataset metadata and storage usage to estimate the cost of each dataset under each billing model.

Google Cloud Billing calculates storage cost under each model as follows:

  • logical storage cost:

active_logical_storage_size * active_logical_storage_price + long_term_logical_storage_size * long_term_logical_storage_price
  • physical storage cost:

(active_physical_storage_size + time_travel_physical_storage_size + fail_safe_physical_storage_size) * active_physical_storage_price + long_term_physical_storage_size * long_term_physical_storage_price
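The two formulas can be compared directly for a single dataset. The Python sketch below implements them as written. The dataclass and field names, the GiB unit, and the sample prices are assumptions for illustration only; in practice you would take the sizes reported for your dataset (for example, from BigQuery's INFORMATION_SCHEMA.TABLE_STORAGE view) and the current storage prices for your region.

```python
from dataclasses import dataclass


@dataclass
class StorageUsage:
    """Storage sizes for one dataset, in GiB (hypothetical input structure)."""
    active_logical: float
    long_term_logical: float
    active_physical: float
    long_term_physical: float
    time_travel_physical: float
    fail_safe_physical: float


@dataclass
class Prices:
    """Per-GiB storage prices; supply the actual rates for your region."""
    active_logical: float
    long_term_logical: float
    active_physical: float
    long_term_physical: float


def logical_cost(u: StorageUsage, p: Prices) -> float:
    # Logical model: active and long-term logical bytes, each at its own rate.
    return (u.active_logical * p.active_logical
            + u.long_term_logical * p.long_term_logical)


def physical_cost(u: StorageUsage, p: Prices) -> float:
    # Physical model: time travel and fail-safe bytes are billed at the
    # active physical rate, together with the active physical bytes.
    billed_active = (u.active_physical + u.time_travel_physical
                     + u.fail_safe_physical)
    return (billed_active * p.active_physical
            + u.long_term_physical * p.long_term_physical)


if __name__ == "__main__":
    # Illustrative numbers only: not real usage, and not authoritative prices.
    prices = Prices(active_logical=0.02, long_term_logical=0.01,
                    active_physical=0.04, long_term_physical=0.02)
    usage = StorageUsage(active_logical=1000, long_term_logical=500,
                         active_physical=200, long_term_physical=100,
                         time_travel_physical=50, fail_safe_physical=25)
    print(f"logical:  {logical_cost(usage, prices):.2f}")
    print(f"physical: {physical_cost(usage, prices):.2f}")
```

With these sample inputs, the logical model costs 25.00 and the physical model 13.00, because the data compresses well enough to outweigh the higher physical rates and the time travel and fail-safe overhead.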

Masthead analyzes your storage usage retrospectively and, using the estimated cost under each billing model, recommends a switch only where it would produce consistent, reliable savings.
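A switch is worth recommending only when the alternative model is cheaper consistently, not just on average. The sketch below shows one possible rule, assuming you already have daily cost estimates under both models for the same window. The function name, the "cheaper on every day" criterion, and the 10% `min_saving_ratio` threshold are hypothetical choices for illustration and are not Masthead's actual logic.

```python
from typing import Optional, Sequence


def recommend_model(
    current_model: str,
    daily_logical: Sequence[float],
    daily_physical: Sequence[float],
    min_saving_ratio: float = 0.1,
) -> Optional[str]:
    """Return the model to switch to, or None to keep the current one.

    Hypothetical rule: recommend a switch only if the alternative model was
    cheaper on every observed day AND the total saving over the window is at
    least `min_saving_ratio` of the current model's total cost.
    """
    if len(daily_logical) != len(daily_physical) or not daily_logical:
        raise ValueError("need equally long, non-empty daily cost series")
    if current_model == "LOGICAL":
        current, alternative, target = daily_logical, daily_physical, "PHYSICAL"
    elif current_model == "PHYSICAL":
        current, alternative, target = daily_physical, daily_logical, "LOGICAL"
    else:
        raise ValueError(f"unknown billing model: {current_model}")

    # Consistency: the alternative must be strictly cheaper on every day.
    if any(alt >= cur for alt, cur in zip(alternative, current)):
        return None
    # Magnitude: the cumulative saving must be material.
    total_current = sum(current)
    saving = total_current - sum(alternative)
    if saving < min_saving_ratio * total_current:
        return None
    return target
```

Requiring a win on every day trades a few missed opportunities for fewer recommendations that would flip back, which matters because a billing model cannot be changed again for 14 days.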

Recommendation

  1. Review the storage billing recommendations for your datasets on the Storage Cost Insights page.

  2. Change the dataset's storage billing model to the recommended one.

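You can change the dataset configuration by running the following DDL statement, replacing the placeholders with your project and dataset IDs and the recommended model ('LOGICAL' or 'PHYSICAL'):

ALTER SCHEMA `PROJECT_ID.DATASET_ID` SET OPTIONS (
    storage_billing_model = 'PHYSICAL'  -- or 'LOGICAL'
);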

Changing this setting has no impact on data processing performance.

When you change a dataset's billing model, it takes 24 hours for the change to take effect.

Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again.
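The two timing constraints can be turned into concrete dates. This sketch assumes both intervals are measured from the moment the change is made; `change_schedule` is a hypothetical helper, not part of any BigQuery API.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Timings stated in this section. Both are assumed to be measured from the
# moment the billing-model change is made.
EFFECTIVE_DELAY = timedelta(hours=24)
COOLDOWN = timedelta(days=14)


def change_schedule(changed_at: datetime) -> Tuple[datetime, datetime]:
    """Return (takes_effect_at, next_change_allowed_at) for a change made at changed_at."""
    return changed_at + EFFECTIVE_DELAY, changed_at + COOLDOWN


if __name__ == "__main__":
    effective, next_allowed = change_schedule(datetime(2024, 5, 1, 9, 0))
    print(effective.isoformat(), next_allowed.isoformat())
```

For example, a change made at 09:00 on 1 May 2024 would take effect at 09:00 on 2 May, and the next change would be allowed from 09:00 on 15 May.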

Dead-end tables

By analyzing BigQuery lineage end to end, Masthead identifies tables that are updated regularly but have no downstream consumption.

The Dead-end label can be applied to the following resources:

  • regularly updated tables that have had no downstream consumption during the last 30 days (default period),

  • the pipelines that update them (see Dead-end pipelines),

  • other upstream tables and pipelines that feed only into these tables and pipelines.
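The labeling rule above can be read as an upstream propagation over the lineage graph. The Python sketch below is a hypothetical illustration, not Masthead's implementation. It assumes lineage is given as a mapping from each table or pipeline to the nodes that consume it, plus the set of regularly updated tables and the set of nodes read outside the lineage (for example, by dashboards or ad hoc queries) during the lookback window.

```python
from typing import Dict, Set


def dead_end_nodes(
    downstream: Dict[str, Set[str]],
    regularly_updated: Set[str],
    externally_read: Set[str],
) -> Set[str]:
    """Hypothetical sketch of Dead-end labeling over a lineage graph.

    downstream: node -> nodes that consume it (tables feed pipelines,
        pipelines write tables). Nodes missing from the mapping are
        treated as having no consumers.
    regularly_updated: tables updated during the lookback window.
    externally_read: nodes read outside the lineage during the window.
    """
    # Seeds: regularly updated tables that nothing consumes.
    dead = {
        n for n in regularly_updated
        if not downstream.get(n) and n not in externally_read
    }
    # Propagate upstream until nothing changes: a node is Dead-end if it
    # has consumers, all of them are Dead-end, and nothing else reads it.
    changed = True
    while changed:
        changed = False
        for node, consumers in downstream.items():
            if node in dead or node in externally_read or not consumers:
                continue
            if consumers <= dead:
                dead.add(node)
                changed = True
    return dead
```

A node joins the Dead-end set only when every one of its consumers is already Dead-end, which is how "contribute solely" is interpreted here; a table that also feeds a live consumer stays unlabeled.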

You can explore the complete lineage, including Dead-end labels, on the Lineage page of the corresponding table or pipeline.

Recommendation

  1. review the tables labeled as Dead-end (the complete list is available on the Dictionary page),

  2. align the table update schedule with the actual data consumption requirements,

  3. delete the tables when there are no clear consumers for the data.

Unused tables

Masthead helps you track the costs of data assets that are not actively used by labeling such tables as Unused. Based on your lineage, these tables have had no upstream or downstream consumption during the last 30 days (default period).
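As a hypothetical sketch, assume you have the last write (upstream activity) and last read (downstream consumption) timestamp of each table, for example derived from audit logs. The Unused check then reduces to comparing both timestamps against a 30-day cutoff. The input shape and the `unused_tables` name are illustrative and are not part of Masthead or BigQuery.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Set, Tuple

# Hypothetical input: table name -> (last_write, last_read); None = not observed.
Activity = Dict[str, Tuple[Optional[datetime], Optional[datetime]]]


def unused_tables(activity: Activity, now: datetime,
                  lookback: timedelta = timedelta(days=30)) -> Set[str]:
    """Return tables with neither a write nor a read inside the lookback window."""
    cutoff = now - lookback

    def inactive(ts: Optional[datetime]) -> bool:
        # Never observed, or last observed before the window started.
        return ts is None or ts < cutoff

    return {table for table, (last_write, last_read) in activity.items()
            if inactive(last_write) and inactive(last_read)}
```

A single recent write or read is enough to keep a table out of the Unused set, which matches the requirement that neither upstream nor downstream activity occurred in the period.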

Recommendation

  1. review the tables labeled as Unused (the complete list is available on the Dictionary page),

  2. delete unused tables to save on storage costs.
