Lineage
lineage table
Section titled “lineage table”The core lineage data table that consolidates and maintains data lineage relationships.
Description
Section titled “Description”An alpha version of the core lineage data table for Masthead’s customers. This table consolidates and maintains data lineage relationships by joining edge and node information from the Marcia lineage system, providing a clean interface for tracking data flow connections, target types, and last update timestamps.
Schema
Section titled “Schema”| Column name | Data type | Value |
|---|---|---|
source | STRING | Name of the source data object |
target | STRING | Name of the target data object |
target_type | STRING | Type of target data object (see “Target types” section below) |
updated_at | TIMESTAMP | Timestamp of the last update for this lineage relationship |
Target types
Section titled “Target types”VIEW- logical or materialized viewTABLE- native or external tableANONYMOUS_TABLE- query run without a destination definedURI- import source or export destinationSERVICE_ACCOUNT- email address of the user who ran the jobPROJECT- Google Cloud project ID
Table Usage Examples
Section titled “Table Usage Examples”-- View all lineage relationships for an accountSELECT *FROM masthead-prod.DATASET_NAME.lineageORDER BY updated_at DESC;-- Find all tables that depend on a specific sourceSELECT target, target_type, updated_atFROM masthead-prod.DATASET_NAME.lineageWHERE source = 'project_id.dataset_name.table_name'AND target_type = 'TABLE';-- Get recent lineage changes in the last 7 daysSELECT source, target, target_type, updated_atFROM masthead-prod.DATASET_NAME.lineageWHERE DATE(updated_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)ORDER BY updated_at DESC;list_lineage procedure
Section titled “list_lineage procedure”A stored procedure for recursive lineage exploration both upstream and downstream from any data object.
Procedure Description
Section titled “Procedure Description”An alpha version of programmatic lineage exploration for Masthead’s customers. This stored procedure enables recursive traversal of data lineage relationships both upstream and downstream from a given origin reference, providing a comprehensive view of data dependencies and their hierarchical relationships within the account’s data ecosystem.
Procedure Signature
Section titled “Procedure Signature”CALL `masthead-prod`.DATASET_NAME.list_lineage(origin_ref STRING)Parameters
Section titled “Parameters”origin_ref(STRING): The reference of the data object to start lineage exploration from
Output Schema
Section titled “Output Schema”| Column name | Data type | Description |
|---|---|---|
origin | STRING | The reference of the starting point, e.g. project_id.dataset_name.table_name |
direction | STRING | UPSTREAM or DOWNSTREAM |
depth | INTEGER | Distance from origin (1, 2, 3, etc.) |
source | STRING | Source object reference |
target | STRING | Target object reference |
target_type | STRING | Type of target object |
updated_at | TIMESTAMP | Last observed relationship timestamp |
Procedure Usage Examples
Section titled “Procedure Usage Examples”Basic Usage
Section titled “Basic Usage”-- Explore lineage for a specific tableCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');Filtering Results
Section titled “Filtering Results”-- Get only upstream dependenciesCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Then filter in your application or with a view:-- WHERE direction = 'UPSTREAM'-- Get only direct dependencies (depth 1)CALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Then filter: WHERE depth = 1-- Get only downstream tablesCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Then filter: WHERE direction = 'DOWNSTREAM' AND target_type = 'TABLE'Further Analysis
Section titled “Further Analysis”-- Find the deepest dependency chainCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Then analyze: SELECT MAX(depth) as max_depth FROM results-- Identify critical upstream dependenciesCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Focus on: WHERE direction = 'UPSTREAM' AND depth <= 2Limitations
Section titled “Limitations”- Performance: Large lineage graphs may have slower query performance.
- Cycle Detection: The recursive queries include basic cycle prevention but complex cycles may still cause issues.
- Data Freshness: Lineage is updated daily.
- Cross-Project Dependencies: External project references may have limited detail.
- Data Retention: Data relationships in the lineage data are included only if they were updated within the account’s configured lookback window (30 days by default). This ensures lineage data remains current and relevant.