Lineage
lineage table
The core lineage data table that consolidates and maintains data lineage relationships.
Description
An alpha version of the core lineage data table for Masthead’s customers. This table consolidates and maintains data lineage relationships by joining edge and node information from the Marcia lineage system, providing a clean interface for tracking data flow connections, target types, and last update timestamps.
Schema
| Column name | Data type | Value |
|---|---|---|
source | STRING | Name of the source data object |
target | STRING | Name of the target data object |
target_type | STRING | Type of target data object (see “Target types” section below) |
updated_at | TIMESTAMP | Timestamp of the last update for this lineage relationship |
Target types
VIEW- logical or materialized viewTABLE- native or external tableANONYMOUS_TABLE- query run without a destination definedURI- import source or export destinationSERVICE_ACCOUNT- email address of the user who ran the jobPROJECT- Google Cloud project ID
Table Usage Examples
-- View all lineage relationships for an accountSELECT *FROM masthead-prod.DATASET_NAME.lineageORDER BY updated_at DESC;-- Find all tables that depend on a specific sourceSELECT target, target_type, updated_atFROM masthead-prod.DATASET_NAME.lineageWHERE source = 'project_id.dataset_name.table_name'AND target_type = 'TABLE';-- Get recent lineage changes in the last 7 daysSELECT source, target, target_type, updated_atFROM masthead-prod.DATASET_NAME.lineageWHERE DATE(updated_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)ORDER BY updated_at DESC;list_lineage procedure
A stored procedure for recursive lineage exploration both upstream and downstream from any data object.
Procedure Description
An alpha version of programmatic lineage exploration for Masthead’s customers. This stored procedure enables recursive traversal of data lineage relationships both upstream and downstream from a given origin reference, providing a comprehensive view of data dependencies and their hierarchical relationships within the account’s data ecosystem.
Procedure Signature
CALL `masthead-prod`.DATASET_NAME.list_lineage(origin_ref STRING)Parameters
origin_ref(STRING): The reference of the data object to start lineage exploration from
Output Schema
| Column name | Data type | Description |
|---|---|---|
origin | STRING | The reference of the starting point, e.g. project_id.dataset_name.table_name |
direction | STRING | UPSTREAM or DOWNSTREAM |
depth | INTEGER | Distance from origin (1, 2, 3, etc.) |
source | STRING | Source object reference |
target | STRING | Target object reference |
target_type | STRING | Type of target object |
updated_at | TIMESTAMP | Last observed relationship timestamp |
Procedure Usage Examples
Basic Usage
-- Explore lineage for a specific tableCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');Filtering Results
-- Get only upstream dependenciesCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Then filter in your application or with a view:-- WHERE direction = 'UPSTREAM'-- Get only direct dependencies (depth 1)CALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Then filter: WHERE depth = 1-- Get only downstream tablesCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Then filter: WHERE direction = 'DOWNSTREAM' AND target_type = 'TABLE'Further Analysis
-- Find the deepest dependency chainCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Then analyze: SELECT MAX(depth) as max_depth FROM results-- Identify critical upstream dependenciesCALL `masthead-prod`.DATASET_NAME.list_lineage('project_id.dataset_name.table_name');-- Focus on: WHERE direction = 'UPSTREAM' AND depth <= 2Limitations
- Performance: Large lineage graphs may have slower query performance.
- Cycle Detection: The recursive queries include basic cycle prevention but complex cycles may still cause issues.
- Data Freshness: Lineage is updated daily.
- Cross-Project Dependencies: External project references may have limited detail.
- Data Retention: Data relationships in the lineage data are included only if they were updated within the account’s configured lookback window (30 days by default). This ensures lineage data remains current and relevant.