Lineage
lineage
table
lineage
tableThe core lineage data table that consolidates and maintains data lineage relationships.
Description
An alpha version of the core lineage data table for Masthead's customers. This table consolidates and maintains data lineage relationships by joining edge and node information from the Marcia lineage system, providing a clean interface for tracking data flow connections, target types, and last update timestamps.
Schema
source STRING, -- Name of the source data object
target STRING, -- Name of the target data object
target_type STRING, -- Type of target (TABLE, VIEW, PROCEDURE, PROJECT, ANONYMOUS_TABLE, etc.)
updated_at TIMESTAMP -- Last update timestamp for this lineage relationship
Target types
VIEW
- logical or materialized viewTABLE
- native or external tableANONYMOUS_TABLE
- query run without a destination definedSOURCE_URI
- import sourceDESTINATION_URI
- export destinationSERVICE_ACCOUNT
- email address of the user who ran the job.PROJECT
- Google Cloud projects
Table Usage Examples
-- View all lineage relationships for an account
SELECT * FROM account_name.lineage
ORDER BY updated_at DESC;
-- Find all tables that depend on a specific source
SELECT target, target_type, updated_at
FROM account_name.lineage
WHERE source = 'your_source_table'
AND target_type = 'TABLE';
-- Get recent lineage changes in the last 7 days
SELECT source, target, target_type, updated_at
FROM account_name.lineage
WHERE DATE(updated_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY updated_at DESC;
list_lineage
procedure
list_lineage
procedureA stored procedure for recursive lineage exploration both upstream and downstream from any data object.
Procedure Description
An alpha version of programmatic lineage exploration for Masthead's customers. This stored procedure enables recursive traversal of data lineage relationships both upstream and downstream from a given origin reference, providing a comprehensive view of data dependencies and their hierarchical relationships within the account's data ecosystem.
Procedure Signature
CALL `masthead-prod`.{account_name}.list_lineage(origin_ref STRING)
Parameters
origin_ref
(STRING): The reference of the data object to start lineage exploration from
Output Schema
origin STRING, -- The reference of a starting point
direction STRING, -- 'UPSTREAM' or 'DOWNSTREAM'
depth INTEGER, -- Distance from origin (1, 2, 3, etc.)
source STRING, -- Source object reference
target STRING, -- Target object reference
target_type STRING, -- Type of target object
updated_at TIMESTAMP -- Last observed relationship timestamp
Procedure Usage Examples
Basic Usage
-- Explore lineage for a specific table
CALL `masthead-prod`.account_name.list_lineage('project_id.dataset_name.table_name');
Filtering Results
-- Get only upstream dependencies
CALL `masthead-prod`.account_name.list_lineage('your_table_name');
-- Then filter in your application or with a view:
-- WHERE direction = 'UPSTREAM'
-- Get only direct dependencies (depth 1)
CALL `masthead-prod`.account_name.list_lineage('your_table_name');
-- Then filter: WHERE depth = 1
-- Get only downstream tables
CALL `masthead-prod`.account_name.list_lineage('your_table_name');
-- Then filter: WHERE direction = 'DOWNSTREAM' AND target_type = 'TABLE'
Further Analysis
-- Find the deepest dependency chain
CALL `masthead-prod`.account_name.list_lineage('your_table_name');
-- Then analyze: SELECT MAX(depth) as max_depth FROM results
-- Identify critical upstream dependencies
CALL `masthead-prod`.account_name.list_lineage('your_table_name');
-- Focus on: WHERE direction = 'UPSTREAM' AND depth <= 2
Limitations
Performance: Large lineage graphs may have slower query performance.
Cycle Detection: The recursive queries include basic cycle prevention but complex cycles may still cause issues.
Data Freshness: Lineage is updated daily.
Cross-Project Dependencies: External project references may have limited detail.
Data Retention: Data relationships in the lineage data are included only if they were updated within the account's configured lookback window (30 days by default). This ensures lineage data remains current and relevant.
Last updated