Lineage

lineage table

The core lineage data table that consolidates and maintains data lineage relationships.

Description

This is an alpha version of the core lineage data table for Masthead's customers. It consolidates and maintains data lineage relationships by joining edge and node information from the Marcia lineage system, and exposes each relationship's source, target, target type, and last update timestamp.

Schema

Column name    Data type    Description
source         STRING       Name of the source data object
target         STRING       Name of the target data object
target_type    STRING       Type of target data object (see "Target types" section below)
updated_at     TIMESTAMP    Timestamp of the last update for this lineage relationship

Target types

Table Usage Examples
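
As an illustrative sketch only, the kind of lookups this table supports can be tried locally against an in-memory SQLite stand-in with the same four columns. The table name `lineage`, the sample rows, and the `TABLE`/`VIEW` target type values are assumptions made for illustration; the actual table path, SQL dialect, and target type values in a Masthead deployment may differ.

```python
import sqlite3

# Hypothetical sample rows; column order follows the schema above.
SAMPLE_EDGES = [
    ("proj.raw.orders", "proj.staging.orders", "TABLE", "2024-01-10 00:00:00"),
    ("proj.staging.orders", "proj.marts.revenue", "TABLE", "2024-01-11 00:00:00"),
    ("proj.staging.orders", "proj.marts.orders_v", "VIEW", "2024-01-12 00:00:00"),
]


def build_demo_table():
    """Create an in-memory stand-in for the lineage table (assumed name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lineage ("
        "source TEXT, target TEXT, target_type TEXT, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO lineage VALUES (?, ?, ?, ?)", SAMPLE_EDGES)
    return conn


def direct_targets(conn, source):
    """Objects written directly from `source`, most recently updated first."""
    cursor = conn.execute(
        "SELECT target, target_type, updated_at FROM lineage "
        "WHERE source = ? ORDER BY updated_at DESC",
        (source,),
    )
    return cursor.fetchall()


def direct_sources(conn, target):
    """Objects that feed directly into `target`."""
    cursor = conn.execute(
        "SELECT source FROM lineage WHERE target = ? ORDER BY source",
        (target,),
    )
    return [row[0] for row in cursor.fetchall()]
```

Both helpers look only one hop away from the given object; multi-hop exploration is what the list_lineage procedure described below is for.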

list_lineage procedure

A stored procedure for recursive lineage exploration both upstream and downstream from any data object.

Procedure Description

This is an alpha version of programmatic lineage exploration for Masthead's customers. The stored procedure recursively traverses data lineage relationships upstream and downstream from a given origin reference, returning each dependency in the account's data together with its direction and its distance from the origin.

Procedure Signature

Parameters

  • origin_ref (STRING): The reference of the data object to start lineage exploration from

Output Schema

Column name    Data type    Description
origin         STRING       The reference of the starting point, e.g. project_id.dataset_name.table_name
direction      STRING       UPSTREAM or DOWNSTREAM
depth          INTEGER      Distance from origin (1, 2, 3, etc.)
source         STRING       Source object reference
target         STRING       Target object reference
target_type    STRING       Type of target object
updated_at     TIMESTAMP    Last observed relationship timestamp
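
To make the output columns concrete, here is a minimal Python sketch of a breadth-first traversal that yields rows in this shape. It is not the procedure's actual implementation: the function name `explore_lineage`, the edge tuples, and the visited-set cycle guard are all assumptions chosen for illustration.

```python
from collections import deque


def explore_lineage(edges, origin_ref):
    """Walk lineage edges in both directions from origin_ref.

    edges: iterable of (source, target, target_type, updated_at) tuples.
    Returns tuples ordered like the output schema:
    (origin, direction, depth, source, target, target_type, updated_at).
    """
    edges = list(edges)
    results = []
    for direction in ("UPSTREAM", "DOWNSTREAM"):
        visited = {origin_ref}           # nodes already queued (cycle guard)
        queue = deque([(origin_ref, 0)])
        while queue:
            node, depth = queue.popleft()
            for source, target, target_type, updated_at in edges:
                if direction == "DOWNSTREAM":
                    here, neighbor = source, target
                else:
                    here, neighbor = target, source
                if here != node:
                    continue
                # Each node is expanded once, so each edge is reported once
                # per direction, at the depth where it was first reached.
                results.append(
                    (origin_ref, direction, depth + 1,
                     source, target, target_type, updated_at)
                )
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, depth + 1))
    return results
```

In this sketch an edge that closes a cycle is still reported once, but the node it points back to is not expanded again, which is what keeps the traversal finite.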

Procedure Usage Examples

Basic Usage

Filtering Results

Further Analysis
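
One possible follow-up, once the procedure's result rows have been fetched into Python, is to count how many distinct objects sit at each depth in each direction. The sketch below is a hedged illustration: `objects_per_level` is a hypothetical helper, and it assumes that a DOWNSTREAM row's newly reached object is its target while an UPSTREAM row's is its source.

```python
from collections import Counter


def objects_per_level(rows):
    """Count distinct objects per (direction, depth).

    rows: tuples ordered like the output schema:
    (origin, direction, depth, source, target, target_type, updated_at).
    """
    seen = set()
    for _origin, direction, depth, source, target, _type, _updated in rows:
        # Assumption: the object reached at this depth is the target when
        # moving downstream and the source when moving upstream.
        reached = target if direction == "DOWNSTREAM" else source
        seen.add((direction, depth, reached))
    return Counter((direction, depth) for direction, depth, _ in seen)
```

A sharp rise in the count at a particular depth can point to a widely shared object, such as a staging table that many downstream models read from.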

Limitations

  • Performance: Queries against large lineage graphs may run slowly.

  • Cycle Detection: The recursive queries include basic cycle prevention, but complex cycles may still cause issues.

  • Data Freshness: Lineage is updated daily.

  • Cross-Project Dependencies: External project references may have limited detail.

  • Data Retention: A relationship is included in the lineage data only if it was updated within the account's configured lookback window (30 days by default). This keeps the lineage data current and relevant.
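
The lookback rule in the last item can be pictured with a small Python sketch. It assumes relationships are held as dictionaries with an `updated_at` datetime and that the window boundary is inclusive; `within_lookback` is a hypothetical helper, not part of Masthead's interface.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_LOOKBACK_DAYS = 30  # default stated above; configurable per account


def within_lookback(edges, now, lookback_days=DEFAULT_LOOKBACK_DAYS):
    """Keep only relationships updated inside the lookback window.

    edges: list of dicts, each with an 'updated_at' datetime.
    The inclusive comparison at the cutoff is an assumption.
    """
    cutoff = now - timedelta(days=lookback_days)
    return [edge for edge in edges if edge["updated_at"] >= cutoff]
```

A relationship that has not been observed again within the window simply drops out of the results, so a missing edge does not necessarily mean the dependency never existed.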
