Metadata management: Column-level lineage and Data Dictionary

What is Data Lineage?

Masthead's lineage feature visualizes where data comes from and where it goes, so you can identify dependencies across tables, columns, views, spreadsheets, and dashboards, and see the pipelines that connect them. It shows how data assets in platforms such as Google BigQuery, CloudSQL, and Looker relate to one another, and tracks how data is ingested and modeled by tools such as Fivetran, Stitch, dbt, and Dataform. Combined with anomaly detection, pipeline observability, and data quality checks, lineage helps you troubleshoot data issues quickly and trace them to their root causes. It also lets you assess the impact of a change on your data platform within minutes.

How does Masthead visualize lineage?

Masthead uses Google BigQuery logs and metadata to visualize column-level lineage, which becomes available about 15-20 minutes after deployment. It does not rely on external APIs and builds lineage without querying any client data.
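
To illustrate the general idea of deriving lineage from job metadata rather than from the data itself, here is a minimal Python sketch. It is not Masthead's implementation. The job records, their "reads" and "writes" fields, and all table names are hypothetical, and the sketch works at table level only; column-level lineage would additionally require analyzing the SQL of each job.

```python
from collections import defaultdict, deque

# Hypothetical, simplified job records: the tables a query read and the
# table it wrote. Real log entries carry far more detail than this.
jobs = [
    {"reads": ["raw.orders", "raw.customers"], "writes": "staging.orders_enriched"},
    {"reads": ["staging.orders_enriched"], "writes": "marts.daily_revenue"},
    {"reads": ["raw.customers"], "writes": "marts.customer_dim"},
]

def build_downstream(jobs):
    """Map each source table to the set of tables written from it."""
    downstream = defaultdict(set)
    for job in jobs:
        for source in job["reads"]:
            downstream[source].add(job["writes"])
    return downstream

def impacted_tables(downstream, start):
    """Breadth-first walk: every table reachable downstream of `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for child in downstream.get(table, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

graph = build_downstream(jobs)
print(sorted(impacted_tables(graph, "raw.customers")))
# ['marts.customer_dim', 'marts.daily_revenue', 'staging.orders_enriched']
```

The same downstream walk is what an impact assessment amounts to: given a table you plan to change, list everything that depends on it directly or indirectly.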

What is Data Dictionary?

The Data Dictionary in Masthead is a comprehensive list of all assets within the connected Google BigQuery and Looker environments. It's a handy collection of metadata, enabling users to easily search for datasets, tables, columns, Looker dashboards, or looks.

For Google BigQuery tables, the dictionary provides essential information such as the creation date, last update, size, and first-level upstream and downstream dependencies. It also includes any descriptions available in BigQuery.

For columns, the dictionary lets users search for any term across the entire warehouse, regardless of dataset or table, and returns each matching column's type, location, and description.
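
As a rough illustration of what a warehouse-wide column search involves, the Python sketch below matches a term against column names and descriptions. The ColumnEntry structure, the search_columns function, and the sample catalog are all hypothetical and do not reflect how Masthead stores or indexes metadata.

```python
from dataclasses import dataclass

@dataclass
class ColumnEntry:
    table: str             # location of the column, e.g. "dataset.table"
    name: str
    column_type: str
    description: str = ""

def search_columns(entries, term):
    """Case-insensitive match of `term` against column name or description."""
    needle = term.lower()
    return [
        e for e in entries
        if needle in e.name.lower() or needle in e.description.lower()
    ]

# Hypothetical sample metadata; no table contents are involved.
catalog = [
    ColumnEntry("sales.orders", "customer_id", "INT64", "Buyer identifier"),
    ColumnEntry("crm.accounts", "id", "INT64", "Customer account key"),
    ColumnEntry("sales.orders", "amount", "NUMERIC", "Order total"),
]

for hit in search_columns(catalog, "customer"):
    print(hit.table, hit.name, hit.column_type)
# sales.orders customer_id INT64
# crm.accounts id INT64
```

Note that the search runs over metadata only (names, types, descriptions), so a match on "customer" can surface a column whose name does not contain the term but whose description does.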

For Looker assets, the dictionary offers visibility into all existing dashboards and reports. It includes their IDs with direct links to Looker, dashboard elements, upstream dependencies in Google BigQuery, and information on the dashboard's creation and last update dates.

The Data Dictionary gives centralized access to metadata, which simplifies data exploration and analysis in Google BigQuery and Looker. It benefits both data and business teams by providing an intuitive UI for understanding, locating, and interacting with the available data assets, data products, and their dependencies within the data platform.

Are the Data Dictionary and Data Lineage available automatically?

Yes, the Data Dictionary and Data Lineage are available automatically with Masthead. Upon deployment, you'll immediately see a list of tables, views, and external tables (such as spreadsheets or Cloud Storage tables) from the connected Google BigQuery. Column-level lineage becomes available within 15-20 minutes after deployment. The Dictionary for Looker will be accessible as soon as Masthead is integrated with your Looker account.

Commonly asked questions

What permissions does Masthead need to visualize lineage and provide the Data Dictionary?

Masthead's service account needs the following permissions within your Google Cloud project:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.list

None of these permissions allow reading, querying, or editing client data. Read more about BigQuery Permissions.
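
If you want to double-check a service account before connecting it, a small audit along these lines may help. This is a sketch under assumptions: the audit function and the DATA_ACCESS set are illustrative and not part of Masthead, the list of data-access permissions is not exhaustive, and the granted set is supplied by hand here (in practice it might come from an IAM permission test against your project).

```python
# The three permissions listed above.
REQUIRED = frozenset({
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.list",
})

# Illustrative, non-exhaustive examples of permissions that would allow
# reading or modifying table contents or running queries.
DATA_ACCESS = frozenset({
    "bigquery.tables.getData",
    "bigquery.tables.updateData",
    "bigquery.jobs.create",
})

def audit(granted):
    """Return (missing required permissions, granted data-access permissions)."""
    granted = set(granted)
    missing = sorted(REQUIRED - granted)
    data_access = sorted(DATA_ACCESS & granted)
    return missing, data_access

# Hypothetical grant set for a service account under review.
granted = ["bigquery.datasets.get", "bigquery.tables.get", "bigquery.tables.getData"]
missing, data_access = audit(granted)
print("missing:", missing)          # missing: ['bigquery.tables.list']
print("data access:", data_access)  # data access: ['bigquery.tables.getData']
```

An empty result for both lists means the account holds exactly the metadata permissions it needs and none of the data-access permissions the sketch knows about.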

Do you provide Column-Level Lineage to Looker Dashboards?

Yes. The Lineage view shows the upstream tables that feed specific reports, dashboards, and looks in Looker.

Does Masthead integrate with Looker Studio?

Not yet. If this functionality is crucial for you, please contact us at team@mastheadata.com.
