Data Products

What is a data product?

In Masthead, a data product is a curated collection of related data assets (like datasets or tables) that are treated as a single, logical unit. It represents a valuable, ready-to-use data resource designed for a specific purpose or audience within your organization. Think of it as packaging data for consumption, complete with metadata, ownership, cost tracking, and quality monitoring.

A data product has the following key components:

  • Name: A unique identifier for the data product.

  • Data Assets: The underlying data included. Currently supported types:

    • datasets

    • tables

  • Domain: An optional category or business area the product belongs to. See Domains.

  • Description: Textual information about the product's purpose and content. Markdown formatting is supported.
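
To make the components above concrete, here is a minimal sketch of how a data product could be modeled in code. This is an illustration only, not Masthead's API: the class names `DataProduct` and `DataAsset`, their field names, and the example values are all assumptions. The sketch treats the name and at least one data asset as required, and the domain and description as optional.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical model for illustration; not part of any Masthead SDK.
AssetType = Literal["dataset", "table"]  # the two currently supported asset types


@dataclass(frozen=True)
class DataAsset:
    asset_type: AssetType
    name: str


@dataclass
class DataProduct:
    name: str                     # required, unique identifier
    assets: list[DataAsset]       # required, one or more datasets or tables
    domain: Optional[str] = None  # optional business area
    description: str = ""         # optional, may contain Markdown

    def __post_init__(self) -> None:
        # Mirror the required fields: a name and at least one asset.
        if not self.name.strip():
            raise ValueError("name is required")
        if not self.assets:
            raise ValueError("at least one data asset is required")


# Example values are made up for the sketch.
product = DataProduct(
    name="Customer 360",
    assets=[DataAsset("dataset", "analytics_customers")],
    domain="Marketing",
    description="Unified **customer** profile data.",
)
```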

Create data product

You can create a data product to formally package and manage your key data assets.

1

Go to Data Products

Open the Data Products page and click Create. This opens the "Create Data Product" page.

2

Enter name

Provide a clear and descriptive Name for your data product. This field is required.

3

Select data assets

Choose whether you are adding Datasets or Tables using the toggle buttons, if both options are available.

In the "Data assets" input field, start typing the name of an existing dataset (or table) you want to include.

Select the desired asset(s) from the dropdown list. You can add multiple assets. This field is required.

4

Assign Domain

Click the "Select a domain" dropdown. Choose an existing domain from the list to categorize your data product. This step is optional.

Alternatively, click "Create new domain" if the domain doesn't exist yet. See more about Domains.

5

Add Description

Provide a detailed Description in the text area. Explain the purpose, content, intended use, or any other relevant context for the data product. Use Markdown formatting if needed. This step is optional.

6

Click Create

Once all required information is entered and optional details are added, click the "Create" button. To discard changes, click "Cancel".

Upon successful creation, you are redirected to the detail page of the newly created data product. Here you can see the configuration you defined; the aggregated insights are populated shortly afterward.

You can explore the associated costs, incidents, assets, or subscribers and dive deeper into the details where applicable. See more in Product Metrics.

Why use data products?

Creating and managing data products in Masthead provides several benefits:

  • Group logically related datasets and tables under a single, meaningful entity.

  • Assign clear ownership (via domains) and track consumers (subscribers).

  • Monitor core data product metrics in an easy overview:

    • Costs: estimated compute and storage costs associated with the upstream data assets and pipelines within the product.

    • Incidents: pipeline and table incidents affecting the data product's health.

    • Usage: who is using the data product and how often, based on job execution tracking.

Referenced data assets

Data products are designed so that you only need to assign your curated data assets. Masthead uses lineage information to identify referenced data assets up to two levels upstream. Assigned and referenced data assets, along with all related pipelines, are included in the calculation of the cost and incident metrics.

This keeps data product asset management simple while still surfacing all important operational information to data product owners and subscribers.
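
The two-level upstream lookup described above can be pictured as a depth-limited breadth-first walk over a lineage graph. The sketch below is an assumption about how such a lookup might work, not Masthead's actual implementation: the function name `referenced_assets`, the `upstream` mapping (asset to its direct upstream assets), and the asset names are all hypothetical.

```python
from collections import deque


def referenced_assets(assigned, upstream, max_depth=2):
    """Return upstream assets within max_depth lineage hops of the assigned assets.

    `upstream` maps an asset name to the assets it reads from directly.
    Both the mapping and the function are illustrative assumptions.
    """
    assigned = set(assigned)
    seen = set(assigned)       # never revisit an asset
    referenced = set()         # upstream assets found, excluding assigned ones
    queue = deque((asset, 0) for asset in assigned)
    while queue:
        asset, depth = queue.popleft()
        if depth == max_depth:
            continue           # do not look further upstream than max_depth
        for parent in upstream.get(asset, ()):
            if parent not in seen:
                seen.add(parent)
                referenced.add(parent)
                queue.append((parent, depth + 1))
    return referenced


# Made-up lineage: mart <- staging <- raw <- landing
lineage = {
    "mart.orders": ["staging.orders"],
    "staging.orders": ["raw.orders"],
    "raw.orders": ["landing.orders"],
}
found = referenced_assets(["mart.orders"], lineage)
```

In this example only `staging.orders` and `raw.orders` are picked up; `landing.orders` sits three hops upstream and is left out.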

Product metrics

Subscribers

Users or service accounts that consume or interact with the data product.

We identify and show all subscribers to make consumption more transparent. See also Usage to analyze how frequently these interactions occur.

Usage

This metric measures the number of job executions in which the product's assets are used as a source. It offers insight into the overall consumption frequency across data assets. The usage visualization is aggregated at the dataset level.
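
As a rough sketch of the idea behind this metric, the snippet below counts job executions per dataset. Everything here is an assumption made for illustration: the job record shape (a `sources` list of `dataset.table` names), the function name `usage_by_dataset`, and the rule that a job counts once per dataset it reads from. Masthead's actual calculation may differ.

```python
from collections import Counter


def usage_by_dataset(job_executions, product_datasets):
    """Count job executions that read from each of the product's datasets.

    Each job is a dict with a "sources" list of "dataset.table" names
    (a hypothetical record shape). A job counts once per dataset,
    even if it reads several tables from that dataset.
    """
    counts = Counter()
    for job in job_executions:
        datasets_read = {source.split(".", 1)[0] for source in job["sources"]}
        for dataset in datasets_read & set(product_datasets):
            counts[dataset] += 1
    return counts


# Made-up job executions.
jobs = [
    {"job_id": "j1", "sources": ["sales.orders", "sales.customers"]},
    {"job_id": "j2", "sources": ["sales.orders", "finance.invoices"]},
    {"job_id": "j3", "sources": ["hr.people"]},
]
usage = usage_by_dataset(jobs, {"sales", "finance"})
```

Here `sales` is counted for two executions and `finance` for one, while the job that only reads `hr.people` does not contribute.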

Incidents

Associated table and pipeline issues affecting reliability.

Costs

Aggregated compute and storage costs. To see more details about the compute costs of this product, click the Compute costs panel to go to the Pipeline Costs page.
