DataHub Releases

Summary

| Version | Release Date | Links |
| --- | --- | --- |
| v0.10.1 | 2023-03-23 | Release Notes, View on GitHub |
| v0.10.0 | 2023-02-07 | Release Notes, View on GitHub |
| v0.9.6.1 | 2023-01-31 | Release Notes, View on GitHub |
| v0.9.6 | 2023-01-13 | Release Notes, View on GitHub |
| v0.9.5 | 2022-12-23 | View on GitHub |
| v0.9.4 | 2022-12-20 | View on GitHub |
| v0.9.3 | 2022-11-30 | View on GitHub |
| v0.9.2 | 2022-11-04 | View on GitHub |
| v0.9.1 | 2022-10-31 | View on GitHub |
| v0.9.0 | 2022-10-11 | View on GitHub |
| v0.8.45 | 2022-09-23 | View on GitHub |
| v0.8.44 | 2022-09-01 | View on GitHub |
| v0.8.43 | 2022-08-09 | View on GitHub |
| v0.8.42 | 2022-08-03 | View on GitHub |
| v0.8.41 | 2022-07-15 | View on GitHub |
| v0.8.40 | 2022-06-30 | View on GitHub |
| v0.8.39 | 2022-06-24 | View on GitHub |
| v0.8.38 | 2022-06-09 | View on GitHub |
| v0.8.37 | 2022-06-09 | View on GitHub |
| v0.8.36 | 2022-06-02 | View on GitHub |
| v0.8.35 | 2022-05-18 | View on GitHub |
| v0.8.34 | 2022-05-04 | View on GitHub |
| v0.8.33 | 2022-04-15 | View on GitHub |
| v0.8.32 | 2022-04-04 | View on GitHub |
| v0.8.31 | 2022-03-17 | View on GitHub |
| v0.8.30 | 2022-03-17 | View on GitHub |
| v0.8.29 | 2022-03-10 | View on GitHub |
| v0.8.28 | 2022-03-07 | View on GitHub |

DataHub v0.10.1

Released on 2023-03-23 by @aditya-radhakrishnan.

Known Issues

CLI

  • BigQuery: Table- and column-level profiling is broken due to a bad assumption introduced in this version. Please use a different version if you rely on the BigQuery profiling feature.

ElasticSearch

Clusters running Elasticsearch 7.9 and below are no longer supported as of this release, due to the lack of case sensitivity support in term queries.
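
A quick pre-upgrade check can make this cutoff concrete. The following is a minimal sketch, assuming you already have the cluster's version string at hand (for example from the cluster's info endpoint). The function names are hypothetical and are not part of DataHub, and the comparison only makes sense for Elasticsearch version numbers.

```python
import re
from typing import Tuple

# Per the note above: Elasticsearch 7.9 and below are not supported.
HIGHEST_UNSUPPORTED = (7, 9)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '7.10.2'."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported_cluster(version: str) -> bool:
    """Return True if the version is newer than the 7.9 cutoff."""
    return parse_major_minor(version) > HIGHEST_UNSUPPORTED


if __name__ == "__main__":
    for candidate in ("7.9.3", "7.10.0", "7.17.9"):
        print(candidate, is_supported_cluster(candidate))
```

Comparing (major, minor) tuples avoids the classic string-comparison pitfall where "7.10" sorts before "7.9".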

Release Highlights

User Experience
  • The Queries Tab has a new look - supports manually adding and annotating queries directly from the UI, making it easier to share trusted SQL logic with others
  • Glossary Terms now show "Contained by" and "Inherited by" relationships
  • Resolved issues with Download to CSV for large volumes of entities
  • Update to the Analytics tab - view Monthly Active users to keep track of DataHub adoption and activity within your organization
  • Ongoing UI optimizations focused on improving the navigation experience

Metadata Ingestion

BigQuery

  • Improvements to memory usage during metadata extraction
  • Ingestion now captures Dataset Labels
  • Emit cross-project usage

PowerBI

  • Support for Platform Instance, to uniquely identify multiple instances of the same platform
  • Support for PowerBI <> (Redshift, BigQuery) lineage extraction
  • Extract entity descriptions

Miscellaneous

  • DataHub Integrations Catalog to quickly filter and search for supported integrations
  • Kafka Connect - support for stateful ingestion & lowercasing URNs
  • Snowflake: improvements to memory usage during metadata extraction
  • Postgres: supports estimated row counts during profiling
  • Fix to dbt ingestion to address inconsistent upper/lower casing
  • S3 ingestion now supports path_specs of multiple buckets in the same recipe
  • Looker: Upgrade Looker API from 3.1 to 4.0
  • Great Expectations: support for lowercasing URNs
  • Tableau: Support for Project Path & Containers; ingestion more resilient to timeout exceptions

Developer Experience

Miscellaneous

  • Neo4j support for lineage time filter
  • Metadata model support for JSON schemas stored in Files, Directories, and Kafka Schema Registry
  • Timeline API now supports Glossary Terms
  • Improvements to startup time for DataHub CLI

API Docs & Guides

  • Table of contents to understand DataHub APIs at a glance
  • Guides:
    • Add Tags, Terms, Owners to entities
    • Create datasets
    • Manage Lineage

Search Improvements

  • searchAcrossEntities/Lineage improvements
  • support searchAfter
  • advanced query, identity autocomplete, exact match weight

Breaking Changes

Lineage Graph UI

  • Previously, DataHub would display nodes in the Lineage Visualization even for URNs that do not technically exist (that is, URNs with no aspects defined). Those nodes are now filtered out, which means that lineage that previously appeared may no longer be shown in the Lineage Graph. This change was made to improve the correctness and consistency of the DataHub experience. If you have feedback, feel free to reach out to the core team. To restore a missing node, produce a "DatasetKey" aspect for each URN that you'd like to show in the Lineage Graph.
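
To illustrate what a "DatasetKey" carries, here is a rough sketch that builds a dataset URN and the matching key fields using only the standard library. The field names (`platform`, `name`, `origin`) reflect my understanding of the DatasetKey model and should be checked against the metadata model docs. The helper names and the sample platform and table are hypothetical, and actually emitting the aspect to DataHub is not shown.

```python
import json


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in the usual urn:li:dataset:(...) form."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def dataset_key_fields(platform: str, name: str, env: str = "PROD") -> dict:
    """Assumed shape of a DatasetKey aspect; verify against the model docs."""
    return {
        "platform": f"urn:li:dataPlatform:{platform}",
        "name": name,
        "origin": env,
    }


if __name__ == "__main__":
    # Hypothetical dataset used only for illustration.
    print(dataset_urn("hive", "analytics.orders"))
    print(json.dumps(dataset_key_fields("hive", "analytics.orders"), indent=2))
```

The key fields mirror the three parts of the URN, which is why a URN that appears only as a lineage edge endpoint can be given a key aspect without any further metadata.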

What's Changed

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.0...v0.10.1

v0.9.4

Release Highlights

KNOWN ISSUES

There is a known issue with OIDC which we will address in a fast-follow release. If you use OIDC, please wait for v0.9.5 to upgrade.

User Experience

Manual Lineage is LIVE! You can now add and remove lineage between entities in the Lineage Visualization screen, making it easier than ever to manage the complex relationships between your data resources.

Our new Views feature makes it easy to create curated sets of Entities within DataHub. This is a great way to start to isolate the entities that matter most, and provide your DataHub end-users with a streamlined view of the assets that are relevant to their use cases.

In-App Product Tours are here! When logging into DataHub and/or visiting a new page type for the first time, new users will be prompted with a helpful walkthrough of core functionality to get them familiar with the platform. We’ll continue to add modules as we roll out new features!

Automatically send updates to Slack and/or Microsoft Teams when changes are made within DataHub by leveraging our new Slack and Teams Actions

Metadata Ingestion

  • We’re continuing to improve the user experience for UI-based ingestion for the following sources:
    • dbt Cloud
    • Databricks Unity Catalog
    • MySQL
    • Trino/Presto
    • MSSQL
    • MariaDB
  • If you’re just getting started with UI-based Ingestion, check out our new BigQuery & Snowflake guides
  • Stateful ingestion is now supported for Iceberg (thanks for the contrib, @cccs-Dustin!) and LDAP (thanks for the contrib, @bda618!)
  • Speaking of Stateful Ingestion, we’re taking some steps to simplify the code behind Sta

What's Changed

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.3...v0.9.4

v0.9.3

Release Highlights

User Experience

  • Column Level Lineage Impact Analysis is live! Read more about it here
  • You can now sort Dataset field names alphabetically - this is super handy for finding columns within wide datasets that may not have an easy-to-follow order by default
  • Miscellaneous UX improvements:
    • “Explore All” button on home page, making it easier to jump into the search experience
    • “Share” button on entity pages
    • [Community Contribution] You can now assign the same user as different owner types - thanks for the contrib, @rtekal!

Metadata Ingestion

  • Snowflake Automated PII Classification is here! We’re eager for feedback on the utility of this feature - check out this guide, take it for a spin, and let us know what you think!
  • We’ve simplified the configs required to add stateful ingestion to an ingestion source - check out the updated docs here
  • Speaking of stateful ingestion, it’s now supported with:
    • Looker & LookML ingestion sources
    • [Community Contribution] Container-level ingestion – thanks for the contrib, @wangsaisai!

Developer Experience

  • NEW! dbt Cloud ingestion is ready for ya - check out the module details here
  • [Community Contribution] For those of you deploying DataHub with Neo4j, we now support Lineage Impact analysis via Neo4j multihop functionality. Thanks for the contrib, @djordje-mijatovic!
  • We’ve loosened our SQLAlchemy dependencies to support Airflow 2.3+

What's Changed

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.2...v0.9.3

v0.9.2

Release Highlights

User Experience
Metadata Ingestion

New ingestion source PowerBI Report Server

DataHub Docs Site

What's Changed

DataHub v0.10.0

Released on 2023-02-07 by @david-leifker.

Release Highlights

Potential Downtime

This release introduces substantial improvements to search functionality which require reindexing indices.

During the reindexing:

  • a system-update job will set indices to read-only and create a backup/clone of each index
  • new components will be prevented from start-up until the reindex completes
  • Helm deployments will go into read-only mode and new ingestion runs will fail

This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, expect it to take 1 hour for every 2.3 million entities. After the reindex is complete, check your ingestion runs and re-run any that did not complete.
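
The rule of thumb above can be turned into a quick back-of-the-envelope calculation. This is only a sketch: the function name and the 5-minute floor are assumptions made for illustration, and actual reindex times will depend on your cluster and hardware.

```python
# Rough estimate from the release note: about 1 hour per 2.3 million entities.
ENTITIES_PER_HOUR = 2_300_000
# Assumed lower bound, taken from the "5 minutes to multiple hours" range.
MIN_MINUTES = 5.0


def estimate_reindex_minutes(entity_count: int) -> float:
    """Return a rough reindex duration in minutes for the given entity count."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    minutes = entity_count / ENTITIES_PER_HOUR * 60
    return max(MIN_MINUTES, minutes)


if __name__ == "__main__":
    for count in (100_000, 2_300_000, 10_000_000):
        print(f"{count:>12,} entities: ~{estimate_reindex_minutes(count):.0f} min")
```

An estimate like this is mainly useful for deciding how long a maintenance window to announce before starting the upgrade.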

If you are deploying containers yourself

If you're deploying the Docker containers yourself (without Helm or Docker-Compose Quickstart), then you'll need to ensure that you first run the acryldata/datahub-upgrade docker image (v0.10.0 tag) with the following environment variables enabled.

Then, run the container with this command:

docker run acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate

For the full set of environment variables required, check out the default docker.env provided for Docker Compose deployments.

This will run the required reindex against your Elasticsearch instance, after which other DataHub components should start correctly. If you do not run the datahub-upgrade container successfully, other components in the stack will fail to start correctly.

User Experience

We have some really exciting improvements to the DataHub user experience in this release!

Improved documentation editor, contributed by @ngamanda and the Grab Team. This work provides a much more intuitive documentation editing experience within the UI, providing “what you see is what you get” formatting & removing the need for markdown expertise.

Additionally, you can easily:

  • Add links to other entities/users within DataHub
  • embed and resize tables & images
  • toggle between font sizes and formats
  • embed syntax-highlighted code blocks

<img src="https://user-images.githubusercontent.com/114954101/217367791-3d392ae4-f422-4188-8d3c-768cb7c120ea.png" width="800">

Filter lineage graphs based on time windows: You can now easily see the full lineage graph of an entity at a specific point in time. This makes it much easier to understand how interdependencies have evolved over time and to troubleshoot data issues in the past.

Improvements in Search: As noted above, we have rolled out substantial improvements to Search functionality, making it easier than ever for end users to find the entities that matter most. This release includes:

  • Stemming & Synonyms
  • Search by full or partial URN
  • Autocomplete improvements
  • Quoted search analyzer for exact & prefix match

Metadata Ingestion

Here are some of the most notable ingestion-related improvements:

  • Redshift: You can now extract lineage information from unload queries – thanks for the contrib, @mmmeeedddsss
  • PowerBI: Ingestion now maps Workspaces to DataHub Containers – thanks for the contrib, @looppi
  • BigQuery: You can now extract lineage metadata from the Catalog API – thanks for the contrib, @PatrickfBraz
  • Glue: Ingestion now uses table name as the human-readable name – thanks for the contrib, @danielcmessias

Developer Experience
  • This release introduces DataHub Lite - a new experimental lightweight implementation of DataHub. It is intended to enable local developer tooling use-cases such as simple access to metadata for scripts and other tools. DataHub Lite is compatible with the DataHub metadata format and all the ingestion connectors that DataHub supports. Check out the docs here.

Breaking Changes

[#7103](https://github.com/datahub-project/datahub/pull/7103) This should only impact users who have configured explicit non-default names for DataHub's Kafka topics. The environment variables used to configure Kafka topics in the kafka-setup Docker image have been updated to be in line with other DataHub components; for more information, see our docs on Configuring Kafka in DataHub. They were previously suffixed with _TOPIC, whereas the correct suffix is now _TOPIC_NAME. This change should not affect any user who is using the default Kafka topic names.
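
If you maintain your own environment files for kafka-setup, the suffix change can be applied mechanically. The sketch below assumes the configuration is available as a dictionary. The variable names in the example are hypothetical placeholders, so consult the Configuring Kafka in DataHub docs for the actual names before relying on it.

```python
from typing import Dict

OLD_SUFFIX = "_TOPIC"
NEW_SUFFIX = "_TOPIC_NAME"


def migrate_topic_env(env: Dict[str, str]) -> Dict[str, str]:
    """Rename keys ending in _TOPIC to end in _TOPIC_NAME; leave others as-is."""
    migrated: Dict[str, str] = {}
    for key, value in env.items():
        if key.endswith(OLD_SUFFIX):
            # Keys that already end in _TOPIC_NAME do not match this branch.
            migrated[key + "_NAME"] = value
        else:
            migrated[key] = value
    return migrated


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    old_env = {
        "EXAMPLE_EVENT_TOPIC": "my_custom_events_v1",
        "KAFKA_BOOTSTRAP_SERVER": "broker:9092",
    }
    print(migrate_topic_env(old_env))
```

Because the function leaves keys with the new suffix untouched, running it twice on the same configuration gives the same result.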

What's Changed

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.6...v0.10.0

DataHub v0.9.6.1

Released on 2023-01-31 by @david-leifker.

Release Highlights

Please upgrade from 0.9.6 ASAP to avoid ongoing issues creating and using secrets.

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set GRAPH_SERVICE_DIFF_MODE_ENABLED=false for GMS (or for the MAE Consumer in standalone mode).

Bug fix for secrets encryption

  • Prevents decryption errors for existing secrets
  • Affects reading ingestion secrets created with a previous release
  • Affects native user password validation

What's Changed

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.6...v0.9.6.1

DataHub v0.9.6

Released on 2023-01-13 by @maggiehays.


⚠️ This Release has been patched. Please upgrade to 0.9.6.1 ⚠️

As of January 19th, 2023, 0.9.6.1 is the official release build and should be used instead of 0.9.6. Upgrade to 0.9.6.1 when possible to avoid issues creating and using secrets.

Release Highlights

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set GRAPH_SERVICE_DIFF_MODE_ENABLED=false for GMS (or for the MAE Consumer in standalone mode).

User Experience
  • We now support embedding Dashboards, Charts, and Datasets. This allows us to do things like directly embed Looker / Tableau / Mode / Redash Looks, Dashboards, Explores into the Dataset pages themselves.

  • [Experimental] You can now customize the number of queries displayed on the Query tab of a Dataset entity

  • Improved error messaging for bulk editing via the UI

Metadata Ingestion
  • Update to data profiling to allow configurable number of sample values to be returned
  • Postgres ingestion now supports emitting lineage edges for Views - shoutout to @LucasRoesler for the contribution!
  • Snowflake ingestion now supports extracting tags - shoutout to @frsann for the contribution!
  • Vertica ingestion now supports projections and lineage - thanks for the contribution, @vishalkSimplify!
  • Glue ingestion now emits an s3 lineage edge when data was written with an s3a/s3n client - thanks for the contribution, @danielli-ziprecruiter!
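
As background for the Glue item above: `s3a://` and `s3n://` are Hadoop client schemes that address the same underlying S3 objects as `s3://`. The following is an illustrative sketch of normalizing such locations to a single scheme. It is not the Glue source's actual code, and the function name is hypothetical.

```python
# Hadoop-style schemes that point at the same objects as plain s3://.
HADOOP_S3_PREFIXES = ("s3a://", "s3n://")


def normalize_s3_uri(uri: str) -> str:
    """Rewrite s3a:// and s3n:// URIs to s3://; return other URIs unchanged."""
    for prefix in HADOOP_S3_PREFIXES:
        if uri.startswith(prefix):
            return "s3://" + uri[len(prefix):]
    return uri


if __name__ == "__main__":
    # Hypothetical bucket and path, for illustration only.
    for location in ("s3a://my-bucket/warehouse/orders", "s3://my-bucket/raw"):
        print(location, "->", normalize_s3_uri(location))
```

With the scheme normalized, two locations that differ only in the client used to write them resolve to the same S3 path.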
Developer Experience
  • Fixes quickstart/docker compose issues for M1 machines
  • Improvements in reliability and performance of the Restli Service endpoints for ingestion:
    • Scale Restli Service thread pool based on CPU
    • Add retry (exp backoff) to Restli Entity Client
    • MCE no longer relies on GMS for Restli service
    • Converted Restli Service from standalone servlet to Spring injectable
    • Docker build externalized (significantly faster on M1, with build times under 7 minutes)
    • Frontend asset generation refactor (causing tests to fail intermittently)
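
For readers unfamiliar with the "retry (exp backoff)" item above, the general technique looks roughly like this. It is a generic Python sketch of exponential backoff, not the Restli Entity Client's implementation (which is Java). The function names, default delays, and cap are all assumptions chosen for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5,
                   factor: float = 2.0, cap: float = 30.0) -> List[float]:
    """Delay before each retry: base, base*factor, base*factor^2, ... capped."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def call_with_retry(func: Callable[[], T], retries: int = 3, base: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying up to `retries` times with exponential backoff."""
    delays = backoff_delays(retries, base=base)
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt == retries:
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(backoff_delays(4, base=1.0))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a no-op can be substituted for real waiting.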

What's Changed

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.5...v0.9.6

DataHub v0.9.5

Released on 2022-12-23 by @jjoyce0510.

View the release notes for DataHub v0.9.5 on GitHub.

[Known Issues] DataHub v0.9.4

Released on 2022-12-20 by @maggiehays.

View the release notes for [Known Issues] DataHub v0.9.4 on GitHub.

DataHub v0.9.3

Released on 2022-11-30 by @maggiehays.

View the release notes for DataHub v0.9.3 on GitHub.

DataHub v0.9.2

Released on 2022-11-04 by @maggiehays.

View the release notes for DataHub v0.9.2 on GitHub.

DataHub v0.9.1

Released on 2022-10-31 by @maggiehays.

View the release notes for DataHub v0.9.1 on GitHub.

DataHub v0.9.0

Released on 2022-10-11 by @szalai1.

View the release notes for DataHub v0.9.0 on GitHub.

DataHub v0.8.45

Released on 2022-09-23 by @gabe-lyons.

View the release notes for DataHub v0.8.45 on GitHub.

DataHub v0.8.44

Released on 2022-09-01 by @jjoyce0510.

View the release notes for DataHub v0.8.44 on GitHub.

DataHub v0.8.43

Released on 2022-08-09 by @maggiehays.

View the release notes for DataHub v0.8.43 on GitHub.

v0.8.42

Released on 2022-08-03 by @gabe-lyons.

View the release notes for v0.8.42 on GitHub.

v0.8.41

Released on 2022-07-15 by @anshbansal.

View the release notes for v0.8.41 on GitHub.

v0.8.40

Released on 2022-06-30 by @gabe-lyons.

View the release notes for v0.8.40 on GitHub.

v0.8.39

Released on 2022-06-24 by @maggiehays.

View the release notes for v0.8.39 on GitHub.

[!] DataHub v0.8.38

Released on 2022-06-09 by @jjoyce0510.

View the release notes for [!] DataHub v0.8.38 on GitHub.

[!] DataHub v0.8.37

Released on 2022-06-09 by @jjoyce0510.

View the release notes for [!] DataHub v0.8.37 on GitHub.

DataHub V0.8.36

Released on 2022-06-02 by @treff7es.

View the release notes for DataHub V0.8.36 on GitHub.

[!] DataHub v0.8.35

Released on 2022-05-18 by @dexter-mh-lee.

View the release notes for [!] DataHub v0.8.35 on GitHub.

v0.8.34

Released on 2022-05-04 by @maggiehays.

View the release notes for v0.8.34 on GitHub.

DataHub v0.8.33

Released on 2022-04-15 by @dexter-mh-lee.

View the release notes for DataHub v0.8.33 on GitHub.

DataHub v0.8.32

Released on 2022-04-04 by @dexter-mh-lee.

View the release notes for DataHub v0.8.32 on GitHub.

DataHub v0.8.31

Released on 2022-03-17 by @dexter-mh-lee.

View the release notes for DataHub v0.8.31 on GitHub.

Datahub v0.8.30

Released on 2022-03-17 by @rslanka.

View the release notes for Datahub v0.8.30 on GitHub.

DataHub v0.8.29

Released on 2022-03-10 by @shirshanka.

View the release notes for DataHub v0.8.29 on GitHub.

DataHub v0.8.28

Released on 2022-03-07 by @shirshanka.

View the release notes for DataHub v0.8.28 on GitHub.

DataHub Release Candidate v0.8.28 (rc1)

Released on 2022-03-05 by @shirshanka.

View the release notes for DataHub Release Candidate v0.8.28 (rc1) on GitHub.

Release Candidate v0.8.28

Released on 2022-03-05 by @shirshanka.

View the release notes for Release Candidate v0.8.28 on GitHub.