During a recent client meeting we discussed Azure Purview, Microsoft’s new data governance SaaS offering that recently went GA. The client plans to use Purview to store data lineage information. When I asked about a data catalogue, the client said they planned to use their existing CMDB (Configuration Management Database) to store details of data assets.
This threw me for a moment, but reflecting on it I suppose that at a high level:
- CMDB is an ITIL term for a database used to store information about an organisation’s assets, their relationships, and dependencies.
- A data catalogue is used to store information about an organisation’s data assets, their relationships, and dependencies (lineage).
At a glance, one might think a CMDB and a data catalogue provide the same thing. In reality, they are used for very different purposes.
CMDB:
- Plays a crucial role in configuration management. The impact of a change to the IT environment can only be assessed from accurate and current information in the CMDB.
- Its aim is to help IT teams get better at resolving problems and responding to incidents. For example, you may be more efficient in tracking a malware outbreak if you know the version of each desktop operating system and can see who is affected.

Data catalogue:
- Plays a crucial role in business intelligence, application development, data science or any task where data assets are needed.
- Its aim is to help business and technical users quickly and easily find the data assets they need.
- Allows the data community to annotate data assets with descriptive metadata, unlocking information that might otherwise only be held in the minds of experts.
- Often provides automated data discovery and tagging of data assets.
- Links data assets to the expert or team responsible for the data.
- May include details of data assets that are external to the organisation.
In an ideal world, some level of integration between the data catalogue and the CMDB would be beneficial. For example, an IT team upgrading a software application might need to use the CMDB to determine which data assets will be impacted by the change. Data lineage information from the data catalogue, made available in the CMDB, would then provide details of the downstream data assets that are indirectly impacted.
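To make the impact analysis idea concrete, here is a minimal sketch in Python. It assumes two hypothetical extracts: a CMDB mapping from applications to the data assets they write, and a lineage mapping from each data asset to its downstream assets. All application names, asset names and the `impacted_assets` function are invented for illustration; this is not the Purview API or any particular CMDB product's API.

```python
from collections import deque

# Hypothetical CMDB extract: application -> data assets it writes directly.
cmdb_app_to_assets = {
    "billing-app": ["billing_db.invoices"],
    "crm-app": ["crm_db.customers"],
}

# Hypothetical lineage extract from the data catalogue:
# data asset -> assets derived from it (its immediate downstream assets).
lineage_downstream = {
    "billing_db.invoices": ["warehouse.fact_invoices"],
    "crm_db.customers": ["warehouse.dim_customer"],
    "warehouse.fact_invoices": ["reports.monthly_revenue"],
    "warehouse.dim_customer": ["reports.monthly_revenue"],
}


def impacted_assets(app):
    """Return (direct, indirect) data assets affected by a change to `app`."""
    # Directly impacted assets come from the CMDB relationship.
    direct = list(cmdb_app_to_assets.get(app, []))
    seen = set(direct)
    indirect = []
    # Breadth-first walk over the lineage graph to find downstream assets.
    queue = deque(direct)
    while queue:
        asset = queue.popleft()
        for child in lineage_downstream.get(asset, []):
            if child not in seen:
                seen.add(child)
                indirect.append(child)
                queue.append(child)
    return direct, indirect


direct, indirect = impacted_assets("billing-app")
print("Directly impacted:", direct)
print("Indirectly impacted:", indirect)
```

With this sample data, upgrading `billing-app` directly touches `billing_db.invoices`, and the lineage walk adds `warehouse.fact_invoices` and `reports.monthly_revenue` as indirectly impacted. Neither tool could give that answer alone: the CMDB knows which application owns which asset, and the catalogue knows what is derived from it.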
In summary, CMDBs and data catalogues are very different, but there is a clear need for both tools: a data catalogue helps support an organisational data culture, while a CMDB allows IT teams to evaluate the effect of changes to the IT environment on related data assets.