The concept of industry data models (IDMs) appears to be gaining traction once again, amid the buzz around Microsoft’s acquisition of ADRM Software, a leading developer of IDMs.
So what are IDMs? They are logical data models that are widely applied in an industry (e.g. finance, travel, automotive). Typically these are relational models, conveyed in an entity relationship diagram (ERD).
IDMs are based on the assumption that companies within the same industry utilise the same fundamental data and types of information.
To provide a sense of the size and complexity of an IDM, ADRM Software states that a typical IDM may consist of around 300 entities and 2,500 attributes.
When I first looked at an IDM, my impression was that it resembled a normalised data warehouse. I was immediately reminded of Bill Inmon, who developed a data warehouse modelling approach using a normalised schema back in the 90s. These days, the Inmon “Corporate Information Factory” architecture has largely been dropped in favour of dimensional models, advocated by Ralph Kimball. In my 20 years of working in the data warehousing field, I’ve never met anyone who has seen a data warehouse implementation following Inmon’s principles… until recently. One of the main disadvantages of Inmon’s heavily normalised format is that it locks the data in difficult-to-query structures, often requiring complex ETL to load, complex SQL to query and a steep learning curve for business users.
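To make the "complex SQL to query" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables (`country`, `city`, `customer`, `orders`, `dim_customer`, `fact_sales`) and their data are invented for illustration and are not taken from any ADRM or Inmon model. The same question, sales by country, needs three joins against the normalised tables but only one against the star schema.

```python
import sqlite3

# All table names and rows are hypothetical, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised (3NF-style) tables: each fact is stored in exactly one place.
CREATE TABLE country  (country_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city     (city_id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

INSERT INTO country  VALUES (1, 'UK'), (2, 'France');
INSERT INTO city     VALUES (1, 'London', 1), (2, 'Leeds', 1), (3, 'Paris', 2);
INSERT INTO customer VALUES (1, 'Acme', 1), (2, 'Bolt', 2), (3, 'Cie', 3);
INSERT INTO orders   VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 3, 75.0), (4, 1, 25.0);

-- Dimensional equivalent: one denormalised dimension plus one fact table.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT, country TEXT);
CREATE TABLE fact_sales   (customer_key INTEGER, amount REAL);

-- A toy "ETL" step that flattens the normalised tables into the star schema.
INSERT INTO dim_customer
SELECT cu.customer_id, cu.name, ci.name, co.name
FROM customer cu
JOIN city ci    ON ci.city_id = cu.city_id
JOIN country co ON co.country_id = ci.country_id;

INSERT INTO fact_sales SELECT customer_id, amount FROM orders;
""")

# Sales by country against the normalised model: three joins.
normalised_sql = """
SELECT co.name, SUM(o.amount)
FROM orders o
JOIN customer cu ON cu.customer_id = o.customer_id
JOIN city ci     ON ci.city_id = cu.city_id
JOIN country co  ON co.country_id = ci.country_id
GROUP BY co.name
ORDER BY co.name
"""

# The same question against the dimensional model: one join.
dimensional_sql = """
SELECT d.country, SUM(f.amount)
FROM fact_sales f
JOIN dim_customer d ON d.customer_key = f.customer_key
GROUP BY d.country
ORDER BY d.country
"""

normalised_result = conn.execute(normalised_sql).fetchall()
dimensional_result = conn.execute(dimensional_sql).fetchall()
print(normalised_result)
print(dimensional_result)
```

Both queries return the same totals. With only four tables the difference is small, but an IDM with hundreds of entities can be expected to multiply the joins a business user has to understand.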
It turns out that my initial impression of IDMs was along the right lines, and this is confirmed in the Best-practice Vertical-industry Enterprise Data Model white paper available on the ADRM website. The white paper was written in 2004 and refers to Bill Inmon’s work on industry data models that took place in the 1990s.
The vertical-by-vertical nature of IDMs also reminded me of Kimball’s book, The Data Warehouse Toolkit, which takes a series of industry patterns (financial services, education, telecommunications, etc.) to demonstrate the concepts of dimensional modelling. That’s where the similarity ends though; IDMs are typically 3NF or OLTP models that would be better considered in the design of a custom application, rather than a dimensional data warehouse.
So, why did Microsoft acquire ADRM Software? Microsoft have already made some moves towards standard data models with the Common Data Model (CDM). CDM is not an industry standard model in the sense of being aligned to a specific vertical (although there are a few vertical models available at the time of writing, e.g. healthcare), but it does attempt to provide a standard schema, allowing data to be brought together in a canonical model. My guess is that Microsoft will use ADRM Software’s industry-vertical data models to develop industry-vertical aligned flavours of Dynamics 365 applications, each with an industry standard data model defined in the CDM (see Industry Solution Accelerators).
What benefits might standard industry data models bring? Anyone experienced in dimensional data modelling would admit there are similar data warehousing patterns within each industry-vertical, but these are Kimball denormalised dimensional models, not industry models in 3NF. I see standard industry data models bringing the following benefits:
- The use of standard data definitions, entities and formats may help regulated industries with compliance
- In certain industries where competitors exchange data, the exchange will be more straightforward if they all align to a standard model
- ETL solutions and dimensional models derived from the IDM can be shared across organisations within the same industry
- The models may be used as a basis and quickly modified for a specific organisation, rather than starting from scratch
- A best-practice, industry-vertical model may allow organisations to reduce costs and become more efficient
- Data that is common to other industries (e.g. customer, geography, accounts) will be defined consistently, supporting an organisation’s entry into a new industry without impacting the existing data model
- The models may be used to support the merger or acquisition of two businesses within the same industry
There may be some benefits, but what are industry data models for? ADRM Software’s white paper describes the purpose of an enterprise model as “ultimately to provide an architectural foundation upon which to build applications”. Organisations rarely build LOB applications from scratch (where industry data models might provide a benefit), preferring to buy third-party vendor applications and configure them for their needs. Complex ETL would be required to transform an organisation’s data into the normalised data model, which may not provide mechanisms to capture data changing over time, such as the concept of slowly changing dimensions in a dimensional model.
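For readers unfamiliar with slowly changing dimensions, the following is a minimal sketch of the common "Type 2" variant, in which each change closes the current row and adds a new one, so history is preserved. The `CustomerVersion` class and the `apply_change` and `city_as_of` helpers are hypothetical names invented for this illustration; real implementations usually live in SQL or an ETL tool and also use surrogate keys.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    """One row of a hypothetical Type 2 customer dimension."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[CustomerVersion], customer_id: int,
                 new_city: str, change_date: date) -> None:
    """Close the current version (if any) and append a new one."""
    current = next(
        (v for v in history
         if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return  # nothing changed, so keep the existing row
        current.valid_to = change_date
    history.append(CustomerVersion(customer_id, new_city, change_date))


def city_as_of(history: List[CustomerVersion], customer_id: int,
               as_of: date) -> Optional[str]:
    """Return the city that was valid for the customer on a given date."""
    for v in history:
        if (v.customer_id == customer_id
                and v.valid_from <= as_of
                and (v.valid_to is None or as_of < v.valid_to)):
            return v.city
    return None


history: List[CustomerVersion] = []
apply_change(history, 1, "London", date(2019, 1, 1))
apply_change(history, 1, "Leeds", date(2020, 6, 1))

print(city_as_of(history, 1, date(2019, 12, 31)))  # London
print(city_as_of(history, 1, date(2020, 6, 1)))    # Leeds
```

A normalised model that simply overwrites the customer's city would lose the London row, which is the kind of gap the paragraph above is concerned about.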
It’s interesting to see logical data models as an artefact that would be maintained by enterprise architecture in Svyatoslav Kotusev’s EA on a page. IDMs would provide a solid, best practice basis for an EA’s logical data model and associated enterprise data governance. Such a reference model would help application groups with their design and/or configuration, particularly when combined with a data catalogue providing detailed definitions of each entity and attribute.
I’m curious to see where IDMs and Microsoft’s acquisition of ADRM Software will lead. I am a natural sceptic, wondering if IDMs are an anachronism from the late 20th century, when mainframes acted as an organisation’s central data repository. At a time when schemaless data stores are gaining much traction with development teams, and data intelligence solutions are moving away from enforcing schemas and referential integrity (data lakes and ‘schema on read’), perhaps the resurgence of industry standard data models is an expected reaction.