What is a data mart?

A data mart is a simple form of data warehouse, a data mart for a single subject domain, such as: sales, finance, marketing, etc. The data source of the data mart can be an operating system or a data warehouse. According to different data sources, it can be divided into independent data marts and dependent data marts. Generally, what we call data marts is based on data warehouses, that is, dependent data marts.

What is the difference between a data mart and a data warehouse?

Unlike data marts, data warehouses contain multiple subject domains, and an enterprise generally has one data warehouse, so it is also called a data center. An enterprise-level data warehouse needs to integrate data from many operational systems, clean and process it in the data warehouse, and then provide it to the data mart for business analysis and decision-making. It can be seen that the complexity of the data mart and the data to be processed are smaller than that of the data warehouse, so it is easier to build and maintain.

Why build a data mart?

The fast-moving, competitive business world has an insatiable need for timely, accurate information. Some IT experts believe that the inevitable result is the creation of data marts. Other experts question the effort and cost of users and customers. After all, can’t the same information be obtained directly from legacy systems and On Line Transaction Processing (OLTP) systems through ad hoc reports? In the business intelligence group at EDS, we are often asked this question. Experience has given us many opportunities to educate our peers and customers about the value of this useful technology.

So why would an organization build a data mart? While OLTP and legacy systems hold valuable information, extracting meaningful information from these systems can be difficult and slow. And these systems, while generally supporting reporting with predefined operations, often fail to support an organization’s needs for historical, federated, “smart” or easily accessible information. Because data is spread across many tables across systems and platforms, and is often “dirty”, containing inconsistent and invalid values, making it difficult to analyze. Data marts will combine data sources from different systems to meet business information needs.

Implemented effectively, data marts will provide quick and easy access to simple information and systematic and historical views. A well-designed data mart will:

Publish the information needed by specific groups of users without being constrained by the high demands and operational crises of the source system.
Supports access to nonvolatile business information. (Non-volatile information is updated at predetermined intervals and is not affected by ongoing updates to the OLTP system.)
Reconcile information from multiple operating systems in the organization, such as accounting, sales, inventory and customer management, as well as industry data outside the organization.
Provides cleansed data by defaulting to valid values, making values consistent across systems, and adding descriptions to make sense of implicit code.
Provides reasonable query response times for ad hoc analysis and predefined reports (different from the tuning needs required in OLTP systems).
Reduce the demands on legacy systems and OLTP applications by offering a choice of those applications to get more of the information you need.

The Construction Principle of Data Mart

Data marts are mainly used for analytical applications at the departmental level. The data mart generally adopts the dimensional model design method, and the data structure uses the star schema or the snowflake schema. Designing a dimensional model starts with identifying dimension tables, fact tables, and data granularity levels. The next step is to define relationships between fact and dimension tables using primary and foreign keys.

The primary key in the data mart is preferably a system-generated auto-incrementing single-column numeric surrogate key. After the model is established, the design ETL step extracts the data of the operational source system, and after data cleaning and data transformation, it is finally loaded into the dimension table and fact table in the data mart.

At present, the data structure of enterprises is generally handled by the data warehouse + data mart (DW+DM) structure. DW does the standardization of data cleaning and processing. The data mart is mainly responsible for multi-dimensional analysis, fixed reports and other data services.