As discussed in earlier article Data Warehouse Defined, we saw that the data in data warehouse is pulled from various OLTP or legacy systems, staged and then loaded in a Data Warehouse. Broadly below elements are involved when you create a data warehouse.
Typically in any organization the data is stored in various databases, usually divided up by the systems. There may be data for marketing, sales, payroll, engineering, etc. These systems might be legacy/mainframe systems or relational database systems.
The data coming from various source systems is first kept in a staging area. The staging area is used to clean, transform, combine, de-duplicate, household, archive, and to prepare source data for use in data warehouse. The data coming from source system is kept as it is in this area. This need not be based on relational terminology. Sometimes managers of the data are comfortable with normalized set of data. In these cases, normalized structure of the data staging storage is certainly acceptable. Also, staging area doesnt provide querying/presentation services.
Once the data is in staging area, it is cleansed, transformed and then sent to Data warehouse. You may or may not have ODS before transferring data to Data Warehouse.
The data in Data Warehouse has to be easily manipulated in order to answer the business questions from management and other users. This is accomplished by connecting the data to fast and easy-to-use tools known as Online Analytical Processing (OLAP) tools. OLAP tools can be thought of as super high-speed forklifts that have knowledge of the warehouse and the operators built into them in order to allow ordinary people off the street to jump in and quickly find products by asking English-like questions. Within the OLAP server, data is reorganized to meet the reporting and analysis requirements of the business, including:
- Exception reporting
- Ad-hoc analysis
- Actual vs. budget reporting
- Data mining (looking for trends or anomalies in the data)
In order to process business queries at high speed, answers to common questions are pre-processed in some OLAP servers, resulting in exceptional query responses at the cost of having an OLAP database that may be several times bigger than the data warehouse itself.
Data mart is a logical subset of complete data warehouse. It is often viewed as the restriction of data warehouse to a single business process or to a group of related business processes targeted toward a particular business group. For example an organization may have a data mart for Sales or Inventory.
Data Warehouse Tools
|ETL Tools||Informatica PowerCenter, IBM Data Stage, Ab Initio, BO Data Integrator, Clover.ETL, DMExpress, Microsoft SSIS, Oracle Warehouse Builder, LogiXML, Pentaho, Talend Open Studio|
|OLAP Server||Hyperion Essbase, IBM DB2 OLAP Server, Microsoft SQL Server OLAP Services, Oracle Express Server, Palo OLAP Server, Seagate HOLOS, SAS/MDDB|
|OLAP Tools||SAP Business Objects, Hyperion, IBM Cognos, MicroStrategy, Microsoft SSAS, Microsoft SSRS, Oracle Express Suite, Pentaho, Brio Query|
|Data Warehouse||Informix, IBM DB2 UDB, Microsoft SQL Server, Netezza, Oracle, RedBricks, Sybase, Teradata|