Consider an example where business analyst uses the systems containing operational data (the data that runs the daily transactions of your business). Analysts can use information about, which products were sold in which regions at what time of the year, to look for anomalies or to project future sales. However, there are several problems if analyst accesses operational data directly:
- He might not have the expertise to query the operational database. For example, querying IMS databases requires an application program that uses a specialized type of data manipulation language. In general, those programmers who have the expertise to query the operational database have a full-time job in maintaining the database and its applications.
- Performance is critical for many operational databases, such as databases for a bank. The system cannot handle users making ad-hoc queries.
- The operational data generally is not in the best format to be used for reporting queries
Data warehousing solves these problems. In data warehousing, you create stores of informational data data that is extracted from the operational data and then transformed for reporting and decision making. For example, a data warehousing tool might copy all the sales data from the operational database, perform calculations to summarize the data, and write it to a new database. End-users can query the new database (the warehouse) without impacting the operational databases.
- The purpose of data warehouse is to store data consistently across the organization and to make the organizational information accessible.
- It is adaptive and resilient source of information. When new data is added to the Data Warehouse, the existing data and technologies are not disrupted. The design of separate data marts that make up the data warehouse must be distributed and incremental. Anything else is a compromise.
- The data warehouse not only controls the access to the data, but gives its owners great visibility into the uses and abuses of the data, even after it has left the data warehouse.
- Data warehouse is the foundation for decision-making.
Data Warehouse Definition
"Data Warehouse is a Time Variant, Subject Oriented, Non-Volatile and Integrated collection of data, that enables easier decision making for management professionals."
Let us now understand what Time Variant, Subject Oriented, Non-Volatile and Integrated really means. In a typical Sales & Marketing firm there would be entities like Sales Rep, Customer, Products, Regions and Various Date Attributes. Typical OLTP systems are process oriented where the data is stored in highly normalized form. Refer to below diagram to understand how OLTP (Process Oriented) System gets a different view when moved to a DWH (Subject Oriented) System.
In typical OLTP systems data is very transactional. Every point in time there are a lot of data manipulation happening like inserts, updates, deletes along with data retrieval. While in DWH systems data is mass loaded at scheduled time and access is done through a reporting layer. This is why a DWH system is called as non-volatile as compared to OLTP systems.
As it is mentioned earlier as the current data in transactional storage (OLTP systems) is changing every fraction of seconds, the same is also reflected in the reports generated out of such transactional storage. Two reports generated in a single day but at two different points in time, the data in the reports would vary. As DWH systems store historic data and it's updated daily (for example) two reports generated at different points in time in a single day would be same.