Data Warehousing

Based on an article in 3X/400 Systems Management by Michael Corcoran.

roduction data, the information created and maintained by our OLTP (On Line Transaction Processing) Systems is rarely in a format structured to enable rapid decision making. The answer is often a data warehouse, a relational repository of information, designed especially for end-user query, analysis and reporting. The basic concept of a data warehouse is to separate real-time data used by production applications from historical data needed by knowledge workers. By creating "metadata", or data about data, an effective warehouse transforms raw transaction data into a form more meaningful to analytical tools. A data warehouse is not something you can buy. It is something you build. Each one has a unique architecture derived from individual needs of the organization. Building a data warehouse is often the first step towards client/server. It involves much more than just copying data from a production database into another and turning the end-users loose with a PC-based query tool. Significant problems are inherent in making the raw data of operational systems available to these tools.

Two Basic Approaches

There are two basic ways to approach data warehouse development. First is the top-down approach which builds an enterprise wide warehouse all at once. This takes tremendous planning and enormous resource commitment to implement, but the benefit is a new information infrastructure that enhances the entire organization.

The second approach is from the bottom up. This involves delivering a small departmental warehouse implementation that evolves to include other departments. Most IS organizations favor the bottom-up approach to allow them to show results more quickly and demonstrate success early to justify continuation of the project.

The key to bottom-up implementation is first establishing an architecture that allows the data warehouse to grow and evolve. The simplest warehouse involves a two-tiered homogeneous architecture - DB2 for OS/400 data is periodically extracted to another DB2 for OS/400 database on the host. Then a data access product reads, analyzes and re-posts on the warehouse data from a front-end graphical client. This is fairly easy to implement with the new V3R1 ODBC drivers into DB2. This approach is often bolstered by a hefty amount of custom RPG code to provide distributed data integrity. ODBC offers a point-to-point solution for quick and dirty relational database transfers.

Long Term Implementation

A better long-term approach is to implement a three-tiered architecture that uses a separate middle-ware layer for data access, connectivity and integrity. First tier is the AS/400 where the production application runs. Second tier is the departmental server close to the warehouse users (this is often a server model AS/400). The third tier is the desktop. Three tiers allows for effective division of labor - the host CPU remains devoted to real-time production applications, the server is optimized for relational query processing and reporting, while the desktop manages resource-intensive personal computational tasks and graphical presentation of the data. This desktop layer is often called the business intelligence layer. It needs to shield the users from where that data resides, how it is stored and the type of tool delivering it to a screen.

Replication is a technique often used to provide the warehouse with its own copy of the data. Data in the warehouse is not always completely synchronized with production versions. Intelligent replication allows data to be transformed and enhanced. It provides the warehouse with its own copy of the data and can minimize network traffic. Data warehousing assists in off-loading query processing from production database.

Developing a data warehouse is no different from any other IT project. It requires careful planning, requirements definition, design, prototyping and implementation. A bottom-up development allows developers to approach complex warehouse projects one piece at a time to iteratively deliver results that will quickly prove their worth.

Most teams don't accurately estimate the real costs associated with a comprehensive warehouse. It is not just hardware and software but on-going support requirements and the costs of reacting to constantly changing business processes. Careful thought to the initial warehouse architecture is the best way to contain costs and provide future growth. The challenge to warehouse designers is to present the data in a user friendly manner that works with the user's query tools. It is necessary to have clean data and translate it into a decision support format for user viewing. T < G