Based on an article in 3X/400 Systems Management by Michael Corcoran.
roduction data, the information created and maintained by our OLTP (On Line Transaction
Processing) Systems is rarely in a format structured to enable
rapid decision making. The answer is often a data warehouse,
a relational repository of information, designed especially for
end-user query, analysis and reporting. The basic concept of a data warehouse is to separate
real-time data used by production applications from historical
data needed by knowledge workers. By creating "metadata",
or data about data, an effective warehouse transforms raw transaction
data into a form more meaningful to analytical tools. A data
warehouse is not something you can buy. It is something you build.
Each one has a unique architecture derived from individual needs
of the organization. Building a data warehouse is often the first
step towards client/server. It involves much more than just copying
data from a production database into another and turning the end-users
loose with a PC-based query tool. Significant problems are inherent
in making the raw data of operational systems available to these
tools.
There are two basic ways to approach data warehouse development. First is the top-down approach which builds an enterprise wide warehouse all at once. This takes tremendous planning and enormous resource commitment to implement, but the benefit is a new information infrastructure that enhances the entire organization.
The second approach is from the bottom up. This involves delivering a small departmental warehouse implementation that evolves to include other departments. Most IS organizations favor the bottom-up approach to allow them to show results more quickly and demonstrate success early to justify continuation of the project.
The key to bottom-up implementation is first establishing
an architecture that allows the data warehouse to grow and evolve.
The simplest warehouse involves a two-tiered homogeneous architecture
- DB2 for OS/400 data is periodically extracted to another DB2
for OS/400 database on the host. Then a data access product reads,
analyzes and re-posts on the warehouse data from a front-end graphical
client. This is fairly easy to implement with the new V3R1 ODBC
drivers into DB2. This approach is often bolstered by a hefty
amount of custom RPG code to provide distributed data integrity.
ODBC offers a point-to-point solution for quick and dirty relational
database transfers.
A better long-term approach is to implement a three-tiered architecture that uses a separate middle-ware layer for data access, connectivity and integrity. First tier is the AS/400 where the production application runs. Second tier is the departmental server close to the warehouse users (this is often a server model AS/400). The third tier is the desktop. Three tiers allows for effective division of labor - the host CPU remains devoted to real-time production applications, the server is optimized for relational query processing and reporting, while the desktop manages resource-intensive personal computational tasks and graphical presentation of the data. This desktop layer is often called the business intelligence layer. It needs to shield the users from where that data resides, how it is stored and the type of tool delivering it to a screen.
Replication is a technique often used to provide the warehouse with its own copy of the data. Data in the warehouse is not always completely synchronized with production versions. Intelligent replication allows data to be transformed and enhanced. It provides the warehouse with its own copy of the data and can minimize network traffic. Data warehousing assists in off-loading query processing from production database.
Developing a data warehouse is no different from any other IT project. It requires careful planning, requirements definition, design, prototyping and implementation. A bottom-up development allows developers to approach complex warehouse projects one piece at a time to iteratively deliver results that will quickly prove their worth.
Most teams don't accurately estimate the real costs
associated with a comprehensive warehouse. It is not just hardware
and software but on-going support requirements and the costs of
reacting to constantly changing business processes. Careful thought
to the initial warehouse architecture is the best way to contain
costs and provide future growth. The challenge to warehouse designers
is to present the data in a user friendly manner that works with
the user's query tools. It is necessary to have clean data and
translate it into a decision support format for user viewing.
T
<
G