TORONTO USERS GROUP for Midrange Systems
TUG eServer magazine November 1995: Vol. 11, No. 2

Data Warehousing 101

By Jacqueline Jansen

Data Warehousing Wouldn't it be nice to work in an industry that didn't expect you to learn new buzzwords every month. Here are some of the current buzzwords in our industry: Data warData Warehousing 101ehousing, data mining, data marts, metadata, multi-dimensional databases, OLAP (On-line Analytical Processing), and Business Intelligence. This article will try to bring meaning and some level of priority to these terms.

Many companies today suffer from an excess of raw information. What they want is to unlock the information and obtain knowledge. The purpose behind data warehousing is to take a company's production data and transform it into the knowledge required to give the company a competitive advantage. In the past this type of decision making knowledge was required at the executive and upper management levels. We had EIS systems for this. Today with rightsizing and empowerment many levels of an organization require access to this type of knowledge. We now have data warehouses which are systems used exclusively for decision support and data mining activities.

The information that most companies have stored in their computer systems today is called operational or production data. Operational data is updated frequently. An update will overlay the previous entry. "What are the year to date sales for product A?" is easy to answer. You wouldn't normally have the information about the sales and profit of product A over the last three years. Operational data is optimized to give very good response time for transaction processing. Operational data is stored in a normalized form in a relational database.

These characteristics are quite different from the characteristics of a Business Intelligence solution. Business Intelligence is a relatively new term that encompasses decision support systems and executive information systems. The term 'Business Intelligence solutions' is used to describe tools that manipulate and analyze data and present the results. Data stored in a format that makes end user queries, reports, and analysis easier is called informational data. As opposed to operational data, informational data is infrequently updated. The data is optimized for decision support applications and is usually read-only.

A data warehouse consists of informational data. Informational data provides a base for the powerful analytical and decision making tools that exist today. Transforming operational data into informational data is one of the major tasks involved in creating a data warehouse. Some of the data will need to be enhanced by providing summaries rather than including all the detailed data. This summarized data gives tremendous performance benefits to executive queries. A time element is often added to the data. This gives you the capability of doing trend analysis which is missing from an operational database. A history of changed data is a rich source of information. Data from external sources such as Dow Jones, market analysis firms, and D&B may be added into your informational data. Data is organized into business subjects. The datamart or data warehouse should be organized the way the business professional using the data looks at it. A datamart is simply a departmental data warehouse. Some of the Decision Support packages listed in the AS/400 Client Series include IBM's Visualizer, BrioQuery and Impromptu from Cognos.

Data Mining tools are used to identify patterns, relationships or trends in your data without the end user having to formulate specific queries. IBM's Neural Network Utility for OS/400 can be used to develop data mining applications. Some examples where data mining tools are used include: analysis of consumer buying patterns in the retail industry, risk assessment in the finance industry, and fraud detection in the insurance industry.

Metadata is data about the data. End users need to understand what data is available to them, what this data represents and how current the data is.

Data Warehousing

Relational databases are two dimensional: rows and columns or fields and records. Multi- dimensional data structures or databases allow a user to view the data over more than two axis. For example you might ask to see sales by product by region over time.

Multi-dimensional databases are implemented in various ways: over a relational model using multiple indexes and arrays, or using an on-line analytical processing (OLAP) product. On the AS/400 DB2 for OS/400 is the relational database and business partner products such as Hoskyns Group PLC's AMIS, SAMACS' MIT/400 and Silvon Software, Inc.'s Data Tracker provide OLAP products.

IBM has announced a Data Warehousing category in the Client Series program which will include data transformation, metadata, database parallelism and data mining packages. The AS/400 product family includes the Advanced Server models which are specifically tuned for the types of workloads that are prevalent in a decision support and data warehousing environment. Data warehouses tend to grow and will often contain more data than a company's production system. DB2 Multisystem for OS/400 will allow up to 32 systems to be interconnected to allow the processing power and storage capacity of all the systems to be utilized. From a database perspective, these multiple AS/400s appear as a single, very large, and very powerful system with storage capacities up to 16,000 gigabytes (16 terabytes). The DB2 Symmetric Multiprocessing for OS/400 feature can be used to bring every processor available into resolving a query. The AS/400 system and DB2 for OS/400 will provide a scalable solution for data warehousing that spans from the smallest datamart to the largest enterprise data warehouse.

There are many excellent Business Intelligence tools on the market today and the majority of these are ODBC enabled. The AS/400 provides excellent ODBC support thus making your companies information available to your end users through a variety of interfaces.

The AS/400 with DB2 for OS/400 in conjunction with the many products available from other software vendors, provides an industry-leading information warehousing architecture from which data warehouses are easy to implement.

Making sense of the buzzwords is the first step to developing a data warehouse. Happy Warehousing!

T < G

Jackie Jansen is an AS/400 IT Specialist and Consulting SE, providing national technical and marketing support for the AS/400 in Canada. Jackie frequently speaks at AS/400 Technical Conferences and User Group meetings.