TORONTO USERS GROUP
for Midrange Systems
TUG eServer magazine November
1995: Vol. 11, No. 2
Data Warehousing
101
Wouldn't
it be nice to work in an industry that didn't expect you to learn
new buzzwords every month. Here are some of the current buzzwords
in our industry: Data warData Warehousing 101ehousing, data mining, data marts, metadata,
multi-dimensional databases, OLAP (On-line Analytical Processing),
and Business Intelligence. This article will try to bring meaning
and some level of priority to these terms.
Many companies today suffer from an excess of raw information. What they want is to unlock the information and obtain knowledge. The purpose behind data warehousing is to take a company's production data and transform it into the knowledge required to give the company a competitive advantage. In the past this type of decision making knowledge was required at the executive and upper management levels. We had EIS systems for this. Today with rightsizing and empowerment many levels of an organization require access to this type of knowledge. We now have data warehouses which are systems used exclusively for decision support and data mining activities.
The information that most companies
have stored in their computer systems today is called operational
or production data. Operational data is updated frequently.
An update will overlay the previous entry. "What are the
year to date sales for product A?" is easy to answer. You
wouldn't normally have the information about the sales and profit
of product A over the last three years. Operational data is optimized
to give very good response time for transaction processing. Operational
data is stored in a normalized form in a relational database.
These characteristics are quite different
from the characteristics of a Business Intelligence solution.
Business Intelligence is a relatively new term that encompasses
decision support systems and executive information systems. The
term 'Business Intelligence solutions' is used to describe tools
that manipulate and analyze data and present the results. Data
stored in a format that makes end user queries, reports, and analysis
easier is called informational data. As opposed to operational
data, informational data is infrequently updated. The data is
optimized for decision support applications and is usually read-only.
A data warehouse consists of informational
data. Informational data provides a base for the powerful analytical
and decision making tools that exist today. Transforming operational
data into informational data is one of the major tasks involved
in creating a data warehouse. Some of the data will need to be
enhanced by providing summaries rather than including all the
detailed data. This summarized data gives tremendous performance
benefits to executive queries. A time element is often added to
the data. This gives you the capability of doing trend analysis
which is missing from an operational database. A history of changed
data is a rich source of information. Data from external sources
such as Dow Jones, market analysis firms, and D&B may be added
into your informational data. Data is organized into business
subjects. The datamart or data warehouse should be organized the
way the business professional using the data looks at it. A datamart
is simply a departmental data warehouse. Some of the Decision
Support packages listed in the AS/400 Client Series include IBM's
Visualizer, BrioQuery and Impromptu from Cognos.
Data Mining tools are used to identify
patterns, relationships or trends in your data without the end
user having to formulate specific queries. IBM's Neural Network
Utility for OS/400 can be used to develop data mining applications.
Some examples where data mining tools are used include: analysis
of consumer buying patterns in the retail industry, risk assessment
in the finance industry, and fraud detection in the insurance
industry.
Metadata is data about the data. End
users need to understand what data is available to them, what
this data represents and how current the data is.
Relational databases are two dimensional: rows and columns or fields and records. Multi- dimensional data structures or databases allow a user to view the data over more than two axis. For example you might ask to see sales by product by region over time.
Multi-dimensional databases are implemented in various
ways: over a relational model using multiple indexes and arrays,
or using an on-line analytical processing (OLAP) product. On
the AS/400 DB2 for OS/400 is the relational database and business
partner products such as Hoskyns Group PLC's AMIS, SAMACS' MIT/400
and Silvon Software, Inc.'s Data Tracker provide OLAP products.
IBM has announced a Data Warehousing
category in the Client Series program which will include data
transformation, metadata, database parallelism and data mining
packages. The AS/400 product family includes the Advanced Server
models which are specifically tuned for the types of workloads
that are prevalent in a decision support and data warehousing
environment. Data warehouses tend to grow and will often contain
more data than a company's production system. DB2 Multisystem
for OS/400 will allow up to 32 systems to be interconnected to
allow the processing power and storage capacity of all the systems
to be utilized. From a database perspective, these multiple AS/400s
appear as a single, very large, and very powerful system with
storage capacities up to 16,000 gigabytes (16 terabytes). The
DB2 Symmetric Multiprocessing for OS/400 feature can be used to
bring every processor available into resolving a query. The AS/400
system and DB2 for OS/400 will provide a scalable solution for
data warehousing that spans from the smallest datamart to the
largest enterprise data warehouse.
There are many excellent Business Intelligence
tools on the market today and the majority of these are ODBC enabled.
The AS/400 provides excellent ODBC support thus making your companies
information available to your end users through a variety of interfaces.
The AS/400 with DB2 for OS/400 in conjunction
with the many products available from other software vendors,
provides an industry-leading information warehousing architecture
from which data warehouses are easy to implement.
Making sense of the buzzwords is the first step to
developing a data warehouse. Happy Warehousing!
T
<
G
Jackie Jansen is an AS/400 IT Specialist and Consulting
SE, providing national technical and marketing support for the
AS/400 in Canada. Jackie frequently speaks at AS/400 Technical
Conferences and User Group meetings.