The Power of Transformational Data Replication
By Nigel Stokes
One of the top
items on every AS/400 shop's wish list is the ability to easily
manage and manipulate the huge volume of data associated with
today's corporate operations. Data needed for query or data warehousing
often must be transformed and moved to different machines. Tools
that enhance the ability to manage data assets can add real value.
As data volumes increase, batch windows are
evaporating. More and more sites are finding it difficult to secure
the batch windows required for geographically dispersed or
global operations. In addition, service-level expectations are rising.
Customers are demanding better data availability through a range
of access tools and database platforms. Pressing business issues,
including mixed applications and operating systems, distributed
databases, support for legacy applications and network availability,
are driving most IS shops toward a wide range of innovative tools
and techniques.
Simply acquiring more hardware is not the answer.
Instead, better middleware solutions are needed to help I/S get
more out of existing AS/400 installations. One of these solutions
is intelligent transformational data replication. In many environments,
replication has proved highly effective in helping I/S manage and
control data efficiently. Although the concept has typically been
associated with data warehousing, replication has shown benefits
in all areas of IS management, including application development,
database administration, workload management and operations.
Intelligent replication involves automatically propagating
or copying data from one system to another, or from one file set
to another within a single computer. Depending on the replication
algorithm, data can be copied selectively and filtered based on
defined criteria such as column or row characteristics. During
the intelligent replication process, data can also be transformed
and enhanced; for example, specific coded field values can be
re-mapped to new value sets.
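To make row filtering, column selection and code re-mapping concrete, here is a minimal Python sketch that treats each record as a dictionary. The field names (CUSNO, REGION, STATUS, BALANCE), the selection rule and the STATUS_MAP code table are all hypothetical, chosen only for illustration; commercial replication products define such rules in their own configuration rather than in user code.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical code table: legacy one-letter status codes mapped to a new value set.
STATUS_MAP = {"A": "ACTIVE", "I": "INACTIVE", "H": "ON_HOLD"}

def replicate(
    rows: Iterable[Record],
    row_filter: Callable[[Record], bool],
    columns: List[str],
    value_maps: Dict[str, Dict[object, object]],
) -> List[Record]:
    """Copy only the selected rows and columns, re-mapping coded values."""
    target: List[Record] = []
    for row in rows:
        if not row_filter(row):                    # row selection
            continue
        out = {col: row[col] for col in columns}   # column selection
        for col, mapping in value_maps.items():    # value re-mapping
            if col in out:
                # Codes missing from the table pass through unchanged.
                out[col] = mapping.get(out[col], out[col])
        target.append(out)
    return target

source = [
    {"CUSNO": 1001, "REGION": "EAST", "STATUS": "A", "BALANCE": 250.0},
    {"CUSNO": 1002, "REGION": "WEST", "STATUS": "I", "BALANCE": 0.0},
    {"CUSNO": 1003, "REGION": "EAST", "STATUS": "H", "BALANCE": 75.5},
]

east = replicate(
    source,
    row_filter=lambda r: r["REGION"] == "EAST",
    columns=["CUSNO", "STATUS"],
    value_maps={"STATUS": STATUS_MAP},
)
print(east)
# [{'CUSNO': 1001, 'STATUS': 'ACTIVE'}, {'CUSNO': 1003, 'STATUS': 'ON_HOLD'}]
```

The source rows are never modified; only the target copy carries the filtered and translated values, which is what lets existing applications keep running untouched.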
Software vendors, including IBM, Sybase, Oracle,
Ingres and DataMirror, are now delivering replication products.
Microsoft will soon be on the bandwagon with a replication product
for NT platforms.
Replication has real potential in many areas and
provides a simpler way of implementing a distributed
DBMS. Replication gives users their own local copies of
data, supporting localised processing and, in some cases, reduced
network traffic.
Benefits of Replication
Improved Data Access
Replication improves data access by creating query
databases that can be maintained in real time on separate AS/400
computers or as secondary copies on a single machine. These duplicate
query databases can even reside on completely different relational
databases, or in operating system environments running on non-AS/400 technology.
Duplicate databases help off-load query processing
from production systems. For decision support systems (DSS) and
executive information systems (EIS), this can be a major boon,
because these systems frequently run resource-intensive global
queries and summaries. Up-to-date duplicate copies of data
can be maintained on remote computers, improving data availability
at distributed sites. AS/400 data access tools such as Query,
BRIO Query or any of a long list of third-party access tools can then
be used to perform data analysis.
Development Aid
Intelligent replication algorithms allow for the
transformation and conversion of data without making changes to
existing AS/400 application code. Data can be enhanced by value
translation algorithms that substitute data content, change field
length and type, and even create new derived field values.
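The following Python sketch illustrates the three kinds of value translation just listed: a field length change, type changes (zoned numeric text to integer, a price held in cents to a decimal amount, a numeric YYMMDD date to a real date with zero mapped to NULL) and a derived field. The record layout, the field names and the century window (YY below 50 means 20YY) are hypothetical assumptions for illustration only, not the behaviour of any particular product.

```python
from datetime import date
from decimal import Decimal
from typing import Dict, Optional

def yymmdd_to_date(value: int) -> Optional[date]:
    """Convert a legacy numeric YYMMDD date; a zero date becomes None (NULL).

    Assumption: a fixed century window where YY < 50 means 20YY, else 19YY.
    """
    if value == 0:
        return None
    yy, mm, dd = value // 10000, (value // 100) % 100, value % 100
    century = 2000 if yy < 50 else 1900
    return date(century + yy, mm, dd)

def translate(legacy: Dict[str, object]) -> Dict[str, object]:
    """Map one hypothetical legacy order record to a new target layout."""
    qty = int(str(legacy["ORDQTY"]))               # type change: zoned text -> integer
    price = Decimal(int(legacy["UNITPRC"])) / 100  # type change: cents -> decimal amount
    return {
        # Length change: trailing blanks removed, name limited to 20 characters.
        "CUSTOMER_NAME": str(legacy["CUSNAM"]).rstrip()[:20],
        "ORDER_QTY": qty,
        "UNIT_PRICE": price,
        "ORDER_DATE": yymmdd_to_date(int(legacy["ORDDAT"])),
        "SHIP_DATE": yymmdd_to_date(int(legacy["SHPDAT"])),  # 0 -> NULL
        # Derived field that does not exist in the source record.
        "EXTENDED_PRICE": qty * price,
    }

row = translate({
    "CUSNAM": "ACME SUPPLY CO            ",
    "ORDQTY": "00012",
    "UNITPRC": 1995,
    "ORDDAT": 960415,
    "SHPDAT": 0,
})
print(row["ORDER_DATE"], row["SHIP_DATE"], row["EXTENDED_PRICE"])
# 1996-04-15 None 239.40
```

Using Decimal rather than floating point keeps monetary results exact, which matters when the target copy is used for reporting.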
Using replication can make it easy to supply accurate
and current production data to test environments. Even if the
database format changes between releases of new applications or
packages, replication can operate as an effective data bridge
between the old application and the new. Replication helps address
the problems of AS/400 release management and upgrades, and supports
parallel testing of old and new releases without double entry of data.
Nothing can test new application releases like the real-time flow of
a representative sample of production data. Finally, with AS/400
flat-file support, replication products can provide relational,
SQL-mode access to data generated by S/36-mode legacy applications.
Workload Management and Operations
Replication can enhance operations by extending AS/400
batch processing windows and improving data security. Fully synchronised
AS/400 databases can be maintained on one or more computers for
query, backup and reporting purposes.
With the appropriate operational controls, two-way
replication between AS/400s can make 24-hour, 7-day online operation
feasible, since it reduces the need to lock out online transactions
in order to complete batch processing tasks.
With two-way replication, the main AS/400 production
database is replicated continuously or on a net change basis to
a target database for batch processing. The target database can
then be used to run backups, standard reporting and even batch
data summarisation jobs without having to interrupt the operation
of the production system. While the target batch database is off-line,
on-line changes can be stored on a net change basis, and then
fully replicated when the target batch database is available again.
Once the target database has been updated with batch results, it
becomes the source for those changes, which are then replicated back
to the on-line version of the AS/400 database. To make this technique
work effectively, your replication algorithm must detect and avoid
recursive updates between the AS/400 on-line source and batch
target systems.
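One common way to avoid recursive updates is to tag every change with the system on which it originated and never re-queue a change that arrived by replication. The Python sketch below assumes that approach; the Node, Change and sync names are hypothetical, and pending changes are simply held in order in an outbox while the target is off-line. Real products typically capture changes from the database journal rather than from application calls.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict

@dataclass
class Change:
    key: str
    value: object
    origin: str  # system on which the change was originally made

@dataclass
class Node:
    name: str
    online: bool = True
    data: Dict[str, object] = field(default_factory=dict)
    outbox: Deque[Change] = field(default_factory=deque)  # changes not yet sent

    def local_update(self, key: str, value: object) -> None:
        """An update made by an application running on this system."""
        self.data[key] = value
        self.outbox.append(Change(key, value, origin=self.name))

    def apply_replicated(self, change: Change) -> None:
        """Apply a change received from the peer system.

        The change is deliberately NOT queued in this node's outbox,
        so it is never echoed back to the system it came from.
        """
        self.data[change.key] = change.value

def sync(src: Node, dst: Node) -> int:
    """Send src's pending changes to dst; they stay queued while dst is off-line."""
    if not dst.online:
        return 0
    sent = 0
    while src.outbox:
        change = src.outbox.popleft()
        if change.origin == dst.name:
            continue  # defensive check: never return a change to its origin
        dst.apply_replicated(change)
        sent += 1
    return sent

online, batch = Node("ONLINE"), Node("BATCH")
online.local_update("ORD1", "OPEN")
batch.online = False                    # batch database taken off-line
online.local_update("ORD2", "OPEN")
print(sync(online, batch))              # 0: changes held until batch returns
batch.online = True
print(sync(online, batch))              # 2
batch.local_update("ORD1", "INVOICED")  # result of a batch job
print(sync(batch, online))              # 1
print(sync(online, batch))              # 0: nothing echoed back
```

Without the origin rule, the batch result applied to the on-line database would be queued again and sent back to the batch system, and the two databases would keep exchanging the same update.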
Conversion Aid
Replication can be a time-saving conversion aid and
utility when installing new applications or migrating to new computer
platforms. Existing AS/400 applications can be run in parallel,
while replication re-maps the data to the format needed for new
applications. This requires intelligent re-mapping features that
operate on the record layout and content of the records being
replicated. As a result, data from existing applications can be
used to populate and synchronise data for new applications on a
real-time or periodic basis.
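Re-mapping the record layout can be sketched in Python as three steps: parse the old fixed-width record, re-map its content, and lay it out in the new format. Both layouts below (field names, offsets and widths) are hypothetical examples, not the format of any real S/36 or AS/400 file.

```python
from typing import Dict, List, Tuple

# Hypothetical legacy flat-file layout: (field name, start offset, length).
OLD_LAYOUT: List[Tuple[str, int, int]] = [
    ("CUSNO", 0, 6),
    ("LNAME", 6, 15),
    ("FNAME", 21, 10),
    ("STATE", 31, 2),
]

# Hypothetical layout required by the new application: (field name, length).
NEW_LAYOUT: List[Tuple[str, int]] = [
    ("CUSTOMER_ID", 8),
    ("STATE_CODE", 2),
    ("FULL_NAME", 30),
]

def parse_old(record: str) -> Dict[str, str]:
    """Split a fixed-width legacy record into named, blank-trimmed fields."""
    return {name: record[start:start + length].strip()
            for name, start, length in OLD_LAYOUT}

def remap(old: Dict[str, str]) -> Dict[str, str]:
    """Re-map content: widen the key, reorder fields, merge the name parts."""
    return {
        "CUSTOMER_ID": old["CUSNO"].zfill(8),
        "STATE_CODE": old["STATE"],
        "FULL_NAME": f"{old['FNAME']} {old['LNAME']}".strip(),
    }

def format_new(fields: Dict[str, str]) -> str:
    """Lay out fields in the new order, padding or truncating to each width."""
    return "".join(fields[name][:length].ljust(length)
                   for name, length in NEW_LAYOUT)

old_record = "001234" + "SMITH".ljust(15) + "JANE".ljust(10) + "ON"
new_record = format_new(remap(parse_old(old_record)))
print(repr(new_record.rstrip()))  # '00001234ONJANE SMITH'
```

Because the old application keeps writing its original layout, the same three steps can be run on every replicated change to keep the new application's files synchronised during a parallel run.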
Platform and Database-Independent Replication
Data replication can be used to bridge AS/400 applications
to a wide variety of different computer platforms and proprietary
databases. Replication software often provides both an AS/400
source component and a target system component. Provided that
additional platform-independent target components are available,
replication can act as an effective bridge to new environments
and allow coexistence with existing systems.
Reduced Communication Costs
By transmitting only changes across a network (adds,
modifies and deletes as they occur), net change data replication
substantially reduces the cost of maintaining synchronisation
across duplicate AS/400 databases. Change synchronisation can
be scheduled for off-peak hours, further reducing costs.
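When changes are held for an off-peak transmission, several changes to the same record can be collapsed into a single net change before they cross the network. The Python sketch below assumes a simple change log of (operation, key, values) entries with hypothetical ADD, MODIFY and DELETE operation names; it is one possible consolidation scheme, not the algorithm of any specific product.

```python
from typing import Dict, List, Optional, Tuple

# (operation, record key, new field values); operation is ADD, MODIFY or DELETE.
Change = Tuple[str, str, Optional[dict]]

def net_changes(log: List[Change]) -> List[Change]:
    """Collapse a change log so that each key is transmitted at most once."""
    pending: Dict[str, Change] = {}  # dicts preserve insertion order
    for op, key, values in log:
        prev_op = pending[key][0] if key in pending else None
        if op == "ADD":
            # A row deleted and re-added in the same window still exists on the target.
            new_op = "MODIFY" if prev_op == "DELETE" else "ADD"
            pending[key] = (new_op, key, values)
        elif op == "MODIFY":
            # An add followed by modifies is still an add, carrying the latest values.
            new_op = "ADD" if prev_op == "ADD" else "MODIFY"
            pending[key] = (new_op, key, values)
        elif op == "DELETE":
            if prev_op == "ADD":
                del pending[key]  # added and deleted: the target never needs it
            else:
                pending[key] = ("DELETE", key, None)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return list(pending.values())

log: List[Change] = [
    ("ADD", "C1", {"bal": 10}),
    ("MODIFY", "C1", {"bal": 20}),
    ("MODIFY", "C2", {"bal": 5}),
    ("MODIFY", "C2", {"bal": 7}),
    ("ADD", "C3", {"bal": 1}),
    ("DELETE", "C3", None),
    ("DELETE", "C4", None),
]
net = net_changes(log)
print(len(log), "->", len(net))  # 7 -> 3
print(net)
# [('ADD', 'C1', {'bal': 20}), ('MODIFY', 'C2', {'bal': 7}), ('DELETE', 'C4', None)]
```

In this example seven logged changes become three transmitted ones; the saving grows with the number of times busy records are updated between synchronisations.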
"I See a Future Here"
With the major revitalisation and repositioning being
undertaken by IBM, it is certain that the AS/400 has a bright and
promising future. It provides customers with a robust, mid-range
platform that offers good price/performance characteristics and
low operating costs.
However, it's becoming clear that this future will
include new roles for the AS/400. It is being "opened up"
with improved TCP/IP and POSIX-compliant operating system services
for UNIX application vendors. The new role of the AS/400 will
be increasingly flexible, requiring coexistence with heterogeneous
hardware and database platforms.
Replication will play a role in ensuring the future
of the AS/400 by allowing a greater degree of cross-platform
independence. Given its technically robust design and low-cost
operation, the AS/400 is suitable for new roles as a data warehouse,
data distributor and network server.
Replication also shows promise, not just as a solution
to short-term problems of opening up legacy applications and application
migration, but also for the practical implementation of distributed
DBMS and client/server computing.
Replication offers significant benefits to most AS/400
shops. It should be considered one of the new middleware tools
that can help with many day-to-day problems. As you step into this
technology, you will need to complete a comprehensive analysis of
what data you need and where you need it. As a rule of thumb,
it is important to keep it simple at first as you prove out the
tangible benefits of data replication technology.
What Is Intelligent Replication?
Replication is the process of propagating existing
databases to selected target databases residing on other systems,
or on the same computer. In its simplest form, this
might mean just copying data or transferring files from one
machine to another.
Intelligent data replication provides the ability
to select which segments of existing databases will be propagated,
and provides a variety of data enhancement features such as value
translations, column mappings and datatype conversions.
Typically, implementing replication involves choosing a
replication tool from one of a number of software vendors.
A good replication tool should meet the following criteria.
asynchronicity - It should be possible to choose the timing of
replication to coincide with off-peak periods or, if needed, to
support continuous replication. Replication jobs should be able to
run on a regular schedule or on an as-needed basis. Updates should
be applied asynchronously, without relying on two-phase commit
logic at the source database. Updates should not tie up critical
production systems, but should be applied as CPU cycles are available.
selectivity - Updates
can be distributed based on row and column selection. Rather than
copying an entire database, changes can be distributed on a net change
basis, record by record. To be even more selective, changes can
be distributed conditionally, based on critical column updates
or row selection criteria.
plasticity - Replication
is an excellent opportunity for transforming data. The replication
tool should provide the ability to apply a wide variety of data
type transformations, including NULL and date type transforms,
and definitions of derived result fields.
heterogeneity - Replication should make it possible to bridge computer platforms,
including network types, operating systems, databases and file
structures. In addition, the tool should provide database and
operating system independence and link legacy data to these new
environments.
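The selectivity criterion above, distributing a change only when a critical column is updated or the row meets the selection criteria, can be sketched in Python as a test applied to the before and after images of each captured change. The function name, the column names and the region rule are hypothetical; the sketch also assumes that a row moving into or out of the selected subset must always be sent so the target can add or remove it.

```python
from typing import Callable, Dict, FrozenSet, Optional

Row = Dict[str, object]

def should_distribute(
    before: Optional[Row],
    after: Optional[Row],
    critical: FrozenSet[str],
    row_filter: Callable[[Row], bool],
) -> bool:
    """Decide whether one captured change should be sent to the target."""
    if before is None and after is None:
        raise ValueError("a change needs a before image, an after image, or both")
    if before is None or after is None:
        # Add (no before image) or delete (no after image):
        # send it only if the row belongs to the selected subset.
        row = after if after is not None else before
        return row_filter(row)
    was_in, is_in = row_filter(before), row_filter(after)
    if was_in != is_in:
        return True   # the row entered or left the selected subset
    if not is_in:
        return False  # the row is of no interest to this target
    # Update inside the subset: send only if a critical column changed.
    return any(before.get(col) != after.get(col) for col in critical)

CRITICAL = frozenset({"STATUS", "CREDIT_LIMIT"})

def in_east(row: Row) -> bool:
    return row["REGION"] == "EAST"

before = {"CUSNO": 1001, "REGION": "EAST", "STATUS": "A",
          "CREDIT_LIMIT": 500, "LAST_CONTACT": 960101}

print(should_distribute(before, {**before, "LAST_CONTACT": 960415}, CRITICAL, in_east))  # False
print(should_distribute(before, {**before, "CREDIT_LIMIT": 1000}, CRITICAL, in_east))    # True
print(should_distribute(before, {**before, "REGION": "WEST"}, CRITICAL, in_east))        # True
```

Filtering at the source in this way means routine, non-critical updates never consume network capacity or target CPU cycles.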
Nigel Stokes, CEO, DataMirror