TORONTO USERS GROUP for Midrange Systems
TUG eServer magazine March 1996: Volume 11, Number 4

The Power of Transformational Data Replication

By Nigel Stokes

One of the top items on every AS/400 shop's wish list is the ability to easily manage and manipulate the huge volume of data associated with today's corporate operations. Data needed for query or data warehousing often must be transformed and moved to different machines. Tools that enhance the ability to manage data assets can add real value.

As data volumes increase, batch windows are evaporating. More and more sites find it difficult to secure the batch windows they need, especially for geographically dispersed or global operations. At the same time, service-level expectations are rising: customers are demanding better data availability through a range of access tools and database platforms. Pressing business issues, including mixed applications and operating systems, distributed databases, support for legacy applications and network availability, are driving most I/S shops toward a wide range of innovative tools and techniques.

Simply acquiring more hardware is not the answer. Instead, better middleware solutions are needed to help I/S get more out of existing AS/400 installations. One of these solutions is intelligent transformational data replication. In many environments, replication has proven highly effective in helping I/S manage and control data efficiently. Although the concept has typically been associated with data warehousing, replication has demonstrated benefits in all areas of I/S management, including application development, database administration, workload management and operations.

Intelligent replication involves automatically propagating or copying data from one system to another, or within one computer from one file set to another. Depending on the replication algorithm, data can be copied selectively and filtered based on defined criteria such as column or row characteristics. During the intelligent replication process, data can also be transformed and enhanced; for example, specific coded field values can be re-mapped to new value sets.
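To make row selection, column selection and value re-mapping concrete, here is a minimal sketch in Python. It assumes records arrive as simple name/value dictionaries; the function `replicate`, its parameters and the field names (CUSNO, REGION, STATUS, CRLIMIT) are all hypothetical illustrations, not the interface of any replication product.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical record representation: field name -> value.
Record = Dict[str, object]


def replicate(
    records: Iterable[Record],
    row_filter: Callable[[Record], bool],
    columns: List[str],
    value_maps: Dict[str, Dict[object, object]],
) -> List[Record]:
    """Copy only the selected rows and columns, re-mapping coded values."""
    target: List[Record] = []
    for rec in records:
        if not row_filter(rec):
            continue  # row selection: skip records that fail the criteria
        out: Record = {}
        for col in columns:  # column selection: copy only the listed fields
            value = rec[col]
            mapping = value_maps.get(col, {})
            # value re-mapping: codes not in the table pass through unchanged
            out[col] = mapping.get(value, value)
        target.append(out)
    return target


source = [
    {"CUSNO": 101, "REGION": "E", "STATUS": "A", "CRLIMIT": 5000},
    {"CUSNO": 102, "REGION": "W", "STATUS": "I", "CRLIMIT": 0},
    {"CUSNO": 103, "REGION": "W", "STATUS": "A", "CRLIMIT": 2500},
]
copied = replicate(
    source,
    row_filter=lambda r: r["STATUS"] == "A",  # active customers only
    columns=["CUSNO", "REGION"],              # STATUS and CRLIMIT are not copied
    value_maps={"REGION": {"E": "EAST", "W": "WEST"}},
)
print(copied)  # [{'CUSNO': 101, 'REGION': 'EAST'}, {'CUSNO': 103, 'REGION': 'WEST'}]
```

A real tool would apply the same rules to each captured change rather than to a whole file, but the three kinds of selection and transformation are the same.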

Software vendors, including IBM, Sybase, Oracle, Ingres and DataMirror are now delivering replication products. Microsoft will soon be on the bandwagon with a replication product for NT platforms.

Replication has real potential in many areas and provides a simplified alternative to implementing a distributed DBMS. Replication provides users with their own local copies of data, supporting localised processing and in some cases reducing network traffic.

Benefits of Replication

Improved Data Access

Replication improves data access by creating query databases that can be maintained in real time on separate AS/400 computers or as secondary copies on a single machine. These duplicate query databases can even reside on completely different relational databases or operating system environments running on non-AS/400 technology.

Duplicate databases help off-load query processing from production systems. For decision support systems (DSS) and executive information systems (EIS) this can be a major boon, because these systems frequently run resource-intensive global queries and summaries. Up-to-date duplicate copies of data can be maintained on remote computers, improving data availability at distributed sites. AS/400 data access tools such as Query, BRIO Query or a long list of third-party access tools can then be used to perform data analysis.

Development Aid

Intelligent replication algorithms allow for the transformation and conversion of data without making changes to existing AS/400 application code. Data can be enhanced by value translation algorithms which substitute data content, change field length and type and even create new derived field values.
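The three kinds of enhancement named above (value translation, changes of field length and type, and derived fields) can be illustrated with a short Python sketch. It assumes the legacy record arrives with every field as text; the code table, the field names and the 20-character target length are hypothetical choices made only for the example.

```python
from decimal import Decimal
from typing import Dict

# Hypothetical code table: one-character status codes -> descriptive values.
STATUS_NAMES = {"A": "ACTIVE", "I": "INACTIVE", "H": "ON HOLD"}


def transform(src: Dict[str, str]) -> Dict[str, object]:
    """Map one legacy record, whose fields all arrive as text, to a new layout.

    Assumed (hypothetical) source fields: CUSNAM (30-character name),
    STATUS (1-character code), QTY (digits) and PRICE (digits with two
    implied decimal places).
    """
    qty = int(src["QTY"])                         # type change: text -> integer
    price = Decimal(src["PRICE"]) / Decimal(100)  # text -> decimal, implied decimals
    return {
        # length change: trailing blanks removed, then cut to 20 characters
        "CUSTOMER_NAME": src["CUSNAM"].rstrip()[:20],
        # value translation: codes missing from the table become "UNKNOWN"
        "STATUS": STATUS_NAMES.get(src["STATUS"], "UNKNOWN"),
        "QTY": qty,
        "PRICE": price,
        # derived field that does not exist in the source record
        "EXT_AMOUNT": qty * price,
    }


legacy = {"CUSNAM": "ACME WIDGETS LTD".ljust(30), "STATUS": "A",
          "QTY": "00012", "PRICE": "000250"}
print(transform(legacy))
```

`Decimal` is used rather than floating point so that currency amounts keep exact decimal values, which matches how such fields are normally stored on the AS/400.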

Replication makes it easy to supply accurate and current production data to test environments. Even if the database format changes between releases of new applications or packages, replication can operate as an effective data bridge between the old application and the new. Replication helps address problems of AS/400 release management, upgrades and parallel testing of old and new releases without double entry of data. Nothing can test new application releases like the real-time flow of a representative sample of production data. Finally, with AS/400 flat-file support, replication products can provide relational SQL access to data generated by S/36-mode legacy applications.

Workload Management and Operations

Replication can enhance operations by extending AS/400 batch processing windows and improving data security. Fully synchronised AS/400 databases can be maintained on one or more computers for query, backup and reporting purposes.

With the appropriate operational controls, two-way replication between AS/400s can make 24-hour, 7-day online operation feasible, since it reduces the need to lock out online transactions while batch processing tasks complete.

With two-way replication, the main AS/400 production database is replicated continuously or on a net change basis to a target database for batch processing. The target database can then be used to run backups, standard reporting and even batch data summarisation jobs without interrupting the production system. While the target batch database is offline, online changes can be stored on a net change basis and then fully replicated when the target batch database is available again. Once the target database has been updated with batch results, it becomes the source for those changes, which are replicated back to the online version of the AS/400 database. To make this technique work effectively, your replication algorithm must detect and avoid recursive updates between the AS/400 online source and the batch target system.
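The following Python sketch models the two-way arrangement described above under simplifying assumptions: each system keeps a queue of pending changes while its partner is offline, and every change is tagged with the system where it originated. Changes received from the partner are applied but never queued again, which is one simple way (assumed here, not necessarily how any given product does it) to prevent recursive updates. The names `Change`, `Node` and `sync` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass(frozen=True)
class Change:
    key: str
    value: Optional[str]  # None stands for a delete
    origin: str           # system on which the change was first made


@dataclass
class Node:
    name: str
    data: Dict[str, str] = field(default_factory=dict)
    outbound: Deque[Change] = field(default_factory=deque)  # pending net changes
    online: bool = True

    def local_update(self, key: str, value: Optional[str]) -> None:
        """A change made by an application on this system: apply it locally
        and queue it for replication to the partner system."""
        self._apply(key, value)
        self.outbound.append(Change(key, value, origin=self.name))

    def receive(self, change: Change) -> None:
        """Apply a change replicated from the partner system.

        The change is applied but deliberately NOT queued again, so it can
        never be sent back to the system it came from."""
        if change.origin == self.name:
            return  # one of our own changes echoed back: ignore it
        self._apply(change.key, change.value)

    def _apply(self, key: str, value: Optional[str]) -> None:
        if value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = value


def sync(source: Node, target: Node) -> int:
    """Ship queued changes from source to target; while the target is
    offline the changes stay queued. Returns the number delivered."""
    if not target.online:
        return 0
    sent = 0
    while source.outbound:
        target.receive(source.outbound.popleft())
        sent += 1
    return sent


online = Node("ONLINE")
batch = Node("BATCH", online=False)  # batch copy is offline for backups
online.local_update("ORD1", "OPEN")
print(sync(online, batch))           # 0: the change waits in the queue
batch.online = True
print(sync(online, batch))           # 1: queued change now applied
batch.local_update("DAYTOTAL", "1")  # batch job writes a summary result
print(sync(batch, online))           # 1: summary replicated to the online copy
print(len(online.outbound))          # 0: nothing is echoed back to BATCH
```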

Conversion Aid

Replication can be a time-saving conversion aid and utility when installing new applications or migrating to new computer platforms. Existing AS/400 applications can continue to run in parallel while replication re-maps the data to the format needed by the new applications. This requires intelligent data re-mapping features, which process both the record layout and the content of the records being replicated. As a result, data from existing applications can be used to populate and synchronise the new applications' data on a real-time or periodic basis.
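Re-mapping a record layout can be sketched as cutting an old fixed-width record into named fields and renaming them for the new application. The layout, offsets and column names below are hypothetical, and the sketch assumes the record has already been converted to a text string; real AS/400 files are described externally (for example by DDS), are stored in EBCDIC and may contain packed-decimal fields, none of which this example handles.

```python
from typing import Dict, List, Tuple

# Hypothetical old (flat-file) layout: (field name, start offset, length).
OLD_LAYOUT: List[Tuple[str, int, int]] = [
    ("CUSNO", 0, 6),
    ("CUSNAM", 6, 25),
    ("PHONE", 31, 10),
]

# Hypothetical mapping from old field names to the new application's columns.
FIELD_MAP: Dict[str, str] = {
    "CUSNO": "customer_id",
    "CUSNAM": "customer_name",
    "PHONE": "phone",
}


def split_fields(raw: str, layout: List[Tuple[str, int, int]]) -> Dict[str, str]:
    """Cut a fixed-width record into named fields, dropping trailing blanks."""
    return {name: raw[start:start + length].rstrip() for name, start, length in layout}


def remap(raw: str) -> Dict[str, object]:
    """Re-map one old-format record to the new application's record format."""
    old = split_fields(raw, OLD_LAYOUT)
    new: Dict[str, object] = {FIELD_MAP[name]: value for name, value in old.items()}
    new["customer_id"] = int(old["CUSNO"])  # numeric text -> integer key
    return new


record = "000123" + "JONES HARDWARE".ljust(25) + "4165551234"
print(remap(record))
# {'customer_id': 123, 'customer_name': 'JONES HARDWARE', 'phone': '4165551234'}
```

Keeping the layout and the field mapping as data, rather than code, reflects the idea that re-mapping is configured per application instead of programmed into it.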

Platform and Database-Independent Replication

Data replication can be used to bridge AS/400 applications to a wide variety of different computer platforms and proprietary databases. Replication software often provides both an AS/400 source component and a target system component. Provided that additional platform-independent target components are available, replication can act as an effective bridge to new environments and allow coexistence with existing systems.

Reduced Communication Costs

By transmitting only changes (adds, modifies and deletes as they occur) across the network, net change data replication substantially reduces the cost of keeping duplicate AS/400 databases synchronised. Change synchronisation can be scheduled for off-peak hours, further reducing costs.
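When synchronisation is deferred to off-peak hours, several changes to the same record may accumulate in between. The Python sketch below shows one possible refinement, assumed here rather than described in the article: collapsing the captured adds, modifies and deletes into at most one net change per key before transmission. The tuple format and operation names are hypothetical, and the log is assumed to be well formed (no ADD for a key that already exists).

```python
from typing import Dict, List, Optional, Tuple

# One captured change: (operation, record key, new value); operation is
# "ADD", "MODIFY" or "DELETE", and value is None for deletes.
Change = Tuple[str, str, Optional[str]]


def net_changes(log: List[Change]) -> List[Change]:
    """Collapse the changes captured since the last synchronisation into at
    most one change per key, in order of each key's first change."""
    first_op: Dict[str, str] = {}
    final: Dict[str, Optional[str]] = {}
    for op, key, value in log:
        first_op.setdefault(key, op)
        final[key] = None if op == "DELETE" else value
    result: List[Change] = []
    for key, value in final.items():
        existed_before = first_op[key] != "ADD"  # key was already on the target
        if value is None:
            if existed_before:
                result.append(("DELETE", key, None))
            # added and deleted in the same interval: nothing to send
        elif existed_before:
            result.append(("MODIFY", key, value))
        else:
            result.append(("ADD", key, value))
    return result


log = [
    ("ADD", "C1", "a"), ("MODIFY", "C1", "b"),
    ("MODIFY", "C2", "x"), ("MODIFY", "C2", "y"),
    ("ADD", "C3", "t"), ("DELETE", "C3", None),
    ("DELETE", "C4", None),
]
print(net_changes(log))
# [('ADD', 'C1', 'b'), ('MODIFY', 'C2', 'y'), ('DELETE', 'C4', None)]
```

Here seven captured changes shrink to three transmitted ones, and record C3 never crosses the network at all.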

"I See a Future Here"

With the major revitalisation and repositioning being undertaken by IBM, it is certain that the AS/400 has a bright and promising future. It provides customers with a robust midrange platform that offers good price/performance characteristics and low operating costs.

However, it's becoming clear that this future will include new roles for the AS/400. It is being "opened up" with improved TCP/IP support and POSIX-compliant operating system services for UNIX application vendors. The new role of the AS/400 will be increasingly flexible, requiring coexistence with heterogeneous hardware and database platforms.

Replication will play a role in ensuring the future of the AS/400 by allowing a greater degree of cross-platform independence. Given its technically robust design and low-cost operation, the AS/400 is suitable for new roles as a data warehouse, data distributor and network server.

Replication also shows promise, not just as a solution to the short-term problems of opening up legacy applications and application migration, but also for the practical implementation of distributed DBMS and client/server computing.

Replication offers significant benefits to most AS/400 shops. It should be considered one of the new middleware tools that can help with many day-to-day problems. As you step into this technology, you will need to complete a comprehensive analysis of what data you need and where you need it. As a rule of thumb, keep it simple at first while you prove out the tangible benefits of data replication technology.

What Is Intelligent Replication?

Replication is the process of propagating existing databases to selected target databases residing on other systems or on the same computer. In its simplest form, this might mean simply copying data or transferring files from one machine to another.

Intelligent data replication provides the ability to select which segments of existing databases will be propagated, and provides a variety of data enhancement features such as value translations, column mappings and datatype conversions.

Typically, implementing replication involves choosing replication tools from any one of a number of software vendors. A good replication tool should meet the following criteria.

asynchronicity - It should be possible to time replication to coincide with off-peak periods or, if needed, to run it continuously. Replication jobs should be able to run on a regular schedule or on an as-needed basis. Update processes should be applied asynchronously, without relying on two-phase commit logic at the source database. Updates should not tie up the production systems that are an organisation's lifeblood, but should be applied as CPU cycles become available.

selectivity - Updates can be distributed based on row and column selection. Rather than copying an entire database, changes can be distributed on a net change basis, record by record. To be even more selective, changes can be distributed conditionally, based on critical column updates or row selection criteria.

plasticity - Replication is an excellent opportunity for transforming data. The replication tool should provide the ability to apply a wide variety of data type transformations, including NULL and date type transforms and result field definitions.

heterogeneity - Replication should make it possible to bridge computer platforms, including network types, operating systems, databases and file structures. In addition, the tool should provide database and operating system independence and link legacy data to these new environments.
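The selectivity criterion's "critical column updates" idea can be sketched as a simple test applied to each captured change. The sketch assumes the change-capture mechanism supplies a before image and an after image of each record (None for the missing image on an add or delete); the function name and the fields CUSNO, CRLIMIT and LASTINQ are hypothetical.

```python
from typing import Dict, Iterable, Optional

Row = Dict[str, object]


def should_distribute(before: Optional[Row], after: Optional[Row],
                      critical_columns: Iterable[str]) -> bool:
    """Decide whether one captured change is distributed to the target.

    Adds (no before image) and deletes (no after image) are always sent;
    an update is sent only if at least one critical column changed."""
    if before is None or after is None:
        return True
    return any(before.get(col) != after.get(col) for col in critical_columns)


before = {"CUSNO": 101, "CRLIMIT": 5000, "LASTINQ": "960301"}
after_inquiry = {**before, "LASTINQ": "960315"}  # housekeeping field only
after_credit = {**before, "CRLIMIT": 7500}       # critical field changed
critical = ["CRLIMIT"]
print(should_distribute(before, after_inquiry, critical))  # False
print(should_distribute(before, after_credit, critical))   # True
```

Filtering at capture time like this means routine housekeeping updates never reach the network, which complements the row and column selection shown for the replication process itself.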

Nigel Stokes, CEO, DataMirror. DataMirror Corporation is a Toronto-based software product company focused on the problems of transformational replication. With unique expertise in DB2/400 and other relational databases, the company helps customers get their "data where it's needed."