In every system implementation, data conversion is an important function. For example, when you implement an operational system such as a magazine subscription application, you have to initially populate your database with data from the prior system records. You may be converting over from a manual system. Or, you may be moving from a file-oriented system to a modern system supported with relational database tables. In either case, you will convert the data from the prior systems. So, what is so different for a data warehouse? How is data transformation for a data warehouse more involved than for an operational system?
Again, as you know, data for a data warehouse comes from many disparate sources. If data extraction for a data warehouse poses great challenges, data transformation presents even greater challenges. Another factor in the data warehouse is that the data feed is not just an initial load. You will have to continue to pick up the ongoing changes from the source systems. Any transformation tasks you set up for the initial load will have to be adapted for the ongoing revisions as well.
You perform a number of individual tasks as part of data transformation. First, you clean the data extracted from each source. Cleaning may just be correction of misspellings, or it may include resolution of conflicts between state codes and zip codes in the source data, provision of default values for missing data elements, or elimination of duplicates when you bring in the same data from multiple source systems.
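The cleaning tasks listed above can be sketched in a few lines. This is a minimal illustration, not a production cleansing routine: the field names (`city`, `state`, `zip`, `cust_id`), the lookup tables, and the rule that the zip code wins a state/zip conflict are all hypothetical assumptions made for the example.

```python
# Hypothetical lookup tables for illustration; a real project would
# maintain these as reference data.
SPELLING_FIXES = {"Chicgo": "Chicago", "Bostn": "Boston"}
ZIP_PREFIX_TO_STATE = {"606": "IL", "021": "MA"}  # first three zip digits -> state
DEFAULTS = {"city": "UNKNOWN", "state": "UNKNOWN"}


def clean_record(rec: dict) -> dict:
    """Return a cleaned copy of one record extracted from a source system."""
    out = dict(rec)
    # 1. Correct known misspellings.
    if out.get("city") in SPELLING_FIXES:
        out["city"] = SPELLING_FIXES[out["city"]]
    # 2. Resolve state/zip conflicts; this sketch assumes the zip code is right.
    expected_state = ZIP_PREFIX_TO_STATE.get((out.get("zip") or "")[:3])
    if expected_state is not None and out.get("state") != expected_state:
        out["state"] = expected_state
    # 3. Supply default values for missing (None or empty) elements.
    for field, default in DEFAULTS.items():
        if not out.get(field):
            out[field] = default
    return out


def deduplicate(records: list, key: str) -> list:
    """Keep only the first record seen for each key value across all sources."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique
```

Cleaning each record before deduplicating means that two copies of the same customer arriving from different sources look identical by the time duplicates are removed.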
Standardization of data elements forms a large part of data transformation. You standardize the data types and field lengths for the same data elements retrieved from the various sources. Semantic standardization is another major task: you resolve synonyms and homonyms. When two or more terms from different source systems mean the same thing, you resolve the synonyms. When a single term means different things in different source systems, you resolve the homonyms.
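One way to picture both kinds of standardization is a per-source mapping table. In this hypothetical sketch, the source names `billing` and `orders`, their column names, and the 30-character limit are invented for illustration. The synonyms `cust_no` and `customer_id` map to one warehouse name, while the homonym `status` maps to two different warehouse names depending on its source.

```python
# Hypothetical per-source mappings from local column names to warehouse names.
# Synonyms: different local names -> one warehouse name.
# Homonyms: the same local name ("status") -> different warehouse names.
COLUMN_MAP = {
    "billing": {"cust_no": "customer_id", "status": "account_status",
                "cust_name": "customer_name"},
    "orders": {"customer_id": "customer_id", "status": "order_status",
               "name": "customer_name"},
}
# Standard data type and maximum field length for each warehouse element.
TARGET_TYPES = {"customer_id": int, "customer_name": str,
                "account_status": str, "order_status": str}
MAX_LENGTH = {"customer_name": 30}


def standardize(rec: dict, source: str) -> dict:
    """Rename, retype, and resize the elements of one source record."""
    out = {}
    for local_name, value in rec.items():
        name = COLUMN_MAP[source].get(local_name)
        if name is None:
            continue  # element is not carried into the warehouse
        value = TARGET_TYPES[name](value)  # unify the data type
        limit = MAX_LENGTH.get(name)
        if limit is not None:
            value = value[:limit]  # unify the field length
        out[name] = value
    return out
```

After this step, a customer number stored as the string `"00123"` in one system and as the integer `123` in another compare equal in the warehouse.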
Data transformation involves many forms of combining pieces of data from the different sources. You combine data from a single source record or related data elements from many source records. On the other hand, data transformation also involves purging source data that is not useful and separating out source records into new combinations. Sorting and merging of data take place on a large scale in the data staging area.
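The combining, purging, splitting, and merging described above can be sketched with Python's standard `heapq.merge` and `itertools.groupby`. This is an illustrative assumption about how a staging step might look, not a prescribed method: the `product_id` key, the kept fields, and the split into a product part and a price part are hypothetical. Each extract is assumed to be already sorted by the key.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sources(*sorted_sources, key="product_id",
                  keep=("product_id", "description", "unit_price")):
    """Merge extracts already sorted by `key`, combine records sharing a key,
    and purge any field not listed in `keep`."""
    merged = heapq.merge(*sorted_sources, key=itemgetter(key))
    combined = []
    for _, group in groupby(merged, key=itemgetter(key)):
        row = {}
        for rec in group:
            for field in keep:
                # First source to supply a field wins (an assumption of this sketch).
                if field in rec and field not in row:
                    row[field] = rec[field]
        combined.append(row)
    return combined


def split_record(rec: dict) -> tuple:
    """Separate one combined record into two new combinations."""
    product = {k: rec[k] for k in ("product_id", "description")}
    price = {k: rec[k] for k in ("product_id", "unit_price")}
    return product, price
```

Because `heapq.merge` streams its sorted inputs rather than loading them all at once, the same pattern scales to the large sort-and-merge runs typical of a staging area.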
In many cases, the keys chosen for the operational systems are field values with built-in meanings. For example, the product key value may be a combination of characters indicating the product category, the code of the warehouse where the product is stored, and some code to show the production batch. Primary keys in the data warehouse cannot have built-in meanings. We will discuss this further in Chapter 10. Data transformation also includes the assignment of surrogate keys derived from the source system primary keys.
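A minimal sketch of surrogate key assignment, assuming a hypothetical source key format such as `HW-W07-B123` (category, warehouse, batch). The generator hands out meaningless sequential integers and remembers every assignment, so later incremental loads reuse the key given during the initial load. Keying the map by (source system, source key) is a design assumption here; deciding that keys from two different sources denote the same product would be a separate matching step.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns meaningless integer keys to source-system primary keys."""

    def __init__(self, start: int = 1):
        self._next = count(start)
        # (source system, source primary key) -> surrogate key
        self._map = {}

    def key_for(self, source: str, source_key: str) -> int:
        """Return the existing surrogate key, or assign the next one."""
        pair = (source, source_key)
        if pair not in self._map:
            self._map[pair] = next(self._next)
        return self._map[pair]
```

In practice the mapping would be persisted in a table in the staging area rather than held in memory, since it must survive from one load run to the next.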
A grocery chain point-of-sale operational system keeps the unit sales and revenue amounts by individual transactions at the check-out counter at each store. But in the data warehouse it may not be necessary to keep the data at this detailed level. You may want to summarize the totals by product at each store for a given day and keep the summary totals of the sale units and revenue in the data warehouse storage. In such cases, the data transformation function would include appropriate summation.
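The daily summarization in this example amounts to grouping transactions by store, product, and calendar day and adding up units and revenue. The sketch below assumes hypothetical transaction fields (`store`, `product`, `sold_at`, `units`, `revenue`) and uses `Decimal` for revenue so that currency amounts add exactly.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal


def summarize_daily(transactions):
    """Roll individual check-out transactions up to one row per
    (store, product, day) holding total units and total revenue."""
    units = defaultdict(int)
    revenue = defaultdict(Decimal)  # Decimal() starts each total at 0
    for t in transactions:
        key = (t["store"], t["product"], t["sold_at"].date())
        units[key] += t["units"]
        revenue[key] += t["revenue"]
    return {key: (units[key], revenue[key]) for key in units}
```

Three transactions for the same product at the same store, two on one day and one on the next, collapse into two summary rows.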
When the data transformation function ends, you have a collection of integrated data that is cleaned, standardized, and summarized. You now have data ready to load into each data set in your data warehouse.