Data transformation is required because data stored in disparate source systems may be stored in different formats, with different codes, or using different data types. For example, one system might store a Yes as 1 and a No as 0, while another system may store the strings 'Y' and 'N.' If data is pulled from two systems and a Yes/No field is needed, then the data must be transformed so that it is consistent in the warehouse. Data consistency is one of the cornerstones of a data warehouse; all data must be in the same format for analysis to be effective.
Other issues arise when considering items such as currencies. In a global business, there are often source systems in each country recording receivables, and normally the business systems store these receivables in the local currency. If a business has branches in London, Paris, Tokyo, and New York, it's likely that data is stored in Pounds, Euros, Yen, and US Dollars. In building the warehouse, sales measured in Yen cannot simply be added to sales in US Dollars in order to get non-European sales. Instead, the data must be transformed by applying some form of currency conversion to the data before storing it in the warehouse. Whether this is done by using a day-to-day exchange rate table, the average exchange rate over a period of time, or any other method, is strictly up to the business. Therefore, business requirements often come into play during the design phase of the ETL process.
While this section on ETL will be relatively brief, there is one point that is absolutely essential: ETL often consumes the majority of the effort in building a data mart or warehouse. Some projects spend up to 80 percent of the total time in the ETL phase. This is due to a number of factors, including the complexity of identifying and extracting back end data, applying complex transformations to data to meet business rules, handling data for slowly changing dimensions, and working with dirty data. Most warehousing projects continue to tweak and maintain their ETL process long after the warehouse is in production; be sure to include ongoing maintenance in any project and budget projections.
No comments:
Post a Comment