Tuesday, August 11, 2009

SAS tutorial - generation of a sample business datawarehouse scenario

This tutorial shows how to use SAS to implement ETL process which generates a star schema datawarehouse architecture.

We assume that you already have basic SAS/BASE knowledge and are familiar with SAS environment, assigning libraries, running SAS programs.

The aim of this tutorial is to generate a datawarehouse structure which would help monitor performance of a sample business scenario described here:
A manufacturing company Data Warehouse Business Scenario - requirements from a palm and tropical plants nursery which implements an analysis and reporting system to track sales, costs, forecasts and business performance management figures.

Tutorial overview:
- First step will be to read dimensions and populate sample dimensions data
- Then a fact table will be created.
- In the next step we will randomly generate transactions for the fact table with sales data for three years. To generate the numbers we will use SAS random number generators with uniform and random distributions.
- The final tasks will be to validate and extract generated data and feed the reporting application.

Please also be aware of the fact that SAS is very powerful and flexible system and the things we show in this tutorial can be done in many different ways. It is just one way to get the expected results.

SAS tutorial chapters


  • 1. Load extracts into SAS - A couple of programs responsible for loading the dimensions extracts into SAS and an example on how to create dynamically additional dimension tables
  • 2. Populate dimensions - Example of how to populate random dimensions in a fact table
  • 3. Generate measures - Random generation of a given set of measures. The measures are generated randomly, however they apply business rules described in a business scenario
  • 4. Sales fact table - In that lesson we create a fact table with sales figures designed in a star schema datawarehouse architecture. Additionally, we perform a statistical analysis of the newly generated data
  • 5. Costs fact table - We randomly generate a fact table with costs. Costs are divided into fixed and variable costs and allocated on year and month detail level
  • 6. SAS ETL Process - Run the whole process in a sequence which may be considered as a representation of ETL Process in SAS. The process could also be set up in SAS ETL Studio or SAS Warehouse Administrator

  • Header and trailer structured textfile processing in SAS - Example from the Data Warehousing Tutorial

  • 1 comment: