Planning a project
Typically, there are three key players involved in building an ETL process: a database administrator, a data modeler, and an ETL developer. Sometimes, one person fills multiple roles. The key to a successful ETL process is to plan the project diligently before development begins. Planning a project requires:
*Knowing your data sources and completing a detailed mapping specification. A successful ETL process depends entirely on a completed mapping specification.
*Documenting how you will extract, cleanse, and transform your data sources to arrive at your database table definitions.
*Establishing rules that define how to update database tables. This is an important factor in the ETL design. Depending on the nature of the database, the update can be as simple as a data refresh: clearing old data and loading new data. On some occasions, the update requires appending data based on complex criteria, such as keyed reads or joins against an existing dimension, before writing the new or updated rows to a database table. Building these rules can add days or weeks to your project timeline.
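To make the difference between the two kinds of update rule concrete, here is a minimal Python sketch. It is an illustration only: the in-memory lists of dictionaries stand in for database tables, and the names `full_refresh`, `keyed_upsert`, and `customer_id` are hypothetical and do not come from any ETL product.

```python
from typing import Dict, List

Row = Dict[str, object]


def full_refresh(new_rows: List[Row]) -> List[Row]:
    """Refresh rule: discard the old table contents and load the new rows."""
    return [dict(row) for row in new_rows]


def keyed_upsert(table: List[Row], new_rows: List[Row], key: str) -> List[Row]:
    """Append rule: look up each incoming row by key in the existing table,
    update the row when the key matches, and append it when it does not."""
    # Index copies of the existing rows by key (the "keyed read").
    index = {row[key]: dict(row) for row in table}
    for row in new_rows:
        if row[key] in index:
            index[row[key]].update(row)   # existing key: update in place
        else:
            index[row[key]] = dict(row)   # new key: append
    return list(index.values())


# Hypothetical sample data standing in for a dimension table and a new extract.
existing = [
    {"customer_id": 1, "city": "Austin"},
    {"customer_id": 2, "city": "Boston"},
]
incoming = [
    {"customer_id": 2, "city": "Chicago"},
    {"customer_id": 3, "city": "Denver"},
]

refreshed = full_refresh(incoming)
merged = keyed_upsert(existing, incoming, key="customer_id")
```

After the refresh, only the two incoming rows remain. After the keyed update, customer 1 is kept, customer 2 is updated, and customer 3 is appended. A real rule set would also have to cover deletes, history tracking, and rejected rows, which is where the extra project time tends to go.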
In the planning process, you must define:
*Input and output record layouts
*The location of source and target files and databases
*File and table sizing information
*File and table volume information
*Documentation on how the data will be transformed, if at all
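One way to keep these planning items together is to record them in a small structure that can be checked for gaps before development starts. The following Python sketch assumes a hypothetical layout; the class names, field names, and sample paths are illustrative and are not a format required by any tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldMapping:
    source_field: str
    target_column: str
    transformation: str = ""  # empty string means the value is copied unchanged


@dataclass
class MappingSpec:
    source_location: str      # where the source file or table lives
    target_location: str      # where the target file or table lives
    expected_rows: int        # volume: number of records expected per load
    avg_row_bytes: int        # sizing: average record length in bytes
    fields: List[FieldMapping] = field(default_factory=list)

    def estimated_bytes(self) -> int:
        """Rough size of one load: record count times average record length."""
        return self.expected_rows * self.avg_row_bytes

    def missing_items(self) -> List[str]:
        """Return the names of planning items that have not been filled in."""
        missing = []
        if not self.source_location:
            missing.append("source_location")
        if not self.target_location:
            missing.append("target_location")
        if self.expected_rows <= 0:
            missing.append("expected_rows")
        if self.avg_row_bytes <= 0:
            missing.append("avg_row_bytes")
        if not self.fields:
            missing.append("fields")
        return missing


# Hypothetical example values.
spec = MappingSpec(
    source_location="/data/in/customers.csv",
    target_location="warehouse.dim_customer",
    expected_rows=1_000_000,
    avg_row_bytes=120,
    fields=[
        FieldMapping("cust_nm", "customer_name", "trim whitespace"),
        FieldMapping("cust_id", "customer_id"),
    ],
)

print(spec.estimated_bytes())
print(spec.missing_items())
```

An empty list from `missing_items()` indicates that every item in the list above has at least been recorded. It does not verify that the values are correct.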