Planning a project
Typically, three key players are involved in building an ETL process: a database administrator, a data modeler, and an ETL developer. Sometimes, one person fills multiple roles. The key to a successful ETL process is to plan the project diligently before development begins.
Planning a project requires:
*Knowing your data sources and completing a detailed mapping specification. A successful ETL process depends entirely on a completed mapping specification.
*Documenting how you will extract, clean, and transform the data from your sources so that it matches your database table definitions.
*Establishing rules that define how to update database tables.
These rules are an important factor in the ETL design. Depending on the nature of the database, an update can be as simple as a data refresh: clearing old data and loading new data. On some occasions, the update requires appending data based on complex conditions, such as keyed reads or joins against an existing dimension, before writing the new or updated rows to a database table. Building these rules can add days or weeks to your project timeline.
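The difference between the two kinds of update rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular ETL tool implements updates: the table customer_dim, its columns, and the functions full_refresh and keyed_update are hypothetical names, and an in-memory SQLite database stands in for the target database.

```python
import sqlite3


def full_refresh(conn, rows):
    """Simple rule: clear the old data, then load the new rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM customer_dim")
        conn.executemany(
            "INSERT INTO customer_dim (customer_id, name) VALUES (?, ?)", rows
        )


def keyed_update(conn, rows):
    """Complex rule: keyed read against the existing dimension,
    then insert new rows and update changed ones."""
    inserted = updated = 0
    with conn:
        for customer_id, name in rows:
            existing = conn.execute(
                "SELECT name FROM customer_dim WHERE customer_id = ?",
                (customer_id,),
            ).fetchone()
            if existing is None:
                conn.execute(
                    "INSERT INTO customer_dim (customer_id, name) VALUES (?, ?)",
                    (customer_id, name),
                )
                inserted += 1
            elif existing[0] != name:
                conn.execute(
                    "UPDATE customer_dim SET name = ? WHERE customer_id = ?",
                    (name, customer_id),
                )
                updated += 1
            # Unchanged rows are left alone.
    return inserted, updated


# Hypothetical target table and sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT)"
)
full_refresh(conn, [(1, "Ada"), (2, "Grace")])
counts = keyed_update(conn, [(2, "Grace H."), (3, "Linus"), (1, "Ada")])
final_rows = conn.execute(
    "SELECT customer_id, name FROM customer_dim ORDER BY customer_id"
).fetchall()
```

Even in this small sketch, the keyed rule needs a lookup, a comparison, and two write paths per row, which suggests why such rules take longer to specify and test than a plain refresh.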
In the planning process you must define:
*Input and output record layouts
*The location of source and target files and databases
*File and table sizing information
*File and table volume information
*Documentation on how to transform the data
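One way to keep track of these items is to record them in a small structure and check it for gaps before development starts. The following Python sketch is an assumption-laden illustration, not a required format: the class EtlPlan, its field names, and the sample paths and column names are all hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EtlPlan:
    """Hypothetical checklist covering the planning items listed above."""
    input_layout: dict = field(default_factory=dict)     # source column -> type
    output_layout: dict = field(default_factory=dict)    # target column -> type
    source_location: str = ""                            # file path or database
    target_location: str = ""                            # file path or table
    expected_size_mb: float = 0.0                        # sizing information
    expected_rows: int = 0                               # volume information
    transform_rules: dict = field(default_factory=dict)  # target column -> rule

    def missing_items(self):
        """Return the names of planning items that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def unmapped_columns(self):
        """Return target columns that have no documented transformation."""
        return [c for c in self.output_layout if c not in self.transform_rules]


# Sample values are made up for illustration.
plan = EtlPlan(
    input_layout={"cust_id": "int", "cust_nm": "str"},
    output_layout={"customer_id": "int", "name": "str"},
    source_location="/data/in/customers.csv",
    target_location="warehouse.customer_dim",
    expected_rows=50000,
    transform_rules={"customer_id": "copy cust_id"},
)
missing = plan.missing_items()
unmapped = plan.unmapped_columns()
```

Here missing reports that sizing information has not been filled in, and unmapped reports that the name column has no transformation rule yet, so both gaps surface before any ETL job is built.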