Mapping data from files
The mapping process requires defining the source and the target data. The software allows you to validate the data connection and create the table column definitions. You can automatically import the target table structure from the source file or manually create the column definitions. When the table structure does not map the source file data precisely, the loading process cannot identify the data to load for each table column and finishes unsuccessfully.
To map data from files you need to know how to do the following:
*Defining a target database
*Defining a source file format
*Defining table columns
The example in “How to map data from a CSV file” provides more information about the mapping process.
Defining a target database
To define the target database and table, and the way the data is loaded, you must define the following information, as shown in Figure 3‑27:
Figure 3‑27 Creating a table mapping to a file
*Database
Type a name for the target database, or select a listed database name.
*Name
Type a unique name for a target data table.
*Description
Optionally, type characters that describe the target data table.
*Action
Select one of the following load operation types:
*Create
Adds a new table and overwrites any content stored in an existing table, using the displayed table name.
*Append
Adds records to any existing table that uses the displayed table name.
*Overwrite
Choose this option when the loading transaction overwrites the target table.
*Ignore
Choose this option when you must exclude the transaction from the load process
*Fail on error
Choose this option by toggling on or off the corresponding check-box. When the box is checked, any exception in the loading process will cause it to stop.
Defining a source file format
Based on your knowledge of source file properties, select formatting properties that most closely match those of the source file. More accurately selecting source file properties minimizes manual definition of target table properties.
*Type
Select the file type. The files can be flat, or delimited. The flat files do not have delimiters and qualifiers.
*Encoding
Select an encoding type from the list. Table 3‑2 lists and describes the file encoding types that BIRT Analytics supports.
Table 3‑2 Supported encoding
Encoding
Description
ASCII
American Standard Code for Information Interchange
CP1252
Windows Latin-1
BIG5
Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters
BIG5-HKSCS
Standard supplementary character set used for information exchange in Chinese between the public and the Hong Kong SAR Government
CP1252
Windows Latin-1
GB18030
Chinese government standard describing the required language and character support necessary for software in China
GB2312
Official character set of the People’s Republic of China, used for simplified Chinese characters
GBK
An extension of the GB2312 character set for simplified Chinese characters, used in the People’s Republic of China
ISO-8859-1
ISO 8859-1, Latin Alphabet No. 1
ISO-8859-2
Latin Alphabet No. 2
ISO-8859-3
Latin Alphabet No. 3
ISO-8859-4
Latin Alphabet No. 4
ISO-8859-5
Latin/Cyrillic Alphabet
ISO-8859-6
Latin/Arabic Alphabet
ISO-8859-7
Latin/Greek Alphabet
ISO-8859-8
Latin/Hebrew Alphabet
ISO-8859-9
Latin Alphabet No. 5
ISO-8859-13
Latin Alphabet No. 7
ISO-8859-15
Latin Alphabet No. 9
UTF8
8-bit UCS Transformation Format
UTF16
16-bit UCS Transformation Format, byte order identified by an optional byte-order mark
UTF16LE
16-bit Unicode Transformation Format, little‑endian byte order
UTF16BE
16-bit Unicode Transformation Format, big‑endian byte order
WINDOWS-1252
Windows Latin-1
*End Line
Select an end-of-line character from the list. Table 3‑3 lists and describes the end-of-line characters and results supported by BIRT Analytics Loader.
Table 3‑3 End-of-line characters and results
End-of-line character
Result description
CRLF
Carriage return and line feed
CR
Carriage return only
LF
Line feed only
*File
Select a source file name from the list.
*Delimiter
Select the character used to delimit or separate values in a delimited file only. The following delimiter characters are supported by BIRT Loader:
, ->| | : ; @ # " + - = '
*Header
Select Header option if the source file includes information in a header. A file header contains multiple column names, each separated by a delimiter character.
*Skip
In Skip, type a number that sets how many blank rows to insert between header information and record information in a file. If you select Header option, 1 appears in Skip by default.
*Offset
Define the initial position of each column when the file format is flat file.
*Qualifier
Select a qualifier character. BIRT Loader supports the following qualifier characters:
" ' ~
Choose None, if the file format does not use a qualifier.
*Wellformed
Select the Wellformed option to enforce verification that each record contains all columns.
Wellformed explanation:
*“Checked” - indicates the same number of columns in data source and table definition
*Unchecked” - extra columns are ignored and Null values are set for missing columns
Defining table columns
To automate defining the structure of a target data table, extract data from a source data file, using Get columns. During extraction, Get columns adds column type and index information to each data element. You must review and edit, as necessary, the type and index information for every column. Consider removing the index from every column not strictly requiring an index. Removing unnecessary indexes speeds data load and analysis processes, but may limit performing aggregation and link operations. These operations require indexed columns.
*View
Choose View to see the data in the file and check the connectivity to the file.
*Get columns
Choose Get columns to get the column names and type automatically from the source file.
To create a target data table manually, define each column using the New, Edit, Remove, and Duplicate options. While you edit the structure of a target data table, use Get columns to retrieve existing column information, then use Edit to make modifications. To create a new column, choose New, then provide information using Column Declaration, as shown in Figure 3‑28.
Figure 3‑28 Creating a new column
*New
*Creates a new column in the target table
*Name
Enter the name of the column.
*Description
Enter a description of the column.
*Type
Enter the data type. The supported data types are described in Table 3‑4.
Table 3‑4 Data types
Type
Description
String
A sequence of characters, up to 1024.
Integer
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).
Longint
Integer (whole number) data from -2^63 (‑9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807).
Real
Floating precision number data with the following valid values: ‑3.40E + 38 through -1.18E - 38, 0 and 1.18E - 38 through 3.40E + 38.
Date
The default format is mm_dd_yyyy, where the date separator can be slash (/), comma (,), period (.), or no separator.
Datetime
Date and time data from January 1, 1400, through December 31, 9999, with an accuracy of three-hundredths of a second, or 3.33 milliseconds. The default format is yyyy_mm_dd_hh_MM_ss.
Time
The default format is hh_MM_ss.
Unicode
A sequence of Unicode characters, up to 1024.
*Size
The size of the column in characters, or digits.
*Format
The format of the columns of type Date, Datetime, and Time.
*Decimals
The number of digits to the right of the decimal point, or comma. Applies for data of type Real.
*Decimal separator
You can choose Point or Comma to visualize a decimal separator. Applies for data of type Real.
*Indexed
Indicates if the column is an index.
*Skip
Select Skip to exclude a new column from the load process.
*Edit
Opens the Table Declaration editor for a single column.
*Remove
Removes a column from the target table.
*Duplicate
Creates a new column by copying the properties of an existing column.
How to map data from a CSV file
The following procedure assumes that you have created a data file connection. The procedure uses as an example, a CSV file having the following structure:
PRODUCTNAME,QUANTITYINSTOCK,MSRP
1969 Harley Davidson Ultimate Chopper,7933,95.7
1952 Alpine Renault 1300,7305,214.3
1 In Options, choose New from file.
2 In Table Declaration, set properties for the target database, as shown in Figure 3‑29.
1 In Database, type a database name. Alternatively, select an existing database name from the list of target databases.
2 In Name, type the name a target table to which you will map data. For example, CarInventory.
3 In Description, type characters that describe information the table will contain, such as Car Inventory and MSRP.
4 In Action, select Create.
5 Select Overwrite.
Figure 3‑29 Defining file structure
3 Select options that match properties of the source file, as shown in Figure 3‑29.
1 In Type, select Delimited.
2 In Encoding, select ASCII.
3 Select Header. In this example, 1 appears in Skip. Edit the value in Skip to match the number of header rows in the source data file.
4 In End Line, select CrLf.
5 In File, select a source file name from the list.
6 In Delimiter, select comma (,).
7 In Qualifier, select ".
8 Select Wellformed.
4 Choose View to validate the data connection. The file content appears, as shown in Figure 3‑30. Choose OK.
Figure 3‑30 Viewing file content
5 Choose Get columns to import the file metadata.
During the metadata import process, Get columns reads column name information from the header row in the source file, determines a type for each column, and indexes each column.
Definitions for each new column created in the target table appear listed in Table Declaration, as shown in Figure 3‑31.
Figure 3‑31 Getting column definitions
6 To modify properties for a column, select a listed column definition and choose Edit.
1 In Column Declaration, select or modify any available properties.
2 To remove index information from a column, deselect Indexed.
3 To exclude a column from loading, select Skip.
4 Choose OK.
For example, change the type of the MSRP column from Integer to Real, and set the column to not indexed, as shown in Figure 3‑32.
Figure 3‑32 Editing column definition properties
7 To close a selected database table, in Table Declaration, choose OK.
8 In Project, choose Save. Choose Yes to confirm saving the script for the current project, as shown in Figure 3‑33.
Figure 3‑33 Saving a project