Specifying a join

To define the joins for an information object, use the Joins page. For example, you can create the following SQL fragment:

FROM Customers INNER JOIN Orders ON (Customers.custID = Orders.custID)

About joins

A join specifies how to combine data from two maps or information objects. The maps or information objects do not have to be based on the same data source. A join consists of one or more conditions that must all be true. In the resulting SQL SELECT statement, join conditions are linked with AND.

A join can consist of multiple conditions in the following form:

columnA = columnB

A join can have only one condition that uses an operator other than equality (=) or an expression, for example:

columnA < columnB

does not support right outer joins or full outer joins.

How to define a join condition

1 In the graphical information object editor, choose Joins.

2 In the upper pane, drag the join column from the first information object or map, and drop it on the join column in the second information object or
map.

The upper pane shows the join condition, like the one in Figure 3‑17, and the join columns and operator are listed in the lower pane.

3 In the lower pane, select the row that describes the new join condition.

4 If necessary, select a different join condition operator from the drop-down list. By default, uses the equality operator (=) to relate two columns.

Figure 3‑17 Joined columns from two information objects

5 To change a column name to an expression, select the column name, and type the expression, or choose Ellipsis to display the expression builder, as shown in Figure 3‑18.

Figure 3‑18 Defining a join

If the join has a condition that uses an operator other than equality (=) or an expression, the upper pane marks the join line with the symbol that appears in Figure 3‑19.

Figure 3‑19 A join condition that uses an expression or an operator other than equality

6 If the join consists of more than one condition, repeat this procedure for the other conditions.

7 Choose one of the following join types:

Inner join

Left outer join

8 Optimize the join.

How to delete a join condition

To delete a join condition, select the join condition in the upper pane of the graphical information object editor and press Delete.

Optimizing joins

You can improve a query’s performance by optimizing the joins. To optimize a join, you can specify the cardinality of the join. Specifying the cardinality of the join adds the CARDINALITY keyword to the Actuate SQL query.

Figure 3‑20 shows how to specify the cardinality of an information object or map in a join, whether an information object is optional, and how to specify a join algorithm in Joins.

Figure 3‑20 Optimizing a join

When you join information objects that are built from different data sources,
the Actuate SQL compiler chooses a join algorithm. If you have a good understanding of the size and distribution of the data, however, you can specify the join algorithm. Choosing the correct join algorithm can significantly reduce information object query execution time. Actuate SQL supports three join algorithms:

Dependent

Merge

Nested Loop

When you join information objects that are built from the same data source, specifying a join algorithm has no effect. The join is processed by the data source.

About dependent joins

A dependent join is processed in the following way:

The left side of the join statement is executed, retrieving all the results. The results are then processed one at a time (pipelined).

For each left side result, the right side of the join is executed, parameterized by the values provided by the current left side row.

A dependent join is advantageous when the cardinality of the left side is small, and the selectivity of the join criteria is both high and can be delegated to the data source. When the cardinality of the left side is high, a dependent join is relatively slow because it repeatedly executes the right side of the join.

Dependent joins can be used for any join criteria, although only join expressions that can be delegated to the right side’s data source result in improved selectivity performance.

About merge joins

A merge join is processed in the following way:

The left side of the join statement is executed, retrieving all the results sorted by the left side data source. The results are then processed one at a time (pipelined).

The right side of the join statement is executed, retrieving all the results sorted by the right side data source. The results are then processed one at a time (pipelined).

A merge join can only be used with an equijoin. A merge join has much lower memory requirements than a nested loop join and can be much faster. A merge join is especially efficient if the data sources sort the rows.

About nested loop joins

A nested loop join is processed in the following way:

The left side of the join statement is executed, retrieving all the results. The results are then processed one at a time (pipelined).

The right side of the join statement is executed. The results are materialized in memory. For each row on the left side, the materialized results are scanned to find matches for the join criteria.

A nested loop join is advantageous when the cardinality of the right side is small. A nested loop join performs well when the join expression cannot be delegated to the data source. A nested loop join can be used for any join criteria, not just an equijoin.

A nested loop join is a poor choice when the cardinality of the right side is large or unknown, because it may encounter memory limitations. Increasing the memory available to the Integration service removes this limitation. The Integration service parameter Max memory per query specifies the maximum amount of memory to use for an Integration service query.

How to specify a join algorithm

In Joins, select the appropriate join and choose one of the following from the Specify join algorithm drop-down list shown in Figure 3‑21:

Dependent

Merge

Nested loop

Figure 3‑21 Specifying the join algorithm

Improving the selectivity of a join

When you join maps that are based on different data sources, you can optimize the join by providing values for map and join column properties. Providing values for these properties improves the selectivity of the join. You should provide values for:

The maps’ Cardinality property

Cardinality specifies the number of rows returned by the map, or gives an approximation based on the possible parameter values.

The following join column properties:

Distinct Values Count

Distinct Values Count specifies the number of distinct column values.

Max Value

Max Value specifies the maximum column value.

Min Value

Min Value specifies the minimum column value.

Max Value and Min Value are not used for columns of character data type. When providing values for Max Value and Min Value, use the appropriate format. For example, if the column is of type TIMESTAMP, Max Value must be in the following format:

TIMESTAMP '2001-02-03 12:11:10'

You should also provide values for these properties for a column used in a WHERE clause.

How to provide a value for the Cardinality property

1 In Navigator, double-click one of the maps in the join.

2 On Output Columns, choose Show map properties, as shown in Figure 3‑22.

Figure 3‑22 Choosing Show map properties

3 In Properties, type a value for Cardinality.

4 Repeat this procedure for the other map.

How to provide values for join column properties

1 In Output Columns, select the join column.

2 In Properties, type values for:

Distinct Values Count

Max Value

Min Value

3 Repeat this procedure for the join column in the other map.

Creating a Cartesian join

By default, an information object user cannot create a query with two information objects, for example a Customers information object and an Orders information object, without explicitly joining the information objects. The absence of an explicit join is called a Cartesian join. Cartesian joins can consume database resources and return very large result sets. In some cases, however, it is acceptable to create a Cartesian join. For example, a map of a single-row system information table does not have to be joined to another map or information object. If it is acceptable for a map or information object to be used in a Cartesian join, set Allow this Source to be used in Cartesian Joins to True. To display this property for a map, choose Show map properties as shown in Figure 3‑22. To display this property for an information object, click in the white space in the upper pane of the query editor as shown in Figure 3‑57.