Understanding Clustering
Clustering organizes data into groups based on variables you specify. The clustering algorithm typically produces segments whose records share the largest number of attributes. In other words, clustering shows how similar the records within a group are and how they differ from records in other groups.
When BIRT Analytics applies this algorithm, it creates a new field in the selected table that assigns each record to one of a specified number of clusters (N). Because every record receives a cluster value, you can see the count of records in each cluster in the Data Tree’s Discrete Values view. For example, customers grouped in the same cluster may have common demographic features.
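The per-cluster counts described above can be illustrated outside the product with a short Python sketch. The records, the `customer_id` values, and the field name `cluster` are hypothetical and only stand in for the new field that the clustering creates.

```python
from collections import Counter

# Hypothetical records: each one carries the cluster value assigned to it.
# The field name "cluster" is an assumption made for this illustration.
records = [
    {"customer_id": 101, "cluster": 1},
    {"customer_id": 102, "cluster": 2},
    {"customer_id": 103, "cluster": 1},
    {"customer_id": 104, "cluster": 3},
    {"customer_id": 105, "cluster": 1},
]

# Count how many records fall into each cluster, similar in spirit
# to the counts shown for a field's discrete values.
counts = Counter(record["cluster"] for record in records)

for cluster_id in sorted(counts):
    print(cluster_id, counts[cluster_id])
```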
You must use continuous variables, because clustering calculates the distance between values to form groups and only fields with continuous values support that calculation. In this context, a continuous field is one that has many discrete values. Categorical variables, that is, fields with few discrete values such as gender or occupation, do not work.
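To make the distance calculation concrete, here is a minimal Python sketch. It assumes Euclidean distance, which the text does not name, and the customer tuples (age, yearly income in thousands) are hypothetical values chosen for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two records with numeric attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers described by two continuous fields:
# (age, yearly income in thousands).
alice = (34.0, 52.0)
bob = (36.0, 55.0)
carol = (61.0, 120.0)

d_alice_bob = euclidean_distance(alice, bob)
d_alice_carol = euclidean_distance(alice, carol)

# Alice is much closer to Bob than to Carol, so a distance-based
# algorithm would tend to group Alice with Bob.
print(d_alice_bob < d_alice_carol)
```

A field such as gender has no meaningful numeric distance between its values, which is why a calculation like this cannot use it directly.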
To set up a clustering model, create a training process.
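As a rough illustration of what training a clustering model can involve, the following Python sketch implements a basic k-means loop. This is an assumption made for illustration: the text does not state which algorithm BIRT Analytics uses. The function name `kmeans`, its parameters, and the sample points are hypothetical.

```python
def kmeans(points, n_clusters, iterations=10):
    """Basic k-means sketch: returns (labels, centroids)."""
    # Assumption: seed the centroids with the first n_clusters points
    # so the example is deterministic.
    centroids = [list(p) for p in points[:n_clusters]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(n_clusters),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical records with two continuous attributes.
points = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.8), (8.0, 9.0), (1.2, 2.2), (8.5, 8.5)]
labels, centroids = kmeans(points, n_clusters=2)
print(labels)
print(centroids)
```

The number of clusters is fixed in advance here, matching the specified number of clusters (N) described earlier.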
How to set up a training process
1  
*
*
*
*
2  
How to use the results
When training finishes, Results lists every group that was created, the number of records in each group, and the mean of each attribute used to form the groups. Together, the attribute means of a group define its centroid.
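The kind of summary described above can be sketched in Python as follows. The records, the cluster identifiers, and the two attributes (age, yearly spend) are hypothetical. The code only illustrates how a count and a centroid are derived for each group, not how the product computes them.

```python
from collections import defaultdict

# Hypothetical clustered records: (cluster id, age, yearly spend).
records = [
    (1, 30.0, 200.0),
    (1, 34.0, 240.0),
    (2, 58.0, 900.0),
    (2, 62.0, 1100.0),
    (2, 60.0, 1000.0),
]

# Collect the attribute values of each record under its cluster id.
groups = defaultdict(list)
for cluster_id, *attributes in records:
    groups[cluster_id].append(attributes)

# For each group, report the record count and the mean of every
# attribute. The vector of means is the group's centroid.
summary = {}
for cluster_id, rows in groups.items():
    centroid = [sum(column) / len(rows) for column in zip(*rows)]
    summary[cluster_id] = {"count": len(rows), "centroid": centroid}

for cluster_id in sorted(summary):
    print(cluster_id, summary[cluster_id])
```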
1  
2  
3  
4  
5  
6  
Video tutorial
Finding common groups in your data

Copyright Actuate Corporation 2013 BIRT Analytics 4.2