Training and testing a predictive model

A decision tree must be trained on sample data. The decision tree learns with each successive application of the predictive model. Some patterns found by data mining algorithms, however, are invalid. Data mining algorithms often find patterns in the training set that are not present in the general data set. This is called overfitting. To solve this problem, test the predictive model on a set of data different from the training set. The learned patterns are applied to the test set and the resulting output is compared to the desired output. For example, a data mining algorithm that distinguishes spam from legitimate e-mails is trained on a set of sample e-mails. Once trained, the learned patterns are applied to a test set of emails. The accuracy of the predictive model is measured from how many e-mails it classifies correctly.