Understanding data mining and predictive analytics
This information is a summary of the material available on Wikipedia. For more information about data mining and predictive analytics, see the following pages:
http://en.wikipedia.org/wiki/Data_mining
http://en.wikipedia.org/wiki/Predictive_analytics
Data mining is the process of finding patterns in large data sets. Data mining includes:
*Anomaly detection: The identification of unusual data records.
*Association rule learning: The identification of relationships between variables. For example, a supermarket gathers point-of-sale data. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This type of data mining is called market basket analysis.
*Clustering: Discovering groups of similar data records, without relying on known structures in the data.
*Classification: Determining the class of an object based on its attributes. For example, an e-mail program classifies e-mails as legitimate or as spam.
*Regression: Finding a function that models the data with the least error.
*Summarization: Providing a more compact representation of the data set, including visualization and report generation.
*Sequential pattern mining: Finding sets of data items that frequently occur together in sequences. Sequential pattern mining is the basis for web user analysis, stock trend prediction, DNA sequence analysis, finding linguistic patterns in natural language texts, and using a history of symptoms to predict disease.
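Association rule learning is the easiest of these tasks to illustrate in a few lines. The following minimal Python sketch performs a brute-force market basket analysis over pairs of products: it counts how often items appear together, then keeps rules X -> Y whose support (the share of baskets containing both items) and confidence (the share of baskets containing X that also contain Y) meet given thresholds. The function name `pair_rules`, the thresholds, and the basket data are hypothetical. Practical tools use algorithms such as Apriori or FP-Growth to handle larger item sets efficiently, and this sketch does not describe how BIRT Analytics implements the technique.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.7):
    """Find one-item rules X -> Y that meet both thresholds (hypothetical helper).

    support(X, Y)      = share of baskets that contain both X and Y
    confidence(X -> Y) = support(X, Y) / support(X)
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # sort so each pair is counted under one key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = both / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical point-of-sale data: one set of products per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

for x, y, support, confidence in pair_rules(baskets):
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```

With this data, the strongest rule is beer -> diapers with a confidence of 1.00: every basket that contains beer also contains diapers.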
BIRT Analytics supports association rule learning, clustering, and classification (decision trees). BIRT Analytics also supports time-series prediction to produce short-term demand forecasts, for example from sales data that contains a trend or a seasonal pattern.
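This document does not describe the forecasting method that BIRT Analytics uses. As a minimal sketch of how a forecast can account for both a trend and a seasonal pattern, the following Python code assumes an additive model, y[t] = slope * t + level[t % period], fitted by ordinary least squares with one intercept per season position (seasonal dummy variables). The functions `fit_trend_seasonal` and `forecast` and the quarterly sales figures are hypothetical.

```python
from statistics import fmean


def fit_trend_seasonal(y, period):
    """Least-squares fit of y[t] = slope * t + level[t % period] (assumed model).

    Giving each season position its own intercept (seasonal dummy variables)
    makes the OLS slope equal to the pooled within-season slope computed below.
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    n = len(y)
    season = [t % period for t in range(n)]
    t_mean = [fmean([t for t in range(n) if season[t] == k]) for k in range(period)]
    y_mean = [fmean([y[t] for t in range(n) if season[t] == k]) for k in range(period)]
    # Deviations from each season's own mean remove the seasonal offsets.
    num = sum((t - t_mean[season[t]]) * (y[t] - y_mean[season[t]]) for t in range(n))
    den = sum((t - t_mean[season[t]]) ** 2 for t in range(n))
    slope = num / den
    level = [y_mean[k] - slope * t_mean[k] for k in range(period)]
    return slope, level


def forecast(y, period, horizon):
    """Extend the fitted trend and seasonal levels `horizon` steps ahead."""
    slope, level = fit_trend_seasonal(y, period)
    n = len(y)
    return [level[t % period] + slope * t for t in range(n, n + horizon)]


# Hypothetical quarterly sales: upward trend plus a repeating seasonal pattern.
sales = [15, 9, 15, 13, 23, 17, 23, 21, 31, 25, 31, 29]
print(forecast(sales, period=4, horizon=4))  # -> [39.0, 33.0, 39.0, 37.0]
```

Because the sample data follows the assumed model exactly, the forecast continues the trend of +2 per quarter and repeats the seasonal offsets. Real sales data contains noise, and methods such as exponential smoothing or ARIMA models are common alternatives.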
Patterns identified by data mining can be analyzed further. One type of analysis of particular interest in business applications is predictive analytics, the process of analyzing data to make predictions about future or otherwise unknown events. In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Predictive analytics is used in actuarial science, marketing, financial services, insurance, telecommunications, retail, travel, healthcare, pharmaceuticals, and other fields. In financial services, for example, credit scoring is a common application of predictive analytics. Scoring models analyze a borrower’s credit history in order to rank borrowers by the likelihood that they will repay loans on time. One of the most widely used credit scores is the FICO score. Other applications of predictive analytics include:
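The FICO scoring model is proprietary, so the following Python sketch does not reproduce it. It illustrates the general idea of a scoring model under stated assumptions: a logistic regression, trained by gradient descent on a small hypothetical credit history with two made-up features (number of late payments and credit utilization), estimates each applicant's probability of repaying on time, and applicants are then ranked by that probability. All function names, features, and data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one borrower: p(repay) minus actual outcome.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_by_repayment(applicants, w, b):
    """Sort applicants from most to least likely to repay on time."""
    scored = {
        name: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        for name, x in applicants.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical history: [late payments in last 2 years, credit utilization].
history = [[0, 0.1], [0, 0.3], [1, 0.2], [1, 0.5],
           [3, 0.8], [4, 0.9], [2, 0.7], [5, 0.6]]
repaid = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = repaid on time, 0 = did not

w, b = train_logistic(history, repaid)
applicants = {"A": [0, 0.2], "B": [4, 0.85], "C": [1, 0.4]}
for name, p in rank_by_repayment(applicants, w, b):
    print(f"{name}: estimated probability of on-time repayment {p:.2f}")
```

The features are left unscaled and the learning rate and number of epochs are fixed to keep the sketch short. A real scoring model would use many more attributes, standardized inputs, and held-out data to validate the ranking.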
*Customer relationship management (CRM): CRM applications for marketing campaigns and customer service use predictive analytics to predict customers’ buying habits and to identify issues that may result in the loss of customers. For example, promotional activities can be planned based on predictive analytics.
*Clinical decision support systems: Clinicians use predictive analytics to determine which patients are at risk of developing conditions such as diabetes, asthma, and heart disease.
*Collections: Every portfolio has delinquent customers who do not make their payments on time. The financial institution has to undertake collection activities on these customers to recover the amounts due. Collection resources are wasted on customers who are difficult or impossible to recover. Predictive analytics can help optimize the allocation of collection resources by identifying the most effective collection agencies, contact strategies, and legal actions against each customer, thus significantly increasing recovery and reducing collection costs.
*Cross-selling: Businesses collect and maintain data on customers and sales transactions. Identifying relationships in the data can provide a competitive advantage. Predictive analytics can analyze customers’ spending, usage, and other behavior to identify efficient cross-selling opportunities. This can result in greater profitability per customer and stronger customer relationships.
*Customer retention: Businesses must maintain customer satisfaction by rewarding loyalty and minimizing attrition. Businesses tend to respond to customer attrition too late, acting only after the customer has initiated the termination of service. At this stage, the customer is very unlikely to change their decision. Predictive analytics enables a more proactive retention strategy. By frequently examining a customer’s past service usage, service performance, spending, and other behavior patterns, predictive models can estimate the likelihood that the customer will terminate service in the near future. A generous offer at that point can increase the chances of retaining the customer. Silent attrition, in which a customer slowly but steadily reduces usage, is another problem that many businesses face. Predictive analytics can also predict this behavior, so that the company can take action to increase customer activity.
*Direct marketing: Predictive analytics can identify the most effective combination of products, marketing material, communication channels, and timing to use when targeting a consumer. The goal is to lower the cost per order or cost per action.
*Fraud detection: Predictive analytics can reduce a business’s exposure to fraud. Fraud includes inaccurate credit applications, fraudulent transactions (both offline and online), identity theft, and false insurance claims. Credit card issuers, insurance companies, retail merchants, manufacturers, business-to-business suppliers, and service providers are all potential victims of fraud. In the United States, the Internal Revenue Service uses predictive analytics to mine tax returns and identify tax fraud. Web fraud detection uses heuristics to study normal web user behavior and detect anomalies that indicate fraud attempts.
*Portfolio, product, or economy-level prediction: Often the focus of analysis is not the consumer but a product, portfolio, firm, industry, or even the economy. For example, a retailer might want to predict store-level demand for inventory management purposes. Or the Federal Reserve Board might want to predict the unemployment rate for the next year. These types of problems can be solved with predictive analytics.
*Risk management: Risk management techniques rely on predicting future scenarios. For example, the capital asset pricing model (CAPM) relates an asset’s expected return to its exposure to market risk, which helps in building a portfolio that maximizes expected return for a given level of risk.
*Underwriting: Many businesses have to account for risk exposure and determine the cost to cover the risk. For example, auto insurance providers must accurately determine the premium to charge to cover each automobile and driver. A financial company must assess a borrower’s ability to pay before granting a loan. For a health insurance provider, predictive analytics can be used to analyze several years of past medical claims data, as well as lab, pharmacy, and other records, to predict how expensive an enrollee is likely to be in the future. Predictive analytics can help underwrite these risks by predicting the chances of illness, default, bankruptcy, and so on. Predictive analytics in the form of credit scores has reduced the time it takes to approve loans, especially in the mortgage market, where lending decisions are now made in a matter of hours rather than days or even weeks.
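Several of the applications above, notably fraud detection, rely on learning what normal behavior looks like and flagging deviations from it. The following Python sketch shows one simple anomaly-detection heuristic, the modified z-score of Iglewicz and Hoaglin, which measures how far a value lies from the median of a customer's history in units of the median absolute deviation (MAD). Values whose score exceeds 3.5 in magnitude, a commonly cited cutoff, are flagged. The function names, the threshold choice, and the transaction amounts are hypothetical; production fraud detection combines many signals rather than a single amount.

```python
from statistics import median


def robust_z_scores(history, values):
    """Modified z-scores (Iglewicz and Hoaglin): 0.6745 * (x - median) / MAD."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        raise ValueError("history has no spread; MAD is zero")
    return [0.6745 * (x - med) / mad for x in values]


def flag_anomalies(history, values, threshold=3.5):
    """Return the values whose modified z-score magnitude exceeds `threshold`."""
    scores = robust_z_scores(history, values)
    return [x for x, z in zip(values, scores) if abs(z) > threshold]


# Hypothetical card purchases for one customer (normal behavior)...
history = [42.0, 38.5, 55.0, 47.25, 40.0, 51.0, 44.0, 39.99]
# ...and new transactions to screen.
incoming = [46.0, 980.0, 30.0]
print(flag_anomalies(history, incoming))  # -> [980.0]
```

The median and MAD are used instead of the mean and standard deviation so that a single extreme purchase in the history does not distort what counts as normal.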