Focus on large data sets and databases Data mining can answer questions that cannot be addressed through simple query and reporting techniques.

Automatic Discovery Data mining is accomplished by building models. A model uses an algorithm to act on a set of data. The notion of automatic discovery refers to the execution of data mining models. Data mining models can be used to mine the data on which they are built, but most types of models are generalizable to new data. The process of applying a model to new data is known as scoring. For example, a model might predict income based on education and other demographic factors.

Predictions have an associated probability How likely is this prediction to be true? Prediction probabilities are also known as confidence How confident can I be of this prediction?

Data Mining in Today’s World

Some forms of predictive data mining generate rules, which are conditions that imply a given outcome. For example, a rule might specify that a person who has a bachelor’s degree and lives in a certain neighborhood is likely to have an income greater than the regional average.

Rules have an associated support What percentage of the population satisfies the rule? Grouping Other forms of data mining identify natural groupings in the data. For example, a model might identify the segment of the population that has an income within a specified range, that has a good driving record, and that leases a new car on a yearly basis.

Actionable Information Data mining can derive actionable information from large volumes of data. For example, a town planner might use a model that predicts income based on demographics to develop a plan for low-income housing. A car leasing agency might a use model that identifies customer segments to design a promotion targeting high-value customers. A general introduction to algorithms is provided in “Data Mining Algorithms”.