Introduction to Data Mining Algorithms

Data mining is an increasingly important branch of computer science that examines data in order to find and describe patterns. Because we live in a world where we can be overwhelmed with information, it is imperative that we find ways to classify this input, to find the information we need, to illuminate structures, and to draw conclusions. Data mining is a very practical discipline with many applications in business, science, and government, such as targeted marketing, web analysis, disease diagnosis and outcome prediction, weather forecasting, credit risk and loan approval, customer relationship modeling, fraud detection, and terrorism threat detection. It draws on methods from several fields, mainly machine learning, statistics, databases, and information visualization.

A course in algorithms presents an opportunity to expose students to some of the fundamentals of data mining, in the form of decision trees. These trees are a basic structure for representing data. Decision tree induction algorithms are used to classify data, perhaps the most common data mining task.
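
To make the idea concrete, here is a minimal sketch in Python of a decision tree used as a classifier. The tree is built by hand for a toy loan-approval task; the attribute names, values, and class labels are all hypothetical and do not come from the course notes. An induction algorithm would construct such a tree from training data, which this sketch does not attempt.

```python
# Hypothetical hand-built decision tree for a toy loan-approval task.
# Internal nodes are dicts that test one attribute; leaves are class labels.
TREE = {
    "attribute": "income",
    "branches": {
        "high": "approve",
        "low": {
            "attribute": "has_collateral",
            "branches": {"yes": "approve", "no": "reject"},
        },
    },
}


def classify(tree, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    node = tree
    while isinstance(node, dict):
        value = record[node["attribute"]]  # value of the tested attribute
        node = node["branches"][value]     # descend along the matching branch
    return node                            # a leaf: the predicted class label


applicant = {"income": "low", "has_collateral": "no"}
print(classify(TREE, applicant))
```

Each internal node tests a single attribute, and classifying a record costs one test per level of the tree, so the work is bounded by the tree's depth.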

This module explains decision trees at a level suitable for an upper-level algorithms course. It also gives references and examples.

For notes, see Data Mining Course Notes, modules 6 and 7, which discuss decision trees.