Data mining is an increasingly important branch of computer science
that examines data in order to find and describe patterns. Because we
live in a world where we can be overwhelmed with information, it is
imperative that we find ways to classify this input, to find the
information we need, to illuminate structures, and to be able to draw
conclusions. Data mining is a very practical discipline with many
applications in business, science, and government, such as targeted
marketing, web analysis, disease diagnosis and outcome prediction,
weather forecasting, credit risk and loan approval, customer
relationship modeling, fraud detection, and terrorism threat
detection. It draws on methods from several fields, mainly machine
learning, statistics, databases, and information visualization.
A course in algorithms presents an opportunity to expose students
to some of the fundamentals of data mining, in the form of decision
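trees. A decision tree is a basic structure for representing data:
each internal node tests an attribute, and each leaf assigns a class.
Decision tree induction algorithms build such trees from labeled data
for classification, perhaps the most common data mining task.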
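
To make the structure concrete, here is a minimal sketch in Python of a decision tree and of classifying one record by walking from the root to a leaf. The tree is built by hand rather than induced from data, and every name in it (Leaf, Node, classify, loan_tree, and the attributes credit_history and income) is a hypothetical illustration, not taken from the course notes.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf assigns a class label."""
    label: str


@dataclass
class Node:
    """An internal node tests one attribute and branches on its value."""
    attribute: str
    children: Dict[str, "Tree"]


Tree = Union[Leaf, Node]


def classify(tree: Tree, record: Dict[str, str]) -> str:
    """Follow the branch matching the record's value at each test until a leaf."""
    while isinstance(tree, Node):
        tree = tree.children[record[tree.attribute]]
    return tree.label


# Hypothetical hand-built tree for a toy loan-approval decision.
loan_tree = Node("credit_history", {
    "good": Leaf("approve"),
    "poor": Node("income", {
        "high": Leaf("approve"),
        "low": Leaf("deny"),
    }),
})

applicant = {"credit_history": "poor", "income": "low"}
decision = classify(loan_tree, applicant)
```

An induction algorithm would construct a tree like loan_tree automatically from labeled examples; this sketch shows only the representation and how it is used to classify.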
This module explains decision trees at a level suitable for an upper-level algorithms course. It also gives references and examples.
For notes, see Data Mining Course Notes, modules 6 and 7, which discuss decision trees.