--
Discovery community, focusing on the latest research and applications.
Contributions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to (kdd@gte.com).
E-mail add/delete requests to (kdd-request@gte.com).
Nuggets frequency is approximately weekly.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site, URL http://info.gte.com/~kdd.
-- Gregory Piatetsky-Shapiro (moderator)
********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The advancement and diffusion of knowledge is the only guardian of
true liberty.
James Madison, 4th U.S. President
(thanks to Kris Koperski, (koperski@cs.sfu.ca))
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: 'Christoph Schommer' (schommer@dbis.informatik.uni-frankfurt.de)
Date: Fri, 6 Sep 1996 11:41:17 +0200
Subject: Data Mining Lab
Dear Dr. Piatetsky-Shapiro,
this is to inform you that the Dept. of Computer
Science - Databases and Information Systems -
(Roberto Zicari) of the Goethe-University of Frankfurt/Main
has established a
'Laboratory for Data Mining'.
The lab has the following purposes:
The Laboratory for Data Mining aims at bringing together
research and industry and offers the following services
to industries and institutions:
o A location where advanced software and/or beta releases of
products or advanced applications can be installed and tested.
o Joint projects with industry, ranging from joint master theses
up to more involved projects.
o Offering an open forum for presentations from industry.
Currently we have two cooperations with industry, and we are
seeking additional contacts and partners.
Geoff Webb's paper, 'Further Experimental Evidence against the Utility
of Occam's Razor', is available on-line in the Journal of Artificial
Intelligence Research:
The above URL will give you the abstract of the paper and links to the
PostScript file. With JAIR's new 'Ask the Author' facility, it is
possible to post anonymous questions to JAIR authors about their
papers.
The following reports are available on the WWW. Could you please post the
notice on your mailing list?
Thanks in advance
Marco Ramoni
--------------------------------------------------------------------------
Title: Robust Learning with Missing Data
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
1.Knowledge Media Institute, The Open University.
2.Department of Actuarial Science and Statistics, City University.
Abstract:
Bayesian methods are becoming increasingly popular in the development
of intelligent machines. Bayesian Belief Networks (BBNs) are nowadays
a prominent reasoning method and, during the past few years, several
efforts have been directed at developing methods able to learn BBNs
directly from databases. However, all these methods assume that the
database is complete or, at least, that unreported data are missing at
random. Unfortunately, real-world databases are rarely complete and
the 'Missing at Random' assumption is often unrealistic. This paper
shows that this assumption can dramatically affect the reliability of
the learned BBN and introduces a robust method to learn conditional
probabilities in a BBN which does not rely on this assumption. In
order to drop this assumption, we have to change the overall learning
strategy used by traditional Bayesian methods: our method bounds the
set of all posterior probabilities consistent with the database and
proceeds by refining this set as more information becomes available. An
experimental comparison - using both an artificial example and a real
medical application - of our method with a powerful stochastic
simulator will show a dramatic gain in robustness and the
computational advantages of our deterministic method.
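[Moderator's note: the bounding idea in the abstract can be illustrated with a toy sketch. This is NOT the authors' code, just a hypothetical illustration of the extreme-case intuition: with unreported entries, the estimate of P(X=x) is only constrained to an interval whose endpoints correspond to assigning every missing case to, or away from, the value x.]

```python
def probability_bounds(counts, missing):
    """Interval bounds on P(X = x) when `missing` cases are unreported.

    Lower bound: assume no missing case takes value x.
    Upper bound: assume every missing case takes value x.
    Returns {value: (lower, upper)}.
    """
    total = sum(counts.values()) + missing
    return {
        x: (n / total, (n + missing) / total)
        for x, n in counts.items()
    }

# 7 'yes' and 3 'no' observed, 2 entries missing:
# P(yes) lies between 7/12 and 9/12; the interval
# narrows (greater precision) as more data arrive.
bounds = probability_bounds({"yes": 7, "no": 3}, missing=2)
print(bounds["yes"])
```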
--------------------------------------------------------------------------
Title: Robust Parameter Learning in Bayesian Networks with Missing Data
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
1.Knowledge Media Institute, The Open University.
2.Department of Actuarial Science and Statistics, City University.
Keywords: Bayesian Belief Networks; Robustness; Machine Learning; Dirichlet
Distribution; Missing Data; Monte Carlo Methods.
TR number: KMI-TR-29
Date: July, 1996
Abstract:
Bayesian Belief Networks (BBNs) are a powerful formalism for knowledge
representation and reasoning under uncertainty. During the past few
years, Artificial Intelligence met Statistics in the quest to develop
effective methods to learn BBNs directly from real-world databases.
Unfortunately, real-world databases include missing and/or unreported
data whose presence challenges traditional learning techniques, from
both the theoretical and computational point of view. This paper
outlines a new method to learn the probabilities defining a BBN from
incomplete databases. The basic assumption of this method is that the
BBN generated by the learning process should enable the problem solver
to reason and make decisions on the basis of the currently available
information. This assumption requires the learning method to return
results whose precision is a monotonically increasing function of the
available information. The intuition behind our method is close to the
robust sensitivity analysis interpretation of probability: the method
computes the convex set of possible distributions defined by the
available information and proceeds by refining this set as more
information becomes available. Finally, experimental results will be
presented comparing this approach to a popular Monte Carlo method.
--
Marco Ramoni
Knowledge Media Institute Phone: +44-1908-65-5721
The Open University Fax: +44-1908-65-3169
Walton Hall Email: M.Ramoni@open.ac.uk
Milton Keynes MK7 6AA URL: http://kmi.open.ac.uk/~marco
UNITED KINGDOM CUSeeMe: 137.108.81.18
Decision Tree and Rule Induction methods have been adopted by data mining
applications, among many other algorithms.
Many customers ask us what the difference is between Rule Induction and
Decision Trees, and how to select the method for different applications in
order to make the best of each approach.
Therefore, Information Discovery, Inc. worked with several industry analysts
and produced a report, 'Rule is better than Decision Tree', covering some
areas of critical data mining applications. If any reader would like to
receive a copy of this research report, please forward the request with
company name, address, and telephone number to
datamine@ix.netcom.com.
The report is free.
Diana Lin
Information Discovery, Inc.
Tel: 310-937-3600
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Wan Gong (wgong@cs.sfu.ca)
Date: Fri, 6 Sep 1996 11:37:07 -0700 (PDT)
Subject: Siftware entry for DBMiner
Siftware: DBMiner
*URL: http://db.cs.sfu.ca/DBMiner
*Description:
DBMiner has been developed for interactive mining of multiple-level knowledge in large relational databases. It is based on our studies of data mining techniques and our experience in the development of an early system prototype, DBLearn. The system implements a wide spectrum of data mining functions, including generalization, characterization, association, classification, and prediction. By incorporating several interesting data mining techniques, including attribute-oriented induction, statistical analysis, progressive deepening for mining multiple-level knowledge, and meta-rule guided mining, the system provides a user-friendly, interactive data mining environment with good performance.
*Discovery tasks: Classification, Summarization, Dependency analysis, Visualization, Prediction, Class Comparison
*Comments:
The system has the following distinct features:
o It incorporates several interesting data mining techniques, including
attribute-oriented induction, progressive deepening for mining
multiple-level rules, and meta-rule guided knowledge mining, and
implements a wide spectrum of data mining functions including
generalization, characterization, association, classification, and
prediction.
o It performs interactive data mining at multiple concept levels on any
user-specified set of data in a database using an SQL-like Data Mining
Query Language, DMQL, or a graphical user interface. Users may
interactively set and adjust various thresholds, control a data mining
process, perform roll-up or drill-down at multiple concept levels, and
generate different forms of outputs, including generalized relations,
generalized feature tables, multiple forms of generalized rules, visual
presentation of rules, charts, curves, etc.
o Efficient implementation techniques have been explored using different
data structures, including generalized relations and multiple-dimensional
data cubes, integrated with relational database techniques. The data
mining process may utilize user- or expert-defined set-grouping or
schema-level concept hierarchies, which can be specified flexibly,
adjusted dynamically based on data distribution, and generated
automatically for numerical attributes.
o Both UNIX and PC (Windows/NT) versions of the system adopt a
client/server architecture. The latter communicates with various
commercial database systems for data mining using ODBC technology.
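[Moderator's note: attribute-oriented induction, mentioned above, can be sketched roughly as follows. This is a hypothetical toy illustration, not DBMiner code: each attribute value is replaced by its ancestor in a user-supplied concept hierarchy, and identical generalized tuples are merged with a count.]

```python
from collections import Counter

def generalize(tuples, hierarchies):
    """Toy attribute-oriented induction: climb each value one level up
    its concept hierarchy (values without an ancestor stay as-is), then
    merge identical generalized tuples, keeping a count of each."""
    rolled = [
        tuple(hierarchies[i].get(v, v) for i, v in enumerate(row))
        for row in tuples
    ]
    return Counter(rolled)

# Hypothetical concept hierarchies: city -> province, major -> faculty.
city_up = {"Burnaby": "British Columbia", "Vancouver": "British Columbia"}
major_up = {"physics": "science", "biology": "science"}

rows = [("Burnaby", "physics"), ("Vancouver", "biology"), ("Burnaby", "physics")]
summary = generalize(rows, [city_up, major_up])
print(summary)  # all three rows collapse into one generalized tuple with count 3
```

Repeating the roll-up on the generalized relation (the "progressive deepening" in reverse) gives successively higher-level summaries.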
*Platform(s): Windows (95, NT), Unix
*Contact:
Jiawei Han
School of Computing Science
Simon Fraser University
Burnaby, B.C.
Canada V5A 1S6
tel: (604)291-4411
fax: (604)291-3045
email: han@cs.sfu.ca