Knowledge Discovery Nuggets 97:11

--
and Knowledge Discovery community, focusing on the latest research and
applications.
Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject
line (and a URL) to gps. Submissions may be edited for
brevity.
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools ('Siftware'), and a
wealth of other information on Data Mining and Knowledge Discovery are available
at the Knowledge Discovery Mine site http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)

********************* Official disclaimer ************************************
All opinions expressed herein are those of the contributors and not necessarily
of their respective employers or of KD Nuggets
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first and simplest emotion which we discover in the human mind, is curiosity.
--Edmund Burke

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: 27 Mar 1997, 17:12:15
From: GPS (gps)
Subject: KDD-97 Tutorials

The KDD-97 conference will have a day of excellent tutorials by leading
researchers. Many thanks to P. Smyth for putting it together.
See http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
for full details.
================================================================
KDD97 Tutorial Abstracts and Speakers

Tutorial 1: Data Mining and KDD: An Overview

Usama Fayyad, Microsoft Research and
Evangelos Simoudis, IBM.

We present a basic tutorial of this new and emerging area and
emphasize relations to constituent communities including statistics,
databases, pattern recognition, learning, and visualization. The
tutorial provides a basic overview of the KDD process for extracting
knowledge from databases and covers the basics of each step in the
process including: data warehousing, selection and cleaning,
data transformation, data mining, evaluation, and visualization.
We also cover a sampling of successful applications and outline
challenges and issues to be addressed.

Tutorial 2: Modelling Data and Discovering Knowledge

David Hand, Open University, UK.

Our aim is to extract knowledge from large bodies of data. The size of
these bodies means that we cannot do it unaided, but must use fast computers,
applying sophisticated statistical tools. Attempts to automate the process
of knowledge extraction date from at least the early 1980s, with the work on
statistical expert systems. We examine this work, noting its successes and
failures and, especially, what researchers in data mining and knowledge
discovery can learn from those efforts. We examine what data are, what
information is, and what knowledge is. We contrast modelling with
discovery, especially in the context of large data sets. We examine high
level modelling issues, such as overfitting, generalisability,
overmodelling, and model evaluation. And we examine high level exploration
issues such as the discovery of accidental artefacts. The confluence of
computing and statistics in some areas provides a nice backdrop against
which to examine these issues, and we briefly discuss neural networks and
classification trees from these two perspectives.

Tutorial 3: Text Mining - Theory and Practice

Ronen Feldman, Bar-Ilan University, Israel.

Knowledge Discovery in Databases (KDD) focuses on the computerized
exploration of large amounts of data and on the discovery of interesting
patterns within them. While most work on KDD has been concerned with
structured databases, there has been little work on handling the huge
amount of information that is available only in unstructured textual form.
In this tutorial we will present the general theory of Text Mining and will
demonstrate several systems that use these principles to enable interactive
exploration of large textual collections. We will describe generic
techniques for text categorization and information extraction that are used
by these systems. The systems that will be presented are KDT, which is a
system for Knowledge Discovery in Texts; FACT, which discovers associations
amongst keywords labeling the items in a collection of textual documents;
and the Text Explorer, which is a system that provides a high level language
for interactive exploration of textual collections.
We will present a general architecture for text mining and will outline the
algorithms and data structures behind the systems. We will give special
emphasis to incremental algorithms and to efficient data structures.

Tutorial 4: Exploratory Data Analysis using Interactive Dynamic Graphics

Deborah Swayne, Bell Communications Research
and Diane Cook, Iowa State University.

Researchers and software designers in the field of data mining
are just beginning to make extensive use of graphical methods.
Interactive dynamic data visualization has been explored
in the field of statistics for over twenty years, and we
propose that much of what has been learned in statistics is
relevant for data mining.
This class is an introduction to interactive data visualization as
it is practiced as part of exploratory data analysis. The XGobi
software, publicly available dynamic visualization software, will
be used in the analysis of examples from biology, business,
physics, engineering, and telecommunications.
The examples will illustrate a set of general visualization principles
which are embodied in specific methods such as brushing and
identification of points in simple scatterplots, three dimensional
rotations, rotations in higher dimensions such as the grand tour, and
directed searches in higher dimensions for interesting two dimensional
views using projection pursuit and manual control.

Tutorial 5: Visual Techniques for Exploring Databases

Daniel Keim, University of Munich.

For data exploration to be effective, it is important to include the human in
the exploration process and combine the flexibility, creativity, and general
knowledge of the human with the enormous storage capacity and the
computational power of today's computers. Visual database exploration aims
at integrating the human in the exploration process, applying human perceptual
abilities to the large data sets available in today's computer systems. The
basic idea of visual data exploration is to present the data in some visual
form, allowing the human to get insight into the data and draw conclusions.
Visual data exploration techniques have proven to be of high value in
exploratory data analysis and they also have a high potential for exploring
large databases. Visual database exploration is especially powerful for the
first steps of the data mining process, namely understanding the data and
generating hypotheses about the data, but it may also significantly
contribute to the actual knowledge discovery by guiding the search using
visual feedback.
The goal of the tutorial is to show the potential of visualization technology
for exploring large databases. The tutorial provides an overview of the
state-of-the-art in data visualization and provides a classification of the
existing data visualization techniques. Besides describing each of the
classes, the tutorial focuses on new developments in data visualization,
which are relevant to the area of knowledge discovery, and describes a wide
range of recently developed techniques for visualizing large amounts of
arbitrary multi-attribute data which does not have any two- or
three-dimensional semantics and therefore does not lend itself to an easy
display. A detailed comparison shows the strengths and weaknesses of the
existing techniques and reveals potential for further improvement. Several
examples demonstrate the benefits of visualization techniques for exploring
databases. The tutorial concludes with an overview of existing database
exploration and visualization systems, including research prototypes as well
as commercial products.

Tutorial 6: OLAP and Data Warehousing

Surajit Chaudhuri, Microsoft Research and
Umesh Dayal, Hewlett Packard Labs.

On-Line Analytical Processing (OLAP) and Data Warehousing technologies
enable enterprises to gain competitive advantage by exploiting the
ever-growing amount of data that is collected and stored in corporate
databases and files for better and faster decision making. Over the
past few years, these technologies have experienced explosive growth,
both in the number of products and services offered, and in the extent
of coverage in the trade press. Vendors (including all database companies)
are paying increasing attention to all aspects of decision support.
The area opens up interesting research directions, with ties to past
work in database systems, but with different assumptions and
requirements. Only very recently, however, has the database research
community started to understand and address some of these issues.
This tutorial presents an overview of OLAP and data warehousing, and an
in-depth study of selected aspects. An outline of the tutorial follows:
1. Introduction: definitions, evolution, differences from OLTP, architectures
2. Models and Tools: conceptual model for OLAP,
   front-end tools (e.g., multidimensional spreadsheets),
   database design (e.g., star and snowflake schema).
3. Database Server technologies for Decision Support Queries:
   specialized indexing techniques,
   specialized join and scan methods,
   data partitioning and use of parallelism,
   intelligent processing of aggregates,
   complex query processing,
   extensions to SQL,
   ROLAP vs. MOLAP.
4. Other Services for OLAP/Data warehousing:
   data cleaning, loading and refresh,
   tools for warehouse, system and process management,
   metadata management and the role of repository.
5. State of Commercial Practice.
6. Research Issues.

The target audience is researchers and developers interested in learning
about the concepts, products and the technical innovations in the area of
decision support technologies.

Tutorial 7: Statistical Models for Categorical Response Data

William DuMouchel, AT&T Research.

This tutorial will survey the most common models and methods statisticians
use to fit and test relationships among categorical (discrete) data. Most
of these techniques are described in statistics texts such as Categorical
Data Analysis, by Alan Agresti (Wiley, 1990), and are widely available in
popular computer packages such as SAS and Splus. Therefore it is almost de
rigueur for someone with a new classification technique to compare the
proposal to one or more of these standard methods. The tutorial will focus
on loglinear and logistic regression models, and related models such as
probit, Poisson regression, and survival models. In the short time
available, priority will be given to explaining why these techniques are so
popular among statisticians, and to how the basic models have been extended
to handle variables having more than two categories or when some of the
variables have continuous or ordinal scales. Examples of model fitting,
model search and model comparison using SAS and Splus will be presented and
discussed.

For biographical information on presenters, see the web site
http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html

Contact Information:
Padhraic Smyth
University of California, Irvine (KDD-97 Tutorials Chair).

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Nascopie@aol.com
Date: Fri, 21 Mar 1997 20:04:25 -0500 (EST)

I am searching for KDD tools/approaches for searching through clinical data to
help develop and fine-tune medical imaging or detection equipment.
Specifically, early detection of skin malignancies.
Perhaps there is a group somewhere working on this.

Thank you.
Best wishes,
Jeff Wiegand

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 19 Mar 1997 15:48:16 +0100
From: Paul.Vitanyi@cwi.nl

Ming Li and Paul Vitanyi,
AN INTRODUCTION TO KOLMOGOROV COMPLEXITY AND ITS APPLICATIONS,
REVISED AND EXPANDED SECOND EDITION, Springer-Verlag, New York, 1997,
xx+637 pp, 41 illus. Hardcover $49.95/ISBN 0-387-94868-6
(Graduate Texts in Computer Science Series)

After four years and two printings the second edition has now appeared. During
the preparation the book has been out of stock for a year. In interaction with
many readers and teachers of courses and seminars, all reported errors and
problems have been corrected. The book is revised and expanded by about
90 pages. The price has been *lowered* by over $9.
See the web page 'http://www.cwi.nl/~paulv/kolmogorov.html'.

From the ``PREFACE TO THE SECOND EDITION'':

When this book was conceived ten years ago,
few scientists realized the width of scope and the
power for applicability of the central ideas. Partially
because of the enthusiastic reception of the first edition,
open problems have been solved and new applications have been
developed. We have added new material on the relation between
data compression and minimum description length induction,
computational learning, and universal prediction; circuit theory; distributed
algorithmics; instance complexity; CD compression;
computational complexity; Kolmogorov random graphs;
shortest encoding of routing tables in communication networks;
resource-bounded computable universal distributions; average case properties;
the equality of statistical entropy and expected Kolmogorov complexity;
and so on. Apart from being used by researchers and
as reference work, the book is now commonly used for graduate courses
and seminars. In recognition of this fact, the second
edition has been produced in textbook style. We have
preserved as much as possible the ordering of
the material as it was in the first edition.
The many exercises bunched together at the ends of
some chapters have been moved to the appropriate sections.
The comprehensive bibliography on Kolmogorov complexity
at the end of the book has been updated, as have
the ``History and References'' sections of the chapters.
Many readers were kind enough to express their appreciation
for the first edition and to send notification of typos, errors,
and comments. Their number is too large to thank them individually,
so we thank them all collectively.

BLURB:

Written by two experts in the field, this is the only
comprehensive and unified treatment of the central ideas of
Kolmogorov complexity and their applications. The theory
deals with the quantity of information in individual objects.
Kolmogorov complexity is known variously as `algorithmic
information', `algorithmic entropy', `Kolmogorov-Chaitin
complexity', `descriptional complexity', `shortest program length',
`algorithmic randomness', and others.

The book is ideal for advanced undergraduate students, graduate students
and researchers in computer science, mathematics, cognitive sciences,
artificial intelligence, philosophy, statistics and physics.
The book is self contained in the sense that it contains the basic requirements
of computability theory, probability theory, information theory, and coding.
Included are also numerous problem sets, comments, source references and hints
to the solutions of problems, course outlines for classroom use, as well as a
great deal of new material not included in the first edition.

If you are seriously interested in using the text in a course,
contact Springer-Verlag's Editor for Computer Science, Martin
Gilchrist, for a complimentary copy.

Martin Gilchrist marting@springer-sc.com
Suite 200, 3600 Pruneridge Ave. (408) 249-9314
Santa Clara, CA 95051

If you are interested in the text but won't be teaching a course,
we understand that Springer-Verlag sells the book, too.
To order, call toll-free 1-800-SPRINGER (1-800-777-4643); N.J.
residents call 201-348-4033. For information regarding
examination copies for course adoptions, write Springer-Verlag
New York, Inc., 175 Fifth Avenue, New York, NY 10010.
You can order through the Web site: 'http://www.springer-ny.com/'

For U.S.A./Canada/Mexico- e-mail: orders@springer-ny.com or fax an
order form to: 201-348-4505.
For orders outside U.S.A./Canada/Mexico send this form to: orders@springer.de
Or call toll free: 800-SPRINGER - 8:30 am to 5:30 pm ET (that's 777-4643 and
201-348-4033 in NJ). Write to Springer-Verlag New York, Inc., 175 Fifth Avenue,
New York, NY, 10010.

Visit your local scientific bookstore. Mail payments may be made by check,
purchase order, or credit card (see note below). Prices are payable in U.S.
currency or its equivalent and are subject to change without notice. Remember,
your 30-day return privilege is always guaranteed!

Your complete address is necessary to fulfill your order.

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Randall Caldwell (72672.261@CompuServe.COM)
Subject: CFP: Improving Generalization for Nonlinear Financial
Forecasting Models

Journal of Computational Intelligence in Finance
Call for Papers
Special Issue and Competition on
'Improving Generalization for Nonlinear Financial Forecasting Models'

The Journal of Computational Intelligence in Finance, a peer-reviewed
technical journal, published by Finance & Technology Publishing, is
seeking papers for review and publication in 1997 on 'Improving
Generalization for Nonlinear Financial Forecasting Models'. For
comparison of methods submitted, the target variable series and
performance metrics are specified (though not required).

PUBLICATION DATE

November 1997

PAPER SUBMISSION DEADLINE

June 30, 1997

MOTIVATION

The critical issue in applying neural networks and other data-driven
forecasting systems is generalization, the performance on data not used
for training. The key to generalization behavior is model complexity.
Too simple a model cannot approximate the true relationship, and overly
complex models adjust to the noise in the data. Nearly all financial
applications of nonparametric models (such as neural networks and genetic
algorithms) vary model complexity by adjusting the number of parameters.
This special issue intends to highlight other methods to improve
generalization, in particular regularization (e.g., neural network
weight decay and smoothing) and techniques for combining models. Of
particular interest are nonlinear methods including neural networks,
genetic algorithms, nearest neighbor networks, polynomial networks,
fuzzy logic, and hybrids.

Nearly all studies apply cross-validation to select the best model.
Alternatives to cross-validation include 'analytical' selection rules
such as Akaike's Information Criterion, Schwarz's Information Criterion,
and a number of others. Of particular interest are the statistical
properties (i.e., bias and variance) of model selection methods in
estimating out-of-sample performance.
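
As a hedged illustration of such 'analytical' selection rules, the sketch
below uses the common least-squares forms of the two criteria, which assume
Gaussian errors and drop an additive constant. The function names, the
candidate tuples and the sample numbers are hypothetical and are not part of
this call.

```python
import math

def aic(rss, n, k):
    """Akaike's criterion for a least-squares fit (Gaussian errors assumed).

    rss: residual sum of squares, n: number of observations,
    k: number of fitted parameters. Lower is better.
    """
    return n * math.log(rss / n) + 2 * k

def sic(rss, n, k):
    """Schwarz's criterion; it penalizes each parameter by log(n) instead of 2."""
    return n * math.log(rss / n) + k * math.log(n)

def select_best(candidates, n, criterion=aic):
    """Return the name of the candidate with the lowest criterion value.

    candidates: iterable of (name, rss, k) tuples (illustrative layout).
    """
    return min(candidates, key=lambda c: criterion(c[1], n, c[2]))[0]

# Illustrative numbers only: a small model and a much larger one that
# fits the training data only marginally better.
models = [("small", 12.0, 2), ("large", 11.9, 20)]
best = select_best(models, n=100)
```

Both rules trade in-sample fit against the number of parameters, so here the
small model is preferred even though its residual error is slightly higher.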

DATA, TARGET VARIABLES and PERFORMANCE METRICS

Data: daily prices of a financial time series (see below)
Target Variable: the relative difference in percent (RDP) between
today's closing price and the price five (5) days ahead
Performance Metrics: MSE (target); nRMSE and DS (to be used in the
analysis).

Participants are encouraged to use the forecast data, target variable and
performance metrics specified for this special issue, which are available
on the Web to those who submit a satisfactory abstract (including brief
biography) as outlined below. Participants are not restricted regarding
the data used as inputs to their predictors. Especially interesting
original methods using other forecast data, target variables and
performance metrics will also be considered.

The forecast series is derived from daily closing prices for a financial
time series. The target variable is the relative difference in
percent (RDP) between today's closing price and the closing price
five (5) days ahead. The date, the underlying price series and the
target variable series are all provided in the downloadable data file.
The target metric is the MSE. Also, authors' analysis should include
the normalized RMSE (RMSE normalized using the standard deviation of
actual RDP values), and Directional Symmetry (percentage of correctly
predicted directions with respect to the target variable).
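
The target variable and the three metrics described above can be sketched as
follows. This is a minimal illustration, not the official definition, which
is given on the web page for this call. The sign convention for RDP (future
close minus today's close, divided by today's close), the use of the
population standard deviation, and the treatment of a zero change as an
upward move are all assumptions, and the function names are hypothetical.

```python
import math

def rdp(closes, horizon=5):
    """Relative difference in percent between close[t] and close[t + horizon].

    Assumed convention: 100 * (future - today) / today.
    """
    return [100.0 * (closes[t + horizon] - closes[t]) / closes[t]
            for t in range(len(closes) - horizon)]

def mse(actual, predicted):
    """Mean squared error between actual and predicted RDP values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def nrmse(actual, predicted):
    """RMSE divided by the standard deviation of the actual RDP values."""
    mean = sum(actual) / len(actual)
    std = math.sqrt(sum((a - mean) ** 2 for a in actual) / len(actual))
    return math.sqrt(mse(actual, predicted)) / std

def directional_symmetry(actual, predicted):
    """Percentage of forecasts whose direction matches the actual RDP.

    Assumption: a value of exactly zero counts as an upward move.
    """
    hits = sum(1 for a, p in zip(actual, predicted) if (a >= 0) == (p >= 0))
    return 100.0 * hits / len(actual)
```

An nRMSE below 1 means the predictor beats the trivial forecast that always
outputs the mean of the actual RDP values.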

The forecast data provided is separated into in-sample (10 years of
daily data) and out-of-sample (2 years of daily data) sets. Participants
are not restricted regarding the data used as input to their predictors.
However, all data used should be disclosed in the paper presentation,
including the details of all techniques and formulas used to pre-process
the data. Details on the predictor and the methods used for improving
generalization should be presented in the paper.

FORECAST HORIZON AND RE-TRAINING

Participants should test performance of their predictors over the entire
two-year out-of-sample dataset. Of interest are results of analyses and
performance of predictors over the entire two-year prediction period:

(1) without re-training and
(2) with re-training (optional).

The results from (1) and (2) can be useful for estimating the limits
of the forecasting horizon for the prediction methods presented.
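
One way to organize the comparison of (1) and (2) is sketched below, under
stated assumptions. The `fit` callable, the `(inputs, target)` pair layout and
the `fit_mean` baseline are hypothetical. The `target_lag` parameter reflects
the assumption that a five-day-ahead target only becomes known five days
later, so newer observations are withheld from re-training.

```python
def evaluate(fit, in_sample, out_sample, retrain_every=None, target_lag=5):
    """Return the out-of-sample MSE of a predictor.

    fit: callable taking a list of (inputs, target) pairs and returning
         a function inputs -> forecast (hypothetical interface).
    retrain_every: None fits once on the in-sample data (case 1);
         an integer re-fits every that many out-of-sample days (case 2).
    target_lag: days before a target becomes observable.
    """
    in_sample, out_sample = list(in_sample), list(out_sample)
    model = fit(in_sample)
    squared_errors = []
    for i, (x, y) in enumerate(out_sample):
        if retrain_every and i > 0 and i % retrain_every == 0:
            # Only targets already realized by day i may be used.
            known = out_sample[:max(0, i - target_lag + 1)]
            model = fit(in_sample + known)
        squared_errors.append((model(x) - y) ** 2)
    return sum(squared_errors) / len(squared_errors)

def fit_mean(pairs):
    """Trivial baseline: always forecast the mean of the targets seen so far."""
    mean = sum(y for _, y in pairs) / len(pairs)
    return lambda x: mean
```

Calling `evaluate` once with `retrain_every=None` and once with an integer
gives the two figures whose gap indicates how quickly a fixed model goes stale.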

For additional details on the forecast data, target variable and
performance metrics, see:

http://ourworld.compuserve.com/homepages/ftpub/call.htm

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 21 Mar 1997 11:07:08 -0500
From: Vaughn Petraglia (vaughn@think.com)
Subject: Thinking Machines, Consultant Positions

Thinking Machines Professional Services
Senior Consultant Data Mining
San Francisco bay area and other locations
3/12/97

As a member of the new Thinking Machines Professional Services
Organization, you will be responsible for all aspects of bidding and
delivering consulting products and service to many of our most important
customers. You will lead or participate in small teams of seasoned
professionals to help our customers use Darwin to find new business
opportunities hidden in their very large databases and data warehouses.

Major job functions include:

1. Working with a TMC Account Executive to understand the customer's or
prospect's requirements and providing technical guidance through
the sales cycle.
2. Developing project plans, risk analyses, and formal services bids.
3. Organizing and managing all resources needed to complete the project
within budget and on time.
4. Providing hands-on data analysis and data mining consulting.
5. Consulting and skills transfer on the Darwin product.
6. Following up to ensure customer satisfaction.

The ideal candidate will have:

1. Project management experience.
2. Excellent written and oral communications skills.
3. Advanced degree in an analytical field or equivalent experience.
4. Experience in data analysis, database systems, knowledge based systems
or data mining.
5. Experience in parallel algorithms and parallel computer systems is
desirable.

Contact: Vaughn Petraglia
vaughn@think.com

Thinking Machines
14 Crosby Dr.
Bedford, Ma 01730

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 21 Mar 1997 18:45:32 +0000
From: Marco Ramoni (M.Ramoni@open.ac.uk)
Subject: Research Studentships at the Knowledge Media Institute

The Knowledge Media Institute (KMi) is home to internationally recognised
researchers in Educational Multimedia, Collaboration Technologies,
Artificial Intelligence, Cognitive Science, and Human-Computer Interaction.
KMi offers students an intellectually challenging environment with
exceptional research and computer facilities. We are currently seeking
applications for full-time, 3-year research studentships in the following
areas:

- Migratory Interfaces and Mobile Computing
- Virtual Intelligence and Knowledge Discovery
- Knowledge Management and Knowledge Modelling
- Sharing and Reusing Design Knowledge over the WWW

Applicants are typically expected to have a degree in computer science,
artificial intelligence, cognitive science, psychology, or a related
discipline. As KMi only accepts a very small number of research students
per year, admission is highly competitive. To apply, send a CV and short
project proposal (3 pages) along with a completed application form.
Successful candidates must be willing to live within reasonable commuting
distance from Milton Keynes, and be available to start on October 1, 1997.

Applicants are strongly encouraged to visit the KMi web site
http://kmi.open.ac.uk/studentships for more information on ongoing KMi
projects and the studentships.

An application form with further particulars can be obtained by contacting
Ms. Ortenz Rose by email (O.Rose@open.ac.uk), telephone (+44 (1908) 653
800) or post (Knowledge Media Institute, The Open University, Walton Hall,
Milton Keynes, MK7 6AA, UK). Informal advice on these studentships can be
obtained by contacting Dr. Tamara Sumner, admissions co-ordinator, by email
at T.Sumner@open.ac.uk or by telephone at the number above.

Closing date for applications: 18 April 1997

Further particulars are attached below.

Virtual Intelligence and Knowledge Discovery

Marco Ramoni (KMi)
http://kmi.open.ac.uk/~marco

The Virtual Intelligence Project and the Knowledge Discovery Project at the
Knowledge Media Institute seek a candidate PhD student to work at the
intersection of their areas of research. The Virtual Intelligence Project
focuses on the development of distributed Artificial Intelligence
applications over the World Wide Web. The Knowledge Discovery Project
investigates probabilistic and statistical methods to extract reusable
knowledge sources from databases. The PhD project will fall into their
joint effort to develop a distributed knowledge discovery architecture over
the World Wide Web. The successful candidate will be able to choose a
research topic among a variety of key issues underlying this research,
ranging from methodological aspects of knowledge extraction and distributed
artificial intelligence to design and development issues of the
architecture.

More information on the Virtual Intelligence Project is available at:
http://kmi.open.ac.uk/~marco/projects/wai/vip

More information on the Knowledge Discovery Project is available at:
http://kmi.open.ac.uk/~marco/projects/kdd

For more information on this studentship, contact Marco Ramoni at
M.Ramoni@open.ac.uk.

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Mar 1997 15:21:43 +0100 (MET)
From: Gerhard Widmer (gerhard@ai.univie.ac.at)
Subject: ECML-97 Preliminary Programme

9th EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML-97)
23-25 April 1997, Prague, Czech Republic
PRELIMINARY PROGRAMME

Up-to-date information on the conference (including registration information)
can be found at
http://is.vse.cz/ecml97/home.html
This programme with complete abstracts of all talks and links to the
workshops is also available at
http://www.ai.univie.ac.at/ecml/programme.html
-----------------------------------------------------------------------------

--------------------
WEDNESDAY, APRIL 23:

9.00 - 9.30 Welcome

9.30 - 10.30 INVITED TALK:
Uncertain Learning Agents
Stuart Russell, University of California, Berkeley, USA

10.30 - 11.00 Coffee Break

11.00 - 11.30 Integrated Learning and Planning Based on
Truncating Temporal Differences
Pawel Cichosz

11.30 - 12.00 Finite-Element Methods with Local Triangulation Refinement
for Continuous Reinforcement Learning Problems
Remi Munos

12.00 - 12.15 Learning and Exploitation Do Not Conflict
Under Minimax Optimality
Csaba Szepesvari

12.15 - 12.30 Exploiting Qualitative Knowledge to Enhance Skill Acquisition
Cristina Baroglio

12.30 - 14.00 Lunch

14.00 - 15.00 INVITED TALK:
Constructing and Sharing Perceptual Distinctions
Luc Steels, Free University of Brussels (VUB) and
Sony Computer Science Laboratory, Paris

15.00 - 15.30 Ibots Learn Genuine Team Solutions
Cristina Versino, Luca Maria Gambardella

15.30 - 16.00 Coffee Break

16.00 - 16.30 NeuroLinear: A System for Extracting Oblique Decision Rules
from Neural Networks
Rudy Setiono, Huan Liu

16.30 - 17.00 Learning Different Types of New Attributes by Combining the
Neural Network and Iterative Attribute Construction
Yuh-Jyh Hu

17.00 - 17.45 Commenting Session

-------------------
THURSDAY, APRIL 24:

9.00 - 10.00 INVITED TALK:
On Prediction by Data Compression
Paul Vitanyi, CWI, Amsterdam

10.00 - 10.30 Conditions for Occam's Razor Applicability and
Noise Elimination
Dragan Gamberger, Nada Lavrac

10.30 - 11.00 Coffee Break

11.00 - 11.30 Compression-Based Pruning of Decision Lists
Bernhard Pfahringer

11.30 - 11.45 Inductive Genetic Programming with Decision Trees
Nikolay I. Nikolaev, Vanio Slavov

11.45 - 12.00 Probabilistic Incremental Program Evolution:
Stochastic Search Through Program Space
Rafal Salustowicz, Juergen Schmidhuber

12.00 - 12.30 Constructing Intermediate Concepts by Decomposition
of Real Functions
Janez Demsar, Blaz Zupan, Marko Bohanec, Ivan Bratko

12.30 - 14.00 Lunch

14.00 - 14.30 Global Data Analysis and the Fragmentation Problem in
Decision Tree Induction
Ricardo Vilalta, Gunnar Blix, Larry Rendell

14.30 - 15.00 Model Combination in the Multiple-Data-Batches Scenario
Kai Ming Ting, Boon Toh Low

15.00 - 15.30 Commenting Session

15.30 - 16.00 Coffee Break

16.00 - 17.00 Poster Session

17.00 - open ECML Community Meeting

-----------------
FRIDAY, APRIL 25:

9.00 - 9.15 A Case Study in Loyalty and Satisfaction Research
Koen Vanhoof, Josee Bloemer, Koen Pauwels

9.15 - 9.30 Inducing and Using Decision Rules in the
GRG Knowledge Discovery System
Ning Shan, Howard J. Hamilton, Nick Cercone

9.30 - 9.45 Learning When Negative Examples Abound
Miroslav Kubat, Robert Holte, Stan Matwin

9.45 - 10.00 Search-Based Class Discretization
Luis Torgo, Joao Gama

10.00 - 10.15 Classification by Voting Feature Intervals
Gülşen Demiröz, H. Altay Güvenir

10.15 - 10.30 A Model for Generalization Based on Confirmatory Induction
Nicolas Lachiche, Pierre Marquis

10.30 - 11.00 Coffee Break

11.00 - 11.30 Natural Ideal Operators in Inductive Logic Programming
Fabien Torre, Celine Rouveirol

11.30 - 12.00 Theta-subsumption for Structural Matching
Luc De Raedt, Peter Idestam-Almquist, Gunther Sablon

12.00 - 12.30 Induction of Feature Terms with INDIE
Eva Armengol, Enric Plaza

12.30 - 12.45 Metrics on Terms and Clauses
Alan Hutchinson

12.45 - 13.00 Learning Linear Constraints in Inductive Logic Programming
Lionel Martin, Christel Vrain

Afternoon off - trip and farewell party (optional; see social programme)

------------------
SATURDAY, APRIL 26:

ECML/MLNet WORKSHOPS:
WS 1: Data-Driven Learning of Natural Language Processing Tasks
WS 2: Case-Based Learning: Beyond Classification of Feature Vectors
WS 3: Learning in Dynamically Changing Domains:
Theory Revision and Context Dependence Issues
WS 4: Machine Learning and Human-Agent Interaction

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Jiawei Han (han@cs.sfu.ca)
Date: Tue, 18 Mar 1997 22:05:37 -0800 (PST)
Subject: SIGMOD'97 Data Mining Workshop: Call for Participation

Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'97)
in cooperation with ACM-SIGMOD'97
Tucson, Arizona, May 11, 1997
(URL: http://fas.sfu.ca/cs/conf/dmkd97.html)
PROGRAM
The workshop will be held one day before the SIGMOD/PODS'97 conference.
The program is as follows:
8:30--8:35 Opening Remarks
8:35--9:30 Invited Talk
9:30--9:45 Coffee Break
9:45--11:00 Session I Clustering/Classification

A Fast Clustering Algorithm to Cluster Very Large Categorical
Data Sets in Data Mining
Zhexue Huang

Clustering Based On Association Rule Hypergraphs
Eui-Hong Han, George Karypis, Vipin Kumar and Bamshad Mobasher

Ontology-based Induction of High Level Classification Rules
Merwyn G. Taylor, Kilian Stoffel and James A. Hendler

11:00--11:15 Coffee Break
11:15--12:30 Session II Applications

An efficient domain-independent algorithm for detecting
approximately duplicate database records
Alvaro E. Monge and Charles P. Elkan

An Application of Adaptive Data Mining: Facilitating
Web Information Access
Parvathi Chundi and Umeshwar Dayal

Efficient Roll-Up and Drill-Down Analysis for Large Data Sets
Min Wang and Bala Iyer

12:30--14:15 Lunch, Posters, Demos
14:15--15:30 Session III Association Rules

Mining Association Patterns from Nested Databases
Ke Wang

Maintenance of Discovered Association Rules: When to update?
S.D. Lee and David W. Cheung

Efficient Algorithms for Discovering Frequent Sets in
Incremental Databases
Ronen Feldman, Yonatan Aumann, Amihood Amir and Heikki Mannila

15:30--15:45 Coffee Break
15:45--17:00 Session IV Miscellany

Sharing Processing in Data Mining Systems
Arun Swami and Brian Lent

A Pattern Discovery Algebra
Alexander Tuzhilin

On the Complexity of Mining Temporal Trends
Jef Wijsen and Robert Meersman

17:00--18:00 Summary Discussion

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 20 Mar 1997 09:29:50 -0600
From: Werner Wothke (wewo@smallwaters.com)
Subject: Chicago ASA Data Mining Conference, May 2, 1997
The Chicago Chapter of the American Statistical Association is
presenting a Data Mining conference on May 2, titled
A Hard Look at Data Mining
The idea of the conference is to peel away most of the hype and present
the local statistical and data analysis community with some solid
technical and statistical information. A web site with additional
information can be found at
http://www.smallwaters.com/datamine

With best wishes,

Werner Wothke

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 20 Mar 1997 17:48:34 -0500
From: Gregory Piatetsky-Shapiro (gps)
Subject: Paris Data Mining'97 Event, June 2-4

See http://www.datamining.org/events.htm for full information

Data Mining'97 : Increasing Corporate Performance

Meridien Montparnasse Hotel, Paris, June 2-4, 1997

THE DATA MINING MARKET : TRENDS AND EVOLUTION

Market and players

Perspectives and trends : Data Mining in 2000 and beyond

Mining the Net : maximizing external data retrieval and analysis

Data Mining and the law : situation and perspectives

INTRODUCTION TO DATA MINING

More than a media phenomenon, what are the real issues for data mining ?

Corporate data bases : retrieval and output

The latest technologies

Technology-human interface

DATA MINING BEST PRACTICE

Data warehousing, On Line Analytical Processing and data mining

Data and their representation for data mining

Optimizing access to stored information

Utilizing data mining to further management strategies

Using data mining to measure corporate performance

DATA MINING APPLICATIONS

Direct marketing and data mining : customer satisfaction and retention

Geomarketing and data mining

Marketing strategy and data mining : optimizing a commercial network

Finance and data mining : credit management and risk assessment

Adapting to changing markets through implementing data mining processes in all fields of business

A unique opportunity to meet your potential customers and peers and hear the latest from the competition !

This forum will be a premier opportunity to network & exchange business cards with CEOs, VPs, and managers of :

Finance

Marketing

Sales

Strategic Planning

Information Systems

Advertising above and below the line

In the fields of :

Financial services

Insurance

Mail order companies

Retail

Healthcare

Computing, Telecommunications

Government

Transport and logistics

This Conference will be a premiere in Europe. Come join us in Paris!

For further information and registration, please contact us at info@datamining.org

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: manther@worldnet.att.net
Date: Sun, 23 Mar 1997 17:05:26 -0800
KNOWLEDGE ACCELERATION
The 1997 XpertUser Conference
2 - 5 November 1997
Boston, Massachusetts
http://www.XpertUser.com

In support of its XpertRule(r) and Profiler(tm) products, Attar Software
announces its 1997 XpertUser Conference entitled: 'Knowledge
Acceleration.' The Conference, to be held in Boston, MA, 2 - 5 November
1997, features a keynote address by Professor Donald Michie, a pioneer
in the field of Machine
Intelligence. In addition, there are planned tutorials on data mining
and knowledge engineering as well as application demonstrations, and
technical sessions with Dr. Akeel Al-Attar, and other experts from Attar's
world-wide customer base. The Conference web page is at http://www.XpertUser.com.
The registration fee is $695 until 1 July, when it increases to $895.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~