

Data Mining and Knowledge Discovery Nuggets 96:10, e-mailed 96-03-24

Contents:
News:
* GPS, What's New at KD Mine
Publications:
* Minton, JAIR article by Quinlan,
http://www.cs.washington.edu/research/jair/abstracts/quinlan96a.html
Siftware:
* D. Zighed, New version of SIPINA-W v1.2
Positions:
* H. Mannila, Position in Intelligent Systems, University of Helsinki
* I. Mclaren, PhD studentship in Logic-based data mining techniques at
U. of Luton, UK
Meetings:
* B. Zupan, IDAMAP-96 (ECAI'96 workshop) submissions due April 2,
http://www-ai.ijs.si/ailab/activities/idamap96.html
* Bramer, UK: Expert Systems 96: Call for Papers (version 2: 23/03/96)
http://www.sis.port.ac.uk/sges/es96.html
--
KDD Nuggets is a newsletter for the Data Mining and Knowledge Discovery
community, focusing on the latest research and applications.

Contributions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to (kdd@gte.com).
E-mail add/delete requests to (kdd-request@gte.com).

Nuggets frequency is approximately weekly.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site, URL http://info.gte.com/~kdd.

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'
(just a single quote)

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sun, 24 Mar 1996 14:42:52 -0500
From: gps0 (Gregory Piatetsky-Shapiro)
To: kdd
Subject: What's New at KD Mine info.gte.com/~kdd/what-is-new
Mar 24, 1996


Mar 7, 1996



>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: minton@ISI.EDU
Date: Wed, 20 Mar 96 15:23:56 PST
Subject: JAIR article by Quinlan


Readers of this mailing list may be interested in the following article
recently published by JAIR.

Quinlan, J.R. (1996)
'Improved Use of Continuous Attributes in C4.5',
Volume 4, pages 77-90.

Available in Postscript (414K) and compressed Postscript (124K).
For quick access via your WWW browser, use this URL:
http://www.cs.washington.edu/research/jair/abstracts/quinlan96a.html
More detailed instructions are below.

Abstract: A reported weakness of C4.5 in domains with continuous
attributes is addressed by modifying the formation and evaluation of
tests on continuous attributes. An MDL-inspired penalty is applied to
such tests, eliminating some of them from consideration and altering
the relative desirability of all tests. Empirical trials show that
the modifications lead to smaller decision trees with higher
predictive accuracies. Results also confirm that a new version of
C4.5 incorporating these changes is superior to recent approaches that
use global discretization and that construct small trees with
multi-interval splits.
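For readers who want to experiment, here is a minimal Python sketch of the general idea (not Quinlan's code): choose a threshold test "value <= t" on a continuous attribute by information gain, then subtract an MDL-style penalty of log2(N - 1)/|D| bits, where N is the number of distinct values and |D| the number of cases. That penalty form is our reading of the paper rather than something stated in the abstract above, and the midpoint threshold and function names are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Pick a binary test 'value <= t' by information gain, then subtract
    an MDL-style penalty log2(N - 1) / |D| (N = distinct values, |D| = cases).
    Returns (threshold, penalised_gain); (None, 0.0) if no split exists."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    distinct = sorted(set(values))
    best_t, best_gain = None, -math.inf
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # assumption: midpoint between adjacent values
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = (base - len(left) / n * entropy(left)
                - len(right) / n * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    if best_t is None:
        return None, 0.0
    # N - 1 candidate thresholds cost log2(N - 1) bits, spread over |D| cases.
    penalty = math.log2(len(distinct) - 1) / n
    return best_t, best_gain - penalty
```

With best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]) the split falls at 2.5, and a penalty of log2(3)/4 (about 0.40 bits) is subtracted from a raw gain of 1 bit. With many distinct values the penalty grows, which is how such a scheme can remove marginal continuous tests from consideration.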

The article is available via:

-- comp.ai.jair.papers (also see comp.ai.jair.announce)

-- World Wide Web: The URL for our World Wide Web server is
http://www.cs.washington.edu/research/jair/home.html
For direct access to this article and related files try:
http://www.cs.washington.edu/research/jair/abstracts/quinlan96a.html

-- Anonymous FTP from either of the two sites below.

Carnegie-Mellon University (USA):
ftp://p.gp.cs.cmu.edu/usr/jair/pub/volume4/quinlan96a.ps
The University of Genoa (Italy):
ftp://ftp.mrg.dist.unige.it/pub/jair/pub/volume4/quinlan96a.ps

The compressed PostScript file is named quinlan96a.ps.Z (124K)

-- Automated email. Send mail to jair@cs.cmu.edu or jair@ftp.mrg.dist.unige.it
with the subject AUTORESPOND and our automailer will respond. To
get the Postscript file, use the message body GET volume4/quinlan96a.ps
(Note: your mailer might find this file too large to handle.)
Only one file can be requested in each message.

-- JAIR Gopher server: At p.gp.cs.cmu.edu, port 70.

For more information about JAIR, visit our WWW or FTP sites, or
send electronic mail to jair@cs.cmu.edu with the subject AUTORESPOND
and the message body HELP, or contact jair-ed@ptolemy.arc.nasa.gov.


>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 14 Mar 1996 10:36:54 GMT
From: zighed@univ-lyon2.fr

Dear Gregory,

As you know, the new version (v1.2) of SIPINA_W is ready.
It has been substantially improved. I include a short description below.
Could you please include this description as a contribution to
the next issue of KDD Nuggets?
Thank you very much.

Best regards

D.A. Zighed

SIPINA-W v1.2

This version contains several induction graph methods and some tools to evaluate rule bases. The main modules are summarised below:

a - Import / Manipulation of data

SIPINA_W's own files (.DAT) are in ASCII format, but you may also
import data directly from dBase (.DBF) or Paradox (.DB) databases,
or from a Lotus spreadsheet (.WKS).

Continuous attributes may be recoded using several contextual or non-contextual
discretisation methods: Chi-Merge [Kerber 1992], MDLPC [Fayyad & Irani 1992],
FUSINTER [ZIGHED 1995].
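To illustrate the first of these, here is a minimal Python sketch of bottom-up ChiMerge-style discretisation: start with one interval per distinct value, then repeatedly merge the adjacent pair with the smallest chi-square statistic until every adjacent pair reaches a threshold. This is not SIPINA-W code. The threshold (normally a chi-square critical value for the chosen significance level with classes - 1 degrees of freedom) is supplied by the caller, and replacing an expected count of 0 by 0.1 is an assumed convention.

```python
from collections import Counter


def chi2_pair(a, b, classes):
    """Chi-square statistic for two adjacent intervals given their class counts."""
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_n = sum(row.values())
        for c in classes:
            col_n = a[c] + b[c]
            # Assumed convention: replace an expected count of 0 by 0.1.
            expected = row_n * col_n / total or 0.1
            stat += (row[c] - expected) ** 2 / expected
    return stat


def chimerge(values, labels, threshold):
    """Bottom-up merging; returns the lower bound of each final interval."""
    classes = sorted(set(labels))
    intervals = []  # list of [lower_bound, Counter of classes]
    for v, y in sorted(zip(values, labels)):
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][y] += 1
        else:
            intervals.append([v, Counter({y: 1})])
    while len(intervals) > 1:
        stats = [chi2_pair(intervals[i][1], intervals[i + 1][1], classes)
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # every adjacent pair now differs enough to stay separate
        intervals[i][1] = intervals[i][1] + intervals[i + 1][1]
        del intervals[i + 1]
    return [lower for lower, _ in intervals]
```

For example, chimerge([1, 2, 3, 4, 5, 6], list("aaabbb"), 2.71) returns [1, 4]: the pure runs of each class are merged, and the class boundary survives. Here 2.71 is roughly the 90% point of chi-square with one degree of freedom.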

b - Methods

Several methods are implemented:
- CART [Breiman et al. 1984]: a complete implementation offering two splitting criteria
(Twoing rule, Gini index), together with the pruning algorithm;
- Elisee [Bouroche & Tenenhaus 1970]: a binary segmentation method using the Chi-2
criterion;
- ID3 [Quinlan 1979/1986];
- C4.5 [Quinlan 1992]: includes pruning and rule simplification;
- Chi2-link: a method using the Chi-2 critical probability as the selection criterion,
cf. Mingers 1987;
- SIPINA [ZIGHED 1985/1992]: generalises trees to induction graphs, dynamically
incorporating the discretisation methods listed above.
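For reference, the two CART splitting criteria named above fit in a few lines. The Python sketch below is our own, not SIPINA-W's implementation. It follows the usual textbook definitions from Breiman et al. and scores a candidate binary split in two ways: by the decrease in Gini impurity, and by the twoing value (p_L * p_R / 4) * (sum_j |p(j|L) - p(j|R)|)^2.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_j p_j^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(left, right):
    """Reduction in Gini impurity achieved by splitting left + right."""
    n = len(left) + len(right)
    return (gini(left + right)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def twoing(left, right):
    """Twoing value (p_L * p_R / 4) * (sum_j |p(j|L) - p(j|R)|)^2."""
    n = len(left) + len(right)
    p_l, p_r = len(left) / n, len(right) / n
    cl, cr = Counter(left), Counter(right)
    diff = sum(abs(cl[c] / len(left) - cr[c] / len(right))
               for c in set(left) | set(right))
    return p_l * p_r / 4 * diff ** 2
```

A perfectly separating split of two balanced classes gives a Gini decrease of 0.5 and a twoing value of 0.25. A split whose two sides have identical class proportions gives a twoing value of 0.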

c - Tests and Evaluation

You may divide the data file into a learning sample and a test sample: the analysis
is run on the first sample to generate the rules, which are then validated on the
second.
Alternatively, you may run a cross-validation in which each sub-sample is drawn
either at random or by stratified sampling. The rules produced by each run are saved
in separate rule bases.
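The difference between the two sampling options can be sketched as follows. fold_assignment is a hypothetical Python helper, not part of SIPINA-W. It shuffles the case indices, either globally or separately within each class, and then deals them round-robin into k folds. With stratification, each class is spread as evenly as possible over the folds, so each fold roughly keeps the class proportions of the whole file.

```python
import random
from collections import defaultdict


def fold_assignment(labels, k, stratified=True, seed=0):
    """Return a fold number in 0..k-1 for each case (hypothetical helper)."""
    rng = random.Random(seed)
    if stratified:
        # Shuffle within each class, then lay the classes end to end so that
        # dealing round-robin spreads every class evenly over the folds.
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        order = []
        for y in sorted(by_class):
            members = by_class[y]
            rng.shuffle(members)
            order.extend(members)
    else:
        # Plain random draw: one global shuffle of all case indices.
        order = list(range(len(labels)))
        rng.shuffle(order)
    folds = [0] * len(labels)
    for position, case in enumerate(order):
        folds[case] = position % k
    return folds
```

With 10 cases of class "a", 5 of class "b" and k = 5, the stratified draw puts exactly two "a" cases and one "b" case in every fold. A purely random draw only guarantees three cases per fold.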

d - Automatic / Interactive Learning

In automatic learning mode you only have to choose the method and run the
analysis. Interactive mode lets you force the operation performed on each vertex
(Split, Merge) as well as the variable used (surrogate split). Vertex inspection
displays the information available on the selected vertex: class distribution,
list of observations, distribution function of each variable, and the power of
each variable on competing splits.

e - Advanced manipulations of rules

e.1. Extracting rules
The generation of rules consequent to the graph has been improved. From each
non-initial vertex it is now possible to produce prediction rules which can be
evaluated through their error rate, their corresponding number of observations and
an implication test based upon the Lerman statistic [Lerman 1981, Intensity of
Implication].
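As a rough illustration of the three measures, here is a Python sketch based on one common formulation of the intensity of implication, which we assume here. The exact statistic used by SIPINA-W may differ. For a rule A -> B on n cases, the formulation models the number of counter-examples (cases satisfying A but not B) under independence as Poisson with mean n_A * n_notB / n. The rule's score is then 1 - P(X <= observed counter-examples). The function names are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def rule_quality(n, n_a, n_not_b, n_counter):
    """Evaluate rule A -> B on n cases.

    n_a: cases satisfying the premise A
    n_not_b: cases not in class B
    n_counter: cases satisfying A but not B (counter-examples)
    Returns (error_rate, coverage, intensity_of_implication).
    """
    error_rate = n_counter / n_a
    lam = n_a * n_not_b / n  # expected counter-examples if A and B were independent
    intensity = 1.0 - poisson_cdf(n_counter, lam)
    return error_rate, n_a, intensity
```

Take n = 100 cases, 20 covered by the rule and 50 outside class B, so independence predicts 10 counter-examples. A rule with no counter-examples then scores close to 1. A rule with exactly 10 counter-examples scores about 0.42, which signals no real implication. The term-by-term sum is adequate for small means only, because exp(-lam) underflows for very large lam.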

e.2. Rules bases manipulation
Rule bases can be extended by merging two or more bases; users can also enter
rules manually and evaluate them against the data set.

e.3. Selection of the best rules by validation
When a rule base is applied to a test or generalisation sample, the criterion
used to select among competing rules can be changed (an individual may satisfy
two rules with different conclusions; this occurs mostly after rule bases have
been merged). The available criteria are: minimising the error rate, maximising
the number of individuals covered by a rule, maximising the Goodman index [1988],
or maximising the intensity of implication.
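Here is a minimal Python sketch of how such a criterion might be applied when several rules fire for the same individual. The Rule fields, the criterion names and resolve are hypothetical. The Goodman index is left out because its definition is not given here.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conclusion: str
    error_rate: float   # observed error rate of the rule
    support: int        # number of individuals covered by the rule
    intensity: float    # intensity of implication, in [0, 1]


# Each criterion maps a rule to a score; the highest score wins.
CRITERIA = {
    "min_error": lambda r: -r.error_rate,
    "max_support": lambda r: r.support,
    "max_intensity": lambda r: r.intensity,
}


def resolve(fired_rules, criterion="min_error"):
    """Return the conclusion of the best rule among those that fired."""
    best = max(fired_rules, key=CRITERIA[criterion])
    return best.conclusion
```

Take two conflicting rules, one concluding "yes" (error 0.10, 40 individuals) and one concluding "no" (error 0.05, 12 individuals). The chosen conclusion then depends on the criterion: "no" under minimum error, "yes" under maximum support.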

e.4. Optimisation and Simplification
Rules derived from an induction graph may be optimised and simplified using:
- detection and elimination of redundant premises;
- a symbolic algorithm exploring the whole description domain;
- the algorithm of Quinlan [1987]: a hill-climbing search for the minimum
pessimistic error rate.
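The hill-climbing idea can be sketched as greedy deletion of premises. The Python below is our illustration, not Quinlan's or SIPINA-W's code. It estimates a rule's pessimistic error as the upper bound of a Wilson score interval on its observed error rate, with z = 0.6745 as an assumed setting. It then keeps dropping whichever premise gives the lowest estimate, as long as that estimate does not rise. The premise representation (predicates over dict cases with a "class" key) is hypothetical.

```python
import math


def pessimistic_error(errors, covered, z=0.6745):
    """Upper bound of a Wilson score interval on the observed error rate."""
    if covered == 0:
        return 1.0
    f, n = errors / covered, covered
    num = f + z * z / (2 * n) + z * math.sqrt(f * (1 - f) / n + z * z / (4 * n * n))
    return num / (1 + z * z / n)


def simplify(premises, cases, target):
    """Greedily remove premises while the pessimistic error does not increase."""
    def score(conds):
        hit = [c for c in cases if all(p(c) for p in conds)]
        errors = sum(1 for c in hit if c["class"] != target)
        return pessimistic_error(errors, len(hit))

    current = list(premises)
    best = score(current)
    while current:
        # Try deleting each premise in turn; keep the best single deletion.
        s, i = min((score(current[:j] + current[j + 1:]), j)
                   for j in range(len(current)))
        if s > best:
            break
        best = s
        del current[i]
    return current
```

Consider a rule "x > 0 and y > 0 -> p" on data where only x matters. Here the y premise is dropped, because the rule then covers two cases instead of one with no errors, which lowers the pessimistic estimate. The x premise is kept, because removing it would introduce errors.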

f - Technical Limitations

The theoretical limits of the software are:
- 16,384 attributes
- 2^32 - 1 cases
In practice, the limits are set by the computer's resources.

g - Status

Shareware.

h - How to get SIPINA-W v1.2

You can get SIPINA by anonymous FTP from:
eric.univ-lyon2.fr
/pub/sipina

i - Installation

To install this version, download LESIPINA.EXE, a self-extracting file. Copy it
into a temporary directory and run it. The installation program is SETUP.EXE;
click OK whenever it asks for another disk.

j - Updated by:

Ricco Rakotomalala, 6 March 1996 (rakotoma@univ-lyon2.fr)

k - Contact:

D.A. Zighed,
Organisation: University of Lyon2,
e-mail: zighed@univ-lyon2.fr,
Tel.: (33) 78 77 23 76,
Fax: (33) 78 77 23 75,
Address: E.R.I.C. Lyon, bat. L
5 av. Pierre Mendes-France
69676 Bron Cedex
France


>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Heikki Mannila (mannila@mpi-sb.mpg.de)
Date: Fri, 8 Mar 1996 13:17:54 +0100 (MET)
To: kdd@gte.com
Subject: Position in Intelligent Systems at the University of Helsinki
Cc: mannila@mpi-sb.mpg.de

Professorship in Intelligent Systems in Helsinki
================================================

The University of Helsinki, Finland, invites applications for a
full professorship in computer science, especially intelligent systems.
This new position has now been established at the Department of Computer
Science. The position is for a 5-year period, but renewal and/or tenure
are not excluded.

Knowledge of Finnish is not required; the lectures can be given in English.

The Department of Computer Science of the University of Helsinki
is the leading computer science department in Finland, with 3 (now 4)
full and 5 associate professorships. The total number of
employees is around 80. For more information on the department, see
http://www.cs.helsinki.fi/

The _hard_ deadline for applications is March 28, 1996 (by 3.45 p.m.).

The application should contain:
- a curriculum vitae in English,
- a numbered complete list of publications and other works with which the
applicant wishes to demonstrate his/her competence and merits, and
- one copy each of all publications and other works, numbered as in the
above list.

Written applications should be addressed to the Faculty of Science,
University of Helsinki, and sent to
The Registrar of the University of Helsinki
PO Box 33 (Yliopistonkatu 4)
FIN-00014 University of Helsinki
Finland

Informal enquiries about the position and application procedure can be made to

Prof. Esko Ukkonen, University of Helsinki, Department of Computer Science
P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland.
phone: +358 0 708 44172, fax: +358 0 708 44441 Esko.Ukkonen@cs.helsinki.fi

or

Prof. Heikki Mannila, current address:
Max Planck Institut fuer Informatik, Im Stadtwald, D-66123 Saarbruecken, Germany
Phone +49 681 9325 107 Fax +49 681 9325 199
Heikki.Mannila@cs.helsinki.fi or mannila@mpi-sb.mpg.de

Formal instructions for applying for the professorship can be obtained from the
Faculty Office, Mr. Jorma Aijo (tel. +358 0 19122354) or Mrs. Sirkka
Korsman (tel. +358 0 19122353, fax +358 0 19122179).


Heikki Mannila

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 20 Mar 1996 16:47:20 GMT
From: Iain Mclaren (iain.mclaren@luton.ac.uk)
To: kdd@gte.com
Subject: PhD in Logic-based data mining techniques


PhD Studentship in Logic-based database mining techniques

Department of Computing, University of Luton, UK.

With the ever-increasing quantity of data held in computer-based files,
data mining techniques are becoming ever more important. A PhD studentship
is available to investigate and develop new approaches to data mining
using persistent logic programming techniques.

Data mining tools automatically extract previously unknown, potentially
useful information from the vast amounts of data held in computer
databases. For example, earth observation satellites generate
approximately one terabyte of data every day. Clearly, it is impossible
for a human to understand all this data. Automatic data mining tools
are necessary if any use is to be made of this scale of data.

The expressive nature of logic programming languages and their ability
to interact efficiently with relational database systems make them a
good candidate for developing data mining techniques. The proposed PhD
studentship will investigate this approach.

If you are interested in knowing more about this PhD studentship and
have obtained a good first degree in Computer Science or a related topic
then contact Dr. Alfred Vella (alfred.vella@luton.ac.uk).

Department of Computing,
University of Luton, LUTON LU1 3JU
Tel: 01582 34111



>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Blaz Zupan (Blaz.Zupan@ijs.si)
Subject: IDAMAP-96 (ECAI'96 workshop) reminder
To: kdd-request@gte.com
Date: Tue, 12 Mar 1996 11:28:18 +0100 (MET)

This note is to remind you that submissions for the INTELLIGENT DATA
ANALYSIS IN MEDICINE AND PHARMACOLOGY (IDAMAP-96) workshop at the 12th
European Conference on Artificial Intelligence (ECAI-96) are due on
Tuesday, 2 April. For submission instructions and other information
about the workshop, consult the workshop's World-Wide Web pages at

http://www-ai.ijs.si/ailab/activities/idamap96.html

or send e-mail to ecai96wk@ijs.si

Nada Lavrac and Blaz Zupan


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: bramerma@cv.port.ac.uk
Date: Sat, 23 Mar 1996 20:53:35 EST
To: kdd@gte.com
Subject: Expert Systems 96: Call for Papers (version 2: 23/03/96)

BRITISH COMPUTER SOCIETY SPECIALIST GROUP ON EXPERT SYSTEMS
===========================================================

ANNUAL CONFERENCE - EXPERT SYSTEMS '96 (ES96)
=============================================

CALL FOR PAPERS
===============

The sixteenth annual Conference of the British Computer Society
Specialist Group on Expert Systems, ES96, is being held at St.
John's College, Cambridge from 16th to 18th December 1996.

The objective of the ES series of conferences is to bring
together researchers and application developers from the
business, industrial and academic communities to discuss issues
and solutions to problems based on techniques derived from
Artificial Intelligence.

The Conference continues to build on the success of previous
years, with a two-track event containing fully refereed technical
and applications papers.

For the Technical Stream, contributions are invited in the form
of papers of up to 5,000 words on knowledge-based systems and
related areas of Artificial Intelligence. Papers representing
original work on theoretical and applied AI relating to:
constraint satisfaction; intelligent agents; knowledge
engineering methods; machine learning; model-based reasoning;
verification and validation of KBS; natural language
understanding; case-based reasoning, knowledge discovery in
databases and other related areas are welcome.

For the Applications Stream, contributions are invited in the
form of papers of up to 5,000 words presenting case studies of
knowledge based systems that address real-world problems such as:
diagnosis, monitoring, scheduling and selection. Most
importantly, the papers should highlight the critical elements of
success and the lessons learned.

Papers submitted to both streams will be refereed and those
accepted will again be published in book form in the 'Research
and Development in Expert Systems' and 'Applications and
Innovations in Expert Systems' series (for the technical and
application streams respectively).

To assist us with our planning of the conference, anyone
intending to submit a paper should provide a short abstract, with
title, at the earliest opportunity to the Conference Secretariat.

Authors should indicate the stream to which their papers are
being submitted. Please include your full name and postal address
in any email submissions.

Formatting instructions for papers will be sent as soon as the
title and abstract are received.

Four copies of papers should be submitted to arrive no later than
Friday 21st June 1996. Submissions should be sent in paper form
by post to the Conference Secretariat.

PLEASE NOTE that presenters of submitted papers will be asked to
cover their costs of attending the conference by paying at the
SGES members' academic rate.

TUTORIALS & WORKSHOPS
=====================
The Conference Committee invites proposals for tutorials or
workshops to be presented on Monday 16th December. Proposals for
full- and half-day tutorials, from an individual presenter or a
group, should be directed in the first instance to the
Conference Secretariat.

EXHIBITION
==========
A table top exhibition will run alongside the Conference. There
will be a limited number of spaces available and potential
exhibitors are encouraged to book early, as these will be on a
first-come, first-served basis.

SPONSORSHIP
===========
The Conference Committee is keen to make contact with any
organisations who may wish to sponsor the Conference, in whole or
in part. Sponsorship of an international conference such as ES96
will ensure the highest visibility for the benefactor, both
through the appearance of the company logo on all promotional
literature and in references to the Conference in all media
exposure prior to and after the event.

CONFERENCE COMMITTEE
=====================
Conference Chair:
Dr Ian Watson, University of Salford, Salford, M5 4WT
i.d.watson@surveying.salford.ac.uk

Deputy Conference Chair:
Prof Max Bramer, University of Portsmouth, Southsea, PO4 8JF
bramerma@csovax.portsmouth.ac.uk

Technical Programme Chair:
Mr John Nealon, Oxford Brookes University, Oxford, OX3 0BP
jlnealon@brookes.ac.uk

Applications Programme Chair:
Mrs Ann Macintosh, Artificial Intelligence Applications
Institute, Edinburgh, EH1 1HN
a.macintosh@ed.ac.uk


CONFERENCE SECRETARIAT
=======================
Mrs. Kit Stones
The Conference Team
17 Spring Road
Kempston, Bedford MK42 8LS

Tel/Fax +44 (0)1234-302490
kstonestct@cix.compulink.co.uk

IMPORTANT DATES
================
Title/Abstract notification: now
Full paper submission: 21 June 1996
Notification of acceptance: 9 August 1996
Camera ready papers due: 20 September 1996

World Wide Web address for conference information
=================================================
http://www.sis.port.ac.uk/sges/es96.html