Knowledge Discovery Nuggets Index

Knowledge Discovery Nuggets(TM) 97:26, e-mailed 97-09-02

News:
* G. Grinstein, Network Intrusion Dataset and Challenge,
  http://iris.cs.uml.edu:8080

Publications:
* Tom Dietterich, KDD-97 Invited Talk on Machine Learning
* Lance Otis, ACM Information Retrieval Conferences in Philadelphia Pa.
* P. Chan, CFP: MLJ--Integrating Multiple Learned Models (10/1 deadline)
  http://www.cs.fit.edu/~imlm/
* J. Ortega, CFP: AI Review: Issues on the application of data mining

Positions:
* M. Kaiser, ABB in Switzerland: Product Data Management
  and Software Certification,
  http://www.chcrc.abb.com
* D. Berleant, University of Arkansas, Computer Systems Engineering Department
* Emil Weydert, MAX PLANCK INSTITUTE: Postdoc in Uncertain Reasoning,
  http://www.mpi-sb.mpg.de/~weydert/depro

----
KD Nuggets is a newsletter for the Data Mining and Knowledge Discovery
community, focusing on the latest research and applications.

Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to gps.
Please keep CFP and meetings announcements short and provide
a URL for details.

To subscribe, see

http://www.kdnuggets.com/subscribe.html

KD Nuggets frequency is 2-3 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
('Siftware'), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at

http://www.kdnuggets.com/

-- Gregory Piatetsky-Shapiro (editor)
gps

********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ease and speed in doing a thing do not give the work
lasting solidity or exactness of beauty.
--Plutarch: Life of Pericles.

Date: August 17, 1997
From: Georges Grinstein (grinstei@cs.uml.edu)
Subject: Network Intrusion Dataset

http://iris.cs.uml.edu:8080

Four intrusions were carried out on the MITRE Corp. enterprise. Data was collected
in 2-hour groups, plus 1 baseline dataset.

The problem: identify the intrusions.

Presentations will take place at IEEE Visualization'97 Conference in October.

From: 'Prof. Zicari' (zicari@informatik.uni-frankfurt.de)
Subject: COMDEX Internet Applications Awards
Date: Mon, 25 Aug 1997 18:42:41 +0200 (METDST)

** CALL FOR SUBMISSIONS: DEADLINE September 10! **

Frankfurt - August 1997.

IBM, MICROSOFT and SUN Microsystems
jointly support an international Awards Program designed for the new
generation of Internet-based applications for business.

The first COMDEX Internet Application Awards will be given out in the
following three categories:

- Best Intranet-based application for enterprise usage
Focus: Use of an Intranet for Institutional/Corporate knowledge for
competitive advantage.

- Most Innovative Web Site
Focus: Best or most innovative Web Site with respect to user interface,
ease of use, and innovative content.

- Best Transactional Internet Application
Focus: Database, interactive applications.

The Award winners will be selected from among the submissions by a jury of
international experts. The Awards ceremony will take place on October 8,
1997 at the trade show COMDEX Internet & Object World Frankfurt '97
(October 7-10,1997, Sheraton Conference Center, Frankfurt/Main Airport).

To participate in the Awards Program, please download the
official Entry Kit from

http://www.ltt.de

or request it by e-mail at LogOn@omg.org,
by fax at +49-6173-94 04 20,
or by telephone at +49-6173-95 58 51.

The deadline for participating in the Awards Program is September 10, 1997.

Date: Sun, 24 Aug 1997 09:32:58 -0700
From: Tom Dietterich (tgd@CS.ORST.EDU)
Subject: KDD-97 Invited Talk on Machine Learning

In my invited talk at KDD-97, I mentioned a paper of mine that will be
appearing in the Winter issue of the AI Magazine. I've made a
preprint version of the paper available for ftp from the following URL:

ftp://ftp.cs.orst.edu/pub/tgd/papers/aimag-survey.ps.gz

This is a gzipped PostScript file that should print on any PostScript
printer.

Machine Learning Research: Four Current Directions
Thomas G. Dietterich
Oregon State University

Machine Learning research has been making great progress in many
directions. This article summarizes four of these directions and
discusses some current open problems. The four directions are (a)
improving classification accuracy by learning ensembles of
classifiers, (b) methods for scaling up supervised learning
algorithms, (c) reinforcement learning, and (d) learning complex
stochastic models.

--
Thomas G. Dietterich              Voice: 541-737-5559
Department of Computer Science    FAX:   541-737-3014
Dearborn Hall, 303                URL:   http://www.cs.orst.edu/~tgd
Oregon State University
Corvallis, OR 97331-3102

From: Lance Otis (LanceO@apptechsys.com)
Date: Mon, 25 Aug 1997 09:06:26 -0700
Subject: ACM Information Retrieval Conferences in Philadelphia Pa.

July 22 - 31, 1997

Keywords: unstructured textual information; information access;
search; automatic hypertext linking; automatic thesaurus building;
TREC; human computer interaction; browsing; mixed mode; multimedia;
information retrieval

1. Summary:

This report covers the ACM Digital Libraries - 97 Conference (22-26
July, 1997) and the associated ACM Special Interest Group on Information
Retrieval (SIGIR) Conference on Research and Development in
Information Retrieval (27-31 July, 1997) in Philadelphia, Pa. These
two conferences presented a broad spectrum of current research
activity involved in the indexing and acquisition of information from
multimedia and multiple language text data collections. Attendance
included representatives from academic institutions, government, and
private industry from around the world. Major subject matter areas
included:

Pattern analysis, knowledge representation, and classification
Text indexing and retrieval
Multimedia image indexing and retrieval
European and Asian natural language analysis
Accessing Web data, hypertext and active search agents
Improving user query interfaces and assisting the user
Query analysis and processing
Improving display of query results
Navigation and browsing vs. search via queries

2. Observations:

* Over 60 papers were presented during the conferences.

* A significant number of the presentations were dedicated to
language-specific tricks for indexing and retrieval.
Language-independent text pattern analysis was not specifically
discussed. On the other hand, graphic pattern analysis methods are
being actively pursued as a means for indexing.

* Text retrieval efficiency still remains close to 50% (at 50% recall,
precision hovers around 40%). Although presentations indicated that
small incremental improvements in search efficiency are possible, no
significant breakthroughs were presented. Multimedia retrieval
efficiencies lag text retrieval efficiencies.

* Conference attendees endorsed the use of the TREC text corpuses to
provide a means of evaluation of different text search strategies
against a common standard. A similar standard for multimedia does not
exist, or is not yet in wide use.

* Keyword search (single keywords or sets of keywords), rather than
phrase or concept searching, is still the norm for text searching;
however, phrase and concept search techniques are beginning to be
addressed. Search techniques that go beyond the Boolean operators AND
and OR were barely mentioned.

* Presentations and discussions concerning acquisition and application
of meta-data primarily consisted of application of existing or
pre-defined hierarchical schemes, such as the Dewey Decimal System, to
represent knowledge classifications.

* Clustering and meta-data: multiple presenters discussed clustering
methods. Work involving hierarchical clustering, i.e., sub-clusters
within clusters, was presented. Bottom-up analysis of patterns and
subsequent development of natural class hierarchies defined by these
data relationship patterns was not specifically addressed; however,
graphical image concept hierarchies based on group similarities are
being researched. Audio analysis and Asian statistical language
analysis methods hold some promise for language-independent lexical
pattern analysis.

* Stanford leads the way in the interactive-browsing approach to text
search using phrase hierarchies.

* There is a rising interest in user interface research and display
technologies. Database and index sizes are moving into the terabyte
range. Web search is driving consideration of multi-language, terabyte
data spaces.

* Complex indexes are approaching the size of the corpus.

* Real-life commercial applications have already dismissed some of the
theories and research areas as irrelevant because they have no payback
or do not improve search results; e.g., stemming and inverse
frequency weights were mentioned as not helping by a large Internet
search company. Given the state of retrieval theory (50% efficiencies
or less), commercial products favor speed over precision and provide
the means for query refinement and/or browsing to allow the user to
home in on the desired answers.

* Almost all text retrieval systems discussed use UNIX.

* There were no presentations by Microsoft people. Possibly a better
place to hear them will be at the August 14, 1997 Knowledge Discovery
and Datamining Conference in Newport Beach CA. See:

http://www-aig.jpl.nasa.gov/kdd97/

(at KDD-97 there were no papers by Microsoft people, but data-mining related tutorials
were given by Usama Fayyad and Surajit Chaudhuri from Microsoft. GPS)
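
To make the precision and recall figures quoted in the observations above
concrete, here is a minimal Python sketch of how precision at a given recall
level can be computed for one ranked result list. The function name, the
document IDs, and the relevance judgments are invented for illustration; they
do not come from any system presented at the conferences.

```python
def precision_at_recall(ranked_ids, relevant_ids, target_recall):
    """Return precision at the first rank where recall reaches target_recall.

    Returns None if the ranking never reaches that recall level.
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("need at least one relevant document")
    hits = 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
        recall = hits / len(relevant)   # share of relevant documents found so far
        if recall >= target_recall:
            return hits / rank          # share of retrieved documents that are relevant
    return None


# Hypothetical ranking of ten documents; four of them are judged relevant.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant = ["d7", "d4", "d5", "d0"]

print(precision_at_recall(ranking, relevant, 0.5))  # prints 0.4
```

In this made-up ranking, half of the relevant documents have been found by
rank 5, and two of those five retrieved documents are relevant, so precision
at 50% recall is 40%.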

3. Discussion:

* As CPUs and storage media get cheaper, more and more graphics media
(movies, etc.) will be digitized. There is a present and growing need
for the means to find and organize terabytes of multimedia data.

* Commercial 64-bit systems are needed now.

* It was generally acknowledged that there is no single classification
scheme that models general knowledge, i.e., knowledge does not have a
single hierarchical structure.

* Thesauri are concept hierarchies applicable to the data domain.

* Browsing: People want to be able to see the connections, and the
path they have taken to get to a result.

* Buzz words:

Ontology
ethnography and ethnomethodologically
orthogonal
taxonomy
stochastic
Bayesian
Boolean
epiphenomenon
morphology
phoneme
polysym
monosym
lexical lexicon
n-grams
the nym sisters:
troponym
homonym
hypernym
synonym
antonym

4. Comments:

The two conferences were targeted towards presentation of information
retrieval research findings primarily in the academic arena.
Obviously, unpatented commercial research findings were not discussed,
but based on this conference and other information sources, I believe
we have a long way to go before we have the capability to efficiently
and precisely find relevant information in text and graphics corpuses.
Accordingly, we need to keep in contact with SIGIR as well as broaden
our contacts with other research activities involved in information
retrieval. There is/may be research in signal analysis, stochastic
information theory, and network communication theory that can be, or
is being, applied to information retrieval.

5. Links to Similar Research:

1. Annual Symposium on Document Analysis and Information
Retrieval sponsored by the Information Science Research Institute
(ISRI) at the University of Nevada, Las Vegas. The purpose of this
symposium is to present results of state-of-the-art research and to
encourage the exchange of ideas in the general field of automatic
extraction of information from images and printed documents. Chairman:
Jan O. Pedersen, Xerox Palo Alto Research Center,
pedersen@parc.xerox.com

2. International Conference on Pattern Recognition. Emphasis is on
pattern-based methods for storage, retrieval, search and
querying. Contact: Adnan Amin: amin@cse.unsw.edu.au

3. International Symposium On Mathematical Morphology. Emphasis is on
mathematical morphology and its applications in image and signal
processing. Contact: Mrs. L.M. v.d. Eersten-Schultze email:
lieke@cwi.nl

4. TREC: The TREC conference series is co-sponsored by the
National Institute of Standards and Technology (NIST) and the
Information Technology Office of the Defense Advanced Research
Projects Agency (DARPA) as part of the TIPSTER Text Program. The goal
of the conference series is to encourage research in information
retrieval from large text applications by providing a large test
collection, uniform scoring procedures, and a forum for organizations
interested in comparing their results. Attendance at TREC conferences
is restricted to those researchers and developers who have performed
the TREC retrieval tasks and to selected government personnel. TREC is
a large scale experiment involving a number of research groups working
on text retrieval. Each participating team takes on the same text data
and the same set of search requests, and runs them through its own
system. Output is sent to the US National Institute of Standards and
Technology for assessment, and a number of performance measures are
calculated. These measures are not independent, but focus on the
ability of the system to find items which the assessors regard as
relevant. Thus they attempt to measure effectiveness in a
user-oriented sense, rather than efficiency in time or cost. Contact:
Ellen Voorhees, TREC Project Manager, Natural Language Processing and
Information Retrieval Group, NIST, ellen.voorhees@nist.gov

5. ICCS'97 Fifth International Conference on Conceptual
Structures, University of Washington, Seattle, USA, August 4,
1997-August 8, 1997.

http://www.cs.uah.edu/~iccs97/

8. CIKM '97, Sixth Int'l ACM Conf. on Information and Knowledge
Management, Nov. 10-14, Las Vegas, Nev. Contact Forouzan Golshani,
Dept. of Comp. Sci. and Eng., Box 5406, Arizona State Univ., Tempe, AZ
85287-5406; voice (602) 965-2855; fax (602) 965-2751;
cikm97@arian.eas.asu.edu;

http://www.arian.eas.asu.edu/cikm97/cikm97.html/.

9. ICCIMA'98, International Conference on Computational Intelligence
and Multimedia Applications. Monash University, Churchill, Victoria,
Australia, February 9, 1998-February 11, 1998, Contact:

http://www-gscit.fcit.monash.edu.au/~iccima98/

10. ICDAR '97, Fourth International Conference on Document Analysis
and Recognition. Ulm, Germany, August 18, 1997-August 28, 1997,
Contact:

http://wwwicdar97.dbag.ulm.daimlerbenz.com/

11. ACM Multimedia'97. Seattle, USA, November 8, 1997-November 14,
1997, Contact:

http://www.acm.org/sigmm/MM97/cfp.html

12. Text Encoding Initiative Tenth Anniversary User Conference, Brown
University, Providence, Rhode Island, USA, November 14,
1997-November 16, 1997, Contact:

http://www.stg.brown.edu/webs/tei10/

13. OOIS '97, 4th International Conference on OO Information
System. Brisbane, Australia, November 10, 1997-November 12, 1997,
Contact:

http://www.it.uq.edu.au/conferences/oois97

14. TReC-VRML visualization. The TReC-VRML visualization is an effort
to provide better ways to visualize the summary data from TReC. It
uses the Virtual Reality Modeling Language (VRML) to create a
3-dimensional graph of the data you are interested in. See:

http://zing.ncsl.nist.gov/~lorax/trec/trecvis.html

15. Second IEEE Metadata Conf., Sept. 16-17, Silver Spring,
Md. Contact Margie Templeton, Data Integration Inc., 11965 Venice
Blvd., Ste. 305, Los Angeles, CA 90066; voice (310) 313-9150; fax
(310) 313-9151;

http://www.llnl.gov/liv_comp/metadata/md97.html.
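
The TREC setup described in item 4 above (every team runs the same search
requests over the same text data, and the output is scored uniformly against
assessor judgments) can be sketched as follows. This is only an illustration
under stated assumptions: mean average precision is used here as one common
measure, the source says only that "a number of performance measures" are
calculated, and the system names, topic IDs, and judgments are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return total / len(relevant)


def score_runs(runs, judgments):
    """Score every system on the same topics with the same judgments.

    runs:      {system name: {topic id: ranked list of document ids}}
    judgments: {topic id: list of document ids judged relevant}
    """
    scores = {}
    for system, by_topic in runs.items():
        per_topic = [
            average_precision(by_topic.get(topic, []), relevant)
            for topic, relevant in judgments.items()
        ]
        scores[system] = sum(per_topic) / len(per_topic)
    return scores


# Hypothetical judgments and two hypothetical system outputs.
judgments = {"t1": ["a", "b"], "t2": ["c"]}
runs = {
    "sysA": {"t1": ["a", "x", "b"], "t2": ["c", "y"]},
    "sysB": {"t1": ["x", "a", "b"], "t2": ["y", "c"]},
}

scores = score_runs(runs, judgments)
for system in sorted(scores, key=scores.get, reverse=True):
    print(system, round(scores[system], 3))
```

Because both systems are scored on identical topics and judgments, the
resulting numbers are directly comparable, which is the point of a shared
test collection.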

Date: Fri, 29 Aug 1997 12:26:33 -0500 (EST)
From: 'IMLM Workshop (pkc)' (imlm@tuck.cs.fit.edu)
To: ml@ics.uci.edu, gps, INDUCTIVE@hermes.csd.unb.ca,
DAI-List@ece.sc.edu, Connectionists@CS.cmu.edu,
ai-stats@watstat.uwaterloo.ca, hybrid-list@cs.ua.edu, colt@cs.uiuc.edu
Subject: CFP: MLJ--Integrating Multiple Learned Models (10/1 deadline)

Machine Learning Journal
Special Issue on

Integrating Multiple Learned Models for
Improving and Scaling Machine Learning Algorithms

More info:

http://www.cs.fit.edu/~imlm/

From: julio@almaden.ibm.com
Date: Tue, 2 Sep 1997 00:49:20 -0700
Subject: AI Review: Issues on the application of data mining

ARTIFICIAL INTELLIGENCE REVIEW:
ISSUES ON THE APPLICATION OF DATA MINING

The goal of data mining is to extract previously unknown,
comprehensible, actionable information from the increasingly large
amounts of data being collected in industry, government, and research
organizations. The continuous evolution of the field, including
future research directions, will be heavily influenced by the
experiences in applying data mining techniques to real-world problems.

Data mining applications vary greatly today and the field can learn
important lessons from this variability. Many important applications
have been developed by using essentially the same data mining
technique. It will be important to understand what type of domain
knowledge or data analysis expertise was used to make such applications
successful. In other successful applications a variety of
complementary techniques had to be used. In such cases it will be
important to understand how the techniques were selected and how the
data was manipulated before it could be mined by each technique, as well
as how the techniques were used cooperatively.

This special issue will highlight some of the current efforts in
applying data mining techniques, with an emphasis on insights that
could help others make the application of those techniques
successful in a real-world situation which is invariably characterized
by large sets of noisy and incomplete data. Of particular interest
would be papers that discuss data mining applications that have been
deployed in production environments or are in the process of being
deployed. Topics could include but are not limited to:

* Issues in data quality, representation, modeling, selection, and
transformation in preparation for mining. Of particular interest is
the relation of these issues to data warehouses and data marts.

* Criteria for selection of a particular data mining technique or sets
of techniques.

* Introduction of additional prior knowledge into the data mining
process.

* Integrating a data mining methodology into an existing information
infrastructure.

* Efforts in selecting the most appropriate of the mined knowledge and
in formulating actions based on the mined knowledge.

* Human elements in completing a successful data mining project.

In addition to the call for full-length papers, we request that any
researchers working in this area submit abstracts and/or pointers to
recently published applications for the purpose of compiling a
comprehensive survey of the current state of the art.

The mission of Artificial Intelligence Review: The Artificial
Intelligence Review serves as a forum for the work of researchers and
application developers from Artificial Intelligence, Cognitive Science,
and related disciplines. The Review publishes state-of-the-art refereed
research applications and critical evaluations of techniques and
algorithms from these fields. The Review also presents refereed survey
and tutorial articles, as well as reviews and commentary on topics from
these disciplines.

** Instructions for submitting papers **

Papers should be no more than 30 printed pages (approximately 15,000
words) with a 12-point font and 18-point spacing, including figures
and tables. Papers must not have appeared in, nor be under
consideration by, other journals. Include a separate page specifying
the paper's title and providing the address of the contact author for
correspondence (including postal address, telephone number, fax number,
and e-mail address). Send FOUR copies of each submission to the guest editor
listed below. Papers in ASCII or PostScript form may be submitted
electronically.

For additional information, contact the guest editor, or visit Kluwer
Academic Publishers' webpage

http://www.wkap.com/.

** Important dates **

Papers due: December 1, 1997
Acceptance notification: February 1, 1998
Final manuscript due: June 1, 1998
Date of issue: September, 1998

** Guest Editor **

Julio Ortega
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120
(408) 927-2629 (voice)
(408) 927-2100 (fax)
email: julio@almaden.ibm.com

Date: Fri, 22 Aug 1997 15:18:59 +0000 (CUT)
From: Kaiser_Michael/KWRB-T/58662o68319 (h01720@venus.twr.KWR.CHKRA.ABB.COM)
Subject: ABB in Switzerland: Positions in Product Data Management
and Software Certification

ABB Corporate Research, Switzerland

ABB is a multinational electrotechnical engineering group with over
200,000 employees worldwide, serving customers in electric power
generation, transmission and distribution, industrial and building
systems, and rail transportation (see http://www.abb.com). The computer
engineering department in our Swiss research lab close to Zurich supports
Swiss and international ABB business units in the application of advanced
information technology. Our employees track technology trends and transfer
appropriate technologies to our development departments. In addition they
participate in the specification and design of our new engineering systems
or software-based products.

We invite applications for regular and postdoctoral positions in two
areas:
(a) product data management,
(b) software certification.
Applicants should have a PhD or MS degree in Computer Engineering/Computer
Science with experience in:
(a) databases/workflow management/engineering systems integration,
(b) software engineering/testing/verification
or related fields. We seek creative, motivated individuals with proven,
outstanding analytical skills, who are willing to learn German.

For consideration, please send your CV, list of publications, copies of
grade records, possible start date and approximate salary requirements to
ABB Corporate Research Ltd., CHCRC-P, Mrs. B. Brander, 5405 Baden,
Switzerland (Fax +41 56 493 4406). For further information see

http://www.chcrc.abb.com.

--
Michael Kaiser                      Currently at:
ABB Corporate Research Ltd.         ABB PowerGeneration Ltd.
Information Technology Dept. (C2)   KWTB-T
CH-5405 Baden-Daettwil              CH-5401 Baden
Tel. +41 56 48 68319                Tel. +41 56 20 52612

From: djb@engr.uark.edu (BERLEANT DANIEL J)
Date: Tue, 26 Aug 1997 14:38:55 -0500
Subject: University of Arkansas, Computer Systems Engineering Department

NOTE: The chancellor has recently targeted our dept. for a major
expansion. We now have four (not one) open positions. We would like
to see applications at the assistant professor as well as senior
levels. 'Background in engineering' could mean a B.S. or M.S. in
engineering, so a Ph.D. in computer science is fine.

NOTE 2: I am interested in applicants in the areas of data mining,
information retrieval, and related areas, and would like to hear from
you if you decide to apply.

====================================================================

University of Arkansas
Computer Systems Engineering Department

The Department of Computer Systems Engineering invites applicants for
a tenure-track position at the Assistant Professor level. Applicant
must have a Ph.D. in engineering or computer science and possess an
interest in and ability to teach in the software area. Preference will
be given to candidates with a background in engineering. The
successful applicant will emphasize quality teaching (currently two
courses per semester) at the graduate and undergraduate levels and will
have a demonstrated potential to initiate research projects while
attracting external funding.

The Computer Systems Engineering Department is a dynamic and growing
department within the College of Engineering with research areas in
computer architecture, telecommunications, human computer interaction
and other areas.

Departmental resources include a network of over 50 Sun workstations
and 50 PCs linked to the College of Engineering's network, new NCR
equipment, and laboratories including the Computer Architecture
Laboratory, Image Processing Laboratory, Networking Laboratory, and
Software Artifact R & D Laboratory. The department has approximately
300 undergraduate and 30 graduate students with 10 full time faculty.

The University of Arkansas is located in Fayetteville, situated with
several other dynamic, growing communities in the beautiful
Northwestern part of the state in the Ozark mountains. The academic
units on the campus include eight colleges and schools with nearly
15,000 students, 800 faculty and 2,000 staff members.

A curriculum vitae, three references, and a small amount of optional
supporting materials will be accepted until the position is
filled. All candidates should indicate citizenship and, in the case of
non-citizenship, visa status. Mail to:

Dr. Carl D. Bowling
Search Committee Chair
Professor and Head
Computer Systems Engineering Department
University of Arkansas
Engineering Hall 313
Fayetteville, AR 72701

The University of Arkansas is committed to achieving racial, ethnic,
and gender diversity in its faculty. Therefore, the University
is especially interested in applications from all qualified
candidates who would contribute to such diversity in the Computer
Systems Engineering Department.

From: Emil Weydert (weydert@mpi-sb.mpg.de)
Date: Thu, 28 Aug 1997 10:54:15 +0200 (MET DST)
Subject: MAX PLANCK INSTITUTE: Postdoc in Uncertain Reasoning

URL: http://www.mpi-sb.mpg.de/~weydert/depro

POST-DOCTORAL FELLOWSHIP

- UNCERTAIN REASONING -

MAX PLANCK INSTITUTE FOR COMPUTER SCIENCE

The Max Planck Institute for Computer Science is devoted to basic
research in Computer Science. The institute is located on the campus
of the University of Saarbruecken (Germany, close to France), which
also hosts a major Computer Science department and the German Research
Center for Artificial Intelligence (DFKI). It was founded in 1990 and
currently consists of two research units, with two more to follow in 1998.

1. Algorithms and Complexity (headed by Kurt Mehlhorn).
2. Programming Logics (headed by Harald Ganzinger).

In the area 'Uncertain Reasoning', the research group 'Programming
Logics' offers - starting December 97 or later - a post-doctoral
fellowship for one or two years, which amounts to up to 3,400 DM per
month, tax-free. Travel support is generous and the working conditions
are very pleasant!

We are looking for candidates interested in the combination of
quantitative and qualitative approaches to reasoning under
uncertainty. Our interests include but are not limited to
probabilistic logic, entropy maximization, default reasoning, Bayesian
networks, qualitative decision theory and data mining.

Applications (including curriculum vitae, list of publications,
research interests, names of references or recommendations, and
intended period of stay) should reach us by September 20, 1997
(preferably by email). Statements of interest should be sent as soon
as possible.

Emil Weydert
Max-Planck-Institut fuer Informatik
Im Stadtwald
D-66123 Saarbruecken
Germany

emil@mpi-sb.mpg.de

http://www.mpi-sb.mpg.de/~weydert/depro
