Data Mining and Knowledge Discovery Nuggets 96:16, e-mailed 96-05-20
By Julie Bort
Publication Date: April 29, 1996 (Vol. 18, Issue 18)
Deep within the pulsating mass of bits and bytes strung throughout the enterprise lie answers to the most perplexing problems of any business. Which customers will turn to competitors? Which offers will prompt customers to buy more? What are the signs of fraudulent activity?
The relatively new data warehouse concept for client/server architectures is a step in the right direction toward getting those answers. But if an organization really wants huge paybacks from its warehouse or data marts, it will need to turn to data mining. Data mining is the act of drilling through huge volumes of information to discover relationships, or to answer specific questions, that are too broad for traditional query tools.
Fundamentally, data mining is statistical analysis and has been in practice as long as there have been mathematicians. But until recently, statistical analysis was a time-consuming, manual process and accuracy depended heavily on the person performing the analysis. No more. Today, thanks to the maturing of neural networks and other sophisticated technologies, tools exist that automate the process, making data mining a practical solution for a wide range of companies. Couple these tools with a growing base of accessible enterprise data -- often in the form of a data warehouse -- and a company has at its disposal a tool with immense implications.
'We use the HNC [data mining] product to identify customers who are about to leave our bank, an extremely important application for us. It is far easier to keep a customer than to go out and get a new one. This reduces our expenses,' says Bob Esters, vice president of marketing research and database marketing for Star Bank Inc., a regional bank with 250 branches throughout the Midwest and user of the Database Mining Workstation from HNC Software Inc., in San Diego. 'Beyond that, applications for things like site analysis and cross-selling opportunities are ripe for this kind of tool.'
A MODEL ATTRACTION. Tapping into this potential requires a basic understanding of data mining, which is as complex as its manual statistical counterpart.
'There are four operations for doing discovery driven mining. These are predictive modeling, database segmentation, link analysis, and deviation detection,' says Evangelos Simoudis, director of data mining solutions for IBM's World Wide Decision Support Solutions Division, in San Jose, Calif. 'You need a variety of tools to perform these [operations] because various types of data behave differently.'
Predictive modelers attempt to forecast a particular event -- such as which customers of a bank are likely to move to the competition. They assume that a company has a specific question it is trying to answer, and they provide that answer by assigning scores that rank the likelihood of certain outcomes.
Most of the readily available tools perform predictive modeling. In generic terms, a predictive modeler functions something like this: A company decides what it wants to research -- for example, which customers are likely to leave. It takes a sampling of scrubbed data on customers that have left and feeds it to the predictive modeler, telling it this is the sample of 'bad' customers. It also takes a sample of data from longtime customers and feeds it to the modeler, telling it this is the sample of 'good' customers.
The tool then sifts through these samples to uncover variables and combinations of variables that make up the typical 'bad' and typical 'good' customer profiles, and it returns a ranking of those variables. The results may then read as follows: Customers who are over 50, have an income greater than $100,000, are male, drive a Buick, and own their home have a 30 percent chance of leaving. Customers who are 18 to 25 years of age, have an income of less than $25,000, drive a Honda, rent, and are male have a 70 percent chance of leaving.
With these results, a company can run a query against its customer database to draw lists of customers that fit such profiles and design marketing programs to target the defined groups. Furthermore, as the modeler receives more data it will 'learn' and produce increasingly accurate predictions.
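For readers who want to see the shape of this workflow in code, the following is a minimal, hypothetical sketch in Python using the pandas and scikit-learn libraries (modern stand-ins that the article does not mention); the file name, column names, and choice of a logistic-regression model are all assumptions made purely for illustration.

    # Hypothetical sketch of the predictive-modeling workflow described above.
    # The extract, column names, and model choice are illustrative assumptions,
    # not details taken from any product in this article.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Scrubbed extract of customers, labeled 1 ('bad': left the bank) or 0 ('good': stayed).
    customers = pd.read_csv("customer_extract.csv")
    features = pd.get_dummies(customers[["age", "income", "sex", "car_make", "owns_home"]])
    labels = customers["left_bank"]

    train_X, test_X, train_y, test_y = train_test_split(features, labels, test_size=0.3)

    model = LogisticRegression(max_iter=1000)
    model.fit(train_X, train_y)  # learn which variables separate 'bad' from 'good' customers

    # Rank the variables by the weight the model assigned to each of them.
    ranking = sorted(zip(features.columns, model.coef_[0]), key=lambda p: abs(p[1]), reverse=True)
    for name, weight in ranking:
        print(f"{name}: {weight:+.2f}")

    # Score the full customer base: the estimated probability that each customer will leave.
    customers["churn_risk"] = model.predict_proba(features)[:, 1]

The scored list can then be queried for customers above a chosen risk threshold, which corresponds to the "draw lists of customers that fit such profiles" step described above.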
Predictive modeling tools can be segmented into several types, the most common of which are neural network products. Neural networks are computer applications that simulate the function of a human brain. They can be trained and are adept at the nonlinear reasoning that is the hallmark of many 'leap to a conclusion' human beings. Neural network tools include HNC's Database Mining Workstation and the DataCruncher, from DataMind Inc., in Redwood City, Calif.
The neural network predictive modeler is ideal for companies that have a great depth of statistical information and analysts who are already doing their own analyses, because neural networks work far faster than any human being working on a spreadsheet can.
'The beauty of the tool is that it can model in a nonlinear way and the process is fast. It makes the same decisions along the way that an analyst would make regarding which variables to include,' Esters says.
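As a rough illustration of the nonlinear modeling Esters describes, here is a hedged sketch that swaps the simple model above for a small feed-forward neural network. Scikit-learn's MLPClassifier is only a modern stand-in, not the technology inside HNC's or DataMind's products, and the sketch reuses the train_X/train_y split from the earlier example.

    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # A small feed-forward network; the hidden layer lets it learn nonlinear
    # combinations of variables that a purely linear model would miss.
    nn_model = make_pipeline(
        StandardScaler(),  # neural networks generally train better on scaled numeric inputs
        MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000),
    )
    nn_model.fit(train_X, train_y)
    print("holdout accuracy:", nn_model.score(test_X, test_y))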
Other users concur.
'With statistics that used to take a month to model, we can have a new model overnight,' says Mike Eichorst, vice president of database marketing for Chase Manhattan Bank Inc., in New York, and a user of HNC's Database Mining Workstation.
Whether the simulated human thinking of a neural network modeler is more accurate than human thought remains debatable. Esters says that neural network products are comparable to, but not better than, functions of the human brain. But Eichorst disagrees.
'We originally used the mining workstation to assign customers to market segments. It consistently outperformed traditional statistical analysis methods,' Eichorst says.
PREDICTIONS INDUCED. The drawback to neural network products is that they are black boxes, users say. Data is fed in and results come out, but the tool doesn't report how it reaches its conclusions. And sometimes, the how is as revealing as the what, users say.
An alternative type of predictive modeling tool relies on inductive reasoning algorithms rather than neural networks. It is exemplified by both IDIS Predictive Modeler (IDIS PM) from Information Discovery Inc., in Los Angeles, and SAS Stat, from SAS Institute Inc., in Cary, N.C.
Users say the inductive reasoning method is a better choice for company analysts who have little interest in extremely complex models and would rather have insight into the data itself.
'We need to determine what the data elements are. We want to understand them,' says Ken Zabel, vice president of business development at Customer Focus International Inc. (CFI), in Diamond Bar, Calif. 'We looked at neural network tools, but with some neural networks you can't really understand why certain choices are made.'
CFI builds customer information systems (CIS) for financial institutions. It uses IDIS PM to sort through a client's data before a targeted CIS warehouse can be created.
'We require a product like IDIS to perform affinity analysis to help our banks determine which variables make people have an affinity for purchasing certain products,' Zabel says.
In addition, inductive tools, also known as rule-based or tree-based modelers, may be more appropriate for dealing with data that is not easily quantified, according to vendors.
'Neural network predictors must quantify all the data, even data that isn't naturally quantified. With rule prediction, the data doesn't need to be numeric. It maintains the nature of the data,' explains Diana Lin, manager of application support at Information Discovery.
Lin offers the example of loan payment predictions. If a neural network were to predict how a loan would be paid -- whether with cash, check, credit card, or fund transfer -- it would assign numbers to those options, then offer a numeric prediction that would have to be interpreted. IDIS PM would generate a prediction of the next payment method by name.
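To make the contrast concrete, here is a hypothetical sketch of the rule/tree-based approach using scikit-learn's DecisionTreeClassifier as a stand-in (the article does not describe how IDIS PM is implemented). The loan data is invented, but the sketch shows the two points Lin makes: the learned rules can be read directly, and the prediction comes back as a named category rather than a number.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Invented loan-payment history; the goal is to predict the next payment method by name.
    history = pd.DataFrame({
        "balance": [5000, 200, 12000, 800, 300, 9000],
        "years_as_customer": [10, 1, 7, 2, 1, 12],
        "last_method": ["check", "cash", "fund transfer", "credit card", "cash", "check"],
        "next_method": ["check", "cash", "fund transfer", "credit card", "cash", "fund transfer"],
    })
    X = pd.get_dummies(history[["balance", "years_as_customer", "last_method"]])
    tree = DecisionTreeClassifier(max_depth=3).fit(X, history["next_method"])

    # Unlike a neural network, the fitted rules can be inspected directly...
    print(export_text(tree, feature_names=list(X.columns)))
    # ...and the prediction is returned as a category name, not a number to interpret.
    print(tree.predict(X.head(1)))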
USER-UNFRIENDLY. The whole genre of tools is not particularly user-friendly. One factor to consider when shopping for a data mining application is how the data will be fed into the modeler. Some tools, such as IDIS PM, work on a separate workstation but can be attached to a LAN.
'It's a very straightforward process. Using IDIS is very intuitive,' Zabel says. 'We draw a subset of information from the enterprise warehouse into the [IDIS] workstation and we can draw it over the LAN. You can extract information from relational database tables -- prejoined conditions. It's very well structured.'
Other tools, such as HNC's Database Mining Workstation, run on stand-alone machines that cannot automate the burdensome task of dumping data.
'To get ready to use the tool, you've got to prepare an extract [from the data warehouse]. Then you've got to go through the laborious task of organizing it and manipulating data to put it into the Database Mining Workstation,' Esters says.
Beyond the data mining system's physical connection, reviewing the model itself can be tricky and requires, at the very least, a person who excels at mathematical analysis and, at best, someone trained in statistical analysis.
'You can't just go into data mining saying I'm going to get a packaged tool off the shelf that I'll grab and dump data into,' says Ramin Makili, manager of the knowledge technology group of Andersen Consulting Inc., in Chicago, and a user of DataMind's DataCruncher.
'Once you have the tool you need to explain the models. You've got to have a room full of geek scientists like me,' Makili -- who was a nuclear physicist prior to becoming Andersen's predictive modeling expert -- adds.
Other users agree with Makili.
'This is not intuitive. You have to be analytical,' Eichorst says. 'And you have to be very insightful. You need to be able to look at any two variables to see the correlation.'
OTHER MINING TECHNIQUES. Beyond predictive modelers there exists a group of products that uncover relationships before a hypothesis has even been formed. These tools can be used ahead of a predictive modeler to uncover facts about your business you wouldn't think to ask about. The classic example of such exploration is the grocery cart analogy. By using exploration tools, a department store discovered that the two items most commonly found in the same shopping cart were diapers and beer. It then used a predictive modeler to find out which customers were likely to buy diapers and beer, so that it could send them marketing materials.
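To make the grocery-cart idea concrete, here is a minimal, hypothetical sketch in plain Python (not a description of any product named in this article) that counts which pairs of items most often land in the same cart, which is the heart of this kind of association discovery.

    from collections import Counter
    from itertools import combinations

    # Invented shopping carts; each inner list is one customer's basket.
    carts = [
        ["diapers", "beer", "chips"],
        ["diapers", "beer"],
        ["milk", "bread"],
        ["diapers", "beer", "milk"],
    ]

    pair_counts = Counter()
    for cart in carts:
        for pair in combinations(sorted(set(cart)), 2):
            pair_counts[pair] += 1

    # The most frequently co-purchased pairs -- the 'diapers and beer' discovery.
    for pair, count in pair_counts.most_common(3):
        print(pair, count)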
One tool that specializes in such association discovery is Information Discovery's IDIS. This tool, separate from the company's predictive modeler, accesses relational databases directly, via agents, to uncover trends such as market clusters and financial patterns. These patterns can then be modeled for further analysis.
Another tool that performs exploratory analysis is SAS Insight, which belongs in the visualization category. Visualization tools let a user assign colors to variables and then spot relationships among them visually. Again, once relationships are uncovered, further analysis or modeling may be employed.
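As a loose analogue of that kind of visual exploration (matplotlib is only a stand-in here; SAS Insight works differently and interactively), one might color a scatter plot of two variables by a third and look for clusters by eye, reusing the hypothetical customers table from the first sketch.

    import matplotlib.pyplot as plt

    # Plot income against age, colored by whether the customer left the bank.
    plt.scatter(customers["age"], customers["income"],
                c=customers["left_bank"], cmap="coolwarm", alpha=0.6)
    plt.xlabel("age")
    plt.ylabel("income")
    plt.title("Customers colored by attrition")
    plt.show()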
However, because exploration tools have an unknown return on investment for many applications, they may be most appropriate as the next step after predictive modeling for known needs has been mastered. For instance, if a company had a model to predict attrition, one to predict fraud, and one to predict cross-sales, exploration for more models might be in order.
Users should also be aware of a movement that is beginning to take shape -- the tool suite. Suites combine various technologies and perform multiple forms of mining.
This month, IBM began beta testing its Intelligent Miner data mining development platform, expected to ship in July. The Intelligent Miner combines kernels of several types of mining technologies, including predictive modeling, association discovery, and visualization. It is aimed at corporations that want to develop their own applications. In addition, IBM will offer several targeted applications, which include a customer segmentation application, a market-basket analysis application, and a fraud-detection system.
SAS Institute is also offering a suite that incorporates several data mining tools into its SAS System for Data Warehousing. It has a neural network predictive modeler in development, although the company already offers some neural network capabilities based on SAS macros.
The result is that data mining turns business mysteries into competitive advantages.
'Unlike some other products, we didn't create these applications and go looking for a market,' IBM's Simoudis says. 'The applications we developed have come through our experiences in performing data mining services. Customers would come to us and say, "We have an attrition problem" or "We need to attract new customers. How can we do that?"'
Data mining is the answer.
Julie Bort is a free-lance writer based in Dillon, Colo.
Vendor contact information
HNC Software Inc.
San Diego
(619) 546-8877
http://www.hncs.com
DataMind Inc.
Redwood City, Calif.
(415) 364-5580
http://www.datamindcorp.com
Information Discovery Inc.
Hermosa Beach, Calif.
(310) 937-3600
http://www.datamining.com
SAS Institute Inc.
Cary, N.C.
(919) 677-8000
http://www.sas.com
IBM's World Wide Decision Support Solutions Division
San Jose, Calif.
http://www.dss.ibm.com
Tips for striking it data rich
The best tools in the world won't find you any gems unless you follow a few simple procedures. Here are some tips for mining well:
* Use only scrubbed data. (See 'Scrubbing dirty data,' Dec. 18, 1995, page 1.)
* Have business analysts, statistical analysts, and IT staff on the original application development team. Business analysts help clarify the importance of variables. The tool may scream that a correlation between two items is important, but it may turn out to be a no-brainer. Statisticians can bring understanding to the results. And IT staff can ease the burden of drawing data samples.
* When doing predictive modeling, test the model twice before relying on it. First test the model by feeding it data in a situation with a known outcome. For example, if you're trying to find out which customers might buy a product, use a list containing customers who already bought it and ones who didn't, and see whether the model points to the correct ones (a minimal sketch of this check appears after this list). Then, test the model with a sample promotion: make the offer to a small sample of the customers indicated by the predictive modeler to see how on target it is with live data.
* Continue to refine the model by feeding it the results of every marketing campaign.
* Add new models gradually as the tool becomes mastered.
* Realize that despite its scientific stance, modeling and all other aspects of data mining are more art than science. How the results of the mining are used will determine the benefits.
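As a rough illustration of the "test against a known outcome" tip above, the following hypothetical sketch (again using scikit-learn as a stand-in, and reusing the model and held-out test split from the first sketch) checks the model's predictions against customers whose behavior is already known.

    from sklearn.metrics import classification_report

    # These customers' outcomes are already known, so the predictions can be verified.
    predicted = model.predict(test_X)
    print(classification_report(test_y, predicted))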
CALL FOR PAPERS: Special issue of _JASIS_ on data mining
It is estimated that the amount of information in the world doubles every 20 months, and many scientific, government, and corporate information systems are being overwhelmed by the flood of data they routinely generate and store. These massive amounts of data exceed human experts' ability to analyze them with traditional tools, even though they contain a potential gold mine of valuable information. Unfortunately, today's database technology offers little functionality for exploring such data, while knowledge discovery techniques for intelligent data analysis are not yet mature enough for large data sets. Systems offering a wide variety of techniques for the automatic (or semi-automatic) discovery of knowledge from databases will therefore play an increasingly important role. The term data mining, also known as database mining or knowledge discovery in databases (KDD), was coined to emphasize the challenges of knowledge discovery in large databases and to motivate researchers and application developers to meet that challenge.
Papers in this area are sought. Specific topics of interest include, but are not limited to, the following:
* Theory and Foundational Issues in Data Mining
This list of topics is not intended to be exhaustive but an indication of typical topics of interest. Prospective authors are encouraged to submit papers on any topic of relevance to data mining.
Inquiries (by voice, fax, or e-mail) and manuscript submissions (four copies of full articles) should be addressed to one of the guest editors. Manuscripts may be submitted in hard copy, by fax, or by e-mail in plain ASCII format. All manuscripts will be reviewed by a select panel of referees, and those accepted will be published in a special issue of _JASIS_. Original artwork and a signed copy of the copyright release form will be required for all accepted papers.
Manuscripts due: October 1, 1996
Acceptance notification: January 15, 1997
Final manuscripts: March 1, 1997
Publication: Late summer 1997
Guest editors:
Professor Vijay Raghavan
Center for Advanced Computer Studies
University of Southwestern Louisiana
P.O. Box 44330
Lafayette, LA 70504
Voice: (318) 482-6603
Fax: (318) 482-5791
E-mail: Vijay V. Raghavan
Dr. Hayri Sever
The Department of Computer Science & Engineering
Hacettepe University
06532 Beytepe, Ankara, Turkey
Fax: 90 312/235 4314
E-mail: Hayri Sever