KDD Nugget 94-6, mailed 1994/04/01

Contents:
 * G. Piatetsky-Shapiro: FAQ for KDD Nuggets -- proposal
 * S. Nishisato: Two-Day Workshop on Analysis of Categorical Data
 * R. Zicari: AWARDS AT OBJECT WORLD GERMANY `94 -- CFP
 * R. McGregor: Loch Ness radar patterns confirm existence of Monster

********Quote of the month: *****************************************
You cannot get to the information superhighway, until you shovel out
your information driveway (Jeff MacNelly, Pluggers comic strip)
**********************************************************************

The KDD Nuggets is a moderated list for the exchange of information
relevant to Knowledge Discovery in Databases (KDD), e.g. application
descriptions, conference announcements, tool reviews, information
requests, interesting ideas, outrageous opinions, etc.

Contributions to kdd@GTE.COM ; Add/delete requests to kdd-request@GTE.COM .

-- Gregory Piatetsky-Shapiro (moderator)
------------------------------------
Date: Thu, 31 Mar 94
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: FAQ for KDD Nuggets

I have received a number of requests for the FAQ of this mailing list.
Below, I list the most frequent questions and my initial answers.
Any comments, corrections, etc. are welcome. With your help, I hope to
compile the initial draft of the FAQ list to be posted to other mailing
lists. It will also soon be available via FTP, Gopher, or Mosaic, along
with the archive of Nuggets.

**********Questions *****************
Definitions:
1.0 What is Knowledge Discovery in Databases (KDD), Data Mining, etc?
1.1 What is the difference between KDD and Machine Learning?
Publications:
2.1 Research-oriented
2.1.1 Overviews
2.1.2 Collections and Books
2.2 Applications-oriented
Tools:
3.0 What tools are available for Knowledge Discovery?
==3.1 Generic Tools
===3.1.1 Generic Classification Tools
====3.1.1.1 Generic Classification: Decision-tree approach
====3.1.1.2 Generic Classification: Neural network approach
====3.1.1.3 Generic Classification: Other approach
===3.1.2 Generic Tools: Deviation Detection
===3.1.3 Generic Tools: Clustering
===3.1.4 Generic Tools: Visualization
===3.1.5 Generic Tools: Statistics
===3.1.6 Generic Tools: Other methods
===3.1.9 Generic Integrated Tools:
==3.2 Domain specific tools

************ Initial Answers *****************************

==1.0 What is Knowledge Discovery in Databases (KDD), Data Mining, etc?

The notion of Knowledge Discovery in Databases (KDD) has been given
various names, including data mining, knowledge extraction, data pattern
processing, data archaeology, information harvesting, siftware, and even
(when done poorly) data dredging. Whatever the name, the essence of KDD
is the {\em nontrivial extraction of implicit, previously unknown, and
potentially useful information from data} (Frawley et al 1992).
KDD encompasses a number of different technical approaches, such as
clustering, data summarization, learning classification rules, finding
dependency networks, analyzing changes, and detecting anomalies
(see Matheus et al 1993).

==1.1 What is the difference between Data Mining and Machine Learning?

Knowledge Discovery in Databases (Data Mining) and the part of Machine
Learning (ML) dealing with learning from examples overlap in the
algorithms used and in the problems addressed.

The main differences are:

1) Knowledge Discovery (KDD) is concerned with finding *understandable*
   knowledge, while ML is concerned with improving the performance of an
   agent. So training a neural network to balance a pole is part of ML,
   but not of KDD. However, there are efforts to extract knowledge from
   neural networks which are very relevant for KDD.

2) KDD is concerned with very large, real-world databases, while ML has
   *typically* (but not always) looked at smaller data sets.
   So efficiency questions are much more important for KDD.

3) ML is a broader field which includes not only learning from examples,
   but also reinforcement learning, learning with a teacher, etc.

So one can say that KDD is that part of ML which is concerned with
finding *understandable* knowledge in large sets of real-world examples.

*********************
==2.0 Publications
===2.1 Research-oriented
====2.1.1 Overviews of KDD

C. Matheus, P. Chan, G. Piatetsky-Shapiro, Systems for Knowledge
Discovery in Databases, IEEE Transactions on Knowledge and Data
Engineering, 5(6), Dec. 1993.

W. Frawley, G. Piatetsky-Shapiro, and C. Matheus, 1992.
Knowledge Discovery in Databases: An Overview. AI Magazine, Fall 1992.
Reprint of the introductory chapter of the {\em Knowledge Discovery in
Databases} collection, AAAI/MIT Press, 1991.

M. Holsheimer, A. Siebes, DATA MINING: The Search for Knowledge in
Databases, available by anonymous ftp.
URL ftp://ftp.cwi.nl/pub/CWIreports/AA/CS-R9406.ps.Z

   ftp ftp.cwi.nl
   Name (ftp.cwi.nl:marcel): ftp
   331 Guest login ok, send ident (your e-mail address) as password.
   Password:
   ftp> binary
   ftp> cd pub/CWIreports/AA
   ftp> get CS-R9406.ps.Z
   ftp> bye

====2.1.2 Research-oriented Collections and Books

Rough Sets and Knowledge Discovery, W. Ziarko, editor,
Springer-Verlag, 1994.

Special Issue on Learning and Discovery in Databases, IEEE Transactions
on Knowledge and Data Engineering, N. Cercone and M. Tsuchiya, guest
editors, 5(6), Dec. 1993.

KDD-93: Proceedings of AAAI-93 Knowledge Discovery in Databases
workshop, G. Piatetsky-Shapiro, editor, AAAI Press technical report
WS-02, July 1993.

Special Issue on Machine Discovery, Machine Learning Journal,
Jan Zytkow, guest editor, 12(1-3), 1993.

Special Issue on Knowledge Discovery in Databases and Knowledge Bases,
International Journal of Intelligent Systems, Vol. 7, no. 7, Sep. 1992,
G. Piatetsky-Shapiro, guest editor.

G. Piatetsky-Shapiro and W. Frawley, 1991.
Editors, {\em Knowledge Discovery in Databases}, Cambridge, Mass.:
AAAI/MIT Press.

===2.2 Application-oriented publications

L. Lewinson, "Data Mining: Intelligent Technology Gets down to
Business", PC AI, Nov-Dec 1993, pp. 17-23.

K. Parsaye and M. Chignell, 1993. Intelligent Database Tools &
Applications, John Wiley.

W. H. Inmon and S. Osterfelt, 1991. {\em Understanding Data Pattern
Processing: the key to Competitive Advantage}. QED Technical Publishing
Group, Wellesley, MA.

****************
==3.0 What tools are available for Knowledge Discovery?

(I have detailed information on most of these tools and will post
that file separately later. GPS)

For now I propose the following classification of tools.

==3.1 Generic Tools
===3.1.1 Generic Classification Tools
====3.1.1.1 Generic Classification: Decision-tree approach
   C4.5
   Angoss Knowledge Seeker
   IND
   XPERTRule
   OC1
====3.1.1.2 Generic Classification: Neural network approach
   @Brain
   AIM
   BrainMaker
   Database Mining Software
   ModelWare
   N-train
====3.1.1.3 Generic Classification: Other approach
   Datalogic/R
   DISCOVER-IT
   Information Harvesting
   IXL/IDIS
   Nextra
   PC-MARS
===3.1.2 Generic Tools: Deviation Detection
   EXPLORA
===3.1.3 Generic Tools: Clustering
   AUTOCLASS
   Data Mariner
===3.1.4 Generic Tools: Visualization
   NetMap
===3.1.5 Generic Tools: Statistics
===3.1.6 Generic Tools: Other methods
   CrossTarget
   FOIL 6.0
===3.1.9 Generic Integrated Tools:
   RECON
   EMERALD
==3.2 Domain specific tools

--------------------------
Date: Mon, 21 Mar 1994 20:18:13 -0500 (EST)
From: S_NISHISATO@oise.on.ca
Subject: Two-Day Workshop on Analysis of Categorical Data

Dear Colleagues:

The fourteenth annual workshop on dual scaling (also known by such names
as correspondence analysis, homogeneity analysis, optimal scaling,
additive scoring) is slated for May 30 and 31 this year.
As before, this will be an introductory and expository workshop on
multidimensional analysis of contingency tables, multiple-choice data,
sorting data, paired comparison data, rank-order data, successive
categories (rating) data and multi-way data.

It would be appreciated if you could draw this workshop to the attention
of anyone you think may benefit from it. I would be happy to answer any
questions you may have. I can be reached at IN%"snishisato@oise.on.ca",
(416) 923-6641, extension 2696, or (416) 926-4725 (fax).

Shizuhiko Nishisato

**************************************************************
         Analysis of Categorical Data by Dual Scaling

DATES    Monday, May 30, 1994 (9:00-4:00)
         Tuesday, May 31, 1994 (9:00-3:00)

PLACE    The Ontario Institute for Studies in Education (OISE)
         252 Bloor Street West, Toronto, Ontario, Canada M5S 1V6

TOPICS   An Introduction to Dual Scaling/Correspondence Analysis
         (1) Categorical Data and Quantification...necessary basics
         (2) Multidimensional Analysis and Interpretation
         (3) Working Knowledge through Illustrative Examples

FEE      $125 Canadian ($100 US)  Regular
         $30 Canadian ($25 US)    Full-time Students with ID
         The fee does not include the textbook:
         "Elements of Dual Scaling: An Introduction to Practical
         Data Analysis", Hillsdale, NJ: Lawrence Erlbaum, 1994.
         (The book will be available at the workshop at a special
         discount price unless you already have one.)

STAFF    Shizuhiko Nishisato (Professor and workshop leader)
         Merlin W. Wahlstrom (Professor & Chair), Guest Speaker:
         "Applications of Dual Scaling: A User's Perspective"
         Graduate students (workshop assistants)

WORKSHOP Lectures with many examples, and hands-on experience with
         data analysis. No prior knowledge is assumed.

APPLY TO S.
         Nishisato, OISE (see the mailing address above;
         email snishisato@oise.on.ca; fax (416) 926-4725;
         telephone (416) 923-6641, extension 2696), with
         (1) your name, (2) affiliation and title, (3) mailing address,
         (4) telephone number (and email address and fax number,
         if available), and (5) cheque for the tuition.

REFUND   Full refund until noon, May 23; $10 service charge thereafter.

LODGING  Apply directly to Victoria University (416) 585-4524,
         (fax) (416) 585-4530, 7 minutes' walk to OISE
         Adults: single $42, twin $60/night (with breakfast)
         Students/Seniors: single $35, twin $50/night (with breakfast)
         * one-night deposit is required

For Baseball Fans! How about a game of the World Champion Toronto Blue
Jays against the invincible Oakland A's at 7:35 p.m., Monday, May 30,
at the SkyDome? We will select thirty (30) lucky winners of free tickets
at noon, May 30, at the workshop. Why not join us?
*******************************************************************

------------------------------------
From: zicari@informatik.uni-frankfurt.de
Subject: oaa
Date: Sat, 19 Mar 94 23:53:16 MEZ

FIRST ANNUAL OBJECT APPLICATIONS AWARDS AT OBJECT WORLD GERMANY `94

Call for Participation

Kronberg, Germany -- February 21, 1994 -- OBJECT WORLD GERMANY, to take
place September 27-29, 1994 at the Intercontinental Hotel in Frankfurt,
will launch the first annual Object Applications Awards for the best
end-user applications developed using Object Technology (OT). The awards
are sponsored by the Object Management Group (OMG) and COMPUTERWOCHE.

Awards will be given out in five categories, specifically to honor user
applications and pioneering implementations of the technology. A
distinguished panel of judges, chaired by Professor Roberto Zicari of
the Johann Wolfgang Goethe University of Frankfurt, will determine the
winners through a nomination process.

Nominations must come from user organizations for internally developed
applications not for commercial sale.
Production applications or documented and demonstrable working
prototypes are acceptable.

Entrants to the Object Applications Awards must call for an official
Entry Kit. All entries must be received by July 15, 1994. Entrants will
be asked to detail the goals of the application, the approach used, the
problems encountered, and the key benefits realized, conforming to one
of the categories.

For your Entry Kit call or fax: Object Applications Awards,
c/o LogOn Technology Transfer GmbH, Burgweg 14a,
D-61476 Kronberg (Ts.), Germany.
Tel. +49-6173-2852, Fax. +49-6173-94 04 20.

------------------------------------
From: rmcgregor@nessie.aberdeen.ac.uk (Rob Roy McGregor)
Subject: Loch Ness radar patterns confirm existence of Monster
Date: Friday, 1 April 94 13:53:16 MEZ

The local paper Aberdeen News reports that the analysis of the radar
echo data, collected over the last year at the famous Loch Ness lake,
third deepest in the world, has revealed certain patterns that could be
the final proof that the famous monster does exist.

The periods of activity increased significantly in the morning, then
were reduced between 11 am and 2 pm, and increased again between 2 pm
and 5 pm. In the view of Prof. Andrew McGregor of Edinburgh University,
this means that Nessie (as the monster is affectionately known to
locals) is taking a long lunch break.