KDD Nuggets 95:32, e-mailed 95-12-15

Contents:
News:
  * GPS, KDD-96 Best Paper awards announcement
  * Ed Colet, Data mining and Advanced Scout in the press...
  * G. John, IBM Data Mining pointers
    http://booksrv2.raleigh.ibm.com/cgi-bin/bookmgr/bookmgr.cmd/BOOKS/datamine/
    http://www.almaden.ibm.com/cgi-bin/stss/get/data-mining/
  * W. Buntine, important address change for Wray Buntine
  * V. Kamp, Question: KDD for FORECASTING-Systems?
Siftware:
  * T. Rauber, TOOLDIAG 2.0: Pattern Recognition toolbox available
    http://www.uninova.pt/~tr/home/tooldiag.html
Positions:
  * Ed Colet, Part-time positions working with IBM and NBA
Meetings:
  * J. Valentine, Data Mining seminar, London, 25-26 April 1996
    http://www.demon.co.uk/unicom
  * N. Zagoruiko, INPRIM-96 Congress, Novosibirsk, Russia, June 1996,
    section on Discovery

--
The KDD Nuggets is a moderated mailing list focusing on Data Mining and Knowledge Discovery in Databases (KDD) research and development.

Contributions are welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL, when available) to . E-mail add/delete requests to .

Nuggets frequency is approximately weekly. Back issues of Nuggets, a catalog of S*i*ftware (data mining tools), and a wealth of other information on Data Mining and Knowledge Discovery are available at the Knowledge Discovery Mine site, URL .

-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!)
but 'That's funny ...'" Isaac Asimov

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 12 Dec 1995 17:28:43 -0500
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: KDD-96 Best paper awards

KDD-96 Best Paper Awards:
Awards will be presented for the best research and the best application papers, selected by the program committee. The winning papers will also be invited for publication in the forthcoming Data Mining and Knowledge Discovery Journal.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 11 Dec 1995 15:19:44 -0500 (EST)
From: Ed Colet
Subject: Data mining and Advanced Scout in the press...
To: kdd@gte.com

Recent press coverage of "Advanced Scout" software demonstrates the benefits of data mining. The Washington Post and US News & World Report have published stories on "Advanced Scout", an application developed by IBM Research that applies data mining to NBA (National Basketball Association) game data. The articles have made the following points:

----------------------------------------------------------------
Washington Post, October 29, 1995.
"NBA's Computer Revolution: Some teams going high-tech in analyzing statistics"

NBA assistant coaches are using an advanced computer technology called Advanced Scout, which uses data mining to sift through mounds of numbers to find statistical patterns that can help coaches plot game plans and strategy.

"Advanced Scout" allows coaches to organize and interpret game stats, telling them what happens when a certain lineup combination plays together, how effective a scorer someone is in certain stretches of playing time or how well an opposing player shoots when different players guard him.

A coach could figure out many of the answers himself, but finding them could consume hours poring over video, play-by-play and other game stats. And the program can unearth patterns that never occurred to a coach.
It takes the game to a deeper level of analysis...with the potential to revolutionize the game.

-------------------------------------------------------------------
US News & World Report, December 11, 1995.
"Basketball's new high-tech guru: IBM software is changing coaches' game plans."

By crunching reams of statistics at warp speed, the innovative new program helps coaches analyze their teams' performance....it's the latest high-tech weapon coaches are wielding to sharpen their teams' competitive edge.

Bob Salmi, assistant coach of the New York Knicks, points out that 'Advanced Scout' "allows you to quickly ask questions of your team data and automatically identify patterns that may cause wins and losses."

Advanced Scout is more sophisticated and intuitive than most software pro sports teams use today. The program, for instance, can respond to a general question such as "Who is the best shooter in a game, and under what circumstances?"

The ability to respond to general questions is based on data mining technology...the technique has been used in other fields such as market research and oil and gas exploration...until recently the analysis of the data was performed by an expert.

-----------------------------------------------------------------------
The above articles underscore the notion that data mining technology can be useful to people who have no advanced training in computer science or data analysis but who are active in domains with a lot of available data, such as that of a professional basketball coach. These people, experts in their field, can now easily apply complex data analytic methods to help them perform their tasks.

Data mining involves the use of computer technology to help users sift through data (often large databases) looking for nuggets of valuable, hidden knowledge.
In the past, such sifting has been largely a manual process, one done by data analysts educated in disciplines like statistics and trained in the use of statistical software packages. By automating that process substantially, IBM's data mining technology can not only help data analysts, but can also be used to create tools such as "Advanced Scout" which do not require that the user have a background in mathematics or computer science.

Inderpal Bhandari, the IBM researcher who created "Advanced Scout", feels that such tools can help us cope with the explosion of data in our society. "They will be usable by a wide variety of users in fields as diverse as basketball or retail marketing or personal computing," he said. "The technology has the potential to make a broad societal impact."

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: gjohn@CS.Stanford.EDU
Date: Wed, 6 Dec 1995 15:54:23 -0800
To: kdd%eureka@gte.com (KDD Nuggets Moderator)
Subject: IBM Data Mining URLs
Content-Type: text
Content-Length: 1590

Gregory, in the last issue (95:31) of KDD Nuggets you wrote a short note giving an address for an IBM white paper on data mining, but there was a small typo in the address. The correct address for that document is

http://booksrv2.raleigh.ibm.com/cgi-bin/bookmgr/bookmgr.cmd/BOOKS/datamine/

(sorry for the typo in Nuggets 95:31 -- GPS)

Web surfers interested in data mining at IBM should also be directed to the Data Mining group's home page

http://www.almaden.ibm.com/cgi-bin/stss/get/data-mining/

Our pages describe data mining, career opportunities in our group, data mining technologies being developed at IBM, data mining presentation slides, and data visualization. There's also an interactive example of the use of data mining on retail data.

There are other web pages on data mining at IBM that have not yet been linked into the main page above.
These other pages may be found by going to http://www.ibm.com , clicking on "Search" at the bottom of the page, and searching for (obviously) data mining.

__________________________________________________________________________
George H. John
Ph.D. Candidate                              Senior Analyst, Data Mining
Computer Science Dept  Stanford University   IBM Almaden Research Ctr
Stanford, CA 94305-9010
gjohn@cs.stanford.edu                        gjohn@almaden.ibm.com
(415) 497-6986  http://robotics.stanford.edu/~gjohn  (408) 927-2088
__________________________________________________________________________

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 8 Dec 95 09:59:32 PST
From: Wray Buntine
To: kdd@gte.com
Subject: important address change for Wray Buntine

Dear Colleagues

Please note my new address (postal, email, telephone, fax, web) below. Mail to my old address at Heuristicrats HAS NOT been getting through, hence the need to broadcast this notice. Mail to my old address at NASA (which I left in May) is still getting through, however.

PLEASE update all your address lists, AND resend any important mail you might have sent me in the last month to my new address below.

At the new company Thinkbank I will be available for consulting, and will be developing data analysis tools.

Wray Buntine                      +1 (415) 328 8897 [voice]
Thinkbank, Inc.                   +1 (510) 540-6080 [office]
1678 Shattuck Avenue, Suite 320   +1 (510) 540-6627 [fax]
Berkeley, CA 94709                wray@Thinkbank.COM

PS. Heuristicrats is involved in an internal dispute involving at least four lawyers. All employees have been fired or discharged. I may stand to lose an SBIR research award I won.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 11 Dec 1995 13:38:02 +0100
From: kamp@polaris.OFFIS.uni-oldenburg.de (Vera Kamp)
Subject: Question: KDD for FORECASTING-Systems?
Content-Type: text/plain; charset="us-ascii"
Content-Length: 774

Dear Gregory Piatetsky-Shapiro,

I would like to ask you, as the moderator of the KDD mailing list, whether you could place my question above in KDD Nuggets.

We currently have a project concerning a forecasting system for a large bakery company, intended to minimize returned deliveries. We would like to know whether KDD techniques are (or could be) used for a forecasting task like this.

Thank you in advance

Sincerely
Vera Kamp

******************************************************************
Dipl.-Inform. Vera Kamp     Phone: 0441/ 9722-132
Universitaet Oldenburg      Secretariat: -201
FB Informatik               Fax: -202
Escherweg 2
D-26121 Oldenburg           email: kamp@informatik.uni-oldenburg.de

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 12 Dec 1995 14:31:42 +0100
Sender: "Classification, clustering, and phylogeny estimation"
From: "Thomas W. Rauber"
Subject: TOOLDIAG 2.0: Pattern Recognition toolbox available + HOMEPAGE
X-Cc: Mark.Kantrowitz@cs.cmu.edu, fmurtagh@eso.org, hirtle+@pitt.edu

[ Archive maintainers: please update! ]

Dear researchers

Please note the following announcement about a software toolbox for the analysis of multidimensional data. C-source and documentation included.

KEYWORDS: Pattern recognition, multivariate analysis, supervised learning, feature selection, error estimation, classification.

TOOLDIAG is a collection of methods for statistical pattern recognition. The main area of application is classification.
The main capabilities of the program are:
- Different classifier architectures: KNN, QGC, RBF, Parzen, Q*
- Feature selection
  + Search strategies: BF, SFS, SBS, B&B, Exhaustive
  + Selection criteria: Minimum error, probabilistic distance, inter-class distance
- Feature extraction: PCA, LDA, Sammon
- Supervised learning of a classifier
- Error estimation: LOO, K-fold cross validation, Resubstitution, Holdout, Bootstrap
- Normalization
- Graphical Interface to the GNUPLOT program

For a more detailed description look up the URLs

++++ TOOLDIAG WWW HOMEPAGE +++++
http://www.uninova.pt/~tr/home/tooldiag.html
- or directly -
ftp://ftp.uninova.pt/users/tr/soft/tooldiag.README

Best regards

Dr. Thomas W. Rauber                    tr@uninova.pt
Dept. of Electrical Eng.,               http://www.uninova.pt/~tr
Universidade Nova de Lisboa & UNINOVA   Phone: (+351) (1) 3500-241
2825 Monte Caparica, PORTUGAL           Fax: (+351) (1) 294-1253

>>>>>>>>>>> New address after 1.3.1996 <<<<<<<<<<<<<<
Dr. Thomas W. Rauber                        tr@inf.ufes.br
Departamento de Informatica                 http://www.inf.ufes.br/~tr
Universidade Federal do Espirito Santo      Phone: (+55) (27) 335-2654
Av. Fernando Ferrari, Vitoria, ES, BRASIL   Fax: (+55) (27) 335-2650

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 11 Dec 1995 15:16:42 -0500 (EST)
From: Ed Colet
Subject: Advanced Scout position posting...
To: kdd@gte.com
Cc: ecolet@watson.ibm.com, Ed Colet
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Content-Length: 1861

Attention College students - Would you like a part-time internship which involves working with the NBA and IBM ?

National Media Group (NMG, the marketing arm of the National Basketball Coaches Association) is looking for college students to collaborate with the group from IBM's T.J. Watson Research Center working on the highly visible "Advanced Scout" data mining project. "Advanced Scout" is a program that applies data mining technology to help coaches find interesting trends and patterns in game data.
Responsibilities:
The part-time opportunity (approx 5-10 hours per week) involves the following two major responsibilities:
- Work closely with a local NBA team that's using "Advanced Scout" to mine their game data. Experience the process of mining game data by acting as an assistant to the coaching staff. Benefit from interactions with the research team at IBM's T.J. Watson Research Center, world-famous for its science and technology.
- Work closely with a local school to introduce a group of middle school students to computers and technology as part of an education outreach program which aims to enrich the academic program of students in inner-city schools. The students will interact with the coaching staff via their joint use of "Advanced Scout".

Requirements:
- Outstanding computer skills with PCs, DOS/Windows, and the Internet.
- Excellent verbal and written communication skills.
- A deep interest in basketball and in working with kids.
- Database experience a plus.
- It's best if you are located in a city with an NBA team.

To apply:
Please send a cover letter, resume, a reference letter from your academic advisor or a professor, and any additional material to:

Jon Levine
National Media Group, Inc.
P.O.Box 20923
New York, NY 10023
attention: Advanced Scout Project.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 6 Dec 1995 17:47:17 GMT
From: Julie Valentine
Subject: Data Mining seminar, 25-26 April 1996

DATA MINING '96
LONDON, 25-26 APRIL 1996

Sponsored by:
BCS SGES
AI Intelligence
SPSS (to be confirmed)

Further information URL:http://www.demon.co.uk/unicom

Background and Objectives

Many organizations have collected large amounts of data recording their past activities. Buried within these databases is knowledge from which important lessons can be learnt and, in turn, exploited to improve future performance.
The extraction of this knowledge, often in the form of a number of rules which describe how one or more fields are related to other fields, is known as data mining or KDD (Knowledge Discovery in Databases). The techniques used in KDD exploit some of the most recent research in artificial intelligence and machine learning. A fundamental purpose of this Seminar is to gather together both academics and representatives from industry in order to review the current techniques and to discuss their practical application.

Industry and commerce have begun to see the potential of these techniques and have started to exploit them in a wide range of applications such as market segmentation, risk analysis, credit rating and customer profiling. Data mining techniques have also been used by social service departments and there is huge potential for medical data mining. Case studies for a wide range of applications will be presented.

Anyone wishing to apply these tools needs to be aware of the availability and use of data mining toolkits. There are a number on the market and some software available on the Web. These will be assessed both in case studies and in comparative studies.

Topics covered will thus include:
  Database manipulation
  Visualisation Techniques
  Statistical analysis
  Clustering
  Fuzzy Reasoning
  Neural Nets
  Heuristic Methods
  Case Studies
  Data Mining Methodologies and Tools
  Rule Induction

Who should attend?

A primary goal of the conference is to bring together researchers and end-users. The latest results in data mining will be presented and assessed for their practicality.
Typical delegates will be
  * Academic Researchers
  * Toolkit Developers
  * Data Analysts
  * Management Consultants
  * Database Administrators

Organising Committee:
  Professor V J Rayward Smith, University of East Anglia (Chair)
  Professor Sally McClean, University of Ulster
  Mr Ken Totton, BT
  Mr Tony Bowden, Tony Bowden & Associates
  Mr Colin Shearer, Integral Solutions

Keynote Speakers invited:
  Professor Gregory Piatetsky-Shapiro, GTE Laboratories, USA
  Dr Arno Siebes, CWI, Amsterdam

Other Speakers agreed
  Dr Robert Milne, Intelligent Applications
  Dr Willi Kloesgen, GMD

Attendance fees
  1 day : 395; 2 days: 695
  Academic Discounts on application
  There is a reduced fee for speakers of 200/2 days

If you wish to speak at the conference, attend as a delegate, exhibit products or services, or contribute a written paper for the proceedings, please contact Julie Valentine : jvalentine@unicom.demon.co.uk

-- Julie Valentine

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path:
Date: Sun, 10 Dec 95 04:37:00 +0600
From: "Nikolay G. Zagoruiko"

To Dr. Gregory Piatetsky-Shapiro

Dear Dr. Piatetsky-Shapiro,

From the attached letter you will learn about the INPRIM-96 Congress, to be held in Akademgorodok (Novosibirsk) at the end of June 1996. This is a significant event. At the 1st Congress, about 2500 reports and papers were presented, and we already know the names of a number of prominent scientists, both from this country and from abroad, who have agreed to attend the Congress and present plenary papers.

The work of the Congress will be organized as follows: the first half of each day will be devoted to survey plenary reports by the leading experts from each section. In the second half of each day, one hour will be given to intersectional plenary papers, and three hours to section reports. There will be four working days.

I am to organize the work of the 13th section, "Methods of discovering of regularities in artificial intelligence systems".
We intend to invite experts in methods of discovering regularities in the information contained in the Data and Knowledge Bases of Expert Systems, and in using these regularities to solve problems of data analysis, pattern recognition, prediction, diagnostics and other decision-making problems.

Applications for participation can be sent to my address by mail:
N.G. Zagoruiko, Institute for Mathematics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090, Russia;
fax: (3832) 35 09 60;
e-mail: zag@math.nsk.su;
Information by tel.: (3832) 35 08 60.

With best regards,
Yours sincerely,
Nikolay Zagoruiko, professor.

=========================================================================
SECOND SIBERIAN CONGRESS ON INDUSTRIAL AND APPLIED MATHEMATICS (INPRIM-96)
dedicated to the memory of A. A. Lyapunov (1911-1973), A. P. Ershov (1931-1988), and I. A. Poletaev (1915-1983)

FIRST ANNOUNCEMENT

The Sobolev Institute of Mathematics, the Institute of Informatics Systems, the Institute for Computational Technology, and the Computer Center of the Siberian Branch of the Russian Academy of Sciences, together with Novosibirsk State University, Novosibirsk State Technical University, and the Siberian Society for Promotion of Science and Education (SIBOS) convene the International Congress INPRIM-96.

The Congress will take place at Novosibirsk Akademgorodok from June 25 to June 30, 1996.

The Program Committee of the Congress is as follows:
M.M. Lavrent'ev (Chairman), L.A. Bokut' (Vice-Chairman), I.V. Pottosin (Vice-Chairman), S.A. Treskov (Secretary), A.D. Aleksandrov, A.S. Alekseev, N.V. Belyakin, V.L. Beresnev, V.I. Bykov, W. Deuber, V.I. Drobyshevich, V.I. Elokhin, A.A. Evdokimov, S.K. Godunov, S.V. Gol'din, V.M. Gol'dshtein, V.P. Golubyatnikov, N.M. Gorskii, A.B. Gorstko, M. Hazewinkel, V.N. Kas'yanov, V.L. Katkov, A.S. Kleshchev, A.N. Konovalov, S.N. Korotkov, A.D. Korshunov, A.V. Kostochka, V.V. Kuleshov, S.S. Kutateladze, V.I.
Kuz'minov, S.S. Lavrov, O.N. Lebedev, V.K. Leont'ev, V.A. Leus, B.A. Lugovtsov, O.B. Lupanov, V.L. Makarov, G.I. Marchuk, V.V. Mazepus, Yu.V. Merekin, G.S. Migirenko, A.M. Molchanov, V.A. Nepomnyashchii, J. Nesetril, B.I. Plotkin, L.N. Pobedin, A.I. Poletaev, A. Pridor, M.D. Ramazanov, V.A. Ratner, V.N. Remeslenikov, Yu.G. Reshetnyak, A.F. Revuzhenko, G.S. Rivin, A.I. Rylov, A.A. Sapozhenko, Yu.I. Shokin, V.M. Sidel'nikov, A.V. Sidorov, Yu.N. Solodkin, A.A. Titlyanova, V.A. Toponogov, B.A. Trakhtenbrot, A.B. Ugol'nikov, P.L. Ul'yanov, V.A. Uspenskii, V.A. Vasil'ev, O.F. Vasil'ev, V.L. Vaskevich, S.K. Vodop'yanov, A.S. Vostrikov, V.N. Vragov, G.S. Yablonskii, S.V. Yablonskii, N.G. Zagoruiko, A.V. Zamulin, Yu.I. Zhuravlev, B.I. Zil'ber.

The Sections of the Congress and the Parallel Conferences are as follows (the chairmen and co-chairmen of the corresponding program committees for conferences, and the coordinators for sections, are given in parentheses):

1. Andrei Ershov Second International Memorial Conference [sections: Theoretical Computer Science, Programming Methodology, Artificial Intelligence, and New Information Technologies] (A. Zamulin, M. Broy, D. Bjorner, and I. Pottosin).
2. The Conference on Cybernetics and Discrete Mathematics [sections: Discrete Analysis, and Operations Research] (O. B. Lupanov, V. L. Beresnev, and A. A. Evdokimov).
3. Mathematical Simulation (A. N. Konovalov and N. M. Gorskii).
4. Mathematical Biology (V. A. Ratner).
5. Geometry and Analysis (Yu. G. Reshetnyak and S. K. Vodop'yanov).
6. Cubature Formulas: Theory and Applications (M. D. Ramazanov, V. L. Vaskevich).
7. Mechanics (B. A. Lugovtsov and A. I. Rylov).
8. Mathematical Models for Processes in Atmosphere and Ocean (G. S. Rivin).
9. Mathematical Methods in Chemistry (V. I. Elokhin).
10. Engineering Mathematics (G. S. Migirenko and Yu. N. Solodkin).
11. Mathematical Methods in Economics (V. A. Vasil'ev and A. V. Sidorov).
12.
Inverse Problems for Differential Equations (M. M. Lavrent'ev and V. P. Golubyatnikov).
>>>>> 13. Methods of Discovery of the Regularities in the Systems of Artificial Intelligence (N. G. Zagoruiko). >>>>>
14. Computer Graphics (V. A. Leus).
15. Mathematical Methods in Humanities (N.V. Belyakin and L.N. Pobedin).
16. Algebra and Mathematical Logic.

All questions concerning participation in the Andrei Ershov Second International Memorial Conference should be addressed to the chairman of that conference:

Alexandre Zamulin,
Institute of Informatics Systems,
6, Lavrentjev pr.
630090 Novosibirsk, RUSSIA
tel.: +7-3832-396258
fax: +7-3832-323494,
e-mail: zam@iis.nsk.su

Please inform the organizers about your decision to participate in the congress by March 1, 1996. Forward your mail to

Sergey Treskov,
Institute of Mathematics,
630090 Novosibirsk, RUSSIA,
phone: +7-3832-350962,
fax: +7-3832-350652,
e-mail: inprim@math.nsk.su

Indicate in your preregistration form:
1. First name
2. Middle name
3. Last name
4. Position
5. Address
6. Fax
7. Phone
8. E-mail
9. Title and one page abstract of the talk
10. Date of birth
11. Nationality (Citizenship)
12. Passport number
13. Remarks and wishes
14. Data for accompanying persons (1--3, 10--12).
==================================================================