KDD Nugget 95:33, e-mailed 95-12-21

Contents:
News:
 * GPS, Data Mining and Knowledge Discovery journal CFP:
   http://www.research.microsoft.com/research/datamine/
 * GPS, Advances in Knowledge Discovery and Data Mining book
   can be ordered from MIT Press
   http://aaai.org/Publications/Press/Catalog/fayyad.html
Publications:
 * S. Minton, JAIR article on "Rule-based Machine Learning Methods
   for Functional Prediction",
   http://www.cs.washington.edu/research/jair/home.html
Positions:
 * R. St Amant, Job opportunity, large data sets, data mining
Meetings:
 * P. Chan, CFP: AAAI-96 Workshop on Integrating Multiple Learned Models
   http://www.cs.fit.edu/~imlm/
 * A. Sharma, CFP: ALT'96, Algorithmic Learning Theory, Australia,
   Oct 1996, http://www.cse.unsw.edu.au/~alt96/

Season's greetings to all 1200+ subscribers from 50+ countries!
--
The KDD Nuggets is a moderated mailing list focusing on Data Mining and
Knowledge Discovery in Databases (KDD) research and development.
Contributions are welcome and should be emailed, with a DESCRIPTIVE
subject line (and a URL, when available) to .
E-mail add/delete requests to .
Nuggets frequency is approximately weekly.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools), and
a wealth of other information on Data Mining and Knowledge Discovery are
available at the Knowledge Discovery Mine site, URL .
-- Gregory Piatetsky-Shapiro (moderator)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"One of the symptoms of an approaching nervous breakdown is the belief
that one's work is terribly important."
                                   - Bertrand Russell (1872-1970)

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 18 Dec 1995
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: Data Mining and Knowledge Discovery (DMKD) journal Call for Papers see

We are excited to announce the forthcoming new journal on Data Mining and
Knowledge Discovery, to be published by Kluwer starting January 1997.
We expect it to be the flagship publication in the Data Mining and
Knowledge Discovery area. The personal subscription price will be $50;
the institutional subscription price has not been set yet.
Full details are at http://www.research.microsoft.com/research/datamine/

Here is the call for papers.

****************************************************************
                 New Journal Announcement:

            Data Mining and Knowledge Discovery

               C a l l   f o r   P a p e r s
****************************************************************

Advances in data gathering, storage, and distribution technologies have
far outpaced computational advances in techniques for analyzing and
understanding data. This has created an urgent need for a new generation
of tools and techniques for automated Data Mining and Knowledge Discovery
in Databases (KDD). KDD is a broad area that integrates methods from
several fields including machine learning, machine discovery, uncertainty
modeling, statistics, databases, data visualization, high performance
computing, management information systems (MIS), and knowledge-based
systems. KDD refers to a multi-step process that can be highly
interactive and iterative, and that includes data selection,
preprocessing, transformation, application of data mining algorithms to
extract patterns/models from data, evaluation of the extracted patterns,
and conversion of them into an operational form or human-oriented
knowledge. Hence "data mining" refers to a step in the overall KDD process.
However, a significant portion of the published work has focused on the
development and application of data mining methods for pattern/model
extraction from data using automated or semi-automated techniques.
Hence, by including data mining explicitly in the name of the journal, we
hope to emphasize its role and to build bridges to communities working
solely on data mining methods.

Our goal is to make Data Mining and Knowledge Discovery a flagship
publication in the KDD area, providing a unified forum for the KDD
research community, whose publications are currently scattered among
many different journals. The journal will publish state-of-the-art
papers on both the research and practice of KDD, surveys of important
techniques from related fields, and application papers of general
interest. In addition, there will be a section for publishing useful
information such as short application reports (1-3 pages), book and
system reviews, and relevant product announcements.

The topics of interest include:

Theory and Foundational Issues in KDD:
    Data and knowledge representation for KDD
    Modeling of structured, textual, and multimedia data
    Uncertainty management in KDD
    Metrics for evaluating interestingness and utility of knowledge
    Algorithmic complexity, efficiency, and scalability issues in data mining
    Limitations of data mining methods

Data Mining Methods and Algorithms:
    Discovery methods based on belief networks, decision trees, genetic
      programming, neural networks, rough sets, and other approaches
    Algorithms for mining spatial, textual, and other complex data
    Incremental discovery methods and re-use of discovered knowledge
    Integration of discovery methods
    Data structures and query evaluation methods for data mining
    Parallel and distributed data mining techniques
    Issues and challenges for dealing with massive or small data sets

Knowledge Discovery Process:
    Data pre-processing for data mining
    Evaluating, consolidating, and explaining discovered knowledge
    Data and knowledge visualization
    Interactive data exploration and discovery

Application Issues:
    Application case studies
    Data mining systems and tools
    Details of successes and failures of KDD
    Resource and knowledge discovery on the Internet and WWW
    Privacy and security issues

This list is not intended to be exhaustive; it indicates typical topics
of interest. Prospective authors are encouraged to submit papers on any
topics of relevance to knowledge discovery and data mining.

SUBMISSION AND REVIEW CRITERIA:

We solicit papers on both research and applications. All submitted
papers should be relevant to KDD, clearly written, and accessible to
readers from other disciplines through a carefully written introduction.
Submissions will be thoroughly reviewed to ensure that they either
substantially advance our understanding of a fundamental theoretical
problem or provide a strong technological advance enabling the
algorithmic extraction of knowledge from data.

Papers whose primary focus is on significant applications are strongly
encouraged but must clearly address the general underlying issues and
principles, as well as provide details of algorithmic aspects. Papers
whose primary focus is on algorithms and methods must address issues of
complexity and efficiency/feasibility for large data sets, and must
clearly state the assumptions and limitations of the methods covered.
Short application summaries (1-3 pages) are also encouraged and will be
judged on the basis of application significance, technical innovation,
and clarity of presentation.

SUBMISSION INSTRUCTIONS:

We encourage electronic submission of PostScript files. For hardcopy
submission, authors should submit five hard copies of their manuscript to:

    Ms. Karen Cullen
    DATA MINING AND KNOWLEDGE DISCOVERY Editorial Office
    Kluwer Academic Publishers
    101 Philip Drive
    Norwell, MA 02061
    phone 617-871-6600
    fax 617-871-6528
    email: kcullen@wkap.com

Submissions should be in 12pt font with 1.5 line spacing, and should not
exceed 28 pages. We strongly encourage electronic submissions; please
visit http://www.research.microsoft.com/research/datamine/ for
instructions on electronic submission. Detailed instructions for
submission of final manuscripts, and Kluwer format files for LaTeX,
MS Word, and other typesetting programs, are provided at the above site.
Exact instructions for hardcopy and electronic submission to Kluwer can
be accessed at http://www.research.microsoft.com/research/datamine/

As a publication for a rapidly emerging field, the journal will
emphasize quick dissemination of results and minimal publication
backlogs. We plan to review papers and respond to authors within 3 months
of submission. An electronic server will give all subscribers to the
journal access to accepted papers. Authors will be encouraged to make
their data available via the journal web site.

The journal will be a quarterly, with the first volume published in
January 1997 by Kluwer Academic Publishers.

Editors-in-Chief:
================
    Usama M. Fayyad
        Jet Propulsion Laboratory, California Institute of Technology, USA
    Heikki Mannila
        University of Helsinki, Finland
    Gregory Piatetsky-Shapiro
        GTE Laboratories, USA

Editorial Board:
===============
    Rakesh Agrawal (IBM Almaden Research Center, USA)
    Tej Anand (AT&T Global Information Solutions, USA)
    Ron Brachman (AT&T Bell Laboratories, USA)
    Wray Buntine (Heuristicrats Research Inc, USA)
    Peter Cheeseman (NASA AMES Research Center, USA)
    Greg Cooper (University of Pittsburgh, USA)
    Bruce Croft (University of Mass. Amherst, USA)
    Dan Druker (Arbor Software, USA)
    Saso Dzeroski (Josef Stefan Institute, Slovenia)
    Oren Etzioni (University of Washington, USA)
    Jerome Friedman (Stanford University, USA)
    Brian Gaines (University of Calgary, Canada)
    Clark Glymour (Carnegie-Mellon University, USA)
    Jim Gray (Microsoft Research, USA)
    Georges Grinstein (U. of Lowell, USA)
    Jiawei Han (Simon Fraser University, Canada)
    David Hand (Open University, UK)
    Trevor Hastie (Stanford University, USA)
    David Heckerman (Microsoft Research, USA)
    Se June Hong (IBM T.J. Watson Research Center, USA)
    Tomasz Imielinski (Rutgers University, USA)
    Larry Jackel (AT&T Bell Labs, USA)
    Larry Kerschberg (George Mason University, USA)
    Willi Kloesgen (GMD, Germany)
    Yves Kodratoff (Lab. de Recherche Informatique, France)
    Pat Langley (ISLE/Stanford University, USA)
    Tsau Lin (San Jose State University, USA)
    David Madigan (University of Washington, USA)
    Ami Motro (George Mason University, USA)
    Shojiro Nishio (Osaka University, Japan)
    Judea Pearl (University of California, Los Angeles, USA)
    Ed Pednault (AT&T Bell Labs, USA)
    Daryl Pregibon (AT&T Bell Laboratories, USA)
    J. Ross Quinlan (University of Sydney, Australia)
    Jude Shavlik (University of Wisconsin - Madison, USA)
    Arno Siebes (CWI, Netherlands)
    Evangelos Simoudis (IBM Almaden Research Center, USA)
    Andrzej Skowron (University of Warsaw, Poland)
    Padhraic Smyth (Jet Propulsion Laboratory, USA)
    Salvatore Stolfo (Columbia University, USA)
    Alex Tuzhilin (NYU Stern School, USA)
    Ramasamy Uthurusamy (General Motors Research Laboratories, USA)
    Vladimir Vapnik (AT&T Bell Labs, USA)
    Ronald Yager (Iona College, USA)
    Xindong Wu (Monash University, Australia)
    Wojciech Ziarko (University of Regina, Canada)
    Jan Zytkow (Wichita State University, USA)

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 18 Dec 1995
From: gps@gte.com (Gregory Piatetsky-Shapiro)
Subject: Advances in Knowledge Discovery and Data Mining collection,
         http://aaai.org/Publications/Press/Catalog/fayyad.html

The book is currently at the printer; it will be at the MIT bookstore in
January and should be available in February. Advance copies can now be
ordered from MIT Press on-line.

    Advances in Knowledge Discovery and Data Mining,
    editors Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth,
    and Ramasamy Uthurusamy
    Published by the AAAI Press / The MIT Press
    625 pp. $50.00 ISBN 0-262-56097-6

This book can be ordered online from The MIT Press .

During the last decade, we have seen an explosive growth in our
capabilities to both generate and collect data. Advances in data
collection, widespread use of bar codes for most commercial products, and
the computerization of many business and government transactions have
flooded us with information and generated an urgent need for new
techniques and tools that can intelligently and automatically assist us
in transforming this data into useful knowledge. This book examines and
describes many such new techniques and tools in the emerging field of
data mining and knowledge discovery in databases (KDD).
The chapters of this book span fundamental issues of knowledge discovery,
classification and clustering, trend and deviation analysis, dependency
derivation, integrated discovery systems, augmented database systems, and
application case studies.

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: minton@ISI.EDU
Date: Tue, 19 Dec 95 11:53:27 PST
Posted-Date: Tue, 19 Dec 95 11:53:27 PST
To: ai-stats@watstat.uwaterloo.ca, kdd@gte.com, connectionists@cs.cmu.edu
Subject: JAIR article
Content-Type: text
Content-Length: 2050

Readers of this mailing list may be interested in the following JAIR
article, which was just published:

Weiss, S.M. and Indurkhya, N. (1995)
  "Rule-based Machine Learning Methods for Functional Prediction",
  Volume 3, pages 383-403.

  PostScript: volume3/weiss95a.ps (527K)
  compressed, volume3/weiss95a.ps.Z (166K)

Abstract: We describe a machine learning method for predicting the value
of a real-valued function, given the values of multiple input variables.
The method induces solutions from samples in the form of ordered
disjunctive normal form (DNF) decision rules. A central objective of the
method and representation is the induction of compact, easily
interpretable solutions. This rule-based decision model can be extended
to search efficiently for similar cases prior to approximating function
values. Experimental results on real-world data demonstrate that the new
techniques are competitive with existing machine learning and statistical
methods and can sometimes yield superior regression performance.

The PostScript file is available via:

 -- comp.ai.jair.papers

 -- World Wide Web: The URL for our World Wide Web server is
       http://www.cs.washington.edu/research/jair/home.html

 -- Anonymous FTP from either of the two sites below:
      CMU:   p.gp.cs.cmu.edu          directory: /usr/jair/pub/volume3
      Genoa: ftp.mrg.dist.unige.it    directory: pub/jair/pub/volume3

 -- automated email. Send mail to jair@cs.cmu.edu or
    jair@ftp.mrg.dist.unige.it with the subject AUTORESPOND, and the body
    GET VOLUME3/FILE-NM (e.g., GET VOLUME3/MOONEY95A.PS)
    Note: Your mailer might find our files too large to handle. Also
    note that compressed files cannot be emailed, since they are binary
    files.

 -- JAIR Gopher server: At p.gp.cs.cmu.edu, port 70.

For more information about JAIR, check out our WWW or FTP sites, or send
electronic mail to jair@cs.cmu.edu with the subject AUTORESPOND and the
message body HELP, or contact jair-ed@ptolemy.arc.nasa.gov.

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 15 Dec 1995 15:15:48 -0500
From: "Robert St. Amant"
To: kdd@gte.com
Subject: Job opportunity, large data sets, data mining
Content-Type: text
Content-Length: 1478

Gregory,

This appeared on sci.stat.consult. (I hope you haven't already gotten a
dozen copies of this.)

Regards,
Rob St. Amant

>From: Marc Andelman
>Newsgroups: sci.stat.consult
>Date: 12 Dec 1995 15:31:22 GMT
>Organization: UltraNet Communications, Inc.
>Lines: 27
>NNTP-Posting-Host: biosource.ultranet.com
>
>Biosource is an employer-paid fee service. Our client, a
>pharmaceutical benefit management firm in the midwest, is
>seeking to hire a statistician. This person will do trend
>and multivariate analysis of large data sets. A pharmaceutical
>benefit management firm is a company that has access to data
>from millions of pharmacies, and which analyzes this data
>as a service to hospitals, HMOs, etc. The sorts of things
>they like to ascertain are:
>
>1. Is a particular patient complying with his/her prescription, or
>using the money to buy lottery tickets?
>
>2. Cost and utilization data
>
>3. Any trends that appear from very large data sets.
>
>Relevant requirements: Familiarity with large data sets, such
>as one might get from experience with the IRS, insurance firms,
>or other P.B.M. firms.
>
>Interested candidates should e-mail or fax Marc Andelman
>at 508 843 8772. President, Biosource Inc., 1 Parkton Ave.,
>Worcester, MA 01605
>
>Unless specifically requested, I will only get back in
>touch with a particular person if this looks like a
>dead ringer for someone's career interests.
>Thank you. Marc Andelman

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path:
Date: Mon, 18 Dec 1995 16:39:40 -0500
From: IMLM Workshop (pkc)
To: ml@ics.uci.edu, kdd@gte.com, INDUCTIVE@hermes.csd.unb.ca,
    DAI-List@ece.sc.edu, GA-List@AIC.NRL.NAVY.MIL, Connectionists@CS.cmu.edu,
    ai-stats@watstat.uwaterloo.ca, dbworld@cs.wisc.edu
Cc: imlm@tuck.cs.fit.edu, sal@cs.columbia.edu, dhw@santafe.edu
Subject: CFP: AAAI-96 Workshop on Integrating Multiple Learned Models
Content-Type: text
Content-Length: 4021

                 CALL FOR PAPERS/PARTICIPATION

              INTEGRATING MULTIPLE LEARNED MODELS
     FOR IMPROVING AND SCALING MACHINE LEARNING ALGORITHMS

            to be held in conjunction with AAAI 1996
                       Portland, Oregon
                          August 1996

Most modern machine learning research uses a single model or learning
algorithm at a time, or at most selects one model from a set of candidate
models. Recently, however, there has been considerable interest in
techniques that integrate the collective predictions of a set of models
in some principled fashion. With such techniques, the predictive accuracy
and/or the training efficiency of the overall system can often be
improved, since one can "mix and match" among the relative strengths of
the models being combined.

The goal of this workshop is to gather researchers actively working in
the area of integrating multiple learned models, to exchange ideas and to
foster collaborations and new research directions. In particular, we seek
to bring together researchers interested in this topic from the fields of
Machine Learning, Knowledge Discovery in Databases, and Statistics.

Any aspect of integrating multiple models is appropriate for the workshop.
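The CFP leaves open how the collective predictions are integrated ("in some principled fashion"). As a minimal sketch only, assuming binary 0/1 class labels, one of the simplest schemes trains one copy of a base learner on each of k disjoint partitions of the training data and combines the resulting models by unweighted majority vote. The function names below are hypothetical, and the decision-stump base learner is an illustrative stand-in, not a technique proposed by the workshop organizers.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch: an example is (feature vector, label),
# where labels are assumed to be 0 or 1.
Example = Tuple[Sequence[float], int]
Model = Callable[[Sequence[float]], int]


def partition(data: List[Example], k: int) -> List[List[Example]]:
    """Split the training set into k disjoint partitions (round-robin).

    Assumes k <= len(data), so that no partition is empty.
    """
    return [data[i::k] for i in range(k)]


def train_stump(data: List[Example]) -> Model:
    """Illustrative base learner: the single-feature threshold rule
    with the fewest training errors (a "decision stump")."""
    best = None  # (errors, feature index, threshold, label predicted at/above threshold)
    for f in range(len(data[0][0])):
        for x, _ in data:
            t = x[f]
            for above in (0, 1):
                errors = sum(
                    (above if xi[f] >= t else 1 - above) != yi for xi, yi in data
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda x: above if x[f] >= t else 1 - above


def train_ensemble(
    data: List[Example],
    k: int,
    learner: Callable[[List[Example]], Model] = train_stump,
) -> List[Model]:
    """Train one model per partition; the k calls are independent,
    so they could run in parallel on separate machines."""
    return [learner(part) for part in partition(data, k)]


def predict_vote(models: List[Model], x: Sequence[float]) -> int:
    """Combine the models' predictions by unweighted majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: one feature, class 1 whenever the feature is >= 5.
    data = [([float(v)], int(v >= 5)) for v in range(10)]
    models = train_ensemble(data, k=3)
    print([predict_vote(models, [v]) for v in (1.0, 7.0)])  # -> [0, 1]
```

For a base learner whose cost grows faster than linearly in the number of examples (as this stump search does), training k models on 1/k slices of the data can be cheaper than training one model on all of it, and the slices can be processed in parallel. Choosing an odd k avoids ties between the two classes.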
However, we intend the focus of the workshop to be improving prediction
accuracies and improving training performance in the context of large
training databases. More precisely, submissions are sought on, but not
limited to, the following topics:

1) Techniques that generate and/or integrate multiple learned models.
   In particular, techniques that do so by:
   * using different training data distributions (in particular by
     training over different partitions of the data)
   * using different output classification schemes (for example using
     output codes)
   * using different hyperparameters or training heuristics (primarily
     as a tool for generating multiple models)

2) Systems and architectures to implement such strategies.
   In particular:
   * parallel and distributed multiple learning systems
   * multi-agent learning over inherently distributed data

A paper need not be submitted to participate in the workshop, but space
may be limited, so contact the organizers as early as possible if you
wish to participate.

The workshop format is planned to encompass a full day of half-hour
presentations with discussion periods, ending with a brief period for
summary and discussion of future activities. Notes or proceedings for the
workshop may be provided, depending on the submissions received.

Submission requirements:

i) A short paper of not more than 2000 words detailing recent research
   results must be received by March 18, 1996.

ii) The paper should include an abstract of not more than 150 words and
   a list of keywords. Please include the name(s), email address(es),
   address(es), and phone number(s) of the author(s) on the first page.
   The first author will be the primary contact unless otherwise stated.

iii) Electronic submissions in PostScript or ASCII via email are
   preferred. Three printed copies (preferably double-sided) of your
   submission are also accepted.

iv) Please also send the title, name(s) and email address(es) of the
   author(s), abstract, and keywords in ASCII via email.
Submission address:

    imlm@cs.fit.edu

    Philip Chan
    IMLM Workshop
    Computer Science
    Florida Institute of Technology
    150 W. University Blvd.
    Melbourne, FL 32901-6988
    407-768-8000 x7280 (x8062)
    407-984-8461 (fax)

Important Dates:

    Paper submission deadline:   March 18, 1996
    Notification of acceptance:  April 15, 1996
    Final copy:                  May 13, 1996

Chairs:

    Salvatore Stolfo, Columbia University           sal@cs.columbia.edu
    David Wolpert, Santa Fe Institute               dhw@santafe.edu
    Philip Chan, Florida Institute of Technology    pkc@cs.fit.edu

General Inquiries:

Please address general inquiries to one of the co-chairs or send them to:

    imlm@cs.fit.edu

Up-to-date workshop information is maintained on WWW at:

    http://cs.fit.edu/~imlm/ or http://www.cs.fit.edu/~imlm/

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Return-Path:
From: arun@cse.unsw.edu.au (Arun Sharma)
To: kdd@gte.com
Date: Mon, 18 Dec 1995 23:26:54 +1100 (EST)
Subject: Request for posting call for papers in the KDD list
Cc: arun@cse.unsw.edu.au (Arun Sharma)
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Content-Length: 4056

Dear Gregory,

I would appreciate it if you could kindly post the following call for
papers in the KDD list. Thanks.

--
Best regards,
Arun

-----------------
                     Call for Papers: ALT'96
  The Seventh International Workshop on Algorithmic Learning Theory
              Coogee Holiday Inn, Sydney, Australia
                      October 23-25, 1996

The 7th International Workshop on Algorithmic Learning Theory (ALT'96)
will be held at the Coogee Holiday Inn, Sydney, Australia, on
October 23-25, 1996, and will be co-located with the Pacific Rim
Knowledge Acquisition Workshop. The workshop is being sponsored by the
Japanese Society for Artificial Intelligence (JSAI) and the University
of New South Wales (UNSW).
We invite submissions to ALT'96 in all areas related to algorithmic
learning theory including (but not limited to): the design and analysis
of learning algorithms, the theory of machine learning, computational
logic of/for machine discovery, inductive inference, learning via
queries, artificial and biological neural networks, pattern recognition,
learning by analogy, Bayesian/MDL estimation, statistical learning,
inductive logic programming, robotics, application of learning to
databases, gene analysis, etc.

INVITED TALKS: Invited talks will be given by Prof. J.R. Quinlan
(University of Sydney), Prof. T. Shinohara (Kyushu Institute of
Technology), Prof. Les Valiant (Harvard Univ.), and Prof. Paul Vitanyi
(CWI and Univ. of Amsterdam).

SUBMISSIONS: Authors must submit nine copies of their extended abstracts
to:

    Arun Sharma - ALT'96
    School of Computer Science and Engineering
    University of New South Wales
    Sydney, 2052, Australia

ABSTRACTS must be received by April 15, 1996.

NOTIFICATION of acceptance or rejection will be mailed to the first (or
designated) author by June 3, 1996.

CAMERA-READY copy of accepted papers will be due July 1, 1996.

FORMAT: The submitted abstract should consist of a cover page with title,
authors' names, postal and e-mail addresses, an approximately 200-word
summary, and a body not longer than ten (10) pages of size A4 or
7x10.5 inches in twelve-point font. Note that only the first ten (10)
pages of the body will be sent out for review. Double-sided printing is
strongly encouraged.

POLICY: Each submitted abstract will be reviewed by the members of the
program committee and judged on clarity, significance, and originality.
Simultaneous submission of papers to any other conference with published
proceedings is not allowed. Papers that have appeared in journals or
other conferences are not appropriate for ALT'96.
PROCEEDINGS will be published as a volume in the Lecture Notes Series in
Artificial Intelligence from Springer-Verlag, and will be available at
the conference. Selected papers of ALT'96 will be invited to be published
in a special issue of a distinguished journal.

CONFERENCE CHAIR:
    Prof. Setsuo Arikawa
    RIFIS, Kyushu University 33
    Fukuoka, 812 Japan
    arikawa@rifis.kyushu-u.ac.jp

PROGRAM COMMITTEE CHAIR:
    Arun Sharma, Univ. of New South Wales
    arun@cse.unsw.edu.au

PROGRAM COMMITTEE:
    H. Arimura (KyuTech), Jose Balcazar (UPC, Barcelona), P. Bartlett (ANU),
    W. Cohen (AT&T), S. Ben David (Technion), H. Imai (U. Tokyo),
    K.P. Jantke (TH Leipzig), S. Kobayashi (U. Electro-Comm.),
    M. Numao (TiTech), S. Jain (National U. Singapore), S. Lange (TH Leipzig),
    L. De Raedt (Leuven), Y. Sakakibara (Fujitsu Labs), M. Sato (Osaka Pref. U.),
    O. Watanabe (TiTech), K. Yamanishi (NEC), T. Zeugmann (Kyushu)

LOCAL ARRANGEMENTS CHAIR:
    Achim Hoffmann
    School of Computer Science and Engineering
    University of New South Wales
    Sydney 2052 Australia
    alt96@cse.unsw.edu.au

For more information, contact:
    Email: alt96@cse.unsw.edu.au
    Homepage: http://www.cse.unsw.edu.au/~alt96/
-------------------------------------------------------------------------