Meetings

From: Christine Houck <houck@dimacs.rutgers.edu>
Date: Thu, 31 May 2001 16:44:45 -0400 (EDT)
Subject: DIMACS Workshop on Data Mining and Scalable Algorithms, Rutgers, NJ, Aug 22-24

DIMACS Center, Rutgers University, Piscataway, NJ

Organizers:
  Alex Smola, Australian National University, Alex.Smola@anu.edu.au
  Paul Bradley, Digimine Inc., paulb@digimine.com
  Nello Cristianini, Royal Holloway College, University of London
  Olvi Mangasarian, University of Wisconsin, olvi@cs.wisc.edu

Presented under the auspices of the Special Focus on Data Analysis and Mining.

With the availability of very large collections of data, the areas of machine learning, statistics, optimization, and databases face the challenge of making efficient use of this information. Data mining targets the problem of finding useful, interesting, and understandable structure or models in the data. While advanced techniques exist for handling nonparametric estimators efficiently when only limited data is available, algorithms for large amounts of data often resort to a rather limited class of estimates, such as linear models or the assumption that the data can be represented by a small number of clusters. This restriction is imposed mainly by implementation constraints. Yet the situation is paradoxical, since complex models are more easily justified statistically precisely when data is abundant. This raises the question of whether statistical methods exist that strike a better balance between complexity and performance.

Aims and Topics:

- Practical Limits of Nonparametric Methods: runtime, storage, relation to nearest-neighbor methods.
- Practical Limits of Parametric Models: is the data really nonlinear, or is a simple model good enough?
- Handling Categorical Data: kernels for categorical data; data with mixed numeric and categorical attributes.
- Novelty Detection and Discovering Patterns: fraud detection, modeling temporal/cyclic data.
- Missing or Censored Data.
- Efficiency: integration with database systems, efficient model building, efficient model deployment, large datasets.
- Data and Feature Selection: reduced dataset and feature methods.
- Small Training Set, Large Test Set: can we gain anything by transduction or EM?
- Understandability and Visualization: prediction explanation, data visualization/navigation.
- Applications: collaborative filtering, text classification (e.g., email classification), mining of massive document repositories (with hypertextual, multilingual, multimedia features).

There is a small fee to attend the workshop. Information on participation, registration, accommodations, and travel can be found at:
http://dimacs.rutgers.edu/Workshops/Scalable/
Copyright © 2001 KDnuggets.