Meetings

From: Christine Houck <houck@dimacs.rutgers.edu>
Date: Thu, 31 May 2001 16:44:45 -0400 (EDT)
Subject: DIMACS Workshop on Data Mining and Scalable Algorithms, Rutgers, NJ, Aug 22-24

DIMACS Center, Rutgers University, Piscataway, NJ

Organizers:
  Alex Smola, Australian National University, Alex.Smola@anu.edu.au
  Paul Bradley, Digimine Inc., paulb@digimine.com
  Nello Cristianini, Royal Holloway College, University of London
  Olvi Mangasarian, University of Wisconsin, olvi@cs.wisc.edu

Presented under the auspices of the Special Focus on Data Analysis and Mining.

With the availability of very large collections of data, the areas of machine learning, statistics, optimization, and databases face the challenge of making efficient use of this information. Data mining targets the problem of finding useful, interesting, and understandable structure or models in the data. While advanced techniques exist for handling nonparametric estimators efficiently when only limited data is available, algorithms for large amounts of data often resort to a rather limited class of estimates, such as linear models or the assumption that the data can be represented by a small number of clusters. This restriction is imposed mainly by implementation constraints. Yet the situation is paradoxical, since complex models are more easily justified statistically precisely when data is abundant. This raises the question of whether statistical methods exist that strike a better balance between complexity and performance.

Aims and Topics:

- Practical Limits of Nonparametric Methods: runtime, storage, relation to nearest-neighbor methods.
- Practical Limits of Parametric Models: is the data really nonlinear, or is a simple model good enough?
- Handling Categorical Data: kernels for categorical data; data with mixed numeric and categorical attributes.
- Novelty Detection and Discovering Patterns: fraud detection, modeling temporal/cyclic data.
- Missing or Censored Data.
- Efficiency: integration with database systems, efficient model building, efficient model deployment, large datasets.
- Data and Feature Selection: reduced dataset and feature methods.
- Small Training Set, Large Test Set: can we gain anything by transduction or EM?
- Understandability and Visualization: prediction explanation, data visualization/navigation.
- Applications: collaborative filtering, text classification (e.g., email classification), mining of massive document repositories (with hypertextual, multilingual, multimedia features).

There is a small fee to attend the workshop. Information on participation, registration, accommodations, and travel can be found at:
http://dimacs.rutgers.edu/Workshops/Scalable/
Copyright © 2001 KDnuggets.