KDnuggets Home » News » 2010 » Mar » PAKDD Data Mining Competition  ( < Prev | 10:n07 | Next > )

PAKDD 2010: Data Mining Competition

How to build a model for a binary decision support system from a biased sample in a credit scoring application


PAKDD2010 Data Mining Competition

OVERVIEW
The 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2010) will be held June 21-24, 2010, in Hyderabad, India. PAKDD 2010 is pleased to host a data mining competition, co-organized by NeuroTech Ltd. and the Center for Informatics of the Federal University of Pernambuco (Brazil). The competition is open to academia and industry and can be accessed either through the PAKDD 2010 conference site (www.iiit.ac.in/conferences/pakdd2010/) or directly at the competition server:

sede.neurotech.com.br/PAKDD2010/

PROBLEM SUMMARY
Re-Calibration of a Credit Risk Assessment System Based on Biased Data: The most fundamental and most common type of decision is the binary decision. This type of decision appears in any business activity where the decision outcome is either to "do that" or to "do something else". In general, the decision is made by applying a simple threshold to a propensity score; the threshold serves as the control parameter that produces the decisions. In principle, either outcome of a binary decision could be assessed as "successful" or "unsuccessful", via type I and type II errors, but in practice only the "do that" outcome is usually monitored for decision assessment.
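As a minimal illustration (a Python sketch, not part of the competition materials), a binary decision can be produced by thresholding a propensity score, and the two error types can be counted when every outcome is known. The function names, the example scores, and the convention that a type I error means approving a bad applicant are assumptions made for this example.

```python
from typing import List, Tuple


def binary_decisions(scores: List[float], threshold: float) -> List[bool]:
    """Return True ("do that", e.g. approve credit) when the score reaches the threshold."""
    return [s >= threshold for s in scores]


def error_counts(decisions: List[bool], good: List[bool]) -> Tuple[int, int]:
    """Count both error types, assuming the true outcome is known for every case.

    Assumed convention: type I = approving a bad applicant,
    type II = rejecting a good applicant. In practice only approved
    cases have observed outcomes, so type II errors cannot really be measured.
    """
    type_i = sum(1 for d, g in zip(decisions, good) if d and not g)
    type_ii = sum(1 for d, g in zip(decisions, good) if not d and g)
    return type_i, type_ii


# Hypothetical scores (higher = more likely to be a good payer) and true outcomes.
scores = [0.91, 0.35, 0.72, 0.15, 0.58]
good = [True, True, False, False, True]

decisions = binary_decisions(scores, threshold=0.5)
print(decisions)                      # [True, False, True, False, True]
print(error_counts(decisions, good))  # (1, 1)
```

Moving the threshold trades one error type against the other, which is why it acts as the control parameter of the decision system.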

As a consequence, only part of the market is monitored and has its decisions assessed as "successful" or "unsuccessful". This forms a heavily biased sample for system re-calibration/re-training, because it was extracted from the market by a process focused on the decision objective. This competition focuses on how to build a model for a binary decision support system from this type of biased sample in a credit scoring application. Only data about the company's clients are available for modeling; there are no data about the rejected applicants. This is the context of the PAKDD 2010 Competition.
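The sampling bias can be sketched with a small, entirely hypothetical simulation: applicants are approved by a legacy score, outcomes are recorded only for the approved ones, and the bad rate measured on that sample is compared with the bad rate of the whole (normally unobservable) population. The risk model, the threshold of 0.5, and all helper names are illustrative assumptions, not the competition data.

```python
import random
from typing import List, Optional, Tuple


def simulate_applicants(n: int, seed: int = 0) -> List[Tuple[float, bool]]:
    """Hypothetical population of (legacy score, is_bad) pairs; riskier applicants score lower."""
    rng = random.Random(seed)
    applicants = []
    for _ in range(n):
        risk = rng.random()                       # latent probability of default
        is_bad = rng.random() < risk              # true outcome
        score = (1.0 - risk) + rng.gauss(0, 0.1)  # legacy score, higher = safer
        applicants.append((score, is_bad))
    return applicants


def observed_labels(applicants: List[Tuple[float, bool]], threshold: float) -> List[Optional[bool]]:
    """The outcome is observed only for accepted applicants; rejected ones get None."""
    return [bad if score >= threshold else None for score, bad in applicants]


def bad_rate(labels: List[Optional[bool]]) -> float:
    """Bad rate computed over the labels that were actually observed."""
    known = [y for y in labels if y is not None]
    return sum(known) / len(known)


applicants = simulate_applicants(10_000)
labels = observed_labels(applicants, threshold=0.5)

population_rate = sum(bad for _, bad in applicants) / len(applicants)
accepted_rate = bad_rate(labels)
print(f"population bad rate:    {population_rate:.3f}")
print(f"accepted-only bad rate: {accepted_rate:.3f}")
```

Because the legacy score favours low-risk applicants, the accepted-only bad rate comes out well below the population rate, which is the kind of distortion a re-trained model inherits when rejected applicants are missing from the data.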

These data sets come from the private-label credit card operation of a Brazilian credit company and its partner shops. The official competition performance metric will be the area under the ROC curve (AUC); some other model performance metrics will be used for comparative purposes.
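For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The pairwise Python sketch below computes it that way; treating label 1 as the positive class is an assumption, and the quadratic loop is meant for clarity rather than for large data sets.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

For data of realistic size, a rank-based computation or an existing implementation such as scikit-learn's `roc_auc_score` is the more practical choice.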

IMPORTANT DATES

  • Mar 24: Competition announcement, Competition starts
  • Apr 16: Prediction data set release
  • May 03: Competition submission deadline (PDF manuscript and scores)
  • May 17: Competition results released
  • Jun 21: Conference starts

ORGANIZERS
  • Paulo J. L. Adeodato (Chair)
  • Adrian L. Arnaud (Vice-Chair)
E-mail: pakdd2010@neurotech.com.br
