PAKDD2010 Data Mining Competition
OVERVIEW
The 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD
2010) will be held June 21-24, 2010, in Hyderabad, India. PAKDD2010 is pleased
to host a data mining competition, co-organized by NeuroTech Ltd. and the Center
for Informatics of the Federal University of Pernambuco (Brazil). The
competition is open to academia and industry and can be accessed either through
the PAKDD 2010 Conference site (www.iiit.ac.in/conferences/pakdd2010/) or
directly at the competition server:
sede.neurotech.com.br/PAKDD2010/
PROBLEM SUMMARY
Re-Calibration of a Credit Risk Assessment System Based on Biased Data: The most
fundamental and most frequently encountered type of decision is the binary
decision. It appears in any business activity where the outcome is either to
"do that" or to "do something else". In general, the decision is made by
applying a simple threshold, which serves as the control parameter, to a
propensity score. Binary decisions could, in principle, be assessed as
"successful" or "unsuccessful" for either outcome, via type I and type II
errors, but in general only the "do that" outcome is monitored for decision
assessment.
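The threshold rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the competition material: the function names, the scores, and the outcomes are made up, and it assumes that a higher score means a more creditworthy applicant and that "do that" means approving credit. Which of the two error counts is called type I and which type II depends on the hypothesis chosen as the null, so the sketch names them by what they mean instead.

```python
def decide(score, threshold):
    """Return True for the "do that" outcome (assumed here: approve credit)."""
    return score >= threshold


def count_errors(scores, is_good, threshold):
    """Count both kinds of wrong decision for a given threshold.

    Assumes the true outcome is known for every case, which in practice
    holds only for the approved applicants.
    """
    wrong_approvals = 0   # approved, but turned out to be a bad payer
    wrong_rejections = 0  # rejected, but would have been a good payer
    for score, good in zip(scores, is_good):
        approved = decide(score, threshold)
        if approved and not good:
            wrong_approvals += 1
        elif not approved and good:
            wrong_rejections += 1
    return wrong_approvals, wrong_rejections


# Made-up scores and outcomes, for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
is_good = [True, False, True, True, False]
print(count_errors(scores, is_good, threshold=0.5))  # (1, 1)
```

Raising the threshold trades wrong approvals for wrong rejections, but only the first kind can be observed when rejected applicants are never monitored.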
As a consequence, only part of the market is monitored and has its decisions assessed as "successful" or "unsuccessful". This forms a very biased sample for system re-calibration/re-training, because it has been extracted from the market by a process focused on the decision objective. This competition focuses on how to build a model for a binary decision support system from this type of biased sample in a credit scoring application. Modeling data are available only for the company's clients, not for the rejected applicants. This is the context of the PAKDD 2010 Competition.
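A small sketch may help show why such a sample is biased. The applicant records, the old threshold, and all names below are hypothetical and are not taken from the competition data; the sketch assumes the previous system approved every applicant whose score reached a fixed threshold, so that outcomes exist only for those approved clients.

```python
# Hypothetical applicants: (score from the previous system, turned_out_bad).
# In reality the second field would be unknown for rejected applicants.
applicants = [
    (0.9, False), (0.8, False), (0.7, True), (0.6, False),
    (0.4, True), (0.3, True), (0.2, False), (0.1, True),
]
OLD_THRESHOLD = 0.5  # assumed cut-off of the previous system


def bad_rate(records):
    """Fraction of records whose outcome was bad."""
    return sum(1 for _, bad in records if bad) / len(records)


# Only approved applicants become clients and have their outcome observed.
clients = [record for record in applicants if record[0] >= OLD_THRESHOLD]

print(bad_rate(applicants))  # 0.5  (whole market, not observable in practice)
print(bad_rate(clients))     # 0.25 (what the modeling sample shows)
```

In this toy case the clients look much safer than the market as a whole, so a model trained only on them sees a distorted picture of the applicants it will later have to score.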
The data sets come from a private label credit card operation of a Brazilian credit company and its partner shops. The official competition performance metric will be the area under the ROC curve. Some other model performance metrics will be used for comparative purposes.
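For readers unfamiliar with the metric, the area under the ROC curve (AUC) equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. It is not the organizers' scoring code; the function name and sample data are illustrative, and which class the competition treats as "positive" is an assumption here.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    labels use 1 for the positive class and 0 for the negative class
    (an assumption of this sketch). Runs in O(n_pos * n_neg) time, which
    is fine for small examples but slow for large data sets.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Made-up scores and labels, for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
print(round(auc(scores, labels), 3))  # 0.667
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The value does not depend on any particular decision threshold.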
IMPORTANT DATES
- Mar 24: Competition announcement, Competition starts
- Apr 16: Prediction data set release
- May 03: Competition submission deadline (PDF manuscript and scores)
- May 17: Competition results released
- Jun 21: Conference starts
COMPETITION CHAIRS
- Paulo J. L. Adeodato (Chair)
- Adrian L. Arnaud (Vice-Chair)