Overview of KDD Cup 2008
The KDD Cup is an annual data mining and knowledge discovery competition. It is organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), the leading professional organization of data miners. Each year's competition focuses on a different area. The 2008 competition was particularly interesting, and we'll give an overview of it here.
The KDD Cup 2008 focused on the problems and challenges of detecting breast cancer early from X-ray images of the breast (mammograms). Breast cancer is a serious illness that unfortunately claims far too many lives when it is not detected in its early stages.
It has been found that most patients with cancer exhibit no more than one malignant lesion. The objective of the competition was to develop algorithms for computer-aided detection of early-stage breast cancer from X-ray images.
Breast cancer is the second leading cause of cancer deaths among women in the United States, and it is the most common form of cancer in women apart from skin cancers. An estimated 1.3 million women worldwide are diagnosed with breast cancer each year, and over 450,000 of them die from the disease.
Early detection is key to treating breast cancer successfully. Ideally, the cancer should be identified before any symptoms develop, which gives the best chance of effective treatment.
In this competition, there were two main tasks.
The first task required participants to analyze the X-rays and assign each one a confidence score for the presence of malignancy. A score of plus infinity corresponds to full confidence that the patient has a malignant tumor, and a score of minus infinity corresponds to full confidence that the patient has a benign tumor.
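A confidence score like this is essentially a ranking: higher scores mean more suspicion, and whoever uses the scores picks a cut-off. The sketch below assumes each scored item carries a known label (1 = malignant, 0 = not), as it would in training data, and sweeps every distinct score as a cut-off, reporting sensitivity and the number of false positives at each one. It is a generic ROC-style sweep for illustration only, not the competition's official evaluation metric, and `threshold_sweep` is a hypothetical helper name.

```python
import math
from typing import List, Tuple


def threshold_sweep(scores: List[float], labels: List[int]) -> List[Tuple[float, float, int]]:
    """Sweep every distinct score as a cut-off.

    Everything scored at or above the cut-off is flagged as suspicious.
    Returns (cut-off, sensitivity, false-positive count) per cut-off,
    from the strictest cut-off to the most lenient.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if any(math.isnan(s) for s in scores):
        raise ValueError("scores must not contain NaN")
    n_pos = sum(labels)  # labels: 1 = malignant, 0 = not malignant
    if n_pos == 0:
        raise ValueError("need at least one malignant example")

    # Rank items from most to least suspicious; +inf sorts first, -inf last.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    results = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Consume every item tied at this score before recording a point.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        results.append((cutoff, tp / n_pos, fp))
    return results


if __name__ == "__main__":
    scores = [float("inf"), 2.1, 0.3, -1.5, float("-inf")]
    labels = [1, 1, 0, 1, 0]
    for cutoff, sensitivity, false_pos in threshold_sweep(scores, labels):
        print(f"cut-off {cutoff:>6}: sensitivity {sensitivity:.2f}, false positives {false_pos}")
```

The extreme scores behave as the scoring scheme suggests: an item scored plus infinity is flagged at every cut-off, while one scored minus infinity is flagged only when everything is flagged.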
The second task addressed prescreening X-rays to reduce the workload on radiologists. Participants had to develop an algorithm that separates out the X-rays deemed completely normal, which would then need no further review or inspection by a radiologist.
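One way to picture the prescreening task: give each patient a single suspicion score, then clear only patients who score below every known-malignant case. The sketch below assumes candidate-level scores are already available (for example, from a model built for the first task) and collapses them by taking the maximum per patient, a choice loosely motivated by the observation that most patients with cancer have no more than one malignant lesion. The helper names and data layout are hypothetical, and this is not the method used by any competition entrant.

```python
from typing import Dict, List, Set


def patient_scores(candidates: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse candidate-level scores into one score per patient.

    Assumption: a patient is as suspicious as their single most suspicious
    finding. A patient with no findings gets -inf (maximally normal).
    """
    return {pid: max(scores, default=float("-inf")) for pid, scores in candidates.items()}


def prescreen(candidates: Dict[str, List[float]], malignant: Set[str]) -> Set[str]:
    """Return the patients that can be cleared as normal on labelled data.

    The cut-off is the lowest score of any known-malignant patient, so every
    patient strictly below it is cleared and no malignant patient is.
    """
    scores = patient_scores(candidates)
    if not malignant:
        raise ValueError("need at least one known-malignant patient to set a cut-off")
    cutoff = min(scores[pid] for pid in malignant)
    return {pid for pid, s in scores.items() if s < cutoff}


if __name__ == "__main__":
    candidates = {
        "p1": [0.2, -1.0],
        "p2": [3.5],
        "p3": [-2.0, -0.5],
        "p4": [1.1, 0.9],
        "p5": [],
    }
    cleared = prescreen(candidates, malignant={"p2", "p4"})
    print(sorted(cleared))  # ['p1', 'p3', 'p5']
```

A cut-off fitted this way on training data offers no guarantee for new patients, so a real prescreening system would likely need a more conservative margin before clearing anyone.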
The data provided to participants consisted of four X-ray images per patient, the same set of images normally available to a radiologist. The data was divided into training data and test data, so participants could work out the appropriate analysis and algorithms before tackling the actual test data.
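To make the data layout concrete, the sketch below models one patient case as four images plus an optional label, and separates labelled (training) cases from unlabelled (test) ones. It assumes the four images are the two standard screening views of each breast (craniocaudal "CC" and mediolateral oblique "MLO") and that test cases arrive without labels; the view names, file paths, and helper names are hypothetical and do not reflect the competition's actual file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumption: the four images per patient are the CC and MLO views of each breast.
VIEWS = ("L-CC", "R-CC", "L-MLO", "R-MLO")


@dataclass
class PatientCase:
    patient_id: str
    images: Dict[str, str]       # view name -> image file path (hypothetical layout)
    label: Optional[int] = None  # 1 = malignant, 0 = normal, None = unknown (test set)

    def __post_init__(self) -> None:
        # Every case must carry exactly the four expected views.
        if set(self.images) != set(VIEWS):
            raise ValueError(f"{self.patient_id}: expected views {VIEWS}, got {sorted(self.images)}")


def split_by_label(cases: List[PatientCase]) -> Tuple[List[PatientCase], List[PatientCase]]:
    """Separate labelled (training) cases from unlabelled (test) cases, preserving order."""
    train = [c for c in cases if c.label is not None]
    test = [c for c in cases if c.label is None]
    return train, test


if __name__ == "__main__":
    def images_for(pid: str) -> Dict[str, str]:
        return {view: f"{pid}_{view}.png" for view in VIEWS}

    cases = [
        PatientCase("p1", images_for("p1"), label=1),
        PatientCase("p2", images_for("p2"), label=0),
        PatientCase("p3", images_for("p3")),
    ]
    train, test = split_by_label(cases)
    print([c.patient_id for c in train], [c.patient_id for c in test])
```

Validating the four views up front means a case with a missing image fails loudly when it is loaded rather than silently skewing later analysis.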
The results of the challenge have been published online and are available from a wide variety of sources. They list the winners and runners-up for both tasks, along with the full list of participants.
Each participant had to submit results to the organizers in a prescribed file format. A common format kept the submissions clear and consistent and made them easier for the organizers to evaluate.
The KDD Cup 2008 was an important step toward efficient and effective methods for the early detection of breast cancer. The techniques developed can benefit health care professionals and help improve the early detection, and ultimately the effective treatment, of this devastating disease.