Wednesday, April 16, 2008

Data Mining Data Sets

Every once in a while I receive a request, or see one posted on some bulletin board, for data mining data sets. I have to say, I have little patience for many of these requests because a simple Google (or Clusty) search will solve the problem. Nevertheless, here are four sites I've used in the past to grab data for testing algorithms or software packages:

UC Irvine Machine Learning Repository:

Carnegie Mellon Statlib Archive:

DELVE Datasets:

MIT Broad Institute Cancer Datasets:


hammer_shi said...

Hi, I am from Chinakdd ( ). Would you mind if I reposted this blog post?


SoftWarrior - 0 said...

Hi, I'm looking for a credit card fraud dataset, but I can't find one. Could you please help me?


Mukesh said...

Hi, I am a master's student (Master in Computer and Information Systems, MCIS). I want to write my thesis on student performance prediction and analysis using data mining, so I need a large student dataset to complete it.

So please help me by forwarding a dataset related to student performance to my mail address: