Data Mining, 2012/2013

Course description

Data mining is the process of analyzing data and summarizing it into useful information. It draws on methods from machine learning, statistics, and database systems. Data mining programs search through large amounts of data and automatically find hidden relationships and patterns in it.

This course covers the steps of the data mining process, from data preparation to concrete algorithms for finding structural patterns in data.

Course topics

  1. Introduction
  2. Data preparation for input
  3. Classification algorithms
  4. Evaluation
  5. Association rule learning
  6. Clustering
  7. Recommendation systems: Basic ideas
  8. Web mining and link analysis
  9. Web advertising

Bibliography

The recommended textbook for this course is:

Additional references:

Class notes

I'll be using the class notes developed by Gregory Piatetsky-Shapiro and Gary Parker for the data mining course available at KDnuggets. You can download the PowerPoint presentations from there.

Useful links

Programming exercises to review elementary statistics

Extra slides on

Grading

Grading is based on a final exam (40%) and a project (60%). The project can be done alone or in a group of two students, and consists of applying data mining to an application domain of your own choice. You'll have to:

This process mimics what happens when researchers submit their work to scientific conferences worldwide. It should help you practice your writing and presentation skills.

Tips for your project

In your report, make sure that you:

As a final note, you may want to check David Goldberg's Technical Writing for Fun & Profit for tips on writing technical reports.