Data Mining, 2012/2013
Data mining is the process of analyzing data and summarizing it into useful information.
It utilizes methods from machine learning, statistics, and database systems.
Data mining programs are intended to search through large amounts of data and
automatically find hidden relationships and patterns in that data.
This course covers the steps involved in the Data Mining process, from data preparation
to concrete algorithms for finding structural patterns in data.
Data preparation for input
- What is Data Mining?
- Application examples
- Data Mining and Machine Learning
- Learning as search
- 1R and Naive Bayes
- Decision Trees
- Rule based algorithms
Association rule learning
- Training, test, and validation data
- Cross-validation and bootstrap
- Comparing algorithms
- Handling skewed classes
- Frequent itemsets
- The Apriori algorithm
Recommendation systems: Basic ideas
Web mining and link analysis
- Distance measures
- Hierarchical clustering
The recommended textbook for this course is:
Data Mining: Practical Machine Learning Tools and Techniques (Third Edition)
by Ian H. Witten, Eibe Frank & Mark A. Hall.
Morgan Kaufmann, 2011.
And this is the book webpage.
Introduction to Data Mining
by Pang-Ning Tan, Michael Steinbach & Vipin Kumar.
Pearson Education, 2006.
Machine Learning
by Tom Mitchell.
McGraw Hill, 1997.
Mining of Massive Datasets
by Anand Rajaraman, Jure Leskovec & Jeffrey D. Ullman.
I'll be using the class notes developed by Gregory Piatetsky-Shapiro and Gary Parker for their
data mining course, available at KDnuggets. You can go there and download the PowerPoint slides.
Programming exercises to review elementary statistics
Extra slides on
The grading of this course is based on a final exam (40%) and a project (60%).
The project can be done alone or in a group of two students, and consists of applying
data mining to an application domain of your own choice.
You'll have to:
- write a technical report (10-page maximum) describing your application. (70%)
- make a 20-minute oral presentation of your work. (20%)
- read and carefully review a report from your classmates. (10%)
This process mimics what happens when researchers submit their work to scientific
conferences worldwide. This should be helpful for you to practice your writing and
Tips for your project
In your report, make sure that you:
- state what you are trying to learn. What are you using data mining for?
- describe your data properly. Where did you get it from? What's the meaning of the attributes? What are their types?
- describe the data preparation steps (if any) that you did.
- describe the algorithm(s) that you apply.
- present and discuss the results that you obtained.
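One way to start on the "describe your data" item is to produce a small per-attribute summary before writing anything. The sketch below is only an illustration, not a required method: the toy dataset, the function names `infer_type` and `describe`, and the use of "?" as a missing-value marker are all assumptions made for this example. It guesses whether each attribute is numeric or nominal and counts missing and distinct values.

```python
import csv
import io

# Assumption: empty strings and "?" both mark a missing value.
MISSING = ("", "?")

def infer_type(values):
    """Guess 'numeric' if every non-missing value parses as a float, else 'nominal'."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "unknown"
    try:
        for v in present:
            float(v)
        return "numeric"
    except ValueError:
        return "nominal"

def describe(rows):
    """Return a dict mapping each attribute name to a small summary dict."""
    summary = {}
    for name in rows[0].keys():
        column = [row[name] for row in rows]
        present = [v for v in column if v not in MISSING]
        summary[name] = {
            "type": infer_type(column),
            "missing": len(column) - len(present),
            "distinct": len(set(present)),
        }
    return summary

# Hypothetical toy data, made up for this example.
SAMPLE = """outlook,temperature,play
sunny,85,no
overcast,83,yes
rainy,?,yes
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for name, info in describe(rows).items():
    print(name, info)
```

A table like this does not replace explaining what each attribute means, but it gives you the types and missing-value counts to report, and it often reveals data preparation steps that you will need to describe as well.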
As a final note, you may want to check David Goldberg's Technical Writing for Fun & Profit for tips on writing technical reports.