Program Details
Degree: Masters
Major: Artificial Intelligence | Computer Science | Data Science
Area of study: Information and Communication Technologies | Mathematics and Statistics
Course Language: English
Intakes
Program start date: 2014-03-01
Application deadline: -
About Program

Program Overview


Course Description

This course presents current research in Knowledge Discovery in Databases (KDD) dealing with data integration, mining, and interpretation of patterns in large collections of data. Topics include data warehousing and data preprocessing techniques; data mining techniques for classification, regression, clustering, deviation detection, and association analysis; and evaluation of patterns mined from data. Industrial and scientific applications are discussed. Students will be expected to read assigned textbook chapters and research papers, and work on implementation/research projects that cover the different stages of the KDD process. This course can be used to satisfy the graduate AI bin requirement.
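
As a rough illustration of the stages named above, the following minimal Python sketch runs a toy dataset through preprocessing, mining, and evaluation. Everything in it is an assumption made for illustration: the records are made up, the function names (`min_max_normalize`, `predict_1nn`, `accuracy`) are hypothetical, and 1-nearest-neighbor classification is just one simple stand-in for the classification techniques the course covers.

```python
# Toy illustration of three KDD stages: preprocessing, mining, evaluation.
# The data, function names, and the choice of 1-nearest-neighbor are
# illustrative assumptions, not material taken from the course.

def min_max_normalize(rows):
    """Preprocessing: rescale every attribute to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def predict_1nn(train_x, train_y, query):
    """Mining: classify a point by the label of its nearest training point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))
    return train_y[nearest]

def accuracy(train_x, train_y, test_x, test_y):
    """Evaluation: fraction of held-out points classified correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)

# Made-up records: (height_cm, weight_kg), each with a class label.
records = [(150, 50), (155, 55), (160, 52), (180, 85), (185, 90), (190, 88)]
labels = ["small", "small", "small", "large", "large", "large"]

# Simplification: normalizing before splitting lets test data influence
# the scaling; a careful experiment would fit the scaling on training data only.
scaled = min_max_normalize(records)

# Hold out one record from each class for testing.
test_idx = {2, 5}
train_x = [r for i, r in enumerate(scaled) if i not in test_idx]
train_y = [l for i, l in enumerate(labels) if i not in test_idx]
test_x = [scaled[i] for i in sorted(test_idx)]
test_y = [labels[i] for i in sorted(test_idx)]

print(accuracy(train_x, train_y, test_x, test_y))
```

Tools such as Weka or RapidMiner package each of these stages as ready-made components; the sketch only shows how the stages connect.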


Class Meeting

  • Time: Tuesdays and Fridays 3:00-4:20 pm
  • Room: HL230

Instructor

  • Prof. Carolina Ruiz

Textbook

  • Required Textbook: "Introduction to Data Mining" by P.-N. Tan, M. Steinbach, and V. Kumar, Addison-Wesley, 2005
  • Recommended Textbook: "Data Mining: Practical Machine Learning Tools and Techniques (Third Edition)" by Ian H. Witten, Eibe Frank, and Mark A. Hall, Morgan Kaufmann, January 2011

Prerequisites

  • Background in artificial intelligence, databases, and statistics at the undergraduate level, or permission of the instructor.
  • Proficiency in a high-level programming language (preferably Java) is required.

Grades

Component Percentage
Projects 80%
Showcase 10%
Class Participation 10%
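
A small sketch of how these weights could combine into a final score, assuming each component is scored on a 0-100 scale and the final grade is a plain weighted sum. The syllabus does not state the exact formula, so both the scale and the hypothetical `final_grade` helper are assumptions, and the example scores are made up.

```python
# Weights come from the grading table; the component scores are made up.
WEIGHTS = {"Projects": 0.80, "Showcase": 0.10, "Class Participation": 0.10}

def final_grade(scores):
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

example = {"Projects": 90, "Showcase": 80, "Class Participation": 100}
print(round(final_grade(example), 2))
```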

Class Participation

All students are expected to read the material assigned for each class in advance and to participate in class discussions. Also, students will take turns presenting papers and leading class discussions of assigned readings. Class participation will be taken into account when deciding students' final grades.


Projects, Assignments, and Showcases

Projects

This course is project-intensive. Several projects related to the data mining stages and/or techniques covered in class will be assigned. Students will work on these projects individually, not in teams, and will provide both a written report and an oral (in-class) presentation describing their work on each project. Datasets for these projects will be selected from online database repositories or other sources.


Tools Used

  • MATLAB: A high-level language and interactive environment for numerical computation, visualization, and programming.
  • Weka: A machine-learning/data-mining environment that provides a large collection of Java-based mining algorithms, data preprocessing filters, and experimentation capabilities.
  • RapidMiner: An analytics platform that includes a multitude of methods for data integration, data transformation, data modeling, and data visualization.
  • Programming Languages: Students can use Python, R, or any other programming language to implement their own programs and scripts to complement the functionality of the systems above.

Showcase

Each student should find a successful real-world application of data mining and present it in class. The story should show how data mining was used to discover novel and useful patterns that made a difference in a particular industry or field. The application domain is up to the student, but the chosen story should be discussed with the professor in advance. The student will then give a 10-minute in-class presentation describing the application in as much detail as possible, focusing on its data mining aspects.


Additional Suggested References

  • Lecture Notes
  • List of additional Machine Learning, AI, Data Mining, Statistics, Databases, Data Sets, and other online resources.