Data Mining: Learning from Large Data Sets
Zurich, Switzerland
Program Details
Degree
Masters
Course Language
English
Intakes
| Program start date | Application deadline |
| --- | --- |
| 2013-02-19 | - |
About Program
Overview
Data Mining: Learning from Large Data Sets is a graduate-level course on applying, analyzing, and evaluating state-of-the-art techniques from statistics, algorithms, and discrete and convex optimization for learning from large data sets.
Topics
- Dealing with large data (data centers; MapReduce/Hadoop; Amazon Mechanical Turk)
- Fast nearest neighbor methods (shingling, locality sensitive hashing)
- Online learning (online optimization and regret minimization, online convex programming, applications to large-scale Support Vector Machines)
- Multi-armed bandits (exploration-exploitation tradeoffs, applications to online advertising and relevance feedback)
- Active learning (uncertainty sampling, pool-based methods, label complexity)
- Dimension reduction (random projections, nonlinear methods)
- Data streams (sketches, coresets, applications to online clustering)
- Recommender systems
Details
- VVZ Information:
- Recitations:
  - Tue 13-14 in CAB G 61, for last names starting with A-L
  - Fri 14-15 in NO C 6, for last names starting with M-Z
- Textbook: A. Rajaraman, J. Ullman. Mining of Massive Datasets.
Homeworks
- Self Assessment Questions
- Homework 1
- Homework 2
- Homework 3
- Homework 4
- Homework 5
- Homework 6
Solutions
- Homework 1
- Homework 2
- Homework 3
- Homework 4
- Homework 5
- Homework 6
Lecture Notes
- February 19: Introduction
- February 26: Approximate Retrieval; Min-hashing
- March 5: Locality sensitive hashing
- March 12: SVMs; online convex programming
- March 19: (Parallel) stochastic gradient descent
- March 26: Feature selection via l1-regularization; multi-class/structured prediction
- April 9: Active Learning
- April 16: Large scale unsupervised learning (Online k-means, coresets)
- April 23: Large scale unsupervised learning (Online EM, coresets, anomaly detection)
- April 30: Exploration-exploitation tradeoffs (k-armed bandits, upper confidence sampling)
- May 7: Contextual bandits
- May 14: Submodular functions (properties, algorithms and applications)
- May 28: Recommending sets (structured prediction, online submodular optimization)
Recitations
- February 26: Hadoop tutorial
- March 5: LSH & NN
- March 12: SVM
- March 19: Project 1 - Approximate Retrieval
- April 9: Online SVMs (HW3/Loss Functions/L1 Regularization)
- April 16: Project 2 - Large-scale Classification
- April 23: Active Learning
- April 30: Unsupervised Learning
- May 7: Exploration-Exploitation
- May 14: Project 3 - Recommender Systems
Old Exams
- Data Mining Exam, 2012 Spring
Relevant Readings
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters.
- Jure Leskovec, Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network.
- Manuel Gomez Rodriguez, Jure Leskovec, Andreas Krause. Inferring Networks of Diffusion and Influence.
- James Hays, Alexei A. Efros. Scene Completion Using Millions of Photographs.
- Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan and Uri Shaft. When is "Nearest Neighbor" Meaningful?
- Aristides Gionis, Piotr Indyk, Rajeev Motwani. Similarity Search in High Dimensions via Hashing.
- Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent.
- Martin Zinkevich, Markus Weimer, Alex Smola, Lihong Li. Parallelized Stochastic Gradient Descent.
- Ji Zhu, Saharon Rosset, Trevor Hastie, Rob Tibshirani. L1 norm support vector machines.
- John Duchi, Shai Shalev-Shwartz, Yoram Singer, Tushar Chandra. Efficient Projections onto the l1-Ball for Learning in High Dimensions.
- Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin Zinkevich. (Online) Subgradient Methods for Structured Prediction.
- Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman. Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning.
- Simon Tong, Daphne Koller. Support Vector Machine Active Learning with Applications to Text Classification.
- Dan Feldman, Morteza Monemizadeh, Christian Sohler. A PTAS for k-Means Clustering Based on Weak Coresets.
- Chris Bishop. Pattern Recognition and Machine Learning.
- Percy Liang, Dan Klein. Online EM for Unsupervised Models.
- Dan Feldman, Matthew Faulkner, Andreas Krause. Scalable Training of Mixture Models via Coresets.
- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time Analysis of the Multiarmed Bandit Problem.
- Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A Contextual-Bandit Approach to Personalized News Article Recommendation.
- Khalid El-Arini, Gaurav Veda, Dafna Shahaf and Carlos Guestrin. Turning Down the Noise in the Blogosphere.
- Matthew Streeter, Daniel Golovin. An Online Algorithm for Maximizing Submodular Functions.
- Matthew Streeter, Daniel Golovin, Andreas Krause. Online Learning of Assignments.
- Andreas Krause, Daniel Golovin. Submodular Function Maximization.
- Yisong Yue, Carlos Guestrin. Linear Submodular Bandits and their Application to Diversified Retrieval.
Related Courses
- CS345a: Data Mining at Stanford University
- 15-826: Multimedia Databases and Data Mining at Carnegie Mellon University
- CS/CNS/EE 253: Advanced Topics in Machine Learning at Caltech
