Data Mining: Learning from Large Data Sets - ETH Zurich (Switzerland)

Details

Program Details

Degree

Masters

Course Language

English

Intakes

Program start date	Application deadline
2013-02-19	-

About Program

Program Overview

Overview

Data Mining: Learning from Large Data Sets is a graduate-level course on applying, analyzing, and evaluating state-of-the-art techniques from statistics, algorithms, and discrete and convex optimization for learning from large data sets.

Topics

Dealing with large data (data centers; MapReduce/Hadoop; Amazon Mechanical Turk)
Fast nearest-neighbor methods (shingling, locality-sensitive hashing)
Online learning (online optimization and regret minimization, online convex programming, applications to large-scale support vector machines)
Multi-armed bandits (exploration-exploitation tradeoffs, applications to online advertising and relevance feedback)
Active learning (uncertainty sampling, pool-based methods, label complexity)
Dimension reduction (random projections, nonlinear methods)
Data streams (sketches, coresets, applications to online clustering)
Recommender systems
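To illustrate the fast nearest-neighbor topic, here is a minimal sketch of shingling plus min-hashing: each document becomes a set of character k-shingles, and the fraction of agreeing min-hash values estimates the Jaccard similarity of two sets. The function names, the shingle length k = 3, the hash family h(x) = (a*x + b) mod p, and the use of crc32 to map shingles to integers are all illustrative assumptions, not the course's reference implementation.

```python
import random
import zlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of character k-shingles (substrings of length k) of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash_signature(s: set[str], num_hashes: int = 200, seed: int = 0) -> list[int]:
    """For each hash h_i(x) = (a_i * x + b_i) mod p, keep the minimum over the set."""
    if not s:
        raise ValueError("min-hash signature of an empty set is undefined")
    p = 2_147_483_647  # large prime (2^31 - 1)
    rng = random.Random(seed)  # same seed -> same hash family for every set
    params = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(num_hashes)]
    # crc32 is deterministic across runs, unlike Python's salted built-in hash().
    ids = [zlib.crc32(x.encode("utf-8")) for x in s]
    return [min((a * x + b) % p for x in ids) for a, b in params]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree: an unbiased-style Jaccard estimate."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox jumped over a lazy dog")
print(jaccard(a, b), estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

More hash functions shrink the estimate's variance; locality-sensitive hashing then bands these signatures so that only likely-similar pairs are compared exactly.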

Details

VVZ Information:
Recitations:
- Tue 13-14 in CAB G 61. Last names starting with A-L
- Fri 14-15 in NO C 6. Last names starting with M-Z
Textbook: A. Rajaraman, J. Ullman. Mining of Massive Datasets.

Homeworks

Self Assessment Questions
Homework 1
Homework 2
Homework 3
Homework 4
Homework 5
Homework 6

Solutions

Homework 1
Homework 2
Homework 3
Homework 4
Homework 5
Homework 6

Lecture Notes

February 19: Introduction
February 26: Approximate Retrieval; Min-hashing
March 5: Locality sensitive hashing
March 12: SVMs; online convex programming
March 19: (Parallel) stochastic gradient descent
March 26: Feature selection via l1-regularization; multi-class/structured prediction
April 9: Active Learning
April 16: Large scale unsupervised learning (Online k-means, coresets)
April 23: Large scale unsupervised learning (Online EM, coresets, anomaly detection)
April 30: Exploration-exploitation tradeoffs (k-armed bandits, upper confidence sampling)
May 7: Contextual bandits
May 14: Submodular functions (properties, algorithms and applications)
May 28: Recommending sets (structured prediction, online submodular optimization)
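The April 30 lecture on k-armed bandits can be illustrated with UCB1 (Auer, Cesa-Bianchi, and Fischer, listed in the readings): play each arm once, then always play the arm maximizing its empirical mean plus sqrt(2 ln n / n_i), where n is the number of plays so far and n_i the plays of arm i. This is a minimal sketch assuming simulated Bernoulli rewards; the function name and arm probabilities are hypothetical.

```python
import math
import random


def ucb1(arm_probs: list[float], horizon: int, seed: int = 0) -> tuple[list[int], float]:
    """Run UCB1 on simulated Bernoulli arms; return per-arm pull counts and total reward."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k   # n_i: number of times arm i has been played
    sums = [0.0] * k   # cumulative reward collected from arm i
    total = 0.0
    for t in range(horizon):  # t = number of plays made so far
        if t < k:
            arm = t  # initialization: play every arm once
        else:
            # UCB1 index: empirical mean + exploration bonus sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

The exploration bonus shrinks as an arm is played more often, so suboptimal arms are sampled only logarithmically often and the best arm (here index 2) should receive most of the pulls.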

Recitations

February 26: Hadoop tutorial
March 5: LSH & NN
March 12: SVM
March 19: Project 1 - Approximate Retrieval
April 9: Online SVMs (HW3/Loss Functions/L1 Regularization)
April 16: Project 2 - Large-scale Classification
April 23: Active Learning
April 30: Unsupervised Learning
May 7: Exploration-Exploitation
May 14: Project 3 - Recommender Systems
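The lectures and recitations on SVMs, online convex programming, and stochastic gradient descent revolve around training a linear SVM one example at a time. A minimal sketch follows, using a Pegasos-style update (Shalev-Shwartz et al.), which this page does not name: step size 1/(lambda t), an L2 shrinkage step, a hinge-loss subgradient step only when the margin is violated, and an optional projection onto the ball of radius 1/sqrt(lambda). The function names, the absence of a bias term, and the default hyperparameters are assumptions.

```python
import math
import random


def pegasos_svm(data: list[tuple[list[float], int]], lam: float = 0.01,
                epochs: int = 5, seed: int = 0) -> list[float]:
    """Train a bias-free linear SVM by SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # visit examples in random order each epoch
        for idx in order:
            x, y = data[idx]  # label y must be +1 or -1
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size 1/(lambda t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))  # uses w before the update
            w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on (lam/2)||w||^2
            if margin < 1.0:  # hinge loss active: add its subgradient step
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            radius = 1.0 / math.sqrt(lam)
            if norm > radius:  # project back onto the ball of radius 1/sqrt(lambda)
                w = [wi * radius / norm for wi in w]
    return w


def predict(w: list[float], x: list[float]) -> int:
    """Classify x by the sign of the linear score <w, x>."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Each update touches a single example, which is what makes this style of training attractive for data sets that do not fit in memory or arrive as a stream.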

Old Exams

Data Mining Exam, 2012 Spring

Relevant Readings

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters.
Jure Leskovec, Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network.
Manuel Gomez Rodriguez, Jure Leskovec, Andreas Krause. Inferring Networks of Diffusion and Influence.
James Hays, Alexei A. Efros. Scene Completion Using Millions of Photographs.
Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan and Uri Shaft. When is "Nearest Neighbor" Meaningful?
Aristides Gionis, Piotr Indyk, Rajeev Motwani. Similarity Search in High Dimensions via Hashing.
Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent.
Martin Zinkevich, Markus Weimer, Alex Smola, Lihong Li. Parallelized Stochastic Gradient Descent.
Ji Zhu, Saharon Rosset, Trevor Hastie, Rob Tibshirani. 1-norm Support Vector Machines.
John Duchi, Shai Shalev-Shwartz, Yoram Singer, Tushar Chandra. Efficient Projections onto the l1-Ball for Learning in High Dimensions.
Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin Zinkevich. (Online) Subgradient Methods for Structured Prediction.
Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman. Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning.
Simon Tong, Daphne Koller. Support Vector Machine Active Learning with Applications to Text Classification.
Dan Feldman, Morteza Monemizadeh, Christian Sohler. A PTAS for k-Means Clustering Based on Weak Coresets.
Chris Bishop. Pattern Recognition and Machine Learning.
Percy Liang, Dan Klein. Online EM for Unsupervised Models.
Dan Feldman, Matthew Faulkner, Andreas Krause. Scalable Training of Mixture Models via Coresets.
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem.
Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation.
Khalid El-Arini, Gaurav Veda, Dafna Shahaf and Carlos Guestrin. Turning Down the Noise in the Blogosphere.
Matthew Streeter, Daniel Golovin. An Online Algorithm for Maximizing Submodular Functions.
Matthew Streeter, Daniel Golovin, Andreas Krause. Online Learning of Assignments.
Andreas Krause, Daniel Golovin. Submodular Function Maximization.
Yisong Yue, Carlos Guestrin. Linear Submodular Bandits and their Application to Diversified Retrieval.
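The May 14 lecture on submodular functions and the Krause and Golovin survey center on maximizing a monotone submodular function under a cardinality constraint, for which the simple greedy rule (repeatedly add the element with the largest marginal gain) is guaranteed to reach at least a (1 - 1/e) fraction of the optimum. A minimal sketch on set coverage, a standard monotone submodular objective, is shown below; the function name and the toy sets are hypothetical.

```python
def greedy_max_coverage(sets: dict[str, set[int]], k: int) -> list[str]:
    """Pick up to k set names, each time adding the set that covers the most new elements."""
    chosen: list[str] = []
    covered: set[int] = set()
    for _ in range(k):
        best = None
        best_gain = 0
        for name, elems in sets.items():
            if name in chosen:
                continue
            gain = len(elems - covered)  # marginal gain F(A + s) - F(A)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining set adds any coverage
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen


toy = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
print(greedy_max_coverage(toy, 2))  # ['c', 'a']
```

Diminishing returns is visible here: once "c" is chosen, "b" and "d" each add only one new element, so "a" wins the second round.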

Related Courses

CS345a: Data Mining at Stanford University
15-826: Multimedia Databases and Data Mining at Carnegie Mellon University
CS/CNS/EE 253: Advanced Topics in Machine Learning at Caltech