Math 285 Course Page

MATH 285: Selected Topics in High Dimensional Data Modeling

Fall 2015, San Jose State University

Course description

This is an advanced topics course in machine learning with big data [syllabus]. Topics to be covered include:
  1. Singular value decomposition (SVD)
  2. Dimensionality Reduction
  3. Spectral Clustering
  4. Subspace Clustering
  5. Compressive Sensing
  6. Dictionary Learning
and their applications to image processing. There is no required textbook; we will cover material from various sources (papers, websites, etc.).

Useful textbooks

Some chapters of the following books overlap with the material taught in this course:


Course project

This course ends with a project that should be reported in the form of an oral presentation in class and/or a written report (see here for instructions).

Projects completed by students (ordered by receipt time)

  1. Out of sample extension of PCA, Kernel PCA, and MDS [slides] [report]
  2. Diffusion maps [slides]
  3. Independent Component Analysis (ICA) [slides]
  4. Wine Clustering [slides]
  5. Machine Learning on Lipsticks Decision [slides] [report]
  6. Kernel Spectral Curvature Clustering (KSCC) [slides] [report]
  7. Kmeans ++ and Kmeans Parallel [slides]
  8. An Improved Approach for Image Matching Using Principal Component Analysis (PCA) [slides] [report]
  9. Three Dimensional Motion Tracking using Clustering [slides] [report]
  10. Linear Discriminant Analysis (LDA) [slides]
  11. Introduction to Independent Component Analysis [slides] [report]
  12. Data Clustering with Commute Time Distance [slides]
  13. Movie Rating Prediction [slides] [report]
  14. Support Vector Machine With Data Reduction [report]
  15. Introducing Locally Linear Embedding (LLE) as a Method for Dimensionality Reduction [report]
  16. K-means vs GMM & PLSA [report]
  17. Ordinal MDS and Spectral Clustering on Students Knowledge and Performance Status and Toy Data [report]

Learning resources

MATLAB resources

Suggested papers

Principal Component Analysis (PCA)
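This is a minimal sketch of PCA computed from the thin SVD of the mean-centered data matrix. It is written in Python/NumPy, which is an assumption: the course itself uses MATLAB. The function name `pca` and its return values are illustrative. They are not taken from the papers under this heading.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n samples x d features) onto the top-k principal components."""
    # PCA is defined on mean-centered data.
    mu = X.mean(axis=0)
    Xc = X - mu
    # Thin SVD: Xc = U @ diag(S) @ Vt; rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]              # k x d, orthonormal rows
    scores = Xc @ components.T       # n x k, equal to U[:, :k] * S[:k]
    # Variance along each component (sample covariance normalized by n - 1).
    explained_var = S[:k] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_var, mu
```

Taking the SVD of the centered data avoids forming the covariance matrix explicitly. This is usually the numerically safer way to compute PCA.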

Multidimensional Scaling (MDS)
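This is a minimal sketch of classical (Torgerson) MDS. It assumes the input is an n x n matrix of pairwise Euclidean distances. The steps are to double-center the squared distances, B = -1/2 J D^2 J with J = I - (1/n) 11^T, and then embed the points using the top eigenpairs of B. The helper name `classical_mds` is hypothetical, and NumPy is assumed.

```python
import numpy as np

def classical_mds(D, k):
    """Embed n points in R^k from an n x n matrix D of pairwise Euclidean distances."""
    n = D.shape[0]
    # Double centering turns squared distances into a Gram (inner-product) matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric; eigh returns eigenvalues in ascending order.
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by round-off (or non-Euclidean input).
    lam = np.clip(evals[idx], 0, None)
    return evecs[:, idx] * np.sqrt(lam)
```

When D comes from points in R^k, the pairwise distances of the returned embedding reproduce D up to rounding. The embedding itself is determined only up to a rotation or reflection.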

Isometric Feature Map (ISOmap)
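Isomap can be sketched as classical MDS applied to shortest-path distances on a nearest-neighbor graph. The version below makes several simplifying assumptions:

- It uses a symmetrized k-nearest-neighbor graph.
- It runs a dense Floyd-Warshall loop, which costs O(n^3) and suits only small examples.
- It assumes the data points are distinct.

The function name `isomap` and the parameter `n_neighbors` are hypothetical, and NumPy is assumed.

```python
import numpy as np

def isomap(X, k, n_neighbors=5):
    """Embed the rows of X in R^k using graph geodesic distances (Isomap-style sketch)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    E = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0))
    # k-nearest-neighbor graph: column 0 of argsort is the point itself, so skip it.
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(E, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nbrs[i]] = E[i, nbrs[i]]
    G = np.minimum(G, G.T)           # keep an edge if either endpoint selected it
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: shortest paths approximate geodesic distances on the manifold.
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:k]
    return evecs[:, idx] * np.sqrt(np.clip(evals[idx], 0, None))
```

For points sampled along a curve (for example, a half circle), a one-dimensional embedding from this sketch orders the points by their position along the curve. Plain MDS on straight-line distances does not follow the curve in the same way.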

Kernel Principal Component Analysis (Kernel PCA)

  • This is a relatively easy-to-read paper on Kernel PCA (you can ignore the sections about active shape models)
  • A helpful blog post that explains Kernel PCA with the Gaussian kernel (also called the RBF kernel)
  • Read this paper for the mathematical derivation of Kernel PCA; a longer version of the paper is available at this link
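This is a minimal sketch of Kernel PCA with the Gaussian (RBF) kernel. It assumes the parameterization k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). The kernel matrix is centered in feature space, and each training point is projected onto the leading kernel principal components. The function name `rbf_kernel_pca` and the parameter `sigma` are illustrative, and NumPy is assumed.

```python
import numpy as np

def rbf_kernel_pca(X, k, sigma):
    """Project the training rows of X onto the top-k kernel principal components."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    sqdist = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    K = np.exp(-sqdist / (2 * sigma ** 2))
    # Center the data in feature space: Kc = J K J with J = I - (1/n) 11^T.
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    evals, evecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    idx = np.argsort(evals)[::-1][:k]
    lam, alpha = evals[idx], evecs[:, idx]     # columns of alpha have unit norm
    # Rescaling alpha_j by 1/sqrt(lam_j) gives unit-norm eigenvectors in feature space.
    # The projection of training point i onto component j is then sqrt(lam_j) * alpha[i, j].
    return alpha * np.sqrt(np.clip(lam, 0, None))
```

This covers the training points only. Projecting new points (the out-of-sample case) additionally requires their kernel values against the training set, centered consistently with Kc.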

Clustering basics and kmeans clustering

See below for two excellent lectures.
How to initialize kmeans:
  • kmeans++ [slides] [paper]. It has been the default initialization of MATLAB's kmeans function since release 2014b.
  • kmeans|| (a parallelized version of kmeans++ for large data sets) [paper]
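The k-means++ seeding rule (Arthur and Vassilvitskii, 2007) has two steps. The first center is drawn uniformly from the data. Each later center is drawn with probability proportional to its squared distance from the nearest center already chosen. A minimal NumPy sketch follows. The function name `kmeans_pp_init` is hypothetical, and the code assumes X contains at least k distinct points; otherwise all sampling weights become zero.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """Choose k initial centers from the rows of X using k-means++ seeding."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # First center: uniformly at random among the data points.
    first = rng.integers(n)
    centers = [X[first]]
    # D2[i] = squared distance from X[i] to its nearest chosen center.
    D2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(1, k):
        # Sample the next center with probability proportional to D2.
        # Already-chosen points have D2 = 0, so they cannot be picked again.
        i = rng.choice(n, p=D2 / D2.sum())
        centers.append(X[i])
        D2 = np.minimum(D2, np.sum((X - X[i]) ** 2, axis=1))
    return np.array(centers)
```

The returned centers would then seed ordinary Lloyd (kmeans) iterations. kmeans|| changes the sampling step: instead of k sequential draws, it runs a few rounds that each sample many candidate points in parallel. That variant is not shown here.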
How to determine the number of clusters:

Spectral clustering

  • A (long) tutorial on spectral clustering [paper]
  • Normalized cuts and image segmentation [paper] [software]
  • On spectral clustering: analysis and an algorithm [paper]
  • Self-tuning spectral clustering [paper] [webpage]
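The algorithm in "On spectral clustering: analysis and an algorithm" (Ng, Jordan and Weiss) is usually stated in five steps:

1. Build a Gaussian affinity matrix A with a zero diagonal.
2. Form D^{-1/2} A D^{-1/2}.
3. Take its top-k eigenvectors.
4. Normalize each row to unit length.
5. Run kmeans on the rows.

A minimal NumPy sketch follows. Several choices are assumptions for illustration: the function names, the single global scale `sigma`, and the simple Lloyd loop with deterministic farthest-point initialization in `_lloyd`. Self-tuning spectral clustering replaces the single `sigma` with local, per-point scales.

```python
import numpy as np

def _lloyd(Y, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    C = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Keep the old center if a cluster becomes empty.
        newC = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    return labels

def njw_spectral_clustering(X, k, sigma):
    """Cluster the rows of X into k groups (Ng-Jordan-Weiss style sketch)."""
    sq = np.sum(X ** 2, axis=1)
    sqdist = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    A = np.exp(-sqdist / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalization D^{-1/2} A D^{-1/2} (assumes every degree is positive).
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(M)
    V = evecs[:, -k:]                          # eigenvectors of the k largest eigenvalues
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)
    return _lloyd(Y, k)
```

For well-separated groups, the normalized rows of Y collapse to nearly one point per group, so the final kmeans step is easy. The choice of `sigma` is what decides whether the groups look separated at all.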

Subspace clustering

Dictionary learning

Data sets

Useful course websites

