Math 285 Course Page

MATH 285: Selected Topics in High Dimensional Data Modeling

Fall 2015, San Jose State University

Course description

This is an advanced topics course in machine learning with big data [syllabus]. Topics to be covered include:
  1. Singular value decomposition (SVD)
  2. Dimensionality Reduction
  3. Spectral Clustering
  4. Subspace Clustering
  5. Compressive Sensing
  6. Dictionary Learning
and their applications to image processing. There is no required textbook; we will cover material from various sources (papers, websites, etc.).

Useful textbooks

Some chapters of the following books have overlap with the material taught in this course:


Course project

This course ends with a project, to be reported as an oral presentation in class and/or a written report (see here for instructions).

Projects completed by students (ordered by receipt time)

  1. Out of sample extension of PCA, Kernel PCA, and MDS [slides] [report]
  2. Diffusion maps [slides]
  3. Independent Component Analysis (ICA) [slides]
  4. Wine Clustering [slides]
  5. Machine Learning on Lipsticks Decision [slides] [report]
  6. Kernel Spectral Curvature Clustering (KSCC) [slides] [report]
  7. Kmeans ++ and Kmeans Parallel [slides]
  8. An Improved Approach for Image Matching Using Principal Component Analysis (PCA) [slides] [report]
  9. Three Dimensional Motion Tracking using Clustering [slides] [report]
  10. Linear Discriminant Analysis (LDA) [slides]
  11. Introduction to Independent Component Analysis [slides] [report]
  12. Data Clustering with Commute Time Distance [slides]
  13. Movie Rating Prediction [slides] [report]
  14. Support Vector Machine With Data Reduction [report]
  15. Introducing Locally Linear Embedding (LLE) as a Method for Dimensionality Reduction [report]
  16. K-means vs GMM & PLSA [report]
  17. Ordinal MDS and Spectral Clustering on Students Knowledge and Performance Status and Toy Data [report]

Learning resources

MATLAB resources

Suggested papers

Principal Component Analysis (PCA)

Multidimensional Scaling (MDS)

Isometric Feature Map (ISOmap)

Kernel Principal Component Analysis (Kernel PCA)

  • This is a relatively easy-to-read paper on Kernel PCA (you can ignore the sections about active shape models)
  • Here is a nice blog that tries to explain Kernel PCA with the Gaussian kernel (also called RBF kernel)
  • Read this paper for mathematical derivation of Kernel PCA; the longer version of the paper is available at this link

Clustering basics and kmeans clustering

See below for two excellent lectures:

How to initialize kmeans:
  • kmeans++ [slides] [paper]. It is the default initialization in MATLAB's kmeans as of MATLAB 2014b.
  • kmeans// (parallelized kmeans++ for large data sets) [paper]
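
For a quick picture of what kmeans++ seeding does, here is a minimal sketch in plain Python. It is an illustration only, not the MATLAB implementation and not code from the paper: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared distance to the nearest center chosen so far. The function names (squared_distance, kmeans_pp_init) and the rng parameter are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centers from `points` using kmeans++ style seeding.

    `rng` is an optional random.Random instance (hypothetical parameter,
    included only so that runs can be made reproducible).
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]

    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, squared_distance(p, points[idx]))
              for p, old in zip(points, d2)]

    return centers
```

For example, kmeans_pp_init([(0, 0), (0, 1), (10, 10), (10, 11)], 2) returns two of the four points; because far-away points are weighted more heavily, the two centers usually come from different groups. The centers would then be passed to the usual kmeans iterations.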
How to determine the number of clusters:

Spectral clustering

  • A (long) tutorial on spectral clustering [paper]
  • Normalized cuts and image segmentation [paper] [software]
  • On spectral clustering: analysis and an algorithm [paper]
  • Self-tuning spectral clustering [paper] [webpage]

Subspace clustering

Dictionary learning

Data sets

Useful course websites

Instructor feedback

This is an experimental course in data science, being taught at SJSU for the first time. Your feedback (as early as possible) is encouraged and greatly appreciated, and the instructor will seriously consider it in improving the course experience for both you and your classmates. Please submit your anonymous feedback through this page.