[Paper Reading] Learning to hash for indexing big data — A survey
Problem Definition
An overview of learning to hash for indexing big data
Introduction
- Big data! There is an increasing volume of data including text, images, audio, etc.
- How to deal with such gigantic databases?
- Search cost > storage cost
- Rising Content-based Image Retrieval (CBIR)
- The time complexity to compare a query point q with all the data X takes O(|X|). Too slow!
- The data dimension is also increasing (the curse of dimensionality)!
- Approximate Nearest Neighbors (ANNs) is often enough
- Fast indexing O(|X|) -> O(log|X|) or even O(1), such as tree-based indexing methods
- Tree-based indexing requires high storage cost and degrades when handling high-dimensional data
- Hashing methods! Partition database into several parts and use hash ‘bits’ to represent them
- Hashing methods have low storage cost and time complexity
- “Learning to hash”: to learn data-dependent and task-specific hash functions that yield compact binary codes to achieve good search accuracy
Notations
Pipeline of Hashing-based ANN Search
- Design hash functions
2. Generate hash codes and indexing the database items
3. Online query using hash codes
Randomized Hashing Methods
Random Projection Based Hashing
Random Permutation based Hashing
Categories of Learning Based Hashing Methods
Data-Dependent vs. Data-Independent
Unsupervised, Supervised, and Semi-Supervised
Pointwise, Pairwise, Triplet-wise and Listwise
Linear vs. Nonlinear
Single-Shot Learning vs. Multiple-Shot Learning
Non-Weighted vs. Weighted Hashing
Methodology Review and Analysis
Spectral Hashing
Anchor Graph Hashing
Angular Quantization Based Hashing
Binary Reconstructive Embedding
Metric Learning based Hashing
Semi-Supervised Hashing
Column Generation Hashing
Ranking Supervised Hashing
Circulant Binary Embedding
Deep Learning for Hashing
Advanced Methods and Related Applications
Hyperplane Hashing
Subspace Hashing
MultiModality Hashing
Applications with Hashing
Open Issues and Future Directions
Reference
Wang, Jun, et al. “Learning to hash for indexing big data — A survey.” Proceedings of the IEEE 104.1 (2016): 34–57.