[Paper Reading] Learning to hash for indexing big data — A survey

Ya-Liang Allen Chang
3 min readMar 28, 2019

--

Problem Definition

An overview of learning to hash for indexing big data

Introduction

  • Big data! There is an increasing volume of data including text, images, audio, etc.
  • How to deal with such gigantic databases?
  • Search cost > storage cost
  • Rising Content-based Image Retrieval (CBIR)
  • The time complexity to compare a query point q with all the data X takes O(|X|). Too slow!
  • The data dimension is also increasing (the curse of dimensionality)!
  • Approximate Nearest Neighbors (ANNs) is often enough
  • Fast indexing O(|X|) -> O(log|X|) or even O(1), such as tree-based indexing methods
  • Tree-based indexing requires high storage cost and degrades when handling high-dimensional data
  • Hashing methods! Partition database into several parts and use hash ‘bits’ to represent them
  • Hashing methods have low storage cost and time complexity
  • “Learning to hash”: to learn data-dependent and task-specific hash functions that yield compact binary codes to achieve good search accuracy

Notations

Pipeline of Hashing-based ANN Search

  1. Design hash functions

2. Generate hash codes and indexing the database items

3. Online query using hash codes

Randomized Hashing Methods

Random Projection Based Hashing

Random Permutation based Hashing

Categories of Learning Based Hashing Methods

Data-Dependent vs. Data-Independent

Unsupervised, Supervised, and Semi-Supervised

Pointwise, Pairwise, Triplet-wise and Listwise

Linear vs. Nonlinear

Single-Shot Learning vs. Multiple-Shot Learning

Non-Weighted vs. Weighted Hashing

Methodology Review and Analysis

Spectral Hashing

Anchor Graph Hashing

Angular Quantization Based Hashing

Binary Reconstructive Embedding

Metric Learning based Hashing

Semi-Supervised Hashing

Column Generation Hashing

Ranking Supervised Hashing

Circulant Binary Embedding

Deep Learning for Hashing

Advanced Methods and Related Applications

Hyperplane Hashing

Subspace Hashing

MultiModality Hashing

Applications with Hashing

Open Issues and Future Directions

Reference

Wang, Jun, et al. “Learning to hash for indexing big data — A survey.” Proceedings of the IEEE 104.1 (2016): 34–57.

--

--

No responses yet