Perceptual hashing

Difference hash (dhash)
Median hash (mhash)
Perceptual hash (phash)
Wavelet hash (whash)
fastimagehash vs imagehash performance comparison (phash)

Hash query in PostgreSQL

To check how similar two image hashes are, we compute their Hamming distance: the number of bit positions at which the two hashes differ. Equivalently, it is the Hamming weight (the number of set bits) of the two hashes XORed together.
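A minimal sketch of that definition in C, assuming each hash is stored as one 64-bit unsigned integer. The function name `hamming_distance64` is hypothetical and not taken from this project. XOR leaves a 1 exactly where the hashes differ, and a portable loop then counts those bits:

```c
#include <stdint.h>

/* Hypothetical helper: Hamming distance between two 64-bit hashes. */
static unsigned hamming_distance64(uint64_t a, uint64_t b)
{
    uint64_t diff = a ^ b;   /* bit is 1 wherever the hashes differ */
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1;    /* clear the lowest set bit (Kernighan's trick) */
        count++;
    }
    return count;            /* Hamming weight of a XOR b */
}
```

The loop runs once per differing bit, so it works with any C99 compiler and needs no special instructions, at the cost of speed.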

For a 64-bit hash, a Hamming distance of 0 indicates a very strong match, while values higher than 10 usually suggest that the images are significantly different.
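That rule of thumb can be written as a small classification helper. This is only a sketch: the function names and the cutoff constant are hypothetical, and the value 10 is taken from the guideline above rather than from any setting the project is known to use. The appropriate threshold depends on the hash algorithm and the images.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default cutoff from the rule of thumb for 64-bit hashes;
 * tune it for the hash algorithm and image collection in use. */
#define ASSUMED_MAX_DISTANCE 10u

/* Number of bits that differ between two 64-bit hashes (portable loop). */
static unsigned differing_bits(uint64_t a, uint64_t b)
{
    unsigned count = 0;
    for (uint64_t diff = a ^ b; diff != 0; diff &= diff - 1)
        count++;   /* each iteration clears the lowest set bit */
    return count;
}

/* Hypothetical helper: distance 0 is a very strong match; anything above
 * max_distance is treated as "probably a different image". */
static bool probably_same_image(uint64_t a, uint64_t b, unsigned max_distance)
{
    return differing_bits(a, b) <= max_distance;
}
```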

On a processor with the POPCNT instruction (SSE4), the Hamming distance can be calculated in four instructions per 64-bit chunk. The domain-specific PostgreSQL module used in this project performs no bounds checking and contains no loops, so it is essentially as fast as a typical sequential scan.
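To illustrate what a distance-filtered sequential scan amounts to, here is an in-memory sketch. It is not the project's PostgreSQL module: the array stands in for a table column of 64-bit hashes, the function and parameter names are hypothetical, and a GCC- or Clang-compatible compiler is assumed for `__builtin_popcountll`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scan: writes the indices of rows whose hash is within
 * max_distance bits of the query into out_idx and returns how many matched.
 * out_idx must have room for up to n_rows entries. The per-row distance
 * is one XOR plus one population count; the only loop is over the rows. */
static size_t scan_within_distance(const uint64_t *rows, size_t n_rows,
                                   uint64_t query, unsigned max_distance,
                                   size_t *out_idx)
{
    size_t found = 0;
    for (size_t i = 0; i < n_rows; i++) {
        unsigned d = (unsigned)__builtin_popcountll(rows[i] ^ query);
        if (d <= max_distance)
            out_idx[found++] = i;
    }
    return found;
}
```

For example, with rows `{0xB6, 0x95, 0xFFF, 0}`, query `0xB6` and `max_distance` 3, the function reports rows 0 and 1 (distances 0 and 3).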

hash1 = 10110110
hash2 = 10010101

mov     rax, [hash1]  ; rax = hash1 = 10110110
xor     rax, [hash2]  ; rax = hash1 XOR hash2 = 00100011
popcnt  rax, rax      ; rax = popcount(00100011) = 3
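The same steps in C, assuming a GCC- or Clang-compatible compiler for the `__builtin_popcountll` builtin. The 8-bit example values are written in hex and held in 64-bit integers with the upper 56 bits clear; the function name `popcnt_distance` is hypothetical.

```c
#include <stdint.h>

static const uint64_t example_hash1 = 0xB6;   /* 10110110 */
static const uint64_t example_hash2 = 0x95;   /* 10010101 */

/* C counterpart of the mov / xor / popcnt sequence above. */
static unsigned popcnt_distance(uint64_t hash1, uint64_t hash2)
{
    uint64_t x = hash1 ^ hash2;                 /* xor rax, [hash2] -> 00100011 */
    return (unsigned)__builtin_popcountll(x);   /* popcnt rax, rax  -> 3 */
}
```

With `-mpopcnt` (or a `-march` that includes POPCNT), the builtin is typically compiled to a single POPCNT instruction; without it, the compiler falls back to a software bit count.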

Project overview

Database schema (some hashes omitted)
High level overview