Big Apple - Small Byte: Specgram Shazam

AudioLab Documentation

Power Spectral Density

Intro to PSD (Human Friendly)

INSPECTING THE SPECGRAM FUNCTION
Returns the following:

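Pxx - "a len(times) x len(freqs) array of power", aka the periodogram. Each periodogram in Pxx is an estimate of the power spectral density of one segment of the input, computed from the squared magnitude of that segment's DFT.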

WOAH, ASSUMED THE WRONG RETURN ORDER - the documentation lists two different orders, and we were looking at the wrong one. Now the data association makes sense: each item in the Pxx array corresponds to a time (where in the input it originated from), and there is a single array which describes the frequency bins shared by every segment's FFT.
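To convince ourselves of that association, here is a minimal pure-Python sketch of what a specgram-style function computes. `toy_specgram` is a made-up name, not matplotlib's implementation: it uses no window and no overlap, and its scaling is simplified. The return order (Pxx, freqs, times) and the one-row-per-time layout follow the description quoted above and are assumptions of the sketch. matplotlib's documentation describes the columns of Pxx as the periodograms of successive segments, which would be the transpose of this, so check `Pxx.shape` against `len(freqs)` and `len(times)` on the real return value.

```python
import cmath
import math

def toy_specgram(x, nfft, fs):
    """Return (Pxx, freqs, times) for non-overlapping segments of x.

    Pxx[i][k] is the power at freqs[k] in the segment centred at times[i].
    """
    n_bins = nfft // 2 + 1  # real input: keep non-negative frequencies only
    freqs = [k * fs / nfft for k in range(n_bins)]
    times, Pxx = [], []
    for start in range(0, len(x) - nfft + 1, nfft):
        seg = x[start:start + nfft]
        times.append((start + nfft / 2) / fs)  # midpoint of the segment, in seconds
        row = []
        for k in range(n_bins):
            # Plain DFT of the segment at bin k, then squared magnitude
            X = sum(v * cmath.exp(-2j * math.pi * k * n / nfft)
                    for n, v in enumerate(seg))
            row.append(abs(X) ** 2 / nfft)
        Pxx.append(row)
    return Pxx, freqs, times

# Made-up test signal: a 4 Hz tone followed by an 8 Hz tone, sampled at 32 Hz
fs, nfft = 32, 16
signal = [math.sin(2 * math.pi * 4 * n / fs) for n in range(nfft)]
signal += [math.sin(2 * math.pi * 8 * n / fs) for n in range(nfft)]

Pxx, freqs, times = toy_specgram(signal, nfft, fs)
# Strongest frequency in each time slice
dominant = [freqs[row.index(max(row))] for row in Pxx]
print(times, dominant)
```

Each entry of Pxx lines up with one entry of `times`, and the single `freqs` array labels the bins inside every entry.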

Eesh!

FINDING A HASHING FUNCTION
Shazam hashes short segments: 30 peaks per second are hashed and saved. Target zone hashes are generated from the input and queried against the database. We discussed hashing the entire song based on peak distance instead (peak 1 to peak 2, peak 2 to peak 3, ...), but "when your target hash value has fewer bits than what's being hashed, then uniqueness cannot ever be guaranteed." Length is an issue too: we would either have to enforce a given length, or squeeze enough data out of a very short song to match the data from the longer song. This could lead to false positives. I don't like it.
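A quick sketch of why the whole-song idea bothered us. The peak times are made up, `peak_gap_digest` is a hypothetical helper, and SHA-1 over a comma-joined list of gaps is only a stand-in hash for illustration, not anything the Shazam paper prescribes.

```python
import hashlib

def peak_gap_digest(peak_times_ms):
    """Hash the full sequence of gaps between consecutive peaks (1 to 2, 2 to 3, ...)."""
    gaps = [b - a for a, b in zip(peak_times_ms, peak_times_ms[1:])]
    payload = ",".join(str(g) for g in gaps).encode("ascii")
    return hashlib.sha1(payload).hexdigest()

song = [120, 310, 450, 900, 1040, 1500, 1730, 2100]  # made-up peak times in ms
shifted = [t + 5000 for t in song]                   # same song, later start
excerpt = song[2:6]                                  # short clip from the middle

print(peak_gap_digest(song) == peak_gap_digest(shifted))  # gaps survive a shift
print(peak_gap_digest(song) == peak_gap_digest(excerpt))  # but not a cut
```

Gaps are unaffected by when the recording starts, but one digest for the whole song cannot be matched by a clip of a different length, which is the length problem described above.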

How do we create a function that is reproducible independent of position within an audio file, as the Shazam paper says is necessary?
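One possible answer, sketched under assumptions: pair each peak (the anchor) with a few peaks that follow it, and hash the two frequencies plus the time difference between them. Nothing in that hash depends on where the anchor sits in the file. This is our reading of the paper's anchor and target zone idea, not its exact method; the names `pair_hashes`, `build_index`, `best_offset` and the `fan_out` parameter are hypothetical, "the next few peaks" is a simplified target zone, and the peak data is made up.

```python
from collections import Counter

def pair_hashes(peaks, fan_out=3):
    """Yield ((f1, f2, dt), t1) for each anchor peak and the next few peaks.

    peaks: list of (time, freq) tuples sorted by time.
    """
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1

def build_index(peaks):
    """Map each hash to the anchor times where it occurs in the song."""
    index = {}
    for h, t in pair_hashes(peaks):
        index.setdefault(h, []).append(t)
    return index

def best_offset(index, sample_peaks):
    """Vote on (song time - sample time) for every matching hash."""
    votes = Counter()
    for h, t_sample in pair_hashes(sample_peaks):
        for t_song in index.get(h, []):
            votes[t_song - t_sample] += 1
    return votes.most_common(1)[0] if votes else None

# Made-up (time, frequency) peaks for one song
song = [(0, 40), (3, 55), (7, 40), (12, 80), (15, 62), (21, 47), (26, 90), (30, 55)]
# A clip from the middle of the song, with its own clock starting at 0
sample = [(t - 7, f) for t, f in song[2:7]]

index = build_index(song)
result = best_offset(index, sample)
print(result)  # (offset, votes): every matching hash agrees on offset 7
```

Because the clip's hashes are identical to the song's, a lookup finds them wherever the clip was taken from, and the agreeing offsets say where in the song it starts.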

===========================================

FAST FORWARD - JUL 6
===========================================

Decision made to implement the Shazam algorithm.

Tuesday, July 3, 2012

Specgram Shazam - Research
