Tuesday, July 3, 2012

Specgram Shazam - Research

AudioLab Documentation

Power Spectral Density
Intro to PSD (Human Friendly)
INSPECTING THE SPECGRAM FUNCTION
Returns the following:
Pxx - "a len(times) x len(freqs) array of power", aka the periodogram. The periodogram is an estimate of the power spectral density, computed from the squared magnitude of the DFT of each segment.
WOAH, ASSUMED THE WRONG RETURN ORDER - the documentation lists two different orders, and we were looking at the wrong one. Now the data association makes sense: each item in the Pxx array corresponds to a time (where in the input it originated from), and there is a single array describing the frequency bins, which is shared by every time slice.
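To make that association concrete, here is a toy pure-Python spectrogram. It is only a sketch and not matplotlib's implementation: there is no windowing, overlap or scaling, the name naive_specgram is made up, and the return order (pxx, freqs, times) is this sketch's own choice, so check the specgram documentation for the real one. It follows the layout described above: one row of power per time segment, plus a single shared frequency list.

```python
import cmath
import math


def naive_specgram(samples, sample_rate, nfft):
    """Toy spectrogram: power per (time segment, frequency bin).

    Hypothetical helper, not matplotlib's specgram:
    no windowing, no overlap, no scaling.
    """
    # One shared frequency axis for every segment (one-sided: 0 .. Nyquist).
    freqs = [k * sample_rate / nfft for k in range(nfft // 2 + 1)]
    times = []
    pxx = []
    for start in range(0, len(samples) - nfft + 1, nfft):
        segment = samples[start:start + nfft]
        row = []
        for k in range(len(freqs)):
            # Direct DFT of this segment at bin k.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / nfft)
                for n, x in enumerate(segment)
            )
            # Periodogram value: squared magnitude of the DFT coefficient.
            row.append(abs(coeff) ** 2)
        pxx.append(row)
        # Label each row with the time at the centre of its segment.
        times.append((start + nfft / 2) / sample_rate)
    return pxx, freqs, times


# Made-up input: one second of a 4 Hz sine sampled at 32 Hz.
rate = 32
signal = [math.sin(2 * math.pi * 4 * n / rate) for n in range(rate)]
pxx, freqs, times = naive_specgram(signal, rate, nfft=16)
```

Here pxx[i] is the power spectrum of the segment centred at times[i], and pxx[i][j] is the power at freqs[j] in that segment.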


Eesh!


FINDING A HASHING FUNCTION  
Shazam hashes short segments: 30 peaks per second are hashed and saved. Target-zone hashes are generated from the input and queried against the database. We discussed hashing the entire song based on peak distance instead (peak 1 to peak 2, peak 2 to peak 3, and so on), but "when your target hash value has fewer bits than what's being hashed, then uniqueness cannot ever be guaranteed." Length is also an issue: we would either have to enforce a given length or squeeze enough data out of a very short song to match the data from a longer song. This could lead to false positives. I don't like it.
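A quick sketch of one way the length problem shows up. The function name, the SHA-1 digest and the peak times below are all made up for illustration; this is not something we implemented. If the whole sequence of peak-to-peak gaps goes into a single digest, an excerpt of the same song produces a completely different digest, so there is nothing to match against.

```python
import hashlib


def whole_track_hash(peak_times):
    """Hash the full sequence of gaps between consecutive peaks."""
    gaps = [round(b - a, 3) for a, b in zip(peak_times, peak_times[1:])]
    return hashlib.sha1(repr(gaps).encode("utf-8")).hexdigest()


# Hypothetical peak times (seconds) for a full song and a short excerpt of it.
song_peaks = [0.10, 0.45, 0.90, 1.20, 1.85, 2.30]
excerpt_peaks = song_peaks[2:5]

song_digest = whole_track_hash(song_peaks)
excerpt_digest = whole_track_hash(excerpt_peaks)

# The excerpt comes from the same song, yet the digests do not match.
print(song_digest == excerpt_digest)
```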


How do we create a hash function that is reproducible independent of position within an audio file, as the Shazam paper says is necessary?
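One possible answer, loosely based on the target-zone idea mentioned above: hash pairs of peaks and store only the time difference between them, never the absolute time. This is a simplified sketch under my own assumptions. The fan_out parameter, the tuple used as a key and the peak values are all hypothetical, not details taken from the paper.

```python
def pair_hashes(peaks, fan_out=3):
    """Pair each anchor peak with the next few peaks after it.

    peaks: list of (time, frequency_bin) tuples sorted by time.
    Returns a set of (anchor_freq, target_freq, time_delta) keys.
    """
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            # Only the time difference is stored, never the absolute time.
            hashes.add((f1, f2, t2 - t1))
    return hashes


# Made-up peaks as (time in frames, frequency bin).
original = [(0, 40), (3, 72), (5, 18), (9, 55), (12, 40)]
# The same peaks heard 100 frames later in a recording.
shifted = [(t + 100, f) for t, f in original]

# Shifting everything in time leaves the set of keys unchanged.
print(pair_hashes(original) == pair_hashes(shifted))
```

Because the keys depend only on relative timing, a short excerpt yields a subset of the full track's keys, which is what makes lookup from any position plausible.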


===========================================

FAST FORWARD - JUL 6
===========================================

Decision made to implement the Shazam algorithm.
