A few comments:

- My understanding is that a gammachirp is the established filter for an auditory filter bank. Is there a reason you chose an elliptical filter instead?

- I didn't look too closely, but it seems like you are analyzing the output of the filter bank as real numbers. I highly recommend convolving with a complex representation of the filter and keeping all of the math in the complex domain until you collapse to loudness.

- I wouldn't bucket into discrete 100 Hz time slices. Instead, just convolve the temporal masking function with the filter bank output at its full time resolution.

- You want to think about a volume normalization step, so that the final Zimtohrli distance between A and B is the minimum of the distance between A and B*x, where x is a free variable for volume. Otherwise, a perceptual codec that just tends to make things a bit quieter might get a bad score.

- For Fletcher-Munson, I assume you are just using a curve at a high-ish volume? If so, good :)

- I'm not sure how you are spacing the filter bank center frequencies relative to the ERB size, but I'd recommend oversampling by a factor of 2-3 (that is, a few filters per ERB).

Apologies if any of these are off base; I just took a quick look.
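To illustrate the complex-domain suggestion, here is a minimal NumPy sketch. It swaps in a complex (analytic) gammatone impulse response, the simpler zero-chirp relative of the gammachirp, and is not taken from Zimtohrli; `erb_bandwidth`, `complex_gammatone_ir`, `complex_gammatone_filter`, and the defaults (order 4, b = 1.019, 50 ms impulse response) are assumptions for illustration. Because the filter passes only positive frequencies, the magnitude of its output is a smooth envelope, while the real part still oscillates at the carrier frequency.

```python
import numpy as np


def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore 1990)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def complex_gammatone_ir(fc_hz, fs, order=4, duration_s=0.05, b=1.019):
    """Hypothetical complex gammatone impulse response, unity gain at fc_hz."""
    t = np.arange(int(duration_s * fs)) / fs
    # Real gammatone envelope: t^(n-1) * exp(-2*pi*b*ERB(fc)*t)
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * b * erb_bandwidth(fc_hz) * t)
    # Complex carrier instead of cos(): keeps only the positive-frequency side
    ir = envelope * np.exp(2j * np.pi * fc_hz * t)
    # At f = fc the carrier cancels, so the DTFT there equals sum(envelope)
    return ir / np.sum(envelope)


def complex_gammatone_filter(x, fc_hz, fs):
    """Causally filter a real signal; returns a complex output of len(x)."""
    ir = complex_gammatone_ir(fc_hz, fs)
    return np.convolve(x, ir)[: len(x)]
```

For a 1 kHz cosine at 16 kHz sampling, `np.abs(complex_gammatone_filter(x, 1000, 16000))` settles to a nearly constant 0.5 (half the amplitude, since only the positive-frequency component is passed), whereas the real part still swings between about -0.5 and 0.5. Loudness can then be computed from the magnitude without first rectifying and smoothing a real-valued output.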
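A sketch of convolving a temporal masking function with each channel's envelope at the filter bank's full sample rate, instead of bucketing into 100 Hz slices. The one-sided exponential kernel and its 20 ms time constant are hypothetical placeholders rather than a validated forward-masking model, and the function names are illustrative.

```python
import numpy as np


def forward_masking_kernel(fs, tau_s=0.02, length_s=0.2):
    """Hypothetical causal masking kernel: exponential decay, normalized to sum 1."""
    t = np.arange(int(length_s * fs)) / fs
    k = np.exp(-t / tau_s)
    return k / k.sum()


def apply_temporal_masking(envelope, fs, tau_s=0.02):
    """Smear each channel's envelope forward in time, sample by sample.

    envelope: array of shape (channels, samples), e.g. |filter bank output|.
    Returns an array of the same shape (the causal part of the convolution).
    """
    k = forward_masking_kernel(fs, tau_s)
    return np.stack(
        [np.convolve(ch, k)[: ch.shape[0]] for ch in np.atleast_2d(envelope)]
    )
```

Because nothing is decimated, a transient keeps its exact onset time and the masking tail decays smoothly from that sample, rather than being smeared across a fixed 10 ms bucket.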
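The volume normalization idea amounts to reporting min over x of d(A, B*x) instead of d(A, B). A minimal sketch, assuming the distance is roughly unimodal in gain, is a golden-section search over the gain in dB. Here `distance` is a hypothetical stand-in for the full Zimtohrli pipeline, and the ±20 dB search range and tolerance are assumptions.

```python
import math

import numpy as np


def gain_normalized_distance(a, b, distance, min_db=-20.0, max_db=20.0, tol_db=1e-3):
    """Return (min over x of distance(a, b * x), best x), searching x in dB.

    Golden-section search; assumes distance(a, b * x) is unimodal in gain.
    """
    inv_phi = (math.sqrt(5) - 1) / 2

    def f(db):
        return distance(a, b * 10 ** (db / 20))

    lo, hi = min_db, max_db
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol_db:
        if fc < fd:  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:  # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    best_db = (lo + hi) / 2
    return f(best_db), 10 ** (best_db / 20)
```

Searching in dB keeps the gain positive and spreads the search evenly over quieter and louder settings. With a plain mean-squared-error stand-in for `distance` and B = 0.5*A, it recovers x close to 2 and a distance near zero, so a codec that only changes the volume is not penalized.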
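To make the "2-3 filters per ERB" recommendation concrete: on the Glasberg & Moore (1990) ERB-number scale, E(f) = 21.4 * log10(1 + 0.00437 * f), one ERB corresponds to one unit of E, so oversampling by a factor k means stepping E by 1/k. The function names below and the 50 Hz to 16 kHz range in the usage note are illustrative assumptions.

```python
import numpy as np


def erb_number(f_hz):
    """ERB-number (Cam) scale of Glasberg & Moore (1990) for frequency in Hz."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))


def erb_number_to_hz(e):
    """Inverse of erb_number."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) / 0.00437


def erb_spaced_centers(f_lo, f_hi, filters_per_erb=3):
    """Center frequencies from f_lo up to f_hi, spaced 1/filters_per_erb ERB apart."""
    e_lo, e_hi = erb_number(f_lo), erb_number(f_hi)
    n = int(np.floor((e_hi - e_lo) * filters_per_erb)) + 1
    return erb_number_to_hz(e_lo + np.arange(n) / filters_per_erb)
```

For 50 Hz to 16 kHz this gives 38 center frequencies at one filter per ERB and 114 at three per ERB, so adjacent filters overlap heavily and a tone between two centers is not under-represented.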