This is a good illustration of the core problem in "anomaly detection" in data science. Often we are presented with a challenge that, if solved, would dissolve the challenge itself: we have to look for events that aren't explained, or predicted to exist, by our current understanding of the given system. To find them, we collect all events and evaluate their likelihood under our best model, taking the least likely as our "anomalous" events. Then, once found, we have to explain them. But explaining them requires understanding the system well enough to predict that those events would occur. If we understood it that well, we could have built a better model, and that model would have rated those events as more likely, so they would never have been flagged in the first place. This contradiction seems to be inherent to the whole concept of anomaly detection.
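To make the selection step concrete, here is a minimal sketch of likelihood-based anomaly flagging. The choice of a single Gaussian as the "best model", the synthetic data, and the cutoff of three events are all illustrative assumptions, not anything prescribed above; the point is only the mechanic of scoring events by likelihood and taking the least likely.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical data: mostly well-behaved events plus a few injected outliers.
events = np.concatenate([rng.normal(0.0, 1.0, 1000), [6.5, -7.2, 8.1]])

# Our "best model" here is simply a Gaussian fit to all observed events --
# a stand-in for whatever model we currently believe explains the system.
mu, sigma = events.mean(), events.std()

# Score every event by its log-likelihood under that model.
log_lik = norm.logpdf(events, loc=mu, scale=sigma)

# The least-likely events are the ones we label "anomalous".
k = 3
anomaly_idx = np.argsort(log_lik)[:k]
print("flagged events:", events[anomaly_idx])
```

Note that the sketch reproduces the circularity described above: the flagged events are anomalous only relative to the fitted model, and any model improved enough to account for them would score them as likelier and stop flagging them.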