This doesn't replicate using gpt-4o-mini, which always picks Flight B even when Flight A is made somewhat more attractive.

Source: I just ran it on 0-20 newlines with 100 trials apiece, raising temperature and introducing different random seeds to prevent any prompt caching.
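For anyone who wants to try it themselves, here's a minimal sketch of the kind of run described above: vary the number of newlines (0-20) separating the two flight options, fire 100 trials per setting with a raised temperature and fresh seeds, and tally the choices. The prompt wording and flight details are my own invention, not the paper's; only the setup (newline count, trials, temperature, seeds) comes from the comment.

```python
# Hedged sketch of the replication run. Flight descriptions and prompt
# wording are hypothetical placeholders, not taken from the paper.
import collections
import random


def make_prompt(n_newlines: int) -> str:
    """Build a two-option prompt with n_newlines extra blank lines
    separating Flight A from Flight B (placeholder option text)."""
    sep = "\n" * n_newlines
    return (
        "Choose the better flight. Answer with exactly 'A' or 'B'.\n"
        "Flight A: $450, one stop, arrives 6pm."
        + sep
        + "\nFlight B: $460, nonstop, arrives 9pm."
    )


def run_trials(n_newlines: int, trials: int = 100) -> collections.Counter:
    """Query gpt-4o-mini `trials` times with a raised temperature and a
    fresh random seed per call (to defeat prompt caching), and count how
    often each option is picked. Requires OPENAI_API_KEY to be set."""
    from openai import OpenAI  # imported lazily: needs network + API key

    client = OpenAI()
    counts = collections.Counter()
    for _ in range(trials):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=1.0,
            seed=random.randrange(2**31),
            messages=[{"role": "user", "content": make_prompt(n_newlines)}],
        )
        counts[resp.choices[0].message.content.strip()[:1]] += 1
    return counts


if __name__ == "__main__":
    for n in range(0, 21):
        print(n, dict(run_trials(n)))
```

The sweep over 0-20 newlines mirrors the trial counts stated above; whether Flight A ever wins is exactly what the run checks.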
While I love XAI and am always happy to see more work in this area, I wonder if other people use the same heuristics as I do when judging a random arXiv link. This paper has a single author, was not written in LaTeX, and has no comment referencing a peer-reviewed venue. Do other people in this field look at these same signals and pre-judge the paper negatively?

I did attempt to check my bias and skim the paper; it seems well written and takes a decent shot at understanding LLMs. However, I am not a fan of black-box explanations, so I didn't read much (I really like sparse autoencoders). Has anyone else read the paper? How is the quality?
Explainable AI just ain't there yet.

I wonder if the author took a class with Lipton, since he's at CMU. We literally had a lecture about Shapley values "explaining" AI. It's BS.