We explored a novel method to gauge the significance of tokens in prompts given to large language models, without needing direct model access. Essentially, we just did an ablation study on the prompt using cosine similarity of the embeddings as the measure. We got surprisingly promising results when comparing this really simple approach to integrated gradients. Curious to hear thoughts from the community!
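For anyone curious, the core loop is roughly this (a simplified sketch, not our exact code: `embed` stands in for whatever text-embedding model you plug in, and splitting on whitespace stands in for real tokenization):

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def token_importance(prompt, embed):
        # Naive whitespace "tokens"; a real tokenizer would differ.
        tokens = prompt.split()
        full = embed(prompt)
        scores = []
        for i in range(len(tokens)):
            ablated = " ".join(tokens[:i] + tokens[i + 1:])
            # The further the embedding moves when a token is dropped,
            # the more important that token is scored.
            scores.append(1.0 - cosine(full, embed(ablated)))
        return scores

Integrated gradients needs gradient access to the model itself, which is what makes the embedding-only version attractive when all you have is an API.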
Very interesting research!

Given that you're using cosine similarity of text embeddings to approximate the influence of individual tokens in a prompt, how does this approach fare in capturing higher-order interactions between tokens, something that Integrated Gradients (allegedly) is designed to account for? Are there specific scenarios where the cosine similarity method might fall short in capturing the nuances that Integrated Gradients can reveal?
What do you consider to be an “average length” prompt? How about a “long” prompt? You mention those in the text, and I’m curious about the token-length thresholds you’re seeing before performance degrades, and whether that varies when higher-importance tokens are distributed across the length versus clustered at the beginning.
Super cool! Tried it on a prompt of mine that compressed information using emojis, but those were given a low importance score. Switched the emojis out for plain text, which gets a higher importance score, and I'm seeing better results.