The open source ranking library is really interesting. It uses a kind of merge sort where the comparator is an LLM comparison (but batching more than two items per call to cut down on the number of calls).<p>Reducing problems to document ranking is effectively a type of test-time search - also very interesting!<p>I wonder if this approach could be combined with GRPO to create more efficient chain-of-thought search...<p><a href="https://github.com/BishopFox/raink?tab=readme-ov-file#description">https://github.com/BishopFox/raink?tab=readme-ov-file#descri...</a>
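For illustration, here's a minimal sketch of that idea under my own assumptions (not raink's actual code): merge sort where the "comparator" is a single LLM call that orders a small batch of items at once. `llm_rank_batch` is a hypothetical helper.

```python
def llm_rank_batch(items: list[str], query: str) -> list[str]:
    """Hypothetical helper: one LLM call returning `items` best-first for `query`."""
    raise NotImplementedError("plug in your LLM call here")

def llm_sort(items: list[str], query: str, batch: int = 8) -> list[str]:
    # Base case: a batch small enough to rank in a single LLM call.
    if len(items) <= batch:
        return llm_rank_batch(items, query)
    mid = len(items) // 2
    return merge(llm_sort(items[:mid], query, batch),
                 llm_sort(items[mid:], query, batch), query, batch)

def merge(left: list[str], right: list[str], query: str, batch: int) -> list[str]:
    merged = []
    while left and right:
        # Rank a window drawn from the heads of both sorted runs, then commit
        # the winner. (A real implementation would commit more items per call.)
        window = left[: batch // 2] + right[: batch // 2]
        best = llm_rank_batch(window, query)[0]
        src = left if best in left[: batch // 2] else right
        merged.append(src.pop(src.index(best)))
    return merged + left + right
```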
One interesting thing about LLMs, which is also related to why chain of thought works so well, is that they are good at sampling (saying a lot of things about a problem) and, when shown N solutions, good at pointing at the potentially better one. They do both of these better than a zero-shot "tell me how to do that". So CoT is basically search inside the space of representations, plus ranking. This idea leverages something LLMs clearly do pretty well.
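A rough sketch of that sample-then-rank loop, with a hypothetical `llm` helper standing in for whatever completion API you use:

```python
def llm(prompt: str, n: int = 1, temperature: float = 0.8) -> list[str]:
    """Hypothetical helper: return n completions for the prompt."""
    raise NotImplementedError("wire up your model here")

def best_of_n(problem: str, n: int = 5) -> str:
    # 1) Sampling: let the model say a lot of things about the problem.
    candidates = llm(f"Propose a solution to:\n{problem}", n=n, temperature=0.9)
    # 2) Ranking: show it the N solutions and ask which one looks best.
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    choice = llm(
        "Below are candidate solutions. Reply with only the number of the best one.\n\n"
        f"Problem: {problem}\n\n{numbered}",
        n=1,
        temperature=0.0,
    )[0]
    return candidates[int(choice.strip().strip("[]"))]
```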
This furthers an idea I've had recently: we (and the media) are focusing too much on creating value by building ever more complex LLMs, while vastly underestimating creative applications of current-generation AI.
A concept I've been thinking about a lot lately: transforming complex problems into document ranking problems to make them easier to solve. LLMs can assist greatly here, as I demonstrated at the inaugural DistrictCon this past weekend.
Very cool! This matches one of my beliefs in building tools for research: if you can solve the problem of predicting and ranking the top references for a given idea, then you've learned a lot about problem solving and about decomposing problems into their ingredients. I've been pleasantly surprised by how well LLMs can rank relevance, compared to supervised training of a relevancy score. I'll read the linked paper (shameless plug, here it is on my research tools site: <a href="https://sugaku.net/oa/W4401043313/" rel="nofollow">https://sugaku.net/oa/W4401043313/</a>)
Great article, I’ve had similar findings! LLM-based "document chunk" ranking is a core feature of PaperQA2 (<a href="https://github.com/Future-House/paper-qa">https://github.com/Future-House/paper-qa</a>) and part of why it works so well for scientific Q&A compared to traditional embedding-ranking-based RAG systems.
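For what it's worth, the general shape of that pattern looks something like this (a sketch under my own assumptions, not PaperQA2's actual implementation): a cheap embedding pass to retrieve candidates, then an LLM re-rank of the top chunks for the specific question. `embed` and `llm_rank_batch` are hypothetical helpers.

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    raise NotImplementedError("any embedding model")

def llm_rank_batch(chunks: list[str], question: str) -> list[str]:
    raise NotImplementedError("one LLM call ordering chunks by usefulness")

def retrieve_and_rerank(question: str, chunks: list[str], k: int = 20) -> list[str]:
    q = embed([question])[0]
    c = embed(chunks)
    # Cheap pass: cosine similarity against the question embedding.
    sims = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q) + 1e-9)
    top_k = [chunks[i] for i in np.argsort(-sims)[:k]]
    # Expensive pass: let the LLM rank the shortlist for this question.
    return llm_rank_batch(top_k, question)
```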
So instead of testing each patch, it's faster to "read" it and see whether it looks like the right kind of change to fix a particular bug. Neat.
Interesting insight, and funny in a way, since LLMs themselves can be seen as a specific form of document ranking, i.e. ranking a list of tokens by their appropriateness as a continuation of a text sequence.
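You can make that concrete with any off-the-shelf causal LM: the next-token logits literally induce a ranking over the vocabulary (my own example, using Hugging Face transformers):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Reducing hard problems to document"
ids = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits[0, -1]  # a score for every vocabulary token
top = torch.topk(logits, k=5)
# The "document ranking" over candidate next tokens, best-first.
print([(tok.decode(int(t)), s.item()) for t, s in zip(top.indices, top.values)])
```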
I see in the readme that you investigated a tournament style, but I didn't see results.<p>How did it perform compared to listwise?<p>Also curious whether you tried schema-based querying of the LLM (function calling / structured output). I recently tried to have a discussion about this exact topic with someone who posted about pairwise ranking with LLMs.<p><a href="https://lobste.rs/s/yxlisx/llm_sort_sort_input_lines_semantically#c_xk5zgz" rel="nofollow">https://lobste.rs/s/yxlisx/llm_sort_sort_input_lines_semanti...</a>
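For reference, this is the kind of schema-based pairwise query I mean (just a sketch; `llm_json` is a hypothetical wrapper around whatever structured-output / function-calling API is available):

```python
SCHEMA = {
    "type": "object",
    "properties": {"winner": {"type": "string", "enum": ["A", "B"]}},
    "required": ["winner"],
}

def llm_json(prompt: str, schema: dict) -> dict:
    """Hypothetical helper: call the model with structured output constrained to schema."""
    raise NotImplementedError("use your provider's structured-output mechanism")

def pairwise(a: str, b: str, query: str) -> str:
    # Constraining the reply to the schema makes the comparison machine-parseable.
    out = llm_json(
        f"Which item better answers the query?\nQuery: {query}\nA: {a}\nB: {b}",
        SCHEMA,
    )
    return a if out["winner"] == "A" else b
```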
Minor nitpick:<p>It should be "document ranking reduces to these hard problems".<p>I never knew why the convention is like that; it seems backwards to me as well, but that's how it is.