ArXiv Papers as Audiobooks

105 points by Acsmaggart about 1 year ago

13 comments

Uehreka about 1 year ago
When I’ve tried listening to YouTube videos explaining, say, Attention Is All You Need, I find that I cannot do it passively at all. For the first 10 or so minutes I’m nodding along, folding laundry or doing dishes; then the presenter says something like “by reifying this tensor against the priors I was just talking about, we’re able to—” and I have to pause, rewind a couple of minutes, grab a piece of paper, and actually engage with what’s going on.

I have to imagine listening to raw papers (not even someone like Andrej Karpathy interpreting and presenting it) would be even more difficult. I don’t know if there’s an easy way to passively consume academic literature at all. If it’s important stuff, it will usually be pretty challenging.
neuronexmachina about 1 year ago
The LLM prompts are pretty interesting, e.g.: https://github.com/imelnyk/ArxivPapers/blob/main/gpt/utils.py#L239

> "You are an ArXiv paper audio paraphraser. Your primary goal is to rephrase the original paper content while preserving its overall meaning and structure, but simplifying along the way, and make it easier to understand. In the event that you encounter a mathematical expression, it is essential that you verbalize it in straightforward nonlatex terms, while remaining accurate, and in order to ensure that the reader can grasp the equation's meaning solely through your verbalization. Do not output any long latex expressions, summarize them in words."
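A minimal sketch of how a system prompt like that could be applied section by section with the OpenAI Python client; this is not the project's actual code, and the model name and function name are illustrative:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PARAPHRASER_PROMPT = (
        "You are an ArXiv paper audio paraphraser. Rephrase the content while "
        "preserving its meaning, simplify along the way, and verbalize any "
        "mathematical expression in plain, non-LaTeX terms."
    )

    def paraphrase_section(section_text: str) -> str:
        """Rewrite one section of a paper into listenable prose."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative; use whichever model you have access to
            messages=[
                {"role": "system", "content": PARAPHRASER_PROMPT},
                {"role": "user", "content": section_text},
            ],
        )
        return response.choices[0].message.content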
Almondsetat about 1 year ago
Papers are already difficult to process even when reading them carefully multiple times, so what is the point of turning them into an audio version? I am genuinely at a loss, unless we are talking about blind people.
se4u about 1 year ago
Many years ago, I did that when I had a large paper-reviewing load during my PhD. My solution was simply to purchase an app called SayIt for like a dollar that read the PDF to me, and it worked really well.

Nowadays I often pass the PDF through LLMs to personalize it (expand on jargon or contract the verbiage) and then read it. That gives me a better return on time spent.
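A rough sketch of the pre-processing half of that workflow, assuming pypdf for text extraction; the chunk size is an arbitrary placeholder, and the LLM call itself would follow the same pattern as the snippet above:

    from pypdf import PdfReader

    def pdf_to_chunks(path: str, max_chars: int = 12_000) -> list[str]:
        """Extract a PDF's text page by page and pack it into prompt-sized chunks."""
        reader = PdfReader(path)
        chunks, current = [], ""
        for page in reader.pages:
            page_text = page.extract_text() or ""
            if current and len(current) + len(page_text) > max_chars:
                chunks.append(current)
                current = ""
            current += page_text + "\n"
        if current:
            chunks.append(current)
        return chunks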
Acsmaggart about 1 year ago
I had been daydreaming a couple of weeks ago about being able to listen to papers while driving or doing repetitive tasks, and it looks like there is now a YouTube channel where these get posted: https://www.youtube.com/@ArxivPapers

The pipeline seems to do a pretty good job of cleaning up the writing too; some ArXiv papers are a little rough.

(I'm not the project owner)
pulpfictional about 1 year ago
I've been looking for a good way to TTS longer PDFs and EPUBs into recordings so I can listen to them on the go. I'd like to take advantage of high-quality TTS models, but I'd prefer one I can host myself.

I haven't found the right approach yet; I'm considering https://github.com/MycroftAI/mimic3
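A minimal self-hosted sketch along those lines: pipe extracted text into a locally installed TTS engine. The mimic3 command-line usage below is an assumption (check its README), and any locally hosted engine could be swapped in:

    import subprocess
    from pathlib import Path

    def text_to_wav(txt_path: str, wav_path: str, voice: str = "en_US/vctk_low") -> None:
        """Synthesize a plain-text file to a WAV file with the mimic3 CLI."""
        text = Path(txt_path).read_text(encoding="utf-8")
        with open(wav_path, "wb") as out:
            subprocess.run(
                ["mimic3", "--voice", voice],  # assumed CLI usage: text on stdin, WAV on stdout
                input=text.encode("utf-8"),
                stdout=out,
                check=True,
            )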
mrkramer about 1 year ago
I had a similar idea, but what happens when you stumble upon code, equations, tables, graphs, etc.? Can an LLM understand that as well?

For example: you are listening to the paper with some text-to-speech model and it stumbles upon a code snippet or table or graph... what should happen next? Should the model skip it, or prompt you to look at the graph or table or whatever? Or should you write some software that tries to interpret graphs and other non-text content?
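One way to sketch an answer is a crude heuristic: detect blocks that look like code, math, or tables and replace them with a short spoken cue instead of reading them verbatim. The thresholds below are arbitrary placeholders, not a tested design:

    import re

    def narration_friendly(block: str) -> str:
        """Return the block unchanged if it reads like prose, else a spoken placeholder."""
        symbol_ratio = len(re.findall(r"[{}\\=_^|&$%#<>]", block)) / max(len(block), 1)
        looks_like_table = block.count("|") > 4 or block.count("\t") > 4
        looks_like_math_or_code = (
            symbol_ratio > 0.05
            or block.lstrip().startswith(("\\begin", "def ", "class ", "import "))
        )
        if looks_like_table:
            return "There is a table here; see the paper for the numbers."
        if looks_like_math_or_code:
            return "There is an equation or code snippet here; see the paper for details."
        return block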
julienchastang about 1 year ago
I am still trying to understand this, but it seems like the potential here is tremendous. For example, you can imagine producing audio tailored to the sophistication of the reader, where a layperson may want a more basic interpretation than a subject-matter expert would. Really looking forward to seeing where this goes for the dissection and understanding of scientific publications.
mdaniel about 1 year ago
Did you purposefully omit a license?

I really do wish GitHub would prompt its repo owners "did you forget a license?", but I also wish it would prompt them for adding "topics" to enhance discovery and I guess I'll just continue to hold my breath on those
josh-sematic about 1 year ago
https://www.listening.com/ does this as a service. FWIW I haven't tried it myself.

Edit: looks like they support a few traditional publishers as well.
neuronexmachina about 1 year ago
It'd be interesting to also have these generate a slide presentation explaining a paper via some combination of presentation markdown, MermaidJS, and an image generator.
calebkaiser about 1 year ago
I started working on a version of this just the other night—thank you for saving me the time! This is awesome.
mathgradthrow about 1 year ago
Audiobooks make sense for things which are communicated as fast as speech, like stories.