TechEcho

Show HN: ReadToMe (iOS) turns paper books into audio

74 points by kolchinski over 1 year ago
Today I'm publicly launching something that started as a side project: ReadToMe, an iPhone app that turns paper books and other printed text into audio.

Originally this was a Christmas present for my fiancée, who loves books but has an eye problem that makes it hard for her to read more than a few pages at a time. She mostly listens to audiobooks while following along in the paper book, but some books aren't available as audiobooks or even e-books, and all of the existing apps we tried were surprisingly bad at turning paper books into audio: they make lots of mistakes, read out footnotes and page numbers, and so on, in a way that really degrades the experience.

Being an AI-oriented engineer by training, I had a crack at solving the problem myself and was pleasantly surprised at how well the proof of concept worked. I then had some free time while shutting down my previous company (Mezli, YC W21), during which I polished the app up to where it is now.

The way it works:

On the front end, it's a SwiftUI app (mostly written by ChatGPT!) that consists mainly of a document scanner (VNDocumentCameraViewController) and a custom-built audio player.

The back end is more complex. Book photos are first sent to an OCR API, and then some custom code I wrote does a first pass at stitching together and correcting the results. Next, the corrected OCR results are sent to GPT-3.5-turbo for further post-processing and re-stitching, and finally to a text-to-speech API for conversion to audio.

The hardest part of this process was actually getting the GPT calls right. I ended up writing a custom LLM eval framework to make sure the LLM wasn't making edits relative to the true text of the book.

A few issues remain, which I'll work on fixing if the app gets significant traction:

1) It can take several minutes to get audio back from a scan, especially a longer one (10+ pages). I'll be able to bring this down by spinning up dedicated servers for the OCR and TTS back end.

2) The LLM sometimes does TOO good a job of correcting "mistakes" in book text. This crops up particularly often when an author deliberately uses improper grammar, e.g. in dialogue.

The app is currently priced at $9.99/month for up to 250 pages/month, which I estimate will just about cover the cost of the API calls. I'll bring the price down as the prices of the required AI APIs come down. There's also a 3-day free trial if you want to try it out.

If you find this useful, or know somebody who might, I'd appreciate you giving it a try or letting them know! And please let me know if you have any feedback, including issues or feature requests.
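The eval framework mentioned above is not described in any detail. As a minimal sketch of one way such a check could work (an assumption, not the author's implementation), the LLM's output can be diffed word by word against a hand-verified reference transcription of the same pages, so that every substitution, deletion, or insertion gets flagged. The names `find_edits` and `edit_rate` are hypothetical; only Python's standard `difflib` is used.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Edit:
    op: str         # "replace", "delete" or "insert" (difflib opcode tags)
    reference: str  # words from the true book text
    output: str     # words the LLM produced instead


def _opcodes(reference: str, output: str):
    # Compare word by word; str.split() ignores line-break and spacing
    # differences, which are expected when pages are re-stitched.
    ref_words, out_words = reference.split(), output.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=out_words, autojunk=False)
    return ref_words, out_words, matcher.get_opcodes()


def find_edits(reference: str, output: str) -> list[Edit]:
    """List every word-level change the LLM made relative to the reference."""
    ref_words, out_words, ops = _opcodes(reference, output)
    return [
        Edit(tag, " ".join(ref_words[i1:i2]), " ".join(out_words[j1:j2]))
        for tag, i1, i2, j1, j2 in ops
        if tag != "equal"
    ]


def edit_rate(reference: str, output: str) -> float:
    """Changed words divided by reference length (longer side of each change)."""
    ref_words, _, ops = _opcodes(reference, output)
    changed = sum(
        max(i2 - i1, j2 - j1) for tag, i1, i2, j1, j2 in ops if tag != "equal"
    )
    return changed / max(len(ref_words), 1)


if __name__ == "__main__":
    reference = "Ain't nobody gonna tell me what to do, she said."
    output = "Isn't anybody going to tell me what to do, she said."
    for edit in find_edits(reference, output):
        print(f"{edit.op}: {edit.reference!r} -> {edit.output!r}")
    print(f"edit rate: {edit_rate(reference, output):.2f}")
```

On the dialogue example it flags a single replacement ("Ain't nobody gonna" became "Isn't anybody going to") with an edit rate of 0.40, the same kind of over-correction described in issue 2. Because punctuation stays attached to words, punctuation changes count as edits, while spacing and line-break changes do not.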

14 comments

spacemanspiff01 over 1 year ago
It seems to me that there are 3 independent issues:

1. Scanning the books to text.

2. Reading the text to the user.

3. Having a good interface.

Number 1 seems to be where you put the most effort, along with 3.

I guess, at least for me, there are often digital copies of books, either in EPUB or Kindle format. When those are available, they should be used.

And if one isn't available, wouldn't it make more sense to have a document-scanner-to-EPUB tool?

I guess I'm just thinking that it is relatively rare that you really need document scanning in order to get an audiobook. Since most of the cost seems to come from the document-scanning side, it seems worthwhile to split them up.

It also seems like it would make sense to think of these as 2 separate products: specialized document scanning, and audio generation. I can definitely see uses for one without the other.
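The first of those stages is where the original post says custom code does a first pass over the raw OCR output, and where the other apps tried fell short by reading out page numbers. A minimal sketch of that kind of pass, not the author's code: it assumes each scanned page arrives as a list of OCR'd lines, treats any line consisting only of a 1 to 4 digit number as a page number, and rejoins words hyphenated across line and page breaks. Footnotes, running headers, and roman-numeral page numbers are not handled, and `clean_page` and `stitch` are hypothetical names.

```python
import re

# Assumption: a line containing nothing but a 1-4 digit number is a page number.
PAGE_NUMBER = re.compile(r"\d{1,4}")


def clean_page(lines: list[str]) -> list[str]:
    """Strip whitespace, then drop blank lines and bare page numbers from one page."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not PAGE_NUMBER.fullmatch(line)]


def stitch(pages: list[list[str]]) -> str:
    """Join cleaned pages into running text, repairing end-of-line hyphenation."""
    text = ""
    for page in pages:
        for line in clean_page(page):
            if text.endswith("-"):
                if line[:1].islower():
                    # "jum-" + "ped" -> "jumped": a word split across lines/pages.
                    text = text[:-1] + line
                else:
                    # "Anglo-" + "Saxon" -> "Anglo-Saxon": keep the hyphen.
                    text += line
            elif text:
                text += " " + line
            else:
                text = line
    return text


if __name__ == "__main__":
    pages = [
        ["The quick brown fox jum-", "ped over the", "12"],
        ["13", "lazy dog in Anglo-", "Saxon times."],
    ]
    print(stitch(pages))
```

A lowercase continuation such as "jum-" + "ped" is merged into "jumped", while a capitalized one such as "Anglo-" + "Saxon" keeps its hyphen. The heuristic will still mangle genuine compounds broken at a line end, such as "well-" + "known", so some later correction step is still needed.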
LeoNatan25 over 1 year ago
“Scan up to 250 printed pages per month for $9.99/mo”

I’m sorry, but LOL. Not even a full book.

That has to be one of the most terrible business models. I guess it’s in line with most app subscription models these days, only much worse. And if the excuse is “well it costs me too much on Azure and the phone native APIs are not good enough”, perhaps the answer is “don’t do it then”. No thanks.
broth over 1 year ago
Love this, but I have concerns about the price. You can usually find an audiobook corresponding to a paper book relatively cheaply. Services like Audible cost a little more per month, but you get more audiobooks. Given the 250-page-per-month limit at $9.99, how will this compete?
moritz64 over 1 year ago
Is there something like this for EPUBs or PDFs with truly high-quality TTS?

All the apps I know of use the built-in iOS TTS (sounds awful, not as good as Siri). Then there is also Voice Dream Reader, and even with the paid premium voices it is still not pleasant to listen to. Siri-grade TTS or ElevenLabs would be pleasant enough, though.
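Whichever engine is used, book-length text (from an EPUB, a PDF, or a scan) typically has to be sent to a cloud TTS API in pieces, because such APIs cap the size of a single request. A minimal sketch of sentence-aware chunking; the 4,000-character limit, the punctuation-based sentence splitter, and the name `chunk_for_tts` are assumptions rather than any specific service's API.

```python
import re

# Hypothetical per-request limit; check the documented limit of the TTS API you use.
MAX_CHARS = 4000

# Naive sentence boundary: whitespace after ., ! or ?, optionally preceded by a
# closing quote. Abbreviations like "Mr." add extra split points, which only
# matter if one happens to fall at a chunk boundary.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+|(?<=[.!?][\"'\u201d])\s+")


def chunk_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for sentence in SENTENCE_BREAK.split(text.strip()):
        if not sentence:
            continue
        # A single sentence longer than the limit is split at the last space that fits.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no space to break at: hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Keeping whole sentences together avoids cutting a request off mid-sentence; only a sentence that is longer than the limit on its own gets split at a word boundary.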
ummonk over 1 year ago
Were the onboard text recognition and speech synthesis APIs not good enough for this task?
ssttoo over 1 year ago
Next step: turn the book into a 3D video.

I recently read an Isaac Asimov book where he was describing a device that takes a book and acts it out for you. Made me think we’re probably pretty close.
closetkantian over 1 year ago
Could you make a video showing how it works? I don't have any iOS devices but would love to recommend it to friends/family. Thanks.
carbone_12 over 1 year ago
OP, this is an incredible project! I worked on something similar (https://oration.app) and really love your idea of using CV/OCR. I'll certainly be giving your app a try.
rickcarlino over 1 year ago
I have been looking for a product like this for years; I hope you can bring the price down eventually. In the past I used one of those OCR pens that you can find on Amazon, but I found that they were too slow to be of practical use.

Very excited to see all the cool things people publish once LLM pricing drops.
aryamaan over 1 year ago
If you don’t mind me asking, what do you use for TTS?
blatherard over 1 year ago
Sounds cool, have you looked into potential copyright issues?
Gys over 1 year ago
Love it! But it should work for all languages.
tamimio over 1 year ago
> Turn any book into an audiobook

English book.
quickthrower2 over 1 year ago
Funny. Felt like another (eyeroll) AI thing, until I read your story here. So definitely use this story in your marketing too! Also the story gives the impression of attention to detail because of why you did it, which is good to know.