Show HN: NLP Flashcards for Most of the Internet

29 points by samjgorman over 3 years ago
Hello HN! We're Sam and Kanyes. We're building an extension to help you remember what you read online. We're calling it Ferret [1].

When you open Ferret on an HTML page, it uses NLP to generate recall-based questions and answers that reinforce key concepts. Consider the following toy example, where we open Ferret on an explanation of Bayesian statistics [2]:

Q: What does the frequentist interpretation view probability as?
A: The limit of the relative frequency of an event after many trials

Q: What is often computed in Bayesian statistics using mathematical optimization methods?
A: The maximum a posteriori

We do this by (1) parsing the DOM tree of an HTML page for <p> tags on the client and segmenting these into preprocessed chunks, (2) performing question generation with a T5-base model fine-tuned on SQuAD, and (3) running extractive question answering with RoBERTa, also fine-tuned on SQuAD, over each chunk and the question we've generated from it.

No GPT-3 here; where's the fun in an API call when you can do it yourself? Ferret is built as a React.js app deployed as a Chrome extension, with models hosted on AWS SageMaker.

Finally, why could this be helpful? Human memory is lossy. Psychologists have long shown that memory can be modeled with a forgetting curve: if you don't attempt to retain knowledge, you'll likely lose it. Yet most of the content we read online (technical blog posts, documentation, course notes, articles) gets ingested and quickly forgotten. We're interested in low-friction approaches to helping people better remember this content, starting with fellow engineers who depend on their ability to remember key concepts to do their best work.

We've open-sourced the full repo and are actively responding to PRs and issues [3]. You can also read more about the technical and product challenges we faced, if that interests you [4].

We appreciate all feedback and suggestions!

[1] https://chrome.google.com/webstore/detail/ferret/mjnmolplinickaigofdpejfgfoehnlbh
[2] https://en.wikipedia.org/wiki/Bayesian_statistics
[3] https://github.com/kanyesthaker/qgqa-flashcards
[4] https://samgorman.notion.site/Ferret-c7508ec65df841859d1f84e518fcf21d
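Step (1) of the pipeline described above, collecting <p> text on the client and packing it into model-sized chunks, can be sketched in a few lines. This is a minimal illustration rather than Ferret's actual code (which runs in JavaScript in the extension); the `html.parser`-based extraction and the 1000-character chunk budget are assumptions for the sketch.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text content of every <p> element in an HTML document."""
    def __init__(self):
        super().__init__()
        self._depth = 0      # current nesting level inside <p> tags
        self._buf = []       # text fragments of the paragraph being read
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # collapse whitespace and keep non-empty paragraphs only
                text = " ".join("".join(self._buf).split())
                if text:
                    self.paragraphs.append(text)
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

def chunk_paragraphs(paragraphs, max_chars=1000):
    """Greedily pack consecutive paragraphs into chunks under max_chars,
    so each chunk fits comfortably in the QG model's input window."""
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 1 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current} {p}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the question-generation endpoint, and the generated question plus the same chunk would feed the extractive QA model.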

5 comments

kanyesrthaker over 3 years ago
Hi, Kanyes here from Ferret. Starting the discussion by sharing an unsolved technical hurdle that may be of interest. We decided early in development to perform all inference on CPU, to avoid unfriendly production costs and the inefficiency of processing single inputs instead of batches.

Sequential models like T5 tend to be large (>300 MB), and we observed high latency of roughly 8 s per inference. We've masked this latency on the frontend, mainly by sending concurrent requests with async code (4 at a time) and preloading content early. However, this is kind of hacky, and we'd ideally want to reduce inference time itself.

To this end, we've demonstrated a roughly 1.7x speedup by converting our PyTorch model weights to a quantized ONNX graph. However, we've found a lot of friction in trying to deploy ONNX graphs to AWS. We understand there are a variety of potential solutions (training smaller distilled models, deploying ONNX, contesting our rationale for using CPU, etc.), so we're looking for suggestions on the best way to make inference faster!
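The latency-masking approach described above, keeping at most four question-generation requests in flight at once and surfacing results as they arrive, can be sketched with asyncio. This is an illustrative sketch, not Ferret's frontend code (which is JavaScript); the 4-way limit comes from the comment, while the request function is a stand-in with a shrunk simulated latency.

```python
import asyncio

MAX_IN_FLIGHT = 4  # the comment above mentions 4 concurrent requests

async def generate_questions(chunk, sem):
    """Stand-in for a POST to the question-generation endpoint."""
    async with sem:                # at most MAX_IN_FLIGHT run concurrently
        await asyncio.sleep(0.01)  # simulated ~8 s model latency, shrunk here
        return f"Q/A for: {chunk}"

async def process_page(chunks):
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    tasks = [asyncio.create_task(generate_questions(c, sem)) for c in chunks]
    # as_completed lets the UI render each flashcard as soon as it is ready,
    # instead of blocking on the whole batch
    results = []
    for fut in asyncio.as_completed(tasks):
        results.append(await fut)
    return results

results = asyncio.run(process_page([f"chunk {i}" for i in range(10)]))
```

The semaphore bounds load on the inference endpoint, while `as_completed` hides per-request latency behind whichever cards finish first.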
shainvs over 3 years ago
Aside from the challenges around per-inference latency, were there any other unique challenges you faced when deploying NLP models to the web? It's pretty cool to see ML being applied more actively to day-to-day web browsing.
juliechen over 3 years ago
Really could've used this during my time in school; nice work. What's the GTM, and who are the ideal users you're building for here?
sealeck over 3 years ago
Is there any way to export flashcards to Anki?
sealeck over 3 years ago
Wow, this is cool!