Show HN: 100K sentences mined from Wikipedia to help non-native English learners

225 points by abhas9 almost 8 years ago

27 comments

cschmidt almost 8 years ago
I went to an interesting talk once at the Boston Python meetup, where a guy figured out how to order sentences so he could learn them in an order where he already knew the "other" words in each sentence. Basically, he made a directed graph of vocabulary.
He was doing it to learn Latin, but you could do it for any language.
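The talk's actual method isn't described beyond "a directed graph of vocabulary", so the following is only a minimal greedy stand-in, not the speaker's algorithm. It assumes a naive tokenizer and assumes that every word in a studied sentence counts as learned. At each step it picks the sentence with the fewest words the learner doesn't know yet. The function names `tokenize` and `order_sentences` are hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (naive: letters and apostrophes only)."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def order_sentences(sentences, known):
    """Greedily order sentences so each one introduces as few new words as possible.

    Returns a list of (sentence, new_words) pairs in study order.
    """
    known = set(known)
    remaining = [(s, tokenize(s)) for s in sentences]
    order = []
    while remaining:
        # Pick the sentence with the fewest words the learner doesn't know yet;
        # min() returns the first such index, so ties keep the input order.
        idx = min(range(len(remaining)), key=lambda i: len(remaining[i][1] - known))
        sentence, words = remaining.pop(idx)
        new_words = words - known
        order.append((sentence, sorted(new_words)))
        known |= new_words  # assumption: words in a studied sentence are now learned
    return order
```

With three Latin sentences and a learner who only knows "currit", `order_sentences(["puer puellam videt", "puer currit", "puella currit"], known={"currit"})` schedules "puer currit" first (one new word), then "puella currit", and leaves the two-new-word sentence for last. A graph-based version could model the same dependencies explicitly, but the greedy loop keeps the idea easy to follow.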
wodenokoto almost 8 years ago
How do you decide which sentences to use?
I'm interested in generating example sentences myself, but in a way that chooses sentences that are simple, easy to understand, and support the word they are supposed to exemplify.
For example, "She got a *car* for her birthday, while she was traveling in Italy eating pizza" does not tell the reader anything about what a car is or how the word should be used. However, "He drives his car to work" is a much better example of what a car is, what a commonly associated verb is, and how the word fits in a sentence.
How do you optimise the selection for sentences like the latter?
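Whether a sentence really explains a word is a semantic judgement that no simple rule captures. Still, a crude heuristic can push selection toward the "He drives his car to work" kind of sentence. This sketch is hypothetical and not the site's method. It assumes that shorter sentences, sentences whose other words come from a common-word list, and sentences with fewer commas (as a proxy for fewer clauses) make better examples. The weights 3 and 2 and the `max_len` cutoff are arbitrary choices for illustration.

```python
import re


def score_sentence(sentence, target, common_words, max_len=15):
    """Return a heuristic score (higher is better) for an example sentence.

    Hypothetical heuristic, weights chosen arbitrarily:
    - shorter sentences are easier to follow;
    - every word other than the target should be a common word;
    - commas hint at extra clauses that distract from the target.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    if target not in words or len(words) > max_len:
        return float("-inf")  # reject sentences that miss the word or run long
    others = [w for w in words if w != target]
    rare = sum(1 for w in others if w not in common_words)
    return -len(words) - 3 * rare - 2 * sentence.count(",")
```

Given a common-word set that includes "he", "drives", "his", "to" and "work" but not "italy" or "pizza", the short sentence scores -6 and the birthday sentence scores -23. So `max(candidates, key=lambda s: score_sentence(s, "car", common))` prefers the former. In practice, the common-word set would come from a frequency list.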
sengork almost 8 years ago
I've always thought that the Simple English versions of Wikipedia articles are useful for non-native English speakers: https://simple.wikipedia.org/
Most people seem to be unaware of this aspect of Wikipedia.
AdmiralAsshat almost 8 years ago
Very cool! Although I feel that sometimes you really need a human touch to make a word truly understood. For instance, I randomly clicked on "antediluvian":
https://buildmyvocab.in/antediluvian/
Everything here will give you a "good enough" understanding of what the word means, but this is the only one that really comes close to explaining the word's literal meaning, and it's too vague to be of much use:
*any of the early patriarchs who lived prior to the Noachian deluge*
A non-native speaker isn't going to have any idea what "Noachian" means (a native speaker probably won't either, unless they can explicitly identify "Noah" as the root), and "deluge" is part of the root of the word we're defining, so simply using the word "deluge" without explaining what it means doesn't really help.
In short, this is good groundwork, but I think it needs a human editor to push the individual definitions from "acceptable" to "correct".
dcsan almost 8 years ago
I find there's a lot of material for studying isolated words, but as an engineer, I find analyzing sentence patterns and grammar more interesting.
I'm working on a project to do this for a database of Chinese grammar patterns. Once there are enough example sentences for each pattern as structured data, we can make games and other learning tools. For example: yīnwèi / 因为 / because http://cgram.rikai-bots.com/grammar/yinwei
Now there's a magnets game for trying out that pattern: http://cgram.rikai-bots.com/magnets/?cnames=yinwei
I would be happy to share the repo with anyone who's interested, or with anyone who wants to use the data to make other language-learning games. PS: I did a similar thing for Japanese before, JGram.org, and it really helped me learn Japanese quickly.
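The project's actual schema isn't shown. The following sketch is therefore only a guess at what "sentence examples for each pattern as structured data" might look like, and at how a magnets-style game could be built on it. It assumes that each example sentence is stored pre-segmented into tokens. `GrammarPattern`, `make_magnets` and `check_answer` are hypothetical names. `cname` merely mirrors the `cnames=yinwei` URL parameter above.

```python
import random
from dataclasses import dataclass, field


@dataclass
class GrammarPattern:
    """One grammar pattern with example sentences stored as token lists (hypothetical schema)."""
    cname: str   # short key, e.g. "yinwei"
    hanzi: str   # e.g. "因为"
    pinyin: str  # e.g. "yīnwèi"
    gloss: str   # e.g. "because"
    examples: list = field(default_factory=list)  # each example is a list of tokens


def make_magnets(tokens, rng):
    """Return a shuffled copy of a sentence's tokens for the learner to reorder."""
    magnets = list(tokens)  # copy so the stored example is not mutated
    rng.shuffle(magnets)
    return magnets


def check_answer(tokens, answer):
    """The learner wins if their ordering reproduces the stored sentence exactly."""
    return list(answer) == list(tokens)
```

For example, `GrammarPattern("yinwei", "因为", "yīnwèi", "because", [["因为", "下雨", "，", "我", "没", "去", "。"]])` stores "因为下雨，我没去。" ("Because it rained, I didn't go."). `make_magnets(..., random.Random())` then produces the scrambled tiles. Accepting only one correct order is a simplification, because some sentences allow several valid word orders.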
ekingr almost 8 years ago
In the same vein, for French translation, Linguee[1] draws on many websites of organisations that publish official content in several languages (e.g. the websites of the EU and of the Canadian Parliament). The fact that these are *official* texts (e.g. laws) makes it quite reliable.
[1] http://www.linguee.fr
lkbm almost 8 years ago
This is pretty cool.
The second word I clicked was "cant"... and about half the sentences I saw were typos of "can't". So there's some bad data in there if you're trying to learn standard English, but it's good data if you want to understand what people actually write.
Anyway, time to go through and add some apostrophes to a few articles. :-)
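One way to screen out these missing-apostrophe sentences is a small heuristic. The sketch below is a hypothetical filter and not something the site is known to do. It assumes that "cant" directly after a subject pronoun ("I cant", "they cant") is almost always a typo for "can't", while other uses, such as "political cant" or "the cant of the deck", are genuine.

```python
import re

# Hypothetical heuristic: "cant" right after a subject pronoun is treated as a
# missing-apostrophe "can't"; all other occurrences of "cant" are kept.
PRONOUNS = {"i", "you", "we", "they", "he", "she", "it"}


def likely_cant_typo(sentence):
    """Return True if the sentence seems to use "cant" where "can't" was meant."""
    words = re.findall(r"[a-z']+", sentence.lower())
    for prev, word in zip(words, words[1:]):
        if word == "cant" and prev in PRONOUNS:
            return True
    return False


def filter_sentences(sentences):
    """Keep only sentences that do not look like "can't" typos."""
    return [s for s in sentences if not likely_cant_typo(s)]
```

`filter_sentences(["They cant swim.", "The cant of the deck made walking hard."])` keeps only the second sentence. The rule will still miss typos such as "people cant", so it narrows the problem rather than solving it.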
bauer almost 8 years ago
Nicely done. You could add a mailing list that sends users a weekly digest of new or top vocabulary words.
peterburkimsher almost 8 years ago
Is there a word list for TOEFL and/or IELTS?
I'm using a similar strategy (movies, music, the Bible, articles) for studying Chinese, with the TOCFL and HSK word lists. My friend uses a book by Morris Hill with a list of 15000 vocabulary words. I can't find a txt version of it, though.
saurabh1728 almost 8 years ago
https://play.google.com/store/apps/details?id=com.buildmyvocab.app
Is this your app, abhas? Quite interesting.
miles almost 8 years ago
This list is to help non-native English learners? Many native English speakers might have trouble with a few of these: abeyance, abscission, accretion, amalgamate, anodyne, antediluvian, apposite, arabesque, atavism, and avuncular.
goshx almost 8 years ago
Very nice! Thanks for sharing!
Some words are not found: https://buildmyvocab.in/affinity
(Just a little correction there: does not exist*)
mbrookes almost 8 years ago
Completely OT, but is there a mathematical explanation for why, when scrolling, the spaces between the words appear to form connected channels?
WheelsAtLarge almost 8 years ago
This is good stuff. I like the sentences part, but I would put the definition before the sentences.
This would be a great foreign-language tool too.
ilamparithi almost 8 years ago
I created something similar using the Wordnik API: https://www.greedge.com/grewordlist/
satysin almost 8 years ago
Very cool. As others have said, I think you should add definitions for the words (even a link to an external one is fine), and pronunciation (with audio, perhaps linking to forvo.com) would be superb.
opaqe almost 8 years ago
Mining canonical papers/texts to generate standardized tests (SAT/GRE) might be a further step. My guess is that both the tests and the commercial prep material are produced by committee.
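The commenter's suggestion could start from a fill-in-the-blank (cloze) item, with a mined sentence as the stem and other list words as distractors. The following is only a hypothetical sketch. Real SAT/GRE items are written and reviewed by people, and random distractors are far weaker than hand-picked ones. `make_cloze` and `n_choices` are illustrative names.

```python
import random
import re


def make_cloze(sentence, target, word_list, rng, n_choices=4):
    """Blank out `target` in `sentence` and offer it among random distractors.

    Hypothetical sketch: distractors are drawn uniformly from `word_list`,
    which must contain at least n_choices - 1 words other than the target.
    """
    pattern = re.compile(rf"\b{re.escape(target)}\b", re.IGNORECASE)
    if not pattern.search(sentence):
        raise ValueError(f"{target!r} does not occur in the sentence")
    stem = pattern.sub("_____", sentence, count=1)  # blank the first occurrence
    distractors = rng.sample([w for w in word_list if w != target], n_choices - 1)
    choices = distractors + [target]
    rng.shuffle(choices)  # so the answer is not always last
    return stem, choices
```

`make_cloze("The antediluvian computer still ran the payroll.", "antediluvian", ["abeyance", "anodyne", "apposite", "avuncular", "antediluvian"], random.Random())` returns the stem "The _____ computer still ran the payroll." together with four shuffled choices, one of which is "antediluvian".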
dadvocate almost 8 years ago
It would be more intuitive if the meaning were presented first, followed by the example sentences.
lacampbell almost 8 years ago
While I think it needs a little polishing (a lot of Wikipedia sentences are fairly hairy), I really like the core idea here. Keep up the good work.
Baeocystin almost 8 years ago
Neat.
::clicks on a random word::
"We couldn't find any sentences for the word centripetal."
So... why is it one of the chosen few?
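A word list can avoid showing words with no example sentences, such as "centripetal", if it is checked against the mined sentences before publishing. This sketch makes several assumptions: the sentences are available as plain strings, a naive lowercase tokenizer is good enough, and "playable" means having at least `min_sentences` examples. `build_index` and `words_with_examples` are hypothetical names, not the site's code.

```python
import re
from collections import defaultdict


def build_index(sentences):
    """Map each lowercase word to the list of sentences that contain it."""
    index = defaultdict(list)
    for s in sentences:
        for w in set(re.findall(r"[a-z']+", s.lower())):
            index[w].append(s)
    return index


def words_with_examples(word_list, index, min_sentences=1):
    """Split a word list into (kept, missing) by number of example sentences."""
    kept, missing = [], []
    for w in word_list:
        # .get() does not insert keys into the defaultdict
        if len(index.get(w.lower(), [])) >= min_sentences:
            kept.append(w)
        else:
            missing.append(w)
    return kept, missing
```

Because the lookup uses exact word forms, inflected forms such as "affinities" would not match "affinity". Adding lemmatization would be a natural next step.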
racl101 almost 8 years ago
What?! No "cromulent"?
masteryupa_ almost 8 years ago
Could something similar be done with other languages as well, say Simplified Chinese?
malandrew almost 8 years ago
What about other languages?
baby almost 8 years ago
1. I don't really understand what this is about. Having a description on the landing page would help.
> Barron's 800 Words list with example sentences
Who is this Barron?
2. Could you please add pronunciation? :D
3. The words need definitions as well; I'm not sure what some of these mean, even with the examples.
jlebrech almost 8 years ago
Can you make a container that lets you crawl any other language?
ge96 almost 8 years ago
mined with a mithril axe
mrstatus almost 8 years ago
I am an English learner, but I can't... Any tips to get my English perfect? http://www.mrstatus.in/himbhoomi-jamabandi-copy/