Names.io – Global Exhaustive Scraped Name Db

147 points by debdut over 4 years ago

29 comments

riffraff over 4 years ago
I have strong doubts this is exhaustive or that it could be. For one, it doesn't contain the name "Genoveffa", which is the Italianized form of Jennifer. There are Wikipedia-level people named thus, so it's not that uncommon. [0]

There's an endless number of misspelled or localized names outside their country of origin, e.g. Maicol for Michael, Sandiago for Santiago, Uilliam, Villiam, Willian, etc.

Additional "anecdata": I live in a country (Hungary) where you have to apply for a special permit to give an "unusual" name to a kid, and since my son has an Italian name we had to do that.

You can see the list of names people requested to add, and it includes random stuff like "Magneto" (which isn't on this list either).

All this to say that basically any given word can be a name in some country, so it's likely not possible to have an exhaustive list.

Maybe replace with "extensive".

[0] https://en.wikipedia.org/wiki/Genoveffa_Franchini
notafraudster over 4 years ago
It sort of sucks that the project has a domain name for a project name ("names.io") but you don't own the domain.

I went to the domain because the GitHub didn't describe the format of the data. So I'd also beef up the README.
alexmingoia over 4 years ago
Somewhat related and worth pointing out: not everyone in the world uses family names, or puts the family name last.

My wife was annoyed when she came to the US and every form had a “first” and “last” field, but she doesn’t have a last name. Her passport, for example, only has a “name” field.
jeffrallen over 4 years ago
Wrong things that programmers believe: https://boingboing.net/2016/10/18/wrong-things-that-programmers.html
danso over 4 years ago
Looks like it doesn’t yet incorporate the Census surname data, which has more than 160K U.S. last names: https://www.census.gov/topics/population/genealogy/data.html
aphroz over 4 years ago
I have worked on projects where I needed to extract first names and last names. If you want to use this dataset to extract names, here are some caveats:
- first names can be last names as well
- common words can be names as well
- some stop words can be names
- the order can change: you can write "firstname lastname" or "lastname, firstname"
- some names are as short as one letter

Using ML can be useful if you can separate people by origin or work with a more homogeneous population.
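To make the caveats above concrete, here is a minimal sketch of a naive list-based lookup. The two sets are tiny hypothetical samples rather than the actual dataset, and `possible_roles` and `split_name` are illustrative names, not part of any library.

```python
from __future__ import annotations

# Tiny hypothetical samples; real lists would be loaded from the dataset files.
FIRST_NAMES = {"james", "taylor", "will", "may", "a"}
LAST_NAMES = {"james", "taylor", "smith", "may"}


def possible_roles(token: str) -> set[str]:
    """Return every role a token could play according to the two lists."""
    key = token.strip().lower()
    roles = set()
    if key in FIRST_NAMES:
        roles.add("first")
    if key in LAST_NAMES:
        roles.add("last")
    return roles


def split_name(raw: str) -> tuple[str, str]:
    """Split a name into (first, last), honouring 'Last, First' order."""
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
    else:
        # Everything before the final space is treated as the first name.
        first, _, last = raw.strip().rpartition(" ")
    return first, last
```

A token such as "Taylor" comes back with both roles, and "will", "may" and "a" are ordinary words that also match, so a list lookup alone cannot decide whether a token is a name or which part of a name it is.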
ffpip over 4 years ago
Calling it exhaustive is a bit much. It is missing many of my friends' names.

Here in India you get very long and sometimes weird last names. Many are not on the list.
richrichardsson over 4 years ago
It's missing 9/10 of the most popular Croatian surnames, or to be more generous 3/10 if you agree that the Anglicised version of Knežević etc. is the same name. IMO they are not the same.
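The difference between the strict and the generous count can be sketched by checking each surname twice: once as written and once after stripping diacritics. This is only an illustration; the `known` set is a hypothetical sample, `fold_to_ascii` and `lookup` are made-up helper names, and whether a folded match counts as the same name is exactly the judgment call described above.

```python
from __future__ import annotations

import unicodedata


def fold_to_ascii(name: str) -> str:
    """Drop combining marks after NFKD decomposition, e.g. 'Knežević' -> 'Knezevic'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup(name: str, known: set[str]) -> str:
    """Report whether a name matches exactly, only after folding, or not at all."""
    if name.lower() in known:
        return "exact"
    folded_known = {fold_to_ascii(k).lower() for k in known}
    if fold_to_ascii(name).lower() in folded_known:
        return "folded"
    return "missing"


# Hypothetical sample of list entries, stored in lowercase.
known = {"knezevic", "horvat"}
```

With this sample, `lookup("Knežević", known)` returns `"folded"` while `lookup("Horvat", known)` returns `"exact"`. Letters without a canonical decomposition, such as đ, are left unchanged by this approach and would need an explicit mapping.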
1023bytes over 4 years ago
Definitely not exhaustive, please don't use this for validation.
bradfeehan over 4 years ago
It doesn't even contain the author's name
BMorearty over 4 years ago
Exhaustive. Phooey. My last name of Morearty isn't in there.
codegladiator over 4 years ago
Doesn't have my last name, nor those of my friends.

Probably a good list, but far from exhaustive.
mattlondon over 4 years ago
Exhaustive? Does not seem to be that exhaustive to me.

It doesn't have my last name nor the last names of some family and friends I tried.

Perhaps just "a large list of scraped names"?
bovermyer over 4 years ago
I wrote a CLI tool for name generation a while back:

https://github.com/ironarachne/namegen

It doesn't have the volume of names that this one does, but it does have custom rules for names (e.g., Icelandic last names), and it can generate Thai names.
sswaner over 4 years ago
I appreciate the work on this repo to date, and as many comments below point out, it is not yet truly exhaustive. Those of us with additional data sources can and should submit a pull request.

Thanks for the effort, I intend to use this to enrich my test data generation scripts.
taikahessu over 4 years ago
I also found that the first names list includes entries that are most likely last names written in ASCII. Heikkila, for example, is most probably just the common Finnish surname Heikkilä. This makes me wonder how much overlap and discrepancy the lists might actually have.
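One rough way to answer the overlap question is to intersect the two lists. The sketch below uses small made-up samples and a hypothetical helper name, `list_overlap`; it says nothing about the real files, whose format is not described in this thread.

```python
def list_overlap(first_names, last_names):
    """Return the names that occur in both lists, compared case-insensitively."""
    first = {name.strip().lower() for name in first_names}
    last = {name.strip().lower() for name in last_names}
    return sorted(first & last)


# Made-up samples for illustration only.
sample_first = ["Anna", "Heikkila", "Mikko"]
sample_last = ["Heikkila", "Virtanen", "anna"]
shared = list_overlap(sample_first, sample_last)
```

Entries in `shared` are not necessarily errors, since many names are legitimately used both ways, but they make a reasonable shortlist for manual review.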
beilabs over 4 years ago
Does not have my wife's first name. No, you may not know her name... she goes to another school.

Aside: it's not that exhaustive, though her name is a combination of the first two letters of her dad's and her mom's names. It is in the wild though; I have heard of others with her name.
Arech over 4 years ago
That's an interesting thing. However, it would be more interesting (and usable, for example, in gamedev) if each name also contained a reference to, for example, the top 3 countries/cultures in which the name is popular.
irrational over 4 years ago
~160k first names

~100k last names

Out of 7-8 billion people this is really all (or most) of the first and last names? We aren’t a very creative species I guess. I especially would have expected the number of first names to be at least an order of magnitude larger.
nkrisc over 4 years ago
I also noticed it contains names like "carpenterjr" in the surnames list, which is almost certainly a data collection error.
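Entries like that could be flagged with a simple heuristic: a surname that ends in a generational suffix and whose remaining stem is itself in the list is probably a fused "Name Jr". This is a sketch under that assumption only; the suffix tuple and the function name are illustrative, and the sample list is made up.

```python
# Generational suffixes to test for; longer ones come first so "iii" wins over "ii".
SUFFIXES = ("jr", "sr", "iii", "ii", "iv")


def suspicious_suffix_entries(surnames):
    """Flag entries that look like a known surname with a suffix glued on."""
    known = {name.strip().lower() for name in surnames}
    flagged = []
    for name in sorted(known):
        for suffix in SUFFIXES:
            stem = name[: -len(suffix)]
            # Only flag when the stem is itself a listed surname.
            if name.endswith(suffix) and stem in known:
                flagged.append((name, stem, suffix))
                break
    return flagged


# Made-up sample for illustration only.
sample = ["carpenter", "carpenterjr", "smith", "basr"]
flagged = suspicious_suffix_entries(sample)
```

Requiring the stem to be present keeps genuine surnames that merely end in those letters from being flagged, at the cost of missing fused entries whose base surname is absent from the list.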
visarga over 4 years ago
Other names that are hard to come by: company names and product names. But you can get addresses from Open Street Map and Open Addresses.
akkyakimoto over 4 years ago
The exhaustive list only has 90K surnames, whilst Japan has 300K variations of surname.
d--b over 4 years ago
Cool, my last name is not in this.
The_rationalist over 4 years ago
Could this help improve state-of-the-art named entity recognition?
neolog over 4 years ago
When would this be useful?
ffgh over 4 years ago
I'm curious: what was your stack for scraping this much data?
momeunier over 4 years ago
It's exhaustive and yet my first name is not there...
flemhans over 4 years ago
Good job
ahnick over 4 years ago
Nice job! It seems like you have a fairly rich data set. I could see this being really useful for soon-to-be parents trying to think up baby names. If that is your use case, or you just find these types of name lists interesting, you may also want to check out https://mashword.com.

Mashword is a word mashup name generator service that we recently built that recognizes many common human names. One of our primary use cases is finding alternatives or unique spellings to traditional or common names (e.g. https://mashword.com/search?words=rebecca). It does not support all of the names in these lists, but we are adding and growing our support for names all the time.