Names.io – Global Exhaustive Scraped Name Db

147 points by debdut, over 4 years ago

29 comments

riffraff, over 4 years ago
I have strong doubts that this is exhaustive, or that it could be. For one, it doesn't contain the name "Genoveffa", which is the Italianized form of Jennifer. There are Wikipedia-level people with that name, so it's not that uncommon.[0]
There's an infinite number of misspelled/localized names outside their country of origin, e.g. Maicol for Michael, Sandiago for Santiago, Uilliam, Villiam, Willian, etc.
Additional "anecdata": I live in a country (Hungary) where you have to apply for a special permit to give an "unusual" name to a kid, and since my son has an Italian name we had to do that.
You can see the list of names people have requested to add, and it includes random stuff like "Magneto" (which isn't on this list either).
All this to say that basically any given word can be a name in some country, so it's likely not possible to have an exhaustive list.
Maybe replace it with "extensive".
[0] https://en.wikipedia.org/wiki/Genoveffa_Franchini
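Because exact lookups miss localized and misspelled variants like the ones listed above, one option is to fall back to fuzzy matching. The sketch below uses Python's standard `difflib`; the sample name list and the 0.6 cutoff are illustrative assumptions, not anything from the Names.io repo. Spelling similarity alone cannot connect forms that look different, such as Genoveffa and Jennifer.

```python
from difflib import get_close_matches

# Hypothetical sample standing in for the dataset's first-name list.
KNOWN_FIRST_NAMES = ["Michael", "Santiago", "William", "Jennifer"]


def likely_variants_of(name: str, known: list[str], cutoff: float = 0.6) -> list[str]:
    """Return known names spelled similarly to `name`, ignoring case.

    Pure spelling similarity: it links Willian to William, but it cannot
    link forms that share few letters, such as Genoveffa and Jennifer.
    """
    # Map lowercase forms back to the original spelling in the list.
    by_lower = {k.lower(): k for k in known}
    matches = get_close_matches(name.lower(), list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[m] for m in matches]


if __name__ == "__main__":
    for variant in ["Willian", "Sandiago", "Maicol", "Genoveffa"]:
        print(variant, "->", likely_variants_of(variant, KNOWN_FIRST_NAMES))
```

A lower cutoff catches more variants but also more unrelated names, so any match is better treated as a suggestion than as proof that two names are the same.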
notafraudster, over 4 years ago
It sort of sucks that the project uses a domain name as its project name ("names.io") but you don't own the domain.
I went to the domain because the GitHub repo didn't describe the format of the data. So I'd also beef up the README.
alexmingoia, over 4 years ago
Somewhat related and worth pointing out: not everyone in the world uses family names, or puts the family name last.
My wife was annoyed when she came to the US, because every form has "first" and "last" name fields, but she doesn't have a last name. Her passport, for example, only has a "name" field.
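One common way for software to avoid forcing a "first"/"last" split is to store the name exactly as the person writes it and treat any parts as optional. A minimal sketch; the class and field names are hypothetical, not taken from any particular schema or from this dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    # The name exactly as the person writes it; the only required field.
    full_name: str
    # Optional parts: many people have no family name, or put it first.
    given_name: Optional[str] = None
    family_name: Optional[str] = None

    def display(self) -> str:
        # Show what the person gave us instead of rebuilding it from parts,
        # which would impose one particular name order.
        return self.full_name


if __name__ == "__main__":
    mononym = PersonName("Sukarno")  # a well-known single-name person
    print(mononym.display(), mononym.family_name)
```

Validation can then check only that `full_name` is non-empty, rather than checking the parts against a list of "known" names.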
jeffrallen, over 4 years ago
Wrong things that programmers believe: https://boingboing.net/2016/10/18/wrong-things-that-programmers.html
danso, over 4 years ago
Looks like it doesn't yet incorporate the Census surname data, which has more than 160K U.S. last names: https://www.census.gov/topics/population/genealogy/data.html
aphroz, over 4 years ago
I have worked on projects where I needed to extract first names and last names. If you want to use this dataset to extract names, here are some caveats:
- first names can also be last names
- common words can be names too
- some stop words can be names
- the order can change: you can write "firstname lastname" or "lastname, firstname"
- some names are as short as one letter
Using ML can be useful if you can separate people by origin, or if you're working with a more homogeneous population.
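For the name-order caveat, a simple heuristic is to treat a comma as a signal of "family, given" order. The sketch below is a hypothetical helper, not part of the dataset; it will still mis-split multi-word surnames and comma-less family-first names (common in Hungarian or Japanese usage), which is exactly why the list alone cannot settle the order.

```python
def split_name(raw: str) -> tuple[str, str]:
    """Return (given, family) from 'Given Family' or 'Family, Given'.

    Heuristic only: a comma is taken to mean family-name-first order,
    otherwise the last token is assumed to be the family name.
    """
    raw = " ".join(raw.split())  # collapse runs of whitespace
    if "," in raw:
        family, _, given = raw.partition(",")
        return given.strip(), family.strip()
    parts = raw.split(" ")
    if len(parts) == 1:
        return parts[0], ""  # mononym: no family name at all
    return " ".join(parts[:-1]), parts[-1]


if __name__ == "__main__":
    for raw in ["Smith, John", "John Smith", "Mary Ann Smith", "Madonna"]:
        print(repr(raw), "->", split_name(raw))
```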
ffpip, over 4 years ago
Calling it exhaustive is a bit much. It doesn't have many of my friends' names.
Here in India you get very long/sometimes weird last names. Many of them aren't on the list.
richrichardsson, over 4 years ago
It's missing 9/10 of the most popular Croatian surnames, or, to be more generous, 3/10 if you agree that the Anglicised version of Knežević etc. is the same name. IMO they are not the same.
1023bytes, over 4 years ago
Definitely not exhaustive; please don't use this for validation.
bradfeehan, over 4 years ago
It doesn't even contain the author's name.
BMorearty, over 4 years ago
Exhaustive? Phooey. My last name, Morearty, isn't in there.
codegladiator, over 4 years ago
It doesn't have my last name, nor those of my friends.
Probably a good list, but far from exhaustive.
mattlondon, over 4 years ago
Exhaustive? It doesn't seem to be that exhaustive to me.
It doesn't have my last name, nor the last names of some family and friends I tried.
Perhaps just "a large list of scraped names"?
bovermyer, over 4 years ago
I wrote a CLI tool for name generation a while back:
https://github.com/ironarachne/namegen
It doesn't have the volume of names that this one does, but it does have custom rules for names (e.g., Icelandic last names), and it can generate Thai names.
sswaner, over 4 years ago
I appreciate the work on this repo so far, and as many comments below point out, it is not yet truly exhaustive. Those of us with additional data sources can and should submit pull requests.
Thanks for the effort; I intend to use this to enrich my test data generation scripts.
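For test data, the simplest use is to sample random first/last pairs. A minimal sketch, assuming the two lists have already been loaded into Python sequences (the thread notes the README doesn't describe the file format, so loading is left out); the function name is hypothetical.

```python
import random
from typing import Iterator, Optional, Sequence


def fake_full_names(first_names: Sequence[str], last_names: Sequence[str],
                    count: int, seed: Optional[int] = None) -> Iterator[str]:
    """Yield `count` random 'First Last' strings for test fixtures."""
    rng = random.Random(seed)  # a fixed seed makes the fixtures reproducible
    for _ in range(count):
        yield f"{rng.choice(first_names)} {rng.choice(last_names)}"


if __name__ == "__main__":
    firsts = ["Genoveffa", "Maicol", "Ana", "Heikki"]
    lasts = ["Knežević", "Heikkilä", "Morearty"]
    for name in fake_full_names(firsts, lasts, count=5, seed=42):
        print(name)
```

Given the rest of the thread, it is worth also adding hand-written cases the lists don't cover, such as mononyms and family-name-first order.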
taikahessu, over 4 years ago
Also, I found that the first-name list includes names that are most likely, in fact, last names written in ASCII. Heikkila is almost certainly just the common Finnish surname Heikkilä. This makes me wonder how much overlap and discrepancy the lists might actually have.
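The "Heikkila" form is what you get when diacritics are stripped, and the same folding gives a rough way to measure the overlap asked about above. A sketch using the standard `unicodedata` module; the sample lists are hypothetical. Letters such as "ø" or "ł" have no Unicode decomposition, so this folding leaves them unchanged.

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Strip diacritics, e.g. 'Heikkilä' -> 'Heikkila'."""
    # NFKD splits 'ä' into 'a' plus a combining diaeresis; drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def overlap(first_names, last_names) -> set[str]:
    """Names present in both lists once diacritics and case are ignored."""
    def key(s: str) -> str:
        return ascii_fold(s).casefold()
    return {key(n) for n in first_names} & {key(n) for n in last_names}


if __name__ == "__main__":
    print(ascii_fold("Knežević"))
    print(overlap(["Heikkila", "Anna"], ["Heikkilä", "Smith"]))
```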
beilabs, over 4 years ago
It doesn't have my wife's first name. No, you may not know her name... she goes to another school.
Aside: it's not that exhaustive, though her name is a combination of the first two letters of her dad's and her mom's names. It is in the wild, though; I have heard of others with her name.
Arech, over 4 years ago
That's an interesting thing. However, it would be more interesting (and usable, for example, in gamedev) if each name also contained a reference to, for example, the top 3 countries/cultures in which the name is popular.
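Producing such a per-name reference is a straightforward counting job, provided the scraped records kept a country for each occurrence, which nothing in the thread confirms. A minimal sketch under that assumption; the `(name, country_code)` record shape is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable


def top_countries(records: Iterable[tuple[str, str]], k: int = 3) -> dict[str, list[str]]:
    """Map each name to its k most frequent countries.

    `records` is assumed to hold one (name, country_code) pair per
    observed person; ties keep the order in which countries first appeared.
    """
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for name, country in records:
        counts[name][country] += 1
    return {name: [c for c, _ in ctr.most_common(k)] for name, ctr in counts.items()}


if __name__ == "__main__":
    sample = [("Luca", "IT"), ("Luca", "IT"), ("Luca", "CH"),
              ("Luca", "RO"), ("Luca", "DE"), ("Aiko", "JP")]
    print(top_countries(sample))
```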
irrational, over 4 years ago
~160k first names
~100k last names
Out of 7-8 billion people, this is really all (or most) of the first and last names? We aren't a very creative species, I guess. I would especially have expected the number of first names to be at least an order of magnitude larger.
nkrisc, over 4 years ago
I also noticed it contains names like "carpenterjr" in the surnames list, which is almost certainly a data collection error.
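Entries like "carpenterjr" can be flagged with a simple rule: a trailing generational suffix whose remaining stem is itself a known surname. The sketch below is a hypothetical cleanup heuristic, not something the project is known to do; requiring the stem to be a known surname keeps it from flagging legitimate names that merely end in "ii" or "sr".

```python
import re

# Generational suffixes that can get glued onto surnames during scraping.
# Longer alternatives come first so 'iii' is preferred over 'ii'.
SUFFIX_RE = re.compile(r"(iii|ii|iv|jr|sr)$", re.IGNORECASE)


def has_glued_suffix(surname: str, known_surnames: set[str]) -> bool:
    """True if `surname` is a known surname with a suffix like 'jr' appended."""
    match = SUFFIX_RE.search(surname)
    if match is None:
        return False
    stem = surname[: match.start()].lower()
    return stem in known_surnames


if __name__ == "__main__":
    known = {"carpenter", "smith"}
    for s in ["carpenterjr", "smithiii", "carpenter", "Kovacsii"]:
        print(s, has_glued_suffix(s, known))
```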
visarga, over 4 years ago
Other names that are hard to come by: company names and product names. But you can get addresses from OpenStreetMap and OpenAddresses.
akkyakimoto, over 4 years ago
The "exhaustive" list only has 90K surnames, whilst Japan has 300K surname variations.
d--b, over 4 years ago
Cool, my last name is not in this.
The_rationalist, over 4 years ago
Could this help improve state-of-the-art named entity recognition?
neolog, over 4 years ago
When would this be useful?
ffgh, over 4 years ago
I'm curious: what was your stack for scraping this much data?
momeunier, over 4 years ago
It's exhaustive, and yet my first name is not there...
flemhans, over 4 years ago
Good job
ahnick, over 4 years ago
Nice job! It seems like you have a fairly rich data set. I could see this being really useful for soon-to-be parents trying to think up baby names. If this is your use case, or you just find these types of name lists interesting, you may also want to check out https://mashword.com.
Mashword is a word-mashup name generator service that we recently built; it recognizes many common human names. One of our primary use cases is finding alternatives or unique spellings for traditional or common names (e.g. https://mashword.com/search?words=rebecca). It does not support all of the names in these lists, but we are adding and growing our support for names all the time.