I have strong doubts this is exhaustive or that it could be.
For one, it doesn't contain the name "Genoveffa", which is the Italianized form of Jennifer. There are Wikipedia-level people with that name, so it's not that uncommon.[0]<p>There's a practically endless supply of misspelled or localized names outside their country of origin, e.g. Maicol for Michael, Sandiago for Santiago, Uilliam, Villiam, Willian, etc.<p>Additional "anecdata": I live in a country (Hungary) where you have to apply for a special permit to give a kid an "unusual" name, and since my son has an Italian name we had to do that.<p>You can see the list of names people have requested to add, and it includes random stuff like "Magneto" (which isn't on this list either).<p>All this is to say that basically any given word can be a name in some country, so an exhaustive list is likely not possible.<p>Maybe replace with "extensive".<p>[0] <a href="https://en.wikipedia.org/wiki/Genoveffa_Franchini" rel="nofollow">https://en.wikipedia.org/wiki/Genoveffa_Franchini</a>
It sort of sucks that the project is named after a domain ("names.io") but you don't own that domain.<p>I went to the domain because the GitHub repo didn't describe the format of the data. So I'd also beef up the README.
Somewhat related and worth pointing out: not everyone in the world uses a family name, or puts the family name last.<p>My wife was annoyed when she came to the US and found that every form has a “first” and “last” field, because she doesn’t have a last name. Her passport, for example, only has a “name” field.
Wrong things that programmers believe: <a href="https://boingboing.net/2016/10/18/wrong-things-that-programmers.html" rel="nofollow">https://boingboing.net/2016/10/18/wrong-things-that-programm...</a>
Looks like it doesn’t yet incorporate the Census surname data, which has more than 160K U.S. last names: <a href="https://www.census.gov/topics/population/genealogy/data.html" rel="nofollow">https://www.census.gov/topics/population/genealogy/data.html</a>
I have worked on projects where I needed to extract first names and last names. If you want to use this dataset to extract names, here are some caveats:
- first names can be last names as well
- common words can be names as well
- some stop words can be names
- the order can change: you can write firstname, lastname or lastname, firstname
- some names are as short as one letter<p>Using ML can be useful if you can separate people by origin or work with a more homogeneous population.
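To make the first and fourth caveats concrete, here is a minimal sketch of a naive list lookup. The two sets are hypothetical toy data, not the contents of the dataset, and `candidate_readings` is a made-up helper name. It only illustrates how a pure lookup cannot decide which token is the first name when both tokens appear in both lists.

```python
# Hypothetical toy lists; a real run would load the dataset's files instead.
FIRST_NAMES = {"james", "grace", "will", "taylor"}
LAST_NAMES = {"taylor", "james", "smith", "will"}


def candidate_readings(token_a, token_b):
    """Return every (first, last) reading of two tokens that the lists allow."""
    a, b = token_a.lower(), token_b.lower()
    readings = []
    # Reading 1: tokens are in "first last" order.
    if a in FIRST_NAMES and b in LAST_NAMES:
        readings.append((token_a, token_b))
    # Reading 2: tokens are in "last first" order.
    if b in FIRST_NAMES and a in LAST_NAMES:
        readings.append((token_b, token_a))
    return readings


def is_ambiguous(token_a, token_b):
    """True when the lists alone cannot decide which token is the first name."""
    return len(candidate_readings(token_a, token_b)) > 1
```

With these toy lists, "James Taylor" has two valid readings while "Grace Smith" has one, so anything built on list lookup alone needs extra context (position, punctuation, origin) to break the tie.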
Calling it exhaustive is a bit much. It's missing the names of many of my friends.<p>Here in India you get very long and sometimes weird last names, and many of them are not on the list.
It's missing 9/10 of the most popular Croatian surnames, or to be more generous 3/10 if you agree that the Anglicised version of Knežević etc. is the same name. IMO they are not the same.
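The gap between the strict count and the generous count comes down to which equality you pick. A rough sketch, assuming the check is done in Python with the standard `unicodedata` module; `fold_to_ascii` and `coverage` are illustrative names, and the sample lists in the usage note are toy data, not the actual dataset:

```python
import unicodedata


def fold_to_ascii(name):
    """Strip combining marks after NFKD decomposition, e.g. Knežević -> Knezevic.

    Letters with no Unicode decomposition (such as đ) are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def coverage(wanted, dataset, fold=False):
    """Count how many wanted names appear in dataset.

    With fold=True, diacritics and case are ignored on both sides.
    """
    if fold:
        folded = {fold_to_ascii(n).casefold() for n in dataset}
        return sum(fold_to_ascii(n).casefold() in folded for n in wanted)
    return sum(n in dataset for n in wanted)
```

For example, with `wanted = ["Knežević", "Horvat", "Kovačević"]` and a toy `dataset = {"Knezevic", "Horvat", "Kovacevic"}`, strict matching finds one name and folded matching finds all three. Whether the folded match counts as "the same name" is exactly the judgment call above.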
Exhaustive? It does not seem that exhaustive to me.<p>It doesn't have my last name, nor the last names of some family and friends I tried.<p>Perhaps just "a large list of scraped names"?
I wrote a CLI tool for name generation a while back:<p><a href="https://github.com/ironarachne/namegen" rel="nofollow">https://github.com/ironarachne/namegen</a><p>It doesn't have the volume of names that this one does, but it does have custom rules for names (e.g., Icelandic last names), and it can generate Thai names.
I appreciate the work on this repo to date, although, as many comments point out, it is not yet truly exhaustive. Those of us with additional data sources can and should submit pull requests.<p>Thanks for the effort. I intend to use this to enrich my test data generation scripts.
I also found that the first-name list includes entries that are most likely last names folded to ASCII. Heikkila, for example, is most probably just the common Finnish surname Heikkilä. This makes me wonder how much overlap and discrepancy there actually is between the two lists.
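One way to look for such entries, sketched under the assumption that you have a separate reference list of surnames with their original diacritics (the dataset itself may not provide one). `ascii_key` and `suspicious_first_names` are hypothetical helper names:

```python
import unicodedata


def ascii_key(name):
    """Case-insensitive key with combining marks removed (Heikkilä -> heikkila)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def suspicious_first_names(first_names, reference_surnames):
    """Map each first-name entry to the accented surname it may be a folding of."""
    # Only keep reference surnames that actually change when folded.
    folded = {
        ascii_key(s): s
        for s in reference_surnames
        if ascii_key(s) != s.casefold()
    }
    return {
        f: folded[f.casefold()]
        for f in first_names
        if f.casefold() in folded
    }
```

Calling `suspicious_first_names(["Heikkila", "Anna"], ["Heikkilä", "Virtanen"])` flags only "Heikkila". A hit is just a hint that the entry deserves a manual look, not proof that it is misfiled.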
It does not have my wife's first name. No, you may not know her name....she goes to another school.<p>Joking aside, the list is not that exhaustive, though to be fair her name is a combination of the first two letters of her dad's name and of her mom's name. It does occur in the wild, though; I have heard of others with her name.
That's an interesting thing. However, it would be more interesting (and more usable, for example, in gamedev) if each name also carried a reference to, say, the top 3 countries/cultures in which the name is popular.
~160k first names<p>~100k last names<p>Out of 7-8 billion people this is really all (or most) of the first and last names? We aren’t a very creative species I guess. I especially would have expected the number of first names to be at least an order of magnitude larger.
Nice job! It seems like you have a fairly rich data set. I could see this being really useful for soon-to-be parents trying to think up baby names. If that is your use case, or you just find these types of name lists interesting, you may also want to check out <a href="https://mashword.com" rel="nofollow">https://mashword.com</a>.<p>Mashword is a word mashup name generator service that we recently built that recognizes many common human names. One of our primary use cases is finding alternative or unique spellings of traditional or common names (e.g. <a href="https://mashword.com/search?words=rebecca" rel="nofollow">https://mashword.com/search?words=rebecca</a>). It does not support all of the names in these lists, but we are adding to and growing our support for names all the time.