Would you humor a fellow HNer and tell me if you're in your early forties?<p>I happen to be working on a toy machine learning project that, based on the fictional characters known by someone, predicts their approximate age. Your list is the first organic validation set that happened onto my machine!
Nice! May I ask if you collected those manually? Perhaps they should be sorted or divided according to where they come from: books, movies, TV shows, etc.?<p>This list made me realize a sometimes annoying tendency I seem to have developed. It appears that my first reaction to cool things is now not wonder but 'I need to engineer the shit out of it'. My first thought after looking at the list was not 'wow, cool', but more 'so if I use Named Entity Recognition and a large corpus, I could have tens of thousands of such names in hours. Maybe I can catch up on the computational linguistics literature on the issue, and even identify the relative importance of characters in the text. Should be a day-long project'. Need to learn to enjoy things for what they are, sigh.
Nice list; for just names it's a great resource. I see some umlauts, spaces in names, punctuation in names, and it's clearly splittable into first/last name fields. I'm not sure what other ground someone would need to cover.<p>I'm not "book" cultured, so there are a lot of names I don't recognize, but nice shout-outs to 30 Rock and Anchorman :p<p>GitHub complains that it's not a properly formatted CSV file. Maybe consider a TSV? It'd probably still complain.<p>I've yet to use it, but it's been in my back pocket for when I need it. This PHP package looks nice if you need more than just names: <a href="https://github.com/fzaninotto/Faker" rel="nofollow">https://github.com/fzaninotto/Faker</a>
I didn't recognize most of them; I had to search for some to get an idea.<p>If the goal is testing, an improvement would be to add some internationalization. There are no non-English characters in the list. You want to be sure that your first foreign user doesn't break your program.<p>Actually, maybe it would be a nice project to accept pull requests from around the world and create a standard international data set.
Nice list. If I need to generate names for sample data I usually just use the Faker library.<p>When I'm writing database fixtures for use in tests, I like to manually choose names from movies/TV shows for related entities.<p>For example, for an Account with multiple Users I will pick Phil Dunphy for the owner role, Claire Dunphy for the admin role, and Luke/Haley/Alex Dunphy for regular user roles.
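To make the fixture idea concrete, here is a minimal sketch in Python. The `Account` and `User` classes and the `make_dunphy_account` helper are hypothetical stand-ins invented for illustration; they are not from any particular framework or from the parent's code base.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class User:
    name: str
    role: str  # assumed roles: "owner", "admin", "user"


@dataclass
class Account:
    name: str
    users: List[User] = field(default_factory=list)

    def users_with_role(self, role: str) -> List[User]:
        """Return every user on this account holding the given role."""
        return [u for u in self.users if u.role == role]


def make_dunphy_account() -> Account:
    """Build one account whose users are recognizably related to each other."""
    return Account(
        name="Dunphy household",  # hypothetical account name
        users=[
            User("Phil Dunphy", "owner"),
            User("Claire Dunphy", "admin"),
            User("Luke Dunphy", "user"),
            User("Haley Dunphy", "user"),
            User("Alex Dunphy", "user"),
        ],
    )
```

The point of picking one family per account is that a failing assertion mentioning "Claire Dunphy" tells you at a glance which account and role are involved.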
At work we needed a "clean" dataset for a five-character code. We wanted it to be something you could say out loud, e.g. "Hey, are you working on FORKS?" "No, I'm working on CHUCK", so random wasn't an option, and we were afraid an algorithm, like "consonant-vowel-consonant..." would randomly generate naughty words.<p>We ended up using our customers' first names and it was a disaster. We had all kinds of joke entries put in, like "JERK"... My favorite customer name was "POOP LENGTH". lol. /facepalm.<p>Anyway, so in this multimillion dollar enterprise application we're showing "POOP" to the whole company.<p>At least it was an intra-enterprise-only app.
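For what it's worth, the "consonant-vowel-consonant" idea can be paired with a blocklist check. This is only a sketch of that alternative, not what the parent's team did. The letter sets, the `BLOCKLIST` entries, and all function names are assumptions made up for illustration, and a short blocklist will never catch every unfortunate word, so a human review pass is still needed.

```python
import random

CONSONANTS = "BCDFGHJKLMNPRSTVWZ"  # assumed letter set
VOWELS = "AEIOU"
# Placeholder entries; a real list would be much longer and reviewed by people.
BLOCKLIST = {"BUM", "POOP", "JERK"}


def make_code(rng: random.Random, length: int = 5) -> str:
    """Build a pronounceable code by alternating consonants and vowels."""
    return "".join(
        rng.choice(CONSONANTS if i % 2 == 0 else VOWELS) for i in range(length)
    )


def is_clean(code: str, blocklist=BLOCKLIST) -> bool:
    """Reject any code that contains a blocklisted word as a substring."""
    return not any(bad in code for bad in blocklist)


def generate_codes(count: int, seed: int = 0, length: int = 5) -> list:
    """Collect `count` distinct clean codes, reproducibly for a given seed."""
    rng = random.Random(seed)
    codes = set()
    while len(codes) < count:
        code = make_code(rng, length)
        if is_clean(code):
            codes.add(code)
    return sorted(codes)
```

Generating the codes once, reviewing them by hand, and storing the approved set is probably safer than generating them on the fly.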
Is this worth sharing in this community? Has this ever been a problem worth mentioning to anyone? I am only aware of problems with such names when a living person feels their name or image is being used in a defamatory or unauthorized way, but I think anyone can find a fictional or historical name for that task by herself (or use generics such as John Smith/Max Mustermann).
My go-to name is Keyser Söze, which I use whenever I'm writing examples in documentation and such. It also has the advantage of containing a non-ASCII Unicode character.
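A small sketch of why a name like this is useful in tests, using only Python's standard library. The helper names `round_trips_utf8` and `same_name` are made up for illustration. The same visible name can be stored in composed (NFC) or decomposed (NFD) form, and code that compares or measures names without normalizing can treat the two as different.

```python
import unicodedata

NAME = "Keyser S\u00f6ze"  # the "ö" written as one precomposed code point


def round_trips_utf8(text: str) -> bool:
    """Check that encoding to UTF-8 and decoding again preserves the text."""
    return text.encode("utf-8").decode("utf-8") == text


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = unicodedata.normalize("NFC", NAME)    # "ö" as one code point
decomposed = unicodedata.normalize("NFD", NAME)  # "o" plus a combining diaeresis

print(len(composed), len(decomposed))            # code point counts differ
print(composed == decomposed)                    # naive comparison
print(same_name(composed, decomposed))           # normalized comparison
```

Length limits and uniqueness checks are the usual places where the difference between the two forms shows up.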
This isn't a great list because it assumes far too much about what a name is. Patio11 wrote a great blog post about what developers frequently get wrong when it comes to people and names: <a href="http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/" rel="nofollow">http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...</a>