Hey guys,<p>I need to write a script that will try to guess a person's gender based on a given name.<p>What do you think the best way to do this is? I am thinking the best way is to look up the name in a large name/gender database (if one exists).<p>But it seems to me that some maths and syntax should also do the job...<p>What do you guys think?
1. Start with a database of popular names, such as might be found in a baby name book.<p>2. Ask the user to input the name to be guessed.<p>3. Look up the name in the database. If the name supplied by the user is not in the database, add it.<p>4. After the lookup or insertion, guess the gender of the name based on the data from the baby name book, weighted by previous user corrections for that name.<p>5. Print the guess (male or female) to the screen for the user to verify.<p>6. Ask the user whether the guess is correct or incorrect.<p>7. Store the user's answer, linked to that particular name, to improve the accuracy of future guesses for that name.
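The steps above can be sketched roughly like this in Python. This is only an illustration under assumptions: `BABY_BOOK` is a hypothetical stand-in for real baby-name data (the counts are made up), `NameGenderGuesser`, `record_feedback` and `correction_weight` are invented names, and treating one user correction as worth 10 book entries is an arbitrary choice. The interactive prompts (steps 2, 5 and 6) are left out so the logic can be called directly.

```python
from collections import defaultdict

# Hypothetical stand-in for a baby name book (step 1):
# name -> (times listed as male, times listed as female). Counts are made up.
BABY_BOOK = {
    "john": (100, 1),
    "mary": (1, 100),
    "pat": (50, 50),
}


class NameGenderGuesser:
    def __init__(self, seed, correction_weight=10):
        # counts[name] = [male_weight, female_weight]; unseen names start at [0, 0].
        self.counts = defaultdict(lambda: [0, 0])
        for name, (male, female) in seed.items():
            self.counts[name.lower()] = [male, female]
        # Assumption: one user correction counts as much as `correction_weight` book entries.
        self.correction_weight = correction_weight

    def guess(self, name):
        # Steps 3-4: look the name up; indexing the defaultdict adds unseen names.
        male, female = self.counts[name.strip().lower()]
        # Ties (including brand-new names) fall back to "male", an arbitrary choice.
        return "female" if female > male else "male"

    def record_feedback(self, name, guessed, correct):
        # Step 7: turn the user's yes/no answer into a weighted vote for the actual gender.
        actual = guessed if correct else ("female" if guessed == "male" else "male")
        index = 0 if actual == "male" else 1
        self.counts[name.strip().lower()][index] += self.correction_weight


guesser = NameGenderGuesser(BABY_BOOK)
print(guesser.guess("Pat"))  # 50 vs 50 is a tie, so the fallback "male"
guesser.record_feedback("Pat", guessed="male", correct=False)
print(guesser.guess("Pat"))  # female weight is now 60 vs 50
```

Storing weighted counts rather than a single label per name means a name like Pat can drift toward whichever answer users actually give, while well-attested names need many corrections before the guess flips.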
It's hard to give an answer without knowing more about your input data and what you intend to do with the output.<p>As a general principle, "math plus syntax" seems like a highly error-prone approach, especially if you need to process anything that's not a standard English name, or if you have lots of users named Pat. E.g. diminutives of Russian <i>masculine</i> names often end in "a". I'm sure there are names which are masculine in some countries and feminine in others. Etc.
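To make the Russian-diminutive point concrete, here is a hypothetical spelling rule of the "math plus syntax" kind (names ending in "a" are female; `naive_guess` is an invented name, not anyone's actual proposal) applied to a few diminutives of masculine names: Misha (Mikhail), Kolya (Nikolai), Dima (Dmitry) and Vanya (Ivan).

```python
# Hypothetical rule of thumb: a name ending in "a" is female, anything else is male.
def naive_guess(name):
    return "female" if name.lower().endswith("a") else "male"


# Diminutives of Russian masculine names; the rule labels every one of them female.
MASCULINE_DIMINUTIVES = ["Misha", "Kolya", "Dima", "Vanya"]

for name in MASCULINE_DIMINUTIVES:
    print(name, naive_guess(name))
```

Adding more suffix rules can patch individual cases, but each rule is tied to one language's conventions, which is why a lookup against real name data tends to be more reliable.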
<a href="http://amp.ece.cmu.edu/people/Andy/projectpage_names.html" rel="nofollow">http://amp.ece.cmu.edu/people/Andy/projectpage_names.html</a><p>"Estimating Age, Gender, and Identity using First Name Priors"<p>Although the paper looks at this in the context of computer vision -- matching faces in an image with names in the caption -- it should provide some information (and references) on your problem.
Mechanical Turk job.<p>Perform a Google search for the name, bring up the images it returns, and have a Turker identify whether the people in the photos are male or female.