I just sent boardfeedback a link to one message on one of my boards. Eighteen words are marked in this message. Two of them are surnames. *sigh*

Nan
Nan Lambert Starjak wrote:
> I just sent boardfeedback a link to one message on one of my boards.
> Eighteen words are marked in this message. Two of them are surnames.

Why can they not build a list of all the surname lists and then run each message against that? I could do it under DOS so programmers must be able to do it under whatever is being run there.

Ian Singer
--
=========================================================================
See my homepage at http://www.iansinger.com
hosted on http://www.1and1.com/?k_id=10623894
All genealogy is stored in TMG from http://www.whollygenes.com
Charts and searching using TNG from http://www.tngsitebuilding.com
I am near Toronto Canada, can I tell where you are from your reply?
=========================================================================
Ian,

Having spent years tuning linguistic rules for automatic tagging of names of people in documents, I can tell you from experience that it is far more useful to have a list of first names rather than a list of surnames. First names are one of the biggest clues which can be used to locate a surname. First names, occupations, relationships to other people and events (births, marriages, deaths, etc.) are the largest source of contextual clues available for automatically figuring out which words are surnames.

People who do this for a living make lists of first names to enable this type of processing. While some may make lists of surnames, it is actually much less helpful.

If you actually tried to perform such a task yourself, I think you would find it to be much more complex than you thought it would be originally. I spent over two years writing hundreds of regular expressions to capture all sorts of name patterns encountered in a single set of about 14,000 biographies.

I haven't actually seen any of the linked posts yet, but I'll take a look as soon as I can.

--
Mary D. Taffet
Computational Linguist

Ian Singer wrote:
> Why can they not build a list of all the surname lists and then run each
> message against that? I could do it under DOS so programmers must be
> able to do it under whatever is being run there.
>
> Ian Singer
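
To make the first-name approach described above a little more concrete, here is a minimal sketch in Python. It is only an illustration of the general idea, not the rules actually used on the biographies or on the boards. The tiny FIRST_NAMES list, the pattern, and the function name find_surnames are all assumptions made up for this example; a real system would need a far larger name list and many more patterns (occupations, relationships, events, multi-part names).

```python
import re

# Hypothetical, tiny first-name list for illustration only;
# a working system would hold thousands of entries.
FIRST_NAMES = {"John", "Mary", "Sarah", "William"}

# Build one alternation from the first-name list, then look for:
#   known first name, optional middle initial, capitalized word.
# The capitalized word after the first name is treated as a surname candidate.
_first = "|".join(re.escape(name) for name in sorted(FIRST_NAMES))
NAME_PATTERN = re.compile(
    r"\b(?:" + _first + r")\b"   # a known first name
    r"(?:\s+[A-Z]\.)?"           # optional middle initial, e.g. "D."
    r"\s+([A-Z][a-z]+)\b"        # surname candidate
)

def find_surnames(text):
    """Return surname candidates that directly follow a known first name."""
    return NAME_PATTERN.findall(text)

if __name__ == "__main__":
    sample = "John Baker, a baker, married Sarah Young in 1850."
    print(find_surnames(sample))
```

In the sample sentence the lowercase "baker" is left alone, while "Baker" and "Young" are picked up only because a first name precedes them. The sketch has obvious gaps: it would wrongly report "Ann" as the surname in "Mary Ann Smith", and it misses surnames that appear without a first name, which is part of why a full rule set grows so large.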