RootsWeb.com Mailing Lists
    1. Re: [BAd] new "feature"
    2. singhals
    3. As Ian said, this was easy enough to do in DOS, and since C++ and JAVA et al are supposed to be even better, you'd think it'd be just as easy with those. Clearly, though, not.

       Cheryl

       Mary D. Taffet wrote:
       > Ian,
       >
       > Having spent years tuning linguistic rules for automatic tagging of
       > names of people in documents, I can tell you from experience that it is
       > far more useful to have a list of first names rather than a list of
       > surnames. First names are one of the biggest clues which can be used to
       > locate a surname. First names, occupations, relationships to other
       > people and events (births, marriages, deaths, etc.) are the largest
       > source of contextual clues available for automatically figuring out
       > which words are surnames.
       >
       > People who do this for a living make lists of first names to enable this
       > type of processing. While some may make lists of surnames, it is
       > actually much less helpful.
       >
       > If you actually tried to perform such a task yourself, I think you would
       > find it to be much more complex than you thought it would be originally.
       > I spent over two years writing hundreds of regular expressions to
       > capture all sorts of name patterns encountered in a single set of about
       > 14,000 biographies.
       >
       > I haven't actually seen any of the linked posts yet, but I'll take a
       > look as soon as I can.
       >
       > -- Mary D. Taffet
       > Computational Linguist
       >
       > Ian Singer wrote:
       >
       >>Why can they not build a list of all the surname lists and then run each
       >>message against that? I could do it under DOS so programmers must be
       >>able to do it under whatever is being run there.
       >>
       >>Ian Singer
       >>

       --
       There should be no attachments on this message, unless I specifically mentioned them above.
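Mary's point, that a list of first names is a stronger clue for locating surnames than a list of surnames, can be illustrated with a very small sketch. This is not her actual rule set (she describes hundreds of regular expressions). The three-entry `FIRST_NAMES` set and the `candidate_surnames` function are hypothetical names made up for illustration. The sketch covers only one pattern: a known first name, an optional middle initial, then a capitalized word.

```python
import re

# Hypothetical, deliberately tiny first-name list (an assumption for this sketch);
# a real system would load thousands of names.
FIRST_NAMES = {"Mary", "Ian", "Cheryl"}

# A token is a run of letters, optionally followed by a period ("D.", "Singer.").
TOKEN = re.compile(r"[A-Za-z]+\.?")
# A middle initial is a single capital letter followed by a period.
INITIAL = re.compile(r"[A-Z]\.")


def candidate_surnames(text, first_names=FIRST_NAMES):
    """Return capitalized words that directly follow a known first name."""
    tokens = TOKEN.findall(text)
    found = []
    for i, tok in enumerate(tokens):
        if tok not in first_names:
            continue
        j = i + 1
        # Skip one middle initial such as "D." if present.
        if j < len(tokens) and INITIAL.fullmatch(tokens[j]):
            j += 1
        if j < len(tokens):
            nxt = tokens[j].rstrip(".")
            # Accept only a capitalized word that is not itself a first name.
            if nxt[:1].isupper() and nxt not in first_names:
                found.append(nxt)
    return found


print(candidate_surnames("Mary D. Taffet replied to Ian Singer."))
# prints ['Taffet', 'Singer']
```

Even this one pattern misses a lot: "Singer, Ian", hyphenated or multi-word surnames, names with no first name nearby, and capitalized words that merely follow a first name. That fits her warning that the task is much more complex than it first appears.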

    02/29/2008 06:30:27