Note: The Rootsweb Mailing Lists will be shut down on April 6, 2023.
RootsWeb.com Mailing Lists
Re: [BAd] new "feature"
Mary D. Taffet

Cheryl,

The actual programming mechanism itself is irrelevant (I use perl myself). What I was trying to get across is the knowledge on which the programming mechanisms operate, i.e. the interpretation/understanding of the underlying text. THAT is what is lacking/not performing well.

Changing the programming mechanism (DOS, C++, Java, perl, lisp, python, etc.) has no bearing whatsoever on the underlying knowledge base.

-- Mary

singhals wrote:
> As Ian said, this was easy enough to do in DOS, and since
> C++ and JAVA et al are supposed to be even better, you'd
> think it'd be just as easy with those.
>
> Clearly, though, not.
>
> Cheryl
>
> Mary D. Taffet wrote:
>
>> Ian,
>>
>> Having spent years tuning linguistic rules for automatic tagging of
>> names of people in documents, I can tell you from experience that it is
>> far more useful to have a list of first names rather than a list of
>> surnames. First names are one of the biggest clues which can be used to
>> locate a surname. First names, occupations, relationships to other
>> people and events (births, marriages, deaths, etc.) are the largest
>> source of contextual clues available for automatically figuring out
>> which words are surnames.
>>
>> People who do this for a living make lists of first names to enable this
>> type of processing. While some may make lists of surnames, it is
>> actually much less helpful.
>>
>> If you actually tried to perform such a task yourself, I think you would
>> find it to be much more complex than you thought it would be originally.
>> I spent over two years writing hundreds of regular expressions to
>> capture all sorts of name patterns encountered in a single set of about
>> 14,000 biographies.
>>
>> I haven't actually seen any of the linked posts yet, but I'll take a
>> look as soon as I can.
>>
>> -- Mary D. Taffet
>> Computational Linguist
>>
>> Ian Singer wrote:
>>
>>> Why can they not build a list of all the surname lists and then run each
>>> message against that? I could do it under DOS so programmers must be
>>> able to do it under whatever is being run there.
>>>
>>> Ian Singer

02/29/2008 06:50:56
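The first-name clue Mary describes can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical first-name list (FIRST_NAMES) and a single hypothetical rule (surname_candidates) that treats a capitalized word as a probable surname only when it directly follows a known first name. This is not her actual rule set, which she describes as hundreds of regular expressions; it only shows why a first-name list gives useful context that a bare surname list does not.

```python
import re

# Hypothetical first-name list for illustration only; a real tagger would
# use a far larger list, plus nicknames and abbreviations.
FIRST_NAMES = {"John", "Mary", "Sarah", "William"}

# A "word" here is a run of letters, optionally containing apostrophes.
WORD = re.compile(r"[A-Za-z][A-Za-z']*")


def surname_candidates(text):
    """Return capitalized words that directly follow a known first name.

    One hypothetical contextual rule, not a full tagger: it ignores
    punctuation between words, so "John. Smith" spanning a sentence
    break would be mis-tagged.
    """
    words = WORD.findall(text)
    candidates = []
    # Examine each adjacent pair of words: (previous word, current word).
    for prev, word in zip(words, words[1:]):
        if prev in FIRST_NAMES and word[0].isupper():
            candidates.append(word)
    return candidates


print(surname_candidates("Born to John Will and Sarah Last in 1850."))  # ['Will', 'Last']
print(surname_candidates("Will you Register the Last copy?"))  # []
```

In the second sentence, "Will", "Register" and "Last" are capitalized but are not tagged, because no first name precedes them.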

Re: [BAd] new "feature"
singhals

All right, but I don't see that it invalidates Ian's original statement: this was possible to do 10, 15, even 20 years ago.

Whatever the mechanism and whatever it's matching against -- it's been done before and shouldn't have produced these results this time.

Cheryl
Always willing to add ASSEMBLER, FORTRAN, COBOL, and God-save-us-all Turtle Graphics into gone-but-not-forgotten.

--
There should be no attachments on this message, unless I specifically mentioned them above.

02/29/2008 08:09:56

Re: [BAd] new "feature"
Dan Anderson

An earlier post mentioned "All, Will, Any and Last" but in fact ALL is a surname, WILL is a surname.

It was not possible 10, 15, even 20 years ago for the programming mechanism to differentiate between the 2 uses of "all" below:

    All Johnson's out there, please contact me.

and

    John All was born in June, 1899.

They *tried* to weed out the common word usage by requiring the word be capitalized, but as you can clearly see, that is not enough.

I was just over on the FTM Board and one sentence was:

    Did you Register Family Tree Maker?

Register, Tree, Maker were all linked.

On One Of My Boards, I Have A Poster Who Types Every Message Like This Despite The Fact That I Have Begged Her Not To. I haven't been brave enough to look at any of her messages to see how many "surnames" are linked.

Another "not ready for prime time" feature/enhancement....

Dan

02/29/2008 08:16:44
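Dan's examples can be reproduced with a minimal Python sketch of the rule he describes: link any capitalized word that appears in a surname list. The SURNAMES set and the naive_links function are hypothetical; the site's actual list and implementation are not known.

```python
import re

# Hypothetical surname list, stored in lowercase. The list and rules the
# site actually used are not known; these entries are words that can be
# surnames as well as ordinary words.
SURNAMES = {"all", "will", "last", "johnson", "register", "tree", "maker"}


def naive_links(text):
    """Link every capitalized word whose lowercase form is in SURNAMES.

    Models only the capitalization rule: no context (first names,
    dates, relationships) is consulted.
    """
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w[0].isupper() and w.lower() in SURNAMES]


print(naive_links("All Johnson's out there, please contact me."))  # ['All', 'Johnson']
print(naive_links("John All was born in June, 1899."))  # ['All']
print(naive_links("Did you Register Family Tree Maker?"))  # ['Register', 'Tree', 'Maker']
```

Both "All" sentences produce the same link, although only the second uses All as a surname. Telling them apart requires context such as the preceding first name "John", which is the kind of knowledge Mary describes, not anything specific to DOS, C++ or perl.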