This article by Michael John Neill was in the Ancestry Daily News with a note "Please feel free to circulate this newsletter to other genealogy enthusiasts!" Some great ideas on how to find those elusive ancestors....

"SEARCHING FOR PETER BIEGER'S PICKLED PEPPER WEB PAGE: USING BOOLEAN SEARCHES TO LOCATE GENEALOGICAL INFORMATION"
by Michael John Neill
======================================================
Note: Readers unfamiliar with the basics of Boolean operators (using "ands" and "ors") should consult the article in Tuesday's edition of the Ancestry Daily News, which provides a basic overview of these terms and their uses. It is available online at: http://www.ancestry.com/dailynews/01_19_99.htm
************************************************************
The Internet contains a vast amount of information. One significant difficulty is locating web sites that might be useful in researching a specific genealogical problem. Lists of links are one way to deal with this problem. However, they have several drawbacks:

1) The links might be outdated.
2) I must understand the list's organizational structure.
3) The list might not be complete or contain the site I want or need.
4) The list of links will not include EVERY surname that appears on a given web page or site.

This is not a criticism of lists of links, but these drawbacks limit how effectively any set of links can be used.

Using search engines is a much broader approach to finding web pages and is not limited to those pages that have been categorized on a list of links. However, search engines have their own limitations, including:

1) They don't contain every page on the Internet.
2) They contain hundreds of millions of pages.
3) Frequently, query boards are not searched by search engines.
4) Online databases are not searched by search engines.
5) I might have wrong information that hinders my ability to locate the page I need, including:
   a) Incorrect spelling of surname
   b) Incorrect location(s)
   c) Incorrect dates of vital events

Some of these limitations can be hurdled more easily than others. It should be remembered that not every page containing genealogical information is included in the search engines.

An extended example will concentrate on a German immigrant, Peter Bieger. Peter was born in Germany in the 1830s and emigrated to Warsaw, Hancock, Illinois. His wife's name was Barbara. As Bieger is not a common name, it might be best to begin with a simple search such as

peter AND bieger

This search assumes the web page will contain the spelling "bieger." This spelling may be incorrect, or may be one of many variants. Alternate spellings could be incorporated into the search.

peter AND (bieger OR bickert OR berger OR beger)

All the variant spellings are placed in one set of parentheses, replacing the single spelling of the surname. Alternate spellings are connected with ORs and not ANDs, as one page may not contain all possible surname variants. (It should be noted that there are other possible spelling variants which are omitted in the interest of brevity.)

If a search returns too many hits, it may be necessary to refine the search further (this would be especially true if the names were extremely common). The search could be refined by using other known information about the subject, such as Peter's birthplace.

peter AND (bieger OR bickert OR berger OR beger) AND germany

Or the state of Illinois could be used

peter AND (bieger OR bickert OR berger OR beger) AND illinois

There are potential pitfalls to using a location in order to refine the search. One is that a web page containing information on Peter might not mention the word "Germany" or "Illinois." A more serious problem is that the location may be abbreviated: "Ger" for Germany and "Ill," "Ills.," or "IL" for Illinois (while Ills.
is generally no longer used as an abbreviation for Illinois, it may appear in the transcription of an original document). These abbreviations can also be incorporated into the search. Replacing "Germany" with the broader search of "germany OR ger" yields

peter AND (bieger OR bickert OR berger OR beger) AND (germany OR ger)

Again, parentheses are added around the last portion because we are replacing the search for Germany with the broader search of "germany OR ger." It may be necessary to perform a similar search involving Illinois

peter AND (bieger OR bickert OR berger OR beger) AND (illinois OR ill OR ills)

A search can also be conducted that combines both locations.

peter AND (bieger OR bickert OR berger OR beger) AND ( (germany OR ger) AND (illinois OR il) )

It is worth analyzing this search to understand it fully. We can think of this search as being conducted in several parts:

~ Searching for peter
~ Searching for bieger OR bickert OR berger OR beger
~ Searching for germany OR ger
~ Searching for illinois OR il

The last two searches are grouped together by an outer set of parentheses in the original search term. This outer set begins with the second "(" and ends with the final ")", and it encloses both location searches. It is referred to below as Pot 3.

The way this search is constructed, we can think of three large pots:

~ Pot 1 is pages that contain peter
~ Pot 2 is pages that contain bieger OR bickert OR berger OR beger
~ Pot 3 is pages that contain (germany OR ger) AND (illinois OR il)

All three pots are connected with ANDs. This means the only pages that will appear in the final pot are those pages that appear in Pots 1, 2, and 3. Pots 1 and 2 are fairly straightforward. A closer look at Pot 3 is in order.

Pot 3 is a more complex search, which can be thought of in two parts. Part 1 locates those pages that contain "germany" or "ger." Part 2 locates those pages that contain "illinois" or "il."
Parts 1 and 2 are connected with an AND, which means that only pages that appear in both Part 1 and Part 2 will appear in the combination, which has been termed Pot 3.

The previous search requires that both Germany and Illinois (or one of their abbreviations) appear on the web page. It would be reasonable to modify the search so that only one of the locations needs to be on the page. This could be done by replacing the final AND with an OR, obtaining:

peter AND (bieger OR bickert OR berger OR beger) AND ( (germany OR ger) OR (illinois OR il) )

USING COUNTIES AND OTHER LOCALITIES

A search conducted using states and countries as the only localities may still result in a large number of matches, especially if the first and last names being used are common. Using more specific geographic information will narrow your search and should only be done if broader searches produce too many results to search effectively. The county can be added very simply

peter AND (bieger OR bickert OR berger OR beger) AND hancock

In this case the names are unusual enough that using just the county is not a problem. However, there is more than one Hancock County in the United States, and this may result in hits outside the area of research. If so, a more focused search would be:

peter AND (bieger OR bickert OR berger OR beger) AND ( hancock AND (illinois OR il) )

VARIANT SPELLINGS

Variant spellings can always be included in your search. If I'm searching for:

john AND trautvetter

I can replace the search with:

john AND (trautvetter OR troutfetter OR trautfetter OR trantvetter)

Including variant spellings is easy. Replace the original word with a set of parentheses that contains the variant spellings connected with OR. Don't use AND, or the search will require all the variant spellings to be on the same page.

Nicknames present a similar problem to alternate spellings and location abbreviations. They can be dealt with in a similar manner.
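The rule of replacing one word with a set of ORed variants can be sketched in a few lines of Python. This is only an illustration: the helper names or_group and and_query are hypothetical, and the result is nothing more than a text string to paste into a search engine that supports Boolean operators.

```python
def or_group(terms):
    """Join interchangeable words with OR; parenthesize if there are several."""
    cleaned = [t.strip().lower() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("need at least one term")
    if len(cleaned) == 1:
        return cleaned[0]
    return "(" + " OR ".join(cleaned) + ")"


def and_query(*groups):
    """AND several groups together; each group is a list of variants."""
    return " AND ".join(or_group(group) for group in groups)


if __name__ == "__main__":
    # Hypothetical usage: first name, then surname variants.
    print(and_query(
        ["john"],
        ["trautvetter", "troutfetter", "trautfetter", "trantvetter"],
    ))
```

Because each group is just a list of interchangeable words, the same sketch would serve for place abbreviations or nicknames as well as surname variants.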
A search for:

elizabeth AND rampley

can be more effectively entered as:

(elizabeth OR betsy OR beth OR eliza) AND rampley

WHY NOT USE GENEALOGY?

You can use the word "genealogy" as a part of your search (by adding "AND genealogy" to your search phrase). However, not all pages that have genealogical information contain the word "genealogy." The word genealogy can also be misspelled in one of several ways ("geneology" being the most prevalent).

It might seem easier to just search for "Peter Bieger." However, a search of this type (where the exact phrase "Peter Bieger" is searched for) will not catch references where the word "Peter" is not directly in front of the word "Bieger." The page will not be returned as a hit if the name appears as "Bieger, Peter" (as it might in an index); with a middle name between "Peter" and "Bieger" (as it might if someone knows his middle name); or in a phrase similar to "the first Bieger ancestor was named Peter." Chances are you do not want to miss those references.

PROXIMITY OPERATORS

Some search engines allow the use of the NEAR operator in addition to ANDs and ORs. NEAR functions similarly to an AND, but the difference is that the words on either side of the NEAR must be within a certain number of words of each other (a "word" is generally defined to be a series of characters not separated by a space). Some search engines allow the user to enter a number to indicate just how "near," and others have only one setting. Users should read the help pages for the specific search engine they are using to determine if the NEAR operator can be used (not all sites support it) and how to specify the word distance. If the distance is not specified, a default value will be used (generally ten).
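To make the idea of "within a certain number of words" concrete, here is a minimal Python sketch of a NEAR test on a piece of text. It is not how any particular search engine works: the function names are hypothetical, the default distance of ten simply follows the typical value mentioned above, and the sketch assumes punctuation is stripped from words before counting.

```python
import re


def words(text):
    """Split text into lowercase words, ignoring punctuation (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, first, second, distance=10):
    """True if the two words occur within `distance` words of each other,
    in either order."""
    tokens = words(text)
    first_pos = [i for i, w in enumerate(tokens) if w == first.lower()]
    second_pos = [i for i, w in enumerate(tokens) if w == second.lower()]
    return any(abs(i - j) <= distance for i in first_pos for j in second_pos)


if __name__ == "__main__":
    print(near("Bieger, Peter", "peter", "bieger"))
    print(near("the first Bieger ancestor was named Peter", "peter", "bieger"))
    # Two words separated by fifty filler words are not "near" at distance 10.
    print(near("peter " + "filler " * 50 + "bieger", "peter", "bieger"))
```

The first two calls print True and the third prints False. An exact-phrase search would miss the first two examples, while a plain AND would accept all three.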
The search for Peter Bieger can be refined using NEAR, as in:

peter NEAR bieger

This search would result in those pages where the words "peter" and "bieger" are near each other (how "near" depends upon the search engine). The advantage to using NEAR is that the researcher may not be interested in those pages where "peter" and "bieger" are 1,200 words apart. Use of NEAR may be especially desired when searching for common first names or surnames.

If the researcher were looking for web pages on Hancock County, Illinois, a search could be entered using the NEAR operator as:

hancock NEAR illinois

The following phrases (among others) would be located with this search:

"Hancock County, Illinois"
"Town, Hancock, Illinois"
"In Illinois, Hancock County I think"
"State of Illinois, County of Hancock"

and similar phrases where Hancock and Illinois are within ten words of each other.

BUT I DON'T WANT TO TYPE ALL THOSE SEARCHES!

You don't have to. Use the power of your computer. Type the searches into your word processor and then simply cut and paste them from that program into the search box at the search engine's web site. Once you have the searches entered in your word processor, you can use them in whatever search engines you are using (assuming they support Boolean searches and use of the word NEAR). I would not enter the same search in fifty search engines. Using two or three of the major ones should catch the majority of pages.

The searches should be saved so that you can run them again a few weeks or months later. Somewhere in the document that contains the text of your searches, keep the search engine's name and URL and the date you performed the searches. Remember, it's just as important to track online research as it is to track offline research.

AVOID OVERLY COMPLEX SEARCHES

It's possible to create searches more complicated than the ones used here.
However, the more complex your search, the greater the chance that it might not search in exactly the way you think it will (especially if several sets of nested parentheses are used). If you aren't certain how the search will be conducted, you should not use it. Not knowing what you are searching for is not effective and is not good genealogy.

SEARCH ENGINES

ALTAVISTA (http://www.altavista.digital.com): use the advanced search feature, which supports Boolean searches.
HOTBOT (http://www.hotbot.com)
METACRAWLER (http://www.metacrawler.com)

Good Luck!
************************************************************
Michael John Neill is the Course I Coordinator at the Genealogical Institute of Mid America (GIMA), held annually in Springfield, Illinois, and is also on the faculty of Carl Sandburg College in Galesburg, Illinois. Michael is the education columnist for the FGS FORUM and is on the editorial board of the Illinois State Genealogical Society Quarterly. He conducts seminars and lectures on a wide variety of genealogical and computer topics and contributes to several genealogical publications, including Ancestry and Genealogical Computing.