RootsWeb.com Mailing Lists
Subject: Re: +PAGE
From: Dave Mayall

On Mon, 29 Oct 2001 15:24:05 -0000, you wrote:

>Hello Dave,
>You said regarding the way the system works:
>
>> We will try and explain how it works to those who feel the need to
>> know (provided they understand that some of it isn't written yet, and
>> is more in our heads than in reality). The majority wouldn't want to
>> know though (trust me on this one!!)
>
>I feel the need to know. Thus as good an explanation as exists would be
>welcome.

Well, the only complete explanation of how it works is the code. Do you
understand Perl?

Building a quarter up (the noddy guide)

1) The data for the quarter as submitted is grouped into "Accessions".
   An accession is a single contiguous batch of records (e.g. a page).

2) We build a map of which accessions contain records that are common,
   and go through an alignment process. This results in groups of
   matched contiguous data, known as Chunks. A chunk may contain one or
   more Accessions.

[Everything after this doesn't happen yet!]

3) We use the +PAGE,nnn lines at the beginning and end of chunks (along
   with a bit of sense checking on name distributions) to build
   "superchunks", a series of chunks that we are happy to regard as
   contiguous.

   So if

      Smith, John

   is at the end of a chunk followed by +PAGE,123, and we have a chunk
   that starts

      +PAGE,123
      Smith, John Arthur

   we would take that as contiguous. If we have

      +PAGE,123
      Smith, Mary

   we wouldn't.

4) We flag up the start and end points of the superchunks and a human
   decides whether there are missing records. If we are happy that
   there are not, we submit a special record that forces the superchunk
   to merge.

Ultimately, the native sense check and the submitted superchunk forcing
records will lead to the whole quarter being a single superchunk.

-- 
Dave Mayall
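
For anyone who finds code easier to follow than prose, the step 3 check described above can be sketched roughly as below. This is a hypothetical Python illustration, not the project's Perl code (the message says that part isn't written yet). The function names, the representation of a chunk as a list of text lines, and in particular the name "sense check" (same surname and same first forename) are all assumptions made up for the example; the real check on name distributions may well differ.

```python
import re

# Matches a page marker line such as "+PAGE,123".
PAGE_RE = re.compile(r"^\+PAGE,(\d+)$")


def page_number(line):
    """Return the page number from a '+PAGE,nnn' line, or None."""
    match = PAGE_RE.match(line.strip())
    return int(match.group(1)) if match else None


def names_look_adjacent(prev_name, next_name):
    """ASSUMED sense check: same surname and same first forename.

    Names are expected in 'Surname, Forenames' form.
    """
    prev_surname, _, prev_given = prev_name.partition(",")
    next_surname, _, next_given = next_name.partition(",")
    if prev_surname.strip().lower() != next_surname.strip().lower():
        return False
    prev_first = prev_given.lower().split()[:1]
    next_first = next_given.lower().split()[:1]
    return bool(prev_first) and prev_first == next_first


def chunks_contiguous(prev_chunk, next_chunk):
    """Decide whether two chunks may be joined into a superchunk.

    prev_chunk must end with [..., name, '+PAGE,n'] and next_chunk must
    start with ['+PAGE,n', name, ...] for the same page number n.
    """
    if len(prev_chunk) < 2 or len(next_chunk) < 2:
        return False
    end_page = page_number(prev_chunk[-1])
    start_page = page_number(next_chunk[0])
    if end_page is None or end_page != start_page:
        return False
    return names_look_adjacent(prev_chunk[-2], next_chunk[1])


if __name__ == "__main__":
    first = ["Smith, James", "Smith, John", "+PAGE,123"]
    print(chunks_contiguous(first, ["+PAGE,123", "Smith, John Arthur"]))
    print(chunks_contiguous(first, ["+PAGE,123", "Smith, Mary"]))
```

With the two examples from the message, the first pair would be accepted as contiguous and the second would be left for a human to look at, as in step 4.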

    10/29/2001 01:52:34