RootsWeb.com Mailing Lists
Total: 2/2
    1. Re: Single-tree gedcom files question
    2. Peter J. Seymour
    3. On 2011-05-16 15:57, singhals wrote:
       > Peter J. Seymour wrote:
       >> I have been doing some analysis of a selection of the numerous gedcom
       >> files out there. One thing I have found disappointing is that the larger
       >> files tend to consist of a number of fragments rather than a single
       >> tree. In fact, the larger the file, the more likely it is to consist of
       >> fragments. Some large files seem to consist mostly of numerous
       >> unconnected individuals, or couples, or perhaps small trees of three or
       >> four people.
       >> So this seems to be how the really large files are made: throw together
       >> lots of data on the basis that it might be vaguely related.
       >> This set me wondering: How large do single trees get? So here is a
       >> challenge for you all, What is the largest single-tree gedcom you are
       >> aware of, does it consist of sensible data, and more to the point how
       >> large is it (File size in bytes and number of individuals, both metrics
       >> are needed please?
       >
       > As of 10 am EDT on Sunday 15 May 2011, one of my databases has 24744
       > persons in a 18862080M file. .....going back
       > to the 1750s, ......
       >
       > Another database has 14088 persons (descendants and spouses) in a
       > 6197248M file. This one is taken from a 1980s book based on 1970s
       > research; much of it has been confirmed in official records.
       >
       > Neither of these are particularly large in my corner of the world.
       >
       > FWIW.
       >
       > Cheryl

       Thanks. A rough calculation shows the 24744 file as around 10 times more
       fully populated than the "Diana" file previously mentioned. What this
       says to me is that 'number of generations' should be included in tree
       metrics (I suppose that should have been obvious, but better late than
       never). The way it would work is that the larger the number of
       generations covered by a given number of individuals, the less "good"
       the file is.

       In the current version of Gendatam Suite, a 20M file might take around
       80-100M of RAM when loaded. That works fine with modern computers which
       might have as much as 4000M of RAM, but wouldn't have been feasible not
       that many years ago when it was rare for a computer to have more than
       about 8M of RAM. I suppose my point is that modern computers should cope
       well with holding and processing these and larger amounts of data.

       Peter
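
       The question quoted above, whether a gedcom is one tree or a pile of fragments, can be checked mechanically. What follows is only a rough sketch, not how Gendatam Suite or any other program does it. It assumes the file links people through HUSB, WIFE and CHIL lines inside FAM records, and it ignores everything else (FAMS/FAMC pointers, adoptions, associations). The names `tree_sizes` and `SAMPLE` are made up for the illustration.

```python
import re
from collections import Counter

# One GEDCOM line: level, optional @XREF@ id, tag, optional value.
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def tree_sizes(gedcom_text):
    """Return the number of individuals in each connected fragment, largest first.

    Assumption: only HUSB/WIFE/CHIL lines in FAM records count as links.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    individuals = set()
    current_fam = None
    for raw in gedcom_text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        level, xref, tag, value = m.groups()
        if level == "0":
            # A new top-level record starts; remember it if it is a family.
            current_fam = xref if tag == "FAM" else None
            if tag == "INDI" and xref:
                individuals.add(xref)
        elif (level == "1" and current_fam and value
              and tag in ("HUSB", "WIFE", "CHIL")):
            # Join the person to the family, and so to its other members.
            union(current_fam, value.strip())

    counts = Counter(find(i) for i in individuals)
    return sorted(counts.values(), reverse=True)


# Hypothetical six-person file: one family of three, one couple, one loner.
SAMPLE = """
0 HEAD
0 @I1@ INDI
1 NAME A /One/
0 @I2@ INDI
0 @I3@ INDI
0 @I4@ INDI
0 @I5@ INDI
0 @I6@ INDI
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 @F2@ FAM
1 HUSB @I5@
1 WIFE @I6@
0 TRLR
"""

sizes = tree_sizes(SAMPLE)
print(f"{len(sizes)} fragments, largest has {sizes[0]} individuals")
```

       A file is a single tree in the sense asked about when the list has exactly one entry. Comparing the first entry with the total number of individuals gives a rough idea of how fragmented a file is.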

    05/16/2011 10:35:24
    1. Re: Single-tree gedcom files question
    2. singhals
    3. Peter J. Seymour wrote:
       > On 2011-05-16 15:57, singhals wrote:
       >> Peter J. Seymour wrote:
       >>> I have been doing some analysis of a selection of the numerous gedcom
       >>> files out there. One thing I have found disappointing is that the larger
       >>> files tend to consist of a number of fragments rather than a single
       >>> tree. In fact, the larger the file, the more likely it is to consist of
       >>> fragments. Some large files seem to consist mostly of numerous
       >>> unconnected individuals, or couples, or perhaps small trees of three or
       >>> four people.
       >>> So this seems to be how the really large files are made: throw together
       >>> lots of data on the basis that it might be vaguely related.
       >>> This set me wondering: How large do single trees get? So here is a
       >>> challenge for you all, What is the largest single-tree gedcom you are
       >>> aware of, does it consist of sensible data, and more to the point how
       >>> large is it (File size in bytes and number of individuals, both metrics
       >>> are needed please?
       >>
       >> As of 10 am EDT on Sunday 15 May 2011, one of my databases has 24744
       >> persons in a 18862080M file. .....going back
       >> to the 1750s, ......
       >>
       >> Another database has 14088 persons (descendants and spouses) in a
       >> 6197248M file. This one is taken from a 1980s book based on 1970s
       >> research; much of it has been confirmed in official records.
       >>
       >> Neither of these are particularly large in my corner of the world.
       >>
       >> FWIW.
       >>
       >> Cheryl
       >
       > Thanks. A rough calculation shows the 24744 file as around 10 times more
       > fully populated than the "Diana" file previously mentioned. What this
       > says to me is that 'number of generations' should be included in tree
       > metrics (I suppose that should have been obvious, but better late than
       > never). The way it would work is that the larger the number of
       > generations covered by a given number of individuals, the less "good"
       > the file is.
       > In the current version of Gendatam Suite, a 20M file might take around
       > 80-100M of RAM when loaded. That works fine with modern computers which
       > might have as much as 4000M of RAM, but wouldn't have been feasible not
       > that many years ago when it was rare for a computer to have more than
       > about 8M of RAM. I suppose my point is that modern computers should cope
       > well with holding and processing these and larger amounts of data.
       >
       > Peter

       The larger database runs 13 generations from the OP to the newest
       addition. The smaller one is 12 gens, OP to 2010.

       A third database has 41 Gens to the Great Ethelred, 2953216, 3939
       individuals. Data is good to about the 8th gen, as
       good-as-it-gets-in-the-US Gens 9-13, but at the 14th gen it waffles off
       into the 16th century and is accordingly only showing direct-line
       ascent, no sibs.

       Could be an issue in determining a ratio for big/good? A lot of folks do
       just record straight line ancestry. And a lot of folks omit the more
       lyrical connections to Charlemagne or the Caesars or Zeus and Odin ...

       Cheryl
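
       For anyone wanting to put numbers on the big/good question, here is a small sketch using only the figures quoted in this thread. Two things are assumed and nobody in the thread confirmed them: that the file sizes are bytes (the "M" suffix notwithstanding, since the original request asked for bytes), and that "individuals per generation" is a fair stand-in for the "fully populated" idea. The class and property names are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class TreeStats:
    label: str
    size_bytes: int      # assumed to be bytes, see note above
    individuals: int
    generations: int

    @property
    def bytes_per_person(self):
        return self.size_bytes / self.individuals

    @property
    def persons_per_generation(self):
        # Hypothetical density measure: higher means a fuller tree.
        return self.individuals / self.generations


# Figures as posted in the thread.
files = [
    TreeStats("larger database", 18862080, 24744, 13),
    TreeStats("smaller database", 6197248, 14088, 12),
    TreeStats("third database", 2953216, 3939, 41),
]

for f in files:
    print(f"{f.label}: {f.bytes_per_person:.0f} bytes/person, "
          f"{f.persons_per_generation:.0f} persons/generation")
```

       On these figures the third database, which is mostly direct-line ascent, comes out at roughly 96 persons per generation against roughly 1900 for the larger one, so a plain ratio does penalise straight-line ancestry in the way the question above suspects.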

    05/16/2011 08:25:55