On 2011-05-16 19:25, singhals wrote:
> Peter J. Seymour wrote:
>> On 2011-05-16 15:57, singhals wrote:
>>> Peter J. Seymour wrote:
>>>> I have been doing some analysis of a selection of the numerous gedcom
>>>> files out there. One thing I have found disappointing is that the
>>>> larger files tend to consist of a number of fragments rather than a
>>>> single tree. In fact, the larger the file, the more likely it is to
>>>> consist of fragments. Some large files seem to consist mostly of numerous
>>>> unconnected individuals, or couples, or perhaps small trees of three or
>>>> four people.
>>>> So this seems to be how the really large files are made: throw together
>>>> lots of data on the basis that it might be vaguely related.
>>>> This set me wondering: How large do single trees get? So here is a
>>>> challenge for you all, What is the largest single-tree gedcom you are
>>>> aware of, does it consist of sensible data, and more to the point how
>>>> large is it (File size in bytes and number of individuals, both metrics
>>>> are needed please?
>>>
>>> As of 10 am EDT on Sunday 15 May 2011, one of my databases has 24744
>>> persons in a 18862080M file. .....going back
>>> to the 1750s, ......
>>>
>>> Another database has 14088 persons (descendants and spouses) in a
>>> 6197248M file. This one is taken from a 1980s book based on 1970s
>>> research; much of it has been confirmed in official records.
>>>
>>> Neither of these are particularly large in my corner of the world.
>>>
>>> FWIW.
>>>
>>> Cheryl
>>
>> Thanks. A rough calculation shows the 24744 file as around 10 times more
>> fully populated than the "Diana" file previously mentioned. What this
>> says to me is that 'number of generations' should be included in tree
>> metrics (I suppose that should have been obvious, but better late than
>> never). The way it would work is that the larger the number of
>> generations covered by a given number of individuals, the less "good"
>> the file is.
>> In the current version of Gendatam Suite, a 20M file might take around
>> 80-100M of RAM when loaded. That works fine with modern computers which
>> might have as much as 4000M of RAM, but wouldn't have been feasible not
>> that many years ago when it was rare for a computer to have more than
>> about 8M of RAM. I suppose my point is that modern computers should cope
>> well with holding and processing these and larger amounts of data.
>>
>> Peter
>
> The larger database runs 13 generations from the OP to the newest addition.
>
> The smaller one is 12 gens, OP to 2010.
>
> A third database has 41 Gens to the Great Ethelred, 2953216, 3939
> individuals. Data is good to about the 8th gen, as
> good-as-it-gets-in-the-US Gens 9-13, but at the 14th gen it waffles off
> into the 16th century and is accordingly only showing direct-line
> ascent, no sibs. Could be an issue in determining a ratio for big/good?
> A lot of folks do just record straight line ancestry. And a lot of folks
> omit the more lyrical connections to Charlemagne or the Caesars or Zeus
> and Odin ...
>
> Cheryl

No single figure is going to give a good point of comparison for files such as gedcoms with widely differing characteristics. That is why I am musing on what a useful set of metrics might be. So far it has:

- Size
- Number of individuals
- Number of generations

To which could be added something like: percentage of individuals in main (or only) tree. Oh, and some sort of negative factor for any mention of Charlemagne, Pharaohs, Moses etc.

File "goodness" is going to be a rather subjective concept; the metrics are going to be just that: metrics providing points of comparison. Some simple calculations such as ratios may provide useful information about the characteristics (the data make-up) of a file.

Peter
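
For anyone wanting to try the metrics listed above on their own files, here is a rough Python sketch. It is not how Gendatam Suite or any other program does it. The names (parse_families, tree_metrics, SAMPLE) are made up for illustration, and the parser is an assumption that only looks at level-0 INDI/FAM records and level-1 HUSB/WIFE/CHIL links, ignoring everything else in the file. "Generations" is taken here to mean the longest parent-to-child chain anywhere in the file, which is only one possible definition. No attempt is made at the Charlemagne factor.

```python
import re
from collections import defaultdict, deque

# Simplified GEDCOM line: level, optional @XREF@, tag, optional value.
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s+(.*))?$")

def parse_families(text):
    """Return (set of individual ids, list of family dicts)."""
    individuals, families, current = set(), [], None
    for raw in text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        level, xref, tag, value = m.groups()
        value = (value or "").strip()
        if level == "0":
            current = None
            if tag == "INDI" and xref:
                individuals.add(xref)
            elif tag == "FAM":
                current = {"parents": [], "children": []}
                families.append(current)
        elif level == "1" and current is not None and value:
            if tag in ("HUSB", "WIFE"):
                current["parents"].append(value)
            elif tag == "CHIL":
                current["children"].append(value)
    return individuals, families

def tree_metrics(text):
    individuals, families = parse_families(text)
    neighbours = defaultdict(set)   # undirected links within a family
    children_of = defaultdict(set)  # directed parent -> child links
    for fam in families:
        members = fam["parents"] + fam["children"]
        for a in members:
            neighbours[a].update(b for b in members if b != a)
        for parent in fam["parents"]:
            children_of[parent].update(fam["children"])

    # Fragments: connected components found by breadth-first search.
    seen, sizes = set(), []
    for person in individuals:
        if person in seen:
            continue
        seen.add(person)
        queue, size = deque([person]), 0
        while queue:
            cur = queue.popleft()
            size += 1
            for n in neighbours[cur]:
                if n in individuals and n not in seen:
                    seen.add(n)
                    queue.append(n)
        sizes.append(size)

    # Generations: longest parent-to-child chain (memoised depth-first).
    depth = {}
    def depth_from(person, path):
        if person in depth:
            return depth[person]
        if person in path:  # loop in bad data: stop rather than recurse forever
            return 0
        path.add(person)
        d = 1 + max((depth_from(c, path) for c in children_of[person]), default=0)
        path.discard(person)
        depth[person] = d
        return d
    generations = max((depth_from(p, set()) for p in individuals), default=0)

    total = len(individuals)
    size_bytes = len(text.encode("utf-8"))
    largest = max(sizes, default=0)
    return {
        "size_bytes": size_bytes,
        "individuals": total,
        "generations": generations,
        "fragments": len(sizes),
        "pct_in_main_tree": 100.0 * largest / total if total else 0.0,
        "bytes_per_individual": size_bytes / total if total else 0.0,
        "individuals_per_generation": total / generations if generations else 0.0,
    }

# Tiny made-up file: a three-generation line plus one unconnected person.
SAMPLE = """0 HEAD
0 @I1@ INDI
0 @I2@ INDI
0 @I3@ INDI
0 @I4@ INDI
0 @I5@ INDI
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 @F2@ FAM
1 HUSB @I3@
1 CHIL @I4@
0 TRLR
"""

if __name__ == "__main__":
    for name, value in tree_metrics(SAMPLE).items():
        print(name, value)
```

The ratios at the end (bytes per individual, individuals per generation) are only points of comparison in the sense discussed above. A direct-line-only pedigree will score low on individuals per generation without the data being any worse for it.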
On May 16, 9:07 pm, "Peter J. Seymour" <[email protected]> wrote:
> Oh and some sort of negative factor for any mention of Charlemagne,
> Pharaohs, Moses etc.

Is it fair to include Charlemagne there? It's not a subject that I've looked at in any great detail, but I was under the impression that descents from Charlemagne via Berengar I of Frioul and into the early Portuguese royal family were generally considered valid.

Richard