RootsWeb.com Mailing Lists
Subject: Re: Genealogical evidence and data model
From:    singhals
If that's all you're wanting to do, you don't need anything so
highfalutin' as a data model. You need a simple lab notebook used as a
log. Your .RTF file is perfectly good, up to a point, and that point is
where/when/if you have to PROVE you didn't go back and tweak the data
in it to make it fit.

Cheryl

Tony Proctor wrote:
> I did some work in this area Cheryl but I elected to keep a simple
> rich-text description of the blow-by-blow gathering of evidence, e.g.
> where it came from, how, snippets of conversations with individuals
> (copied from email, IM, etc). It felt like projects such as Gentech
> might be trying to over-formalise such data. Obviously a lot of data
> such as linkages, events, dates, and stuff can be formalised, but the
> record of the 'breadcrumb trails' you followed to get that data could
> be as varied in content and format as any of us could imagine. The
> provision of a simple "notes" item to accompany each item of
> formalised data seemed to be a practical compromise.
>
> The use of "rich-text" as opposed to plain text allowed me to embed
> links to specific parts of the formalised data, but that's covered in
> other threads.
>
> Tony Proctor
>
> "singhals" <singhals@erols.com> wrote in message
> news:9N-dnW2F5OThdRHanZ2dnUVZ_oKhnZ2d@rcn.net...
>
>> Robert Grumbine wrote:
>>
>>> Oh well, a new person to the field, with ideas shaped by another,
>>> to whine some about what's available. Nothing new there. But maybe
>>> my whining can provide targets (some things I complain about might
>>> be solved) or, as we continue, some support for doing certain
>>> things could develop. I could write some suitable software to
>>> implement certain ideas, if it looked worthwhile.
>>>
>>> I've done some back reading as I get into the subject, including
>>> the gedcom/xml arguments, and am not really trying to go back to
>>> those.
>>>
>>> One interesting thing to me was the mention of the GENTECH
>>> Genealogical Data Model.
>>> The sad news there being that, apparently, nobody actually
>>> implements it. Or anything particularly close.
>>>
>>> I come to the computing/data from a science field (oceanography),
>>> and one of the things which has promptly bothered me is that the
>>> software available (PAF, Legacy, Reunion) seems far too aimed at
>>> conclusions rather than evidence, and even more poorly aimed at
>>> representing source information trails.
>>>
>>> The evidence trail is something particularly bothersome to me.
>>> From my field, let's say our original observation is that it was
>>> 22.2 C. Now, if that was all we had, we'd be ticked, because it
>>> doesn't tell us when the observation was taken, where it was, or
>>> how it was taken. All these metadata are important, and usually
>>> you can get them (with sufficient patience and phone calls, rather
>>> like genealogy in that, it seems).
>>>
>>> But that is only the proverbial tip of the iceberg. Because that
>>> 22.2 C observation (with the rest of its support) is almost
>>> certainly not exactly the number we're going to use for analyzing
>>> the air-sea heat flux, or sea surface temperature, or whatever it
>>> is we're doing. The thing is, each observing method has biases. We
>>> know this, so we adjust for them as relevant to our problem at
>>> hand. The problem that we _could_ run into is that the 22.2 we now
>>> see is not the actual original observation. Someone could already
>>> have made the adjustment for intake temperature bias. How we avoid
>>> this is that the data are (supposed to be) given histories. The
>>> original observation (and its metadata) are augmented by a new
>>> value and _its_ metadata (22.4 C after George applied John Doe's
>>> intake temperature bias correction, say), and this additional
>>> information then follows along.
>>> I could decide that John Doe's correction method is not the best,
>>> and instead apply, myself, Mary Roe's -- to the original 22.2, now
>>> that I know the 22.4 was after somebody else applied a correction
>>> I don't like to arrive at it. It's not clear to me yet (I've been
>>> doing some light reading of the data model document, but not
>>> carefully nor completely) whether GENTECH supports this sort of
>>> consideration.
>>>
>>> A different problem is that the typical software seems to have
>>> little or no ability to track exactly what the evidence and
>>> sources are. For instance, it seems that if I import a file from
>>> someone and they cite a census record, I have my choice of
>>> ignoring that _my_ source was Jane Genealogist, not the original
>>> record, and preserving the census citation, or I can _add_ Jane as
>>> a source. Now this is a problem, in my mind. When I look later, it
>>> will show two sources -- the census, and Jane. But my real state
>>> of knowledge is only that Jane _said_ the census had some
>>> information. This isn't two independent sources, it's 1 source, 1
>>> step removed from the primary document. (Please, no jumping on
>>> that usage; I realize that there's a trade meaning to the term
>>> 'primary document', and a census isn't an example.) What I want
>>> the software to do is, when I import a file that has citations,
>>> mark that my source is Jane, and her sources were ... whatever she
>>> said. If I'm making a 20th-generation copy/import (of a copy of a
>>> copy ...), then the software should show the prior 19 importers as
>>> well as the original person who looked at a document. GENTECH
>>> seems to support this concern of mine, but with no implementation
>>> thereof, I'm still SOL.
>>
>> First off -- PAF, Legacy, Reunion are all lineage-linked databases.
>> You'll probably be slightly happier with one of the EVENT-linked
>> databases; I know there are at least two, but I remember only one
>> name (The Master Genealogist).
>>
>> Second, when those older programs were being written, a permanent
>> way to record conclusions is what was wanted. NO ONE wanted to have
>> to keep handwriting copies for the family if the computer would
>> print it out for you. TMG came along later, when computer genealogy
>> wasn't quite as insular as it had been. But I'd venture to suggest
>> that out of any 100 genealogists, at least 51% _still_ want a
>> program to record their conclusions so they can print them out.
>> This doesn't mean the 49% are insignificant; it just means they're
>> the minority.
>>
>> Now.
>>
>> I like the concept (I can hear people falling over in droves) of
>> tracking who-said-what-and-when-did-he-say-it. However, let's bring
>> a touch of realism in ... I'll even play fair and use one of my
>> smaller databases as the example.
>>
>> Database L has 2000 names; each name has one source per datapoint
>> (i.e., a source for the name, for the parent relationship, for the
>> bd, for the bp, for the spouse, for the md, for the mp, for the dd,
>> for the dp), which is 10 sources per name, potentially 20,000
>> source entries. By the time that data is re-tagged with each of 20
>> iterations, it is going to be unmanageable. The more supporting
>> documentation (i.e., complete extracts of books, images of
>> documents, etc. etc.) you include, the faster it will become
>> unmanageable.
>>
>> I tried doing it manually for one project, but it palled very
>> quickly.
>>
>> I still like the idea of knowing where you got it, but I'm
>> unconvinced it is worth the programmer's effort or the user's
>> effort of maintaining the chain-of-evidence.
>>
>> Cheryl
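[Editor's illustration] Robert's import scenario in the quoted thread
-- Jane cites the census, so an importer should record Jane as the
immediate source and push the census one step further away -- can be
sketched as a simple chain of citation records. This is a hypothetical
sketch only, not anything GENTECH or any shipping genealogy program
implements; all names and fields here are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Citation:
    """One link in an evidence chain: who or what asserted the data."""
    source: str                                 # e.g. an informant or a record
    derived_from: Optional["Citation"] = None   # what *that* source cited

def import_from(informant: str, received: Optional[Citation]) -> Citation:
    """Record an import: the informant becomes my immediate source;
    whatever they cited is preserved, one step further removed."""
    return Citation(source=informant, derived_from=received)

def chain(c: Optional[Citation]) -> list[str]:
    """Flatten the chain, nearest source first, original document last."""
    out = []
    while c is not None:
        out.append(c.source)
        c = c.derived_from
    return out

# The census record as Jane cited it:
census = Citation("1850 US census, Anytown, p. 12")
# Importing Jane's file: my source is Jane, one step removed.
mine = import_from("Jane Genealogist", census)
# A further copy-of-a-copy adds another importer to the front:
theirs = import_from("John Importer", mine)

print(chain(mine))    # ['Jane Genealogist', '1850 US census, Anytown, p. 12']
print(chain(theirs))  # ['John Importer', 'Jane Genealogist', '1850 US census, Anytown, p. 12']
```

Under this scheme a 20th-generation import would simply be a chain of
length 21: the prior importers in order, then the person who actually
looked at the document -- rather than Jane and the census appearing as
two independent sources.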

    01/30/2008 08:16:17