>> you don't need anything so hi-falautin' as a data-model

...which is basically my point, Cheryl. You do need a data model for the
formalised data, and that model must be flexible enough to cover the
semantic and syntactic issues with variant dates, names, places, etc. All I
was saying is that this kind of audit trail -- the record of how you arrived
at the formalised data -- can simply be attached to each item as a free-form
meta-data tag.

It would seem to be a case of knowing when it is best to formalise data and
when it is best to leave it free-form.

Tony Proctor

"singhals" <singhals@erols.com> wrote in message
news:tM6dnc77o_CMRj3anZ2dnUVZ_oKhnZ2d@rcn.net...
> If that's all you're wanting to do, you don't need anything
> so hi-falautin' as a data-model. You need a simple lab
> notebook used as a log. Your .RTF file is perfectly good,
> up to a point, and that point is where/when/if you have to
> PROVE you didn't go back and tweak the data in it to make it
> fit.
>
> Cheryl
>
> Tony Proctor wrote:
>
> > I did some work in this area, Cheryl, but I elected to keep a simple
> > rich-text description of the blow-by-blow gathering of evidence, e.g.
> > where it came from, how, and snippets of conversations with individuals
> > (copied from email, IM, etc.). It felt like projects such as Gentech
> > might be trying to over-formalise such data. Obviously a lot of data,
> > such as linkages, events, and dates, can be formalised, but the record
> > of the 'breadcrumb trails' you followed to get that data could be as
> > varied in content and format as any of us could imagine. The provision
> > of a simple "notes" item to accompany each item of formalised data
> > seemed to be a practical compromise.
> >
> > The use of "rich-text" as opposed to plain text allowed me to embed
> > links to specific parts of the formalised data, but that's covered in
> > other threads.
> >
> > Tony Proctor
> >
> > "singhals" <singhals@erols.com> wrote in message
> > news:9N-dnW2F5OThdRHanZ2dnUVZ_oKhnZ2d@rcn.net...
> >
> >>Robert Grumbine wrote:
> >>
> >>> Oh well, a new person to the field, with ideas shaped by another,
> >>>to whine some about what's available. Nothing new there. But maybe
> >>>my whining can provide targets (some things I complain about might
> >>>be solved) or, as we continue, some support for doing certain things
> >>>could develop. I could write some suitable software to implement
> >>>certain ideas, if it looked worthwhile.
> >>>
> >>> I've done some back reading as I get into the subject, including
> >>>the gedcom/xml arguments, and am not really trying to go back to those.
> >>>
> >>> One interesting thing to me was the mention of the GENTECH
> >>>Genealogical Data Model. The sad news there being that, apparently,
> >>>nobody actually implements it. Or anything particularly close.
> >>>
> >>> I come to the computing/data from a science field (oceanography)
> >>>and one of the things which has promptly bothered me is that the
> >>>software available (PAF, Legacy, Reunion) seems far too aimed
> >>>at conclusions rather than evidence, and even more poorly aimed
> >>>at representing source information trails.
> >>>
> >>> The evidence trail is something particularly bothersome
> >>>to me. From my field, let's say our original observation is that it
> >>>was 22.2 C. Now, if that was all we had, we'd be ticked, because it
> >>>doesn't tell us when the observation was taken, where it was, or
> >>>how it was taken. All these metadata are important, and usually you
> >>>can get them (with sufficient patience and phone calls, rather like
> >>>genealogy in that, it seems).
> >>>
> >>> But that is only the proverbial tip of the iceberg. Because
> >>>that 22.2 C observation (with the rest of its support) is almost
> >>>certainly not exactly the number we're going to use for analyzing the
> >>>air-sea heat flux, or sea surface temperature, or whatever it is we're
> >>>doing. The thing is, each observing method has biases.
> >>>We know this, so we
> >>>adjust for them as relevant to our problem at hand. The problem that
> >>>we _could_ run into is that the 22.2 we now see is not the actual
> >>>original observation. Someone could already have made the adjustment
> >>>for intake temperature bias. How we avoid this is that the data are
> >>>(supposed to be) given histories. The original observation
> >>>(and its metadata) are augmented by a new value and _its_ metadata
> >>>(22.4 C after George applied John Doe's intake temperature bias
> >>>correction, say), and this additional information then follows along.
> >>>I could decide that John Doe's correction method is not the best,
> >>>and instead apply, myself, Mary Roe's -- to the original 22.2, now
> >>>that I know the 22.4 was arrived at after somebody else applied a
> >>>correction I don't like. It is not clear to me yet (I've been doing
> >>>some light reading of the data model document, but not carefully
> >>>nor completely) whether GENTECH supports this sort of consideration.
> >>>
> >>> A different problem is that the typical software treatment seems
> >>>to be that it has little or no ability to track exactly what the
> >>>evidence and sources are. For instance, it seems that if I import a
> >>>file from someone and they cite a census record, I have my choice of
> >>>ignoring that _my_ source was Jane Genealogist, not the original
> >>>record, and preserving the census citation, or I can _add_ Jane as a
> >>>source. Now this is a problem, in my mind. When I look later, it will
> >>>show two sources -- the census, and Jane. But my real state of
> >>>knowledge is only that Jane _said_ the census had some information.
> >>>This isn't two independent sources; it's 1 source, 1 step removed from
> >>>the primary document. (Please, no jumping on that usage; I realize
> >>>that there's a trade meaning to the term 'primary document', and a
> >>>census isn't an example.)
> >>>What I want the software to do is, when I import
> >>>a file that has citations, mark that my source is Jane, and her
> >>>sources were ... whatever she said. If I'm making a 20th-generation
> >>>copy/import (of a copy of a copy ...), then the software should show
> >>>the prior 19 importers as well as the original person who looked at
> >>>a document. GENTECH seems to support this concern of mine, but
> >>>with no implementation thereof, I'm still sol.
> >>>
> >>
> >>First off -- PAF, Legacy, Reunion are all lineage-linked
> >>databases. You'll probably be slightly happier with one of
> >>the EVENT-linked databases; I know there are at least two, but I
> >>remember only one name (The Master Genealogist).
> >>
> >>Second, when those older programs were being written, a
> >>permanent way to record conclusions is what was wanted. NO
> >>ONE wanted to have to keep handwriting copies for the family
> >>if the computer would print it out for you. TMG came along
> >>later, when computer genealogy wasn't quite as insular as it
> >>had been. But I'd venture to suggest that, out of any 100
> >>genealogists, at least 51% _still_ want a program to record
> >>their conclusions so they can print them out. This doesn't
> >>mean that the 49% is insignificant; it just means it's the minority.
> >>
> >>Now.
> >>
> >>I like the concept (I can hear people falling over in
> >>droves) of tracking who-said-what-and-when-did-he-say-it.
> >>However, let's bring a touch of realism in ... I'll even
> >>play fair and use one of my smaller databases as the example.
> >>
> >>Database L has 2000 names; each name has one source per
> >>datapoint (i.e., a source for the name, for the parent
> >>relationship, for the bd, for the bp, for the spouse, for
> >>the md, for the mp, for the dd, for the dp), which is 10
> >>sources per name, potentially 20,000 source entries. By
> >>the time that data is re-tagged with each of 20 iterations,
> >>it is going to be unmanageable.
> >>The more supporting
> >>documentation (i.e., complete extracts of books, images of
> >>documents, etc., etc.) you include, the faster it will become
> >>unmanageable.
> >>
> >>I tried doing it manually for one project, but it palled
> >>very quickly.
> >>
> >>I still like the idea of knowing where you got it, but I'm
> >>unconvinced it is worth the programmer's effort or the
> >>user's effort of maintaining the chain-of-evidence.
> >>
> >>Cheryl
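[Editor's note: the compromise Tony describes at the top of the thread --
formalised fields plus a free-form "notes" tag carrying the audit trail --
could be sketched as follows. This is a minimal illustration in Python; the
`Event` class and all of its field names are hypothetical, not taken from
GENTECH or from any program mentioned in the thread.]

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One formalised data item, e.g. a birth, marriage, or death."""
    kind: str       # "birth", "marriage", ...
    date: str       # variant date forms kept as text, e.g. "abt 1850"
    place: str
    notes: str = "" # free-form audit trail: how this datum was obtained

# The breadcrumb trail stays as prose attached to the formalised item,
# rather than being forced into its own formal structure.
birth = Event(
    kind="birth",
    date="abt 1850",
    place="Cork, Ireland",
    notes="From phone call with J. Murphy; she read it from a family "
          "bible. Follow-up: parish register not yet checked.",
)
```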
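[Editor's note: the history scheme Robert describes for the 22.2 C reading
-- the original observation kept intact, with each correction appended as a
new value plus its own metadata -- could be sketched like this. The class
and method names are invented for illustration; only the numbers and the
who-applied-what notes come from Robert's example.]

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Revision:
    value: float
    note: str  # who applied which correction, and to what

@dataclass
class Observation:
    original: float                 # the raw reading, never overwritten
    metadata: str                   # when/where/how it was taken
    history: List[Revision] = field(default_factory=list)

    def apply_correction(self, delta: float, note: str) -> float:
        """Append a corrected value; the original is always preserved."""
        corrected = round(self.original + delta, 1)
        self.history.append(Revision(corrected, note))
        return corrected

obs = Observation(22.2, "intake reading, R/V Example, 1998-07-04 12:00Z")
obs.apply_correction(0.2, "John Doe intake-bias correction, applied by George")
# A later analyst can reject that revision and correct the original instead:
obs.apply_correction(0.1, "Mary Roe correction, applied to the original 22.2")
```

Because every revision is derived from `original` rather than from the last
revision, Mary Roe's correction can be applied even after George's, exactly
the recovery Robert wants.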
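[Editor's note: the import problem Robert raises -- recording that _my_
source is Jane Genealogist, and that _her_ source was the census -- amounts
to wrapping the cited source on import rather than copying its citation as
if it were independent. A hypothetical sketch, with invented names
throughout; no genealogy program's actual API is shown.]

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Source:
    name: str
    derived_from: Optional["Source"] = None  # what this source itself cited

    def chain(self) -> List[str]:
        """Full citation chain, nearest informant first."""
        link, out = self, []
        while link is not None:
            out.append(link.name)
            link = link.derived_from
        return out

def import_citation(cited: Source, importer: str) -> Source:
    # On import, record the importer as *my* source, one step removed
    # from whatever they cited -- not as a second independent source.
    return Source(importer, derived_from=cited)

census = Source("1880 US census, ED 12, p. 3")
via_jane = import_citation(census, "Jane Genealogist's GEDCOM, 2005")
```

A 20th-generation copy would simply be 20 nested wrappers, so the chain
shows all 19 prior importers plus the person who saw the document.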