On Wed, 11 May 2011 13:04:35 -0700 (PDT), Richard Smith <[email protected]> wrote:

>On May 11, 5:44 pm, Ian Goddard <[email protected]> wrote:
>
>> I think your requirements ought to be met by all genealogical S/W (give
>> or take choice of platform).  I doubt they're met by any.
>
>Well, I'm pleased you don't think my requirements are unreasonable.  I
>can't say I'm surprised that you don't think any existing software
>meets them, as that was the conclusion I was coming to, too.  I see a
>lot of evenings writing code in my future :-/  The Gentech data model
>seems a pretty good starting point, to me.  It's a shame it seems to
>have spent the last decade languishing unloved and, so far as I can
>see, unimplemented.  Maybe I'm about to discover why.

I've been looking for/pleading for such a program for many years.  I
produced a pattern database with relevant tables in MS Access, for
others to look at and comment on, in the hope that someone would be able
to produce an application that would tie it all together, but no such
luck.

The sample database is at:
http://groups.yahoo.com/group/gensoft/

--
Steve Hayes from Tshwane, South Africa
Web:  http://hayesfam.bravehost.com/stevesig.htm
Blog: http://methodius.blogspot.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk
On Sunday, May 22, 2011 6:07:23 AM UTC-4, Ian Goddard wrote:
> Peter J. Seymour wrote:
> >
> > In my experience merging should be regarded as a one-way process, if you
> > do it, you need to be confident you will not want to revert.
>
> Coming from a scientific background I have real problems with this view.
> Any conclusion is simply the holder's best summary of available data
> at the time.  It is open to being falsified by better data becoming
> available and hence the required confidence you mention must be lacking.
>
> --
> Ian
>
> The Hotmail address is my spam-bin.  Real mail address is iang
> at austonley org uk

Ian,

Of course, I agree with you. The question is, should the evidence we use
to make our genealogical conclusions appear in our databases as entities
of their own, and if so, in what form should they appear?

Today's programs don't allow it. Some, like Gramps, have an event record
that can be used for the "eventa" concept (see my recent response to
Peter), but not an independent (i.e., never merged or modified) "persona"
concept.

The question is, do we need/want codified evidence records as separate
records in our databases or not? My answer is a resounding yes. There are
many opponents to the idea, however. One camp, represented by some of the
Better GEDCOM contributors, believes that evidence, if it is to be
codified into our databases at all, should be included within the source
record of the source that the evidence came from, maybe codified ("marked
up") or maybe just as unstructured notes. This is better than nothing, but
evidence in that form is very hard to compute with.

Tom Wetmore
On Sunday, May 22, 2011 5:46:43 AM UTC-4, Peter J. Seymour wrote:
> On 2011-05-21 22:15, Tom Wetmore wrote:
> > Peter,
> >
> > I do tend to hyperbole, so thanks for calling me out!
> >
> > The Gramps program allows you to add "events" to a database in a
> > standalone manner (I believe). Later you can link those events to the
> > persons. It's still a "person-based" system.
> >
> > As Wes pointed out, there is nothing preventing a user of "ordinary"
> > programs creating separate persons for each "evidence person" and then
> > merging them into the final person later when they decide who is who.
> > This is what I have to do with my program now. The problem with this
> > approach is that once you merge you lose your research history. We need
> > nondestructive merges, which I believe are best done by just building
> > up trees of person records.
> >
> > I will look up the Gendatam program. Thanks for the tip.
> >
> > Tom Wetmore
>
> In my experience merging should be regarded as a one-way process; if you
> do it, you need to be confident you will not want to revert. An
> inappropriate merge can be impossible to recover from without removing
> and re-entering amounts of data. A halfway house would be to have links
> or groups to bring the relevant records together in a controlled way.
> How much further you go depends on how purist you want to be about data
> and reasoning trails. I am all for working to eliminate such links by
> merging. The merge doesn't (or shouldn't) destroy the original evidence,
> although it may discard some of the deductions made along the way.
> However, I am not offended by unmerged records.
>
> Peter

Peter,

My concerns are only that 1) evidence be codified into a useful form in
our databases; 2) our conclusion persons be formed only from information
taken from evidence; and 3) the evidence not be modified or destroyed.
My answer to these needs is quite simple: persona records to hold the
evidence, and person trees with persona records at the leaves to hold the
reasoning and conclusions. The evidence is not merged with this approach.
There may be other workable arrangements, but I am convinced the methods I
have outlined most naturally mimic, and therefore support, the
genealogical research process.

I have steered clear of bringing the concept of events into this, but they
are also key. My method includes event records at the evidence level as
well (since so much evidence describes events involving persons in
multiple roles, rather than just persons). Persona records therefore do
not always occur in isolation, but often as constellations of persona
records with an event ("eventa"?) record that binds them to a time and
place and assigns them roles, allowing the personas to have
context-sensitive attributes (e.g., age at event, residence at event, and
so on). And of course, the roles imply the relationships between the
personas that we are most interested in discovering.

Tom Wetmore
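The persona/eventa/conclusion-tree arrangement Tom describes can be sketched in a few lines of Python. This is only an illustration of the idea, not any actual program's data model; all class names, field names, and sample data below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)              # frozen: codified evidence is never modified
class Persona:
    name: str
    source: str                      # the evidence item this persona came from
    attrs: tuple = ()                # e.g. (("age", "34"), ("residence", "Norwich"))

@dataclass(frozen=True)
class Eventa:                        # an evidence-level event binding personas
    kind: str                        # "baptism", "census entry", ...
    date: str
    place: str
    roles: tuple = ()                # (("child", persona), ("father", persona), ...)

@dataclass
class Person:                        # a conclusion: a tree over the evidence
    reason: str                      # why these children were judged the same person
    children: list                   # Personas and/or lower-tier Persons

    def leaves(self):
        """All personas (codified evidence) under this conclusion."""
        out = []
        for c in self.children:
            out.extend(c.leaves() if isinstance(c, Person) else [c])
        return out

# One baptism record yields two personas, bound to a time and place by an eventa.
child  = Persona("George Seymour", "baptism register 1852")
father = Persona("Thomas Seymour", "baptism register 1852")
bapt   = Eventa("baptism", "1852-03-07", "Holmfirth",
                roles=(("child", child), ("father", father)))

# A later census entry yields another persona; a conclusion joins the two,
# leaving both items of evidence intact at the leaves.
later  = Persona("Thos. Seymour", "1861 census")
thomas = Person("same name, same township, ages agree", [father, later])
print([p.source for p in thomas.leaves()])
```

Because the conclusion is just a node over unmodified leaves, "changing your mind" is rearranging trees, never editing evidence.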
On 05-21-2011 17:15, Tom Wetmore wrote:
> once you merge you lose your research history

Unless you intentionally attach your reasons for merging to the record
somehow.

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157
On 05-21-2011 17:10, Nick Matthews wrote:
> I've yet to be convinced of the usefulness of UUIDs/GUIDs; I'm never sure
> what they're supposed to show. Facts should be linked to their source
> and conclusions linked to the person who made them - anything else seems
> too vague.

Merely another form of primary key, but one which (allegedly) will always
meet the uniqueness constraint even if generated by a different database.
One way to distinguish two "different" records when no other difference
can be seen.

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157
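As a concrete illustration of Wes's point, a version-4 UUID (here from Python's standard library) gives each record a key that will, with overwhelming probability, never collide with a key minted in someone else's database, with no coordination between the two:

```python
import uuid

# Each database mints its own keys independently; no central registry needed.
rec_a = {"id": str(uuid.uuid4()), "name": "John Smith", "source": "1901 census"}
rec_b = {"id": str(uuid.uuid4()), "name": "John Smith", "source": "1901 census"}

# Two records identical in content can still be told apart -- "one way to
# distinguish two 'different' records when no other difference can be seen."
assert rec_a["id"] != rec_b["id"]
print(len(rec_a["id"]))   # canonical form: 36 characters (32 hex digits + 4 hyphens)
```

An integer PK stays the fast internal glue; the UUID travels with the record when it is exported.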
On 21/05/2011 11:39, Ian Goddard wrote:
>
> If you intend being "Able to compare and transfer data directly between
> databases" you need to think in terms of an import and export in a
> format which directly mirrors the data model.  Presumably you're
> thinking in terms of XML for such an external representation.
>

As currently designed, the entire database is a single SQLite3 file. This
has a number of advantages; not least, it's easy to move about and copy,
and we're not constrained by memory size since it doesn't have to be read
in first (although I am a bit concerned about the file size once we start
putting scans and photos into it - I may consider splitting these off).
The program can transfer data by connecting one database to another and
just transferring the records you are interested in. I will be adding
GEDCOM export/import for transfer to/from other programs, but I am not
anticipating "round trip" accuracy; the idea will be to create a new
database from a GEDCOM file and then attach it to your working database.

> Can I suggest that rather than start straight in with an SQL database &
> hard code you prototype with XML?  This suggestion arises from personal
> experience of projects where XML was the medium of data exchange between
> partners and getting the right expression of data in XML was a big help
> in data modeling.  You might then find it more convenient to store data
> as XML fragments rather than conventional database columns.
>

Although the basic design is (already) a relational database, one of the
key elements within it will be the "reference document" - this document
holds the raw data that is used as the source of all facts used. They can
be transcriptions of censuses, certificates or parish register entries, or
summaries of other documents, emails, letters, conversation transcripts or
the musings of the researcher. These documents will (eventually) be in XML
form so you can link words and phrases to individual records.
In effect, the researcher is breaking a document down into atomic parts
and then coalescing the parts of a number of documents into a narrative.
This sort of approach can become tedious and so needs lots of help from
the computer to be workable, which is what I am working on at the moment.
Since it is always possible to add more documents which support (or not)
the current conclusions, you can work in the opposite direction and start
with a tree and add the documentation in afterwards.

> Also, you have integers as PKs in the database design.  This is the most
> efficient way to glue the tables together internally but if it's to meet
> the data exchange aim you need something in the external representation
> which will be unique across databases such as UUID/GUIDs.
>

I've yet to be convinced of the usefulness of UUIDs/GUIDs; I'm never sure
what they're supposed to show. Facts should be linked to their source and
conclusions linked to the person who made them - anything else seems too
vague.
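The "connect one database to another and transfer just the records you are interested in" step maps directly onto SQLite's ATTACH mechanism. A minimal sketch using Python's `sqlite3` module; the `individual` table and its columns are hypothetical, not thefamilypack.org's actual schema:

```python
import sqlite3

# Your working database (in-memory here; normally a single .db file).
work = sqlite3.connect(":memory:")
work.execute("CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT)")

# Attach a second database -- e.g. one freshly created from a GEDCOM import.
work.execute("ATTACH DATABASE ':memory:' AS gedcom")
work.execute("CREATE TABLE gedcom.individual (id INTEGER PRIMARY KEY, name TEXT)")
work.execute("INSERT INTO gedcom.individual VALUES (1, 'Thomas Seymour')")
work.execute("INSERT INTO gedcom.individual VALUES (2, 'Sarah Seymour')")

# Pull across only the records of interest with one cross-database INSERT.
work.execute("""INSERT INTO individual (name)
                SELECT name FROM gedcom.individual WHERE name LIKE 'Thomas %'""")
print(work.execute("SELECT name FROM individual").fetchall())
```

Since both sides are ordinary SQLite files, no intermediate export format is needed for this kind of selective transfer.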
Wes,

Tom:
> Once you merge you lose your research history.

Wes:
>> Unless you intentionally attach your reasons for merging to the record
>> somehow.

You can attach a reason if you merge, but that doesn't enable an "undo"
operation. Imagine the state of affairs after merging a third or fourth
record. Imagine the complexity that would have to be added to the record
to allow undos at that point.

Merging is evil. You never want to do it. Merging is not summarizing. You
want the "tree" of all evidence that goes together to create the final
conclusion persons. That is the "true map" of your reasoning, your
thoughts laid bare. If you change your mind, all you do is rearrange your
trees. No fuss, no muss.

The final conclusion persons should summarize the facts that you believe
best characterize all the information gleaned from all the evidence. There
are very smart ways to do this, including simply letting the conclusion
person "inherit" the information from the evidence records in cases where
there is no conflict of information.

Genealogy is history. History is based on researching records. Records are
holy. Merging records destroys them. Don't mess with the records! Smile,
smile.

Tom Wetmore
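The "inherit when there is no conflict" rule can be sketched in a few lines of Python. The field names are hypothetical, and a real system would also carry per-attribute citations, but the shape of the rule is simple: an attribute flows up to the conclusion only if every evidence record that mentions it agrees.

```python
def inherit(evidence_records):
    """Build a conclusion's attributes from evidence records: an attribute
    is inherited only if all records that mention it agree on its value."""
    conclusion, conflicts = {}, set()
    for rec in evidence_records:
        for key, value in rec.items():
            if key in conflicts:
                continue                     # already known to disagree
            if key in conclusion and conclusion[key] != value:
                del conclusion[key]          # disagreement: leave it to the researcher
                conflicts.add(key)
            else:
                conclusion[key] = value
    return conclusion, conflicts

# Two evidence records for (we hypothesize) the same man:
e1 = {"name": "John Smith", "birth": "1852", "place": "Norwich CT"}
e2 = {"name": "John Smith", "birth": "abt 1851"}
print(inherit([e1, e2]))
```

The evidence records are never altered; the conclusion is recomputable at any time from the leaves of the tree, which is what makes the arrangement nondestructive.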
On 2011-05-20 19:36, Tom Wetmore wrote:
> Richard,
>
> I'm late to this discussion, but thought I'd leap in. Your first
> requirement is sometimes discussed using the terms "record-based
> genealogy" or "evidence-based genealogy".
>
> All current genealogical systems are "person-based" or
> "conclusion-based." That is, you only add information to your database
> that you know to refer to a known person.
.....
> Tom Wetmore

That is not true. Gendatam Suite, for instance, allows a variety of
records to be added to a database as initially standalone records, and I
presume some other systems do too. What I find particularly advantageous
about that arrangement is that you are not constrained to just one method
of working with the data; you can add to it and link it together in
whatever sequence you happen to find convenient. You can even have the
computer help you do the linking. The proof of the pudding is then to run
a chart and examine the results.

Peter
Peter,

I've read all the Gendatam pages, and now I don't think you called me out.
Maybe you can convince me.

I don't see any good way that Gendatam can handle the persona concept,
which in my mind is the hallmark of handling record-based genealogy. A
persona is "codified evidence," that is, all the information that can be
gleaned about a single person from a single item of evidence. It becomes a
record in a database that can be searched for and indexed by name, by any
date that might be mentioned in the evidence, by any place mentioned in
the evidence, and by any property of the person mentioned in the evidence.
Does Gendatam handle these kinds of records? These are not the Gendatam
concept of a person record, since those records are supposed to represent
real persons, in other words, "conclusion persons" as I mentioned before.
Looking at the diagram of the Gendatam model and reading the documentation
in detail, I can't see how these kinds of records can be added to the
database.

Let me ask it this way. Say you collected 75 records about persons with
names you were interested in, but you didn't yet know if they were the
persons you were interested in. What could you do with Gendatam with
respect to those 75 records? I assume you would want to be able to access
all the information in those records quickly by computer, so you wouldn't
want them just as paper copies or as image files. I assume you would want
them codified in some way as records in your database, so you could search
through them, sort them, and group them into different arrangements as you
hypothesized about the real persons they might represent. What would you
do with this information to help you do your reasoning and help you make
your conclusions?
I believe that how this question is answered defines whether a
genealogical application is "just" conclusion-based, or whether it also
supports the evidence part of genealogy. As I said, most genealogical
applications don't give you a good way to support this kind of evidence
data. What you are "supposed" to do with nearly all genealogical programs
today is look at the physical evidence, then know as if by magic exactly
what real person that evidence applies to, and then simply add whatever
new information you learned about the person from that evidence to the
proper person record in your database. So evidence only appears in your
database as attributes of other records.

If Gendatam answers the question this way, and it looks to me like it
does, then it is a conclusion-based system pretty similar to the others.
But if Gendatam has a way to add those 75 records such that the
genealogist can truly use them, reason about them, rearrange them, and
build them into real person records, then I'd agree that Gendatam "crosses
the chasm" (as this problem has been characterized on the Ancestry Insider
blog). I'd recommend that people read the "Crossing the Chasm" blog
entries, as this is exactly the "first requirement" that was brought up by
Richard Smith in the original post on this thread.

Tom Wetmore
Peter,

I do tend to hyperbole, so thanks for calling me out!

The Gramps program allows you to add "events" to a database in a
standalone manner (I believe). Later you can link those events to the
persons. It's still a "person-based" system.

As Wes pointed out, there is nothing preventing a user of "ordinary"
programs creating separate persons for each "evidence person" and then
merging them into the final person later when they decide who is who. This
is what I have to do with my program now. The problem with this approach
is that once you merge you lose your research history. We need
nondestructive merges, which I believe are best done by just building up
trees of person records.

I will look up the Gendatam program. Thanks for the tip.

Tom Wetmore
Tom Wetmore wrote:
> Think of a persona as one of those old-style index cards with all the
> index holes punched all around them

Been there, done that!  Back in the days when I worked in labs we had a
system based on those for identifying wood fragments under the microscope.
You bought the booklet, a stack of printed cards & a notch cutter and cut
cards from the list in the booklet, one card for each species.

--
Ian

The Hotmail address is my spam-bin.  Real mail address is iang
at austonley org uk
Nick Matthews wrote:
>
> I didn't intend to comment on this thread because my project is at a
> very early stage; there's no functional program yet, but there is code
> and the beginnings of a practical (I hope) database design at
> thefamilypack.org - but that was such an accurate description of what I
> am trying to achieve that I feel obliged to point it out.
>
> In particular, you mention standing data; I think this is an area where
> open source collaborative efforts can really be made to work. I'm sure
> that anyone who has spent a few years on their family tree will have
> become expert on some small areas of local history and geography; if
> there was a simple way to contribute that expertise, without commercial
> interests taking advantage, then I'm sure it would happen. FreeBMD and
> friends are a good example of what can be done.
>
> Unfortunately this will be the last piece in what has become a very
> large jigsaw in designing such a system - but at least it is being
> thought about at the beginning of the process.
>
> If anyone does go to the trouble of looking it up - I should point out
> that the database design only includes the minimum necessary for the
> program code that is being written. So if you have a question starting
> "Why doesn't the database include ...." the answer is almost certainly
> "No, not yet, but ...".

If you intend being "Able to compare and transfer data directly between
databases" you need to think in terms of an import and export in a format
which directly mirrors the data model.  Presumably you're thinking in
terms of XML for such an external representation.

Can I suggest that rather than start straight in with an SQL database &
hard code you prototype with XML?  This suggestion arises from personal
experience of projects where XML was the medium of data exchange between
partners, and getting the right expression of data in XML was a big help
in data modeling.
You might then find it more convenient to store data as XML fragments
rather than conventional database columns.

Also, you have integers as PKs in the database design.  This is the most
efficient way to glue the tables together internally, but if it's to meet
the data exchange aim you need something in the external representation
which will be unique across databases, such as UUIDs/GUIDs.

--
Ian

The Hotmail address is my spam-bin.  Real mail address is iang
at austonley org uk
Tom Wetmore wrote:
> The term "persona" was popularized by the GenTech model, but that model
> is so hard to understand (because it is a fully normalized relational
> model, where the normalization completely obfuscates the underlying data
> model), that it has gone nowhere.

Actually, I think the layout of the GenTech model is largely what
obfuscates it.  Redrawing it to fit the work-flow would help.

However, it is an ER model in an increasingly OO world and I think an OO
approach would be better.  "Simple" concepts such as names are really only
simple in a particular cultural setting; forename/surname is only one
possible naming system.  An OO approach would allow for a base class of
"Name" which could be used as a place-holder wherever a name is required
in the data model but implemented by an appropriate sub-class where a name
is required in real data.

The trouble is that what we have in current S/W seems to have been put
together to help those who had done extensive paper research to write it
up, and not to help in that research in the first place.  ISTM that if you
start by considering the research process you automatically come up with
something which is quite like GenTech, for the simple reason that that's
the way the data really is.  Certainly my own initial musings, way before
I encountered the GenTech model, were very much along the lines of a
subset of it - right down to the name "Persona", which again is a pretty
obvious one; they're the /Dramatis personae/ who play the roles in the
events.

--
Ian

The Hotmail address is my spam-bin.  Real mail address is iang
at austonley org uk
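Ian's "Name" base class idea can be sketched in Python (the class names are invented for illustration): the data model only promises that a Name can be displayed and sorted, and culture-specific subclasses decide what that means.

```python
class Name:
    """Base class: the model holds a Name wherever a name is required."""
    def display(self):
        raise NotImplementedError
    def sort_key(self):
        raise NotImplementedError

class WesternName(Name):
    """Forename/surname -- just one possible naming system."""
    def __init__(self, forenames, surname):
        self.forenames, self.surname = forenames, surname
    def display(self):
        return f"{self.forenames} {self.surname}"
    def sort_key(self):
        return (self.surname.lower(), self.forenames.lower())

class PatronymicName(Name):
    """E.g. Icelandic naming: sorted by given name, not the patronymic."""
    def __init__(self, given, patronymic):
        self.given, self.patronymic = given, patronymic
    def display(self):
        return f"{self.given} {self.patronymic}"
    def sort_key(self):
        return (self.given.lower(),)

people = [WesternName("Richard", "Smith"),
          PatronymicName("Björk", "Guðmundsdóttir")]
print([n.display() for n in sorted(people, key=lambda n: n.sort_key())])
```

The index code never needs to know which naming system a record uses; each subclass supplies its own collation rule.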
Ian,

I certainly agree that the genealogical data model should be OO,
consisting obviously of some entities (sources, events, persons, ...) with
some well defined attributes (names, dates, places [if not entities], ...)
and relationships. It's a classic ER/OO setup, and we'd be crazy not to
exploit it as such.

GenTech started as an OO model, but its leader was a relational database
adherent, who insisted that instead of an ER model the GenTech model
should be a set of fully normalized relational tables. For that reason
alone I think GenTech died. (Well, and also the fact that EVERY fact had
to be added with a separate ASSERTION record.) A model should be a high
level ER/OO model, not a model already customized for one particular
representation in one kind of database.

Personally, I think the network database is a much more natural database
for genealogy -- you keep the beautiful and natural structure of the
underlying ER/OO model, with no disadvantages as far as I've ever (20+
years of experience) found. I have used B-tree based databases for all my
genealogical software and have gotten wonderful performance out of them.
Well, the disadvantage is having to provide your own framework for
searching, but this is not all that hard.

In my original software, LifeLines, the database was nothing more than all
the records kept in GEDCOM format, a GEDCOM that could be extended by
whatever keys the user needed. It allowed all the GEDCOM record types, and
allowed users to also define their own record types, though this feature
was never used, and the program gives no support for it. In my current
generation I am also keeping the OO records as the contents of the
database, now in JSON or XML, and adding support for personas (same
datatype as persons with no distinctions), multi-tiered person trees,
event records, place records, and so on.

LifeLines was written when I was a junior genealogist, fully steeped in
the person-based or conclusion-based ethos.
But now I have reached the point in my genealogical life where I have
thousands of "unattached" items of evidence I am sorting through, and I
MUST get that data into a database in the form of personas just so I don't
go insane searching through unorganized pieces of paper, index cards, and
image files on my computer. As personas, the data is indexed, sortable,
searchable and manipulable in many ways.

Think of a persona as one of those old-style index cards with all the
index holes punched all around them (look up McBee edge-notched cards on
Google -- the physical analog of all associative memory systems! -- I
LOVED those things when I was a kid, fifty years ago -- see
http://www.kk.org/thetechnium/archives/2008/06/one_dead_media.php ). Stick
your needle in different holes and your personas get sorted by name, by
date, by place, by residence, by occupation, by whatever. Lots better than
looking at certificates, pages in city directories, pages in family
history books!

Tom Wetmore
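The edge-notched card analogy translates almost directly into code. A small Python sketch (with invented sample data; plain lists stand in for the B-tree indexes Tom mentions): each "needle" is just a different sort or filter key over the same stack of persona records.

```python
# Personas as McBee edge-notched cards: one stack of records, "needled"
# (sorted or filtered) by whichever hole you choose.  Data is hypothetical.
personas = [
    {"name": "Wetmore, Jacob", "year": 1871, "place": "Yarmouth NS", "occ": "mariner"},
    {"name": "Wetmore, Jacob", "year": 1868, "place": "Norwich CT",  "occ": "clerk"},
    {"name": "Smith, John",    "year": 1871, "place": "Norwich CT",  "occ": "mariner"},
]

# Needle in the "date" hole: the stack falls into chronological order.
by_year = sorted(personas, key=lambda p: p["year"])

# Needle in the "occupation" hole: only the mariners drop out.
mariners = [p["name"] for p in personas if p["occ"] == "mariner"]

# Needle in the "place" hole.
norwich = [p["name"] for p in personas if p["place"].startswith("Norwich")]

print(by_year[0]["year"], mariners, norwich)
```

The same records answer every question without being copied, merged, or altered, which is the whole point of keeping evidence as first-class records.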
On 2011-05-20, Joe Makowiec <[email protected]> wrote:
> On 20 May 2011 in soc.genealogy.computing, Peter J. Seymour wrote:
>
>> This amused me. Toying with providing better resilience in report
>> annotation at high generation numbers, I encountered this unhelpful
>> piece of numerology:
>>
>>    Generation 149: G146 Grand Parents List size 16 individuals (out
>>    of 356811923176489970264571492362373784095686656 possible)
>
> I didn't do the calculation out to check the low digits, but the order
> of magnitude is correct.
>
>> As far as I know the number is correct. I would not be surprised if
>> it is more than the total number of atoms in the universe.
>
> Actually, it's about the square root of the total number of atoms in
> the universe, which is estimated at about 10^80.  (The number above is
> 3 * 10^44.)
>
> http://en.wikipedia.org/wiki/Observable_universe#Matter_content
>
>> Obviously, as you go back through a binary ancestral tree, the span
>> doubles at each generation. At the same time, on travelling back in
>> time the total world population size diminishes.

The bottom line is a family tree is _not_ a tree; it's a directed acyclic
graph.  (Some have called it a forest, but I don't remember the precise
definition of a forest.)  Some time between the present and 1500 AD (about
15 generations back), you will most likely find that you are your own
distant cousin--that the same person appears in multiple places in your
tree.  By about 1000 AD (about 30 generations back), you'll have _MANY_
such duplications.  If the tree/graph were printed on paper, it would
resemble more of a kite or diamond with you at the left end and Adam and
Eve at the right end.  In many cases, there will be multiple points in
time where the number of unique ancestors hits a local maximum.

> Brian Pears has written some interesting articles on the subject:
>
> http://www.bpears.org.uk/Misc/AncestorParadox/

Yes, that's kind of what he says, too.
--
Robert Riches
[email protected]
(Yes, that is one of my email addresses.)
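The pedigree-is-a-DAG point is easy to demonstrate in code. In this toy Python pedigree (all names invented), a first-cousin marriage makes two great-grandparents each occupy two ancestor "slots":

```python
from collections import Counter

# parents maps child -> (father, mother).  Hypothetical family in which
# "dad" and "mum" are first cousins: their fathers gp1 and gp3 are
# brothers, sharing parents gg1 and gg2.
parents = {
    "you": ("dad", "mum"),
    "dad": ("gp1", "gp2"), "mum": ("gp3", "gp4"),
    "gp1": ("gg1", "gg2"), "gp3": ("gg1", "gg2"),
}

def ancestor_slots(person, generations):
    """Every filled slot in the binary tree of ancestors, duplicates included."""
    frontier, slots = [person], []
    for _ in range(generations):
        frontier = [p for child in frontier for p in parents.get(child, ())]
        slots.extend(frontier)
    return Counter(slots)

c = ancestor_slots("you", 3)
print(sum(c.values()), len(c))              # 10 slots filled, only 8 distinct people
print([p for p, n in c.items() if n > 1])   # gg1 and gg2 each appear twice
```

A tree would force ten distinct nodes; the DAG correctly records eight people, two of them reached along two paths -- exactly the "you are your own distant cousin" situation.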
On 05-20-2011 21:09, Richard Smith wrote:
> If information for a single event derives from multiple sources (e.g.
> the date of birth and place of birth from separate sources), it's often
> not possible to source those things separately.

Well, with GEDCOM strict, that is true.  That's why I try to maintain
images of sources when possible.

For a long time, my "program" was a text editor, allowing me to do things
GEDCOM supported (loosely) and genie programs didn't.  Even the most
flexible (LifeLines—thank you Thomas!) eventually frustrated me with
limitations.  But eventually, the file got big enough that the need for
assistance overcame the distaste for limitations.

Some of the things I did in the text editor were not GEDCOM-compliant in
the strictest sense, but WorldConnect had no problem with them.  (Like
attaching sources to dates and places, or putting multiple dates on one
event.)

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157
On 05-20-2011 14:36, Tom Wetmore wrote:
> That is, our databases should support the entire "evidence layer" of
> data, which they don't do today, along with the "conclusion layer" of
> data, which is all they support. I believe that the evidence should be
> codified into "person records" that are structurally very similar to the
> conclusion person records. That is, all your John Smiths should be
> encoded into their own "person records."

Seems to me that if one chooses to do so, many systems will allow you to
create additional INDI records and merge them when the evidence is
sufficient.  To me, that sounds like they do support what you describe.

--
Wes Groleau

There are two types of people in the world …
http://Ideas.Lang-Learn.us/barrett?itemid=1157
Wes,

You don't want to merge the "evidence persons," the "personas," into your
conclusion records.  As you say, you can do this today with any of today's
programs.  When you merge, you lose the integrity of your evidence.  The
personas should be permanent, as they are the codification of your
evidence.  There is some issue of how you "inherit" information from the
personas up into the conclusion records, but this has been discussed and
there are solutions.

Also note that one can handle two-level trees (personas are grouped into
conclusion persons) or one can imagine many-tier trees with many levels.
An example of the multi-tiered approach from my own data comes from a
person who is found in a series of city directories of Norwich,
Connecticut, and a series of city directories in Yarmouth, Nova Scotia.
These records all started out as independent personas in my database, one
created from each city directory entry.  When I concluded that all the
ones from Norwich were the same person, I created a conclusion person with
those personas as "leaves" in a conclusion tree.  When I decided all the
ones from Yarmouth, Nova Scotia, were the same person, I created a
conclusion person from them.  Then later, when I decided that the
Connecticut man and the Nova Scotia man were one and the same, I created a
third conclusion person from the two earlier conclusion records.  Now I
have a three-tiered structure of records and conclusions that exactly
matches 1) the data I have found; 2) the conclusions I have made from that
data; and 3) properly structured conclusion persons holding my best
inferences.  This is frankly beautiful.

Tom Wetmore
On May 21, 12:58 am, Wes Groleau <[email protected]> wrote:
> On 05-20-2011 14:36, Tom Wetmore wrote:
>
> > That is, our databases should support the entire "evidence layer" of
> > data, which they don't do today, with the "conclusion layer" of data,
> > which is all they support. I believe that the evidence should be
> > codified into "person records" that are structurally very similar to
> > the conclusion person records. That is, all your John Smiths should be
> > encoded into their own "person records."
>
> Seems to me that if one chooses to do so, many systems will allow you
> to create additional INDI records and merge them when the evidence is
> sufficient. To me, that sounds like they do support what you describe.

Not really.  The problem you get in most systems is that once you've
merged individuals, you lose the ability to see what each source actually
said.  It's often not possible to attach a particular spelling (or
variant) of the name to a particular source.  If information for a single
event derives from multiple sources (e.g. the date of birth and place of
birth from separate sources), it's often not possible to source those
things separately.  And trying to properly source relationships is
impossible in all the current systems without implying information in the
source that's simply not there.

Consider the following example, slightly simplified from a real case in my
family:

 * John is the grandson of Thomas.  [Source: 1901 census.]
 * Thomas and Sarah's marriage only had one child.  [Source: 1911 census.]
 * George is the son of Thomas and Sarah.  [Source: baptism register.]
 * Thomas and Sarah were bachelor and spinster when they married.
   [Source: marriage cert.]
 * Thomas and Sarah lived into their eighties and were buried together.
   [Source: gravestone.]

Taken together these five pieces of information make a pretty strong case
for George as the father of John, as it seems unlikely that Thomas had any
children other than George.
But none of the sources actually say this.  How would you encode this
example in GEDCOM?

For what it's worth, I thought Tom's post was spot on regarding the
deficiencies in the current programs.

Richard
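One way to represent Richard's inference without putting words into any source's mouth is to make each sourced statement its own record and have the conclusion cite the statements, not the sources directly. A Python sketch of that shape (the record layout is hypothetical, and deliberately not GEDCOM, since GEDCOM has no slot for it):

```python
# Each evidence statement is its own record, tied to exactly one source.
statements = {
    "S1": ("John is the grandson of Thomas",                  "1901 census"),
    "S2": ("Thomas and Sarah's marriage had only one child",  "1911 census"),
    "S3": ("George is the son of Thomas and Sarah",           "baptism register"),
    "S4": ("Thomas and Sarah married as bachelor and spinster", "marriage cert."),
    "S5": ("Thomas and Sarah lived into their eighties, buried together", "gravestone"),
}

# The conclusion is a separate record citing the statements it rests on,
# so no single source is made to "say" something it doesn't.
conclusion = {
    "claim":    "George is the father of John",
    "rests_on": ["S1", "S2", "S3", "S4", "S5"],
    "reasoning": "Thomas likely had no child but George, so John's father "
                 "(a child of Thomas) must be George.",
}

for sid in conclusion["rests_on"]:
    text, source = statements[sid]
    print(f"{text}  [{source}]")
```

The proof argument then survives in the database: deleting or doubting any one statement shows exactly which conclusions it supported.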
On 20 May 2011 in soc.genealogy.computing, Peter J. Seymour wrote:

> This amused me. Toying with providing better resilience in report
> annotation at high generation numbers, I encountered this unhelpful
> piece of numerology:
>
>    Generation 149: G146 Grand Parents List size 16 individuals (out
>    of 356811923176489970264571492362373784095686656 possible)

I didn't do the calculation out to check the low digits, but the order of
magnitude is correct.

> As far as I know the number is correct. I would not be surprised if it
> is more than the total number of atoms in the universe.

Actually, it's about the square root of the total number of atoms in the
universe, which is estimated at about 10^80.  (The number above is
3 * 10^44.)

http://en.wikipedia.org/wiki/Observable_universe#Matter_content

> Obviously, as you go back through a binary ancestral tree, the span
> doubles at each generation. At the same time, on travelling back in
> time the total world population size diminishes.

Brian Pears has written some interesting articles on the subject:

http://www.bpears.org.uk/Misc/AncestorParadox/

--
Joe Makowiec
http://makowiec.org/
Email: http://makowiec.org/contact/?Joe
Usenet Improvement Project: http://twovoyagers.com/improve-usenet.org/
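Both figures are quick to verify exactly, since Python has arbitrary-precision integers. Assuming Peter's program counts yourself as generation 1, generation g has 2**(g-1) ancestor slots, and his generation-149 figure is exactly 2**148:

```python
# Peter's "numerology": 2**148 ancestor slots at generation 149.
print(2**148)
# -> 356811923176489970264571492362373784095686656 (about 3.6 * 10**44)

# The ancestor paradox: the slot count overtakes any plausible historical
# world population within a few dozen generations.  One billion is already
# a generous cap on the medieval world's population.
population_cap = 1_000_000_000
g = 1
while 2**(g - 1) <= population_cap:
    g += 1
print(g)   # slots first exceed a billion around generation 31
```

So long before generation 149 the binary tree must fold back on itself, which is Brian Pears's ancestor paradox in one inequality.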