I have been doing some analysis of a selection of the numerous GEDCOM files out there. One thing I have found disappointing is that the larger files tend to consist of a number of fragments rather than a single tree. In fact, the larger the file, the more likely it is to consist of fragments. Some large files seem to consist mostly of numerous unconnected individuals, or couples, or perhaps small trees of three or four people. So this seems to be how the really large files are made: throw together lots of data on the basis that it might be vaguely related. This set me wondering: how large do single trees get? So here is a challenge for you all: what is the largest single-tree GEDCOM you are aware of, does it consist of sensible data, and, more to the point, how large is it? (File size in bytes and number of individuals; both metrics are needed, please.) Peter
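For anyone wanting to answer with both metrics, here is a rough Python sketch of how the "single tree versus fragments" question might be checked. It assumes a plain GEDCOM 5.5-style file in which individuals are level-0 INDI records and families are level-0 FAM records with HUSB, WIFE and CHIL links. The function name tree_sizes and the SAMPLE data are made up for illustration, and real files (odd encodings, CONC/CONT lines, vendor extensions) may well need more care than this.

```python
import re
from collections import defaultdict

# Level-0 record header with a cross-reference, e.g. "0 @I1@ INDI" or "0 @F1@ FAM".
RECORD = re.compile(r"^0\s+(@[^@]+@)\s+(\w+)")
# Family membership link inside a FAM record, e.g. "1 HUSB @I1@".
MEMBER = re.compile(r"^1\s+(HUSB|WIFE|CHIL)\s+(@[^@]+@)")

# Hypothetical sample: one family of three plus one unconnected individual.
SAMPLE = """0 HEAD
0 @I1@ INDI
0 @I2@ INDI
0 @I3@ INDI
0 @I4@ INDI
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
"""


def tree_sizes(gedcom_text):
    """Return the sizes of the connected groups of individuals, largest first."""
    parent = {}  # union-find forest keyed by individual cross-reference

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)

    current_tag = None
    family_members = []
    for raw in gedcom_text.splitlines():
        line = raw.strip()
        header = RECORD.match(line)
        if header:
            xref, current_tag = header.groups()
            family_members = []
            if current_tag == "INDI":
                parent.setdefault(xref, xref)
            continue
        if line.startswith("0 "):
            current_tag = None  # records without a cross-reference: HEAD, TRLR
            continue
        member = MEMBER.match(line)
        if member and current_tag == "FAM":
            person = member.group(2)
            if family_members:
                union(family_members[0], person)  # everyone in a family is connected
            else:
                parent.setdefault(person, person)
            family_members.append(person)

    sizes = defaultdict(int)
    for person in parent:
        sizes[find(person)] += 1
    return sorted(sizes.values(), reverse=True)
```

A file is a single tree when the returned list has exactly one entry; the number of individuals is the sum of the list, and the file size in bytes can be had from os.path.getsize on the file.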
On May 16, 8:20 am, "Peter J. Seymour" <[email protected]> wrote: > This set me wondering: How large do single trees get? So here is a > challenge for you all, What is the largest single-tree gedcom you are > aware of, does it consist of sensible data, and more to the point how > large is it (File size in bytes and number of individuals, both metrics > are needed please? If you google for 'BUELL001.GED' you'll find a large GEDCOM file by someone called Matthew James Buell. It is about 3 MB and contains about 9,900 individuals, virtually all of whom are related. (It's a descent from the biblical Adam, so clearly parts of it are dubious.) I'm sure I've seen larger databases, though I'm not sure I can point you to one at the moment. But this one seems to have become a fairly standard test database for applications, as it's large enough that it starts to hit scalability issues and diverse enough to include a wide range of dates and nationalities. Richard
On May 16, 9:26 am, [email protected] wrote: > I've read thru this thread, and I wonder if there will be any program, > including your own, that will fullfil all things that have been asked here. Lots of very good points here. Will I succeed in producing a usable system that'll include all of the features discussed here? I've no idea: quite possibly not. But equally, with judicious reuse of existing technology and as long as I accept it's not something I can knock up in a spare weekend, I don't see why I can't have a jolly good go at it. Also, the process of implementing it will, I think, teach me a lot about the difficulties in such a system. > It seems to me that you're after a kind of logging system, rather than a > genealogy program. But you want to define persons, places, events, > relations, sources and even dates (and a few more?) as entities Characteristics (or attributes if you prefer) are the main one you've missed. That's for things like occupation, and possibly sex (though in my mind that may be handled separately). > and have n:n > relations (in relational database speech) between all of them, including > recursive relationships on all of them. More or less. So one way of implementing it is a pure RDF-like mapping. For the purpose of this discussion, let's call a person / place / event / date / etc. an entity. We could have a table that simply mapped entity to entity, maybe with a field (such as the RDF predicate) that specifies what the mapping represents. In this model, we might map a person to an event, and the predicate field would tell us the person's role in the event. That's a very abstracted view and means that the database has little or no structure -- all of the structure must be built on top of it at the application layer. The most obvious alternative is to have separate tables for each of the entity types, and then provide mapping tables for each possible many-to-many relationship (and simple reference fields for any many-to-one relationships).
This is closer to what traditional genealogy applications do. The GDM solution is intermediate in that it retains tables for each entity type, but still uses a single assertion table for (almost) all of the entity-to-entity mappings. Each of these approaches has merit. Although the RDF-like approach is very abstract and would need a lot of building on top of it, it would have the advantage that existing RDF tools could be used, for example, in the reasoning parts of the code. The traditional one table per entity and per mapping approach is certainly the simplest at the database level, as long as the range of tables needed doesn't get too large. The GDM approach has the advantage of being a de facto (if unimplemented) standard, seems very well thought out, and is likely to offer good interoperability with any other programs that do something along these lines. > So, is there then any real structure in the data?? Only in the way, as an > example, that a date cannot appear in the place of a link where one would > expect a person reference. But other than that? The structure is largely built on top of it. One way of doing so is through data schemas that are held in further database tables. As an example, I'm proposing that a person's name is simply a collection of textual parts. So, for example, "Richard Andrew Smith" can be parsed as two given names and a surname; or I may choose to distinguish "Richard", the name I actually use, from "Andrew" which I never use. So the Name table contains very little at all (though in practice, it may contain a display name generated from the separate name parts). The Name table would link to a NamePart table with three rows, one for each of "Richard", "Andrew" and "Smith". There would be a NamePartSchema table containing "surname" and "given name". There may possibly be a NameSchema table which allows me to say that typically in Western society, people have one or more given names followed by a surname.
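To make the name tables a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the table names Name, NamePart and NamePartSchema come from the description above. The column names (position, schema_id, value, display_name) and the helper functions are assumptions invented for the example, and the optional NameSchema table is left out.

```python
import sqlite3


def build_demo():
    """Create an in-memory database holding the one example name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE NamePartSchema (
            id    INTEGER PRIMARY KEY,
            label TEXT NOT NULL            -- e.g. 'given name', 'surname'
        );
        CREATE TABLE Name (
            id           INTEGER PRIMARY KEY,
            display_name TEXT              -- optional cached display form
        );
        CREATE TABLE NamePart (
            id        INTEGER PRIMARY KEY,
            name_id   INTEGER NOT NULL REFERENCES Name(id),
            position  INTEGER NOT NULL,    -- order of the part within the name
            schema_id INTEGER NOT NULL REFERENCES NamePartSchema(id),
            value     TEXT NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO NamePartSchema (id, label) VALUES (?, ?)",
        [(1, "given name"), (2, "surname")],
    )
    conn.execute("INSERT INTO Name (id, display_name) VALUES (1, NULL)")
    conn.executemany(
        "INSERT INTO NamePart (name_id, position, schema_id, value) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "Richard"), (1, 2, 1, "Andrew"), (1, 3, 2, "Smith")],
    )
    return conn


def name_parts(conn, name_id):
    """Return (schema label, text) pairs for a name, in order."""
    rows = conn.execute(
        "SELECT s.label, p.value FROM NamePart p "
        "JOIN NamePartSchema s ON s.id = p.schema_id "
        "WHERE p.name_id = ? ORDER BY p.position",
        (name_id,),
    )
    return rows.fetchall()


def display_name(conn, name_id):
    """Generate a display name by joining the parts in order."""
    return " ".join(value for _, value in name_parts(conn, name_id))
```

Because the part types live in a table rather than in column names, adding something like a patronymic or a style such as "junior" would be a new NamePartSchema row rather than a change to the database schema.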
Of these four tables, the Name and NamePart tables are where the data lives; the NameSchema and NamePartSchema tables are for encoding rules about the data -- things that might seem tempting to encode in the database schema itself, but where doing so would effectively result in singling out one set of cultural norms as "right" to the detriment of others. Even to an English person, given names and surnames are not sufficient to encode what a source actually says. The source may say "Mrs Jones" or "James Johnson, junior". These extra styles ("Mrs", "junior") are very helpful to us: in recent times "Mrs" would tell us the woman is married; and "junior" tells us that there is probably a father (or other older male relative) with the same name still alive. But the information conveyed by these styles is culturally dependent. Longer ago in England, "Mrs" was an indication of social status, and in the US "junior" tends not to imply a living father. The latter is an important distinction -- in England we'd probably want the default display name for the individual (i.e. the name displayed on a tree or in a listing) to simply be "James Johnson", but in the US, we're more likely to want "James Johnson, junior". In other countries the differences are more profound. Icelandic patronymics are different from surnames in that a child does not inherit the father's patronymic, but gets one formed from the father's given name. Icelandic patronymics, like surnames in parts of Eastern Europe -- Poland, for example -- vary depending on the sex of the child: "-son" becomes "-dóttir", or "-ski" becomes "-ska". That's a further level of complexity that could be built on top of the database's name schemas. In practice, implementing this is unlikely to be a priority for me as I'm not studying any families where such complexities apply.
Nevertheless, I'd certainly want to have given some thought to how I might implement it (not least because, as recently as the mid-nineteenth century, I have Prussian ancestry, and it doesn't take much imagination to believe I might have Polish or Russian ancestry if I go back a few generations further). > Can such program be built? Yes, but the degree of freedom you want assures > you to frequent (and some substantial) changes to be applied. And changing > the program is one thing, but assuring your data survive those changes is > another story, where ultimate care will be needed. That's very true, and I certainly don't underestimate the problems there. But I do think the problems are manageable. And taking frequent database backups (or XML exports, or whatever) mitigates the risk somewhat. > In the end, you might have to decide on what to spend most of your time: > coding and maintaining your data, or doing proper research. I'm more or less reaching the point where I'm unlikely to discover many more ancestors through on-line research. Yes, there's a gradual trickle of new sources becoming available on-line, and from time to time, I get new leads from them. I can certainly discover new side branches on-line, but my interest in that is rather less. So most new research these days involves a trip to a records office. That's not something I can do after work, or on the spur of the moment if I get a free few hours on a weekend. But I can spend that time coding and organising my data, and, in any case, that's something I enjoy doing. So yes, it'll use up a lot of time, but probably not time that I could have spent doing genealogical research. > As a side note: > There has been discussions about hierarchy of places. I think trying to > register such things is a bad idea in the first place, because such > "relationships" have been so volatile in history.
The only indication one > could give that would remain consistent is something like : place X is part > of (located nearby, ....) Y in 2011. Even geographical coordinates are no > good, since villages etc. have moved in the run of time. I don't disagree that these problems exist, but I do disagree that they're a reason not to try to encode it. I'm not sure whether you're familiar with Phillimore's Atlas and Index of Parish Registers. (If your area of interest doesn't include England, you probably won't be.) It's a very useful map of the parishes of Britain, how they border each other, roughly where they are, and other useful information such as where the parish registers are and when they date to. Until Victorian times, boundary changes and the creation or amalgamation of parishes were really quite rare. In most cases, the parish boundaries in 1530 were still the same in 1830, and it's these boundaries that Phillimore's depicts. Since then, certainly, there's been a flurry of changes, but I don't think that would necessarily devalue something like an electronic version of Phillimore's. Richard
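One way the "place X is part of Y in 2011" idea could be encoded is to attach a date range to every "part of" relationship and resolve the hierarchy for a given year. The Python below is only a sketch of that idea. The PartOf class, the helper functions, the place names and the 1850 boundary change are all hypothetical, invented purely for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartOf:
    child: str
    parent: str
    start: Optional[int] = None  # first year the relationship held; None = open-ended
    end: Optional[int] = None    # last year the relationship held; None = open-ended


def parents_in(relations, place, year):
    """Return the places that `place` was part of in the given year."""
    return [
        r.parent
        for r in relations
        if r.child == place
        and (r.start is None or r.start <= year)
        and (r.end is None or year <= r.end)
    ]


def hierarchy_in(relations, place, year):
    """Walk upwards from `place`, following the first parent valid in `year`."""
    chain = [place]
    seen = {place}
    while True:
        parents = parents_in(relations, chain[-1], year)
        if not parents or parents[0] in seen:  # stop at the top, or on a cycle
            return chain
        chain.append(parents[0])
        seen.add(parents[0])


# Hypothetical data: a parish that moved from one county to another in 1850.
RELATIONS = [
    PartOf("Parish A", "County X", end=1849),
    PartOf("Parish A", "County Y", start=1850),
    PartOf("County X", "Country Z"),
    PartOf("County Y", "Country Z"),
]
```

With this data, hierarchy_in(RELATIONS, "Parish A", 1800) walks through County X and the same call for 1900 walks through County Y, so a volatile hierarchy is recorded rather than avoided.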
On 05-15-2011 17:03, Richard Smith wrote: > coded cultural norms. The example Ian Goddard quoted earlier about > places is particularly apt. For those of us not based in the US, > places simply don't fit into the neat city-county-state hierarchy that > the US has, and that a number of existing programs assume. In > England, parish-county is often a better hierarchy, but I'm not > advocating that either. What I want is the ability to say that in the > English branch of my family, a particular place name hierarchy should > be used, while, say, the German branch should use something > different. It may take longer to set up, but the result is more far > powerful. Many of us with good reason complain about inadequacies in GEDCOM, yet this is something GEDCOM addresses with the FORM tag. Is there even one program that does anything useful with this tag? In other words, you can say what you want to say in GEDCOM, but your program isn't listening. -- Wes Groleau There are two types of people in the world … http://Ideas.Lang-Learn.us/barrett?itemid=1157
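As a rough illustration of what "listening" to the FORM tag might involve: in GEDCOM, a PLAC value is a comma-separated list of jurisdictions, and FORM names what each position means. The Python sketch below pairs the two up. The function name split_place and the example place are invented, and the exact rules (where FORM may appear, how missing levels are written) should be checked against the GEDCOM specification rather than taken from this.

```python
def split_place(place, form):
    """Pair each comma-separated part of a PLAC value with its FORM label.

    Extra parts or extra labels are silently dropped by zip(); a stricter
    reader might want to report the mismatch instead.
    """
    labels = [label.strip() for label in form.split(",")]
    parts = [part.strip() for part in place.split(",")]
    return dict(zip(labels, parts))


# Hypothetical example: an English parish-county hierarchy.
ENGLISH_FORM = "Parish, County, Country"
EXAMPLE_PLACE = "Exampleton, Exampleshire, England"
```

Calling split_place(EXAMPLE_PLACE, ENGLISH_FORM) gives a dictionary keyed by "Parish", "County" and "Country", which is enough for a program to sort or group places by the right level for each branch of a family.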
"singhals" wrote: > Lor' God Almighty. If the current one-size-fits-all approach > isn't working there's no rational cause to believe any other > one-size-fits-all approach (including yours) will work better. The whole point of this thread is that I don't believe a one-size-fits- all approach is a good one at all. That's why, for example, I want a rules engine with a configurable set of fuzzy rules instead of hard- coded cultural norms. The example Ian Goddard quoted earlier about places is particularly apt. For those of us not based in the US, places simply don't fit into the neat city-county-state hierarchy that the US has, and that a number of existing programs assume. In England, parish-county is often a better hierarchy, but I'm not advocating that either. What I want is the ability to say that in the English branch of my family, a particular place name hierarchy should be used, while, say, the German branch should use something different. It may take longer to set up, but the result is more far powerful. > ANY approach works better for some folks than for others. > NO approach works best for everyone. I agree with those two points. > ALL approaches work for some. But I don't agree with that. Just because a program allows someone to rapidly amass a collection of names and set it out as a family tree doesn't mean it's working. You only need to look at some of the garbage in some of, say, Ancestry.com's trees to see that there's a lot of so-called research out there that's patently wrong. The way that GEDCOM doesn't adequately support sourcing of data does everyone a disservice, as do programs that base their data models too heavily on GEDCOM. Similarly for programs that make it hard to accurately enter source information (again, as many GEDCOM-based programs do). > Far's I can tell, most genealogy programs allow you to say > what country you're in and none of 30 or so I've used or > seen demonstrated prohibit the addition of OS/NS as > appropriate. 
But quite a lot can't *default* to something sensible. There are plenty of programs that simply cannot be configured so that 1 Jan 1720 is *by default* treated as a date in the Julian calendar -- i.e. that it is a Friday rather than a Monday -- and that it is the day after 31 Dec 1720 (because the year number changed on 25 March rather than on 1 January). Certainly most programs allow you to specify this, but many do not allow this to be a default. Unless it has been changed very recently, Gramps is an example of such a program. In Gramps, if I enter "1720-01-01 (Julian,Mar25)", it will do the right thing, but so far as I'm aware, it cannot be made to do the right thing if I simply enter "1720-01-01". And Gramps is not the only program I've used that has this problem. > > I'm far more interested in using (and if necessary, writing) a program > > that's flexible enough to cope with the subtleties and ambiguities > > that crop up in real life, than something that's easy to use and has a > > pretty interface. (That said, I don't necessarily see these as > > Of you think I prefer pretty over functional, you're sadly > mistaken. If you're wanting to change globally the > background color or your font color, go for it and God > Bless. If you're proposing to permit font, font style > and/or font color changes INDIVIDUALLY within a single > database -- how can I help? Why on earth are you talking about font colours? Who even mentioned fonts or colours? Compromising flexibility for a pretty user interface is a common problem. Take a look at any word processor to see it in action. They're acceptable if you want to write a letter, or even a privately published pamphlet. But few, if any, professional publishing houses would use a word processor to lay out books. In the scientific community, a piece of software called TeX (or more commonly these days, LaTeX or XeTeX) is used instead of a word processor. It's vastly more flexible and powerful, and allows much better consistency and conferability than a word processor does.
And its interface makes it *far* less easy to use -- it doesn't even have a graphical user interface, for example. You see similar problems with genealogy programs. There was a discussion on this newsgroup around a year ago about software coping with an individual with two possible sets of parents. Although some software can handle this, most cannot. And one commonly given reason for this is that there's no clean way of displaying it in a pedigree view. That's almost a textbook example of where the desire for a pretty interface is allowed to adversely influence the data model. > But, IME, subtleties and ambiguities are not susceptible to > automated analysis with any reliability, because sure as > check the very next after you end testing will FUBAR and > bite you on the sit-upon. Again, you're missing the point. It may or may not be possible to analyse complex cases, and the results will certainly need human checking. But that's not the point. The point is that the information should be stored in a computer-readable form. Suppose I have an individual that I'm interested in: let's call him John Smith. I've found two possible baptisms for him. I'm fairly confident he must be either the son of Henry and Sarah or of George and Mary, but I don't know which. Now suppose some time later I'm researching another family in that area and I discover quite unambiguously that John, the son of George and Mary, emigrated to Argentina and died childless out there. I enter that in my program, but I've forgotten that I had previously been interested in these families. I want my program to notice that the original ambiguity can be resolved. Maybe it won't do it automatically -- it's probably best if it doesn't -- but it should, at least, draw my attention to it. > > contradictory goals.) If I'm going to spend thousands of hours using > > the software to organise my research, I certainly don't begrudge a > > extra quarter-hour configuring it so that it's right for me.
You're > > right that not everyone will agree, but it's unlikely that an event- > > oriented genealogy program, as opposed to the many existing lineage- > > oriented ones, will appeal to them anyway. > > Perhaps I've not yet had my weekly allotment of caffeine, > but it seems to me you keep changing what you want to do. > First you're wanting to write a program that's "better" than > what's available. Then when we point out that "better" is > fairly subjective, you say you're looking only to customize > YOUR database and revert to writing code. I don't think I've made any substantive changes to what I'm intending to do. Whether I write a program, modify someone else's code, or use an existing program as-is is largely irrelevant. I'm not looking to make work for myself, but I do need a program that matches the three requirements set out in my very first email. That hasn't changed. If I write a program, it'll be open source, and if I modify an existing one, then of necessity it will be. I would hope the program might appeal to others too, but that's not a primary goal. If others don't want to use it, then I care only because it's likely to be symptomatic of bad design decisions that will adversely affect me too at some stage. I've no idea what you think "looking to customize [my] database and revert to writing code" means, other than writing a program, and in any case, I didn't use phrases like those. > Commercial computer programs are written to prevent novices > and children from doing irreparable damage to the facts and > getting the developer sued; they are rarely written for the > purpose of facilitating experimentation by advanced users, > because there is no way to ensure that ONLY advanced users > use the product. That's true of some products, but it's certainly not true of all. The Unix ethos, which has largely been inherited by Linux, does not generally prevent people doing irreparable damage. It's a completely different mindset. 
The Unix philosophy is to give people powerful tools and expect them to learn to use them properly. If they don't, then it's not the developer's fault if they break something. Sometimes that philosophy can go too far, but generally not. In any case, the idea of the developer getting sued is laughable. Invariably all such liability is disclaimed. > >> Worse, in some cases, families > >> combine cultures, norms, religions ... few would want a > >> program that forces them to use a different program or > >> dataset for each branch of the family. > > > A good point, but one that I think can be solved easily enough. The > > Gentech data model supports dividing your research into separate > > projects. Although they were seemingly designed for entirely self- > > contained areas of research, what's interesting is that these projects > > do not need to be self contained. It would be easy enough to use them > > as a way of dividing research into separate areas to which different > > settings could be applied. That way I can apply different settings to > > my English, Irish, German and Jèrriais ancestors. > > >> More, I'm not sure you can call a rule with enough > >> exceptions to fit a "rule", fuzzy or not, because > >> eventually, fuzzy logic loses its logic. > > > Now you're just arguing about semantics. Would you be happier if I > > used the term "cultural norm" instead of "fuzzy rule"? Whatever you > > No, because I was thinking more of "I before E except ..." > which is a standard "rule" in English and American, but > which has so much verbiage after the 'except' that it is > all-but useless as a 'rule'. [see also, weird vs yield] The rule "'i' before 'e'" works around 90% of the time -- I know because I've just checked all the words in my dictionary. Add the exception "except after 'c'" and it improves quite a bit further.
Whilst I think your comparison with English spelling has no validity whatsoever, I personally would welcome a genealogical rule that held well over 90% of the time. > > call them, they're potentially useful. I want my genealogy program to > > draw it to my attention if I have a sixty-year-old woman giving > > birth. Certainly it's not impossible, but it's sufficiently unusual > > that I want alerting to it. > > That's something most genealogy programs can be told to do. > If yours doesn't, try PAF or Legacy or FTM. > > OTOH, if you're wanting an event-based program, try The > Master Genealogist. www.whollygenes.com You evidently haven't read the first email in this thread. Had you done so you would see immediately why I can't use any of these programs. But don't let the facts stand in the way of you expressing your opinion: you certainly haven't done so far. Richard
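The kind of configurable "fuzzy rule" being argued about here, one that flags an unusual case for attention without forbidding it, might look something like the Python sketch below. It is only an illustration: the AgeRule class, the check_mother_age function and the 15-to-50 thresholds are assumptions chosen for the example, not taken from any existing program.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgeRule:
    name: str
    min_age: int  # youngest plausible age; configurable, not a hard limit
    max_age: int  # oldest plausible age; configurable, not a hard limit


# Hypothetical default; a different project could supply its own thresholds.
DEFAULT_RULE = AgeRule(name="mother's age at birth", min_age=15, max_age=50)


def check_mother_age(mother_birth_year, child_birth_year, rule=DEFAULT_RULE):
    # type: (int, int, AgeRule) -> Optional[str]
    """Return a warning message if the implied age is unusual, else None.

    The data is never rejected: the caller decides what to do with a warning.
    """
    age = child_birth_year - mother_birth_year
    if age < rule.min_age or age > rule.max_age:
        return f"{rule.name}: about {age}, outside {rule.min_age}-{rule.max_age}; please check"
    return None
```

A sixty-year-old mother produces a warning and a twenty-five-year-old does not, and because the rule is data rather than code, separate projects could carry different rules.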
On 2011-05-15 02:31, Wes Groleau wrote: > On 05-13-2011 03:11, Peter J. Seymour wrote: >> It is however in my experience rare for an event to be assigned more >> than one possible date. > > What do you mean by "assigned" ? I _often_ found competing claims in > primary sources for the date of an event. Try "found to have" or "has" for instance, possibly in conjunction with "apparently". > > I have also found different dates for an event reported BY the person > the event pertained to. Fair enough: if you are satisfied they are the same event, the event has multiple dates pending further information. > > I found a date of death in a book allegedly of cemetery transcriptions > published by a genealogical society, then went to the cemetery and found > that the month was clearly legible on the stone and not the month stated > in the book. > > etc. > We probably all have similar scenarios. It depends at what level I am recording information, but sometimes I "cheat" and don't record an identified wrong date (or at least only as a note, not as a full-blown date), but then I am not doing it commercially. Peter
On Sat, 14 May 2011 21:58:37 -0400, singhals wrote: > Wes Groleau wrote: >> On 05-13-2011 03:11, Peter J. Seymour wrote: >>> It is however in my experience rare for an event to be assigned more >>> than one possible date. >> >> What do you mean by "assigned" ? I _often_ found competing claims in >> primary sources for the date of an event. >> >> I have also found different dates for an event reported BY the person >> the event pertained to. >> >> I found a date of death in a book allegedly of cemetery transcriptions >> published by a genealogical society, then went to the cemetery and >> found that the month was clearly legible on the stone and not the month >> stated in the book. >> >> etc. >> >> > And of course my GGF -- his hobby was lying about the year of his birth. > > If he hadn't held on to the same day-month, I'd never recognize him. > > Cheryl That would be similar to my Grandmother who actually celebrated 2 birthdays. The one she was 'actually' born on, and the one given to her by the Welsh Gypsies who abducted her as a child. My researches indicate that she was actually not abducted, but raised by an aunt who ran a 'baby farm' using the same forename but a different surname. At her marriage age 20 one Sarah Ann disappeared from public record and another with a different surname appeared listing a different father to the one on her birth certificate. Later censuses showed the aunt who raised her appeared as a visitor and the children of 'siblings' from the baby farm as nephews and nieces. I was able, using customizing, to accommodate all this in Gramps.
On 05-14-2011 21:58, singhals wrote: > Wes Groleau wrote: >> On 05-13-2011 03:11, Peter J. Seymour wrote: >>> It is however in my experience rare for an event to be assigned more >>> than one possible date. >> >> What do you mean by "assigned" ? I _often_ found competing claims in >> primary sources for the date of an event. >> >> I have also found different dates for an event reported BY the person >> the event pertained to. >> >> I found a date of death in a book allegedly of cemetery transcriptions >> published by a genealogical society, then went to the cemetery and found >> that the month was clearly legible on the stone and not the month stated >> in the book. >> >> etc. >> > > And of course my GGF -- his hobby was lying about the year of his birth. > > If he hadn't held on to the same day-month, I'd never recognize him. My GGGGGGM aged eight years from one census to the next. :-) I have a family history with the author's birthdate and those of half of his many siblings one day different from those his mother reported in Civil War Pension paperwork. -- Wes Groleau There are two types of people in the world … http://Ideas.Lang-Learn.us/barrett?itemid=1157
Wes Groleau wrote: > On 05-13-2011 03:11, Peter J. Seymour wrote: >> It is however in my experience rare for an event to be assigned more >> than one possible date. > > What do you mean by "assigned" ? I _often_ found competing claims in > primary sources for the date of an event. > > I have also found different dates for an event reported BY the person > the event pertained to. > > I found a date of death in a book allegedly of cemetery transcriptions > published by a genealogical society, then went to the cemetery and found > that the month was clearly legible on the stone and not the month stated > in the book. > > etc. > And of course my GGF -- his hobby was lying about the year of his birth. If he hadn't held on to the same day-month, I'd never recognize him. Cheryl
On 05-13-2011 03:11, Peter J. Seymour wrote: > It is however in my experience rare for an event to be assigned more > than one possible date. What do you mean by "assigned" ? I _often_ found competing claims in primary sources for the date of an event. I have also found different dates for an event reported BY the person the event pertained to. I found a date of death in a book allegedly of cemetery transcriptions published by a genealogical society, then went to the cemetery and found that the month was clearly legible on the stone and not the month stated in the book. etc. -- Wes Groleau There are two types of people in the world … http://Ideas.Lang-Learn.us/barrett?itemid=1157
Well, someone needs to come up with something pdq. Modern day 'partnerships' will drive future family tree people crazy... :-) Paul
Richard Smith wrote: > "singhals" wrote: >> Richard Smith wrote: > >>> My point is that fuzzy rules *such as* these are useful, not that >>> these specific rules are of universal applicability. In England, >> >> Many posters to this list/newsgroup aren't in England and/or >> don't deal with British research. > > English norms are the example I'm using because they're the ones I'm > most familiar with, and I don't imagine they're will be totally alien > to others on the newsgroup, even if they're not especially applicable > to their own research. But as I just said, rules *such as these* are > useful. The specific rules I was talking about may not be. > >> And, not everyone wants to re-jigger their program. > > Unfortunately, that's precisely the attitude that has resulted in the > current generation of programs, most of which are grossly inadequate. > The "one size fits all" approach doesn't work well, at least not for Lor' God Almighty. If the current one-size-fits-all approach isn't working there's no rational cause to believe any other one-size-fits-all approach (including yours) will work better. ANY approach works better for some folks than for others. NO approach works best for everyone. ALL approaches work for some. > something as complicated as genealogical research. To take an > example, I want my program to know that if I enter a date in 1720, > unless I tell it otherwise, it's in the Julian calendar, but if I > enter a date in 1820, it's a Gregorian date. That's something that > needs customising. In Britain, the calendar change happened in 1752; > in France, in 1582; in Russia, not until 1918. For many people. most > of the time, they'll be dealing with one country and having a default > makes sense; but others might be dealing with a family that moved > around frequently, and for them a default calendar change might be > more of a nuisance than a help. 
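The default being asked for in the quoted paragraph could be as small as a lookup table. The Python sketch below assumes year-level granularity, which is a simplification: it treats the whole changeover year as Gregorian, whereas the switch actually happened part-way through a year. The three years are the ones given in the quoted text; the names CHANGEOVER_YEAR and default_calendar are made up for the example.

```python
# Year of the Julian-to-Gregorian change, as stated in the quoted text.
CHANGEOVER_YEAR = {
    "Britain": 1752,
    "France": 1582,
    "Russia": 1918,
}


def default_calendar(year, region, overrides=None):
    """Return 'Julian' or 'Gregorian' as the default calendar for a date.

    `overrides` lets a user or project replace or extend the table.
    Returns None when no default is configured for the region, so that
    families who moved around can simply have no default at all.
    """
    table = dict(CHANGEOVER_YEAR)
    if overrides:
        table.update(overrides)
    changeover = table.get(region)
    if changeover is None:
        return None
    return "Julian" if year < changeover else "Gregorian"
```

Under this sketch a bare date in 1720 in a British project defaults to Julian and one in 1820 to Gregorian, while an explicit calendar entered with the date would still take precedence.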
> Far's I can tell, most genealogy programs allow you to say what country you're in and none of 30 or so I've used or seen demonstrated prohibit the addition of OS/NS as appropriate. > I'm far more interested in using (and if necessary, writing) a program > that's flexible enough to cope with the subtleties and ambiguities > that crop up in real life, than something that's easy to use and has a > pretty interface. (That said, I don't necessarily see these as Of you think I prefer pretty over functional, you're sadly mistaken. If you're wanting to change globally the background color or your font color, go for it and God Bless. If you're proposing to permit font, font style and/or font color changes INDIVIDUALLY within a single database -- how can I help? But, IME, subtleties and ambiguities are not susceptible to automated analysis with any reliability, because sure as check the very next after you end testing will FUBAR and bite you on the sit-upon. > contradictory goals.) If I'm going to spend thousands of hours using > the software to organise my research, I certainly don't begrudge a > extra quarter-hour configuring it so that it's right for me. You're > right that not everyone will agree, but it's unlikely that an event- > oriented genealogy program, as opposed to the many existing lineage- > oriented ones, will appeal to them anyway. > Perhaps I've not yet had my weekly allotment of caffeine, but it seems to me you keep changing what you want to do. First you're wanting to write a program that's "better" than what's available. Then when we point out that "better" is fairly subjective, you say you're looking only to customize YOUR database and revert to writing code. 
Commercial computer programs are written to prevent novices and children from doing irreparable damage to the facts and getting the developer sued; they are rarely written for the purpose of facilitating experimentation by advanced users, because there is no way to ensure that ONLY advanced users use the product. >> Worse, in some cases, families >> combine cultures, norms, religions ... few would want a >> program that forces them to use a different program or >> dataset for each branch of the family. > > A good point, but one that I think can be solved easily enough. The > Gentech data model supports dividing your research into separate > projects. Although they were seemingly designed for entirely self- > contained areas of research, what's interesting is that these projects > do not need to be self contained. It would be easy enough to use them > as a way of dividing research into separate areas to which different > settings could be applied. That way I can apply different settings to > my English, Irish, German and Jèrriais ancestors. > >> More, I'm not sure you can call a rule with enough >> exceptions to fit a "rule", fuzzy or not, because >> eventually, fuzzy logic loses its logic. > > Now you're just arguing about semantics. Would you be happier if I > used the term "cultural norm" instead of "fuzzy rule"? Whatever you No, because I was thinking more of "I before E except ..." which is a standard "rule" in English and American, but which has so much verbiage after the 'except' that it is all-but useless as a 'rule'. [see also, weird vs yield] > call them, they're potentially useful. I want my genealogy program to > draw it to my attention if I have a sixty-year-old woman giving > birth. Certainly it's not impossible, but it's sufficiently unusual > that I want alerting to it. That's something most genealogy programs can be told to do. If yours doesn't, try PAF or Legacy or FTM. OTOH, if you're wanting an event-based program, try The Master Genealogist. 
www.whollygenes.com Cheryl
On 2011-05-13 12:08, Richard Smith wrote: > On May 13, 8:11 am, "Peter J. Seymour"<[email protected]> > wrote: > >> That is why in the Gendatam system an event can be standalone or linked >> to any number of people and/or evidence records and can have any number >> of dates. > > Thanks for pointing out the Gendatam data model. That's a new one on > me, and I'll definitely spend some time over the weekend reading up on > it. > >> It is however in my experience rare for an event to be >> assigned more than one possible date. > > I think it depends a bit on how you use it. Lots of sources -- > censuses, gravestones, marriage certificates to give a few examples -- > give the age of a person which allows you to infer the date of birth > to within a year. But it's not uncommon, in my experience at least, > for these ages to be wrong. Often it's unlikely that I'll find a > precise date of birth, but it's convenient if programs can display an > approximate date -- for example, so I can easily distinguish in a list > of names between John Smith (b c1650) and John Smith (b c1800). Most > software doesn't seem to do this automatically, so it's sometimes > worth adding birth events for them. But if you do, then you have to > deal with incompatible data. > > In the case of a birth, we know that a person is only born once ..... > > Richard Quite so. The other tricky one is absence of data such as failure to find a date at all when you have a name for someone. (My most common hurdle). Peter
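The age-based inference Richard describes in the quoted text (a stated age in a census or on a gravestone pins the birth down to within a year, and two such sources can disagree) can be sketched roughly as follows. This is only a minimal illustration: the function names, the record dates and the ages are all made up for the example, and it assumes the stated age is in completed years and is accurate, which, as noted above, is often not the case.

```python
from datetime import date, timedelta
from typing import Optional

Window = tuple[date, date]  # (earliest, latest) possible birth date


def _years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birth_window(record_date: date, stated_age: int) -> Window:
    """Birth dates consistent with a stated age in completed years."""
    # Latest: the person reached the stated age on the record date itself.
    latest = _years_before(record_date, stated_age)
    # Earliest: one day later than a birth that would make them a year older.
    earliest = _years_before(record_date, stated_age + 1) + timedelta(days=1)
    return earliest, latest


def combine(windows: list[Window]) -> Optional[Window]:
    """Intersect several windows; None means the sources are incompatible."""
    earliest = max(w[0] for w in windows)
    latest = min(w[1] for w in windows)
    return (earliest, latest) if earliest <= latest else None


def label(window: Window) -> str:
    """Rough display label, e.g. for telling two John Smiths apart in a list."""
    return f"b c{window[1].year}"


# Illustrative records only: the dates and ages are invented for this example.
first = birth_window(date(1851, 3, 30), 40)
second = birth_window(date(1861, 4, 7), 49)
print(label(first), first)
print(combine([first, second]))  # None here: the two stated ages conflict
```

Returning None for a conflict, rather than silently picking one source, keeps the disagreement visible so that it can be recorded as separate data points.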
On 2011-05-13 11:06, Ian Goddard wrote:
> Peter J. Seymour wrote:
>> It is however in my experience rare for an event to be assigned more than one possible date.
>
> I have one ancestor whose origin was quite difficult to pin down, largely, I think, arising from age differences between herself & her husband which led to an element of concealment of the marriage. I was originally working from the age calculated from the '41 census & her stated age at death (even then understated by two years - maybe she never told her husband the truth). In this case she died before the '51 census otherwise I'd have had another view of the date.
>
> Where an individual is a collateral & it's not currently worth trying to chase down a birth or baptism I have a few instances where I just have multiple estimates of birth based on different census records. Even when they agree they're multiple data points and, IMV, should be treated as such.
>

I guess my tendency is to think if they are different dates they could refer to different people. So I have a weak group of a few individuals possibly the same person but with their own events. However, it depends a lot on the surrounding circumstances such as how common the name is in that particular setting.
singhals wrote: > Richard Smith wrote: >> On May 12, 7:49 pm, singhals<[email protected]> wrote: >> >>>> ... Having a fuzzy rule to tell you that it's >>>> normal to be baptised as a baby, while accepting that baptisms at all >>>> ages do occur, helps a computer assist you in finding that record. >>> >>> It's NORMAL to be baptised as a baby IF and ONLY IF: >>> 1) the child and parents are Christians >>> 2) the parents belong to a branch of Christianity that does >>> infant baptisms. >>> >>> Otherwise -- not normal. Hindus, Moslems, Taoists, >>> Buddhists, and Confucians do not baptise at all. Most Jewish >>> branches do not baptise. Baptists, Methodists, Disciples, >>> and a fistful of other denominations insist on "adult" >>> baptisms (with varying definitions of adult). >> >> My point is that fuzzy rules *such as* these are useful, not that >> these specific rules are of universal applicability. In England, > > Many posters to this list/newsgroup aren't in England and/or don't deal > with British research. > OTOH many of us are. And for us the assumption in US-sourced S/W that every address belongs in something styled a city doesn't fit well. The solution to both problems is a rules engine (executable code) and a separate rule-set which is data. That allows you to adapt the function to suit the situation without changing the code. Looking back on the 20 years I spent in S/W I moved more and more to that as a design approach and it served very well. >> certainly until a hundred years ago, it was normal to be baptised as a >> baby. Yes, there were plenty of religious groups that did not do so, %>< > > But, up until now, I wasn't seeing any attention paid to exceptions to > your fuzzy rule. And, not everyone wants to re-jigger their program. > Worse, in some cases, families combine cultures, norms, religions ... > few would want a program that forces them to use a different program or > dataset for each branch of the family. 
To illustrate the point I made above, indicate that the family was Baptist & an adult baptism rule-set could be applied. > More, I'm not sure you can call a rule with enough exceptions to fit a > "rule", fuzzy or not, because eventually, fuzzy logic loses its logic. And you don't have to create exemptions because you can substitute a different set of rules. Been there, done that - but with different types of rules, not fuzzy logic. The important thing is to realise you'll need to do it before you design the system. My last client before I retired wouldn't be told & built themselves a maintenance nightmare. -- Ian The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
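As a rough sketch of the design Ian describes (a rules engine in code, with the rule-set held separately as data), something along the following lines would do. Everything in it is hypothetical: the rule-set names, the age ranges and the function name are invented for illustration and are not taken from any existing program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeRule:
    event: str       # kind of event the rule applies to
    min_age: float   # usual minimum age in years
    max_age: float   # usual maximum age in years


# The rule-sets are plain data and could equally be loaded from a file.
# The age ranges are illustrative guesses, not researched norms.
RULE_SETS = {
    "infant-baptism": [AgeRule("baptism", 0, 1)],
    "adult-baptism": [AgeRule("baptism", 12, 120)],
}


def check(rule_set: str, event: str, age_years: float) -> list[str]:
    """The engine: return warnings for ages outside the usual range.

    Nothing is rejected; an unusual age is only drawn to the user's attention.
    """
    warnings = []
    for rule in RULE_SETS[rule_set]:
        if rule.event == event and not rule.min_age <= age_years <= rule.max_age:
            warnings.append(
                f"{event} at age {age_years} is unusual under '{rule_set}' "
                f"(usual range {rule.min_age}-{rule.max_age})"
            )
    return warnings


# The same baptism at age 20, checked against two different rule-sets.
print(check("infant-baptism", "baptism", 20))
print(check("adult-baptism", "baptism", 20))
```

Marking a family as Baptist then amounts to choosing a different key into RULE_SETS; the check function itself does not change, which is the point of keeping the rules as data.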
Peter J. Seymour wrote:
> It is however in my experience rare for an event to be assigned more than one possible date.

I have one ancestor whose origin was quite difficult to pin down, largely, I think, arising from age differences between herself & her husband which led to an element of concealment of the marriage. I was originally working from the age calculated from the '41 census & her stated age at death (even then understated by two years - maybe she never told her husband the truth). In this case she died before the '51 census otherwise I'd have had another view of the date.

Where an individual is a collateral & it's not currently worth trying to chase down a birth or baptism I have a few instances where I just have multiple estimates of birth based on different census records. Even when they agree they're multiple data points and, IMV, should be treated as such.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
On 2011-05-13 01:33, Richard Smith wrote:
> On May 12, 7:49 pm, singhals<[email protected]> wrote:
>> Richard Smith wrote:
>>> Ian Goddard wrote:
>>>> Richard Smith wrote:
>>>>> RIF allows us to say "If a person was baptised on some date, then he or she was born within the previous year", but as genealogists we don't want rules like that. Our rules are much more fuzzy. We want to say "If a person was baptised on some date, then, in the absence of evidence to the contrary, he or she was probably born within the previous year".
>>
>>>> Hmm. I think I'd go for something along the lines of "If the statement that a person was baptised on some date is true then that person was born on or before that date".
>
>> Other parts of the posted suggestion can be misleading if you have (as I did) four brothers (A B C and D) who each had a son named A1-4 B1-4 C1-4 and D1-4; AND by some quirk, all the boys named A were born within a 16 month period around 1752 OS/NS, all those named B were born about 20-26 months later, etc. I needed the Will of the grandfather to sort them out.
>
> I'm sorry, but I'm really not seeing what this example is supposed to illustrate. Yes, if you have four brothers called Tom, Dick, Harry and George, each of whom had, at similar times, four sons called Tom, Dick, Harry and George, then resolving it is going to be at best complicated, and perhaps impossible. But how is this to do with the suggestion that (effectively) baptism dates are normally a good proxy for birth dates? I think I must be missing the point of your argument.
>
> Richard

I have been following this thread and honestly don't think a computer program can replace the human brain. It must be superstition to believe that any automated rules can relieve anyone from a proper deep research. Instead of AI, I prefer HI (Human Intelligence).

Kurt F
Richard Smith wrote: > On May 12, 7:49 pm, singhals<[email protected]> wrote: > >>> ... Having a fuzzy rule to tell you that it's >>> normal to be baptised as a baby, while accepting that baptisms at all >>> ages do occur, helps a computer assist you in finding that record. >> >> It's NORMAL to be baptised as a baby IF and ONLY IF: >> 1) the child and parents are Christians >> 2) the parents belong to a branch of Christianity that does >> infant baptisms. >> >> Otherwise -- not normal. Hindus, Moslems, Taoists, >> Buddhists, and Confucians do not baptise at all. Most Jewish >> branches do not baptise. Baptists, Methodists, Disciples, >> and a fistful of other denominations insist on "adult" >> baptisms (with varying definitions of adult). > > My point is that fuzzy rules *such as* these are useful, not that > these specific rules are of universal applicability. In England, Many posters to this list/newsgroup aren't in England and/or don't deal with British research. > certainly until a hundred years ago, it was normal to be baptised as a > baby. Yes, there were plenty of religious groups that did not do so, > but in England at that time, it was not normal to be a member of one > of those religions. In another country, or if the family you're > researching is predominately of a minority religion, then a different > set of fuzzy rules will apply. Maybe instead you need a fuzzy rule > saying that a child is probably 12-14 for their Bar Mitzvah. The > point is that for any given culture (whether national, religious or > local) there are certain norms that, whilst not exclusively kept to, > are a useful guideline. But, up until now, I wasn't seeing any attention paid to exceptions to your fuzzy rule. And, not everyone wants to re-jigger their program. Worse, in some cases, families combine cultures, norms, religions ... few would want a program that forces them to use a different program or dataset for each branch of the family. 
More, I'm not sure you can call a rule with enough exceptions to fit a "rule", fuzzy or not, because eventually, fuzzy logic loses its logic. Cheryl
On 2011-05-12 22:58, Ian Goddard wrote:
> Richard Smith wrote:
.....
> I can live with the notion that there are alternative alleged dates for a given event and that the discrepancy is currently unresolvable. But that can be another area in which existing data models fail. If a single date is an attribute of the event object then the only solution is to have multiple event objects for the same actual event which is an unsatisfactory representation. The requirement is to be able to record multiple dates against a single event object, preferably with some means of recording the reliability. Or to have an object for the event /record/ with a single date but to have an object of a separate class to represent the underlying event to which the various record objects can be linked.
>

That is why in the Gendatam system an event can be standalone or linked to any number of people and/or evidence records and can have any number of dates. It is however in my experience rare for an event to be assigned more than one possible date. The links are scoped as Gentech-style assertion records that help tie down probabilities. However, I should point out that this sort of thing can get rather complicated if it is pursued to its logical conclusion.

Peter
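For what it's worth, the requirement Ian states (several alleged dates recorded against one event object, each with some measure of reliability) might be modelled along these lines. This is not the Gendatam or Gentech schema: the class names, the fields and the 0-to-1 reliability scale are assumptions made purely for the sketch, and the sample data is invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DateClaim:
    text: str           # the date as recorded, kept as text
    source: str         # where the claim comes from
    reliability: float  # hypothetical scale: 0.0 (doubtful) to 1.0 (certain)


@dataclass
class Event:
    kind: str
    people: list[str] = field(default_factory=list)
    claims: list[DateClaim] = field(default_factory=list)

    def add_claim(self, text: str, source: str, reliability: float) -> None:
        """Record another alleged date without discarding earlier ones."""
        self.claims.append(DateClaim(text, source, reliability))

    def preferred(self) -> Optional[DateClaim]:
        """The most reliable claim, or None if the event is undated."""
        return max(self.claims, key=lambda c: c.reliability, default=None)

    def is_disputed(self) -> bool:
        """True when the recorded claims do not all agree."""
        return len({c.text for c in self.claims}) > 1


# Invented example: one birth, two conflicting estimates from different sources.
birth = Event("birth", people=["Mary Example"])
birth.add_claim("c1811", "census return", 0.4)
birth.add_claim("c1809", "age at death", 0.3)
print(birth.preferred(), birth.is_disputed())
```

An event with no claims at all is still representable, which covers the other tricky case of a named person for whom no date has been found.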
"singhals" wrote: > Richard Smith wrote: > > My point is that fuzzy rules *such as* these are useful, not that > > these specific rules are of universal applicability. In England, > > Many posters to this list/newsgroup aren't in England and/or > don't deal with British research. English norms are the example I'm using because they're the ones I'm most familiar with, and I don't imagine they're will be totally alien to others on the newsgroup, even if they're not especially applicable to their own research. But as I just said, rules *such as these* are useful. The specific rules I was talking about may not be. > And, not everyone wants to re-jigger their program. Unfortunately, that's precisely the attitude that has resulted in the current generation of programs, most of which are grossly inadequate. The "one size fits all" approach doesn't work well, at least not for something as complicated as genealogical research. To take an example, I want my program to know that if I enter a date in 1720, unless I tell it otherwise, it's in the Julian calendar, but if I enter a date in 1820, it's a Gregorian date. That's something that needs customising. In Britain, the calendar change happened in 1752; in France, in 1582; in Russia, not until 1918. For many people. most of the time, they'll be dealing with one country and having a default makes sense; but others might be dealing with a family that moved around frequently, and for them a default calendar change might be more of a nuisance than a help. I'm far more interested in using (and if necessary, writing) a program that's flexible enough to cope with the subtleties and ambiguities that crop up in real life, than something that's easy to use and has a pretty interface. (That said, I don't necessarily see these as contradictory goals.) If I'm going to spend thousands of hours using the software to organise my research, I certainly don't begrudge a extra quarter-hour configuring it so that it's right for me. 
You're right that not everyone will agree, but it's unlikely that an event- oriented genealogy program, as opposed to the many existing lineage- oriented ones, will appeal to them anyway. > Worse, in some cases, families > combine cultures, norms, religions ... few would want a > program that forces them to use a different program or > dataset for each branch of the family. A good point, but one that I think can be solved easily enough. The Gentech data model supports dividing your research into separate projects. Although they were seemingly designed for entirely self- contained areas of research, what's interesting is that these projects do not need to be self contained. It would be easy enough to use them as a way of dividing research into separate areas to which different settings could be applied. That way I can apply different settings to my English, Irish, German and Jèrriais ancestors. > More, I'm not sure you can call a rule with enough > exceptions to fit a "rule", fuzzy or not, because > eventually, fuzzy logic loses its logic. Now you're just arguing about semantics. Would you be happier if I used the term "cultural norm" instead of "fuzzy rule"? Whatever you call them, they're potentially useful. I want my genealogy program to draw it to my attention if I have a sixty-year-old woman giving birth. Certainly it's not impossible, but it's sufficiently unusual that I want alerting to it. Richard
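Two of the behaviours asked for in this post, a per-country default calendar and a warning about an unusually old mother, could be sketched as below. The changeover years are the ones given in the post; everything else (the function names, the "ambiguous" result for the changeover year itself, and the threshold of 50) is an assumption made for the sake of the example.

```python
from typing import Optional

# Years of the calendar change, as given in the post.
CHANGEOVER_YEAR = {"Britain": 1752, "France": 1582, "Russia": 1918}


def default_calendar(year: int, country: str,
                     override: Optional[str] = None) -> str:
    """Guess the calendar for a date unless the user says otherwise."""
    if override is not None:
        return override  # "unless I tell it otherwise"
    change = CHANGEOVER_YEAR[country]
    if year < change:
        return "Julian"
    if year > change:
        return "Gregorian"
    # Within the changeover year the exact day matters, so do not guess.
    return "ambiguous"


def maternal_age_warning(mother_birth_year: int, child_birth_year: int,
                         unusual_above: int = 50) -> Optional[str]:
    """Return a warning, not an error, for an unusually old mother."""
    age = child_birth_year - mother_birth_year
    if age > unusual_above:
        return f"mother aged about {age} at birth: possible but unusual"
    return None


print(default_calendar(1720, "Britain"))
print(default_calendar(1720, "France"))
print(maternal_age_warning(1700, 1760))
```

Keying the table by project rather than by country would be one way to apply different settings to different branches of a family, in line with the Gentech projects idea mentioned above.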