Richard Smith wrote: > I've spent years looking for decent genealogy software that suits my > needs, and I'm almost at the stage of giving up and writing my own. > However, before I do that, I thought I'd ask on this newsgroup whether > anyone has any suggestions of suitable software. > > Most products I've tried are far too lineage-oriented. That's perhaps > okay for storing the results of my research, but that's not what I'm > after. I want something much more event-oriented that can store the > research itself. I want to record that I found John Smith on the 1881 > census, two plausible John Smiths on the 1851 census, and three > possible baptisms. I want to be able to record what the record says, > not what I think it probably means, including the different spellings > used in different sources. Although that seems a reasonable enough > requirement, a lot of products make it hard to use them like that. > Entering census data is often particularly tedious. > > If I only wanted to do that, I'd probably just use a spreadsheet. But > I also want an application that can let me say that I currently > believe the John Smith on the 1881 census is the same person as the > John Smith who was listed in the 1851 census on North Street, not the > one on South Street, and that I don't believe this person is the same > as any of the baptisms. And I'd like to be able to do this in a way > that's easy to change when new evidence comes to light. This seems > very hard in most of the products I've tried, and nigh-on impossible > for negative assertions like "the John Smith on the 1881 census was > not either of three baptisms found". I've also never found software > that can cope satisfactorily with relationships more complicated than > simple parent-child ones. For example, I would like to be able to say > "John was the grandson of Thomas, and probably the son of Thomas's son > Henry, though possibly an illegitimate son of Thomas's daughter > Sarah". 
That's certainly something that a computer program ought to > be able to handle in a structured fashion, but, again, I've never > found one that can. > > My second requirement is that the software runs on Linux and doesn't > require me to be connected to the Internet. (So a web-based program > is fine, but only if I can install it locally.) If it were open > source, that would be an added bonus, but it's not a requirement. My > only other requirement is that the program must be able to export its > database in some vaguely usable format and re-import it again. It's > probably best if it's not GEDCOM because I doubt GEDCOM will map > cleanly enough to the sort of concepts the program needs, but some XML > format (even if it's undocumented) would be perfect. > > I'm not aware of anything that comes close to this. Even without the > requirements that it runs on Linux and has an export format, I'm not > aware of anything, and that strikes me as surprising. Surely my first > requirement is just basic good practice? And whilst I'm sure that a > lot of research is not done to particularly good standards, surely > most software vendors must be familiar with what good research > entails? So I'm really hoping that someone will be able to point me > towards some really good piece of software that I've somehow > overlooked. > > Any suggestions or comments gratefully received! > > Richard I've read through this thread, and I wonder if there will be any program, including your own, that will fulfil all the things that have been asked for here. It seems to me that you're after a kind of logging system, rather than a genealogy program. But you want to define persons, places, events, relations, sources and even dates (and a few more?) as entities and have n:n relations (in relational database terms) between all of them, including recursive relationships on all of them. So, is there then any real structure in the data?
Only in the way, as an example, that a date cannot appear in the place of a link where one would expect a person reference. But other than that? Can such a program be built? Yes, but the degree of freedom you want guarantees that frequent (some of them substantial) changes will have to be applied. And changing the program is one thing, but ensuring your data survives those changes is another story, where the utmost care will be needed. In the end, you might have to decide what to spend most of your time on: coding and maintaining your data, or doing proper research. As a side note: There have been discussions about hierarchies of places. I think trying to record such things is a bad idea in the first place, because such "relationships" have been so volatile in history. The only indication one could give that would remain consistent is something like: place X is part of (located near, ...) Y in 2011. Even geographical coordinates are no good, since villages etc. have moved over the course of time. -- Veel mensen danken hun goed geweten aan hun slecht geheugen. (G. Bomans) Lots of people owe their good conscience to their bad memory (G. Bomans)
On May 16, 9:26 am, [email protected] wrote: > I've read through this thread, and I wonder if there will be any program, > including your own, that will fulfil all the things that have been asked for here. Lots of very good points here. Will I succeed in producing a usable system that'll include all of the features discussed here? I've no idea: quite possibly not. But equally, with judicious reuse of existing technology and as long as I accept it's not something I can knock up in a spare weekend, I don't see why I can't have a jolly good go at it. Also, the process of implementing it will, I think, teach me a lot about the difficulties in such a system. > It seems to me that you're after a kind of logging system, rather than a > genealogy program. But you want to define persons, places, events, > relations, sources and even dates (and a few more?) as entities Characteristics (or attributes if you prefer) are the main thing you've missed. That's for things like occupation, and possibly sex (though in my mind that may be handled separately). > and have n:n > relations (in relational database terms) between all of them, including > recursive relationships on all of them. More or less. So one way of implementing it is a pure RDF-like mapping. For the purpose of this discussion, let's call a person / place / event / date / etc. an entity. We could have a table that simply mapped entity to entity, maybe with a field (such as the RDF predicate) that specifies what the mapping represents. In this model, we might map a person to an event, and the predicate field would tell us the person's role in the event. That's a very abstract view and means that the database has little or no structure -- all of the structure must be built on top of it at the application layer. The most obvious alternative is to have separate tables for each of the entity types, and then provide mapping tables for each possible many-to-many relationship (and simple reference fields for any many-to-one relationships).
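To make the RDF-like option concrete: a minimal sketch, assuming SQLite via Python's standard sqlite3 module, with one table of entities and a single table mapping entity to entity. The table and column names, the helper functions and the "role:head" predicate are all hypothetical, not taken from any RDF vocabulary or from the GDM.

```python
import sqlite3

# Hypothetical RDF-like layout: one table of entities, one table of
# entity-to-entity mappings. All meaning lives in the `predicate` values.
SCHEMA = """
CREATE TABLE entity (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL,            -- 'person', 'event', 'place', 'date', ...
    label TEXT NOT NULL
);
CREATE TABLE assertion (
    subject   INTEGER NOT NULL REFERENCES entity(id),
    predicate TEXT    NOT NULL,     -- what the mapping represents
    object    INTEGER NOT NULL REFERENCES entity(id)
);
"""


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_entity(conn, kind, label):
    return conn.execute(
        "INSERT INTO entity (kind, label) VALUES (?, ?)", (kind, label)
    ).lastrowid


def add_assertion(conn, subject, predicate, obj):
    conn.execute(
        "INSERT INTO assertion (subject, predicate, object) VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )


def people_in_event(conn, event_id):
    """(person label, predicate) pairs for every person mapped to an event."""
    return conn.execute(
        """SELECT e.label, a.predicate
           FROM assertion a JOIN entity e ON e.id = a.subject
           WHERE a.object = ? AND e.kind = 'person'
           ORDER BY e.label""",
        (event_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_store()
    john = add_entity(conn, "person", "John Smith")
    census = add_entity(conn, "event", "1881 census")
    add_assertion(conn, john, "role:head", census)
    print(people_in_event(conn, census))  # [('John Smith', 'role:head')]
```

Nothing in the database itself stops a nonsensical assertion (an event recorded as the "head" of a person, say); every such rule would have to be enforced in the application layer, which is the price of the abstraction.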
This one-table-per-entity approach is closer to what traditional genealogy applications do. The GDM (GENTECH Genealogical Data Model) solution is intermediate in that it retains tables for each entity type, but still uses a single assertion table for (almost) all of the entity-to-entity mappings. Each of these approaches has merit. Although the RDF-like approach is very abstract and would need a lot of building on top of it, it would have the advantage that existing RDF tools could be used, for example in the reasoning parts of the code. The traditional one-table-per-entity-and-per-mapping approach is certainly the simplest at the database level, as long as the number of tables needed doesn't get too large. The GDM approach has the advantage of being a de facto (if unimplemented) standard; it also seems very well thought out, and is likely to offer good interoperability with any other programs that do something along these lines. > So, is there then any real structure in the data? Only in the way, as an > example, that a date cannot appear in the place of a link where one would > expect a person reference. But other than that? The structure is largely built on top of it. One way of doing that is through data schemas held in further database tables. As an example, I'm proposing that a person's name is simply a collection of textual parts. So, for example, "Richard Andrew Smith" can be parsed as two given names and a surname; or I may choose to distinguish "Richard", the name I actually use, from "Andrew", which I never use. So the Name table contains very little at all (though in practice it may contain a display name generated from the separate name parts). The Name table would link to a NamePart table with three rows, one for each of "Richard", "Andrew" and "Smith". There would be a NamePartSchema table containing "surname" and "given name". There might also be a NameSchema table which allows me to say that, typically, in Western society people have one or more given names followed by a surname.
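The four name tables might be laid out roughly as follows. This is a minimal SQLite sketch under assumed names (`name_part_schema`, `position`, `record_name` and so on are hypothetical); the point it illustrates is that the permitted kinds of name part are rows in NamePartSchema rather than columns in the Name table, so adding a "patronymic" part type would be a data change, not a schema change.

```python
import sqlite3

# Hypothetical SQLite rendering of the four tables. The *_schema tables hold
# rules about names; name and name_part hold the recorded data.
SCHEMA = """
CREATE TABLE name_part_schema (
    id        INTEGER PRIMARY KEY,
    part_type TEXT UNIQUE NOT NULL      -- 'given name', 'surname', ...
);
CREATE TABLE name_schema (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL           -- e.g. typical Western ordering
);
CREATE TABLE name (
    id             INTEGER PRIMARY KEY,
    name_schema_id INTEGER REFERENCES name_schema(id)
);
CREATE TABLE name_part (
    name_id      INTEGER NOT NULL REFERENCES name(id),
    position     INTEGER NOT NULL,      -- order in which the source gives it
    text         TEXT NOT NULL,         -- spelling exactly as recorded
    part_type_id INTEGER NOT NULL REFERENCES name_part_schema(id),
    PRIMARY KEY (name_id, position)
);
"""


def part_type_id(conn, part_type):
    """Look up a part type; unknown types are rejected, not silently added."""
    row = conn.execute(
        "SELECT id FROM name_part_schema WHERE part_type = ?", (part_type,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown name part type: {part_type!r}")
    return row[0]


def record_name(conn, parts, name_schema_id=None):
    """Store a name given as an ordered list of (text, part_type) pairs."""
    type_ids = [part_type_id(conn, part_type) for _, part_type in parts]  # validate first
    name_id = conn.execute(
        "INSERT INTO name (name_schema_id) VALUES (?)", (name_schema_id,)
    ).lastrowid
    for position, ((text, _), type_id) in enumerate(zip(parts, type_ids)):
        conn.execute(
            "INSERT INTO name_part VALUES (?, ?, ?, ?)",
            (name_id, position, text, type_id),
        )
    return name_id


def parts_of_type(conn, name_id, part_type):
    rows = conn.execute(
        """SELECT p.text FROM name_part p
           JOIN name_part_schema s ON s.id = p.part_type_id
           WHERE p.name_id = ? AND s.part_type = ?
           ORDER BY p.position""",
        (name_id, part_type),
    )
    return [text for (text,) in rows]


def display_name(conn, name_id):
    """A generated display name: all parts in source order."""
    rows = conn.execute(
        "SELECT text FROM name_part WHERE name_id = ? ORDER BY position", (name_id,)
    )
    return " ".join(text for (text,) in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO name_part_schema (part_type) VALUES (?)",
        [("given name",), ("surname",)],
    )
    richard = record_name(
        conn, [("Richard", "given name"), ("Andrew", "given name"), ("Smith", "surname")]
    )
    print(display_name(conn, richard))                 # Richard Andrew Smith
    print(parts_of_type(conn, richard, "given name"))  # ['Richard', 'Andrew']
```

In this sketch the display name is derived from the parts on demand, so each part's spelling stays exactly as the source gives it.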
Of these four tables, the Name and NamePart tables are where the data lives; the NameSchema and NamePartSchema tables are for encoding rules about the data -- things that might seem tempting to encode in the database schema itself, but where doing so would effectively result in singling out one set of cultural norms as "right" to the detriment of others. Even to an English person, given names and surnames are not sufficient to encode what a source actually says. The source may say "Mrs Jones" or "James Johnson, junior". These extra styles ("Mrs", "junior") are very helpful to us: in recent times "Mrs" would tell us the woman is married, and "junior" tells us that there is probably a father (or other older male relative) with the same name still alive. But the information conveyed by these styles is culturally dependent. Longer ago in England, "Mrs" was an indication of social status, and in the US "junior" tends not to imply a living father. The latter is an important distinction -- in England we'd probably want the default display name for the individual (i.e. the name displayed on a tree or in a listing) to be simply "James Johnson", but in the US, we're more likely to want "James Johnson, junior". In other countries the differences are more profound. Icelandic patronymics are different from surnames in that a child does not inherit the father's patronymic, but gets one formed from the father's given name. Icelandic patronymics, like surnames in parts of Eastern Europe (Poland, for example), vary depending on the sex of the child: "-son" becomes "-dóttir", and "-ski" becomes "-ska". That's a further level of complexity that could be built on top of the database's name schemas. In practice, implementing this is unlikely to be a priority for me as I'm not studying any families where such complexities apply.
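The culture-dependent rules just described could be expressed as small functions consulted via the name schemas. A sketch only: the culture codes ("en-GB", "en-US"), the `SHOW_JUNIOR` table and all function names are hypothetical, and only the cases mentioned above are handled. The genitive of an Icelandic given name is irregular, so here the caller supplies it (for example "Jóns" for Jón) rather than the code guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordedName:
    given: str
    family: str                   # surname or patronymic, as the source gives it
    style: Optional[str] = None   # e.g. 'junior' or 'Mrs'


def icelandic_patronymic(father_genitive: str, sex: str) -> str:
    """Build a patronymic from the genitive of the father's given name."""
    return father_genitive + ("dóttir" if sex == "F" else "son")


def polish_surname_form(surname: str, sex: str) -> str:
    """Feminine form of a '-ski' surname; other endings are left untouched."""
    if sex == "F" and surname.endswith("ski"):
        return surname[: -len("ski")] + "ska"
    return surname


# Hypothetical per-culture rule: does 'junior' belong in the default display name?
SHOW_JUNIOR = {"en-GB": False, "en-US": True}


def default_display_name(name: RecordedName, culture: str) -> str:
    base = f"{name.given} {name.family}"
    if name.style == "junior" and SHOW_JUNIOR.get(culture, False):
        return f"{base}, junior"
    return base


if __name__ == "__main__":
    print(icelandic_patronymic("Jóns", "F"))      # Jónsdóttir
    print(polish_surname_form("Kowalski", "F"))   # Kowalska
    james = RecordedName("James", "Johnson", "junior")
    print(default_display_name(james, "en-GB"))   # James Johnson
    print(default_display_name(james, "en-US"))   # James Johnson, junior
```

The recorded style is kept either way; only the default display changes with the culture rule.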
Nevertheless, I'd certainly want to have given some thought to how I might implement such rules (not least because I have Prussian ancestry as recently as the mid-nineteenth century, and it doesn't take much imagination to believe I might have Polish or Russian ancestry if I go back a few generations further). > Can such a program be built? Yes, but the degree of freedom you want guarantees > that frequent (some of them substantial) changes will have to be applied. And changing > the program is one thing, but ensuring your data survives those changes is > another story, where the utmost care will be needed. That's very true, and I certainly don't underestimate the problems there. But I do think the problems are manageable. And taking frequent database backups (or XML exports, or whatever) mitigates the risk somewhat. > In the end, you might have to decide what to spend most of your time on: > coding and maintaining your data, or doing proper research. I'm more or less reaching the point where I'm unlikely to discover many more ancestors through on-line research. Yes, there's a gradual trickle of new sources becoming available on-line, and from time to time I get new leads from them. I can certainly discover new side branches on-line, but I'm rather less interested in those. So most new research these days involves a trip to a records office. That's not something I can do after work, or on the spur of the moment if I get a free few hours on a weekend. But I can spend that time coding and organising my data, and, in any case, that's something I enjoy doing. So yes, it'll use up a lot of time, but probably not time that I could have spent doing genealogical research. > As a side note: > There have been discussions about hierarchies of places. I think trying to > record such things is a bad idea in the first place, because such > "relationships" have been so volatile in history.
The only indication one > could give that would remain consistent is something like: place X is part > of (located near, ...) Y in 2011. Even geographical coordinates are no > good, since villages etc. have moved over the course of time. I don't disagree that these problems exist, but I do disagree that they're a reason not to try to encode the hierarchy. I'm not sure whether you're familiar with Phillimore's Atlas and Index of Parish Registers. (If your area of interest doesn't include England, you probably won't be.) It's a very useful map of the parishes of Britain, showing how they border each other and roughly where they are, along with other useful information such as where the parish registers are held and what dates they cover. Until Victorian times, boundary changes and the creation or amalgamation of parishes were really quite rare. In most cases, the parish boundaries in 1530 were still the same in 1830, and it's these boundaries that Phillimore's depicts. Since then, certainly, there's been a flurry of changes, but I don't think that would necessarily devalue something like an electronic version of Phillimore's. Richard
[email protected] wrote: > [snip] Thank you, Hermann! Cheryl
[email protected] wrote: > > I've read through this thread, and I wonder if there will be any program, > including your own, that will fulfil all the things that have been asked for here. > > It seems to me that you're after a kind of logging system, rather than a > genealogy program. But you want to define persons, places, events, > relations, sources and even dates (and a few more?) as entities and have n:n > relations (in relational database terms) between all of them, If you model data in entity-relationship terms, most of these things have to be seen as entities. And in some cases the relationships /are/ n:n. In an RDBMS implementation a link table is a good way of representing an n:n relationship, and once one starts thinking in such terms it becomes clear that such a table can hold more than just link information, which is why relationships start to emerge as entities in their own right. If, OTOH, you model data in OO terms, you would model them as objects and provide classes for them. And in such a model, too, some of the inter-class associations are n:n. "even dates"? Historical dates are a real problem. Are we talking Gregorian or Julian? What's the start of the year, January 1 or Lady Day? What year is 3 Elizabeth? How do "Early nineteenth century", 1820 and "First quarter nineteenth century" collate? That's a problem which has put me off trying to devise a historical date data type for my favourite RDBMS. At least the OO approach enables you to make some of this explicit by giving dates such attributes - I don't know if your favoured genealogy program does this, but Gramps certainly does. > including recursive relationships on all of them. Not all. However, if you think about some of the entities/objects/whatever you want to call them, they have an internal structure which is hierarchical, and the hierarchy differs from one instance to the next. For instance, one record might come from the parish register of Dunny-on-the-Wold, which is in the Dunshire archives.
Another might be in a book of parish register transcriptions which was published in 1900 by the records section of the Wolds Archaeological Society. Another might be a page reference in one of several books by an author, each of which is one of many books published by a publisher. Another may be a paper in a journal... You get the picture. They're all similar in nature and different in detail and even in depth. And parts of any hierarchy might be re-used - different pages from the same book, different books by the same author, etc. A good way of dealing with this is to use a recursive model. That means that you don't have to keep lots of copies of the title of the archive etc. It also means that you have the option of linking the parish register (PR) directly to the archive, but another document to a particular collection which has its own distinct identity within the archive. > So, is there then any real structure in the data? Only in the way, as an > example, that a date cannot appear in the place of a link where one would > expect a person reference. But other than that? Yes, of course there's structure. It just makes a lot of sense to use a data model flexible enough to fit the actual data rather than force-fit the data onto a preconceived model. For instance, I know of no ancestor of mine who lived in a city. Why should I have to warp their data to re-use a "city" field as something else? And then how does this warped data sit properly alongside that of some of my wife's ancestors who did live in a city? > > Can such a program be built? Yes, but the degree of freedom you want guarantees > that frequent (some of them substantial) changes will have to be applied. And changing > the program is one thing, but ensuring your data survives those changes is > another story, where the utmost care will be needed. Nope. You end up having to make such changes because you didn't think it through in the first place. If you plan for flexibility, you can code for it.
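Some of the date questions raised earlier in this reply (Julian or Gregorian, January 1 or Lady Day) can at least be made explicit in a date type. A minimal sketch, assuming a date is kept exactly as the source records it (day, month, year) plus two attributes, the calendar and the year-start convention, and is converted to a proleptic Gregorian date only for sorting. The class and attribute names are hypothetical; regnal years such as "3 Elizabeth" and vague ranges such as "early nineteenth century" are not handled. Two facts it relies on: before 1752 the English civil year began on Lady Day (25 March), so a recorded 10 February 1750 falls in 1751 counting from 1 January; and the standard Julian Day Number formulas convert between the two calendars.

```python
from dataclasses import dataclass
from datetime import date

# JDN of 0001-01-01 (proleptic Gregorian, Python ordinal 1) is 1721426.
JDN_OF_ORDINAL_ZERO = 1721425


@dataclass(frozen=True)
class HistoricalDate:
    day: int
    month: int
    year: int                    # the year exactly as written in the source
    calendar: str = "gregorian"  # 'julian' or 'gregorian'
    year_start: str = "jan1"     # 'jan1' or 'lady_day' (25 March)

    def civil_year(self) -> int:
        """Year counted from 1 January. Under Lady Day reckoning,
        1 January to 24 March belong to the following January-based year."""
        if self.year_start == "lady_day" and (self.month, self.day) < (3, 25):
            return self.year + 1
        return self.year

    def julian_day_number(self) -> int:
        """Standard integer JDN formulas for the Julian or Gregorian calendar."""
        a = (14 - self.month) // 12
        y = self.civil_year() + 4800 - a
        m = self.month + 12 * a - 3
        jdn = self.day + (153 * m + 2) // 5 + 365 * y + y // 4
        if self.calendar == "julian":
            return jdn - 32083
        return jdn - y // 100 + y // 400 - 32045

    def sort_key(self) -> date:
        """Proleptic Gregorian equivalent, used only for ordering."""
        return date.fromordinal(self.julian_day_number() - JDN_OF_ORDINAL_ZERO)


if __name__ == "__main__":
    recorded = HistoricalDate(10, 2, 1750, calendar="julian", year_start="lady_day")
    print(recorded.civil_year())  # 1751
    print(recorded.sort_key())    # 1751-02-21
```

Because the recorded day, month and year are never overwritten, the source's own reading survives; only the sort key is derived.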
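The recursive source model described above might be sketched as a single node type that points to an optional parent, so that pages share their book and registers share their archive instead of repeating titles. The `SourceNode` class, its fields and the example titles are illustrative assumptions. With only one parent per node, this sketch cannot say that a book belongs both to an author and to a publisher; that would need a many-to-many link instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceNode:
    kind: str                              # 'archive', 'register', 'book', 'page', ...
    title: str
    parent: Optional["SourceNode"] = None  # None at the top of a hierarchy

    def path(self) -> List["SourceNode"]:
        """Nodes from the top of the hierarchy down to this one."""
        node, chain = self, []
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))

    def citation(self) -> str:
        return ", ".join(n.title for n in self.path())


if __name__ == "__main__":
    archive = SourceNode("archive", "Dunshire Archives")
    register = SourceNode("register", "Parish register of Dunny-on-the-Wold", archive)
    society = SourceNode("society", "Wolds Archaeological Society")
    section = SourceNode("collection", "Records section", society)
    book = SourceNode("book", "Parish register transcriptions (1900)", section)
    page = SourceNode("page", "p. 12", book)  # hypothetical page reference
    print(register.citation())  # Dunshire Archives, Parish register of Dunny-on-the-Wold
    print(page.citation())
```

Hierarchies of different depth (a register two levels down, a page four levels down) need no special handling; each node simply records its immediate parent.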
Here is a non-genealogical example of planning for flexibility, taken from real life: A client of mine was in the secure printing business - partly in printing the stationery (think cheques, for instance) and partly in digital printing /on/ the stationery (think printing the payee, amounts, etc. on the cheques). Eventually they got a contract for a much more complex document type than they'd handled previously; in fact the contract called for two different documents. The data would arrive as XML, also a first for them. Clearly this was a trend - XML would be the basis for contracts in the future. I suggested using XSL (a rules engine, in effect) to rewrite the data so that material likely to appear in all such contracts (despatch addresses, due dates, etc.) would be extracted into an XML structure and vocabulary specific to my clients, whilst the document-specific part would be left unchanged but wrapped up in a specific place within this new structure. The next stage would be hard-coded to take the structure apart and put the housekeeping information into a database, while the document-specific XML fragments would be stored unchanged in a text field in the database. For a print run the relevant fragments of XML would be grabbed from the database, strung together in a new XML file and run through another XSLT to convert them into the form required to drive the printer. This would have been extremely reusable - only the XSL stylesheets would need to change for new contracts. The clients wouldn't have it at all. They insisted on a database design that exactly reflected the printed document, which in turn meant code that matched the database in order to construct the print file. I did the front-end as I'd planned it, but instead of storing the XML fragments I ran them through an XSLT and macro processor to generate the SQL to stuff the data into the document-specific database.
Not only did this end up with two different databases and two different programs to handle the two documents, but it also resulted in a maintenance nightmare as the main document changed over the life of the contract. For all I know they're still doing that. The next contract started off with many more document types. Storing the XML was too much for the clients, but I did get them to agree to a half-way house. Instead of generating SQL I generated print-file fragments (essentially the output of what would have been the second transform in my original scheme) and stored those in a text field in the database. As the contract rolled on we accommodated more document types and changes to the originals, and we reused the program itself, largely as it stood, for a second contract. All we had to do was keep writing more stylesheets and printer scripts, which we'd have had to do anyway. The only tweaks to the actual program code were to do with changes to the way work was organised in the factory. That's the difference between planning for flexibility and not doing so. %>< > There have been discussions about hierarchies of places. I think trying to > record such things is a bad idea in the first place, because such > "relationships" have been so volatile in history. The only indication one > could give that would remain consistent is something like: place X is part > of (located near, ...) Y in 2011. Even geographical coordinates are no > good, since villages etc. have moved over the course of time. > By now it should be clear how to treat this. You recognise /in advance/ that the hierarchies will be time-dependent and make provision for optional start and finish dates. You also recognise that a particular place may be simultaneously in different hierarchies, e.g. ecclesiastical (even different ecclesiastical hierarchies, such as different Anglican and RC parishes), manorial or Poor Law. You adopt a data model that fits and then code to that.
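Time-dependent place hierarchies, with one place belonging to several hierarchies at once, might be sketched as a list of dated links. Assumptions: each link carries optional start and end years (a real system would use a proper historical date type), hierarchy names are free text, and the places, deaneries and years in the example are invented for illustration, not real boundary history.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlaceLink:
    child: str
    parent: str
    hierarchy: str                # e.g. 'ecclesiastical', 'manorial', 'poor law'
    start: Optional[int] = None   # first year the link held; None = open-ended
    end: Optional[int] = None     # last year the link held; None = open-ended

    def holds_in(self, year: int) -> bool:
        return (self.start is None or self.start <= year) and (
            self.end is None or year <= self.end
        )


def parents_of(links: List[PlaceLink], place: str, hierarchy: str, year: int) -> List[str]:
    """Parents of a place in one hierarchy, as that hierarchy stood in `year`."""
    return [
        link.parent
        for link in links
        if link.child == place and link.hierarchy == hierarchy and link.holds_in(year)
    ]


def ancestry(links: List[PlaceLink], place: str, hierarchy: str, year: int) -> List[str]:
    """Walk up one hierarchy for a given year; stops at the top or on a cycle."""
    chain = [place]
    while True:
        parents = parents_of(links, chain[-1], hierarchy, year)
        if not parents or parents[0] in chain:
            return chain
        chain.append(parents[0])  # first parent only; conflicts are not reported


if __name__ == "__main__":
    # Invented example data, not real boundary history.
    links = [
        PlaceLink("Dunny-on-the-Wold", "Dunny parish", "ecclesiastical"),
        PlaceLink("Dunny parish", "Dunshire deanery", "ecclesiastical", end=1850),
        PlaceLink("Dunny parish", "Wolds deanery", "ecclesiastical", start=1851),
        PlaceLink("Dunny-on-the-Wold", "Dunshire Union", "poor law", start=1836),
    ]
    print(ancestry(links, "Dunny-on-the-Wold", "ecclesiastical", 1800))
    print(ancestry(links, "Dunny-on-the-Wold", "ecclesiastical", 1900))
    print(parents_of(links, "Dunny-on-the-Wold", "poor law", 1800))  # []
```

Where a place has two parents in the same hierarchy and year (conflicting evidence, say), this sketch simply takes the first; a fuller system would want to surface the conflict rather than hide it.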
I can envisage a system with several aspects:

- The genealogical data itself.
- Standing data such as location information.
- Rules such as the fuzzy logic which Richard mentioned.
- A shared data model to describe the above.
- Code to handle them.

This leaves scope for different S/W vendors and open source teams to provide the last part. It also provides scope for specialists to provide shared standing data or shared rules. It even, in an ideal world, provides scope for archive sites such as A2A to export data in a usable form. And it provides scope for users such as you and me to explore that data and to find the family relationships which hide within it.

--
Ian
The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk