RootsWeb.com Mailing Lists
    1. Re: GEDOM as a database format
    2. Ian Goddard
    3. T.M. Sommers wrote:
> I fail to see any technological advantages of XML over GEDCOM. Their
> syntax is different, but their semantics are essentially the same: both
> are lists of trees.

The advantage of XML is the range of tools it provides. Given an XSLT
engine such as Saxon you could transform your XML output into HTML by
writing a suitable stylesheet. Or into FO for a pretty-printer report.
Or into an SVG file for a diagram.

> The availability of parsing tools for XML is not a real advantage. You
> still have to write the callbacks, and that is where most of the work
> is. A GEDCOM line is trivial to parse: it is either LEVEL TAG DATA or
> LEVEL XREF TAG.

That's for a SAX parser. I've used the Woods SAX parser in Delphi. The
handler and callbacks are small compared to the volume of code which is
simply reused. That was some time ago. I've also used XSLT to transform
the XML into a series of SQL statements which could be presented directly
to an RDBMS engine. If I were doing this now I'd also consider the
possibility of using an engine which can parse the XML directly and load
it into the database.

> GEDCOM allows users
> to define their own tags, and I don't know if XML validating parsers can
> handle that.

The main advantage of a validating parser is to prevent just that! In any
thread about GEDCOM one usually finds complaints about limited
interoperability resulting from one package's not understanding another's
tags. One way to handle optional stuff like that in XML would be to use
attributes, e.g.

<OptionalData type="CauseOfDeath">Drowning</OptionalData>

--
Ian
Hotmail is for spammers. Real mail address is igoddard at nildram co uk

    12/18/2007 04:24:48
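Ian's alternative to generating SQL text with XSLT, loading the XML straight into a database, can be sketched in Python using only the standard library (xml.etree.ElementTree and sqlite3). This is a swapped-in technique for illustration, not the Delphi/XSLT setup described above. The element names (Individuals, Individual, Name, OptionalData), the id attribute, the two-table layout and the sample record are all hypothetical. Parameterised inserts are used instead of generated SQL strings, which sidesteps quoting problems with names containing apostrophes.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-table layout: one row per person, plus one row per
# <OptionalData> element stored as a (type, value) pair.
SCHEMA = """
CREATE TABLE individual (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE optional_data (
    individual_id TEXT REFERENCES individual(id),
    type TEXT,
    value TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def load_xml(xml_text: str, conn: sqlite3.Connection) -> None:
    """Parse the whole document and insert its records in one transaction."""
    root = ET.fromstring(xml_text)
    with conn:  # commits on success, rolls back if an exception is raised
        for person in root.iter("Individual"):
            person_id = person.get("id")
            conn.execute(
                "INSERT INTO individual (id, name) VALUES (?, ?)",
                (person_id, person.findtext("Name")),
            )
            for opt in person.findall("OptionalData"):
                conn.execute(
                    "INSERT INTO optional_data (individual_id, type, value) "
                    "VALUES (?, ?, ?)",
                    (person_id, opt.get("type"), opt.text),
                )


# Hypothetical sample document.
SAMPLE = """<Individuals>
  <Individual id="I1">
    <Name>John Smith</Name>
    <OptionalData type="CauseOfDeath">Drowning</OptionalData>
  </Individual>
</Individuals>"""

conn = sqlite3.connect(":memory:")
create_schema(conn)
load_xml(SAMPLE, conn)
print(conn.execute("SELECT individual_id, type, value FROM optional_data").fetchall())
# [('I1', 'CauseOfDeath', 'Drowning')]
```

ElementTree.fromstring builds the whole tree in memory, so very large files would need a streaming parser instead.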
    1. Re: GEDOM as a database format
    2. T.M. Sommers
    3. Ian Goddard wrote:
> T.M. Sommers wrote:
>>
>> I fail to see any technological advantages of XML over GEDCOM. Their
>> syntax is different, but their semantics are essentially the same:
>> both are lists of trees.
>
> The advantage of XML is the range of tools it provides. Given an XSLT
> engine such as Saxon you could transform your XML output into HTML by
> writing a suitable stylesheet. Or into FO for a pretty-printer report.
> Or into an SVG file for a diagram.

Perhaps I should have been more explicit and said that I see no
advantages of XML over GEDCOM for the uses to which GEDCOM is put,
namely passing data back and forth between genealogical programs.

You seem to want to use XML as some sort of reporting language for your
own database. I suppose it would work for that, but why bother? If it's
your own program, you have direct access to the data, and can directly
create any kind of report you want, without the hassle of converting
everything to XML first.

>> The availability of parsing tools for XML is not a real advantage.
>> You still have to write the callbacks, and that is where most of the
>> work is. A GEDCOM line is trivial to parse: it is either LEVEL TAG
>> DATA or LEVEL XREF TAG.
>
> That's for a SAX parser.

DOM, too.

> I've used the Woods SAX parser in Delphi. The
> handler and callbacks are small compared to the volume of code which is
> simply reused.

No smaller than the comparable code to handle GEDCOM would be.

>> GEDCOM allows users to define their own tags, and I don't know if XML
>> validating parsers can handle that.
>
> The main advantage of a validating parser is to prevent just that! In
> any thread about GEDCOM one usually finds complaints about limited
> interoperability resulting from one package's not understanding
> another's tags.

The main complaint, I think, is that programs do not implement all of
standard GEDCOM. There is no reason to believe that that would change if
GEDCOM were replaced by XML.

And it is a bit ironic to use the *Extensible* Markup Language to prevent
genealogical programs from extending their data-transfer language.

> One way to handle optional stuff like that in XML would be to use
> attributes, e.g.
>
> <OptionalData type="CauseOfDeath">Drowning</OptionalData>

Ugh. This is better than

2 CAUS Drowning

how?

--
Thomas M. Sommers -- tms@nj.net -- AB2SB

    12/21/2007 08:46:00
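Sommers' claim that a GEDCOM line is trivial to parse, and that both formats are lists of trees, can be illustrated with a short Python sketch. It splits each line into level, optional cross-reference (@...@), tag and value, then uses the level numbers to nest lines into a forest of records, so that a line like "2 CAUS Drowning" becomes a child of its event. It is deliberately simplified: it ignores CONT/CONC continuation lines, character-set handling and the other validity rules of the GEDCOM specification, and the names Node, parse_line and build_forest are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Node:
    level: int
    tag: str
    xref: Optional[str] = None   # e.g. "@I1@" on a record's level-0 line
    value: Optional[str] = None  # everything after the tag, if present
    children: list[Node] = field(default_factory=list)


def parse_line(line: str) -> Node:
    """Split one GEDCOM line into LEVEL [XREF] TAG [VALUE]."""
    level_text, _, rest = line.strip().partition(" ")
    xref = None
    if rest.startswith("@"):
        xref, _, rest = rest.partition(" ")
    tag, _, value = rest.partition(" ")
    return Node(level=int(level_text), tag=tag, xref=xref, value=value or None)


def build_forest(lines: Iterable[str]) -> list[Node]:
    """Nest lines by level number: each level-0 line starts a new tree."""
    roots: list[Node] = []
    stack: list[Node] = []  # path from the current root to the last node
    for line in lines:
        if not line.strip():
            continue
        node = parse_line(line)
        # Pop back to this node's parent (the nearest line with a lower level).
        while stack and stack[-1].level >= node.level:
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


forest = build_forest([
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 DEAT",
    "2 CAUS Drowning",
    "0 TRLR",
])
death = forest[0].children[1]
print(death.tag, death.children[0].tag, death.children[0].value)  # DEAT CAUS Drowning
```

The tree-building step is the part both formats share; what differs is only how each line (or element) is tokenised.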
    1. Re: GEDOM as a database format
    2. Ian Goddard
    3. T.M. Sommers wrote:
> Ian Goddard wrote:
>
> Perhaps I should have been more explicit and said that I see no
> advantages of XML over GEDCOM for the uses to which GEDCOM is put,
> namely passing data back and forth between genealogical programs.

But the purpose for which GEDCOM was designed, as another poster has made
clear, was more restricted than that: it was to pass data to and from the
LDS database and nothing more. That's why the definition only includes
data elements of interest to LDS. It's other programs which have latched
onto it and, AIUI, added their own private extensions. Let me take one of
your points out of sequence to elaborate on this:

> The main complaint, I think, is that programs do not implement all of
> standard GEDCOM. There is no reason to believe that that would change
> if GEDCOM were replaced by XML.
>
> And it is a bit ironic to use the *Extensible* Markup Language to
> prevent genealogical programs from extending their data-transfer language.
>
>> One way to handle optional stuff like that in XML would be to use
>> attributes, e.g.
>>
>> <OptionalData type="CauseOfDeath">Drowning</OptionalData>
>
> Ugh. This is better than
>
> 2 CAUS Drowning
>
> how?

Fair enough. Having given the choice of example a couple of seconds'
thought, I ended up with something for which GEDCOM has a tag and for
which an XML format would probably also have a standard element. So let's
try something for which it doesn't: criminal conviction. Or military
service. Or a rite of passage in a religion other than Christianity or
Judaism. Or manumission. Or the charter and manorial court references to
which we have to resort when we get back beyond parish registers.

AIUI the response of genealogy S/W developers is to coin their own tags.
That is fine if we want to pass data between users of that particular
program. It is not so fine if users of two programs want to exchange data
or a user wants to migrate to another program.

What should a program do if it encounters an unknown tag? Discard it
silently? Reject the structure to which it belongs? Reject the whole file?
The problem is that the unknown tag is part of the structure and has to be
recognised by the parser.

In XML we would have an element such as <OptionalData> with an attribute
such as "type". Both of these would be part of the structure and _any_
program using the schema for import would recognise them. The value of the
attribute and the content of the element would probably be represented
internally as a name/value pair in some way. The essential point is that
the program doesn't have to recognise the value of the type attribute in
the way that it would have to recognise a non-standard tag in GEDCOM.

> You seem to want to use XML as some sort of reporting language for your own
> database. I suppose it would work for that, but why bother? If it's
> your own program, you have direct access to the data, and can directly
> create any kind of report you want, without the hassle of converting
> everything to XML first.

No, what I want is something which doesn't start off designed to fill a
specific role and then have to be stretched in an ad hoc manner to do
related stuff. I have no quarrel with GEDCOM not being open-ended in its
abilities. It was designed for a purpose and it fulfils it. But that
purpose is essentially to collect sequences of names and dates and some
LDS-specific stuff. Surely we can aspire to more than that?

>>> The availability of parsing tools for XML is not a real advantage.
>>> You still have to write the callbacks, and that is where most of the
>>> work is. A GEDCOM line is trivial to parse: it is either LEVEL TAG
>>> DATA or LEVEL XREF TAG.
>>
>> That's for a SAX parser.
>
> DOM, too.

The Delphi app to which I referred used a SAX parser for which I had to
write event handlers which built up an object which was then passed to a
single callback. It also used the MS DOM, for which I had to write no
event handler: I just used it as an XSLT engine. (If it sounds odd using
two parsers in tandem, it was because the raw XML documents were so large
that the DOM would have run out of memory. The SAX parser could chomp its
way through the documents, chopping them up into bite-size chunks to feed
to the DOM.)

--
Ian

    12/22/2007 02:23:00
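Two ideas from Ian's last message fit together in one short sketch using Python's standard xml.sax module: a generic OptionalData element whose type attribute the importer can store as a name/value pair without understanding it, and a SAX pass that builds one object per record and hands it to a single callback. The element and attribute names follow the thread's example but are not taken from any published schema; the Record class, the callback arrangement and the sample data are illustrative assumptions, not a reconstruction of the Delphi application.

```python
from __future__ import annotations

import xml.sax
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Record:
    record_id: str
    # (type, value) pairs taken from <OptionalData> elements; the handler
    # stores them without needing to understand what each type means.
    optional: list[tuple[str, str]] = field(default_factory=list)


class RecordHandler(xml.sax.ContentHandler):
    """Stream through a document, emitting one Record per <Individual>."""

    def __init__(self, callback: Callable[[Record], None]) -> None:
        super().__init__()
        self.callback = callback
        self.current: Optional[Record] = None
        self.opt_type: Optional[str] = None
        self.text: list[str] = []

    def startElement(self, name, attrs):
        if name == "Individual":
            self.current = Record(record_id=attrs.get("id", ""))
        elif name == "OptionalData" and self.current is not None:
            # Only the generic element and its "type" attribute must be known.
            self.opt_type = attrs.get("type", "")
            self.text = []

    def characters(self, content):
        if self.opt_type is not None:
            self.text.append(content)  # text may arrive in several pieces

    def endElement(self, name):
        if name == "OptionalData" and self.opt_type is not None:
            value = "".join(self.text).strip()
            self.current.optional.append((self.opt_type, value))
            self.opt_type = None
        elif name == "Individual" and self.current is not None:
            self.callback(self.current)  # single callback per completed record
            self.current = None


# Hypothetical sample document.
SAMPLE = b"""<Individuals>
  <Individual id="I1">
    <OptionalData type="CauseOfDeath">Drowning</OptionalData>
    <OptionalData type="MilitaryService">Royal Navy</OptionalData>
  </Individual>
  <Individual id="I2"/>
</Individuals>"""

records = []
xml.sax.parseString(SAMPLE, RecordHandler(records.append))
for r in records:
    print(r.record_id, r.optional)
```

The handler itself only holds one record at a time, so its memory use does not grow with the size of the file; each record passed to the callback could then be handed on to a DOM or XSLT step, in the spirit of the two-parser arrangement described above.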