RootsWeb.com Mailing Lists
Total: 2/2
    1. Re: GEDCOM as a database format
    2. Tony Proctor
    3. "Dennis Lee Bieber" <wlfraed@ix.netcom.com> wrote in message
       news:13mrvtst5hh523c@corp.supernews.com...
       > On Sun, 23 Dec 2007 04:33:37 GMT, JD <jd4x4@<del.this>verizon.net>
       > declaimed the following in soc.genealogy.computing:
       >
       > > I personally wouldn't want software that ignored something and didn't at
       > > least provide me with a means of dealing with it. But if it did, as you
       > > say it can be transformed with xslt, which is exactly what I did with my
       > > publishing source data. I had several sources each with slightly
       >
       > XSLT still requires a known source and destination format; it can't
       > take an unknown source tag and create a known destination tag with
       > meaning... Maybe it can produce some sort of blanket output for unknown
       > tags, but that will quite likely not be a reversible transformation.
       >
       > > differing source schemas, but all of the data was related and I created
       > > my own "final" schema from them, for the use I required. And, the
       >
       > You "created"... The software didn't derive a consistent schema...
       >
       > Who "creates" the schema and transforms for all the many programs
       > that currently exists?
       >
       > > See the above. The "standardization" you refer to doesn't mean that the
       > > data has been changed, only reordered.. to YOUR schema. Software doesn't
       > > have to do that alone. YOUR schema can be YOUR ordering of the data.
       >
       > To me, said ordering requires prior knowledge of what the meaning of
       > various tags IS... What if "my" data considers "fourth flood of the
       > river <x> in the reign of <y>" to be acceptable as a date (okay, even
       > TMG would consider that a very irregular date). How would your software
       > treat something that output such as "<date>....</date>", vs
       > "<date><month>...</month><day>...</day><year>...</year></date>"
       >
       > Besides... I'm buying the software to handle the genealogical data
       > and reporting... I'm not writing my own package in which I have the
       > option of defining transforms into what I think should be used... Unless
       > all the producers of said software all agree on what is valid data,
       > commercial software will not be able to /losslessly/ accept the data of
       > others.
       >
       > > And should you be in a position (as I was) to need a third application
       > > for the data that would have a somewhat different output... you could
       > > then define your own schema and validate against it.
       >
       > How many weekend genealogists are going to even know what an XML
       > transform is, much less write one to handle one source of data?
       >
       > > So we don't need to perpetuate that by not providing a mechanism that
       > > could facilitate automation, imo.
       >
       > Well, we could insist that all extant genealogy programs be modified
       > to refuse to accept any data entry that doesn't have some sort of source
       > citation, even if it is nothing more than "personal knowledge of <xyz>"
       > --
       > Wulfraed Dennis Lee Bieber KD6MOG
       > wlfraed@ix.netcom.com wulfraed@bestiaria.com
       > HTTP://wlfraed.home.netcom.com/
       > (Bestiaria Support Staff: web-asst@bestiaria.com)
       > HTTP://www.bestiaria.com/

       XML is touted as some sort of panacea. It is an improvement on the plethora
       of data formats (in all IT areas) that existed previously, but it has to be
       understood for what it is. It is merely a standardised syntax for
       representing hierarchical data. That standardisation therefore only applies
       to the syntax, not to the semantics. What this means, in layman speak, is
       that any XML file is instantly recognisable as "XML" but it doesn't make the
       content any more understandable.

       Sure, there are lots of tools for loading/viewing/manipulating XML but they
       only know about the syntax, not the semantics. Yes, you can write your own
       XSLT (which I have to say is an awful language) but all those
       transformations would be doing is manipulating the syntax, e.g. removing
       stuff, moving stuff around, extracting stuff, etc. In principle, this would
       all be possible with any documented data format, including GEDCOM, at the
       expense (& risk) of having to write a little more of the necessary software
       yourself.

       I firmly believe that a "data model" has to be defined and accepted first.
       This subject has come up several times in this group, and links have been
       posted here about ongoing projects striving to achieve this. Once such a
       data model specification exists then representation of it in any data format
       (XML, GEDCOM, some other) is almost a mechanical operation.

       Tony Proctor
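
[Editor's illustration, not part of the original post.] The syntax-versus-semantics point can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. Both date forms from the quoted example parse without complaint; deciding what they mean needs an agreed data model. The `<event>` wrapper, the sample values, and the `read_date` function are assumptions made for this sketch only, not anything a genealogy program is known to emit.

```python
import xml.etree.ElementTree as ET

# Two hypothetical exports of the same kind of fact (sample values are made up).
FREE_TEXT = "<event><date>fourth flood of the river in the reign of the king</date></event>"
STRUCTURED = "<event><date><month>12</month><day>24</day><year>2007</year></date></event>"


def read_date(xml_text):
    """Return (kind, value) for <date>, interpreting it with an ASSUMED data model."""
    date = ET.fromstring(xml_text).find("date")
    if date is None:
        return ("missing", None)
    # Collect child elements, if any; the parser knows nothing about their meaning.
    parts = {child.tag: (child.text or "").strip() for child in date}
    if {"year", "month", "day"} <= set(parts):
        try:
            return ("calendar", (int(parts["year"]), int(parts["month"]), int(parts["day"])))
        except ValueError:
            pass  # the tags were present but did not hold numbers
    # Anything the model does not cover is carried verbatim, not guessed at.
    return ("uninterpreted", "".join(date.itertext()).strip())


print(read_date(STRUCTURED))  # ('calendar', (2007, 12, 24))
print(read_date(FREE_TEXT))   # ('uninterpreted', 'fourth flood of the river in the reign of the king')
```

The parser accepts both documents equally; only the hand-written rules in `read_date` give one of them a meaning, and a program with different rules would read the same files differently.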

    12/24/2007 04:53:59
    1. Re: GEDCOM as a database format
    2. singhals
    3. Tony Proctor wrote:
       > "Dennis Lee Bieber" <wlfraed@ix.netcom.com> wrote in message
       > news:13mrvtst5hh523c@corp.supernews.com...
       >
       >>On Sun, 23 Dec 2007 04:33:37 GMT, JD <jd4x4@<del.this>verizon.net>
       >>declaimed the following in soc.genealogy.computing:
       >>
       >>>I personally wouldn't want software that ignored something and didn't at
       >>>least provide me with a means of dealing with it. But if it did, as you
       >>>say it can be transformed with xslt, which is exactly what I did with my
       >>>publishing source data. I had several sources each with slightly
       >>
       >>XSLT still requires a known source and destination format; it can't
       >>take an unknown source tag and create a known destination tag with
       >>meaning... Maybe it can produce some sort of blanket output for unknown
       >>tags, but that will quite likely not be a reversible transformation.
       >>
       >>>differing source schemas, but all of the data was related and I created
       >>>my own "final" schema from them, for the use I required. And, the
       >>
       >>You "created"... The software didn't derive a consistent schema...
       >>
       >>Who "creates" the schema and transforms for all the many programs
       >>that currently exists?
       >>
       >>>See the above. The "standardization" you refer to doesn't mean that the
       >>>data has been changed, only reordered.. to YOUR schema. Software doesn't
       >>>have to do that alone. YOUR schema can be YOUR ordering of the data.
       >>
       >>To me, said ordering requires prior knowledge of what the meaning of
       >>various tags IS... What if "my" data considers "fourth flood of the
       >>river <x> in the reign of <y>" to be acceptable as a date (okay, even
       >>TMG would consider that a very irregular date). How would your software
       >>treat something that output such as "<date>....</date>", vs
       >>"<date><month>...</month><day>...</day><year>...</year></date>"
       >>
       >>Besides... I'm buying the software to handle the genealogical data
       >>and reporting... I'm not writing my own package in which I have the
       >>option of defining transforms into what I think should be used... Unless
       >>all the producers of said software all agree on what is valid data,
       >>commercial software will not be able to /losslessly/ accept the data of
       >>others.
       >>
       >>>And should you be in a position (as I was) to need a third application
       >>>for the data that would have a somewhat different output... you could
       >>>then define your own schema and validate against it.
       >>
       >>How many weekend genealogists are going to even know what an XML
       >>transform is, much less write one to handle one source of data?
       >>
       >>>So we don't need to perpetuate that by not providing a mechanism that
       >>>could facilitate automation, imo.
       >>
       >>Well, we could insist that all extant genealogy programs be modified
       >>to refuse to accept any data entry that doesn't have some sort of source
       >>citation, even if it is nothing more than "personal knowledge of <xyz>"
       >>--
       >>Wulfraed Dennis Lee Bieber KD6MOG
       >>wlfraed@ix.netcom.com wulfraed@bestiaria.com
       >>HTTP://wlfraed.home.netcom.com/
       >>(Bestiaria Support Staff: web-asst@bestiaria.com)
       >>HTTP://www.bestiaria.com/
       >
       > XML is touted as some sort of panacea. It is an improvement on the plethora
       > of data formats (in all IT areas) that existed previously, but it has to be
       > understood for what it is. It is merely a standardised syntax for
       > representing hierarchical data. That standardisation therefore only applies
       > to the syntax, not to the semantics. What this means, in layman speak, is
       > that any XML file is instantly recognisable as "XML" but it doesn't make the
       > content any more understandable.

       As examples of the obvious in Tony's last sentence: I may _recognize_
       that a letter is written in Hindi or Russian without having ANY clue what
       the letter is about. I may even recognize that the paper upon which the
       letter is written is high-ticket paper but I don't necessarily know from
       which company or how high the ticket was. Notes from some of Einstein's
       (or Hawking's) research log are instantly recognizable as being written
       in plain-text, but not too many of us actually _understand_ most of the
       work being logged.

       > Sure, there are lots of tools for loading/viewing/manipulating XML but they
       > only know about the syntax, not the semantics. Yes, you can write your own
       > XSLT (which I have to say is an awful language) but all those
       > transformations would be doing is manipulating the syntax, e.g. removing
       > stuff, moving stuff around, extracting stuff, etc. In principle, this would
       > all be possible with any documented data format, including GEDCOM, at the
       > expense (& risk) of having to write a little more of the necessary software
       > yourself.
       >
       > I firmly believe that a "data model" has to be defined and accepted first.
       > This subject has come up several times in this group, and links have been
       > posted here about ongoing projects striving to achieve this. Once such a
       > data model specification exists then representation of it in any data format
       > (XML, GEDCOM, some other) is almost a mechanical operation

       Total agreement on the data-model will be achieved If And ONLY IF (IFF)
       only one person has to be pleased by it.

       I don't care to record the names of 200 wedding guests; others want that
       information. Some people want to include the GPS data on the precise
       place of burial; I figure the name of the cemetery and a place-name is
       all the precision I need 90% of the time (and the other 10%, I put in
       notes to myself).

       But so long as there is "room" in the market for conflicting views on
       whether the GPS data is necessary or whether the names of all witnesses
       (as opposed to only the official witnesses as opposed to only the names
       of the participants) are necessary ... there's gonna be a need for a
       data-model so flexible it may as well not exist (see also: GEDCOM
       standard).

       IME, YMMV, and so on.

       Cheryl
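
[Editor's illustration, not part of the original post.] The trade-off described here can be sketched in Python: a small agreed core, plus an open-ended bucket for whatever else a given researcher records. The `Burial` class, its field names, and `import_record` are all hypothetical; they are not taken from GEDCOM or from any real program.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Burial:
    # Core fields that (hypothetically) every program agrees on.
    cemetery: str
    place: str
    # Detail that some researchers want and others never record.
    gps: Optional[Tuple[float, float]] = None
    # Catch-all: a program can carry these along but cannot interpret them.
    extras: Dict[str, str] = field(default_factory=dict)


def import_record(raw: dict) -> Burial:
    """Map a loosely structured record onto the assumed core model."""
    known = {"cemetery", "place", "gps"}
    extras = {key: str(value) for key, value in raw.items() if key not in known}
    gps = raw.get("gps")
    return Burial(raw["cemetery"], raw["place"], tuple(gps) if gps else None, extras)


# Made-up sample record: the importer keeps "witnesses" but has no idea what it means.
record = import_record({
    "cemetery": "Example Cemetery",
    "place": "Exampleville",
    "witnesses": "names of all 200 guests",
})
print(record.gps)     # None
print(record.extras)  # {'witnesses': 'names of all 200 guests'}
```

The more of a record that ends up in `extras`, the less the shared model is actually doing, which is the sense in which a very flexible data model "may as well not exist".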

    12/26/2007 02:36:39