Wes Groleau <groleau+news@freeshell.org> wrote in
news:jGBMi.1175$P06.1062@trnddc05:

> Tony Proctor wrote:
>> As a contrived illustration, consider some free-form notes that
>> wanted to reference a person's name, address during a particular
>> year, and the date they moved there:
>>
>> <Person("Anthony Proctor")> lives in <Person("Tony
>> Proctor").Address("2007-10-01").Country> and moved there in
>> <Event("ProctorMove").Year>
>>
>> All this sample serves to show is the generality of the use of a
>> mark-up language, and how those tags can generate both display text
>> (for reading) and a hyperlink to the associated in-memory object, or
>> to other references to it. What you see on the screen might be
>> simply:
>>
>> Tony Proctor lives in Ireland and moved there in 2002
>
> I and many others have thought about ways to tag words and phrases
> in free-form text with XML tags and attributes to carry the linking
> information. But as far as I know, none of us have ever actually
> produced a working implementation.
>

Hi all. A quick background: I spent my life in the automotive industry, as a tech as well as a technical training editor. My last 5 years were spent trying to understand & use XML as it related to getting technical info in this format and then publishing training course materials. I see a lot of discussion about XML here, and wanted to share my thoughts about it. I see XML as a complex subject, and one that is usually misunderstood. I don't claim to understand it completely myself, but what I have learned gives me a LOT of respect for how brilliant it is in its concept... simplicity and flexibility.

The discussions I see here are similar to the industry experience of having to adopt XML when dealing with U.S. government processes & regulations that require it. What it forces you to do (in a nutshell) is think about what information you deal with, who it comes from, and what you finally want to use it for. It forces organization & categorization that isn't restrained by any one use. It does it by standardizing a) the raw data format (plain text and the use of "tags", <> and </>) and b) a structure that requires the definition of its elements to be shared (by a schema, or document definition... the "rules").

I think that the key to using XML is to make sure that ALL of the data can be "tagged" with at least enough structure that nothing can be "lost" (unless someone wants to lose it!). What everyone struggles with is their own "subset" of tag requirements, but usually there is enough agreement among everyone about a core set of tags that everything will fit into. That's where the beauty of XML and schemas comes into play.. anyone can define their own "tags" and even share them as long as they share their "schema". You either use their schema, or you produce a subset that at least conforms to the basic set of tags. In order to use data that is only broadly defined & tagged, you then need to create your own schema, based on your well thought out & DEFINED criteria. XML authoring, presentation and storage software is designed to "force" your rules on the data set while keeping the core tags and/or ensuring that your data can be "remapped" back into the core set without loss.

What's being discussed are actually several "fine points", all of which are a bit irrelevant to XML itself (ahh.. the beauty again!). Some are discussing XML as a transport (which it can be called), some as a storage method (which it can be), some as a language (which it can also be), some as an organization structure (yup, that too). But really what it IS, is meta-data.. literally data about data. Labels and attributes. A system to attach labels and attributes to data, at their simplest as well as most complex levels of use. If you don't categorize and use data like I do, then at least we can share it if we both agree on its most basic & common meaning. Obviously if I spend time refining my data in great detail, and you think it's just swell that way and it saves you a lot of effort, then I've already tagged it for you to use right out of the box. If not, you can just use the data with my more broadly defined tags. You can even "remap" my tags with your own schema and rules.

XML at LEAST provides a structure for sharing & understanding how someone's data is organized, and allows for sharing it without loss and without regard to how someone else wants to use it.

Hee, hee.. the RULES. THAT's the hard part. XML is the easy, logical part.

(I think XML may be the key to the universe if we can only understand it, rather than just use it) :-)
JD <jd4x4@ wrote:

XML is like ISO 9000/9001: it is form without meaning or purpose. It is basically meaningless. It is in the same category as "proofs" that a computer program is "correct" ... based on some "requirement" that itself could be buggy as can be.

What matters is not the form but the meaning. And I seriously doubt that the genealogy community will agree to one straitjacket format for meaning, that is, structure. Will FTM and TMG agree to change their basic workings so they are the same? That will be necessary if they are to share data in an exact perfect match manner.

In genealogy there really is only one single absolute given, at least, if one attributes the meaning of "is" to mean "born before DNA technology on people". That is, a given person, going back in time, has a binary tree of ancestors, exactly two per generation, with possible coalescence. (Now, these days, of course, a person can have two mothers: the autosomal/X mother and the mitochondrial mother .. and this doesn't fit with that model!).

Beyond that some programs may tie things to "events" or "extra types of so-called 'parents'", etc. and they are just not going to agree on how.

The whole idea of portability of data is impossible.

Doug McDonald
Doug said:

>Will FTM and TMG agree to change their basic workings so they are the same?

If they did, what would be the point in having two of them? Both programs are successful because their features cater to researchers who have different needs and standards for genealogical data. GEDCOM likewise has its own agenda, which is why it does such a poor job of accommodating the universe of genealogy programs.

>In genealogy there really is only one single absolute given, at
>least, if one attributes the meaning of "is" to mean
>"born before DNA technology on people". That is, a given person,
>going back in time, has a binary
>tree of ancestors, exactly two per generation, with possible coalescence.

That is certainly the foundation of PAF and other programs that were designed 20 years ago. More modern programs are slowly coming around to the realization that genealogy is not about recording facts. It is about recording and evaluating _evidence_. And evidence doesn't play by such neat and tidy rules.

Bob Velke
Wholly Genes Software
Doug McDonald <mcdonald@SnPoAM_scs.uiuc.edu> wrote in
news:fjrft2$vj3$1@news.ks.uiuc.edu:

> XML is like ISO 9000/9001: it is form without meaning or purpose.
> It is basically meaningless. It is in the same category as
> "proofs" that a computer program is "correct" ... based on
> some "requirement" that itself could be buggy as can be.
>

Wow. I'm not sure where to start. One of the primary benefits that I see with XML is that (one of) its purposes is structure, combined with the flexibility to adapt and extend. The second would be that definitions are known (by virtue of the <shudder> structure of the schema document) by anyone who wants to use the data (more about that below).

> What matters is not the form but the meaning. And I seriously doubt
> that the genealogy community will agree to one straitjacket
> format for meaning, that is, structure. Will FTM and TMG agree

Certainly everyone can agree on items such as Name, Date, Source, Notes, Comments, etc. Yes??

> to change their basic workings so they are the same? That will
> be necessary if they are to share data in an exact perfect
> match manner.

If at least a basic XML schema is agreed on and XML is used in any fashion, at the very least it would be an exchange standard. At best it would comply with the XML intent that it accept a new schema document without harming or losing the original data structure, and allow the same data set to be used by a different piece of software that might make use of "expanded" sets of tags.

>
> In genealogy there really is only one single absolute given,
> at least, if one attributes the meaning of "is" to mean
> "born before DNA technology on people". That is,
> a given person, going back in time, has a binary
> tree of ancestors, exactly two per generation, with possible
> coalescence. (Now, these days, of course, a person can have two
> mothers: the autosomal/X mother and the mitochondrial mother .. and
> this doesn't fit with that model!).

There are quite a few "absolutes", I think. What differs is exactly as you say: how completely & correctly people enter the "absolute" values (XML can help with this, if only by showing you that there is an empty "hole" in the needed data, and that it wasn't just forgotten or missing in the export), and how people put two and two together (XML can help here as well, with a schema that only does what it is supposed to do: classify the data, not analyze or manipulate it).

>
> Beyond that some programs may tie things to "events"
> or "extra types of so-called 'parents'", etc. and they
> are just not going to agree on how.
>

Software and user preference should be the only forces that draw conclusions, and those conclusions shouldn't change the data (the facts), or the description of what the data is.

> The whole idea of portability of data is impossible.
>

What's really impossible is to think that there is one schema that can do it all. The data is what it is, no more, no less. Just as I used a subset of my automotive data (that was mainly meant for engineers) to publish training materials (not build them from scratch), you just need a schema that at its most basic level allows for tagging ALL of the data, and then refines the data into more and more granular bits that don't contradict the more basic tag but expand on it. And, if I add data to your set to suit my purposes, you may choose to ignore it because you put cars together, not tear them apart. My uses and yours are complementary, not exclusionary. You would just ignore my training data.

At the end of the day, I may not agree with the criteria that you accept for relationships, but I would accept that you got someone's name and a statement that they had an offspring, if you tell me where you got it from. Then, it's really up to me (and you) to decide if that connects us, isn't it? None of that changes the data itself. Unless you made a typo :-)

> Doug McDonald
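The idea in the post above, that a consumer reads an agreed core tag set, skips extensions it doesn't care about, and can still see "holes" in the core data, can be sketched roughly as follows. This is only an illustration: the tag names are taken loosely from the "Name, Date, Source, Notes" list in the post, the `<dna-match>` extension and the sample record are made up, and no actual genealogy schema is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical "core" tag set, loosely based on the items named in the post.
CORE_TAGS = {"name", "date", "source", "notes"}

# Made-up sample record: three core tags plus one producer-specific extension.
RECORD = """\
<person>
  <name>John Example</name>
  <date>1900-01-01</date>
  <source>Parish register (hypothetical)</source>
  <dna-match confidence="low">Sample extension data</dna-match>
</person>
"""


def read_core(xml_text):
    """Return (core, extras, missing) for one record.

    core    -- dict of core tag -> text for the core tags that are present
    extras  -- extension tags this reader skipped (the data is left untouched)
    missing -- core tags with no element at all, i.e. the visible "holes"
    """
    root = ET.fromstring(xml_text)
    core, extras = {}, []
    for child in root:
        if child.tag in CORE_TAGS:
            core[child.tag] = (child.text or "").strip()
        else:
            extras.append(child.tag)
    missing = sorted(CORE_TAGS - set(core))
    return core, extras, missing


core, extras, missing = read_core(RECORD)
print(core)     # core fields this reader understands
print(extras)   # tags it ignored
print(missing)  # core fields the record never supplied
```

The reader never modifies or discards the extension element in the source text; it simply doesn't interpret it, which is the "complementary, not exclusionary" use described above.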
"JD verizon.net>" <jd4x4@<del.this> wrote in message
news:Xns9A04C51031446jd4x4verizonnet@199.45.49.11...
> Wes Groleau <groleau+news@freeshell.org> wrote in
> news:jGBMi.1175$P06.1062@trnddc05:
>
> > Tony Proctor wrote:
> >> As a contrived illustration, consider some free-form notes that
> >> wanted to reference a person's name, address during a particular
> >> year, and the date they moved there:
> >>
> >> <Person("Anthony Proctor")> lives in <Person("Tony
> >> Proctor").Address("2007-10-01").Country> and moved there in
> >> <Event("ProctorMove").Year>
> >>
> >> All this sample serves to show is the generality of the use of a
> >> mark-up language, and how those tags can generate both display text
> >> (for reading) and a hyperlink to the associated in-memory object, or
> >> to other references to it. What you see on the screen might be
> >> simply:
> >>
> >> Tony Proctor lives in Ireland and moved there in 2002
> >
> > I and many others have thought about ways to tag words and phrases
> > in free-form text with XML tags and attributes to carry the linking
> > information. But as far as I know, none of us have ever actually
> > produced a working implementation.
> >
>
> [...]

Interestingly Wes, the snippet of my post that you quote here has nothing to do with XML. Although I did mention XML somewhere, it was to point out the inappropriateness of it, since it's designed for hierarchical data, and family relationships are not hierarchical - they're a "network".

The snippet you quoted was me making a case for a new rich-text mark-up language that could represent any number of embedded data types (including all the variations, and "fuzziness", mentioned by other researchers here), and that would create 'live objects' from those references when the text was loaded into a viewer. Those live objects could then support all sorts of possibilities, e.g. cross-referencing, correlation, navigation.

I did take some time aside from my paid day-job to experiment with this idea, and the results were very inspiring. It felt like it could have implications for almost any sort of textual storage system. The basic principles are generic and not specific to genealogy.

Tony Proctor
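For readers trying to picture the 'live objects' idea, here is a rough sketch of how references written in the quoted tag syntax might be resolved against in-memory objects to produce the display text. It is not Tony's implementation, which isn't shown in the thread: the classes `Person`, `Place` and `Event`, the `ROOTS` registry, and the 2002-01-01 start date are all assumptions made up for the example, and only the tag syntax and the expected display line come from the quoted post.

```python
import re
from dataclasses import dataclass


@dataclass
class Place:
    Country: str


@dataclass
class Event:
    Year: int


@dataclass
class Person:
    name: str
    moves: list  # (ISO start date, Place) pairs, oldest first

    def Address(self, on_date):
        """Return the Place where the person lived on the given ISO date."""
        place = None
        for start, where in self.moves:
            if start <= on_date:  # ISO dates compare correctly as strings
                place = where
        return place

    def __str__(self):
        return self.name


# Hypothetical in-memory data; the start date is a placeholder.
tony = Person("Tony Proctor", [("2002-01-01", Place("Ireland"))])
ROOTS = {
    "Person": {"Anthony Proctor": tony, "Tony Proctor": tony},
    "Event": {"ProctorMove": Event(Year=2002)},
}

TAG = re.compile(r"<([^<>]+)>")               # one <...> reference
STEP = re.compile(r'(\w+)(?:\("([^"]*)"\))?')  # Name or Name("arg")


def resolve(tag_body):
    """Walk a chain such as Person("X").Address("date").Country."""
    steps = [(m.group(1), m.group(2)) for m in STEP.finditer(tag_body)]
    (root, key), rest = steps[0], steps[1:]
    obj = ROOTS[root][key]
    for name, arg in rest:
        member = getattr(obj, name)
        obj = member(arg) if arg is not None else member
    return obj


def render(text):
    """Replace every tag with the display text of the object it refers to."""
    return TAG.sub(lambda m: str(resolve(m.group(1))), text)


NOTE = ('<Person("Anthony Proctor")> lives in '
        '<Person("Tony Proctor").Address("2007-10-01").Country> '
        'and moved there in <Event("ProctorMove").Year>')
print(render(NOTE))  # Tony Proctor lives in Ireland and moved there in 2002
```

A viewer built along these lines could keep the resolved object alongside each rendered span, which is what would make hyperlinks and cross-referencing possible.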
There are several distinct issues here:

1. GEDCOM 6 vs earlier.
GEDCOM 6 is an XML version. There is a document describing it but I'm not sure of its status - is it still work in progress? In any event it doesn't seem to be used. It follows the same data model as the current 5.5, so any references to the data model apply to both. Where we consider "GEDCOM" vs XML in terms of parsing files, etc. then "GEDCOM" is likely to mean 5.5 or earlier and XML would include GEDCOM 6. But, to repeat myself, 6 doesn't seem to be in use.

2. GEDCOM (pre-6) as a database format.
By database format I mean the internal database used by an application to store and manipulate data. GEDCOM wouldn't be my choice irrespective of data model considerations. My choice would probably be an RDBMS. Familiarity is part of this, but there is a huge choice of database engines, there is a (more or less) standard language, SQL, and there are technologies which to some extent decouple the choice of programming language from the choice of engine. In comparison GEDCOM would have to be stored as a text file between program runs, parsed on each run and rewritten at the end of any run which amends the file. The parser would have to be a dedicated GEDCOM parser. Unless a supplementary index is used alongside the text file, the data would have to be re-indexed on each run (probably using a dedicated GEDCOM indexer) or else the benefits of indexing would have to be forgone. If indexing is not used, scalability would be more difficult. As a text file the data store would be open to modification with a text editor, which is a problem for integrity and death to an external index.

3. GEDCOM 6 (and other XML) as a database format.
As a text format this has some of the same disadvantages as plain GEDCOM in terms of having to be parsed and indexed for each run. Unlike plain GEDCOM, however, there is plenty of ready-made technology for this, including XQuery. XSLT technology will assist web-site and report generation.

4. GEDCOM as a data exchange format.
This is a more appropriate application as scalability would not be an issue. The parsing requirements remain, however.

5. GEDCOM 6 (and other XML) as a data exchange format.
This would be a preferable data exchange format as it has the technological advantages of XML.

6. GEDCOM as a data model.
Whether as a database or data exchange format, GEDCOM cannot be better than its underlying data model. This is a fairly simple model. Presumably it meets LDS's requirements but for some of us it's too limited.

--
Ian

Hotmail is for spammers. Real mail address is igoddard at nildram co uk
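To make point 2 above a little more concrete, here is a small sketch of what using an RDBMS as the internal store might look like, with Python's built-in sqlite3 module standing in for the database engine. The table and column names are invented for illustration and are not taken from GEDCOM or from any existing genealogy program.

```python
import sqlite3

# ":memory:" keeps the example self-contained; a file path would persist the
# tables and the index between program runs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id      INTEGER PRIMARY KEY,
        given   TEXT,
        surname TEXT
    );
    CREATE TABLE parent_of (
        parent_id INTEGER REFERENCES person(id),
        child_id  INTEGER REFERENCES person(id)
    );
    -- The engine maintains this index; nothing is rebuilt on each run.
    CREATE INDEX idx_person_surname ON person (surname);
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO person (id, given, surname) VALUES (?, ?, ?)",
    [(1, "Ann", "Example"), (2, "Ben", "Example"), (3, "Cal", "Sample")],
)
conn.execute("INSERT INTO parent_of (parent_id, child_id) VALUES (?, ?)", (1, 2))
conn.commit()

# Lookup by surname, which can use the index.
examples = conn.execute(
    "SELECT given FROM person WHERE surname = ? ORDER BY given", ("Example",)
).fetchall()

# Children of person 1, via a join on the relationship table.
children = conn.execute(
    """SELECT c.given
       FROM parent_of JOIN person AS c ON c.id = parent_of.child_id
       WHERE parent_of.parent_id = ?""",
    (1,),
).fetchall()

print(examples)
print(children)
```

Because relationships live in their own table, a person can have any number of rows linking them to other people, and the query language is the same whichever SQL engine sits underneath.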
Ian Goddard wrote:
> There are several distinct issues here:
>
> 2. GEDCOM (pre-6) as a database format.
> By database format I mean the internal database used by an application
> to store and manipulate data. GEDCOM wouldn't be my choice irrespective
> of data model considerations. My choice would probably be an RDBMS.
> Familiarity is part of this but there is a huge choice of database
> engines, there is a (more or less) standard language, SQL and there are
> technologies which to some extent decouple the choice of programming
> language from the choice of engine. In comparison GEDCOM would have to
> be stored as a text file between program runs, parsed on each run and
> rewritten at the end of any run which amends the file. The parser would
> have to be a dedicated GEDCOM parser. Unless a supplementary index is
> used alongside the text file the data would have to be re-indexed on
> each run (probably using a dedicated GEDCOM indexer) or else the
> benefits of indexing would have to be foregone. If indexing is not used
> scalability would be more difficult. As a text file the data store
> would be open to modification with a text editor which is a problem for
> integrity and death to an external index.

Take a look at LifeLines (http://lifelines.sourceforge.net/). I haven't looked at the internals, but it stores GEDCOM in some sort of database.

> 4. GEDCOM as a data exchange format.
> This is a more appropriate application as scalability would not be an
> issue.

I should hope so, since that is what it was designed for.

> The parsing requirements remain, however.

Any kind of data exchange format will have to be parsed.

> 5. GEDCOM 6 (and other XML) as a data exchange format.
> This would be a preferable data exchange format as it has the
> technological advantages of XML.

I fail to see any technological advantages of XML over GEDCOM. Their syntax is different, but their semantics are essentially the same: both are lists of trees. The availability of parsing tools for XML is not a real advantage. You still have to write the callbacks, and that is where most of the work is. A GEDCOM line is trivial to parse: it is either LEVEL TAG DATA or LEVEL XREF TAG. Validating parsers might be an advantage of XML, but GEDCOM allows users to define their own tags, and I don't know if XML validating parsers can handle that.

--
Thomas M. Sommers -- tms@nj.net -- AB2SB
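As an illustration of the "LEVEL TAG DATA or LEVEL XREF TAG" point, a minimal line parser could look something like the sketch below. It assumes well-formed input, ignores details such as CONT/CONC continuation lines and character-set handling, and uses made-up sample records; it is not a complete GEDCOM 5.5 reader.

```python
import re

# LEVEL [XREF] TAG [DATA]: the xref (e.g. @I1@) and the data are both optional.
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, data)."""
    m = LINE.match(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, data = m.groups()
    return int(level), xref, tag, data or ""


def parse(lines):
    """Nest lines by level number and return the list of level-0 trees."""
    forest, stack = [], []  # stack[i] is the currently open node at level i
    for raw in lines:
        if not raw.strip():
            continue
        level, xref, tag, data = parse_line(raw)
        node = {"xref": xref, "tag": tag, "data": data, "children": []}
        del stack[level:]  # close any open nodes at this level or deeper
        if stack:
            stack[-1]["children"].append(node)
        else:
            forest.append(node)
        stack.append(node)
    return forest


# Made-up sample records, including a user-defined tag (_NOTE2).
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Example/
1 BIRT
2 DATE 1 JAN 1900
1 _NOTE2 user-defined tag
0 @F1@ FAM
1 HUSB @I1@
"""

forest = parse(SAMPLE.splitlines())
for record in forest:
    print(record["xref"], record["tag"], [c["tag"] for c in record["children"]])
```

The result is the "list of trees" mentioned above: each level-0 record is a root, and user-defined tags pass through like any other tag because nothing here checks tag names against a fixed vocabulary.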