Confirming that files are "identical" itself presents a challenge. For example, given earlier comments about inconsistency with spaces, does one interpret all white space of whatever length as "space"? Does one ignore or warn about trailing spaces? Requiring (say) that a hash of the file match exactly is a severe test. It may even be undesirable, depending on whether you allow flexibility when reading files but try to write files only in canonical form.

Example: a placename: New Milton Should this match New Milton ? (Probably a bad example, but I don't have the GEDCOM spec handy.) A better example may be the treatment of space between a tag and its value...

Dave Beakhust

Sent from my iPhone

On 24 Apr 2012, at 15:38, Tim Powys-Lybbe <[email protected]> wrote:

> On 24 Apr at 12:01, "Adrian Bruce" <[email protected]> wrote:
>
>> My apologies to those who are bored by GEDCOM but its existence is
>> fundamental to our ability to transfer data other than in human-only
>> readable form. It concerns me that genealogical and family history
>> societies are letting software people make the running on whether or
>> not GEDCOM is replaced / enhanced / left to die.
>>
>> For what it's worth as a former IT professional of 30y standing
>> writing and supporting software:
>>
>> Caroline said "when software developers blame the exporting program
>> they abdicate their own responsibility to their customers". I
>> certainly applaud those who tweak their software and hope they'll gain
>> the market share they deserve. But we need to distinguish between the
>> general responsibility to their customers of providing the optimum
>> software and the responsibility to produce GEDCOM that is compliant to
>> a standard. This responsibility exists and lies with the person who
>> writes the export code. There are 2 reasons for this - if they call it
>> a GEDCOM export, it should be that, not a half-hearted attempt.
>> Secondly, if there are (say) 20 popular programs out there, and you
>> write a 21st, you seriously do not want to be writing 20 different
>> import routines plus your own export - one export and one import ought
>> to suffice.
>>
>> Sue said "Identifying "incorrect" GEDCOM is difficult because the
>> specification is not entirely clear." I'd disagree. For the most part,
>> the GEDCOM standard is perfectly clear and it annoys me that so many
>> sling around the view that GEDCOM is flawed. (I suspect Sue, from her
>> phrasing, doesn't belong to the extreme mud-slingers, though).
>> Certainly, the casual reader will not find it at all clear - but
>> that's not the target audience. In the BetterGEDCOM Wiki, it proved
>> hard for any of the IT literate contributors to find an "error" in the
>> specification - about the only one that sticks in my mind is that one
>> could have an infinite loop of a Source referring to a standalone
>> Note, which is justified by the first Source, which would refer to the
>> same standalone Note, which is...
>>
>> This is NOT to say that GEDCOM is adequate for family history today.
>> It isn't. The point is that all the new standards in the world won't
>> help if the major problem is not with GEDCOM but with the fact that
>> developers either can't be bothered to read the standard properly or
>> can't be bothered to take all the steps necessary to reformat their
>> own data to fit into the GEDCOM model. Neither of those problems will
>> be fixed by a new or enhanced standard.
>>
>> In essence we need a 2-pronged approach - firstly we need to highlight
>> the incompetencies of software suppliers who can't be bothered to
>> understand the difference between CONT and CONC in a GEDCOM file.
>> Secondly we need to agree on what family historians want from a
>> revision / replacement of GEDCOM. (If we want anything).
>> For instance, US genealogists tend to emphasise the data that goes
>> into citations - are UK family historians satisfied with what they
>> have in GEDCOM? Alternatively are we happy that FamilySearch will
>> drive GEDCOMX (say) and only produce something to satisfy FS's needs?
>>
>> Adrian Bruce
>
> I agree with all that you say. The crux of the matter is the software
> developers who write faulty programs to export and import GEDCOM.
>
> Thinking further about this, I cannot see the Family Search people doing
> anything to improve GEDCOM 5.5. But what is needed for any version of
> GEDCOM is a method of testing that a genealogy program correctly handles
> it. Let's try this:
>
> 1. A standard GEDCOM file should be developed that incorporates all the
> features of GEDCOM. For GEDCOMX this can only be done by that team.
>
> 2. The first test of compliance for any software is that it should be
> able to receive in this standard file, create a genealogy in their own
> format and then export from their program the GEDCOM. The exported
> GEDCOM should be identical to the standard one.
>
> 3. The second test of compliance for any genealogy software is that the
> program should be used to construct all the features that the developers
> have incorporated. Then a GEDCOM should be created to export the data.
> The first sub-check of compliance should be to then import this GEDCOM
> and check that the recreated genealogy file is identical to the
> original. The second sub-check of compliance should be to then import
> the GEDCOM to another program that is otherwise known to be compliant
> and then export from that another GEDCOM and finally import the last
> GEDCOM into the program under test and check that the regenerated
> genealogy is identical to the original.
>
> I would expect to hear squeals of protest from the software developers.
> We would depend on the reviewers and journalists to do or commission
> these relatively simple tests in any report they made on the various
> genealogy programs. A little pressure there?
>
> --
> Tim Powys-Lybbe [email protected]
> for a miscellany of bygones: http://powys.org/
>
> -------------------------------
> To unsubscribe from the list, please send an email to [email protected] with the word 'unsubscribe' without the quotes in the subject and the body of the message
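Dave's question of what "identical" should mean for Tim's tests can be made concrete. The Python sketch below picks one set of rules as an assumption (they are not prescribed by the GEDCOM specification or by any program): runs of spaces or tabs count as one space, leading and trailing white space is ignored, and CONC/CONT continuation lines are folded back into the value they continue (CONC with no separator, CONT with a newline), so two exports that break a long note at different points still compare equal. The names `logical_records`, `normalise` and `compare` are made up for illustration.

```python
"""Sketch: compare two GEDCOM exports under a looser notion of "identical".

Assumed rules (one possible choice, not taken from the GEDCOM spec):
  * any run of spaces or tabs counts as a single space;
  * leading and trailing white space is ignored;
  * CONC joins to the parent value directly, CONT joins with a newline.
"""
import difflib
import re

# level, then tag (or cross-reference id on level-0 lines), then optional value
LINE = re.compile(r"\s*(\d+)\s+(\S+)(?:\s(.*))?$")


def logical_records(lines):
    """Yield [level, tag, value] with CONC/CONT folded into the parent line."""
    current = None
    for raw in lines:
        m = LINE.match(raw.rstrip("\r\n").lstrip("\ufeff"))
        if m is None:
            continue  # blank or malformed lines are simply skipped in this sketch
        level, tag, value = m.group(1), m.group(2), m.group(3) or ""
        if tag == "CONC" and current is not None:
            current[2] += value
        elif tag == "CONT" and current is not None:
            current[2] += "\n" + value
        else:
            if current is not None:
                yield current
            current = [level, tag, value]
    if current is not None:
        yield current


def normalise(value):
    """Collapse runs of spaces/tabs and trim each CONT-separated line."""
    return "\n".join(re.sub(r"[ \t]+", " ", part).strip()
                     for part in value.split("\n"))


def compare(lines_a, lines_b):
    """Return unified-diff lines; an empty list means 'identical' under these rules."""
    def render(lines):
        return [f"{level} {tag} {normalise(value)!r}"
                for level, tag, value in logical_records(lines)]
    return list(difflib.unified_diff(render(lines_a), render(lines_b),
                                     "first.ged", "second.ged", lineterm=""))
```

Called with the lines of two files (for instance from `open(path, encoding="utf-8").readlines()`, assuming UTF-8 exports rather than ANSEL), an empty result means the files match under these rules. In practice the HEAD record usually names the exporting program and the export date, so a real test harness would probably skip or mask it; whether to be this lenient at all is exactly the judgement Dave raises.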
One thing that has not been mentioned so far in this thread is that a GEDCOM file is only a simple text file. You want to know if your new program will import everything from your old program. Open a new file in your new program and enter data into every available field, including source material, making sure that you follow the program's instructions (use a fictitious person, or a real one if you have one that has information for every field). Do the same with your existing program. Export a GEDCOM file from both programs and open them in a text editor to compare any differences. Most differences can be 'remedied' using the SEARCH and REPLACE facility in Word (although you must remember to save the resultant file as a simple text file with a .GED extension). Always experiment with a copy of your existing file, never the original.

An example that springs to mind is as follows:

Old program:
1 OCCU
2 TITL Plasterer
2 PLAC Manor Park, ESS
2 FROM 16 Sep 1899

New program:
1 OCCU Plasterer
2 DATE 16 Sep 1899
2 PLAC Manor Park, ESS

Use SEARCH and REPLACE to globally find ^p2 TITL (where ^p is a carriage return) and replace it with nothing. In the same way, exchange 2 FROM for 2 DATE. Date and place are the wrong way round, and this is the tedious bit: highlight from 2 PLAC to the end of the date line and use the sort facility to reverse the order. This may take quite a while, but it is quicker than typing it all in again!

I know this doesn't solve the fundamental problem, but it is a workaround. And if you are into even the most basic programming, you could make all these changes in one pass.

Jeanne Bunting 3472 Attersley

-----Original Message-----
From: Dave Beakhust
Sent: Tuesday, April 24, 2012 5:17 PM
To: [email protected]
Cc: [email protected]
Subject: Re: [SOG-UK] GEDCOM (was Upgrade from FTM2006?)

Confirming that files are "identical" itself presents a challenge.
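Jeanne's closing remark, that even basic programming could make all of her Word edits in one pass, might look something like the Python sketch below. It is hypothetical throughout: `convert_occu` and `split_gedcom_line` are made-up names, the file is assumed to be UTF-8 (many GEDCOM files are ANSEL or ANSI, so the encoding may need changing), and unlike a global search and replace it only touches lines inside `1 OCCU` structures, on the assumption that a `2 TITL` or `2 FROM` line elsewhere may mean something else.

```python
"""Hypothetical one-pass rewrite of the OCCU example in Jeanne's message.

Assumed input (old program):      Assumed output (new program):
  1 OCCU                            1 OCCU Plasterer
  2 TITL Plasterer                  2 DATE 16 Sep 1899
  2 PLAC Manor Park, ESS            2 PLAC Manor Park, ESS
  2 FROM 16 Sep 1899
Only OCCU structures are touched, unlike a global search and replace.
"""
import sys


def split_gedcom_line(line):
    """Return (level, tag, value), or None if the line has no numeric level."""
    parts = line.strip().split(" ", 2)
    if not parts[0].isdigit():
        return None
    parts += [""] * (3 - len(parts))
    return int(parts[0]), parts[1], parts[2]


def convert_occu(lines):
    out, i = [], 0
    while i < len(lines):
        head = split_gedcom_line(lines[i])
        if head is None or head[:2] != (1, "OCCU"):
            out.append(lines[i])
            i += 1
            continue
        # Group the OCCU's children into subtrees: a level-2 line plus
        # any deeper lines that belong to it.
        subtrees, j = [], i + 1
        while j < len(lines):
            parsed = split_gedcom_line(lines[j])
            if parsed is None or parsed[0] <= 1:
                break
            if parsed[0] == 2 or not subtrees:
                subtrees.append([lines[j]])
            else:
                subtrees[-1].append(lines[j])
            j += 1
        occupation, kept = head[2], []
        for tree in subtrees:
            level, tag, value = split_gedcom_line(tree[0])
            if tag == "TITL" and len(tree) == 1 and not occupation:
                occupation = value  # "2 TITL Plasterer" -> "1 OCCU Plasterer"
                continue
            if tag == "FROM":  # the old program's date tag becomes DATE
                tree = [f"{level} DATE {value}".rstrip()] + tree[1:]
            kept.append(tree)
        # Put a DATE that directly follows a PLAC in front of it.
        for k in range(len(kept) - 1):
            tags = [split_gedcom_line(t[0])[1] for t in kept[k:k + 2]]
            if tags == ["PLAC", "DATE"]:
                kept[k], kept[k + 1] = kept[k + 1], kept[k]
        out.append(f"1 OCCU {occupation}".rstrip())
        for tree in kept:
            out.extend(tree)
        i = j
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Always work on a copy, e.g.: python occu_fix.py old_copy.ged converted.ged
    with open(sys.argv[1], encoding="utf-8") as src:
        converted = convert_occu(src.read().splitlines())
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write("\n".join(converted) + "\n")
```

As with the Word method, this should only ever be run on a copy of the exported file, and the result is worth checking in a text editor before importing it.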