"Steve Hayes" <hayesstw@telkomsa.net> wrote in message
news:31g6i85ir4tlbhov78e19vf81e19c8toab@4ax.com...
> In an earlier message I suggested using AWK to manipulate a GEDCOM file to
> solve a particular problem.
>
> That point tended to get lost in discussion of other points like using other
> ways to solve the problem, or discussion of flaws in the GEDCOM data model
> itself and proposals for its replacement, which I see as a separate
> question.
>
> What I would like to see is the development of a kind of library of AWK
> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files,
> and some would like to make changes to them, or extract information from them
> in ways that might not be possible with other genealogy programs.
>
> Here is a GEDCOM file.
>
> I tried to choose a short one to use as an example, which shows the structure
> of the file.
>
<snip>
>
> --
> Steve Hayes from Tshwane, South Africa
> Blog: http://khanya.wordpress.com
> E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

If you're OK using AWK, Steve, then I would recommend a more reliable
approach. Textual manipulation to solve a problem is usually
less than satisfactory because plain text is ambiguous, and because a
simple text-processing language cannot easily understand the grammar of
something like GEDCOM.

In a similar vein, no one would (or should) try to manipulate an XML file
directly from its textual representation. They would first load it into a
DOM (Document Object Model) before processing the associated objects.

I would recommend loading the GEDCOM file into an object representation, in
memory, and manipulating its objects instead. I have seen free tools for doing
this, although I do not have a reference to hand.

Tony Proctor
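To make the "object representation" suggestion concrete, here is a minimal sketch in Python. It is not one of the free tools mentioned above, and it makes some loud assumptions: each line has the shape "level [@xref@] tag [value]", levels never increase by more than one, and CONT/CONC continuation lines and character-set handling are ignored. The names Node, parse_gedcom and SAMPLE are invented for illustration, and the sample data is made up rather than taken from the snipped file.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed line shape: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$")


@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    xref: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def first(self, tag):
        """Return the first child with the given tag, or None."""
        for child in self.children:
            if child.tag == tag:
                return child
        return None


def parse_gedcom(text):
    """Parse GEDCOM text into a list of level-0 record nodes."""
    records = []
    stack = []  # stack[i] is the most recent node seen at level i
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        m = LINE_RE.match(raw)
        if m is None:
            raise ValueError(f"not a GEDCOM line: {raw!r}")
        level = int(m.group(1))
        node = Node(level, m.group(3), m.group(4) or "", m.group(2))
        if level > len(stack):
            raise ValueError(f"level jumps too far: {raw!r}")
        del stack[level:]  # drop finished deeper levels
        if level == 0:
            records.append(node)
        else:
            stack[level - 1].children.append(node)
        stack.append(node)
    return records


# Made-up sample data, for illustration only.
SAMPLE = """0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1900
0 TRLR"""

if __name__ == "__main__":
    for rec in parse_gedcom(SAMPLE):
        if rec.tag == "INDI":
            print(rec.xref, rec.first("NAME").value)
```

Once the file is a tree of nodes, a change such as "find every INDI with no BIRT" becomes a loop over records rather than a pattern match across lines, which is the point being made about working on objects instead of raw text.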
Tony Proctor wrote:
> "Steve Hayes" <hayesstw@telkomsa.net> wrote in message
> news:31g6i85ir4tlbhov78e19vf81e19c8toab@4ax.com...
>> In an earlier message I suggested using AWK to manipulate a GEDCOM file to
>> solve a particular problem.
>>
>> That point tended to get lost in discussion of other points like using other
>> ways to solve the problem, or discussion of flaws in the GEDCOM data model
>> itself and proposals for its replacement, which I see as a separate
>> question.
>>
>> What I would like to see is the development of a kind of library of AWK
>> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files,
>> and some would like to make changes to them, or extract information from them
>> in ways that might not be possible with other genealogy programs.
>>
>> Here is a GEDCOM file.
>>
>> I tried to choose a short one to use as an example, which shows the structure
>> of the file.
>>
>
> <snip>
>
>>
>> --
>> Steve Hayes from Tshwane, South Africa
>> Blog: http://khanya.wordpress.com
>> E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk
>
> If you're OK using AWK Steve then I would recommend a more reliable
> approach. Textual manipulation to solve a problem is usually
> less-than-satisfactory due to ambiguities looking at plain text, and the
> fact that a simple text-processing language cannot easily understand the
> grammar of something like GEDCOM.
>
> In a similar vein, no one would (or should) try and manipulate an XML file
> directly from its textual representation. They would first load it into a
> DOM (Document Object Model) before processing the associated objects.
>
> I would recommend loading the GEDCOM file into an object-representation, in
> memory, and manipulate its objects instead. I have seen free tools for doing
> this although I do not have a reference to hand.

This idea also occurred to me. I dismissed it fairly quickly.

First it just becomes Yet Another GEDCOM Based Application.
Like all the other YAGBAs it has the problems of dealing with the ways
GEDCOM has been twisted by all the other YAGBAs & their users. Maybe it
would need some sort of expert system to understand how to parse
incoming GEDCOM files from A & how to write them for B.

Then it would need all manner of editing functions, otherwise someone
would be complaining that although it does what someone else wanted it
doesn't do what they want. And, of course, it would need an easy-to-use
user interface to all this.

Then why can't it just provide for new data entry as well? And why
can't it display a nice tree? And do reports? And and and...

If its import and export facilities were good enough for it to be the
universal GEDCOM handler it would need to be, scope creep would be
inevitable. By v2.0 it would probably be adding its own GEDCOM semantics
& becoming part of the problem.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang
at austonley org uk
On 2013-02-19 11:02, Ian Goddard wrote:
> If its import and export facilities were good enough to be the universal
> GEDCOM handler it would need to be scope creep would be inevitable. By
> v2.0 it would probably adding its own GEDCOM semantics & become part of
> the problem.

It is simpler to treat the problem in two parts: input and output.

It is feasible (if long-winded) to provide a good approximation to a
universal GEDCOM reader. However, providing for all variations of GEDCOM
output is a much more severe problem and is the part best not attempted.
So, for instance, if you want to provide a GEDCOM output option, try
just v5.5.

In practice I find that once a particular problem has been analysed,
possibly by loading the file into a suitable program, it is often possible
to go back to the original GEDCOM and do some editing, although possibly
manually rather than using a script. The human involvement provides that
little bit of extra intelligence that can make the difference. If it is a
large file, you just have to go plodding through. It might take a bit of
discipline but it is doable.

Now, what was the problem in question?
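The input/output split described above can be sketched at the level of single lines: be lenient about what is read, and write only one canonical form. This is an illustrative Python sketch, not a complete reader. The function names read_line, write_line and normalise are invented here, and the specific leniencies (a leading byte-order mark, tabs or repeated spaces between fields, lower-case tags) are assumptions about what real files might contain. Upper-casing every tag is also an assumption and may not suit user-defined tags. The sketch says nothing about which tags or structures are valid in v5.5.

```python
import re

# Lenient on input: optional BOM, leading whitespace, and tabs or
# repeated spaces between level, xref and tag (all assumed quirks).
LENIENT = re.compile(
    r"^\ufeff?\s*(\d+)\s+(?:(@[^@\s]+@)\s+)?(\S+)(?:\s(.*))?$"
)


def read_line(raw):
    """Parse one line leniently into (level, xref, tag, value)."""
    m = LENIENT.match(raw.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"unreadable line: {raw!r}")
    level, xref, tag, value = m.groups()
    # Assumption: tags are normalised to upper case.
    return int(level), xref, tag.upper(), value or ""


def write_line(level, xref, tag, value):
    """Emit the one canonical form: fields joined by single spaces."""
    parts = [str(level)]
    if xref:
        parts.append(xref)
    parts.append(tag)
    if value:
        parts.append(value)
    return " ".join(parts)


def normalise(text):
    """Read every non-blank line leniently and rewrite it strictly."""
    return "\n".join(
        write_line(*read_line(line))
        for line in text.splitlines()
        if line.strip()
    )


if __name__ == "__main__":
    messy = "  0\t@I1@  indi\n1 name John /Smith/"
    print(normalise(messy))
```

Keeping all the tolerance in read_line and none in write_line means each newly discovered input quirk is one more case for the reader, while the output side stays fixed.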