RootsWeb.com Mailing Lists
Total: 3/3
    1. Re: Using AWK to manipulate GEDCOM files
    2. Kerry Raymond
    3. Despite all the scorn heaped on the idea, I must say that I often use AWK for quickly fixing problems with GEDCOMs (particularly ones given to me that are not importing correctly), removing information of certain kinds before passing it on, etc. It's a quick way to get certain tasks done, like get rid of all those UPPERCASE SURNAMES. You do have to have a good understanding of what your GEDCOM file actually contains as otherwise you might get some unintended consequences. This might make it difficult to have a shareable library of awk scripts that others can use "out of the box" given the not-as-standard-as-it-should-be nature of GEDCOM, but I can't see a problem with having a shareable library of awk scripts that others can use as a starting point for their own tweaking.

       Awk is not the perfect tool for every job, but it's handy to have in the tool kit.

       Kerry
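As a rough illustration of the kind of fix Kerry describes, here is a minimal sketch of the uppercase-surname cleanup. It is written in Python rather than AWK purely for illustration, and it assumes the common GEDCOM convention of a `NAME` line with the surname between slashes (for example `1 NAME John /SMITH/`). The function name `fix_surname_case` is hypothetical, and, as Kerry warns, a real file may not follow this layout.

```python
import re

# Assumed layout: "<level> NAME <given names> /<SURNAME>/<optional suffix>".
NAME_LINE = re.compile(r"^(\s*\d+\s+NAME\s+[^/]*)/([^/]*)/(.*)$")

def fix_surname_case(line: str) -> str:
    """Title-case an all-uppercase surname on a GEDCOM NAME line.

    Expects a single line without its trailing newline. Lines that are
    not NAME lines, or whose surname is not all uppercase, are returned
    unchanged.
    """
    match = NAME_LINE.match(line)
    if not match:
        return line
    prefix, surname, rest = match.groups()
    if surname.isupper():
        # str.title() is naive: "MCDONALD" becomes "Mcdonald", so
        # surnames like that still need a manual pass.
        surname = surname.title()
    return f"{prefix}/{surname}/{rest}"
```

Applying it to every line of a file (for example with `text.splitlines()`) leaves everything except matching `NAME` lines untouched, which is the "know what your file contains" caveat in practice: only the pattern you understand gets rewritten.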

    02/22/2013 02:39:11
    1. Re: Using AWK to manipulate GEDCOM files
    2. Tony Proctor
    3. "Kerry Raymond" <kraymond@iprimus.com.au> wrote in message news:AMSdnSntPaO0LbvMnZ2dnUVZ_jidnZ2d@westnet.com.au...
       > Despite all the scorn heaped on the idea, I must say that I often use AWK
       > for quickly fixing problems with GEDCOMs (particularly ones given to me
       > that are not importing correctly), removing information of certain kinds
       > before passing it on, etc. It's a quick way to get certain tasks done,
       > like get rid of all those UPPERCASE SURNAMES. You do have to have a good
       > understanding of what your GEDCOM file actually contains as otherwise you
       > might get some unintended consequences. This might make it difficult to
       > have a shareable library of awk scripts that others can use "out of the
       > box" given the not-as-standard-as-it-should-be nature of GEDCOM, but I
       > can't see a problem with having a shareable library of awk scripts that
       > others can use as a starting point for their own tweaking.
       >
       > Awk is not the perfect tool for every job, but it's handy to have in the
       > tool kit.
       >
       > Kerry
       >

       I have used AWK too Kerry so I know it can be useful for automating certain manipulations. My point in this thread was merely that it is designed for manipulating _text_. Depending on what you want to do with the data files then that may be totally adequate.

       However, I really like the idea of being able to script manipulation of the genealogical entities as opposed to the raw text. I have used similar systems outside of the field of genealogy and they can be remarkably powerful. For instance, if your genealogy software doesn't provide the type of query you need to execute then such a tool could compile your data file into a number of objects in memory (e.g. Persons, Places, Events, etc) and allow you to iterate through them, test properties, correlate relationships, make adjustments to them, etc. It would need a simple scripting language but there are existing precedents for this that could easily be adapted.

       In summary, this could be a truly powerful tool, but it deserves a proper implementation. Also, if-and-when we get a modern data standard then I would like to see this being a supplemental part of it, although I couldn't justify my own time on a GEDCOM implementation. Maybe Steve's (OP) requirements are more specific than this - e.g. a programmatic edit - in which case I apologise for looking beyond that and hijacking the thread. However, it has spurred me to a potential involvement at a later date so thanks for that.

       Tony Proctor
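To make the "objects in memory" idea above concrete, here is a minimal sketch in Python. It is not a proper GEDCOM parser and not anything Tony proposed in detail: it assumes a simple file where individuals appear as `0 @I1@ INDI` records with `1 NAME` and `1 BIRT` / `2 DATE` lines, and families as `0 @F1@ FAM` records with `HUSB`, `WIFE` and `CHIL` pointers. The `Person` class and `load_people` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    xref: str                      # GEDCOM cross-reference id, e.g. "@I1@"
    name: str = ""
    birth_date: str = ""
    children: list = field(default_factory=list)  # xrefs of children

def load_people(lines):
    """Build Person objects from GEDCOM lines (simplified, assumed layout)."""
    people = {}
    families = []      # each entry: {"parents": [...], "children": [...]}
    current = None     # the Person or family dict currently being filled
    event = None       # tag of the most recent level-1 line
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == "0":
            event = None
            if value == "INDI":
                current = people.setdefault(tag, Person(tag))
            elif value == "FAM":
                current = {"parents": [], "children": []}
                families.append(current)
            else:
                current = None     # HEAD, TRLR and other records are skipped
        elif level == "1" and current is not None:
            event = tag
            if isinstance(current, Person):
                if tag == "NAME":
                    current.name = value.replace("/", "").strip()
            elif tag in ("HUSB", "WIFE"):
                current["parents"].append(value)
            elif tag == "CHIL":
                current["children"].append(value)
        elif level == "2" and isinstance(current, Person):
            if event == "BIRT" and tag == "DATE":
                current.birth_date = value
    # Correlate relationships: attach each family's children to its parents.
    for fam in families:
        for parent in fam["parents"]:
            if parent in people:
                people[parent].children.extend(fam["children"])
    return people
```

Once loaded, a query the genealogy software lacks becomes an ordinary loop or comprehension, for example `[p.name for p in load_people(lines).values() if p.children and not p.birth_date]` to list parents with no recorded birth date.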

    02/22/2013 01:49:07
    1. Re: Using AWK to manipulate GEDCOM files
    2. Janis Papanagnou
    3. On 22.02.2013 09:49, Tony Proctor wrote:
       > "Kerry Raymond" <kraymond@iprimus.com.au> wrote in message
       > news:AMSdnSntPaO0LbvMnZ2dnUVZ_jidnZ2d@westnet.com.au...
       >> [...]
       >>
       >> Awk is not the perfect tool for every job, but it's handy to have in the
       >> tool kit.
       >>
       >> Kerry
       >>
       >
       > I have used AWK too Kerry so I know it can be useful for automating certain
       > manipulations. My point in this thread was merely that it is designed for
       > manipulating _text_.

       Not only that; it's also used to evaluate data, similar to what you seem to describe below.

       > Depending on what you want to do with the data files
       > then that may be totally adequate.
       >
       > However, I really like the idea of being able to script manipulation of the
       > genealogical entities as opposed to the raw text. I have used similar
       > systems outside of the field of genealogy and they can be remarkably
       > powerful. For instance, if your genealogy software doesn't provide the type
       > of query you need to execute then such a tool could compile your data file
       > into a number of objects in memory (e.g. Persons, Places, Events, etc) and
       > allow you to iterate through them, test properties, correlate relationships,
       > make adjustments to them, etc.

       Iterate, test, correlate; this is something I regularly do with awk. You know that you have arrays to store the data in memory, associative arrays usable for relations, and loops and conditions to operate on that data.

       I am sure you can construct complex requirements that will lead to more code than what is typical for awk. But I've also got the impression that you might not be fully aware of awk's possibilities?

       For another (non-awk) approach, organise your data in a database and operate on it using SQL alone.

       Janis

       > It would need a simple scripting language but
       > there are existing precedents for this that could easily be adapted.
       >
       > [...]
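Janis's database suggestion can be sketched along the following lines, using Python's standard `sqlite3` module only as a convenient way to run the SQL. The two-table schema (`person`, `parent_child`) and the function name `children_per_parent` are assumptions made for illustration; GEDCOM itself prescribes no such schema, and getting the data out of the file and into these tables is a separate step not shown here.

```python
import sqlite3

def children_per_parent(people, links):
    """Count children per parent with plain SQL on an in-memory database.

    people: iterable of (xref, name) tuples
    links:  iterable of (parent_xref, child_xref) tuples
    Returns a list of (name, child_count) rows ordered by name.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, chosen only for this example.
        conn.execute("CREATE TABLE person (xref TEXT PRIMARY KEY, name TEXT)")
        conn.execute("CREATE TABLE parent_child (parent TEXT, child TEXT)")
        conn.executemany("INSERT INTO person VALUES (?, ?)", people)
        conn.executemany("INSERT INTO parent_child VALUES (?, ?)", links)
        # The "correlate relationships" step is a join plus an aggregate.
        return conn.execute(
            "SELECT p.name, COUNT(*) FROM person p "
            "JOIN parent_child pc ON pc.parent = p.xref "
            "GROUP BY p.xref, p.name ORDER BY p.name"
        ).fetchall()
    finally:
        conn.close()
```

The `parent_child` table plays the same role as the associative arrays Janis mentions for awk: a mapping from one identifier to related ones that later steps can loop over or join against.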

    02/22/2013 03:51:11