"Janis Papanagnou" <janis_papanagnou@hotmail.com> wrote in message news:kg7f2e$qjg$1@speranza.aioe.org... > Am 22.02.2013 09:49, schrieb Tony Proctor: >> "Kerry Raymond" <kraymond@iprimus.com.au> wrote in message >> news:AMSdnSntPaO0LbvMnZ2dnUVZ_jidnZ2d@westnet.com.au... >>> [...] >>> >>> Awk is not the perfect tool for every job, but it's handy to have in the >>> tool kit. >>> >>> Kerry >>> >> >> I have used AWK too Kerry so I know it can be useful for automating >> certain >> manipulations. My point in this thread was merely that it is designed for >> manipulating _text_. > > Not only that; it's also used to evaluate data, similar to what you > seem to describe below. > >> Depending on what you want to do with the data files >> then that may be totally adequate. >> >> However, I really like the idea of being able to script manipulation of >> the >> genealogical entities as opposed to the raw text. I have used similar >> systems outside of the field of genealogy and they can be remarkably >> powerful. For instance, if your genealogy software doesn't provide the >> type >> of query you need to execute then such a tool could compile your data >> file >> into a number of objects in memory (e.g. Persons, Places, Events, etc) >> and >> allow you to iterate through them, test properties, correlate >> relationships, >> make adjustments to them, etc. > > Iterate, test, correlate; this is something I regularily do with awk. > You know that you have arrays to store the data in memory, associative > arrays usable for relations, and loops and conditions to operate on > that data. I am sure you can construct complex requirements that will > lead to more code than what is typical for awk. But I've also got the > impression that you might not be fully aware of awk's possibilities? > > For another (non-awk) approach organise your data in a database and > operate on it using SQL alone. 
> > Janis > >> It would need a simple scripting language but >> there are existing precedents for this that could easily be adapted. >> >> [...] A simple off-the-cuff example, with an arbitrary syntax, just to illustrate the difference Janis. This bit of script wants to look at all the events in my timeline, then look at all the people sharing those name events, and select the ones whose name has the element "Jesson". Person me = New Person("Tony Proctor"); for (Event e: me.AllEvents()) { for (Person other: e.AllPersons()) { if (other.name().contains("Jesson")) { ...do something with this other person... } } } Tony Proctor
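A minimal sketch of the object model Tony describes, written in Python rather than in his arbitrary syntax. The class names `Person` and `Event`, the helper `sharing_events`, and all sample people other than "Tony Proctor" are assumptions made up for illustration; nothing here comes from an existing genealogy tool. Unlike the pseudocode, the helper skips the starting person and avoids returning the same person twice.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: compare by identity, objects reference each other
class Person:
    name: str
    events: list = field(default_factory=list)   # events this person took part in


@dataclass(eq=False)
class Event:
    kind: str
    persons: list = field(default_factory=list)  # everyone attached to the event

    def add(self, person):
        """Link a person to this event in both directions."""
        self.persons.append(person)
        person.events.append(self)


def sharing_events(me, name_part):
    """Return people other than `me` who share an event with `me`
    and whose name contains `name_part`, each listed once."""
    found = []
    for event in me.events:
        for other in event.persons:
            if other is not me and name_part in other.name and other not in found:
                found.append(other)
    return found


# Hypothetical sample data, invented for this sketch.
me = Person("Tony Proctor")
ann = Person("Ann Jesson")
bob = Person("Bob Smith")

wedding = Event("marriage")
for p in (me, ann, bob):
    wedding.add(p)

census = Event("census")
census.add(me)
census.add(ann)

matches = [p.name for p in sharing_events(me, "Jesson")]
print(matches)
```

The point of the sketch is only that the query is phrased against persons and events rather than against lines of text; how the objects get populated from a data file is left open here.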
On 2/22/2013 4:33 AM, Tony Proctor wrote:
> "Janis Papanagnou" <janis_papanagnou@hotmail.com> wrote in message
> news:kg7f2e$qjg$1@speranza.aioe.org...
>> On 22.02.2013 09:49, Tony Proctor wrote:
>>> "Kerry Raymond" <kraymond@iprimus.com.au> wrote in message
>>> news:AMSdnSntPaO0LbvMnZ2dnUVZ_jidnZ2d@westnet.com.au...
>>>> [...]
>>>>
>>>> Awk is not the perfect tool for every job, but it's handy to have in the
>>>> tool kit.
>>>>
>>>> Kerry
>>>>
>>>
>>> I have used AWK too Kerry so I know it can be useful for automating
>>> certain manipulations. My point in this thread was merely that it is
>>> designed for manipulating _text_.
>>
>> Not only that; it's also used to evaluate data, similar to what you
>> seem to describe below.
>>
>>> Depending on what you want to do with the data files
>>> then that may be totally adequate.
>>>
>>> However, I really like the idea of being able to script manipulation of
>>> the genealogical entities as opposed to the raw text. I have used similar
>>> systems outside of the field of genealogy and they can be remarkably
>>> powerful. For instance, if your genealogy software doesn't provide the
>>> type of query you need to execute then such a tool could compile your
>>> data file into a number of objects in memory (e.g. Persons, Places,
>>> Events, etc) and allow you to iterate through them, test properties,
>>> correlate relationships, make adjustments to them, etc.
>>
>> Iterate, test, correlate; this is something I regularly do with awk.
>> You know that you have arrays to store the data in memory, associative
>> arrays usable for relations, and loops and conditions to operate on
>> that data. I am sure you can construct complex requirements that will
>> lead to more code than what is typical for awk. But I've also got the
>> impression that you might not be fully aware of awk's possibilities?
>>
>> For another (non-awk) approach organise your data in a database and
>> operate on it using SQL alone.
>>
>> Janis
>>
>>> It would need a simple scripting language but
>>> there are existing precedents for this that could easily be adapted.
>>>
>>> [...]
>
> A simple off-the-cuff example, with an arbitrary syntax, just to illustrate
> the difference Janis. This bit of script wants to look at all the events in
> my timeline, then look at all the people sharing those name events, and
> select the ones whose name has the element "Jesson".
>
>
> Person me = New Person("Tony Proctor");
> for (Event e: me.AllEvents()) {
>     for (Person other: e.AllPersons()) {
>         if (other.name().contains("Jesson")) {
>             ...do something with this other person...
>         }
>     }
> }

See the script I posted earlier, copied below, which reads each record of
your file into an array and then prints that record. If I understand you
correctly, the above can be done in awk by reading the whole file into one
array of records and then operating on it in the END section.

    Ed.

$ cat parseGedcom.awk
# Print one record: every tag path and its value, sorted by tag path.
function prtRec(rec,    tag,numTags,tagNr) {
    printf "%s",recSep
    numTags = asorti(rec,tags)      # gawk extension: sorted indices of rec
    for (tagNr=1; tagNr <= numTags; tagNr++) {
        tag = tags[tagNr]
        print tag " = <" rec[tag] ">"
    }
    recSep = "-------------------\n"
}
{
    level = $1
    if (level == 0) {               # a level-0 line starts a new record
        prtRec(rec)
        delete rec
        maxLevel = 0
    }
    tag = $2
    level2tag[level] = tag
    # Build the full tag path, e.g. HEAD->SOUR->CORP
    fullTag=""
    for (i=0;i<=level;i++) {
        fullTag = (fullTag ? fullTag "->" : "") level2tag[i]
    }
    # Strip the level and tag fields; the rest of the line is the value
    # (1 = replace the first match only)
    data = gensub(/^[[:space:]]*([^[:space:]]+[[:space:]]*){2}/,"",1)
    rec[fullTag] = data
    maxLevel = (level > maxLevel ? level : maxLevel)
}
END { prtRec(rec) }
$
$ gawk -f parseGedcom.awk file
-------------------
HEAD = <>
HEAD->CHAR = <ANSEL>
HEAD->DATE = <20 APR 2002>
HEAD->DATE->TIME = <2:58:56>
HEAD->DEST = <PAF>
HEAD->FILE = <GEDCOM4.ged>
HEAD->GEDC = <>
HEAD->GEDC->FORM = <LINEAGE-LINKED>
HEAD->GEDC->VERS = <5.5>
HEAD->SOUR = <ANSTFILE>
HEAD->SOUR->CORP = <The Church of Jesus Christ of Latter-day Saints>
HEAD->SOUR->CORP->ADDR = <50 East North Temple Street>
HEAD->SOUR->CORP->ADDR->CONT = <Salt Lake City, Utah 84150>
HEAD->SOUR->DATA = <Ancestral File>
HEAD->SOUR->DATA->COPR = <Copyright (c) 1987, June 1998>
HEAD->SOUR->DATA->DATE = <5 January 1998>
HEAD->SOUR->NAME = <Ancestral File (R)>
HEAD->SOUR->VERS = <4.19>
HEAD->SUBM = <@SUB01@>
HEAD->SUBN = <@N01@>
-------------------
@SUB01@ = <SUBM>
@SUB01@->ADDR = <50 East North Temple Street>
@SUB01@->ADDR->CONT = <Salt Lake City, Utah 84150>
@SUB01@->NAME = <Created by FamilySearch (TM) Internet Genealogy Service>
-------------------
@S01@ = <SOUR>
@S01@->AUTH = <The Church of Jesus Christ of Latter-day Saints>
@S01@->PUBL = <Copyright (c) 1987, June 1998, data as of 5 January 1998>
@S01@->REPO = <@R01@>
@S01@->TITL = <Ancestral File (R)>
-------------------
@R01@ = <REPO>
@R01@->ADDR = <35 N West Temple Street>
@R01@->ADDR->CONT = <Salt Lake City, Utah 84150 USA>
@R01@->NAME = <Family History Library>
-------------------
@N01@ = <SUBN>
@N01@->DESC = <2>
@N01@->ORDI = <N>
-------------------
@I3GLR-Z3@ = <INDI>
@I3GLR-Z3@->AFN = <3GLR-Z3>
@I3GLR-Z3@->BIRT = <>
@I3GLR-Z3@->BIRT->DATE = <1 Jul 1850>
@I3GLR-Z3@->BIRT->PLAC = <Geelong, Vic, Astl>
@I3GLR-Z3@->FAMC = <@F524078@>
@I3GLR-Z3@->FAMS = <@F1794078@>
@I3GLR-Z3@->NAME = <Thomas William /BALDOCK/>
@I3GLR-Z3@->NAME->GIVN = <Thomas William>
@I3GLR-Z3@->NAME->SURN = <BALDOCK>
@I3GLR-Z3@->SEX = <M>
@I3GLR-Z3@->SOUR = <@S01@>
-------------------
@I3GLR-4R@ = <INDI>
@I3GLR-4R@->AFN = <3GLR-4R>
@I3GLR-4R@->FAMS = <@F524078@>
@I3GLR-4R@->NAME = <Thomas /BALDOCK/>
@I3GLR-4R@->NAME->GIVN = <Thomas>
@I3GLR-4R@->NAME->SURN = <BALDOCK>
@I3GLR-4R@->SEX = <M>
@I3GLR-4R@->SOUR = <@S01@>
-------------------
@I3GLR-5X@ = <INDI>
@I3GLR-5X@->AFN = <3GLR-5X>
@I3GLR-5X@->FAMS = <@F524078@>
@I3GLR-5X@->NAME = <Anne /CHAMBERS/>
@I3GLR-5X@->NAME->GIVN = <Anne>
@I3GLR-5X@->NAME->SURN = <CHAMBERS>
@I3GLR-5X@->SEX = <F>
@I3GLR-5X@->SOUR = <@S01@>
-------------------
@I98BW-JC@ = <INDI>
@I98BW-JC@->AFN = <98BW-JC>
@I98BW-JC@->BIRT = <>
@I98BW-JC@->BIRT->DATE = <1854>
@I98BW-JC@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-JC@->DEAT = <>
@I98BW-JC@->DEAT->DATE = <9 Dec 1890>
@I98BW-JC@->DEAT->PLAC = <Geelong, Victoria, Australia>
@I98BW-JC@->FAMC = <@F1794093@>
@I98BW-JC@->FAMS = <@F1794078@>
@I98BW-JC@->NAME = <Emily Jane /THORNTON/>
@I98BW-JC@->NAME->GIVN = <Emily Jane>
@I98BW-JC@->NAME->SURN = <THORNTON>
@I98BW-JC@->SEX = <F>
@I98BW-JC@->SOUR = <@S01@>
-------------------
@I98BX-N6@ = <INDI>
@I98BX-N6@->AFN = <98BX-N6>
@I98BX-N6@->FAMS = <@F1794093@>
@I98BX-N6@->NAME = <Charles Edwin /THORNTON/>
@I98BX-N6@->NAME->GIVN = <Charles Edwin>
@I98BX-N6@->NAME->SURN = <THORNTON>
@I98BX-N6@->SEX = <M>
@I98BX-N6@->SOUR = <@S01@>
-------------------
@I98BX-PC@ = <INDI>
@I98BX-PC@->AFN = <98BX-PC>
@I98BX-PC@->FAMS = <@F1794093@>
@I98BX-PC@->NAME = <Emily /GROWDON/>
@I98BX-PC@->NAME->GIVN = <Emily>
@I98BX-PC@->NAME->SURN = <GROWDON>
@I98BX-PC@->SEX = <F>
@I98BX-PC@->SOUR = <@S01@>
-------------------
@I98CJ-BW@ = <INDI>
@I98CJ-BW@->AFN = <98CJ-BW>
@I98CJ-BW@->BIRT = <>
@I98CJ-BW@->BIRT->DATE = <ABT 1876>
@I98CJ-BW@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98CJ-BW@->FAMC = <@F1794078@>
@I98CJ-BW@->NAME = <Percy William Growdon /BALDOCK/>
@I98CJ-BW@->NAME->GIVN = <Percy William Growdon>
@I98CJ-BW@->NAME->SURN = <BALDOCK>
@I98CJ-BW@->SEX = <M>
@I98CJ-BW@->SOUR = <@S01@>
-------------------
@I98BW-LP@ = <INDI>
@I98BW-LP@->AFN = <98BW-LP>
@I98BW-LP@->BIRT = <>
@I98BW-LP@->BIRT->DATE = <1879>
@I98BW-LP@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-LP@->DEAT = <>
@I98BW-LP@->DEAT->DATE = <6 Sep 1886>
@I98BW-LP@->DEAT->PLAC = <Geelong, Victoria, Australia>
@I98BW-LP@->FAMC = <@F1794078@>
@I98BW-LP@->NAME = <Percy William Growdon /BALDOCK/>
@I98BW-LP@->NAME->GIVN = <Percy William Growdon>
@I98BW-LP@->NAME->SURN = <BALDOCK>
@I98BW-LP@->SEX = <M>
@I98BW-LP@->SOUR = <@S01@>
-------------------
@I98BW-KJ@ = <INDI>
@I98BW-KJ@->AFN = <98BW-KJ>
@I98BW-KJ@->BIRT = <>
@I98BW-KJ@->BIRT->DATE = <1878>
@I98BW-KJ@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-KJ@->FAMC = <@F1794078@>
@I98BW-KJ@->NAME = <Arthur Jabez /BALDOCK/>
@I98BW-KJ@->NAME->GIVN = <Arthur Jabez>
@I98BW-KJ@->NAME->SURN = <BALDOCK>
@I98BW-KJ@->SEX = <M>
@I98BW-KJ@->SOUR = <@S01@>
-------------------
@I98BW-P7@ = <INDI>
@I98BW-P7@->AFN = <98BW-P7>
@I98BW-P7@->BIRT = <>
@I98BW-P7@->BIRT->DATE = <1887>
@I98BW-P7@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-P7@->DEAT = <>
@I98BW-P7@->DEAT->DATE = <1907>
@I98BW-P7@->DEAT->PLAC = <>
@I98BW-P7@->FAMC = <@F1794078@>
@I98BW-P7@->NAME = <Gladys Claudine /BALDOCK/>
@I98BW-P7@->NAME->GIVN = <Gladys Claudine>
@I98BW-P7@->NAME->SURN = <BALDOCK>
@I98BW-P7@->SEX = <F>
@I98BW-P7@->SOUR = <@S01@>
-------------------
@I98BW-N2@ = <INDI>
@I98BW-N2@->AFN = <98BW-N2>
@I98BW-N2@->BIRT = <>
@I98BW-N2@->BIRT->DATE = <1884>
@I98BW-N2@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-N2@->DEAT = <>
@I98BW-N2@->DEAT->DATE = <25 Oct 1951>
@I98BW-N2@->DEAT->PLAC = <>
@I98BW-N2@->FAMC = <@F1794078@>
@I98BW-N2@->NAME = <Clive Alfred /BALDOCK/>
@I98BW-N2@->NAME->GIVN = <Clive Alfred>
@I98BW-N2@->NAME->SURN = <BALDOCK>
@I98BW-N2@->SEX = <M>
@I98BW-N2@->SOUR = <@S01@>
-------------------
@I98BW-MV@ = <INDI>
@I98BW-MV@->AFN = <98BW-MV>
@I98BW-MV@->BIRT = <>
@I98BW-MV@->BIRT->DATE = <1881>
@I98BW-MV@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-MV@->FAMC = <@F1794078@>
@I98BW-MV@->NAME = <Lawrence /BALDOCK/>
@I98BW-MV@->NAME->GIVN = <Lawrence>
@I98BW-MV@->NAME->SURN = <BALDOCK>
@I98BW-MV@->SEX = <M>
@I98BW-MV@->SOUR = <@S01@>
-------------------
@F1794078@ = <FAM>
@F1794078@->CHIL = <@I98BW-MV@>
@F1794078@->HUSB = <@I3GLR-Z3@>
@F1794078@->MARR = <>
@F1794078@->MARR->DATE = <20 Apr 1876>
@F1794078@->MARR->PLAC = <Geelong, Victoria, Australia>
@F1794078@->WIFE = <@I98BW-JC@>
-------------------
@F524078@ = <FAM>
@F524078@->CHIL = <@I3GLR-Z3@>
@F524078@->HUSB = <@I3GLR-4R@>
@F524078@->WIFE = <@I3GLR-5X@>
-------------------
@F1794093@ = <FAM>
@F1794093@->CHIL = <@I98BW-JC@>
@F1794093@->HUSB = <@I98BX-N6@>
@F1794093@->WIFE = <@I98BX-PC@>
-------------------
TRLR = <>
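Ed's suggestion of reading the whole file into one array of records and then operating on it afterwards can be sketched as follows. This is written in Python rather than awk, and it is only an illustration: `parse_gedcom`, the record layout, and the `SAMPLE` lines are assumptions (the sample is loosely reconstructed from the output above, not taken from the original file). It assumes well-formed input in which every level follows its parent. Unlike the awk script, it keeps a list of values per tag path, so repeated tags such as CHIL are not overwritten.

```python
from collections import defaultdict

# Hypothetical GEDCOM fragment, loosely reconstructed from the printed output.
SAMPLE = """\
0 HEAD
1 CHAR ANSEL
0 @I3GLR-Z3@ INDI
1 NAME Thomas William /BALDOCK/
2 SURN BALDOCK
1 FAMS @F1794078@
0 @I98BW-JC@ INDI
1 NAME Emily Jane /THORNTON/
2 SURN THORNTON
1 FAMS @F1794078@
0 @F1794078@ FAM
1 HUSB @I3GLR-Z3@
1 WIFE @I98BW-JC@
1 CHIL @I98BW-MV@
1 CHIL @I98BW-N2@
0 TRLR
"""


def parse_gedcom(text):
    """Return a list of records. Each record has an 'id' and 'type' taken
    from its level-0 line, plus 'fields' mapping a tag path such as
    'NAME->SURN' to the list of values seen for that path."""
    records = []
    path = {}  # level -> tag, playing the role of level2tag in the awk script
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue                      # skip blank or malformed lines
        level, tag = int(parts[0]), parts[1]
        data = parts[2] if len(parts) > 2 else ""
        if level == 0:                    # a level-0 line starts a new record
            records.append({"id": tag, "type": data, "fields": defaultdict(list)})
            continue
        if not records:
            continue                      # ignore sub-lines before any record
        path[level] = tag
        key = "->".join(path[i] for i in range(1, level + 1))
        records[-1]["fields"][key].append(data)
    return records


records = parse_gedcom(SAMPLE)

# The "END section" part: query the in-memory records.
baldocks = [r["id"] for r in records
            if r["type"] == "INDI"
            and "BALDOCK" in r["fields"].get("NAME->SURN", [])]
family = next(r for r in records if r["id"] == "@F1794078@")
children = family["fields"].get("CHIL", [])

print(baldocks)
print(children)
```

Once every record is in memory, cross-references such as HUSB, WIFE, and CHIL can be followed by looking up the referenced id in the same list, which is the kind of correlation discussed earlier in the thread.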
On 2/22/2013 5:33 AM, Tony Proctor wrote:
>
> A simple off-the-cuff example, with an arbitrary syntax, just to illustrate
> the difference Janis. This bit of script wants to look at all the events in
> my timeline, then look at all the people sharing those name events, and
> select the ones whose name has the element "Jesson".
>
>
> Person me = New Person("Tony Proctor");
> for (Event e: me.AllEvents()) {
>     for (Person other: e.AllPersons()) {
>         if (other.name().contains("Jesson")) {
>             ...do something with this other person...
>         }
>     }
> }

If you haven't already, you should take a look at Lifelines, which has its
own language for doing things just like the above.

--
T.M. Sommers -- ab2sb
On Fri, Feb 22, 2013 at 10:33:38AM -0000, Tony Proctor wrote:
>
>A simple off-the-cuff example, with an arbitrary syntax, just to illustrate
>the difference Janis. This bit of script wants to look at all the events in
>my timeline, then look at all the people sharing those name events, and
>select the ones whose name has the element "Jesson".
>
>
>Person me = New Person("Tony Proctor");
>for (Event e: me.AllEvents()) {
>    for (Person other: e.AllPersons()) {
>        if (other.name().contains("Jesson")) {
>            ...do something with this other person...
>        }
>    }
>}

As for SQL, this kind of thing can be done fairly easily inside a database
which allows so-called 'stored procedures'.[1] I use PostgreSQL with its
native procedural language, but Perl, Python, etc. procedures could be used
instead.

I don't see the need here for an object-oriented approach. For SQL, the
natural thing to do is to build up a collection of useful views on the data,
together with a collection of useful functions.

Martin

[1] In fact, you could do it directly in SQL, using views, temporary
tables, or common table expressions. You are, after all, just looking to
work on some subset of people.
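A minimal sketch of the "useful views" idea, using SQLite through Python's standard `sqlite3` module instead of the PostgreSQL setup Martin describes. The tables `person`, `event`, and `participation`, the view `shared_event`, and all sample rows other than "Tony Proctor" are assumptions invented for illustration; a real genealogy schema would look different.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE event  (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE participation (
    person_id INTEGER NOT NULL REFERENCES person(id),
    event_id  INTEGER NOT NULL REFERENCES event(id)
);

-- A reusable view: every pair of distinct people attached to the same event.
CREATE VIEW shared_event AS
SELECT a.person_id AS person_id,
       b.person_id AS other_id,
       a.event_id  AS event_id
FROM participation AS a
JOIN participation AS b
  ON a.event_id = b.event_id AND a.person_id <> b.person_id;

-- Hypothetical sample data.
INSERT INTO person VALUES (1, 'Tony Proctor'), (2, 'Ann Jesson'), (3, 'Bob Smith');
INSERT INTO event  VALUES (1, 'marriage'), (2, 'census');
INSERT INTO participation VALUES (1, 1), (2, 1), (3, 1), (1, 2), (2, 2);
""")

# People who share at least one event with the named person and whose
# name contains the given fragment.
rows = con.execute("""
SELECT DISTINCT other.name
FROM shared_event AS s
JOIN person AS me    ON me.id = s.person_id
JOIN person AS other ON other.id = s.other_id
WHERE me.name = ? AND other.name LIKE '%' || ? || '%'
ORDER BY other.name
""", ("Tony Proctor", "Jesson")).fetchall()

print(rows)
```

The view does the work that the nested loops do in the pseudocode, so the final query only has to name the starting person and the filter. Note that SQLite's `LIKE` is case-insensitive for ASCII by default, which differs from a case-sensitive `contains`.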