On Sun, 24 Feb 2013 11:28:37 -0600, Charlie Hoffpauir <invalid@invalid.com> wrote: >On Sun, 24 Feb 2013 16:25:51 GMT, Eagle@bellsouth.net (J. Hugh >Sullivan) wrote: <snip> >> >>Using the slash and two apostrophes the program gave a warning and a >>list of special characters that should not be used (all except parens >>and slashes). It also gave the option to ignore the problem for the >>current individual. Interestingly it objected to TWO apostrophes but >>not one. >> >>When I looked at the help file for Legacy on nicknames it gave a long >>list of nicknames but no indication that quote marks would be treated >>as nicknames. >> >>The S & R in the program does not appear to work. Even if it did how >>would it know to replace the first " with an open ( and the second " >>with a closed )? >> >>I would apologize again for my naivete in using the program BUT, it >>seems to have accomplished one thing. Because of the variety in >>programs no standardized GEDCOM will ever serve a universal purpose. >>Thus the discussion on the best program to manipulate a GEDCOM may be >>of use. >> >>Hugh > >My limited tests on the free version confirm everything you've said. >Hindsight tells me you should have started putting the called name in >single quotes, or parens, or quite possibly anything but quotes. Given >the present situation, possibly you could modify the routine that >Dennis Bieber wrote in Powershell to replace the double quotes with >singles, and eliminate the nickname field, then re-import the >corrected GEDCOM back into Legacy.(you may not want to do this if you >have entered "real" nicknames in your file). However, if you have >"real" nicknames, how did you get them entered? I searched everywhere >I know to look and I can't see any way to intentionally enter a >nickname, other than the (undocumented) way of using double quotes. >This all makes me glad I now only use one genealogy program. 
I asked how to enter nicknames in the Legacy user group (actually in the wrong group, but got an answer anyway):

=============================================
If John Doe's nickname is Jack you would enter in the Forename field: John "Jack". When you create a Report, on the Report Options>General tab there are two check boxes for "Use quoted names for narratives" and "Remove quoted names." If you check both of these, narratives will begin with the full name John Doe and thereafter refer to him as Jack.
==============================================

Which is consistent with what we're seeing in the GEDCOM results.
On Sun, 24 Feb 2013 16:25:51 GMT, Eagle@bellsouth.net (J. Hugh Sullivan) wrote:

>On Sun, 24 Feb 2013 07:09:01 -0600, Charlie Hoffpauir
><invalid@invalid.com> wrote:
>
>>Now if Hugh doesn't use the quote marks as defining a nickname for any
>>other entries, then editing the GEDCOM to correct the problem becomes
>>much easier.... ie, simply remove any instance of the NICK field.
>
>I entered my name five times with the middle name in (1) parens (2)
>slashes (3) apostrophes (4) quotes (5) single apostrophe. The results
>are below.
>
>0 HEAD
>1 SOUR Legacy
>2 VERS 7.5
>2 NAME Legacy (R)
>2 CORP Millennia Corp.
>3 ADDR PO Box 9410
>4 CONT Surprise, AZ 85374
>1 DEST Legacy
>1 DATE 24 Feb 2013
>1 SUBM @S0@
>1 FILE C:\My Documents\GED\GECOM Trial.ged
>1 GEDC
>2 VERS 5.5.1
>2 FORM LINEAGE-LINKED
>1 CHAR ANSEL
>0 @S0@ SUBM
>1 NAME Not Given
>0 @I1@ INDI
>1 NAME James (Hugh) /SULLIVAN/
>2 GIVN James (Hugh)
>2 SURN SULLIVAN
>1 SEX M
>1 _UID E4099F3324514CD88BFB63DBCC224CFB51C2
>1 CHAN
>2 DATE 24 Feb 2013
>3 TIME 09:39
>0 @I2@ INDI
>1 NAME James /Hugh/ /SULLIVAN/
>2 GIVN James /Hugh/
>2 SURN SULLIVAN
>1 SEX M
>1 _UID E1898F2341DF42EEB2A50ECDA7C69C05ACD6
>1 CHAN
>2 DATE 24 Feb 2013
>3 TIME 09:41
>0 @I3@ INDI
>1 NAME James 'Hugh' /SULLIVAN/
>2 GIVN James 'Hugh'
>2 SURN SULLIVAN
>1 SEX M
>1 _UID 8BAD81023233425293E5437D496B205212CF
>1 CHAN
>2 DATE 24 Feb 2013
>3 TIME 10:07
>0 @I4@ INDI
>1 NAME James "Hugh" /SULLIVAN/
>2 GIVN James "Hugh"
>2 SURN SULLIVAN
>2 NICK Hugh
>1 SEX M
>1 _UID B33B642F082E42A6BD0B4CDAF4A0A506CC83
>1 CHAN
>2 DATE 24 Feb 2013
>3 TIME 10:01
>0 @I5@ INDI
>1 NAME James 'Hugh /SULLIVAN/
>2 GIVN James 'Hugh
>2 SURN SULLIVAN
>1 SEX M
>1 _UID AA32319050E5438982722A012BAC0BA14028
>1 CHAN
>2 DATE 24 Feb 2013
>3 TIME 10:10
>0 TRLR
>
>Using the slash and two apostrophes the program gave a warning and a
>list of special characters that should not be used (all except parens
>and slashes). It also gave the option to ignore the problem for the
>current individual.
Interestingly it objected to TWO apostrophes but >not one. > >When I looked at the help file for Legacy on nicknames it gave a long >list of nicknames but no indication that quote marks would be treated >as nicknames. > >The S & R in the program does not appear to work. Even if it did how >would it know to replace the first " with an open ( and the second " >with a closed )? > >I would apologize again for my naivete in using the program BUT, it >seems to have accomplished one thing. Because of the variety in >programs no standardized GEDCOM will ever serve a universal purpose. >Thus the discussion on the best program to manipulate a GEDCOM may be >of use. > >Hugh My limited tests on the free version confirm everything you've said. Hindsight tells me you should have started putting the called name in single quotes, or parens, or quite possibly anything but quotes. Given the present situation, possibly you could modify the routine that Dennis Bieber wrote in Powershell to replace the double quotes with singles, and eliminate the nickname field, then re-import the corrected GEDCOM back into Legacy.(you may not want to do this if you have entered "real" nicknames in your file). However, if you have "real" nicknames, how did you get them entered? I searched everywhere I know to look and I can't see any way to intentionally enter a nickname, other than the (undocumented) way of using double quotes. This all makes me glad I now only use one genealogy program.
Martin Steer wrote:
>
>> And the mention of January 1662-3 above should be a hint that dates are
>> another can of worms better dealt with by an OO approach
>
> Given that dates are just data, I don't get this. You'll have to tell me
> why this is so.

1. The above date is prior to the adoption of Gregorian dates in England & her colonies. The year, especially in church usage, was often taken as starting on March 25th. Sometimes the first three months are simply given as being in the same year as the preceding December (1662 in the above case), sometimes with a dual year as above and sometimes as new-style dates (1663 in my example). Conventional RDBMS only deals with the new-style (hopefully it will also know about September 1752 - run cal if you don't know about that one).

2. Dates aren't necessarily precise. Using the Civil Registration indexes for the UK, for instance, one will simply get a month and this is the month at the end of which quarterly returns were sent in to the central office so that March 1845 means Q1 1845 and Q1 could, in fact, include events from late December 1844.

3. Early legal documents might be dated by regnal year. e.g. Elizabeth I succeeded in November 1558 so an event in December 1559 and one in September 1560 would both be in 2 Elizabeth.

4. Documents might be undated. For instance one document which I think names one of my earliest ancestors by surname is undated but the archivists list it as "early 14th century". There's evidence which suggests to me that it should collate before a document with a 1295 date. Given the fact that the archivists' date is only an approximation this isn't really a contradiction but a dating system should recognise that.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
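Points 1 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the ACCESSIONS table are made up for the example, the accession date used for Elizabeth I is 17 November 1558, and Python's date type is used purely as a calendar label with no Julian/Gregorian conversion.

```python
from datetime import date, timedelta

def new_style_year(dual_year):
    """Return the new-style year of a dual year such as '1662-3' or '1699/00'."""
    old, suffix = dual_year.replace("/", "-").split("-")
    # The suffix replaces the last digits of the old-style year.
    year = int(old[: len(old) - len(suffix)] + suffix)
    if year <= int(old):          # e.g. 1699-00 has to roll over to 1700
        year += 10 ** len(suffix)
    return year

# Hypothetical lookup table; only the one reign mentioned in the post.
ACCESSIONS = {"Elizabeth I": date(1558, 11, 17)}

def regnal_year_range(monarch, n):
    """Return (first_day, last_day) of the n-th regnal year of a monarch."""
    start = ACCESSIONS[monarch]
    first = start.replace(year=start.year + n - 1)
    last = start.replace(year=start.year + n) - timedelta(days=1)
    return first, last

first, last = regnal_year_range("Elizabeth I", 2)
print(new_style_year("1662-3"), first, last)
```

A regnal year comes back as a range rather than a single day, which is also one way to hold the imprecise dates of points 2 and 4: store an earliest and a latest bound and compare ranges instead of points.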
On Sun, 24 Feb 2013 00:17:51 -0500, "T.M. Sommers" <tmsommers2@gmail.com> wrote:

>On 2/18/2013 11:20 AM, J. Hugh Sullivan wrote:
>>
>> I use " marks to identify middle names of people who go by the middle
>> name. My problem is switching from one genie program to another. The
>> second reads the name in quotes as a nickname and shows it twice...
>> James "Hugh" "Hugh" Sullivan.
>
>Why not use 'James Hugh "Hugh" Sullivan'? It clearly indicates that
>Hugh is not only your real name (there is a 'Claude "Pete" Davis' in my
>database, who was called 'Pete' although that was no part of his "real"
>name), but it is the name you used. The duplication may look a little
>odd, but it is unambiguous, and, I think, clear to anyone.

That's an interesting suggestion, but I don't think it addresses the problem that the OP was seeing. If I understand his problem correctly, entering the name(s) as 'James Hugh "Hugh" Sullivan' would simply cause the name to appear as 'James Hugh "Hugh" "Hugh" Sullivan' when it transferred into the other program.

I downloaded a copy of the free version of Legacy, and imported my data into it to play around with this. Legacy does indeed put the name in quote marks into the GEDCOM in a field labeled as "NICK". The interesting thing is, I can't find anywhere in Legacy where you can actually enter a Nickname. It's easy to add another name, but it shows up as an alternate name. (But I'm not a Legacy user, so I'm not familiar with the program.)

My guess is that Legacy (somewhere) advocates the user entering a nickname in quotes.... if that's the case, then the OP is actually creating the problem by the technique he uses.

Now if Hugh doesn't use the quote marks as defining a nickname for any other entries, then editing the GEDCOM to correct the problem becomes much easier.... ie, simply remove any instance of the NICK field.
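Removing every NICK line is a short filter in most languages. A minimal Python sketch follows, assuming the NICK lines always sit at level 2 under NAME with no subordinate lines (as in the Legacy export shown in this thread) and that the file holds no genuine nicknames worth keeping. The function name strip_nick is made up for the example.

```python
def strip_nick(lines):
    """Drop every level-2 NICK line and leave all other lines untouched."""
    return [ln for ln in lines if ln.split()[:2] != ["2", "NICK"]]

record = [
    "0 @I4@ INDI",
    '1 NAME James "Hugh" /SULLIVAN/',
    '2 GIVN James "Hugh"',
    "2 SURN SULLIVAN",
    "2 NICK Hugh",
    "1 SEX M",
]

for line in strip_nick(record):
    print(line)
```

To run it on a real file, read the lines with the encoding the GEDCOM header declares and write the result to a new file rather than over the original.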
On Fri, Feb 22, 2013 at 10:33:38AM -0000, Tony Proctor wrote:
>
>A simple off-the-cuff example, with an arbitrary syntax, just to illustrate
>the difference Janis. This bit of script wants to look at all the events in
>my timeline, then look at all the people sharing those name events, and
>select the ones whose name has the element "Jesson".
>
>
>Person me = New Person("Tony Proctor");
>for (Event e: me.AllEvents()) {
>    for (Person other: e.AllPersons()) {
>        if (other.name().contains("Jesson")) {
>            ...do something with this other person...
>        }
>    }
>}

As for sql, this kind of thing can be done fairly easily inside a database which allows so-called 'stored procedures'.[1] I use Postgresql with its native procedural language, but perl, python, etc, procedures could be used instead.

I don't see the need here for an object oriented approach. For sql, the natural thing to do is to build up a collection of useful views on the data, together with a collection of useful functions.

Martin

[1] In fact, you could do it directly in sql, using views, temporary tables, or common table expressions. You are, after all, just looking to work on some subset of people.
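To make the "directly in sql" remark concrete, here is a rough sketch using Python's built-in sqlite3 module in place of Postgresql. The three-table schema (person, event, participation) and the sample rows are invented for the illustration and are not taken from any genealogy program.

```python
import sqlite3

def people_sharing_events(conn, me, element):
    """Names of people who share an event with `me` and whose name contains `element`."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.name
        FROM person AS me
        JOIN participation AS mine   ON mine.person_id = me.id
        JOIN participation AS theirs ON theirs.event_id = mine.event_id
        JOIN person AS other         ON other.id = theirs.person_id
        WHERE me.name = ?
          AND other.id <> me.id
          AND instr(other.name, ?) > 0
        ORDER BY other.name
        """,
        (me, element),
    )
    return [name for (name,) in rows]

# Hypothetical data: two people named Jesson, only one shares an event.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE event (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE participation (
        person_id INTEGER REFERENCES person(id),
        event_id  INTEGER REFERENCES event(id)
    );
    INSERT INTO person VALUES (1, 'Tony Proctor'), (2, 'Mary Jesson'),
                              (3, 'John Smith'), (4, 'Ann Jesson');
    INSERT INTO event VALUES (1, 'wedding'), (2, 'baptism');
    INSERT INTO participation VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

print(people_sharing_events(conn, "Tony Proctor", "Jesson"))
```

The same SELECT could be saved as a view or wrapped in a stored function in a database that supports them, which is the "collection of useful views" approach described above.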
On 12/5/2012 5:18 AM, Ian Goddard wrote:
> I'd hoped to get some specific ideas to compare with my own idea about
> data structure which is:
>
> Data types
>
> Person:
>   Surname : string
>   Forenames : list of string (probably implement in Free Pascal so that
>     would be TStringList)
>   Qualification : string (e.g. Executors of)

Isn't this really a case of a many-to-many relationship? A person can own (for example) many parcels, and a parcel can have many joint owners. So shouldn't the Qualification field be in the table that links the land to the person?

> PersonList : list of Name
>
> StatuteMeasure
>   Acres : smallint
>   Roods : smallint
>   Perches : smallint
>
> OldMoney
>   Pounds : smallint
>   Shillings : smallint
>   Pence : smallint (assumes that further investigation doesn't reveal
>     ha'pence or farthings!)

Perhaps store the amount as a number of farthings, so that the database does not need to be changed if you find some recorded that way. The user interface could take care of converting back and forth. The same could be done with the size above.

--
T.M. Sommers -- ab2sb
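The conversion in each direction is plain integer arithmetic. A small Python sketch, with function names invented for the example, using 4 farthings to the penny, 12 pence to the shilling and 20 shillings to the pound, and for land 40 perches to the rood and 4 roods to the acre:

```python
def to_farthings(pounds=0, shillings=0, pence=0, farthings=0):
    """Collapse pounds/shillings/pence/farthings into one integer count of farthings."""
    return ((pounds * 20 + shillings) * 12 + pence) * 4 + farthings

def from_farthings(total):
    """Expand a farthing count back into (pounds, shillings, pence, farthings)."""
    pence, farthings = divmod(total, 4)
    shillings, pence = divmod(pence, 12)
    pounds, shillings = divmod(shillings, 20)
    return pounds, shillings, pence, farthings

def to_perches(acres=0, roods=0, perches=0):
    """Collapse acres/roods/perches into one integer count of perches."""
    return (acres * 4 + roods) * 40 + perches

def from_perches(total):
    """Expand a perch count back into (acres, roods, perches)."""
    roods, perches = divmod(total, 40)
    acres, roods = divmod(roods, 4)
    return acres, roods, perches

print(to_farthings(1, 2, 3, 1), from_farthings(1069))
print(to_perches(2, 3, 10), from_perches(450))
```

With a single integer column, sums and comparisons across parcels or rents need no special handling; only the display layer has to know about the old units.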
On 2/18/2013 11:20 AM, J. Hugh Sullivan wrote: > > I use " marks to identify middle names of people who go by the middle > name. My problem is switching from one genie program to another. The > second reads the name in quotes as a nickname and shows it twice... > James "Hugh" "Hugh" Sullivan. Why not use 'James Hugh "Hugh" Sullivan'? It clearly indicates that Hugh is not only your real name (there is a 'Claude "Pete" Davis' in my database, who was called 'Pete' although that was no part of his "real" name), but it is the name you used. The duplication may look a little odd, but it is unambiguous, and, I think, clear to anyone. -- T.M. Sommers -- ab2sb
On 2/21/2013 7:19 PM, Dennis Lee Bieber wrote: > On Thu, 21 Feb 2013 00:01:22 -0500, "T.M. Sommers" > <tmsommers2@gmail.com> declaimed the following in > soc.genealogy.computing: > >> You left out the NAME tag. > > Whoops... But I do show enough that it should be obvious where it > would go <G> Between $1 and $2 on the first replacement set. Sure. I just mentioned it in case someone tried to use it and ended up mangling their data. -- T.M. Sommers -- ab2sb
On 2/22/2013 5:33 AM, Tony Proctor wrote: > > A simple off-the-cuff example, with an arbitrary syntax, just to illustrate > the difference Janis. This bit of script wants to look at all the events in > my timeline, then look at all the people sharing those name events, and > select the ones whose name has the element "Jesson". > > > Person me = New Person("Tony Proctor"); > for (Event e: me.AllEvents()) { > for (Person other: e.AllPersons()) { > if (other.name().contains("Jesson")) { > ...do something with this other person... > } > } > } If you haven't already, you should take a look at Lifelines, which has its own language for doing things just like the above. -- T.M. Sommers -- ab2sb
Martin Steer wrote: > On Fri, Feb 22, 2013 at 10:33:38AM -0000, Tony Proctor wrote: >> >> A simple off-the-cuff example, with an arbitrary syntax, just to >> illustrate >> the difference Janis. This bit of script wants to look at all the >> events in >> my timeline, then look at all the people sharing those name events, and >> select the ones whose name has the element "Jesson". >> >> >> Person me = New Person("Tony Proctor"); >> for (Event e: me.AllEvents()) { >> for (Person other: e.AllPersons()) { >> if (other.name().contains("Jesson")) { >> ...do something with this other person... >> } >> } >> } > > As for sql, this kind of thing can be done fairly easily inside a > database which allows so-called 'stored procedures'.[1] I use Postgresql > with its native procedural language, but perl, python, etc, procedures > could be used instead. > > I don't see the need here for an object oriented approach. Unfortunately, neither did the designers of GEDCOM. But take a fairly basic element of genealogy, the personal name. For a start have a read through http://www.w3.org/International/questions/qa-personal-names Then take into account the fact that even within one culture naming conventions change over time. In the medieval period, for instance it was common in the Norman empire to have a system similar to the Icelandic where William's son John would be called John FitzWilliam before FitzWilliam became an inherited surname. In addition we have the problem of illegitimate children, e.g. from the parish register of Kirkburton, April 1662 "Joshua sonne of Alice Jessop and Richard Wareing bapt the first day." Is Joshua subsequently going to be known as Jessop or Wareing? We don't know but if we were looking at some query such as Tony suggested we'd want this event to come up irrespective of whether the name contained Jessop, Wareing or even Waring. 
And going back a few months in the same register we have the following for December 1661: "William sonne of Grace Taylor otherwise Jessop & Regnald or Leonard Wright was bapt the 18th". This introduces the fact that we sometimes had alias surnames (otherwise Jessop), which could also be inherited, and disputed paternity.

Most pages of a PR will include an illegitimacy; the second, pathological instance is much rarer (although there was another instance of a child with alternate father's surnames in January 1662-3). Even so, it's clear that what seems at first sight to be a basic thing like a name does not, in fact, sit well with relational design. It would, in fact, have been much better to have taken an OO approach so that one could have a base class of Name and as many sub-classes as required to hold different implementations.

And the mention of January 1662-3 above should be a hint that dates are another can of worms better dealt with by an OO approach.

> For sql, the
> natural thing to do is to build up a collection of useful views on the
> data, together with a collection of useful functions.
>

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
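A rough Python sketch of the "base class of Name with sub-classes" idea, for illustration only. The class names and fields are invented here, and spelling variants such as Waring are simply listed by hand rather than derived.

```python
from dataclasses import dataclass

@dataclass
class Name:
    """Base class: every kind of name can say whether it contains an element."""
    forename: str

    def surnames(self):
        return []

    def matches(self, element):
        wanted = element.lower()
        return any(wanted in part.lower() for part in [self.forename, *self.surnames()])

@dataclass
class InheritedSurname(Name):
    surname: str

    def surnames(self):
        return [self.surname]

@dataclass
class Patronymic(Name):
    """E.g. William's son John recorded as John FitzWilliam."""
    father_forename: str

    def surnames(self):
        return ["Fitz" + self.father_forename]

@dataclass
class AlternativeSurnames(Name):
    """Aliases, disputed paternity, or a child whose later surname is unknown."""
    candidates: tuple

    def surnames(self):
        return list(self.candidates)

joshua = AlternativeSurnames("Joshua", ("Jessop", "Wareing", "Waring"))
william = AlternativeSurnames("William", ("Taylor", "Jessop", "Wright"))
john = Patronymic("John", "William")

for person in (joshua, william, john):
    print(person.forename, person.surnames(), person.matches("Jessop"))
```

A query for a name element then calls matches() without needing to know which kind of name it is dealing with, so Joshua's baptism comes up for Jessop, Wareing or Waring alike.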
Tony Proctor wrote:
> "Steve Hayes" <hayesstw@telkomsa.net> wrote in message
> news:h4p6i8l277lgaqsbl8ad70jkek4injed62@4ax.com...
>> On Tue, 19 Feb 2013 11:02:25 +0000, Ian Goddard <goddai01@hotmail.co.uk>
>> wrote:
>>
>> I think you are being too dismissive.
>>
>> I'm not talking about XML files, but about Gedcom files, and I'm not
>> talking
>> about a DOM, but about AWK.
>>
>> And I'm not talking about some hypothetical Platonic ideal of the perfect
>> Gedcom replacement, but about the actual Gedcom files that millions of
>> genealogists have on their computers now.
>>
>> These "you can't get there from here" comments are really not very
>> helpful.
>>
>>
>
> I gave advice based on experience Steve. For every AWK file you write, I can
> contrive a GEDCOM example that will break it. Another post in this thread
> mentioned ambiguities, and having to assume the availability of special
> characters that won't occur in names or notes.

The simplest approach to awkward corner cases is probably to just use an interactive editor - notepad, vi, emacs or whatever.

There used to be an old saw along the lines of don't use C if you can use awk, don't use awk if you can use sed etc. Nowadays I suppose perl & python would have to be thrown in somewhere (although there might be a case for saying don't use perl if you can use C ;)

The principle being that there's a trade-off between the simplicity of expression of your requirement and the scope of the tool. If the problem is within the scope of, say, tr, there's no point in risking screwing up when using something more advanced but trickier to address.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
J. Hugh Sullivan wrote: > On Wed, 20 Feb 2013 09:52:47 -0500, singhals <singhals@erols.com> > wrote: > >> (G) The loose nut on the keyboard? > > PICNIC > > Problem in chair not in computer. Or PEBKAC - Problem exists between keyboard and chair. -- Ian The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
Charlie Hoffpauir wrote: > On Mon, 18 Feb 2013 23:47:46 -0500, Denis Beauregard > <denis.b-at-francogene.com@fr.invalid> wrote: > > >>> In general, the difference is that the original problem was to replace >>> quote marks with some other character, when quote marks appear in the >>> name field, for an "arbitrary" name. That is, it is assumed that some >>> portion of the given names in the file, (but not all) have had quote >>> marks surrounding one or more of the given names, and these are the >>> characters that must be changed. >> >> So, this problem consists in : >> >> replacing "Hugh" "Hugh" by Hugh (or "Hugh") in all lines beginning >> with >> 1 NAME >> >> where "Hugh" "Hugh" would be any pair of similar names, i.e. >> "Denis" "Denis" or "Charles" "Charles". >> >>From my experience with Brief, a text editor with regular expressions, >> I don't know how to define a duplicated word. Brief was not using the >> standard regular expressions but with it, something like "$1" "$1" >> was not accepted... >> >> >> Denis > > Not exactly. The duplication of names only occurs when the GEDCOM is > imported into RM and then RM displays the name, as in a report. The > presumption is that this is caused because Hugh surrounded some given > names with Quote marks, doing so to indicate that these given names > were what the person was commonly known by. 
>
> So a portion of the GEDCOM might look like this:
>
> 0 @I1@ INDI
> 1 NAME Gerald "Bernard" /Landry/
> 2 GIVN Gerald "Bernard"
> 2 SURN Landry
> 1 SEX M
> 1 BIRT
> 2 DATE 9 MAR 1937
> 2 PLAC St-Jacques
> 0 @I2@ INDI
> 1 NAME Bernard /St-Jacques/
> 2 GIVN Bernard
> 2 SURN St-Jacques
> 1 SEX M
> 1 FAMS @F1@
> 1 FAMC @F2@
>
> And we want to process it so that it looks like this:
>
> 0 @I1@ INDI
> 1 NAME Gerald ~Bernard~ /Landry/
> 2 GIVN Gerald ~Bernard~
> 2 SURN Landry
> 1 SEX M
> 1 BIRT
> 2 DATE 9 MAR 1937
> 2 PLAC St-Jacques
> 0 @I2@ INDI
> 1 NAME Bernard /St-Jacques/
> 2 GIVN Bernard
> 2 SURN St-Jacques
> 1 SEX M
> 1 FAMS @F1@
> 1 FAMC @F2@
>
> Where the tilde I used might be any other character of our choosing
> as long as it were not a character that would also appear elsewhere in
> the Given name fields of the GEDCOM
>
> Once the modified GEDCOM was imported into RM, Hugh would then use the
> search/replace function on the Given name field to change the tilde
> back to quote marks.
>
> So that is this specific situation.... but there are probably
> infinitely more situations where modifications to a GEDCOM might be
> needed to transfer data from one particular Genealogy program to
> another.... hence the interest in all the solutions proposed. Steve
> has indicated an interest in development of a series of AWK utilities
> for this purpose. There once was a quite useful program for modifying
> GEDCOMs called Gedcom Explorer (GEDX) that utilized a base code with
> user defined macros to accomplish the same purpose. Perhaps something
> along that line could be developed.
>

In Unixland tr is the command:
http://unixhelp.ed.ac.uk/CGI/man-cgi?tr+1

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
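One caveat with tr is that it translates every quote mark in the file, including any inside notes. A short Python sketch that touches only "1 NAME" and "2 GIVN" lines is given below. The function name swap_quotes and the refusal to proceed when the placeholder is already present are choices made for this example, not anything proposed in the thread.

```python
import re

# Matches only the two line types that carry the given names.
NAME_LINE = re.compile(r"^\s*(1\s+NAME|2\s+GIVN)\b")

def swap_quotes(lines, placeholder="~"):
    """Replace double quotes with `placeholder` on NAME and GIVN lines only."""
    name_lines = [ln for ln in lines if NAME_LINE.match(ln)]
    if any(placeholder in ln for ln in name_lines):
        raise ValueError("placeholder already occurs in a name line; pick another")
    return [ln.replace('"', placeholder) if NAME_LINE.match(ln) else ln
            for ln in lines]

sample = [
    "0 @I1@ INDI",
    '1 NAME Gerald "Bernard" /Landry/',
    '2 GIVN Gerald "Bernard"',
    "2 SURN Landry",
    '1 NOTE Known to the family as "Bernie"',
]

for line in swap_quotes(sample):
    print(line)
```

The NOTE line keeps its quote marks, and the check on the placeholder guards against the case where the chosen character already appears in a name, which would make the later search/replace in RM ambiguous.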
On 22.02.2013 09:49, Tony Proctor wrote:
> "Kerry Raymond" <kraymond@iprimus.com.au> wrote in message
> news:AMSdnSntPaO0LbvMnZ2dnUVZ_jidnZ2d@westnet.com.au...
>> [...]
>>
>> Awk is not the perfect tool for every job, but it's handy to have in the
>> tool kit.
>>
>> Kerry
>>
>
> I have used AWK too Kerry so I know it can be useful for automating certain
> manipulations. My point in this thread was merely that it is designed for
> manipulating _text_.

Not only that; it's also used to evaluate data, similar to what you seem to describe below.

> Depending on what you want to do with the data files
> then that may be totally adequate.
>
> However, I really like the idea of being able to script manipulation of the
> genealogical entities as opposed to the raw text. I have used similar
> systems outside of the field of genealogy and they can be remarkably
> powerful. For instance, if your genealogy software doesn't provide the type
> of query you need to execute then such a tool could compile your data file
> into a number of objects in memory (e.g. Persons, Places, Events, etc) and
> allow you to iterate through them, test properties, correlate relationships,
> make adjustments to them, etc.

Iterate, test, correlate; this is something I regularly do with awk. You know that you have arrays to store the data in memory, associative arrays usable for relations, and loops and conditions to operate on that data. I am sure you can construct complex requirements that will lead to more code than what is typical for awk. But I've also got the impression that you might not be fully aware of awk's possibilities?

For another (non-awk) approach organise your data in a database and operate on it using SQL alone.

Janis

> It would need a simple scripting language but
> there are existing precedents for this that could easily be adapted.
>
> [...]
"Janis Papanagnou" <janis_papanagnou@hotmail.com> wrote in message news:kg7f2e$qjg$1@speranza.aioe.org... > Am 22.02.2013 09:49, schrieb Tony Proctor: >> "Kerry Raymond" <kraymond@iprimus.com.au> wrote in message >> news:AMSdnSntPaO0LbvMnZ2dnUVZ_jidnZ2d@westnet.com.au... >>> [...] >>> >>> Awk is not the perfect tool for every job, but it's handy to have in the >>> tool kit. >>> >>> Kerry >>> >> >> I have used AWK too Kerry so I know it can be useful for automating >> certain >> manipulations. My point in this thread was merely that it is designed for >> manipulating _text_. > > Not only that; it's also used to evaluate data, similar to what you > seem to describe below. > >> Depending on what you want to do with the data files >> then that may be totally adequate. >> >> However, I really like the idea of being able to script manipulation of >> the >> genealogical entities as opposed to the raw text. I have used similar >> systems outside of the field of genealogy and they can be remarkably >> powerful. For instance, if your genealogy software doesn't provide the >> type >> of query you need to execute then such a tool could compile your data >> file >> into a number of objects in memory (e.g. Persons, Places, Events, etc) >> and >> allow you to iterate through them, test properties, correlate >> relationships, >> make adjustments to them, etc. > > Iterate, test, correlate; this is something I regularily do with awk. > You know that you have arrays to store the data in memory, associative > arrays usable for relations, and loops and conditions to operate on > that data. I am sure you can construct complex requirements that will > lead to more code than what is typical for awk. But I've also got the > impression that you might not be fully aware of awk's possibilities? > > For another (non-awk) approach organise your data in a database and > operate on it using SQL alone. 
> > Janis > >> It would need a simple scripting language but >> there are existing precedents for this that could easily be adapted. >> >> [...] A simple off-the-cuff example, with an arbitrary syntax, just to illustrate the difference Janis. This bit of script wants to look at all the events in my timeline, then look at all the people sharing those name events, and select the ones whose name has the element "Jesson". Person me = New Person("Tony Proctor"); for (Event e: me.AllEvents()) { for (Person other: e.AllPersons()) { if (other.name().contains("Jesson")) { ...do something with this other person... } } } Tony Proctor
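For comparison, the same loop written as runnable Python. The Person and Event classes here are a hypothetical in-memory model invented for this example (not part of any existing tool), and the people in the sample data are made up.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Person:
    name: str
    events: list = field(default_factory=list)

@dataclass(eq=False)
class Event:
    label: str
    persons: list = field(default_factory=list)

    def add(self, person):
        """Link a person to this event in both directions."""
        self.persons.append(person)
        person.events.append(self)

def sharing_events(me, element):
    """People who appear in any of my events and whose name contains `element`."""
    found = []
    for event in me.events:
        for other in event.persons:
            if other is not me and element in other.name and other not in found:
                found.append(other)
    return found

# Hypothetical data for the illustration.
me = Person("Tony Proctor")
mary = Person("Mary Jesson")
john = Person("John Smith")
wedding, census = Event("wedding"), Event("census")
for p in (me, mary, john):
    wedding.add(p)
for p in (me, mary):
    census.add(p)

print([p.name for p in sharing_events(me, "Jesson")])
```

The hard part, of course, is the step this sketch skips: compiling a GEDCOM file into such objects in the first place.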
"Janis Papanagnou" <janis_papanagnou@hotmail.com> wrote in message news:kg7f2e$qjg$1@speranza.aioe.org... > Am 22.02.2013 09:49, schrieb Tony Proctor: > > Iterate, test, correlate; this is something I regularily do with awk. > You know that you have arrays to store the data in memory, associative > arrays usable for relations, and loops and conditions to operate on > that data. I am sure you can construct complex requirements that will > lead to more code than what is typical for awk. But I've also got the > impression that you might not be fully aware of awk's possibilities? > > For another (non-awk) approach organise your data in a database and > operate on it using SQL alone. > > Janis Thanks Janis but I'm not talking about record-by-record processing, and firing of actions based on conditions for each record [Yes, I have used AWK a lot]. I'm thinking of more complex genealogical entities that have to be compiled into 'objects' before they can be manipulated. SQL (even the versions with procedural extensions) is similarly too low-level Tony Proctor
Despite all the scorn heaped on the idea, I must say that I often use AWK for quickly fixing problems with GEDCOMs (particularly ones given to me that are not importing correctly), removing information of certain kinds before passing it on, etc. It's a quick way to get certain tasks done, like get rid of all those UPPERCASE SURNAMES. You do have to have a good understanding of what your GEDCOM file actually contains as otherwise you might get some unintended consequences. This might make it difficult to have a shareable library of awk scripts that others can use "out of the box" given the not-as-standard-as-it-should-be nature of GEDCOM, but I can't see a problem with having a shareable library of awk scripts that others can use as a starting point for their own tweaking. Awk is not the perfect tool for every job, but it's handy to have in the tool kit. Kerry
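As an illustration of the UPPERCASE SURNAMES job, here is a rough Python equivalent of the kind of script described. It is a sketch with assumptions stated: the function names are invented, only "1 NAME" and "2 SURN" lines are touched, and the re-casing is Python's simple str.title(), which handles SULLIVAN and ST-JACQUES but would turn MCDONALD into Mcdonald, so the result still needs checking by eye.

```python
import re

SURNAME_IN_NAME = re.compile(r"/([^/]*)/")   # the part between slashes on a NAME line

def recase(surname):
    """Title-case a surname only if it is entirely upper case."""
    return surname.title() if surname.isupper() else surname

def fix_surnames(lines):
    out = []
    for ln in lines:
        parts = ln.split(None, 2)            # level, tag, value
        if len(parts) == 3 and parts[:2] == ["1", "NAME"]:
            ln = SURNAME_IN_NAME.sub(lambda m: "/" + recase(m.group(1)) + "/", ln)
        elif len(parts) == 3 and parts[:2] == ["2", "SURN"]:
            ln = ln[: len(ln) - len(parts[2])] + recase(parts[2])
        out.append(ln)
    return out

sample = [
    "1 NAME James (Hugh) /SULLIVAN/",
    "2 GIVN James (Hugh)",
    "2 SURN SULLIVAN",
    "1 NAME Bernard /St-Jacques/",
]

for line in fix_surnames(sample):
    print(line)
```

This is very much a starting point for tweaking, in the spirit of the post: the right rule for names like McDonald or de la Mare depends on what the particular file contains.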
"Kerry Raymond" <kraymond@iprimus.com.au> wrote in message news:AMSdnSntPaO0LbvMnZ2dnUVZ_jidnZ2d@westnet.com.au... > Despite all the scorn heaped on the idea, I must say that I often use AWK > for quickly fixing problems with GEDCOMs (particularly ones given to me > that are not importing correctly), removing information of certain kinds > before passing it on, etc. It's a quick way to get certain tasks done, > like get rid of all those UPPERCASE SURNAMES. You do have to have a good > understanding of what your GEDCOM file actually contains as otherwise you > might get some unintended consequences. This might make it difficult to > have a shareable library of awk scripts that others can use "out of the > box" given the not-as-standard-as-it-should-be nature of GEDCOM, but I > can't see a problem with having a shareable library of awk scripts that > others can use as a starting point for their own tweaking. > > Awk is not the perfect tool for every job, but it's handy to have in the > tool kit. > > Kerry > I have used AWK too Kerry so I know it can be useful for automating certain manipulations. My point in this thread was merely that it is designed for manipulating _text_. Depending on what you want to do with the data files then that may be totally adequate. However, I really like the idea of being able to script manipulation of the genealogical entities as opposed to the raw text. I have used similar systems outside of the field of genealogy and they can be remarkably powerful. For instance, if your genealogy software doesn't provide the type of query you need to execute then such a tool could compile your data file into a number of objects in memory (e.g. Persons, Places, Events, etc) and allow you to iterate through them, test properties, correlate relationships, make adjustments to them, etc. It would need a simple scripting language but there are existing precedents for this that could easily be adapted. 
In summary, this could be a truly powerful tool, but it deserves a proper implementation. Also, if-and-when we get a modern data standard then I would like to see this being a supplemental part of it, although I couldn't justify my own time on a GEDCOM implementation. Maybe Steve's (OP) requirements are more specific than this - e.g. a programmatic edit - in which case I apologise for looking beyond that and hijacking the thread. However, it has spurred me to a potential involvement at a later date so thanks for that. Tony Proctor
On 2/22/2013 4:33 AM, Tony Proctor wrote:
> "Janis Papanagnou" <janis_papanagnou@hotmail.com> wrote in message
> news:kg7f2e$qjg$1@speranza.aioe.org...
>> On 22.02.2013 09:49, Tony Proctor wrote:
>>> "Kerry Raymond" <kraymond@iprimus.com.au> wrote in message
>>> news:AMSdnSntPaO0LbvMnZ2dnUVZ_jidnZ2d@westnet.com.au...
>>>> [...]
>>>>
>>>> Awk is not the perfect tool for every job, but it's handy to have in the
>>>> tool kit.
>>>>
>>>> Kerry
>>>>
>>>
>>> I have used AWK too Kerry so I know it can be useful for automating certain
>>> manipulations. My point in this thread was merely that it is designed for
>>> manipulating _text_.
>>
>> Not only that; it's also used to evaluate data, similar to what you
>> seem to describe below.
>>
>>> Depending on what you want to do with the data files
>>> then that may be totally adequate.
>>>
>>> However, I really like the idea of being able to script manipulation of the
>>> genealogical entities as opposed to the raw text. I have used similar
>>> systems outside of the field of genealogy and they can be remarkably
>>> powerful. For instance, if your genealogy software doesn't provide the type
>>> of query you need to execute then such a tool could compile your data file
>>> into a number of objects in memory (e.g. Persons, Places, Events, etc) and
>>> allow you to iterate through them, test properties, correlate relationships,
>>> make adjustments to them, etc.
>>
>> Iterate, test, correlate; this is something I regularly do with awk.
>> You know that you have arrays to store the data in memory, associative
>> arrays usable for relations, and loops and conditions to operate on
>> that data. I am sure you can construct complex requirements that will
>> lead to more code than what is typical for awk. But I've also got the
>> impression that you might not be fully aware of awk's possibilities?
>>
>> For another (non-awk) approach organise your data in a database and
>> operate on it using SQL alone.
>>
>> Janis
>>
>>> It would need a simple scripting language but
>>> there are existing precedents for this that could easily be adapted.
>>>
>>> [...]
>
> A simple off-the-cuff example, with an arbitrary syntax, just to illustrate
> the difference Janis. This bit of script wants to look at all the events in
> my timeline, then look at all the people sharing those name events, and
> select the ones whose name has the element "Jesson".
>
>
> Person me = New Person("Tony Proctor");
> for (Event e: me.AllEvents()) {
>     for (Person other: e.AllPersons()) {
>         if (other.name().contains("Jesson")) {
>             ...do something with this other person...
>         }
>     }
> }

See the script I posted earlier and copied below which reads each record
of your file into an array and then prints that record. If I understand
you correctly, the above can be done in awk by reading the whole file
into one array of records and then operating on it in the END section.

    Ed.

$ cat parseGedcom.awk
function prtRec(rec,    tag,numTags,tagNr) {
    printf "%s",recSep
    numTags = asorti(rec,tags)
    for (tagNr=1; tagNr <= numTags; tagNr++) {
        tag = tags[tagNr]
        print tag " = <" rec[tag] ">"
    }
    recSep = "-------------------\n"
}
{
    level = $1
    if (level == 0) {
        prtRec(rec)
        delete rec
        maxLevel = 0
    }
    tag = $2
    level2tag[level] = tag
    fullTag=""
    for (i=0;i<=level;i++) {
        fullTag = (fullTag ? fullTag "->" : "") level2tag[i]
    }
    data = gensub(/^[[:space:]]*([^[:space:]]+[[:space:]]*){2}/,"","")
    rec[fullTag] = data
    maxLevel = (level > maxLevel ? level : maxLevel)
}
END { prtRec(rec) }
$
$ gawk -f parseGedcom.awk file
-------------------
HEAD = <>
HEAD->CHAR = <ANSEL>
HEAD->DATE = <20 APR 2002>
HEAD->DATE->TIME = <2:58:56>
HEAD->DEST = <PAF>
HEAD->FILE = <GEDCOM4.ged>
HEAD->GEDC = <>
HEAD->GEDC->FORM = <LINEAGE-LINKED>
HEAD->GEDC->VERS = <5.5>
HEAD->SOUR = <ANSTFILE>
HEAD->SOUR->CORP = <The Church of Jesus Christ of Latter-day Saints>
HEAD->SOUR->CORP->ADDR = <50 East North Temple Street>
HEAD->SOUR->CORP->ADDR->CONT = <Salt Lake City, Utah 84150>
HEAD->SOUR->DATA = <Ancestral File>
HEAD->SOUR->DATA->COPR = <Copyright (c) 1987, June 1998>
HEAD->SOUR->DATA->DATE = <5 January 1998>
HEAD->SOUR->NAME = <Ancestral File (R)>
HEAD->SOUR->VERS = <4.19>
HEAD->SUBM = <@SUB01@>
HEAD->SUBN = <@N01@>
-------------------
@SUB01@ = <SUBM>
@SUB01@->ADDR = <50 East North Temple Street>
@SUB01@->ADDR->CONT = <Salt Lake City, Utah 84150>
@SUB01@->NAME = <Created by FamilySearch (TM) Internet Genealogy Service>
-------------------
@S01@ = <SOUR>
@S01@->AUTH = <The Church of Jesus Christ of Latter-day Saints>
@S01@->PUBL = <Copyright (c) 1987, June 1998, data as of 5 January 1998>
@S01@->REPO = <@R01@>
@S01@->TITL = <Ancestral File (R)>
-------------------
@R01@ = <REPO>
@R01@->ADDR = <35 N West Temple Street>
@R01@->ADDR->CONT = <Salt Lake City, Utah 84150 USA>
@R01@->NAME = <Family History Library>
-------------------
@N01@ = <SUBN>
@N01@->DESC = <2>
@N01@->ORDI = <N>
-------------------
@I3GLR-Z3@ = <INDI>
@I3GLR-Z3@->AFN = <3GLR-Z3>
@I3GLR-Z3@->BIRT = <>
@I3GLR-Z3@->BIRT->DATE = <1 Jul 1850>
@I3GLR-Z3@->BIRT->PLAC = <Geelong, Vic, Astl>
@I3GLR-Z3@->FAMC = <@F524078@>
@I3GLR-Z3@->FAMS = <@F1794078@>
@I3GLR-Z3@->NAME = <Thomas William /BALDOCK/>
@I3GLR-Z3@->NAME->GIVN = <Thomas William>
@I3GLR-Z3@->NAME->SURN = <BALDOCK>
@I3GLR-Z3@->SEX = <M>
@I3GLR-Z3@->SOUR = <@S01@>
-------------------
@I3GLR-4R@ = <INDI>
@I3GLR-4R@->AFN = <3GLR-4R>
@I3GLR-4R@->FAMS = <@F524078@>
@I3GLR-4R@->NAME = <Thomas /BALDOCK/>
@I3GLR-4R@->NAME->GIVN = <Thomas>
@I3GLR-4R@->NAME->SURN = <BALDOCK>
@I3GLR-4R@->SEX = <M>
@I3GLR-4R@->SOUR = <@S01@>
-------------------
@I3GLR-5X@ = <INDI>
@I3GLR-5X@->AFN = <3GLR-5X>
@I3GLR-5X@->FAMS = <@F524078@>
@I3GLR-5X@->NAME = <Anne /CHAMBERS/>
@I3GLR-5X@->NAME->GIVN = <Anne>
@I3GLR-5X@->NAME->SURN = <CHAMBERS>
@I3GLR-5X@->SEX = <F>
@I3GLR-5X@->SOUR = <@S01@>
-------------------
@I98BW-JC@ = <INDI>
@I98BW-JC@->AFN = <98BW-JC>
@I98BW-JC@->BIRT = <>
@I98BW-JC@->BIRT->DATE = <1854>
@I98BW-JC@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-JC@->DEAT = <>
@I98BW-JC@->DEAT->DATE = <9 Dec 1890>
@I98BW-JC@->DEAT->PLAC = <Geelong, Victoria, Australia>
@I98BW-JC@->FAMC = <@F1794093@>
@I98BW-JC@->FAMS = <@F1794078@>
@I98BW-JC@->NAME = <Emily Jane /THORNTON/>
@I98BW-JC@->NAME->GIVN = <Emily Jane>
@I98BW-JC@->NAME->SURN = <THORNTON>
@I98BW-JC@->SEX = <F>
@I98BW-JC@->SOUR = <@S01@>
-------------------
@I98BX-N6@ = <INDI>
@I98BX-N6@->AFN = <98BX-N6>
@I98BX-N6@->FAMS = <@F1794093@>
@I98BX-N6@->NAME = <Charles Edwin /THORNTON/>
@I98BX-N6@->NAME->GIVN = <Charles Edwin>
@I98BX-N6@->NAME->SURN = <THORNTON>
@I98BX-N6@->SEX = <M>
@I98BX-N6@->SOUR = <@S01@>
-------------------
@I98BX-PC@ = <INDI>
@I98BX-PC@->AFN = <98BX-PC>
@I98BX-PC@->FAMS = <@F1794093@>
@I98BX-PC@->NAME = <Emily /GROWDON/>
@I98BX-PC@->NAME->GIVN = <Emily>
@I98BX-PC@->NAME->SURN = <GROWDON>
@I98BX-PC@->SEX = <F>
@I98BX-PC@->SOUR = <@S01@>
-------------------
@I98CJ-BW@ = <INDI>
@I98CJ-BW@->AFN = <98CJ-BW>
@I98CJ-BW@->BIRT = <>
@I98CJ-BW@->BIRT->DATE = <ABT 1876>
@I98CJ-BW@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98CJ-BW@->FAMC = <@F1794078@>
@I98CJ-BW@->NAME = <Percy William Growdon /BALDOCK/>
@I98CJ-BW@->NAME->GIVN = <Percy William Growdon>
@I98CJ-BW@->NAME->SURN = <BALDOCK>
@I98CJ-BW@->SEX = <M>
@I98CJ-BW@->SOUR = <@S01@>
-------------------
@I98BW-LP@ = <INDI>
@I98BW-LP@->AFN = <98BW-LP>
@I98BW-LP@->BIRT = <>
@I98BW-LP@->BIRT->DATE = <1879>
@I98BW-LP@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-LP@->DEAT = <>
@I98BW-LP@->DEAT->DATE = <6 Sep 1886>
@I98BW-LP@->DEAT->PLAC = <Geelong, Victoria, Australia>
@I98BW-LP@->FAMC = <@F1794078@>
@I98BW-LP@->NAME = <Percy William Growdon /BALDOCK/>
@I98BW-LP@->NAME->GIVN = <Percy William Growdon>
@I98BW-LP@->NAME->SURN = <BALDOCK>
@I98BW-LP@->SEX = <M>
@I98BW-LP@->SOUR = <@S01@>
-------------------
@I98BW-KJ@ = <INDI>
@I98BW-KJ@->AFN = <98BW-KJ>
@I98BW-KJ@->BIRT = <>
@I98BW-KJ@->BIRT->DATE = <1878>
@I98BW-KJ@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-KJ@->FAMC = <@F1794078@>
@I98BW-KJ@->NAME = <Arthur Jabez /BALDOCK/>
@I98BW-KJ@->NAME->GIVN = <Arthur Jabez>
@I98BW-KJ@->NAME->SURN = <BALDOCK>
@I98BW-KJ@->SEX = <M>
@I98BW-KJ@->SOUR = <@S01@>
-------------------
@I98BW-P7@ = <INDI>
@I98BW-P7@->AFN = <98BW-P7>
@I98BW-P7@->BIRT = <>
@I98BW-P7@->BIRT->DATE = <1887>
@I98BW-P7@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-P7@->DEAT = <>
@I98BW-P7@->DEAT->DATE = <1907>
@I98BW-P7@->DEAT->PLAC = <>
@I98BW-P7@->FAMC = <@F1794078@>
@I98BW-P7@->NAME = <Gladys Claudine /BALDOCK/>
@I98BW-P7@->NAME->GIVN = <Gladys Claudine>
@I98BW-P7@->NAME->SURN = <BALDOCK>
@I98BW-P7@->SEX = <F>
@I98BW-P7@->SOUR = <@S01@>
-------------------
@I98BW-N2@ = <INDI>
@I98BW-N2@->AFN = <98BW-N2>
@I98BW-N2@->BIRT = <>
@I98BW-N2@->BIRT->DATE = <1884>
@I98BW-N2@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-N2@->DEAT = <>
@I98BW-N2@->DEAT->DATE = <25 Oct 1951>
@I98BW-N2@->DEAT->PLAC = <>
@I98BW-N2@->FAMC = <@F1794078@>
@I98BW-N2@->NAME = <Clive Alfred /BALDOCK/>
@I98BW-N2@->NAME->GIVN = <Clive Alfred>
@I98BW-N2@->NAME->SURN = <BALDOCK>
@I98BW-N2@->SEX = <M>
@I98BW-N2@->SOUR = <@S01@>
-------------------
@I98BW-MV@ = <INDI>
@I98BW-MV@->AFN = <98BW-MV>
@I98BW-MV@->BIRT = <>
@I98BW-MV@->BIRT->DATE = <1881>
@I98BW-MV@->BIRT->PLAC = <Geelong, Victoria, Australia>
@I98BW-MV@->FAMC = <@F1794078@>
@I98BW-MV@->NAME = <Lawrence /BALDOCK/>
@I98BW-MV@->NAME->GIVN = <Lawrence>
@I98BW-MV@->NAME->SURN = <BALDOCK>
@I98BW-MV@->SEX = <M>
@I98BW-MV@->SOUR = <@S01@>
-------------------
@F1794078@ = <FAM>
@F1794078@->CHIL = <@I98BW-MV@>
@F1794078@->HUSB = <@I3GLR-Z3@>
@F1794078@->MARR = <>
@F1794078@->MARR->DATE = <20 Apr 1876>
@F1794078@->MARR->PLAC = <Geelong, Victoria, Australia>
@F1794078@->WIFE = <@I98BW-JC@>
-------------------
@F524078@ = <FAM>
@F524078@->CHIL = <@I3GLR-Z3@>
@F524078@->HUSB = <@I3GLR-4R@>
@F524078@->WIFE = <@I3GLR-5X@>
-------------------
@F1794093@ = <FAM>
@F1794093@->CHIL = <@I98BW-JC@>
@F1794093@->HUSB = <@I98BX-N6@>
@F1794093@->WIFE = <@I98BX-PC@>
-------------------
TRLR = <>
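
For comparison, here is a rough Python sketch of the "read everything into records, then query" idea discussed above. It is not anyone's posted code. The names parse_gedcom, relatives_named and SAMPLE are made up for illustration, the sample lines are hand-written (loosely based on names in the listing above, with invented IDs), and "shared events" is approximated by "shared family record". In the listing above @F1794078@ shows a single CHIL even though several individuals have FAMC pointing at it, because each tag path holds one value, so this sketch keeps a list per tag path instead.

```python
from collections import defaultdict

# Hand-written sample, not the file used in the thread.
SAMPLE = """\
0 @I1@ INDI
1 NAME Thomas William /BALDOCK/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Emily Jane /THORNTON/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Lawrence /BALDOCK/
1 FAMC @F1@
0 @I4@ INDI
1 NAME Arthur Jabez /BALDOCK/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
1 CHIL @I4@
0 TRLR
"""


def parse_gedcom(lines):
    """Group GEDCOM lines into records keyed by the level-0 id or tag."""
    records = {}
    current = None
    path = []
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        level = int(parts[0])
        if level == 0:
            # Level-0 lines look like '0 @I1@ INDI' or '0 TRLR'.
            current = defaultdict(list)
            records[parts[1]] = current
            path = []
            continue
        value = parts[2] if len(parts) > 2 else ""
        # Keep the tags of the enclosing levels, then add this line's tag.
        path = path[:level - 1] + [parts[1]]
        current["->".join(path)].append(value)
    return records


def relatives_named(records, person_id, needle):
    """Ids of people who share a family with person_id and whose NAME contains needle."""
    person = records[person_id]
    found = []
    for fam_id in person["FAMC"] + person["FAMS"]:
        fam = records.get(fam_id, {})
        for role in ("HUSB", "WIFE", "CHIL"):
            for other_id in fam.get(role, []):
                names = records.get(other_id, {}).get("NAME", [])
                if (other_id != person_id and other_id not in found
                        and any(needle in name for name in names)):
                    found.append(other_id)
    return found


if __name__ == "__main__":
    recs = parse_gedcom(SAMPLE.splitlines())
    print(relatives_named(recs, "@I1@", "BALDOCK"))
```

The same two-pass shape (load all records, then walk the cross-references) is what an awk END block would do with associative arrays; the list-per-tag choice only matters for tags that repeat within a record, such as CHIL.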
On Thu, 21 Feb 2013 00:01:22 -0500, "T.M. Sommers" <tmsommers2@gmail.com> wrote:

>On 2/19/2013 7:15 PM, Dennis Lee Bieber wrote:
>>
>> PS E:\UserData\Wulfraed\My Documents> get-content e:\sample.ged
>> 0 @I1@ INDI
>> 1 NAME Gerald "Bernard" /Landry/
>> 2 GIVN Gerald "Bernard"
>> 2 SURN Landry
>> 1 SEX M
>> 1 BIRT
>> 2 DATE 9 MAR 1937
>> 2 PLAC St-Jacques
>> 0 @I2@ INDI
>> 1 NAME Bernard /St-Jacques/
>> 2 GIVN Bernard
>> 2 SURN St-Jacques
>> 1 SEX M
>> 1 FAMS @F1@
>>
>> PS E:\UserData\Wulfraed\My Documents> get-content e:\sample.ged |
>> foreach {$_ -replace '(.*) NAME (.*) "(.*)" (.*)', '$1 $2 ~$3~ $4'
>> -replace '(.*) GIVN (.*) "(.*)"', '$1 GIVN $2 ~$3~'} >e:\new.ged
>>
>> PS E:\UserData\Wulfraed\My Documents> get-content e:\new.ged
>> 0 @I1@ INDI
>> 1 Gerald ~Bernard~ /Landry/
>
>You left out the NAME tag.

Yeah, but that was easy to fix. To me, as a beginner, this looks very impressive.
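
The dropped NAME tag comes from the first replacement string, '$1 $2 ~$3~ $4', which has no literal NAME in it (the GIVN replacement does include its tag). One way to sidestep that class of mistake is to leave the tag alone and rewrite only the quoted part of the value. A minimal Python sketch of that approach follows, assuming quoted names appear only on NAME and GIVN lines; the function name requote and the mark parameter are made up for illustration, and the tilde default simply mirrors the example above.

```python
import re

# Lines whose tag is NAME or GIVN, e.g. '1 NAME ...' or '2 GIVN ...'.
NAME_LINE = re.compile(r'^\s*\d+\s+(NAME|GIVN)\b')
# A pair of double quotes with no double quote in between.
QUOTED = re.compile(r'"([^"]*)"')


def requote(line, mark="~"):
    """Replace "quoted" text on NAME/GIVN lines with mark...mark; leave other lines alone."""
    if NAME_LINE.match(line):
        return QUOTED.sub(lambda m: mark + m.group(1) + mark, line)
    return line


if __name__ == "__main__":
    sample = [
        '0 @I1@ INDI',
        '1 NAME Gerald "Bernard" /Landry/',
        '2 GIVN Gerald "Bernard"',
        '2 SURN Landry',
    ]
    for text in sample:
        print(requote(text))
```

Passing mark="'" would give the single-quote variant suggested earlier in the thread. Whether the receiving program treats tildes or single quotes as plain text is something to check on a copy of the file first.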