RootsWeb.com Mailing Lists
    1. Re: Search and replace in one field
    2. Denis Beauregard
    3. On Mon, 18 Feb 2013 19:10:22 -0600, Charlie Hoffpauir <invalid@invalid.com> wrote in soc.genealogy.computing:

       >On Mon, 18 Feb 2013 19:08:16 -0500, Dennis Lee Bieber
       ><wlfraed@ix.netcom.com> wrote:
       >
       >>On Mon, 18 Feb 2013 10:44:39 -0500, Denis Beauregard
       >><denis.b-at-francogene.com@fr.invalid> declaimed the following in
       >>soc.genealogy.computing:
       >>
       >>
       >>> Some lines may have an ID as 2nd field, then a tag as 3rd field.
       >>> This is common with level 0.
       >>>
       >>> 0 @I1@ INDI
       >>> 1 NAME Bernard /Landry/
       >>> 2 GIVN Bernard
       >>> 2 SURN Landry
       >>> 1 SEX M
       >>> 1 BIRT
       >>> 2 DATE 9 MAR 1937
       >>> 2 PLAC St-Jacques
       >>> 0 @I2@ INDI
       >>> 1 NAME Bernard /St-Jacques/
       >>> 2 GIVN Bernard
       >>> 2 SURN St-Jacques
       >>> 1 SEX M
       >>> 1 FAMS @F1@
       >>> 1 FAMC @F2@
       >>>
       >>> Example of processing: replace all St-Jacques in PLAC fields
       >>> by "St-Jacques,,Qc," but not in NAME fields.
       >>
       >>PowerShell command (assuming there are none that already have extended
       >>text)[UNTESTED]
       >>
       >>get-content
       >>'path:to\file.ged' | foreach {$_ -replace
       >>'(.) PLAC (.*) (St-Jacques) (.*)', '$1 PLAC $2 $3,,Qc, $4'}
       >> >'path:to\new.ged'
       >
       >I'm impressed with both the Powershell and the AWK solutions to the
       >problem Denis posted. However "that" problem I could also correct
       >with traditional means, like a rather simple Word macro. However, as I
       >explained in a follow-up to Denis's post, the actual problem that
       >started the thread was a bit different. I'm sure either AWK or
       >Powershell can handle it too, but I do think it's a bit more
       >complicated. As someone "aspiring" to become proficient in either AWK
       >or Powershell, I'd really like to see that solution posted.
       >
       >In general, the difference is that the original problem was to replace
       >quote marks with some other character, when quote marks appear in the
       >name field, for an "arbitrary" name. That is, it is assumed that some
       >portion of the given names in the file, (but not all) have had quote
       >marks surrounding one or more of the given names, and these are the
       >characters that must be changed.

       So, this problem consists of:

       replacing "Hugh" "Hugh" by Hugh (or "Hugh") in all lines beginning
       with
       1 NAME

       where "Hugh" "Hugh" would be any pair of identical names, e.g.
       "Denis" "Denis" or "Charles" "Charles".

       From my experience with Brief, a text editor with regular expressions,
       I don't know how to define a duplicated word. Brief did not use
       standard regular expressions, and in it something like "$1" "$1"
       was not accepted...


       Denis
       --
       Denis Beauregard - genealogist emeritus (FQSG)
       Les Français d'Amérique du Nord - www.francogene.com/genealogie--quebec/
       French in North America before 1722 - www.francogene.com/quebec--genealogy/
       On CD-ROM to 1780
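Matching a duplicated word needs a back-reference inside the pattern itself. A minimal sketch, assuming POSIX tools, Unix (LF) line endings and single spaces between GEDCOM fields: in a POSIX basic regular expression, as used by sed, \( \) captures text and \1 refers back to it within the same pattern, so "X" "X" is matched for any X (POSIX extended regular expressions, which awk uses, do not define back-references). The sketch also shows one way to confine the St-Jacques replacement to PLAC lines. The function names dedup_quoted_name and fix_plac are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.
# Assumes Unix (LF) line endings and single spaces between GEDCOM fields.

# Collapse a repeated quoted name such as "Hugh" "Hugh" into "Hugh",
# but only on lines that start with "1 NAME ".
# In a POSIX basic regular expression, \( \) captures and \1 refers back
# to the captured text inside the same pattern.
dedup_quoted_name() {
    sed '/^1 NAME / s/\("[^"]*"\) \1/\1/'
}

# Append ",,Qc," to a place value that ends in St-Jacques, touching only
# PLAC lines, so NAME and SURN lines containing St-Jacques are unchanged.
# The end anchor also skips places that already carry extra text.
fix_plac() {
    awk '$2 == "PLAC" { sub(/ St-Jacques$/, " St-Jacques,,Qc,") } 1'
}

# Usage: dedup_quoted_name < file.ged | fix_plac > new.ged
```

Both functions read standard input and write standard output, so they can be chained in a pipeline as in the usage line.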

    02/18/2013 04:47:46
    1. Re: Search and replace in one field
    2. Ed Morton
    3. On 2/18/2013 10:47 PM, Denis Beauregard wrote:
       <snip>
       > So, this problem consists of:
       >
       > replacing "Hugh" "Hugh" by Hugh (or "Hugh") in all lines beginning
       > with
       > 1 NAME
       >
       > where "Hugh" "Hugh" would be any pair of identical names, e.g.
       > "Denis" "Denis" or "Charles" "Charles".

       One way to do that would be:

       awk '/^1 NAME/{ if ($3 == $4) sub("[[:space:]]+"$4,"") } 1' file

       That may or may not be the best approach, depending on what else that
       line can contain.

       >
       > From my experience with Brief, a text editor with regular expressions,
       > I don't know how to define a duplicated word. Brief did not use
       > standard regular expressions, and in it something like "$1" "$1"
       > was not accepted...

       Awk uses Extended Regular Expressions (EREs) and, by default, splits
       each line into whitespace-separated fields $1, $2, etc.

       Ed.
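The field comparison above checks only $3 against $4, so it catches a repeated first given name. A sketch of the same field-based idea extended to any adjacent pair of quoted names, assuming single spaces between fields (awk rebuilds a changed line with single spaces) and that only quoted names should be collapsed; dedup_name_fields is a hypothetical name. Because awk prints a line only when a rule asks it to, the program ends with the always-true pattern 1, which prints every line, changed or not.

```shell
# Sketch only: dedup_name_fields is a hypothetical name.
# On "1 NAME" lines, drop any quoted field that repeats the field just
# before it, e.g. Gerald "Hugh" "Hugh" /Smith/ becomes Gerald "Hugh" /Smith/.
# The rebuilt line uses single spaces between fields.
dedup_name_fields() {
    awk '
    $1 == "1" && $2 == "NAME" {
        out = $1 " " $2
        for (i = 3; i <= NF; i++) {
            # keep the field unless it is quoted and equal to its predecessor
            if (!($i ~ /^".*"$/ && $i == $(i - 1)))
                out = out " " $i
        }
        $0 = out
    }
    # always-true pattern: print every line, modified or not
    1'
}

# Usage: dedup_name_fields < file.ged > new.ged
```

Restricting the test to quoted fields avoids collapsing a genuinely repeated unquoted given name.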

    02/18/2013 11:46:56
    1. Re: Search and replace in one field
    2. Charlie Hoffpauir
    3. On Mon, 18 Feb 2013 23:47:46 -0500, Denis Beauregard
       <denis.b-at-francogene.com@fr.invalid> wrote:

       >>In general, the difference is that the original problem was to replace
       >>quote marks with some other character, when quote marks appear in the
       >>name field, for an "arbitrary" name. That is, it is assumed that some
       >>portion of the given names in the file, (but not all) have had quote
       >>marks surrounding one or more of the given names, and these are the
       >>characters that must be changed.
       >
       >So, this problem consists of:
       >
       >replacing "Hugh" "Hugh" by Hugh (or "Hugh") in all lines beginning
       >with
       >1 NAME
       >
       >where "Hugh" "Hugh" would be any pair of identical names, e.g.
       >"Denis" "Denis" or "Charles" "Charles".
       >
       >From my experience with Brief, a text editor with regular expressions,
       >I don't know how to define a duplicated word. Brief did not use
       >standard regular expressions, and in it something like "$1" "$1"
       >was not accepted...
       >
       >
       >Denis

       Not exactly. The duplication of names only occurs when the GEDCOM is
       imported into RM and RM then displays the name, as in a report. The
       presumption is that this happens because Hugh surrounded some given
       names with quote marks to indicate the name by which the person was
       commonly known.

       So a portion of the GEDCOM might look like this:

       0 @I1@ INDI
       1 NAME Gerald "Bernard" /Landry/
       2 GIVN Gerald "Bernard"
       2 SURN Landry
       1 SEX M
       1 BIRT
       2 DATE 9 MAR 1937
       2 PLAC St-Jacques
       0 @I2@ INDI
       1 NAME Bernard /St-Jacques/
       2 GIVN Bernard
       2 SURN St-Jacques
       1 SEX M
       1 FAMS @F1@
       1 FAMC @F2@

       And we want to process it so that it looks like this:

       0 @I1@ INDI
       1 NAME Gerald ~Bernard~ /Landry/
       2 GIVN Gerald ~Bernard~
       2 SURN Landry
       1 SEX M
       1 BIRT
       2 DATE 9 MAR 1937
       2 PLAC St-Jacques
       0 @I2@ INDI
       1 NAME Bernard /St-Jacques/
       2 GIVN Bernard
       2 SURN St-Jacques
       1 SEX M
       1 FAMS @F1@
       1 FAMC @F2@

       The tilde I used could be any other character of our choosing, as
       long as it does not also appear elsewhere in the given-name fields of
       the GEDCOM.

       Once the modified GEDCOM was imported into RM, Hugh would then use the
       search/replace function on the given-name field to change the tildes
       back to quote marks.

       So that is this specific situation... but there are probably many more
       situations where a GEDCOM might need modifying to transfer data from
       one genealogy program to another... hence the interest in all the
       solutions proposed. Steve has indicated an interest in developing a
       series of AWK utilities for this purpose. There once was a quite
       useful program for modifying GEDCOMs called Gedcom Explorer (GEDX)
       that used a base code with user-defined macros to accomplish the same
       purpose. Perhaps something along those lines could be developed.
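The before-and-after example above (quote marks on NAME and GIVN lines become tildes, every other line is left alone) fits the same awk pattern-action style. A sketch, assuming the tag is always the second field on those lines and that a tilde does not already occur in the file (running grep -n '~' on the file first and seeing no output confirms that); quotes_to_tildes and the file names in the usage line are hypothetical.

```shell
# Sketch only: quotes_to_tildes is a hypothetical name.
# Replace every double quote with a tilde on NAME and GIVN lines only.
# gsub edits the whole line in place, so the original spacing is kept;
# the trailing 1 prints every line, changed or not.
quotes_to_tildes() {
    awk '$2 == "NAME" || $2 == "GIVN" { gsub(/"/, "~") } 1'
}

# Usage: quotes_to_tildes < hugh.ged > hugh-tilde.ged
```

Quote marks on other lines, such as a NOTE, pass through unchanged, which keeps the later search/replace in RM limited to the given names.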

    02/19/2013 01:13:16
    1. Re: Search and replace in one field
    2. Ian Goddard
    3. Charlie Hoffpauir wrote:
       > On Mon, 18 Feb 2013 23:47:46 -0500, Denis Beauregard
       > <denis.b-at-francogene.com@fr.invalid> wrote:
       <snip>
       > The tilde I used could be any other character of our choosing, as
       > long as it does not also appear elsewhere in the given-name fields of
       > the GEDCOM.
       >
       > Once the modified GEDCOM was imported into RM, Hugh would then use the
       > search/replace function on the given-name field to change the tildes
       > back to quote marks.
       <snip>

       In Unixland, tr is the command:
       http://unixhelp.ed.ac.uk/CGI/man-cgi?tr+1

       --
       Ian
       The Hotmail address is my spam-bin. Real mail address is iang at
       austonley org uk
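tr translates single characters across its whole input and has no notion of lines or GEDCOM tags, so tr '"' '~' changes every double quote in the file, including any in NOTE or source text. That is fine when quotes appear only around given names; otherwise the change has to be restricted to NAME and GIVN lines with a line-aware tool such as awk. A minimal sketch; quotes_to_tildes_everywhere and the file names in the usage line are hypothetical.

```shell
# Sketch only: quotes_to_tildes_everywhere is a hypothetical name.
# tr maps each " in the input to ~, on every line, regardless of tag.
quotes_to_tildes_everywhere() {
    tr '"' '~'
}

# Usage: quotes_to_tildes_everywhere < hugh.ged > hugh-tilde.ged
```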

    02/22/2013 07:27:11