On Mon, 18 Feb 2013 10:09:10 -0600, Charlie Hoffpauir
<invalid@invalid.com> wrote:

>On Mon, 18 Feb 2013 10:44:39 -0500, Denis Beauregard
><denis.b-at-francogene.com@fr.invalid> wrote:
>
>>On Mon, 18 Feb 2013 04:34:31 -0600, Ed Morton <mortonspam@gmail.com>
>>wrote in soc.genealogy.computing:
>>
>>>Thanks for the explanation. If there is anything that awk could help with,
>>>sample input + expected output + a brief description is all we'd need.
>>
>>The GEDCOM format in short :
>>
>>a series of lines, i.e. a plain text file. Typically, a line is :
>>
>>
>>level tag content
>>
>>
>>level is a number, usually 0 to 3 (I have seen 5). For safety, we
>>may presume 0 to 9. Level means the line "belongs" to the previous
>>level, i.e. the previous tag with the previous level. 0 INDI is
>>an individual entry, 1 BIRT is his birth data, 2 DATE the birth date,
>>so the 2 DATE belongs to 1 BIRT which belongs to 0 INDI.
>>
>>tag is a series of letters, some softwares may use an underscore
>>or even a number, but whatever, the tags are usually standard and
>>define the content (i.e. name, date, type of record).
>>
>>content is facultative and depends on tag. This is the part that
>>we may want to change. For example, replace all names in uppercase
>>to lowercase with an uppercase initial, or isolate towns to
>>standardize them or even replace sources by some ID. An example
>>(I kept lines that need usually to be processed) :
>>
>>Some lines may have an ID as 2nd field, then a tag as 3rd field.
>>This is common with level 0.
>>
>>0 @I1@ INDI
>>1 NAME Bernard /Landry/
>>2 GIVN Bernard
>>2 SURN Landry
>>1 SEX M
>>1 BIRT
>>2 DATE 9 MAR 1937
>>2 PLAC St-Jacques
>>0 @I2@ INDI
>>1 NAME Bernard /St-Jacques/
>>2 GIVN Bernard
>>2 SURN St-Jacques
>>1 SEX M
>>1 FAMS @F1@
>>1 FAMC @F2@
>>
>>Example of processing : replace all St-Jacques in PLAC fields
>>by "St-Jacques,,Qc," but not in NAME fields.
>
>Denis gave an "almost" perfect example of the problem posed by the
>OP.
>The original post described a situation where a string enclosed by
>quote marks was to be replaced by the string without quote marks.
>What Denis described is a particular string; what is needed is the
>ability to replace any string in quote marks by the same string
>without quote marks.

A further comment (as a refresher). This all started because we are
trying to change a name, and it could be any name in any record, that
Huge had enclosed in quotation marks to indicate that that given name
was the name by which the person was "called". On importing this
GEDCOM into RM, RM apparently treats this name as a nickname and puts
it in the nickname field, so it appears twice when RM displays the
name. The idea was to change the quote marks to "something else" that
RM would not interpret as a nickname and then, once the data was in
RM, use the search/replace feature to remove that "something else".

Of course there are many other reasons for doing similar things to
GEDCOM files, as Denis has explained, so there is quite a lot of
interest among several readers, me included, in all the general
solutions proposed. I was certainly pleased to see the spreadsheet
proposal posted by Denis earlier in the thread.... simple, easy to
follow... and almost everyone already has the tools needed! But
hopefully there are more elegant solutions of a general nature, and
AWK certainly looks like a good tool.
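
To make the two requests concrete, here is a minimal sketch of the
line-by-line approach, written in Python purely for illustration (an
awk script would follow the same shape). It assumes the
"level [ID] tag content" layout Denis described. The helper names
rewrite and mark_quoted, the "#" marker, and the quoted "Ben" in the
sample data are all made up for this example; none of them come from
the thread or from RM.

```python
import re

# One GEDCOM line: level (0-9), optional @ID@, tag, optional content.
LINE_RE = re.compile(r'^(\d)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$')

def rewrite(lines, tag, transform):
    """Apply transform() to the content of every line carrying `tag`."""
    out = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) == tag and m.group(4) is not None:
            level, xref, _, content = m.groups()
            head = f"{level} {xref} {tag}" if xref else f"{level} {tag}"
            line = f"{head} {transform(content)}"
        out.append(line)  # non-matching lines pass through unchanged
    return out

def mark_quoted(content, marker="#"):
    """Replace "any string" with #any string# (marker is a placeholder)."""
    return re.sub(r'"([^"]*)"', lambda m: marker + m.group(1) + marker, content)

# Hypothetical sample, based on Denis's example plus one quoted name.
sample = [
    '0 @I1@ INDI',
    '1 NAME Bernard "Ben" /Landry/',
    '1 BIRT',
    '2 PLAC St-Jacques',
    '0 @I2@ INDI',
    '1 NAME Bernard /St-Jacques/',
]

# Step 1: swap the quote marks in NAME lines for the marker.
step1 = rewrite(sample, "NAME", mark_quoted)
# Step 2: standardize the town in PLAC lines only, leaving NAME alone.
step2 = rewrite(step1, "PLAC",
                lambda c: "St-Jacques,,Qc," if c == "St-Jacques" else c)

print("\n".join(step2))
```

Restricting each change to one tag is what keeps the surname
St-Jacques untouched while the place St-Jacques is rewritten. Whatever
marker is chosen would need to be a string that does not otherwise
occur in the file, so it can be removed safely after the import.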