"Steve Hayes" <hayesstw@telkomsa.net> wrote in message news:h4p6i8l277lgaqsbl8ad70jkek4injed62@4ax.com... > On Tue, 19 Feb 2013 11:02:25 +0000, Ian Goddard <goddai01@hotmail.co.uk> > wrote: > > I think you are being too dismissive. > > I'm not taking about XML files, but about Gedcom files, and I'm not > talking > about a DOM, but about AWK. > > And I'm not taking about some hypothetical Platonic ideal of the perfect > Gedcom replacement, but about the actual Gedcom files that millions of > genealogists have on their computers now. > > These "you can't get there from here" comments are really not very > helpful. > > I gave advice based on experience Steve. For every AWK file you write, I can contrive a GEDCOM example that will break it. Another post in this thread mentioned ambiguities, and having to assume the availability of special characters that won't occur in names or notes. A scripting library could be useful, but I would never use one based simply on a text-processing language. I mentioned XML, not because you wanted to use it but as an analogy. Scripting manipulation of XML is usually done with XSLT, which for all its faults and obfuscation deals with entities rather than text. Tony Proctor
On Tue, 19 Feb 2013 19:15:50 -0500, Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote: >On Tue, 19 Feb 2013 08:13:16 -0600, Charlie Hoffpauir ><invalid@invalid.com> declaimed the following in >soc.genealogy.computing: > > >> >> Not exactly. The duplication of names only occurs when the GEDCOM is >> imported into RM and then RM displays the name, as in a report. The >> presumption is that this is caused because Hugh surrounded some given >> names with Quote marks, doing so to indicate that these given names >> were what the person was commonly known by. >> >> So a portion of the GEDCOM might look like this: >> >> 0 @I1@ INDI >> 1 NAME Gerald "Bernard" /Landry/ >> 2 GIVN Gerald "Bernard" >> 2 SURN Landry >> 1 SEX M >> 1 BIRT >> 2 DATE 9 MAR 1937 >> 2 PLAC St-Jacques >> 0 @I2@ INDI >> 1 NAME Bernard /St-Jacques/ >> 2 GIVN Bernard >> 2 SURN St-Jacques >> 1 SEX M >> 1 FAMS @F1@ >> 1 FAMC @F2@ >> >> And we want to process it so that it looks like this: >> >> 0 @I1@ INDI >> 1 NAME Gerald ~Bernard~ /Landry/ >> 2 GIVN Gerald ~Bernard~ >> 2 SURN Landry >> 1 SEX M >> 1 BIRT >> 2 DATE 9 MAR 1937 >> 2 PLAC St-Jacques >> 0 @I2@ INDI >> 1 NAME Bernard /St-Jacques/ >> 2 GIVN Bernard >> 2 SURN St-Jacques >> 1 SEX M >> 1 FAMS @F1@ >> 1 FAMC @F2@ >> >> Where the tilde I used might be any other character of our choosisng >> as long as it were not a character that would also appear elsewhere in >> the Given name fields of the GEDCOM > > As before -- the "big" PowerShell command line is wrapping... 
>
>PS E:\UserData\Wulfraed\My Documents> get-content e:\sample.ged
> 0 @I1@ INDI
> 1 NAME Gerald "Bernard" /Landry/
> 2 GIVN Gerald "Bernard"
> 2 SURN Landry
> 1 SEX M
> 1 BIRT
> 2 DATE 9 MAR 1937
> 2 PLAC St-Jacques
> 0 @I2@ INDI
> 1 NAME Bernard /St-Jacques/
> 2 GIVN Bernard
> 2 SURN St-Jacques
> 1 SEX M
> 1 FAMS @F1@
>
>PS E:\UserData\Wulfraed\My Documents> get-content e:\sample.ged |
>foreach {$_ -replace '(.*) NAME (.*) "(.*)" (.*)', '$1 NAME $2 ~$3~ $4'
>-replace '(.*) GIVN (.*) "(.*)"', '$1 GIVN $2 ~$3~'} >e:\new.ged
>
>PS E:\UserData\Wulfraed\My Documents> get-content e:\new.ged
> 0 @I1@ INDI
> 1 NAME Gerald ~Bernard~ /Landry/
> 2 GIVN Gerald ~Bernard~
> 2 SURN Landry
> 1 SEX M
> 1 BIRT
> 2 DATE 9 MAR 1937
> 2 PLAC St-Jacques
> 0 @I2@ INDI
> 1 NAME Bernard /St-Jacques/
> 2 GIVN Bernard
> 2 SURN St-Jacques
> 1 SEX M
> 1 FAMS @F1@
>PS E:\UserData\Wulfraed\My Documents>
>
> Which translates, roughly, to "read the source file, feeding the
>lines, one at a time, to a loop (in which the current line is $_);
>perform two replace operations, the first looking for NAME lines (at any
>level), and the second looking for GIVN lines (also at any level),
>sending the resulting lines to a new file".

THAT is elegant! I will have to learn more about PowerShell. I've downloaded several documents and the MS PowerShell e-book powershell_selflearn.pdf. It all looks formidable, but then I only need (I think) the info on file handling and regular expressions to get started with handling text files like GEDCOMs.

Thanks very much for the example.
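Since the thread is about AWK, a rough awk counterpart to the PowerShell pipeline quoted above might look like the following sketch. It leaves the level number and tag untouched and rewrites only the value part of NAME and GIVN lines. The function name requote_names is made up for illustration, and the sketch assumes one GEDCOM line per input line with the tag in the second field.

```sh
# Hypothetical helper (not from the thread): replace double quotes in the
# value part of NAME and GIVN lines with another character, "~" by default.
# Usage: requote_names [replacement] < in.ged > out.ged
requote_names() {
    awk -v repl="${1:-}" '
        BEGIN { if (repl == "") repl = "~" }
        $2 == "NAME" || $2 == "GIVN" {
            # Split the line just after the tag so that the level
            # number and the tag itself are never modified.
            cut  = index($0, $2) + length($2)
            head = substr($0, 1, cut - 1)
            tail = substr($0, cut)
            # Note: "&" and backslash are special in the replacement text.
            gsub(/"/, repl, tail)
            $0 = head tail
        }
        { print }
    '
}
```

Quote marks on other lines (a NOTE, for example) are passed through unchanged, which is the same restriction the PowerShell version applies by matching only NAME and GIVN.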
On Tue, 19 Feb 2013 11:14:52 +0200, Steve Hayes <hayesstw@telkomsa.net> wrote:

>What I would like to see is the development of a kind of library of AWK
>routines to manipulate GEDCOM files.

That idea sounds promising to me. Obviously the unique needs of individuals would lead to endless expansion of the library. But with some simple instructions on how to construct routines, many of us would be able to write them for our individual needs, and come here only when we OOPSed.

Start with very simple tasks to, as you say, "manipulate GEDCOM files". "If you see quote marks, remove them" might be the first routine - showing my bias.

Hugh
On Mon, 18 Feb 2013 23:47:46 -0500, Denis Beauregard <denis.b-at-francogene.com@fr.invalid> wrote:

>So, this problem consists of:
>
>replacing "Hugh" "Hugh" by Hugh (or "Hugh") in all lines beginning
>with
>1 NAME
>
>where "Hugh" "Hugh" would be any pair of identical names, e.g.
>"Denis" "Denis" or "Charles" "Charles".
>
>From my experience with Brief, a text editor with regular expressions,
>I don't know how to define a duplicated word. Brief was not using the
>standard regular expressions but with it, something like "$1" "$1"
>was not accepted...
>
>
>Denis

Or, even more apt to me at least, is replacing the quotes BEFORE they are incorporated into the second program. IOW, make the adjustment to the GEDCOM after creation but before import, if the program's Search and Replace can't handle the task.

It seems like there would be a simple way to remove the quotes from the GEDCOM. That would solve my problem, but not the problem for someone who uses quotes in another manner.

Hugh
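The duplicated-word problem Denis describes can be handled in awk without backreferences by comparing neighbouring fields. The following is only a sketch under stated assumptions: the function name dedupe_quoted_names is invented here, the duplicated names are single quoted words such as "Hugh" "Hugh", and rebuilding the line from its fields collapses any runs of spaces in the NAME value to one space.

```sh
# Hypothetical helper (not from the thread): on "1 NAME" lines, drop a
# quoted word that immediately repeats the previous word.
# Usage: dedupe_quoted_names < in.ged > out.ged
dedupe_quoted_names() {
    awk '
        $1 == "1" && $2 == "NAME" {
            out  = $1 " " $2
            prev = ""
            for (i = 3; i <= NF; i++) {
                # A quoted word identical to the word before it is skipped.
                if ($i ~ /^".*"$/ && $i == prev)
                    continue
                out  = out " " $i
                prev = $i
            }
            $0 = out
        }
        { print }
    '
}
```

Unquoted repeats are left alone on purpose, since a repeated bare word could be a genuine part of the name. Quoted names containing spaces would need a different approach.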
On Tue, 19 Feb 2013 12:08:11 +0000, "Peter J. Seymour" <Newsgroups@pjsey.demon.co.uk> wrote: >Now, what was the problem in question? Any problem that AWK can handle. -- Steve Hayes from Tshwane, South Africa Blog: http://khanya.wordpress.com E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk
On Tue, 19 Feb 2013 11:14:52 +0200, Steve Hayes <hayesstw@telkomsa.net> wrote: >What I would like to see is the development of a kind of library of AWK >routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files, >and some would like to make changes to them, or extract information from them >in ways that might not be possible with other genealogy programs. For those in soc.genealogy.computing who don't know what AWK is or does, here is a description (apologies to those in comp.lang.awk who already know this) * Gawk-3.1.6 for Windows * ========================== What is it? ----------- Gawk: pattern scanning and processing language Description ----------- Several kinds of tasks occur repeatedly when working with text files. You might want to extract certain lines and discard the rest. Or you may need to make changes wherever certain patterns appear, but leave the rest of the file alone. Writing single-use programs for these tasks in languages such as C, C++ or Pascal is time-consuming and inconvenient. Such jobs are often easier with awk. The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs. The GNU implementation of awk is called gawk; it is fully compatible with the System V Release 4 version of awk. gawk is also compatible with the POSIX specification of the awk language. This means that all properly written awk programs should work with gawk. Thus, we usually dont distinguish between gawk and other awk implementations. Using awk allows you to: - Manage small, personal databases - Generate reports - Validate data - Produce indexes and perform other document preparation tasks - Experiment with algorithms that you can adapt later to other computer languages. In addition, gawk provides facilities that make it easy to: - Extract bits and pieces of data for processing - Sort data - Perform simple network communications. 
The Win32 port has some limitations. In particular, the |& operator and TCP/IP networking are not supported.

Homepage
--------
http://www.gnu.org/software/gawk/gawk.html
Sources: http://ftp.gnu.org/gnu/gawk/gawk-3.1.6.tar.gz

System
------
- Win32, i.e. MS-Windows 95 / 98 / ME / NT / 2000 / XP / 2003 / Vista with msvcrt.dll
- if msvcrt.dll is not in your Windows/System folder, get it from Microsoft <http://support.microsoft.com/default.aspx?scid=kb;en-us;259403> or by installing Internet Explorer 4.0 or higher <http://www.microsoft.com/windows/ie>

Notes
-----
- Bugs and questions on this MS-Windows port: gnuwin32@users.sourceforge.net

Package Availability
--------------------
- in: http://gnuwin32.sourceforge.net

Installation
------------

Sources
-------
- gawk-3.1.6-1-src.zip

Compilation
-----------
The package has been compiled with GNU auto-tools, GNU make, and Mingw (GCC for MS-Windows). Any differences from the original sources are given in gawk-3.1.6-1-GnuWin32.diffs in gawk-3.1.6-1-src.zip. Libraries needed for compilation can be found at the lines starting with 'LIBS = ' in the Makefiles. Usually, these are standard libraries provided with Mingw, or libraries from the package itself; 'gw32c' refers to the libgw32c package, which provides MS-Windows substitutes or stubs for functions normally found in Unix. For more information, see: http://gnuwin32.sourceforge.net/compile.html and http://gnuwin32.sourceforge.net/packages/libgw32c.htm.

--
Steve Hayes from Tshwane, South Africa
Blog: http://khanya.wordpress.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk
On 19.02.2013 10:14, Steve Hayes wrote:
> In an earlier message I suggested using AWK to manipulate a GEDCOM file to
> solve a particular problem.
>
> That point tended to get lost in discussion of other points like using other
> ways to solve the problem, or discussion of flaws in the GEDCOM data model
> itself and proposals for its replacement, which I see as a separate question.
>
> What I would like to see is the development of a kind of library of AWK
> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files,
> and some would like to make changes to them, or extract information from them
> in ways that might not be possible with other genealogy programs.

The awk operations needed for the given GEDCOM syntax are quite simple, so a library does not seem really necessary; just write the awk command. I will give examples below.

But first I'd like to ask for confirmation of what a GEDCOM "field" actually is, in terms of semantics and syntax. Is it _one whole line_ with a specific 4-letter tag in column 2, or is it the _rest_ of a line where the first two columns are some number and a data type tag?

To change the data of a specific line, identify the line by a pattern on the type field (please note that Lew already gave such an example). To perform an action on a "NAME" field, replacing "Service" by "S.":

  awk '$2 == "NAME" { sub(/Service/, "S.") } { print $0 }'

Likewise, negate the condition if you want to select type tags other than NAME. To exclude from processing those lines whose second column seems to have a special meaning (the @...@ identifiers):

  awk '$2 !~ /@.*@/ { sub(/Service/, "S.") } { print $0 }'

You can combine those using logical operations like && (and):

  awk '$2 !~ /@.*@/ && $2 == "NAME" { sub(/Service/, "S.") } { print $0 }'

If you want something like a library (as you said), it's harder to cover all conditions in your own ("invented-here") language frame without making it more complex than the awk language itself.
But I will give an example of how to parameterise awk when using only simple conditions.

  awk -v tag="NAME" -v from="Service" -v to="S." '
    $2 == tag { sub(from,to) }
    { print $0 }
  '

where the string constants may be passed through shell variables

  awk -v tag="${1:?}" -v from="${2:?}" -v to="${3:?}" '
    $2 == tag { sub(from,to) }
    { print $0 }
  '

(One caveat in advance: a /from/ pattern passed as a string, either as "from" or via a variable, is interpreted differently from a pattern constant. But I guess that only becomes an issue once we are sure that this is what you want. Another caveat: for simplicity I substituted over the whole line, so any substitution may affect the tag fields. Say, if you want to substitute /ME/ in the data columns, it would change the "ME" in the tag "NAME". If the rest of the line (after the number and tag) is the actual data, it may be advantageous to operate on sub-strings of the whole line. I will expand on that on demand.)

Janis

>
> Here is a GEDCOM file.
>
> I tried to choose a short one to use as an example, which shows the structure
> of the file.
> > 0 HEAD > 1 SOUR ANSTFILE > 2 VERS 4.19 > 2 NAME Ancestral File (R) > 2 CORP The Church of Jesus Christ of Latter-day Saints > 3 ADDR 50 East North Temple Street > 4 CONT Salt Lake City, Utah 84150 > 2 DATA Ancestral File > 3 DATE 5 January 1998 > 3 COPR Copyright (c) 1987, June 1998 > 1 DEST PAF > 1 DATE 20 APR 2002 > 2 TIME 2:58:56 > 1 FILE GEDCOM4.ged > 1 GEDC > 2 VERS 5.5 > 2 FORM LINEAGE-LINKED > 1 CHAR ANSEL > 1 SUBM @SUB01@ > 1 SUBN @N01@ > 0 @SUB01@ SUBM > 1 NAME Created by FamilySearch (TM) Internet Genealogy Service > 1 ADDR 50 East North Temple Street > 2 CONT Salt Lake City, Utah 84150 > 0 @S01@ SOUR > 1 AUTH The Church of Jesus Christ of Latter-day Saints > 1 TITL Ancestral File (R) > 1 PUBL Copyright (c) 1987, June 1998, data as of 5 January 1998 > 1 REPO @R01@ > 0 @R01@ REPO > 1 NAME Family History Library > 1 ADDR 35 N West Temple Street > 2 CONT Salt Lake City, Utah 84150 USA > 0 @N01@ SUBN > 1 DESC 2 > 1 ORDI N > 0 @I3GLR-Z3@ INDI > 1 NAME Thomas William /BALDOCK/ > 2 GIVN Thomas William > 2 SURN BALDOCK > 1 AFN 3GLR-Z3 > 1 SEX M > 1 SOUR @S01@ > 1 BIRT > 2 DATE 1 Jul 1850 > 2 PLAC Geelong, Vic, Astl > 1 FAMS @F1794078@ > 1 FAMC @F524078@ > 0 @I3GLR-4R@ INDI > 1 NAME Thomas /BALDOCK/ > 2 GIVN Thomas > 2 SURN BALDOCK > 1 AFN 3GLR-4R > 1 SEX M > 1 SOUR @S01@ > 1 FAMS @F524078@ > 0 @I3GLR-5X@ INDI > 1 NAME Anne /CHAMBERS/ > 2 GIVN Anne > 2 SURN CHAMBERS > 1 AFN 3GLR-5X > 1 SEX F > 1 SOUR @S01@ > 1 FAMS @F524078@ > 0 @I98BW-JC@ INDI > 1 NAME Emily Jane /THORNTON/ > 2 GIVN Emily Jane > 2 SURN THORNTON > 1 AFN 98BW-JC > 1 SEX F > 1 SOUR @S01@ > 1 BIRT > 2 DATE 1854 > 2 PLAC Geelong, Victoria, Australia > 1 DEAT > 2 DATE 9 Dec 1890 > 2 PLAC Geelong, Victoria, Australia > 1 FAMS @F1794078@ > 1 FAMC @F1794093@ > 0 @I98BX-N6@ INDI > 1 NAME Charles Edwin /THORNTON/ > 2 GIVN Charles Edwin > 2 SURN THORNTON > 1 AFN 98BX-N6 > 1 SEX M > 1 SOUR @S01@ > 1 FAMS @F1794093@ > 0 @I98BX-PC@ INDI > 1 NAME Emily /GROWDON/ > 2 GIVN Emily > 2 SURN GROWDON > 1 AFN 
98BX-PC > 1 SEX F > 1 SOUR @S01@ > 1 FAMS @F1794093@ > 0 @I98CJ-BW@ INDI > 1 NAME Percy William Growdon /BALDOCK/ > 2 GIVN Percy William Growdon > 2 SURN BALDOCK > 1 AFN 98CJ-BW > 1 SEX M > 1 SOUR @S01@ > 1 BIRT > 2 DATE ABT 1876 > 2 PLAC Geelong, Victoria, Australia > 1 FAMC @F1794078@ > 0 @I98BW-LP@ INDI > 1 NAME Percy William Growdon /BALDOCK/ > 2 GIVN Percy William Growdon > 2 SURN BALDOCK > 1 AFN 98BW-LP > 1 SEX M > 1 SOUR @S01@ > 1 BIRT > 2 DATE 1879 > 2 PLAC Geelong, Victoria, Australia > 1 DEAT > 2 DATE 6 Sep 1886 > 2 PLAC Geelong, Victoria, Australia > 1 FAMC @F1794078@ > 0 @I98BW-KJ@ INDI > 1 NAME Arthur Jabez /BALDOCK/ > 2 GIVN Arthur Jabez > 2 SURN BALDOCK > 1 AFN 98BW-KJ > 1 SEX M > 1 SOUR @S01@ > 1 BIRT > 2 DATE 1878 > 2 PLAC Geelong, Victoria, Australia > 1 FAMC @F1794078@ > 0 @I98BW-P7@ INDI > 1 NAME Gladys Claudine /BALDOCK/ > 2 GIVN Gladys Claudine > 2 SURN BALDOCK > 1 AFN 98BW-P7 > 1 SEX F > 1 SOUR @S01@ > 1 BIRT > 2 DATE 1887 > 2 PLAC Geelong, Victoria, Australia > 1 DEAT > 2 DATE 1907 > 2 PLAC > 1 FAMC @F1794078@ > 0 @I98BW-N2@ INDI > 1 NAME Clive Alfred /BALDOCK/ > 2 GIVN Clive Alfred > 2 SURN BALDOCK > 1 AFN 98BW-N2 > 1 SEX M > 1 SOUR @S01@ > 1 BIRT > 2 DATE 1884 > 2 PLAC Geelong, Victoria, Australia > 1 DEAT > 2 DATE 25 Oct 1951 > 2 PLAC > 1 FAMC @F1794078@ > 0 @I98BW-MV@ INDI > 1 NAME Lawrence /BALDOCK/ > 2 GIVN Lawrence > 2 SURN BALDOCK > 1 AFN 98BW-MV > 1 SEX M > 1 SOUR @S01@ > 1 BIRT > 2 DATE 1881 > 2 PLAC Geelong, Victoria, Australia > 1 FAMC @F1794078@ > 0 @F1794078@ FAM > 1 HUSB @I3GLR-Z3@ > 1 WIFE @I98BW-JC@ > 1 CHIL @I98CJ-BW@ > 1 CHIL @I98BW-LP@ > 1 CHIL @I98BW-KJ@ > 1 CHIL @I98BW-P7@ > 1 CHIL @I98BW-N2@ > 1 CHIL @I98BW-MV@ > 1 MARR > 2 DATE 20 Apr 1876 > 2 PLAC Geelong, Victoria, Australia > 0 @F524078@ FAM > 1 HUSB @I3GLR-4R@ > 1 WIFE @I3GLR-5X@ > 1 CHIL @I3GLR-Z3@ > 0 @F1794093@ FAM > 1 HUSB @I98BX-N6@ > 1 WIFE @I98BX-PC@ > 1 CHIL @I98BW-JC@ > 0 TRLR >
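Janis's two caveats (a string pattern is still treated as a regular expression, and a whole-line sub() can hit the tag) can both be sidestepped by working on the part of the line after the tag and matching literally with index(). This is a minimal sketch, not Janis's own code; the function name gedcom_sub is hypothetical, and like sub() it replaces only the first occurrence.

```sh
# Hypothetical helper (not from the thread): literal, first-occurrence
# replacement restricted to the data part of lines carrying a given tag.
# Usage: gedcom_sub TAG FROM TO < in.ged > out.ged
gedcom_sub() {
    awk -v tag="${1:?}" -v from="${2:?}" -v to="${3:?}" '
        $2 == tag {
            # Everything up to and including the tag is kept as is.
            start = index($0, tag) + length(tag)
            head  = substr($0, 1, start - 1)
            data  = substr($0, start)
            # index() does a plain string search, so "." or "(" in
            # "from" carry no regular-expression meaning here.
            pos = index(data, from)
            if (pos > 0)
                data = substr(data, 1, pos - 1) to substr(data, pos + length(from))
            $0 = head data
        }
        { print }
    '
}
```

For example, `gedcom_sub NAME ME X` leaves the "ME" inside the tag NAME alone and changes only a "ME" found in the name itself. One remaining wrinkle: awk processes backslash escapes in values given with -v, so strings containing backslashes would need extra care.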
On Tue, 19 Feb 2013 11:02:25 +0000, Ian Goddard <goddai01@hotmail.co.uk> wrote:

>Tony Proctor wrote:
>> "Steve Hayes" <hayesstw@telkomsa.net> wrote in message
>> news:31g6i85ir4tlbhov78e19vf81e19c8toab@4ax.com...
>>> In an earlier message I suggested using AWK to manipulate a GEDCOM file to
>>> solve a particular problem.
>>>
>>> That point tended to get lost in discussion of other points like using other
>>> ways to solve the problem, or discussion of flaws in the GEDCOM data model
>>> itself and proposals for its replacement, which I see as a separate question.
>>>
>>> What I would like to see is the development of a kind of library of AWK
>>> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files,
>>> and some would like to make changes to them, or extract information from them
>>> in ways that might not be possible with other genealogy programs.
>>
>> If you're OK using AWK, Steve, then I would recommend a more reliable
>> approach. Textual manipulation to solve a problem is usually
>> less-than-satisfactory due to ambiguities looking at plain text, and the
>> fact that a simple text-processing language cannot easily understand the
>> grammar of something like GEDCOM.
>>
>> In a similar vein, no one would (or should) try to manipulate an XML file
>> directly from its textual representation. They would first load it into a
>> DOM (Document Object Model) before processing the associated objects.
>>
>> I would recommend loading the GEDCOM file into an object-representation, in
>> memory, and manipulating its objects instead. I have seen free tools for doing
>> this although I do not have a reference to hand.
>
>This idea also occurred to me. I dismissed it fairly quickly.

I think you are being too dismissive.

I'm not talking about XML files, but about Gedcom files, and I'm not talking about a DOM, but about AWK.
And I'm not talking about some hypothetical Platonic ideal of the perfect Gedcom replacement, but about the actual Gedcom files that millions of genealogists have on their computers now.

These "you can't get there from here" comments are really not very helpful.

>
>First it just becomes Yet Another GEDCOM Based Application. Like all
>the other YAGBAs it has the problems of dealing with the ways GEDCOM has
>been twisted by all the other YAGBAs & their users. Maybe it would need
>some sort of expert system to understand how to parse incoming GEDCOM
>files from A & how to write them for B.
>
>Then it would need all manner of editing functions, otherwise someone
>would be complaining that although it does what someone else wanted it
>doesn't do what they want, and, of course, an easy-to-use user interface
>to all this. Then why can't it just provide for new data entry as well?
>And why can't it display a nice tree? And do reports? And and and...
>
>If its import and export facilities were good enough to be the universal
>GEDCOM handler it would need to be, scope creep would be inevitable. By
>v2.0 it would probably be adding its own GEDCOM semantics & become part of
>the problem.

--
Steve Hayes from Tshwane, South Africa
Blog: http://khanya.wordpress.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk
On 2013-02-19 11:02, Ian Goddard wrote:

> If its import and export facilities were good enough to be the universal
> GEDCOM handler it would need to be, scope creep would be inevitable. By
> v2.0 it would probably be adding its own GEDCOM semantics & become part of
> the problem.

It is simpler to treat the problem in two parts: input and output. It is feasible (if long-winded) to provide a good approximation to a universal GEDCOM reader. However, providing for all variations of GEDCOM output is a much more severe problem and is the part best not attempted. So, for instance, if you want to provide a GEDCOM output option, try just v5.5.

In practice I find that once a particular problem has been analysed, possibly by loading the file into a suitable program, it is often possible to go back to the original GEDCOM and do some editing, although possibly manually rather than using a script. The human involvement provides that little bit of extra intelligence that can make the difference. If it is a large file, you just have to go plodding through. It might take a bit of discipline but it is doable.

Now, what was the problem in question?
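To make the "input" half concrete in the language the thread is about, here is a very small sketch of the reading step only: it splits each GEDCOM line into level, optional @...@ identifier, tag and value, and prints them tab-separated. It is nowhere near a universal reader (no CONC/CONT joining, no character-set handling, no validation), and the function name gedcom_fields is made up for this example.

```sh
# Hypothetical helper (not from the thread): split GEDCOM lines into
# level, cross-reference id (may be empty), tag and value, tab-separated.
# Usage: gedcom_fields < in.ged
gedcom_fields() {
    awk '
        { sub(/\r$/, "") }        # drop a trailing CR from CRLF files
        NF == 0 { next }          # skip blank lines
        {
            level = $1
            # Record-opening lines look like: 0 @I1@ INDI
            if ($2 ~ /^@.*@$/) { xref = $2; tag = $3; skip = 3 }
            else               { xref = "";  tag = $2; skip = 2 }
            # The value is whatever follows the tag on the line.
            value = $0
            sub(/^[ \t]+/, "", value)
            for (i = 1; i <= skip; i++)
                sub(/^[^ \t]+[ \t]*/, "", value)
            printf "%s\t%s\t%s\t%s\n", level, xref, tag, value
        }
    '
}
```

Once the lines are in this shape, later steps can test the tag and value columns separately instead of pattern-matching the raw text, which avoids some of the ambiguities raised elsewhere in the thread.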
Steve Hayes wrote: > On Tue, 19 Feb 2013 11:02:25 +0000, Ian Goddard <goddai01@hotmail.co.uk> > wrote: > >> Tony Proctor wrote: >>> "Steve Hayes" <hayesstw@telkomsa.net> wrote in message >>> news:31g6i85ir4tlbhov78e19vf81e19c8toab@4ax.com... >>>> In an earlier message I suggested using AWK to manipulate a GEDCOM file to >>>> solve a particular problem. >>>> >>>> That point tended to get lost in discussion of other points like using >>>> other >>>> ways to solve the problem, or discussion of flaws in the GEDCOM data model >>>> itself and proposals for its replacement, which I see as a separate >>>> question. >>>> >>>> What I would like to see is the development of a kind of library of AWK >>>> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM >>>> files, >>>> and some would like to make changes to them, or extract information from >>>> them >>>> in ways that might not be possible with other genealogy programs. >>> >>> If you're OK using AWK Steve then I would recommend a more reliable >>> approach. Textual manipulation to solve a problem is usually >>> less-than-satisfactory due to ambiguities looking at plain text, and the >>> fact that a simple text-processing language cannot easily understand the >>> grammar of something like GEDCOM. >>> >>> In a similar vein, no one would (or should) try and manipulate an XML file >>> directly from its textual representation. They would first load it into a >>> DOM (Document Object Model) before processing the associated objects. >>> >>> I would recommend loading the GEDCOM file into an object-representation, in >>> memory, and manipulate its objects instead. I have seen free tools for doing >>> this although I do not have a reference to hand. >> >> This idea also occurred to me. I dismissed it fairly quickly. > > I think you are being too dismissive. > > I'm not taking about XML files, but about Gedcom files, and I'm not talking > about a DOM, but about AWK. 
> > And I'm not taking about some hypothetical Platonic ideal of the perfect > Gedcom replacement, but about the actual Gedcom files that millions of > genealogists have on their computers now. > > These "you can't get there from here" comments are really not very helpful. > Steve, Note that I was replying to Tony, not you. -- Ian The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
In an earlier message I suggested using AWK to manipulate a GEDCOM file to solve a particular problem. That point tended to get lost in discussion of other points like using other ways to solve the problem, or discussion of flaws in the GEDCOM data model itself and proposals for its replacement, which I see as a separate question. What I would like to see is the development of a kind of library of AWK routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files, and some would like to make changes to them, or extract information from them in ways that might not be possible with other genealogy programs. Here is a GEDCOM file. I tried to choose a short one to use as an example, which shows the structure of the file. 0 HEAD 1 SOUR ANSTFILE 2 VERS 4.19 2 NAME Ancestral File (R) 2 CORP The Church of Jesus Christ of Latter-day Saints 3 ADDR 50 East North Temple Street 4 CONT Salt Lake City, Utah 84150 2 DATA Ancestral File 3 DATE 5 January 1998 3 COPR Copyright (c) 1987, June 1998 1 DEST PAF 1 DATE 20 APR 2002 2 TIME 2:58:56 1 FILE GEDCOM4.ged 1 GEDC 2 VERS 5.5 2 FORM LINEAGE-LINKED 1 CHAR ANSEL 1 SUBM @SUB01@ 1 SUBN @N01@ 0 @SUB01@ SUBM 1 NAME Created by FamilySearch (TM) Internet Genealogy Service 1 ADDR 50 East North Temple Street 2 CONT Salt Lake City, Utah 84150 0 @S01@ SOUR 1 AUTH The Church of Jesus Christ of Latter-day Saints 1 TITL Ancestral File (R) 1 PUBL Copyright (c) 1987, June 1998, data as of 5 January 1998 1 REPO @R01@ 0 @R01@ REPO 1 NAME Family History Library 1 ADDR 35 N West Temple Street 2 CONT Salt Lake City, Utah 84150 USA 0 @N01@ SUBN 1 DESC 2 1 ORDI N 0 @I3GLR-Z3@ INDI 1 NAME Thomas William /BALDOCK/ 2 GIVN Thomas William 2 SURN BALDOCK 1 AFN 3GLR-Z3 1 SEX M 1 SOUR @S01@ 1 BIRT 2 DATE 1 Jul 1850 2 PLAC Geelong, Vic, Astl 1 FAMS @F1794078@ 1 FAMC @F524078@ 0 @I3GLR-4R@ INDI 1 NAME Thomas /BALDOCK/ 2 GIVN Thomas 2 SURN BALDOCK 1 AFN 3GLR-4R 1 SEX M 1 SOUR @S01@ 1 FAMS @F524078@ 0 @I3GLR-5X@ INDI 1 NAME Anne /CHAMBERS/ 2 GIVN Anne 2 
SURN CHAMBERS 1 AFN 3GLR-5X 1 SEX F 1 SOUR @S01@ 1 FAMS @F524078@ 0 @I98BW-JC@ INDI 1 NAME Emily Jane /THORNTON/ 2 GIVN Emily Jane 2 SURN THORNTON 1 AFN 98BW-JC 1 SEX F 1 SOUR @S01@ 1 BIRT 2 DATE 1854 2 PLAC Geelong, Victoria, Australia 1 DEAT 2 DATE 9 Dec 1890 2 PLAC Geelong, Victoria, Australia 1 FAMS @F1794078@ 1 FAMC @F1794093@ 0 @I98BX-N6@ INDI 1 NAME Charles Edwin /THORNTON/ 2 GIVN Charles Edwin 2 SURN THORNTON 1 AFN 98BX-N6 1 SEX M 1 SOUR @S01@ 1 FAMS @F1794093@ 0 @I98BX-PC@ INDI 1 NAME Emily /GROWDON/ 2 GIVN Emily 2 SURN GROWDON 1 AFN 98BX-PC 1 SEX F 1 SOUR @S01@ 1 FAMS @F1794093@ 0 @I98CJ-BW@ INDI 1 NAME Percy William Growdon /BALDOCK/ 2 GIVN Percy William Growdon 2 SURN BALDOCK 1 AFN 98CJ-BW 1 SEX M 1 SOUR @S01@ 1 BIRT 2 DATE ABT 1876 2 PLAC Geelong, Victoria, Australia 1 FAMC @F1794078@ 0 @I98BW-LP@ INDI 1 NAME Percy William Growdon /BALDOCK/ 2 GIVN Percy William Growdon 2 SURN BALDOCK 1 AFN 98BW-LP 1 SEX M 1 SOUR @S01@ 1 BIRT 2 DATE 1879 2 PLAC Geelong, Victoria, Australia 1 DEAT 2 DATE 6 Sep 1886 2 PLAC Geelong, Victoria, Australia 1 FAMC @F1794078@ 0 @I98BW-KJ@ INDI 1 NAME Arthur Jabez /BALDOCK/ 2 GIVN Arthur Jabez 2 SURN BALDOCK 1 AFN 98BW-KJ 1 SEX M 1 SOUR @S01@ 1 BIRT 2 DATE 1878 2 PLAC Geelong, Victoria, Australia 1 FAMC @F1794078@ 0 @I98BW-P7@ INDI 1 NAME Gladys Claudine /BALDOCK/ 2 GIVN Gladys Claudine 2 SURN BALDOCK 1 AFN 98BW-P7 1 SEX F 1 SOUR @S01@ 1 BIRT 2 DATE 1887 2 PLAC Geelong, Victoria, Australia 1 DEAT 2 DATE 1907 2 PLAC 1 FAMC @F1794078@ 0 @I98BW-N2@ INDI 1 NAME Clive Alfred /BALDOCK/ 2 GIVN Clive Alfred 2 SURN BALDOCK 1 AFN 98BW-N2 1 SEX M 1 SOUR @S01@ 1 BIRT 2 DATE 1884 2 PLAC Geelong, Victoria, Australia 1 DEAT 2 DATE 25 Oct 1951 2 PLAC 1 FAMC @F1794078@ 0 @I98BW-MV@ INDI 1 NAME Lawrence /BALDOCK/ 2 GIVN Lawrence 2 SURN BALDOCK 1 AFN 98BW-MV 1 SEX M 1 SOUR @S01@ 1 BIRT 2 DATE 1881 2 PLAC Geelong, Victoria, Australia 1 FAMC @F1794078@ 0 @F1794078@ FAM 1 HUSB @I3GLR-Z3@ 1 WIFE @I98BW-JC@ 1 CHIL @I98CJ-BW@ 1 CHIL @I98BW-LP@ 1 CHIL 
@I98BW-KJ@ 1 CHIL @I98BW-P7@ 1 CHIL @I98BW-N2@ 1 CHIL @I98BW-MV@ 1 MARR 2 DATE 20 Apr 1876 2 PLAC Geelong, Victoria, Australia 0 @F524078@ FAM 1 HUSB @I3GLR-4R@ 1 WIFE @I3GLR-5X@ 1 CHIL @I3GLR-Z3@ 0 @F1794093@ FAM 1 HUSB @I98BX-N6@ 1 WIFE @I98BX-PC@ 1 CHIL @I98BW-JC@ 0 TRLR -- Steve Hayes from Tshwane, South Africa Blog: http://khanya.wordpress.com E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk
Steve Hayes wrote: > In an earlier message I suggested using AWK to manipulate a GEDCOM file to > solve a particular problem. > > That point tended to get lost in discussion of other points like using other > ways to solve the problem, or discussion of flaws in the GEDCOM data model > itself and proposals for its replacement, which I see as a separate question. > > What I would like to see is the development of a kind of library of AWK > routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files, > and some would like to make changes to them, or extract information from them > in ways that might not be possible with other genealogy programs. As matters stand, as languages go AWK is somewhat, ummm, ante-diluvian. And, in fact, what you want to do can be done in just about any scripting language that's universally available - say PERL, python, TK/TCL or even, heaven help us, PHP. I know there is a package of genealogy tools available for PERL and there no doubt is something similar for python (which is what Gramps is written in). And both PERL and python have the advantage of being available on just about every platform, including (shudder) Gates Universal Computer Virus (better known as windows). So, rather than spend your time futzing around with AWK, which I like, I'd suggest you look at the more modern languages as a way of doing what you want. Bob Melson -- Robert G. Melson | Rio Grande Microsolutions | El Paso, Texas ----- Any man who thinks he can be happy and prosperous by letting the Government take care of him, better take a closer look at the American Indian. -- Henry Ford
Tony Proctor wrote: > "Steve Hayes" <hayesstw@telkomsa.net> wrote in message > news:31g6i85ir4tlbhov78e19vf81e19c8toab@4ax.com... >> In an earlier message I suggested using AWK to manipulate a GEDCOM file to >> solve a particular problem. >> >> That point tended to get lost in discussion of other points like using >> other >> ways to solve the problem, or discussion of flaws in the GEDCOM data model >> itself and proposals for its replacement, which I see as a separate >> question. >> >> What I would like to see is the development of a kind of library of AWK >> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM >> files, >> and some would like to make changes to them, or extract information from >> them >> in ways that might not be possible with other genealogy programs. >> >> Here is a GEDCOM file. >> >> I tried to choose a short one to use as an example, which shows the >> structure >> of the file. >> > > <snip> > >> >> -- >> Steve Hayes from Tshwane, South Africa >> Blog: http://khanya.wordpress.com >> E-mail - see web page, or parse: shayes at dunelm full stop org full stop >> uk > > If you're OK using AWK Steve then I would recommend a more reliable > approach. Textual manipulation to solve a problem is usually > less-than-satisfactory due to ambiguities looking at plain text, and the > fact that a simple text-processing language cannot easily understand the > grammar of something like GEDCOM. > > In a similar vein, no one would (or should) try and manipulate an XML file > directly from its textual representation. They would first load it into a > DOM (Document Object Model) before processing the associated objects. > > I would recommend loading the GEDCOM file into an object-representation, in > memory, and manipulate its objects instead. I have seen free tools for doing > this although I do not have a reference to hand. This idea also occurred to me. I dismissed it fairly quickly. First it just becomes Yet Another GEDCOM Based Application. 
Like all the other YAGBAs it has the problems of dealing with the ways GEDCOM has been twisted by all the other YAGBAs & their users. Maybe it would need some sort of expert system to understand how to parse incoming GEDCOM files from A & how to write them for B.

Then it would need all manner of editing functions, otherwise someone would be complaining that although it does what someone else wanted it doesn't do what they want, and, of course, an easy-to-use user interface to all this. Then why can't it just provide for new data entry as well? And why can't it display a nice tree? And do reports? And and and...

If its import and export facilities were good enough to be the universal GEDCOM handler it would need to be, scope creep would be inevitable. By v2.0 it would probably be adding its own GEDCOM semantics & become part of the problem.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk
"Steve Hayes" <hayesstw@telkomsa.net> wrote in message news:31g6i85ir4tlbhov78e19vf81e19c8toab@4ax.com... > In an earlier message I suggested using AWK to manipulate a GEDCOM file to > solve a particular problem. > > That point tended to get lost in discussion of other points like using > other > ways to solve the problem, or discussion of flaws in the GEDCOM data model > itself and proposals for its replacement, which I see as a separate > question. > > What I would like to see is the development of a kind of library of AWK > routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM > files, > and some would like to make changes to them, or extract information from > them > in ways that might not be possible with other genealogy programs. > > Here is a GEDCOM file. > > I tried to choose a short one to use as an example, which shows the > structure > of the file. > <snip> > > -- > Steve Hayes from Tshwane, South Africa > Blog: http://khanya.wordpress.com > E-mail - see web page, or parse: shayes at dunelm full stop org full stop > uk If you're OK using AWK Steve then I would recommend a more reliable approach. Textual manipulation to solve a problem is usually less-than-satisfactory due to ambiguities looking at plain text, and the fact that a simple text-processing language cannot easily understand the grammar of something like GEDCOM. In a similar vein, no one would (or should) try and manipulate an XML file directly from its textual representation. They would first load it into a DOM (Document Object Model) before processing the associated objects. I would recommend loading the GEDCOM file into an object-representation, in memory, and manipulate its objects instead. I have seen free tools for doing this although I do not have a reference to hand. Tony Proctor
On Tue, 19 Feb 2013 08:57:27 -0600, Ed Morton <mortonspam@gmail.com> wrote:

>On 2/19/2013 3:14 AM, Steve Hayes wrote:
>> In an earlier message I suggested using AWK to manipulate a GEDCOM file to
>> solve a particular problem.
>>
>> That point tended to get lost in discussion of other points like using other
>> ways to solve the problem, or discussion of flaws in the GEDCOM data model
>> itself and proposals for its replacement, which I see as a separate question.
>>
>> What I would like to see is the development of a kind of library of AWK
>> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files,
>> and some would like to make changes to them, or extract information from them
>> in ways that might not be possible with other genealogy programs.
>>
>> Here is a GEDCOM file.
>>
>> I tried to choose a short one to use as an example, which shows the structure
>> of the file.
>
>OK, so that's presumably a good, representative input file for an awk script to
>run against. Now - what might an output file look like and (briefly!) why?
>
> Ed.

Posted in the previous thread, one example is:

A portion of the GEDCOM might look like this:

0 @I1@ INDI
1 NAME Gerald "Bernard" /Landry/
2 GIVN Gerald "Bernard"
2 SURN Landry
1 SEX M
1 BIRT
2 DATE 9 MAR 1937
2 PLAC St-Jacques
0 @I2@ INDI
1 NAME Bernard /St-Jacques/
2 GIVN Bernard
2 SURN St-Jacques
1 SEX M
1 FAMS @F1@
1 FAMC @F2@

where some (but not all) of the given names found in the GEDCOM have quote marks around them.

And we want to process it so that it looks like this:

0 @I1@ INDI
1 NAME Gerald ~Bernard~ /Landry/
2 GIVN Gerald ~Bernard~
2 SURN Landry
1 SEX M
1 BIRT
2 DATE 9 MAR 1937
2 PLAC St-Jacques
0 @I2@ INDI
1 NAME Bernard /St-Jacques/
2 GIVN Bernard
2 SURN St-Jacques
1 SEX M
1 FAMS @F1@
1 FAMC @F2@

The tilde I used might be any other character of our choosing, as long as it is not a character that would also appear elsewhere in the Given name fields of the GEDCOM.

The only difficulty in processing (vs. a simple search and replace in a text editor) is that it is highly likely that there are other quote marks in other fields in the GEDCOM, as GEDCOMs typically contain many paragraphs of plain text. So the need is to direct the processing to occur only on a particular field or fields.

It wouldn't be necessary to do all the processing in one "pass", i.e., you could do the work on the GIVN field, and then on the NAME field.

The why takes a bit to explain, but it has been discussed at length in the previous thread.
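The field-restricted substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anyone's posted solution: the function name `swap_quotes` and its parameters are invented here, and it assumes every line has the plain `LEVEL TAG VALUE` shape separated by single spaces.

```python
def swap_quotes(lines, tags=("NAME", "GIVN"), old='"', new="~"):
    """Yield GEDCOM lines with `old` replaced by `new`, but only in the
    value part of lines whose tag is listed in `tags`."""
    for line in lines:
        parts = line.split(" ", 2)  # level, tag, rest of line
        if len(parts) == 3 and parts[1] in tags:
            parts[2] = parts[2].replace(old, new)
            yield " ".join(parts)
        else:
            # Notes, places and everything else pass through untouched.
            yield line
```

Typical use would be something like `out.writelines(swap_quotes(inp))` with two open files. One caveat: the match is on the tag alone, so a `NAME` line that belongs to a source or submitter record is treated the same as one inside an INDI record. A stricter version would have to track which level-0 record it is in.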
On 2/19/2013 3:14 AM, Steve Hayes wrote:
> In an earlier message I suggested using AWK to manipulate a GEDCOM file to
> solve a particular problem.
>
> That point tended to get lost in discussion of other points like using other
> ways to solve the problem, or discussion of flaws in the GEDCOM data model
> itself and proposals for its replacement, which I see as a separate question.
>
> What I would like to see is the development of a kind of library of AWK
> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files,
> and some would like to make changes to them, or extract information from them
> in ways that might not be possible with other genealogy programs.
>
> Here is a GEDCOM file.
>
> I tried to choose a short one to use as an example, which shows the structure
> of the file.

OK, so that's presumably a good, representative input file for an awk script to
run against. Now - what might an output file look like and (briefly!) why?

Ed.
On Mon, 18 Feb 2013 23:47:46 -0500, Denis Beauregard
<denis.b-at-francogene.com@fr.invalid> wrote:

>>In general, the difference is that the original problem was to replace
>>quote marks with some other character, when quote marks appear in the
>>name field, for an "arbitrary" name. That is, it is assumed that some
>>portion of the given names in the file, (but not all) have had quote
>>marks surrounding one or more of the given names, and these are the
>>characters that must be changed.
>
>So, this problem consists in :
>
>replacing "Hugh" "Hugh" by Hugh (or "Hugh") in all lines beginning
>with
>1 NAME
>
>where "Hugh" "Hugh" would be any pair of similar names, i.e.
>"Denis" "Denis" or "Charles" "Charles".
>
>From my experience with Brief, a text editor with regular expressions,
>I don't know how to define a duplicated word. Brief was not using the
>standard regular expressions but with it, something like "$1" "$1"
>was not accepted...
>
>
>Denis

Not exactly. The duplication of names only occurs when the GEDCOM is imported into RM and RM then displays the name, as in a report. The presumption is that this happens because Hugh surrounded some given names with quote marks, to indicate that these given names were what the person was commonly known by.

So a portion of the GEDCOM might look like this:

0 @I1@ INDI
1 NAME Gerald "Bernard" /Landry/
2 GIVN Gerald "Bernard"
2 SURN Landry
1 SEX M
1 BIRT
2 DATE 9 MAR 1937
2 PLAC St-Jacques
0 @I2@ INDI
1 NAME Bernard /St-Jacques/
2 GIVN Bernard
2 SURN St-Jacques
1 SEX M
1 FAMS @F1@
1 FAMC @F2@

And we want to process it so that it looks like this:

0 @I1@ INDI
1 NAME Gerald ~Bernard~ /Landry/
2 GIVN Gerald ~Bernard~
2 SURN Landry
1 SEX M
1 BIRT
2 DATE 9 MAR 1937
2 PLAC St-Jacques
0 @I2@ INDI
1 NAME Bernard /St-Jacques/
2 GIVN Bernard
2 SURN St-Jacques
1 SEX M
1 FAMS @F1@
1 FAMC @F2@

The tilde I used might be any other character of our choosing, as long as it is not a character that would also appear elsewhere in the Given name fields of the GEDCOM.

Once the modified GEDCOM was imported into RM, Hugh would then use the search/replace function on the Given name field to change the tilde back to quote marks.

So that is this specific situation.... but there are probably infinitely more situations where modifications to a GEDCOM might be needed to transfer data from one particular genealogy program to another.... hence the interest in all the solutions proposed. Steve has indicated an interest in the development of a series of AWK utilities for this purpose. There once was a quite useful program for modifying GEDCOMs called Gedcom Explorer (GEDX) that used a base code with user-defined macros to accomplish the same purpose. Perhaps something along that line could be developed.
On 2/18/2013 10:47 PM, Denis Beauregard wrote:
<snip>
> So, this problem consists in :
>
> replacing "Hugh" "Hugh" by Hugh (or "Hugh") in all lines beginning
> with
> 1 NAME
>
> where "Hugh" "Hugh" would be any pair of similar names, i.e.
> "Denis" "Denis" or "Charles" "Charles".

One way to do that would be:

    awk '/^1 NAME/{ if ($3 == $4) sub("[[:space:]]+"$4,"") } 1' file

The trailing 1 is what prints each line, changed or not; without it the script produces no output. That may or may not be the best approach depending on what else that line can contain.

>
> From my experience with Brief, a text editor with regular expressions,
> I don't know how to define a duplicated word. Brief was not using the
> standard regular expressions but with it, something like "$1" "$1"
> was not accepted...

Awk uses Extended Regular Expressions (EREs) and splits each line into space-separated fields $1, $2, etc. by default.

Ed.
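On the question of how to "define a duplicated word" in a regular expression: in engines that support backreferences, the usual trick is to capture the first word and then require the same text again. The Python sketch below shows that idea only. The helper name `collapse_duplicate_names` is made up, and it assumes the duplicated name is wrapped in double quotes and sits on a `1 NAME` line, as in the example quoted above.

```python
import re

# A quoted word, some whitespace, then the very same quoted word again.
# \1 is a backreference to whatever the first group captured.
DUPLICATE = re.compile(r'("[^"]+")\s+\1')


def collapse_duplicate_names(line):
    """Reduce '"Hugh" "Hugh"' to '"Hugh"' on level-1 NAME lines only."""
    if not line.startswith("1 NAME "):
        return line
    return DUPLICATE.sub(r"\1", line)
```

Unlike the field comparison in the awk one-liner, this does not depend on the duplicated name being the third and fourth space-separated fields, so it also catches a duplicate that follows another given name.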
On Tuesday, February 19, 2013 3:14:52 AM UTC-6, Steve Hayes wrote:
> In an earlier message I suggested using AWK to manipulate a GEDCOM file to
> solve a particular problem.
>
> That point tended to get lost in discussion of other points like using other
> ways to solve the problem, or discussion of flaws in the GEDCOM data model
> itself and proposals for its replacement, which I see as a separate question.
>
> What I would like to see is the development of a kind of library of AWK
> routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files,
> and some would like to make changes to them, or extract information from them
> in ways that might not be possible with other genealogy programs.
>
> Here is a GEDCOM file.
>
> I tried to choose a short one to use as an example, which shows the structure
> of the file.
>
> 0 HEAD
> 1 SOUR ANSTFILE
> 2 VERS 4.19
> 2 NAME Ancestral File (R)
> 2 CORP The Church of Jesus Christ of Latter-day Saints
> 3 ADDR 50 East North Temple Street
> 4 CONT Salt Lake City, Utah 84150
> 2 DATA Ancestral File
> 3 DATE 5 January 1998
> 3 COPR Copyright (c) 1987, June 1998
> 1 DEST PAF
> 1 DATE 20 APR 2002
> 2 TIME 2:58:56
> 1 FILE GEDCOM4.ged
> 1 GEDC
> 2 VERS 5.5
> 2 FORM LINEAGE-LINKED
> 1 CHAR ANSEL
> 1 SUBM @SUB01@
> 1 SUBN @N01@
> 0 @SUB01@ SUBM
> 1 NAME Created by FamilySearch (TM) Internet Genealogy Service
> 1 ADDR 50 East North Temple Street
> 2 CONT Salt Lake City, Utah 84150
> 0 @S01@ SOUR
> 1 AUTH The Church of Jesus Christ of Latter-day Saints
> 1 TITL Ancestral File (R)
> 1 PUBL Copyright (c) 1987, June 1998, data as of 5 January 1998
> 1 REPO @R01@
> 0 @R01@ REPO
> 1 NAME Family History Library
> 1 ADDR 35 N West Temple Street
> 2 CONT Salt Lake City, Utah 84150 USA
> 0 @N01@ SUBN
> 1 DESC 2
> 1 ORDI N
> 0 @I3GLR-Z3@ INDI
> 1 NAME Thomas William /BALDOCK/
> 2 GIVN Thomas William
> 2 SURN BALDOCK
> 1 AFN 3GLR-Z3
> 1 SEX M
> 1 SOUR @S01@
> 1 BIRT
> 2 DATE 1 Jul 1850
> 2 PLAC Geelong, Vic, Astl
> 1 FAMS @F1794078@
> 1 FAMC @F524078@
> 0 @I3GLR-4R@ INDI
> 1 NAME Thomas /BALDOCK/
> 2 GIVN Thomas
> 2 SURN BALDOCK
> 1 AFN 3GLR-4R
> 1 SEX M
> 1 SOUR @S01@
> 1 FAMS @F524078@
> 0 @I3GLR-5X@ INDI
> 1 NAME Anne /CHAMBERS/
> 2 GIVN Anne
> 2 SURN CHAMBERS
> 1 AFN 3GLR-5X
> 1 SEX F
> 1 SOUR @S01@
> 1 FAMS @F524078@
> 0 @I98BW-JC@ INDI
> 1 NAME Emily Jane /THORNTON/
> 2 GIVN Emily Jane
> 2 SURN THORNTON
> 1 AFN 98BW-JC
> 1 SEX F
> 1 SOUR @S01@
> 1 BIRT
> 2 DATE 1854
> 2 PLAC Geelong, Victoria, Australia
> 1 DEAT
> 2 DATE 9 Dec 1890
> 2 PLAC Geelong, Victoria, Australia
> 1 FAMS @F1794078@
> 1 FAMC @F1794093@
> 0 @I98BX-N6@ INDI
> 1 NAME Charles Edwin /THORNTON/
> 2 GIVN Charles Edwin
> 2 SURN THORNTON
> 1 AFN 98BX-N6
> 1 SEX M
> 1 SOUR @S01@
> 1 FAMS @F1794093@
> 0 @I98BX-PC@ INDI
> 1 NAME Emily /GROWDON/
> 2 GIVN Emily
> 2 SURN GROWDON
> 1 AFN 98BX-PC
> 1 SEX F
> 1 SOUR @S01@
> 1 FAMS @F1794093@
> 0 @I98CJ-BW@ INDI
> 1 NAME Percy William Growdon /BALDOCK/
> 2 GIVN Percy William Growdon
> 2 SURN BALDOCK
> 1 AFN 98CJ-BW
> 1 SEX M
> 1 SOUR @S01@
> 1 BIRT
> 2 DATE ABT 1876
> 2 PLAC Geelong, Victoria, Australia
> 1 FAMC @F1794078@
> 0 @I98BW-LP@ INDI
> 1 NAME Percy William Growdon /BALDOCK/
> 2 GIVN Percy William Growdon
> 2 SURN BALDOCK
> 1 AFN 98BW-LP
> 1 SEX M
> 1 SOUR @S01@
> 1 BIRT
> 2 DATE 1879
> 2 PLAC Geelong, Victoria, Australia
> 1 DEAT
> 2 DATE 6 Sep 1886
> 2 PLAC Geelong, Victoria, Australia
> 1 FAMC @F1794078@
> 0 @I98BW-KJ@ INDI
> 1 NAME Arthur Jabez /BALDOCK/
> 2 GIVN Arthur Jabez
> 2 SURN BALDOCK
> 1 AFN 98BW-KJ
> 1 SEX M
> 1 SOUR @S01@
> 1 BIRT
> 2 DATE 1878
> 2 PLAC Geelong, Victoria, Australia
> 1 FAMC @F1794078@
> 0 @I98BW-P7@ INDI
> 1 NAME Gladys Claudine /BALDOCK/
> 2 GIVN Gladys Claudine
> 2 SURN BALDOCK
> 1 AFN 98BW-P7
> 1 SEX F
> 1 SOUR @S01@
> 1 BIRT
> 2 DATE 1887
> 2 PLAC Geelong, Victoria, Australia
> 1 DEAT
> 2 DATE 1907
> 2 PLAC
> 1 FAMC @F1794078@
> 0 @I98BW-N2@ INDI
> 1 NAME Clive Alfred /BALDOCK/
> 2 GIVN Clive Alfred
> 2 SURN BALDOCK
> 1 AFN 98BW-N2
> 1 SEX M
> 1 SOUR @S01@
> 1 BIRT
> 2 DATE 1884
> 2 PLAC Geelong, Victoria, Australia
> 1 DEAT
> 2 DATE 25 Oct 1951
> 2 PLAC
> 1 FAMC @F1794078@
> 0 @I98BW-MV@ INDI
> 1 NAME Lawrence /BALDOCK/
> 2 GIVN Lawrence
> 2 SURN BALDOCK
> 1 AFN 98BW-MV
> 1 SEX M
> 1 SOUR @S01@
> 1 BIRT
> 2 DATE 1881
> 2 PLAC Geelong, Victoria, Australia
> 1 FAMC @F1794078@
> 0 @F1794078@ FAM
> 1 HUSB @I3GLR-Z3@
> 1 WIFE @I98BW-JC@
> 1 CHIL @I98CJ-BW@
> 1 CHIL @I98BW-LP@
> 1 CHIL @I98BW-KJ@
> 1 CHIL @I98BW-P7@
> 1 CHIL @I98BW-N2@
> 1 CHIL @I98BW-MV@
> 1 MARR
> 2 DATE 20 Apr 1876
> 2 PLAC Geelong, Victoria, Australia
> 0 @F524078@ FAM
> 1 HUSB @I3GLR-4R@
> 1 WIFE @I3GLR-5X@
> 1 CHIL @I3GLR-Z3@
> 0 @F1794093@ FAM
> 1 HUSB @I98BX-N6@
> 1 WIFE @I98BX-PC@
> 1 CHIL @I98BW-JC@
> 0 TRLR
>
> --
> Steve Hayes from Tshwane, South Africa
> Blog: http://khanya.wordpress.com
> E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

I agree for the most part. What I think is needed is a bunch of single-purpose utilities that operate on GEDCOMs, sort of in the spirit of *nix. AWK scripts are fine if the person has AWK installed, but I think the programming language is irrelevant.

15-20 years ago I wrote a couple of DOS utilities. One did nothing except remove dates and locations from living people. Another produced an HTML file from a GEDCOM. That's all they did. Later on I wrote a small utility that did nothing except check for internal consistency.

Right now I would love a utility that does nothing except produce a descendancy chart. Or a utility that does nothing except produce group sheets.

There is little reason for every genealogy program to be able to fulfill every conceivable function. Think small.
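As an illustration of how small such a single-purpose utility can be, here is a Python sketch of one possible consistency check: finding pointers such as `@F1@` that are used somewhere but never defined by a level-0 record. The post does not say what the original DOS utility actually checked, so this particular check, and the name `dangling_pointers`, are assumptions made purely for the example.

```python
import re

# A cross-reference token such as @I1@ or @F1794078@.
XREF = re.compile(r"@[^@\s]+@")


def dangling_pointers(lines):
    """Return the sorted list of xrefs that are referenced but never
    defined by a level-0 record line such as '0 @I1@ INDI'."""
    defined, used = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "0" and XREF.fullmatch(parts[1]):
            defined.add(parts[1])
        else:
            # Anything after the tag that looks like @...@ counts as a pointer.
            used.update(tok for tok in parts[2:] if XREF.fullmatch(tok))
    return sorted(used - defined)
```

Run against the two-person fragment quoted elsewhere in this thread, it would report `@F1@` and `@F2@`, since that fragment contains no FAM records. A word in free text that happens to look like `@this@` would be a false positive, which is the kind of ambiguity raised earlier in the thread.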
On Mon, 18 Feb 2013 19:10:22 -0600, Charlie Hoffpauir
<invalid@invalid.com> wrote in soc.genealogy.computing:

>On Mon, 18 Feb 2013 19:08:16 -0500, Dennis Lee Bieber
><wlfraed@ix.netcom.com> wrote:
>
>>On Mon, 18 Feb 2013 10:44:39 -0500, Denis Beauregard
>><denis.b-at-francogene.com@fr.invalid> declaimed the following in
>>soc.genealogy.computing:
>>
>>
>>> Some lines may have an ID as 2nd field, then a tag as 3rd field.
>>> This is common with level 0.
>>>
>>> 0 @I1@ INDI
>>> 1 NAME Bernard /Landry/
>>> 2 GIVN Bernard
>>> 2 SURN Landry
>>> 1 SEX M
>>> 1 BIRT
>>> 2 DATE 9 MAR 1937
>>> 2 PLAC St-Jacques
>>> 0 @I2@ INDI
>>> 1 NAME Bernard /St-Jacques/
>>> 2 GIVN Bernard
>>> 2 SURN St-Jacques
>>> 1 SEX M
>>> 1 FAMS @F1@
>>> 1 FAMC @F2@
>>>
>>> Example of processing : replace all St-Jacques in PLAC fields
>>> by "St-Jacques,,Qc," but not in NAME fields.
>>
>>PowerShell command (assuming there are none that already have extended
>>text)[UNTESTED]
>>
>>get-content
>>'path:to\file.ged' | foreach {$_ -replace
>>'(.) PLAC (.*) (St-Jacques) (.*)', '$1 PLAC $2 $3,,Qc, $4'}
>> >'path:to\new.ged'
>
>I'm impressed with both the Powershell and the AWK solutions to the
>problem Denis posted. However "that" problem I could also correct
>with traditional means, like a rather simple Word macro. However, as I
>explained in a follow-up to Denis's post, the actual problem that
>started the thread was a bit different. I'm sure either AWK or
>Powershell can handle it too, but I do think it's a bit more
>complicated. As someone "aspiring" to become proficient in either AWK
>or Powershell, I'd really like to see that solution posted.
>
>In general, the difference is that the original problem was to replace
>quote marks with some other character, when quote marks appear in the
>name field, for an "arbitrary" name. That is, it is assumed that some
>portion of the given names in the file, (but not all) have had quote
>marks surrounding one or more of the given names, and these are the
>characters that must be changed.

So, this problem consists of:

replacing "Hugh" "Hugh" with Hugh (or "Hugh") in all lines beginning with
1 NAME

where "Hugh" "Hugh" would be any pair of identical names, e.g.
"Denis" "Denis" or "Charles" "Charles".

From my experience with Brief, a text editor with regular expressions, I don't know how to define a duplicated word. Brief did not use the standard regular expressions, and something like "$1" "$1" was not accepted...


Denis

--
Denis Beauregard - généalogiste émérite (FQSG)
Les Français d'Amérique du Nord - www.francogene.com/genealogie--quebec/
French in North America before 1722 - www.francogene.com/quebec--genealogy/
Sur cédérom à 1780 - On CD-ROM to 1780
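For comparison with the PowerShell one-liner quoted above (which its author marked untested, and whose pattern as written appears to need other text on both sides of St-Jacques), here is a Python sketch of the same PLAC-only replacement. The function name `expand_place` is hypothetical. It makes the same assumption stated in the quoted post, that no PLAC value already carries the extended text, and goes one step further by only touching values that are exactly the old place name.

```python
def expand_place(lines, old="St-Jacques", new="St-Jacques,,Qc,"):
    """Yield GEDCOM lines, rewriting PLAC values that are exactly `old`.

    NAME, SURN and all other tags are left alone even when they contain
    the same text. Trailing newlines are stripped from every line.
    """
    for line in lines:
        text = line.rstrip("\n")
        parts = text.split(" ", 2)  # level, tag, value
        if len(parts) == 3 and parts[1] == "PLAC" and parts[2].strip() == old:
            yield f"{parts[0]} PLAC {new}"
        else:
            yield text
```

Because it compares the whole value and not a substring, running it twice is harmless: a line already reading `2 PLAC St-Jacques,,Qc,` no longer equals the old name and is passed through unchanged.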