RootsWeb.com Mailing Lists
    1. Using AWK to manipulate GEDCOM files
    2. Steve Hayes
    3. In an earlier message I suggested using AWK to manipulate a GEDCOM file to solve a particular problem. That point tended to get lost in discussion of other points like using other ways to solve the problem, or discussion of flaws in the GEDCOM data model itself and proposals for its replacement, which I see as a separate question.

       What I would like to see is the development of a kind of library of AWK routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files, and some would like to make changes to them, or extract information from them in ways that might not be possible with other genealogy programs.

       Here is a GEDCOM file. I tried to choose a short one to use as an example, which shows the structure of the file.

       0 HEAD
       1 SOUR ANSTFILE
       2 VERS 4.19
       2 NAME Ancestral File (R)
       2 CORP The Church of Jesus Christ of Latter-day Saints
       3 ADDR 50 East North Temple Street
       4 CONT Salt Lake City, Utah 84150
       2 DATA Ancestral File
       3 DATE 5 January 1998
       3 COPR Copyright (c) 1987, June 1998
       1 DEST PAF
       1 DATE 20 APR 2002
       2 TIME 2:58:56
       1 FILE GEDCOM4.ged
       1 GEDC
       2 VERS 5.5
       2 FORM LINEAGE-LINKED
       1 CHAR ANSEL
       1 SUBM @SUB01@
       1 SUBN @N01@
       0 @SUB01@ SUBM
       1 NAME Created by FamilySearch (TM) Internet Genealogy Service
       1 ADDR 50 East North Temple Street
       2 CONT Salt Lake City, Utah 84150
       0 @S01@ SOUR
       1 AUTH The Church of Jesus Christ of Latter-day Saints
       1 TITL Ancestral File (R)
       1 PUBL Copyright (c) 1987, June 1998, data as of 5 January 1998
       1 REPO @R01@
       0 @R01@ REPO
       1 NAME Family History Library
       1 ADDR 35 N West Temple Street
       2 CONT Salt Lake City, Utah 84150 USA
       0 @N01@ SUBN
       1 DESC 2
       1 ORDI N
       0 @I3GLR-Z3@ INDI
       1 NAME Thomas William /BALDOCK/
       2 GIVN Thomas William
       2 SURN BALDOCK
       1 AFN 3GLR-Z3
       1 SEX M
       1 SOUR @S01@
       1 BIRT
       2 DATE 1 Jul 1850
       2 PLAC Geelong, Vic, Astl
       1 FAMS @F1794078@
       1 FAMC @F524078@
       0 @I3GLR-4R@ INDI
       1 NAME Thomas /BALDOCK/
       2 GIVN Thomas
       2 SURN BALDOCK
       1 AFN 3GLR-4R
       1 SEX M
       1 SOUR @S01@
       1 FAMS @F524078@
       0 @I3GLR-5X@ INDI
       1 NAME Anne /CHAMBERS/
       2 GIVN Anne
       2 SURN CHAMBERS
       1 AFN 3GLR-5X
       1 SEX F
       1 SOUR @S01@
       1 FAMS @F524078@
       0 @I98BW-JC@ INDI
       1 NAME Emily Jane /THORNTON/
       2 GIVN Emily Jane
       2 SURN THORNTON
       1 AFN 98BW-JC
       1 SEX F
       1 SOUR @S01@
       1 BIRT
       2 DATE 1854
       2 PLAC Geelong, Victoria, Australia
       1 DEAT
       2 DATE 9 Dec 1890
       2 PLAC Geelong, Victoria, Australia
       1 FAMS @F1794078@
       1 FAMC @F1794093@
       0 @I98BX-N6@ INDI
       1 NAME Charles Edwin /THORNTON/
       2 GIVN Charles Edwin
       2 SURN THORNTON
       1 AFN 98BX-N6
       1 SEX M
       1 SOUR @S01@
       1 FAMS @F1794093@
       0 @I98BX-PC@ INDI
       1 NAME Emily /GROWDON/
       2 GIVN Emily
       2 SURN GROWDON
       1 AFN 98BX-PC
       1 SEX F
       1 SOUR @S01@
       1 FAMS @F1794093@
       0 @I98CJ-BW@ INDI
       1 NAME Percy William Growdon /BALDOCK/
       2 GIVN Percy William Growdon
       2 SURN BALDOCK
       1 AFN 98CJ-BW
       1 SEX M
       1 SOUR @S01@
       1 BIRT
       2 DATE ABT 1876
       2 PLAC Geelong, Victoria, Australia
       1 FAMC @F1794078@
       0 @I98BW-LP@ INDI
       1 NAME Percy William Growdon /BALDOCK/
       2 GIVN Percy William Growdon
       2 SURN BALDOCK
       1 AFN 98BW-LP
       1 SEX M
       1 SOUR @S01@
       1 BIRT
       2 DATE 1879
       2 PLAC Geelong, Victoria, Australia
       1 DEAT
       2 DATE 6 Sep 1886
       2 PLAC Geelong, Victoria, Australia
       1 FAMC @F1794078@
       0 @I98BW-KJ@ INDI
       1 NAME Arthur Jabez /BALDOCK/
       2 GIVN Arthur Jabez
       2 SURN BALDOCK
       1 AFN 98BW-KJ
       1 SEX M
       1 SOUR @S01@
       1 BIRT
       2 DATE 1878
       2 PLAC Geelong, Victoria, Australia
       1 FAMC @F1794078@
       0 @I98BW-P7@ INDI
       1 NAME Gladys Claudine /BALDOCK/
       2 GIVN Gladys Claudine
       2 SURN BALDOCK
       1 AFN 98BW-P7
       1 SEX F
       1 SOUR @S01@
       1 BIRT
       2 DATE 1887
       2 PLAC Geelong, Victoria, Australia
       1 DEAT
       2 DATE 1907
       2 PLAC
       1 FAMC @F1794078@
       0 @I98BW-N2@ INDI
       1 NAME Clive Alfred /BALDOCK/
       2 GIVN Clive Alfred
       2 SURN BALDOCK
       1 AFN 98BW-N2
       1 SEX M
       1 SOUR @S01@
       1 BIRT
       2 DATE 1884
       2 PLAC Geelong, Victoria, Australia
       1 DEAT
       2 DATE 25 Oct 1951
       2 PLAC
       1 FAMC @F1794078@
       0 @I98BW-MV@ INDI
       1 NAME Lawrence /BALDOCK/
       2 GIVN Lawrence
       2 SURN BALDOCK
       1 AFN 98BW-MV
       1 SEX M
       1 SOUR @S01@
       1 BIRT
       2 DATE 1881
       2 PLAC Geelong, Victoria, Australia
       1 FAMC @F1794078@
       0 @F1794078@ FAM
       1 HUSB @I3GLR-Z3@
       1 WIFE @I98BW-JC@
       1 CHIL @I98CJ-BW@
       1 CHIL @I98BW-LP@
       1 CHIL @I98BW-KJ@
       1 CHIL @I98BW-P7@
       1 CHIL @I98BW-N2@
       1 CHIL @I98BW-MV@
       1 MARR
       2 DATE 20 Apr 1876
       2 PLAC Geelong, Victoria, Australia
       0 @F524078@ FAM
       1 HUSB @I3GLR-4R@
       1 WIFE @I3GLR-5X@
       1 CHIL @I3GLR-Z3@
       0 @F1794093@ FAM
       1 HUSB @I98BX-N6@
       1 WIFE @I98BX-PC@
       1 CHIL @I98BW-JC@
       0 TRLR

       --
       Steve Hayes from Tshwane, South Africa
       Blog: http://khanya.wordpress.com
       E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk
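As a possible first entry for the kind of library described above, here is a minimal sketch of one routine that lists each individual in a GEDCOM file with name and birth date. The function name `list_individuals` is made up for illustration, and the sketch assumes the file holds one GEDCOM record per line in the form LEVEL [@XREF@] TAG [VALUE], as in the example file. It is a starting point under those assumptions, not a complete GEDCOM parser.

```shell
#!/bin/sh
# list_individuals (hypothetical name): print ID<TAB>NAME<TAB>BIRTH DATE
# for every INDI record. Assumes one GEDCOM record per line.
list_individuals() {
    awk '
    # Print the individual collected so far, if there is one.
    function emit() {
        if (id != "") printf "%s\t%s\t%s\n", id, name, birth
        id = name = birth = ""
    }
    # Return the text that follows the first n space-separated fields.
    function rest(n,    s, i) {
        s = $0
        for (i = 1; i <= n; i++) sub(/^ *[^ ]+ */, "", s)
        return s
    }
    { sub(/\r$/, "") }                        # tolerate DOS line endings
    $1 == "0" { emit(); inbirt = 0 }          # any level-0 line ends the previous record
    $1 == "0" && $3 == "INDI" { id = $2; gsub(/@/, "", id) }
    $1 == "1" { inbirt = ($2 == "BIRT") }     # remember whether we are under 1 BIRT
    id != "" && $1 == "1" && $2 == "NAME" { name = rest(2); gsub(/\//, "", name) }
    id != "" && $1 == "2" && $2 == "DATE" && inbirt { birth = rest(2) }
    END { emit() }
    ' "$@"
}
```

Run as `list_individuals GEDCOM4.ged`. Individuals with no BIRT record get an empty third column, and DATE lines under DEAT or MARR are ignored because the `inbirt` flag is only set by a level-1 BIRT line.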

    02/19/2013 04:14:52
    1. Re: Using AWK to manipulate GEDCOM files
    2. Steve Hayes
    3. On Tue, 19 Feb 2013 11:14:52 +0200, Steve Hayes <hayesstw@telkomsa.net> wrote:

       >What I would like to see is the development of a kind of library of AWK
       >routines to manipulate GEDCOM files. Lots of genealogists have GEDCOM files,
       >and some would like to make changes to them, or extract information from them
       >in ways that might not be possible with other genealogy programs.

       For those in soc.genealogy.computing who don't know what AWK is or does, here is a description (apologies to those in comp.lang.awk who already know this).

       * Gawk-3.1.6 for Windows *
       ==========================

       What is it?
       -----------
       Gawk: pattern scanning and processing language

       Description
       -----------
       Several kinds of tasks occur repeatedly when working with text files. You might want to extract certain lines and discard the rest. Or you may need to make changes wherever certain patterns appear, but leave the rest of the file alone. Writing single-use programs for these tasks in languages such as C, C++ or Pascal is time-consuming and inconvenient. Such jobs are often easier with awk. The awk utility interprets a special-purpose programming language that makes it easy to handle simple data-reformatting jobs.

       The GNU implementation of awk is called gawk; it is fully compatible with the System V Release 4 version of awk. gawk is also compatible with the POSIX specification of the awk language. This means that all properly written awk programs should work with gawk. Thus, we usually don’t distinguish between gawk and other awk implementations.

       Using awk allows you to:
       - Manage small, personal databases
       - Generate reports
       - Validate data
       - Produce indexes and perform other document preparation tasks
       - Experiment with algorithms that you can adapt later to other computer languages.

       In addition, gawk provides facilities that make it easy to:
       - Extract bits and pieces of data for processing
       - Sort data
       - Perform simple network communications.

       The Win32 port has some limitations. In particular, the ‘|&’ operator and TCP/IP networking are not supported.

       Homepage
       --------
       http://www.gnu.org/software/gawk/gawk.html
       Sources: http://ftp.gnu.org/gnu/gawk/gawk-3.1.6.tar.gz

       System
       ------
       - Win32, i.e. MS-Windows 95 / 98 / ME / NT / 2000 / XP / 2003 / Vista with msvcrt.dll
       - if msvcrt.dll is not in your Windows/System folder, get it from Microsoft <http://support.microsoft.com/default.aspx?scid=kb;en-us;259403> or by installing Internet Explorer 4.0 or higher <http://www.microsoft.com/windows/ie>

       Notes
       -----
       - Bugs and questions on this MS-Windows port: gnuwin32@users.sourceforge.net

       Package Availability
       --------------------
       - in: http://gnuwin32.sourceforge.net

       Installation
       ------------

       Sources
       -------
       - gawk-3.1.6-1-src.zip

       Compilation
       -----------
       The package has been compiled with GNU auto-tools, GNU make, and Mingw (GCC for MS-Windows). Any differences from the original sources are given in gawk-3.1.6-1-GnuWin32.diffs in gawk-3.1.6-1-src.zip. Libraries needed for compilation can be found at the lines starting with 'LIBS = ' in the Makefiles. Usually, these are standard libraries provided with Mingw, or libraries from the package itself; 'gw32c' refers to the libgw32c package, which provides MS-Windows substitutes or stubs for functions normally found in Unix. For more information, see: http://gnuwin32.sourceforge.net/compile.html and http://gnuwin32.sourceforge.net/packages/libgw32c.htm.

       --
       Steve Hayes from Tshwane, South Africa
       Blog: http://khanya.wordpress.com
       E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk
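To make the description concrete for GEDCOM users, the two kinds of task it mentions (extract certain lines and discard the rest; change lines that match a pattern and leave the rest alone) might look roughly like this. The function names `surname_lines` and `normalise_place` are invented for illustration, the place spellings are taken from the example file earlier in the thread, and both sketches assume one GEDCOM record per line.

```shell
#!/bin/sh
# Extract: keep only the level-2 SURN lines and discard everything else.
# A pattern with no action prints the matching line.
surname_lines() {
    awk '$1 == "2" && $2 == "SURN"' "$@"
}

# Change: rewrite one abbreviated place name wherever it appears on a
# PLAC line, and pass every other line through untouched.
normalise_place() {
    awk '
    $2 == "PLAC" { sub(/Geelong, Vic, Astl/, "Geelong, Victoria, Australia") }
    { print }
    ' "$@"
}
```

Both functions read the named files, or standard input if none are given, so `normalise_place old.ged > new.ged` writes a changed copy and leaves the original alone.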

    02/19/2013 07:33:29