RootsWeb.com Mailing Lists
Total: 3/3
    1. Re: Place Name Gazetteer Formats
    2. Tony Proctor
    3. "Tom Wetmore" <ttw4@verizon.net> wrote in message news:9581b106-022b-4e7f-89cd-ed0f654e15da@googlegroups.com...
> In an associated thread I mentioned formats for gazetteers that could be used as a "database" for implementing multiple hierarchy place authorities.
>
> Here is a too simple example format, with a few examples to give an idea of a very simple approach. A complete gazetteer would obviously have many millions of entries.
>
> Example Format:
>
> uniqueId : name : timePeriod : language : type : parentId*
>
> Example Gazetteer File:
>
> 10101: North America : : en : Continent
> 34343: Europe : : en : Continent
>
> 11111: United States : 1776 to present : en : NationState: 10101
> 44444: United Kingdom: xxxx to xxxx : en : NationState: 34343
> 89999: Denmark : xxxx to xxxx : en : NationState : 34343
>
> 22222: New England : : en : InformalRegion : 11111
>
> 33333: Connecticut : 1776 to present : en : ProvinceState: 11111, 22222
> 55555: Connecticut Colony : 1636 to 1776 : en : Colony : 44444, 22222
>
> 34543: New London : 1636 to 1776 : en : County : 55555
> 34544: New London : 1776 to present : en : County : 33333
>
> 34643: New London : 1636 to 1776 : en : City : 34543
> 34643: New London : 1776 to present : en : City : 34544
>
> 66666: Great Britain : : en : Island : 34343
>
> 77777: Greenland : : en : Island : 10101
> 88888: Greenland : : en : Dependency: 89999
>
> 80808: Isle of Man : xxxx to xxxx : en : CrownDependency : 44444
> 81818: Andreas : xxxx to xxxx : en : CivilParish : 80808
>
> Omit dates where the name is "timeless" (e.g., North America). There would be a relatively small number of types. I've suggested some of them here.
>
> Some names are used twice as part of different hierarchies. Names are ambiguous when not fully specified. It's the way the world works. Deal with it. In a real gazetteer some names would be used hundreds of times, fortunately each with a uniqueID.
>
> Note that United Kingdom is a NationState whereas Great Britain is "just" an Island in Europe. That's the way of it. Homework: Add England, Scotland, Wales, Northern Ireland. What type would you give them?
>
> Try adding "British Isles" (what type would you give them) and handling both the NationState of Ireland and the Island of Ireland.
>
> Note how Greenland is both an Island (so in North America) and a member of the Danish Commonwealth (so "in" Denmark). Note how in the case of Greenland we need to worry about the difference between political containment and geographical containment. That is, Greenland, politically part of Denmark, is not geographically part of Denmark or Europe! Lots of complications, but not uncontrollable. Smart people can figure this stuff out.
>
> I've shown the tip of a large iceberg here. I've avoided certain issues, such as "official" names, "short form" names, which would require gazetteer entries to have more properties than are shown in this simple example. So the format would clearly need to have more "columns." How many do you think there would have to be? Political hierarchies will get real nasty real fast. Try to imagine what would be needed to cover all names found in what is today Poland over the past three centuries!
>
> This would be a massive undertaking, but I think the path is pretty clear. I already have some massive gazetteer files very similar to this format for a few of the software programs I have written.
>
> Tom Wetmore

The type does not need a language-code prefix, Tom. It should be defined as part of a controlled vocabulary, and is hence part of the data syntax. That type should always be mapped to a descriptive term appropriate for the locale of the end-user, but that is separate from the syntax of the data itself.

I see no support for alternative spellings of places, either in the same language or in separate languages. Many places have alternative spellings in the normal language of that locality (especially over time), and in dual-language regions there will be different names in different languages too.

Tony Proctor

    10/10/2012 03:28:27
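
As a rough illustration of the colon-separated format quoted above, the following Python sketch parses a few of Tom's example lines and lists every containment chain for an entry. The names `Place`, `parse_line`, `parse_gazetteer`, and `chains` are invented for this sketch and are not part of any program mentioned in the thread. It assumes names never contain a colon and that each uniqueId appears only once in the input.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    uid: str
    name: str
    period: str
    language: str
    kind: str
    parents: list = field(default_factory=list)


def parse_line(line):
    """Parse 'uniqueId : name : timePeriod : language : type : parentId*'."""
    parts = [p.strip() for p in line.split(":")]
    parts += [""] * (6 - len(parts))  # the parent list may be missing entirely
    uid, name, period, language, kind, parents = parts[:6]
    parent_ids = [p.strip() for p in parents.split(",") if p.strip()]
    return Place(uid, name, period, language, kind, parent_ids)


def parse_gazetteer(text):
    """Return a dict of Place objects keyed by uniqueId (assumed unique)."""
    places = {}
    for line in text.splitlines():
        if line.strip():
            place = parse_line(line)
            places[place.uid] = place
    return places


def chains(places, uid):
    """Return every containment chain from uid up to a parentless entry."""
    place = places[uid]
    if not place.parents:
        return [[place.name]]
    result = []
    for pid in place.parents:
        for chain in chains(places, pid):
            result.append([place.name] + chain)
    return result


# A subset of the example gazetteer from the quoted message.
SAMPLE = """\
10101: North America : : en : Continent
34343: Europe : : en : Continent
11111: United States : 1776 to present : en : NationState: 10101
89999: Denmark : xxxx to xxxx : en : NationState : 34343
22222: New England : : en : InformalRegion : 11111
33333: Connecticut : 1776 to present : en : ProvinceState: 11111, 22222
77777: Greenland : : en : Island : 10101
88888: Greenland : : en : Dependency: 89999
"""

places = parse_gazetteer(SAMPLE)
for chain in chains(places, "33333"):
    print(" < ".join(chain))
```

Because Connecticut lists two parents, the loop prints two chains, one through the United States directly and one through New England. The two Greenland entries resolve to different continents, which is the political versus geographical split Tom describes.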
    1. Re: Place Name Gazetteer Formats
    2. Tom Wetmore
    3. Tony,

The language field is not for the type but for the name itself. Sorry for the confusion.

Alternate spellings in the same language can be handled by allowing name fields to contain comma-separated lists. Alternate spellings in different languages can be handled by either:

1) adding language namespaces to names; or
2) adding a new line in the gazetteer for each language.

I would suggest that "official" names and "short form" names be handled by new columns.

The example is only a suggestion intended to get some discussion going. I was interested in demonstrating how easy it is to specify multiple-containment data in a way that would allow a sophisticated place authority to operate.

    10/09/2012 10:34:16
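
To make the comma-separated list and language-namespace ideas above concrete, here is a small Python sketch. Tom does not give a syntax for namespaces, so the `lang=name` prefix used here is purely an assumption, as are the function names `parse_names` and `matches`. Names without a prefix are taken to be in the entry's own language field.

```python
def parse_names(name_field, default_lang):
    """Split a name field into (language, name) pairs.

    Assumed syntax: comma-separated names, each optionally prefixed
    with 'lang=' (a hypothetical namespace marker, not Tom's).
    """
    names = []
    for item in name_field.split(","):
        item = item.strip()
        if not item:
            continue
        lang, sep, name = item.partition("=")
        if sep:
            names.append((lang.strip(), name.strip()))
        else:
            names.append((default_lang, item))
    return names


def matches(name_field, default_lang, query):
    """Case-insensitive test of whether query is one of the listed names."""
    q = query.casefold()
    return any(name.casefold() == q
               for _, name in parse_names(name_field, default_lang))


# A Polish entry carrying a German alternative name.
print(parse_names("Gdańsk, de=Danzig", "pl"))
print(matches("Gdańsk, de=Danzig", "pl", "danzig"))
```

Option 2 in the message (one gazetteer line per language) would avoid inventing a prefix syntax, at the cost of repeating the other columns on each line.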
    1. Re: Place Name Gazetteer Formats
    2. Tony Proctor
    3. "Tom Wetmore" <ttw4@verizon.net> wrote in message news:d0200e0f-68c2-4451-b47b-51c3cbbddb7a@googlegroups.com...
> Tony,
>
> The language field is not for the type but for the name itself. Sorry for the confusion.
>
> Alternate spellings in the same language can be handled by allowing name fields to contain comma-separated lists. Alternate spellings in different languages can be handled by either:
>
> 1) adding language namespaces to names; or
> 2) adding a new line in the gazetteer for each language.
>
> I would suggest that "official" names and "short form" names be handled by new columns.
>
> The example is only a suggestion intended to get some discussion going. I was interested in demonstrating how easy it is to specify multiple-containment data in a way that would allow a sophisticated place authority to operate.

Comparing the STEMMA spec (http://www.familyhistorydata.parallaxview.co/home/document-structure/place/place-names) with your gazetteer format, Tom, I find the only substantial difference is that STEMMA supports date ranges on the alternative names, in order to cope with renames as opposed to simply alternative spellings or colloquialisms.

There are some smaller differences too: STEMMA provides a canonical name to be used for display purposes (as distinct from the alternatives accepted during input and by matching algorithms), and STEMMA's dates can be from alternative calendars (not just Gregorian).

Tony Proctor

    10/10/2012 07:18:54
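
The two ideas Tony highlights, date ranges on alternative names and a separate canonical display name, can be sketched generically in Python. This is not STEMMA syntax; the classes `DatedName` and `PlaceNames` are invented for illustration. The sketch also assumes plain Gregorian years and treats each range as running from its start year up to, but not including, its end year. The sample dates come from Tom's Connecticut entries earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatedName:
    text: str
    start: Optional[int] = None  # None means open-ended
    end: Optional[int] = None    # exclusive; None means "to present"

    def valid_in(self, year):
        after_start = self.start is None or year >= self.start
        before_end = self.end is None or year < self.end
        return after_start and before_end


@dataclass
class PlaceNames:
    canonical: str   # name used for display
    variants: list   # names accepted for input and matching

    def names_in(self, year):
        """Names that were in use in the given year."""
        return [v.text for v in self.variants if v.valid_in(year)]

    def accepts(self, text):
        """Case-insensitive match against any variant, regardless of date."""
        return any(v.text.casefold() == text.casefold() for v in self.variants)


connecticut = PlaceNames(
    canonical="Connecticut",
    variants=[
        DatedName("Connecticut Colony", 1636, 1776),
        DatedName("Connecticut", 1776, None),
    ],
)

print(connecticut.names_in(1700))
print(connecticut.names_in(1800))
print(connecticut.accepts("connecticut colony"))
```

Supporting alternative calendars, as Tony says STEMMA does, would mean replacing the integer years with a richer date type; that is left out here.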