RootsWeb.com Mailing Lists
    1. Re: How Should We Store Evidence in Genealogical Databases?
    2. Richard Smith
    3. On May 27, 3:53 pm, Bob Melson <[email protected]> wrote:

> Seems to me at least one requirement for universality is that the same data
> input on different machines results in the same output on those machines.

No, you're misunderstanding the meaning of the 'universal' in a UUID (or 'global' in a GUID, if you prefer the Microsoft terminology). The guarantee that a UUID provides is that every time you generate one, it will be unique across all databases, past, present or future, on any computer, anywhere in the world. UUIDs are not necessarily generated for any particular data, though in practice in a genealogical application, they may well be associated with a piece of data, such as a person, event or place.

What you're asking for is not possible with a UUID, partly because there's no concept of generating a UUID for a specific piece of data -- there is no input when you generate a UUID.

What you're asking for sounds more like a hash. This is where you take everything you know (or perhaps just a specific part of the data), and use it to generate a number -- the hash -- which can be used as an identifier. Each time you have the same input, you'll get the same hash out.

But that's not actually very useful for genealogy. We've all got individuals on our family trees about whom we know very little, and what we do know is poorly documented. Someone in the family (but you can't remember who) said that second-cousin Bob had a nephew called John Smith who lived in London. All we know of this John Smith is that he lived in London. That must describe hundreds or thousands of different people.

How do we ensure that the genuinely different John Smiths end up with different identifiers, while also ensuring that two different researchers, without co-operating or even mutual knowledge of each other, can end up assigning the same identifier to the same individual even if they both have exactly the same information? It can't be done.
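To make the UUID/hash distinction concrete, here is a minimal Python sketch. It assumes randomly generated (version 4) UUIDs and SHA-256 as the hash; the helper names `hash_id` and `fresh_uuid`, the whitespace/case normalisation, and the 16-character truncation are illustrative choices of mine, not anything taken from a particular genealogy program.

```python
import hashlib
import uuid


def hash_id(description: str) -> str:
    """Derive an identifier from the data itself.

    Hypothetical helper: lower-cases the text and collapses whitespace,
    then takes the first 16 hex digits of the SHA-256 digest.
    """
    normalised = " ".join(description.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]


def fresh_uuid() -> str:
    """Generate a random UUID; note that it takes no input at all."""
    return str(uuid.uuid4())


# Two researchers independently record the same scrap of information.
record_a = "John Smith, lived in London"
record_b = "John Smith, lived in London"

# Hash: same input always gives the same identifier.
same_hash = hash_id(record_a) == hash_id(record_b)

# UUID: every call gives a new identifier, whatever the data says.
same_uuid = fresh_uuid() == fresh_uuid()

print("hashes match:", same_hash)
print("UUIDs match:", same_uuid)
```

The hash also comes out the same if `record_a` and `record_b` actually describe two different John Smiths, which is exactly the problem.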
It's all well and good saying we'd like it, but it's a technical impossibility. Even with the researchers co-operating in generating identifiers (for example, by using some central internet-based generator), it can't be done because "John Smith in London" simply isn't a unique handle.

So we must compromise. Either we lose uniqueness -- that is, accept that two different people might sometimes get assigned the same "unique" identifier. Or we lose repeatability -- that is, accept that sometimes the same people, with the same known information, will be assigned multiple identifiers. In the former case, a hash is a good implementation strategy; in the latter, a UUID is good. Or we may decide that because we've lost at least one of these guarantees, we may as well lose both and go for a simpler implementation, such as the xrefs used in GEDCOM (these are the I0001-type things).

Richard
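The three compromises can be sketched side by side. This is only an illustration under stated assumptions: SHA-256 truncated to 12 hex digits stands in for "a hash", version 4 UUIDs stand in for "a UUID", and the hypothetical `assign_xrefs` helper imitates GEDCOM-style xrefs with a per-file counter.

```python
import hashlib
import itertools
import uuid

# Two genuinely different people about whom we know exactly the same thing.
people = ["John Smith, London", "John Smith, London"]

# Option 1: hash. Repeatable, but not unique: both people collide.
hash_ids = [hashlib.sha256(p.encode("utf-8")).hexdigest()[:12] for p in people]

# Option 2: random UUID. Unique, but not repeatable: entering the same
# person again (or by another researcher) yields yet another identifier.
uuid_ids = [str(uuid.uuid4()) for _ in people]


def assign_xrefs(records):
    """Hypothetical helper: number records I0001, I0002, ... within one file."""
    counter = itertools.count(1)
    return [f"I{next(counter):04d}" for _ in records]


# Option 3: xrefs. Distinct within one file, but a second researcher's
# file starts again at I0001, so neither guarantee holds across files.
my_file = assign_xrefs(people)
other_file = assign_xrefs(["Mary Jones, Leeds"])

print(hash_ids[0] == hash_ids[1])   # different people, same hash
print(uuid_ids[0] == uuid_ids[1])   # same data, different UUIDs
print(my_file, other_file)
```

Which loss is tolerable depends on whether merging files or avoiding accidental merges matters more for the application.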

    05/27/2011 11:47:33