Note: The Rootsweb Mailing Lists will be shut down on April 6, 2023.
RootsWeb.com Mailing Lists
    1. Re: How Should We Store Evidence in Genealogical Databases?
    2. Ian Goddard
    3. Bob Melson wrote:
       > On Sunday 29 May 2011 09:57, Ian Goddard ([email protected]) opined:
       >
       >> Bob Melson wrote:
       >>> Y'all'd have to check the archives for the down'n'dirty, but Wes and I and I don't remember who all else had a discussion of this very thing (_UID, _UUID, _GUID) some time back. I contended then that UUID (Universally Unique ID) and its close kin, UID and GUID, meant that the number assigned to Cousin Mortimer should be the same universally - here, there, wherever it appears. Not so. The "universe" is the machine where the
       >
       > My original (erroneous?) contention was that identical data should result in identical UUIDs irrespective of where generated, so that Cousin Mortimer's record will produce the same UUID whether done on your machine or mine or somebody else's. As Wes pointed out, this effectively describes a checksum and, in honesty, I have to admit that that's very much how I conceptualized *IDs.
       >
       > It appears, however, that Cousin Mortimer's record will result in a unique ID on every machine on which a *ID is generated, so for N copies of the record on N machines there will be N unique identifiers. And that raises a question about the utility of those identifiers - if every one is unique, how do/can we know that they refer to the same record?
       >
       > The other component of my now thoroughly beaten-down original belief was that the *IDs were somehow akin to a digital signature, on the order of "none genuine without this ID". Assuming _that_ and that I've published my public key, anybody coming across ol' Mort's record anywhere could verify (1) that it came from me and (2) whether it's identical with the original. That obviously won't work with *IDs as they are at present.
       > (Still, I find the idea of a per-record digital signature attractive, tho' that's an entirely different topic for discussion).
       >

       So someone transcribes a record of ol' Mort & writes out his surname with a capitalised initial because that's what's in the original, and someone else transcribes the same record exactly except that they give the surname in upper case because that's the convention they adopt. Oops! Different checksums.

       Another example. Older church registers tend to have headings for year and month (or a page heading for year with intermediate headings for month) and just the day of the month on each individual record. So the transcriber has to make up his own way of representing dates, which might differ from transcriber to transcriber; so again ol' Mort could get his name transcribed using the same convention but get the dates represented differently. Again, different checksums.

       So the checksumming depends not only on the data but also on the way it was transcribed. And of course, different representations could be imposed by the S/W, which might insist on upper-case surnames or different date formats.

       I think there are two separate issues. One is the unambiguous identification of the same publication and the other is its authentication.

       The latter is the more difficult. Not only are there the sort of problems I've outlined above but also the problem of maintaining a PKI (public key infrastructure).

       When I suggested what was, in effect, a distributed database using exchanges of UUID-labeled XML records I considered the possibility of digital signing. On balance I decided to leave it out as:

       1. XML introduces even more variations. Consider, for example:

          <EmptyTag></EmptyTag>
          <EmptyTag/>

          both of which are equivalent but will checksum differently.

       2. The PKI would also have to be distributed.

       3. If someone did decide to massage a record they could simply re-sign it with their own private key so the advantages could be illusory.

       --
       Ian
       The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk

    05/29/2011 05:23:08
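The identifier-versus-checksum distinction Bob and Ian are drawing can be seen with Python's standard `uuid` module. This is an illustrative sketch only: `RECORD_NAMESPACE`, the helper names and the sample record strings are hypothetical. A version-4 UUID is random, so every copy of Mort's record gets a fresh one; a version-5 UUID is derived from a SHA-1 hash of a namespace plus the text, so identical text gives an identical ID anywhere. That is the checksum behaviour Bob originally expected, and it is exactly as fragile as Ian's capitalisation example suggests.

```python
import uuid

# Hypothetical namespace for genealogy records; any fixed UUID would do.
RECORD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "records.example.org")


def random_id() -> uuid.UUID:
    """Version-4 UUID: random, so each machine (and each call) gets a new one."""
    return uuid.uuid4()


def content_id(record_text: str) -> uuid.UUID:
    """Version-5 UUID: a SHA-1-based hash of namespace + text.

    Identical text always yields the identical ID, so this behaves like a
    checksum of the record rather than an identity for it.
    """
    return uuid.uuid5(RECORD_NAMESPACE, record_text)


if __name__ == "__main__":
    as_written = "Mortimer Smith, baptised 3 May 1801"
    upper_case = "Mortimer SMITH, baptised 3 May 1801"

    print(random_id() == random_id())                        # False: N copies, N IDs
    print(content_id(as_written) == content_id(as_written))  # True: same text, same ID
    print(content_id(as_written) == content_id(upper_case))  # False: transcription style changes it
```

Neither kind of UUID says anything about who produced the record, which is why authentication needs a separate mechanism.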
    1. Re: How Should We Store Evidence in Genealogical Databases?
    2. Bob Melson
    3. -----BEGIN PGP SIGNED MESSAGE-----
       Hash: SHA1

       On Sunday 29 May 2011 16:23, Ian Goddard ([email protected]) opined:

       > Bob Melson wrote:
       >> On Sunday 29 May 2011 09:57, Ian Goddard ([email protected]) opined:
       >>
       >>> Bob Melson wrote:
       >>>> Y'all'd have to check the archives for the down'n'dirty, but Wes and I and I don't remember who all else had a discussion of this very thing (_UID, _UUID, _GUID) some time back. I contended then that UUID (Universally Unique ID) and its close kin, UID and GUID, meant that the number assigned to Cousin Mortimer should be the same universally - here, there, wherever it appears. Not so. The "universe" is the machine where the
       >>
       >> My original (erroneous?) contention was that identical data should result in identical UUIDs irrespective of where generated, so that Cousin Mortimer's record will produce the same UUID whether done on your machine or mine or somebody else's. As Wes pointed out, this effectively describes a checksum and, in honesty, I have to admit that that's very much how I conceptualized *IDs.
       >>
       >> It appears, however, that Cousin Mortimer's record will result in a unique ID on every machine on which a *ID is generated, so for N copies of the record on N machines there will be N unique identifiers. And that raises a question about the utility of those identifiers - if every one is unique, how do/can we know that they refer to the same record?
       >>
       >> The other component of my now thoroughly beaten-down original belief was that the *IDs were somehow akin to a digital signature, on the order of "none genuine without this ID". Assuming _that_ and that I've published my public key, anybody coming across ol' Mort's record anywhere could verify (1) that it came from me and (2) whether it's identical with the original. That obviously won't work with *IDs as they are at present.
       >> (Still, I find the idea of a per-record digital signature attractive, tho' that's an entirely different topic for discussion).
       >>
       >
       > So someone transcribes a record of ol' Mort & writes out his surname with a capitalised initial because that's what's in the original, and someone else transcribes the same record exactly except that they give the surname in upper case because that's the convention they adopt. Oops! Different checksums.
       >
       > Another example. Older church registers tend to have headings for year and month (or a page heading for year with intermediate headings for month) and just the day of the month on each individual record. So the transcriber has to make up his own way of representing dates, which might differ from transcriber to transcriber; so again ol' Mort could get his name transcribed using the same convention but get the dates represented differently. Again, different checksums.
       >
       > So the checksumming depends not only on the data but also on the way it was transcribed. And of course, different representations could be imposed by the S/W, which might insist on upper-case surnames or different date formats.

       You're quite right, and that's why I abandoned, early on, the idea that the *IDs were in some way a super-checksum.

       >
       > I think there are two separate issues. One is the unambiguous identification of the same publication and the other is its authentication.

       And, indeed, they are. Recognizing that, however, took a bit of time because of my failure to grasp the *ID "concept". I knew what I wanted and expected *IDs to be, and that was not at all what they are in the real world.

       >
       > The latter is the more difficult. Not only are there the sort of problems I've outlined above but also the problem of maintaining a PKI (public key infrastructure).

       The PKI is already there, as just about everybody who uses SSH will tell you. The problem is not the PKI but the chain of trust involved.
       Ideally, my key will have been validated as to its trustworthiness by somebody who already has a trusted key and who knows me, this before I submit my key to the "registry". Finding somebody to sign your newly generated key with his already signed/trusted key can be a real PITA.

       >
       > When I suggested what was, in effect, a distributed database using exchanges of UUID-labeled XML records I considered the possibility of digital signing. On balance I decided to leave it out as:
       >
       > 1. XML introduces even more variations. Consider, for example:
       >
       > <EmptyTag></EmptyTag>
       > <EmptyTag/>
       >
       > both of which are equivalent but will checksum differently.

       And so? I don't see a problem here WRT digital signatures.

       >
       > 2. The PKI would also have to be distributed.

       Moot. See above.

       >
       > 3. If someone did decide to massage a record they could simply re-sign it with their own private key so the advantages could be illusory.

       Yep. But with that signed record you'd be able to determine who it was who made the modification - or merely re-signed the record.

       There's no doubt that there are problems with the idea of digitally signing a record/persona, not least how you'd go about handling the person who can't be bothered to obtain the appropriate software, generate and validate a key pair, or who, out of pure don't-give-a-damn contrariness, signs nothing he throws into the genealogy pool. Those we'll always have with us, I suspect.

       The advantage of digital signing, as I see it, is that if I sign a record and either publish it or send it directly to you, you can easily verify that it _did_ come from me and, based on whether you trust my research and conclusions, can decide whether to accept it or not. Conversely, if you receive or find something purporting to come from me but the signature check fails, you have reason to question the data - it's been changed in some unknown way.
       What can't be controlled is the case in which Snively Whiplash copies my information, deletes the signature and re-signs the otherwise unchanged information with his own key, claiming it as his own. That's already a problem, of course, for which there's no ready solution.

       Bob
       - --
       Robert G. Melson | Rio Grande MicroSolutions | El Paso, Texas
       - -----
       The greatest tyrannies are always perpetrated in the name of the noblest causes -- Thomas Paine
       -----BEGIN PGP SIGNATURE-----
       Version: GnuPG v2.0.17 (FreeBSD)

       iEYEARECAAYFAk3i0/4ACgkQGX60pjRVDrPl2wCfbLGOaQHuIP7hc4Tf9G45fnRn
       rTkAn3Jqq0x6KMz/bWOixbhn+dNV9blQ
       =DRW+
       -----END PGP SIGNATURE-----

    05/29/2011 11:17:00
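Bob's two claims (a valid signature shows the record came unchanged from the key holder; a failed check shows it was altered) and the Snively re-signing problem can be sketched with textbook RSA hash-then-sign. This is a toy, not a recipe: the keys are built from well-known Mersenne primes, the modulus is far too small, and there is no padding, so it offers no real security. Real use would go through GnuPG or a vetted cryptography library. `make_keypair`, `sign` and `verify` are hypothetical helpers, not any library's API.

```python
import hashlib

# TOY textbook RSA for illustration only: public Mersenne primes, a tiny
# modulus and no padding. Never use this to protect real data.


def make_keypair(p: int, q: int, e: int = 65537):
    """Return (public_key, private_key) built from distinct primes p and q."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)


def _digest(record: str, n: int) -> int:
    """Hash the record with SHA-256 and reduce it into the range [0, n)."""
    h = hashlib.sha256(record.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % n


def sign(record: str, private_key) -> int:
    n, d = private_key
    return pow(_digest(record, n), d, n)


def verify(record: str, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(record, n)


# Hypothetical keys for Bob and for Snively Whiplash.
bob_public, bob_private = make_keypair(2**61 - 1, 2**89 - 1)
snively_public, snively_private = make_keypair(2**107 - 1, 2**127 - 1)

if __name__ == "__main__":
    record = "Mortimer Smith, baptised 3 May 1801"
    bob_sig = sign(record, bob_private)

    print(verify(record, bob_sig, bob_public))                 # True: from Bob, unchanged
    print(verify(record + " (amended)", bob_sig, bob_public))  # False: altered after signing

    # Snively strips Bob's signature and re-signs the unchanged record.
    snively_sig = sign(record, snively_private)
    print(verify(record, snively_sig, snively_public))  # True: valid, but under Snively's key
    print(verify(record, snively_sig, bob_public))      # False: not Bob's signature
```

The last two lines show why re-signing cannot be prevented, only attributed: the copy verifies, but only against Snively's key.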
    1. Re: How Should We Store Evidence in Genealogical Databases?
    2. Ian Goddard
    3. Bob Melson wrote:
       >> 2. The PKI would also have to be distributed.
       >
       > Moot. See above.

       What I had in mind was this: someresource.org publishes a load of useful stuff and digitally signs it, with the public key also being published on the site. The records from this site are much used and passed from one researcher to another, so the records become part of a distributed database.

       10 years later someresource.org goes off-line - its funding runs out; it was a one-man operation and the one man loses interest or dies; whatever. Someone picks up a copy of one of someresource.org's records and wants to check the digital signature. They can't, because the public key is no longer there. Unless, of course, copies of the public key are also being distributed with the other records.

       OK, so instead of everyone publishing their keys alongside the record, they publish them at someotherresource.org. But 10 years later.....

       --
       Ian
       The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk

    05/30/2011 05:06:24
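Richard Smith's reply suggests that software keep a copy of every source's public key, which covers Ian's vanishing-key scenario and also exposes a substituted key. A minimal sketch of that idea as a "trust on first use" keyring, assuming keys are handled as opaque bytes and compared by SHA-256 fingerprint. The `Keyring` class, its method names and the sample key bytes are hypothetical; a real application would delegate this to an existing PKI such as a GnuPG keyring.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(public_key: bytes) -> str:
    """SHA-256 fingerprint of a public key's bytes."""
    return hashlib.sha256(public_key).hexdigest()


class Keyring:
    """Remembers every source's public key indefinitely (trust on first use)."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def observe(self, source: str, public_key: bytes) -> str:
        """Record a key seen for `source` and classify it.

        Returns "new" the first time a source is seen, "known" when the key
        matches the stored one, and "conflict" when a different key turns up
        (possibly a man-in-the-middle, possibly a legitimate key change that
        must be checked by other means). A conflicting key is not stored.
        """
        stored = self._keys.get(source)
        if stored is None:
            self._keys[source] = public_key
            return "new"
        if fingerprint(stored) == fingerprint(public_key):
            return "known"
        return "conflict"

    def lookup(self, source: str) -> Optional[bytes]:
        """Return the stored key, even if the source has since gone off-line."""
        return self._keys.get(source)


if __name__ == "__main__":
    ring = Keyring()
    print(ring.observe("someresource.org", b"original key bytes"))  # new
    print(ring.observe("someresource.org", b"original key bytes"))  # known
    print(ring.observe("someresource.org", b"substituted key"))     # conflict
    print(ring.lookup("someresource.org"))                          # b'original key bytes'
```

Because the keyring never forgets, a record from a defunct site can still be checked years later, provided the key was seen while the site was alive.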
    1. Trust in genealogical applications [Was: Re: How Should We Store Evidence in Genealogical Databases?]
    2. Richard Smith
    3. [Pulling together the replies to two posts by Ian Goddard as they seem to overlap.]

       On May 30, 11:40 am, Ian Goddard <[email protected]> wrote:
       > >> 3. If someone did decide to massage a record they could simply re-sign it with their own private key so the advantages could be illusory.
       >
       > > They've signed it with their private key rather than the original source's private key. This is easily detectable.
       >
       > Not really. You would end up with two versions of the record, each correctly signed with a private key. You only know which is the original version if you know the original source.

       Well, a signed piece of data is only as trustworthy as the signing party. All that the digital signature guarantees is that the data hasn't been changed since it was signed. How the data was produced is beyond the scope of a digital signature. Specifically: digital signatures are not designed to prevent or detect unauthorised copying.

       If you have two versions of the record, each correctly signed, then you have to make a decision on which (if either) party you trust. But isn't this just the same if you've only got one piece of signed data? Should you trust the author? Even today, with unsigned data, you have that problem.

       On May 30, 11:06 am, Ian Goddard <[email protected]> wrote:
       > Someone picks up a copy of one of someresource.org's records and wants to check the digital signature. They can't because the public key is no longer there. Unless, of course, copies of the public key are also being distributed with the other records.

       It's a common enough problem, and I think allowing for signed distribution of the old key is the usual way to handle it. It should be standard practice for applications to store the public key of any source they've ever used. This allows for detection of a man-in-the-middle attack -- in practice, we probably don't care too much about this in a genealogy application, but assuming we're using an existing PKI, we'll get it for free.
       Suppose someresource.org (who originally signed the data) is no more and you decide to send me a copy of some of the data that you had downloaded from the site. You have a copy of their public key (because your software stores these indefinitely), and a copy of the signed data. You sign both of these and pass them on to me, so I now have your public key, their public key signed by you, and their data signed by them and then by you.

       Perhaps I now make some inferences based on this data -- for example, how the individuals named in it are related. Some time later, I may wish to pass this on: so I sign your public key; add my signature to yours on the someresource.org key and on the data from them; and encode my inferences and sign them. I then hand over four things to the third party, who simultaneously gets my public key.

       What we're forming is a chain of trust. We all trust someresource.org to ship data of a good quality, double-checked against the sources. I trust you to be honest -- that is, not to alter the data from someresource.org and then re-sign it with a different key. It doesn't matter whether or not I trust you to be a good genealogist. The third party to whom I pass everything on needs to trust me to be honest and competent to make the inferences I have, and to trust you to be honest.

       A few years later, I might get a copy of the someresource.org key from someone else, and I can now verify that you were indeed being honest in passing the data on to me. This might make me more inclined to trust you in the future. And so on...

       > OK, so instead of everyone publishing their keys alongside the record they publish them at someotherresource.org. But 10 years later.....

       The solution there is to have lots of sites all storing keys, and all acting as backups of each other. This already exists. They're called public key servers. Assuming a genealogy application uses an existing PKI, it can simply make use of existing key servers.
       It's incredibly unlikely that all of these would vanish simultaneously, and because there are lots of them, someone wishing to compromise the system would need to compromise most of the key servers simultaneously. That's highly unlikely to happen.

       And with the advent of key servers, a large part of the chain of trust vanishes. If I got data from A via B, C, D and E, then without a key server, I need to trust each of these parties; but with a key server, I only need to trust everything back from the earliest link in the chain that has a key on a public key server. So if B, D and E all have keys on some key servers, then I only need to trust A and B; and, of course, the key servers themselves, but trust there is established by vote across many servers.

       The good thing about this sort of distributed system is that many of the problems have already been solved -- often by peer-to-peer networks with nefarious purposes.

       Richard

    05/29/2011 11:45:17
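Richard's A-to-E example reduces to a small rule: walk the chain from the original source and stop at the first party whose key can be fetched from a public key server; everyone up to and including that party has to be trusted, and later links can be checked against that key. The sketch below encodes the rule as Richard states it, together with a simple strict-majority vote across key servers. Both functions, their names, and the modelling of each server's answer as a fingerprint string are hypothetical illustrations of the argument, not the behaviour of any real key server.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Set


def parties_to_trust(chain: Sequence[str], on_key_server: Set[str]) -> List[str]:
    """Parties whose honesty must be taken on trust.

    `chain` runs from the original source to the last person who passed the
    data on. Everyone up to and including the first party with a key on a
    public key server must be trusted; later links can be verified.
    """
    for i, party in enumerate(chain):
        if party in on_key_server:
            return list(chain[: i + 1])
    return list(chain)  # no keys on any server: trust every link


def consensus_key(server_answers: Iterable[str]) -> Optional[str]:
    """Accept a key fingerprint only if a strict majority of servers agree."""
    answers = list(server_answers)
    if not answers:
        return None
    fingerprint, votes = Counter(answers).most_common(1)[0]
    return fingerprint if 2 * votes > len(answers) else None


if __name__ == "__main__":
    chain = ["A", "B", "C", "D", "E"]
    print(parties_to_trust(chain, {"B", "D", "E"}))  # ['A', 'B']
    print(parties_to_trust(chain, set()))            # ['A', 'B', 'C', 'D', 'E']
    print(consensus_key(["f1", "f1", "f2"]))         # f1
    print(consensus_key(["f1", "f2"]))               # None
```

The majority rule is what makes compromising the system require compromising most of the servers at once, as Richard says.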
    1. Re: How Should We Store Evidence in Genealogical Databases?
    2. Richard Smith
    3. On May 29, 11:23 pm, Ian Goddard <[email protected]> wrote:
       > So someone transcribes a record of ol' Mort & writes out his surname with a capitalised initial because that's what's in the original, and someone else transcribes the same record exactly except that they give the surname in upper case because that's the convention they adopt. Oops! Different checksums.
       [snip another similar example]

       The problem exists the other way around, too. Suppose I want to record that a relative (but I've forgotten exactly who) told me that my grandfather had a second cousin called John Smith who lived in London. There must be lots of John Smiths in London, many of whom are second cousins to someone's grandfather, so we get the same checksum for several different people (or for anecdotes about different people, if the checksum applies at the source level). Even if "my grandfather" is always interpreted as *my* grandfather, rather than the author's grandfather, it's quite possible that it might still refer to multiple people. After all, it's a common name; doubly so in the family of someone with the surname Smith.

       > I think there are two separate issues. One is the unambiguous identification of the same publication and the other is its authentication.
       >
       > The latter is the more difficult. Not only are there the sort of problems I've outlined above but also the problem of maintaining a PKI (public key infrastructure).
       >
       > When I suggested what was, in effect, a distributed database using exchanges of UUID-labeled XML records I considered the possibility of digital signing. On balance I decided to leave it out as:

       It's not a trivial problem, but large parts of it have been solved. "XML DSIG" is a good search term for more information.

       > 1. XML introduces even more variations. Consider, for example:
       >
       > <EmptyTag></EmptyTag>
       > <EmptyTag/>
       >
       > both of which are equivalent but will checksum differently.
       There's something called XML Canonicalisation that handles all of the low-level aspects of this. There may still be a need for application-level canonicalisation, if there are several ways of expressing the same information, such as:

         <persona id="p1">
           <name>John Smith</name>
           <relation type="father" ref="p2"/>
         </persona>
         <persona id="p2">
           <name>George Smith</name>
         </persona>

       and

         <persona id="p1">
           <relation type="father">
             <persona id="p2">
               <name>George Smith</name>
             </persona>
           </relation>
         </persona>

       > 2. The PKI would also have to be distributed.

       Most such infrastructures are already distributed. I think implementing a new PKI from scratch would be a lot of work for no obvious benefit, when compared to using an existing PKI.

       > 3. If someone did decide to massage a record they could simply re-sign it with their own private key so the advantages could be illusory.

       They've signed it with their private key rather than the original source's private key. This is easily detectable.

       Of course, if they were the original source and have subsequently decided to alter it and re-sign it, that's not something you can detect. But the point of cryptographic signatures is to allow you to verify that the data is in the form the author sent it in, not to verify when the author produced it and whether he's subsequently altered it.

       Richard

    05/29/2011 01:25:13
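Python's standard library (3.8 and later) ships a C14N 2.0 serializer, `xml.etree.ElementTree.canonicalize`, which shows the low-level canonicalisation Richard mentions: Ian's two `EmptyTag` forms, and attributes written with different quote characters, come out identical, so their hashes match. It does nothing for the application-level persona example, which needs a schema rule of its own. The helper name `canonical_sha256` is hypothetical, and a real signing scheme would follow the XML DSIG specification rather than this bare hash.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def canonical_sha256(xml_text: str) -> str:
    """SHA-256 of the C14N 2.0 form of an XML document."""
    canonical = canonicalize(xml_text)  # returns a str when no `out` is given
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # C14N 2.0 always writes empty elements as start/end tag pairs.
    print(canonicalize("<EmptyTag/>"))  # <EmptyTag></EmptyTag>

    # The raw texts hash differently...
    print(hashlib.sha256(b"<EmptyTag/>").hexdigest()
          == hashlib.sha256(b"<EmptyTag></EmptyTag>").hexdigest())  # False
    # ...but their canonical forms hash identically.
    print(canonical_sha256("<EmptyTag></EmptyTag>") == canonical_sha256("<EmptyTag/>"))  # True
    print(canonical_sha256("<name type='surname'/>")
          == canonical_sha256('<name type="surname"></name>'))  # True
```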
    1. Re: How Should We Store Evidence in Genealogical Databases?
    2. Ian Goddard
    3. Richard Smith wrote:
       > On May 29, 11:23 pm, Ian Goddard <[email protected]> wrote:
       >> 1. XML introduces even more variations. Consider, for example:
       >>
       >> <EmptyTag></EmptyTag>
       >> <EmptyTag/>
       >>
       >> both of which are equivalent but will checksum differently.
       >
       > There's something called XML Canonicalisation that handles all of the low-level aspects of this. There may still be a need for application-level canonicalisation, if there are several ways of expressing the same information, such as:
       >
       > <persona id="p1">
       >   <name>John Smith</name>
       >   <relation type="father" ref="p2"/>
       > </persona>
       > <persona id="p2">
       >   <name>George Smith</name>
       > </persona>
       >
       > and
       >
       > <persona id="p1">
       >   <relation type="father">
       >     <persona id="p2">
       >       <name>George Smith</name>
       >     </persona>
       >   </relation>
       > </persona>

       If the schema is sufficiently tied down this shouldn't be a problem - you'd only allow one form of expression.

       >> 3. If someone did decide to massage a record they could simply re-sign it with their own private key so the advantages could be illusory.
       >
       > They've signed it with their private key rather than the original source's private key. This is easily detectable.

       Not really. You would end up with two versions of the record, each correctly signed with a private key. You only know which is the original version if you know the original source.

       --
       Ian
       The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk

    05/30/2011 05:40:58
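Ian's answer, a schema that allows only one form of expression, can also be enforced by rewriting incoming records into that form before canonicalising and hashing. A minimal sketch, assuming the element names from Richard's example, a hypothetical `<records>` root wrapping the personas, that every nested persona carries an `id`, and that the permitted form is the flat one using `ref`. The function names are hypothetical, and the sample documents add p1's `<name>` to the nested form so that the two inputs genuinely carry the same information.

```python
import copy
import xml.etree.ElementTree as ET


def flatten_personas(root: ET.Element) -> ET.Element:
    """Return a copy of `root` in which every <relation> wrapping a nested
    <persona> becomes <relation ... ref="id"/>, with the nested persona
    hoisted to the top level after the persona that referred to it."""
    flat = ET.Element(root.tag, dict(root.attrib))

    def hoist(persona: ET.Element) -> None:
        out = ET.SubElement(flat, "persona", dict(persona.attrib))
        for child in persona:
            if child.tag != "relation":
                out.append(copy.deepcopy(child))
                continue
            nested = child.find("persona")
            attrs = dict(child.attrib)
            if nested is not None:
                attrs["ref"] = nested.get("id")  # assumes the nested persona has an id
            ET.SubElement(out, "relation", attrs)
            if nested is not None:
                hoist(nested)

    for persona in root.findall("persona"):
        hoist(persona)
    return flat


def normal_form(xml_text: str) -> str:
    """Flatten, then apply C14N 2.0 with insignificant whitespace stripped."""
    flat = flatten_personas(ET.fromstring(xml_text))
    return ET.canonicalize(ET.tostring(flat, encoding="unicode"), strip_text=True)


FLAT = """<records>
  <persona id="p1">
    <name>John Smith</name>
    <relation type="father" ref="p2"/>
  </persona>
  <persona id="p2">
    <name>George Smith</name>
  </persona>
</records>"""

NESTED = """<records>
  <persona id="p1">
    <name>John Smith</name>
    <relation type="father">
      <persona id="p2">
        <name>George Smith</name>
      </persona>
    </relation>
  </persona>
</records>"""

if __name__ == "__main__":
    print(normal_form(FLAT) == normal_form(NESTED))  # True
```

Hashing or signing `normal_form(...)` rather than the raw text then gives both encodings the same checksum.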
    1. Re: How Should We Store Evidence in Genealogical Databases?
    2. Wes Groleau
    3. On 05-29-2011 18:23, Ian Goddard wrote:
       > So someone transcribes a record of ol' Mort & writes out his surname with a capitalised initial because that's what's in the original, and someone else transcribes the same record exactly except that they give the surname in upper case because that's the convention they adopt. Oops! Different checksums.

       There are different checksum algorithms. One might write one that does certain trivial transforms before summing to avoid such differences.

       But even if the records are identical in content, typography, and layout, that alone doesn't guarantee they represent the same person. And since the checksum by itself is useless, the data must be sent, at which point you could do your own comparison without the checksum.

       --
       Wes Groleau
       There are two types of people in the world …
       http://Ideas.Lang-Learn.us/barrett?itemid=1157

    05/30/2011 07:03:39
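Wes's "trivial transforms before summing" can be sketched as a normalised checksum: fold case, collapse whitespace, and rewrite dates into ISO 8601 before hashing, which absorbs both of Ian's examples. The three-field layout, the `DATE_FORMATS` list (which assumes day-first dates, as in British registers) and the function names are all hypothetical. As Wes and Richard both point out, a matching checksum still does not prove that two records describe the same person.

```python
import hashlib
from datetime import datetime

# Hypothetical set of accepted input styles; day-first numeric dates assumed.
DATE_FORMATS = ("%d %B %Y", "%d %b %Y", "%Y-%m-%d", "%d/%m/%Y")


def normalise_date(text: str) -> str:
    """Rewrite a recognised date as ISO 8601; leave anything else untouched."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return text.strip()  # e.g. "Michaelmas 1801" is kept as written


def normalise_name(text: str) -> str:
    """Collapse runs of whitespace and fold case."""
    return " ".join(text.split()).casefold()


def normalised_checksum(surname: str, forename: str, date: str) -> str:
    """SHA-256 over the normalised fields, joined with a separator."""
    canonical = "|".join(
        [normalise_name(surname), normalise_name(forename), normalise_date(date)]
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = normalised_checksum("Smith", "Mortimer", "3 May 1801")
    b = normalised_checksum("SMITH", "Mortimer", "1801-05-03")
    c = normalised_checksum("smith ", "mortimer", "03/05/1801")
    print(a == b == c)                                                  # True
    print(a == normalised_checksum("Smith", "Mortimer", "4 May 1801"))  # False
```

The same normalisation also makes Richard's collision problem worse, not better: two different John Smiths with the same recorded details now hash identically even when their transcriptions differ.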