On Monday, July 4, 2016 at 7:05:46 PM UTC+1, Stewart Baldwin wrote:
> On 7/4/2016 6:56 AM, WJH wrote:
> >
> > Never one to refuse a challenge, here are some starting thoughts. To my mind these could apply to any genealogical "wiki" more or less irrespective of its specific purpose.
>
> Obviously, most of your items have the potential to generate long separate discussions. I will make some brief comments on a few of them and some general comments.

I've interpolated a few responses and brought some other points together to address what is emerging as the key issue at the end hereof.

> > b. multiple alternates for elements such as dates, places, names etc. (FS - P)
>
> I'm not sure exactly what this means. If it means that alternate spellings, etc. should be indicated, then there should be an effort to keep it from cluttering things too much. If it means providing alternates in cases of genuine controversy, then I believe that the entry should be "unknown" ("uncertain", etc.), with a separate discussion of the issues.
>
> > d. merging that preserves data as alternates (FS - N)
>
> If done right to begin with, there should be little need for merging. When necessary, merging should be done with extreme care, one individual at a time, with reference to the primary sources in case of conflict. In cases of controversy, merging cannot reasonably be done by someone unfamiliar with the primary evidence, and any data that is clearly wrong should not be kept (or should be placed in a clearly marked "trash bin").
>
> > e. bitlock style data history to ensure auditability (FS - P)
>
> I have no idea what that is.

Essentially it's the ultimate in database integrity software: every change is logged and auditable.

> > f. data deletion to require justification (FS - P)
>
> "No known source" should be sufficient reason for deletion (or temporarily placing in a "trash bin").
>
> > i.
> > provision for alternate "universes" where data is capable of more than one interpretation, ideally with ability to view side-by-side (FS - N)
>
> Maintaining data structures and allowing the flexibility needed for specific cases are in clear conflict here. I'm not sure how you could do this, as the presentation of complicated issues almost always needs to be organized by some creative process if it is to be comprehensible.
>
> > j. automatic recognition / translation of old and new style dates in appropriate periods (FS - N)
>
> Clearly, it is a good idea to have an O.S./N.S. label, but in my opinion, "translation" from O.S. to N.S. is almost always a bad idea, except where some sort of chronological discussion is necessary. (Conversions from other calendars, like the Islamic or Jewish calendars, are another story). Enough flexibility has to be left for unusual cases. Before, after, and "boxed-in" (between) dates need to be supported. Dates obtained by calculation should be clearly labelled, e.g., b. ca. 1535 (calc.), or b. ca. 1535 (aged 26 in 1561 [with source]). A clear distinction needs to be made between approximate dates (dates proven by known evidence to be close to the indicated date, usually indicated by "about" or "circa/ca.") and estimated dates (dates estimated by more general considerations, such as generation length or typical age at marriage, usually indicated in the scholarly genealogical literature by the word "say", e.g. "b. say 1520").

I'm really thinking about the transition period (which I accept doesn't concern medievalists) where it's not always clear whether a date is OS or NS and so any comparison with other sources needs to bear in mind that that date may be equivalent to one 11 days earlier OR later.

> > 4. data elements to be colour-coded according to source: e.g.
> > reddish - no source; orangey - secondary source; yellowy - primary source transcription; greeny - primary source viewed; bluey - primary source image attached. Fixed formats for sources defined by moderators. (FS - N).
>
> In my opinion, including any unsourced information in the principal account of an individual is a very bad idea, especially for medieval individuals. If you really want a color scheme, how about reddish for a secondary source which does not cite primary evidence, and "orangey" for a secondary source which gives the primary evidence (preferably cited indirectly). In the medieval period, if you can't find a pre-Internet published source for some information in somebody's GEDCOM file, then it is almost certainly bad information, and one of the points of such an exercise is to help prevent the spread of bad information. If you want to keep the information as a potential clue, why not have a separate "scratch-paper" page for the individual (preferably with a prominent in-your-face dire warning that the information might be false) containing unconfirmed information that might be used as a finding aid (and perhaps as an incentive for those who do have the proof to produce it). Such a page (or a separate page) might also include disproven or discredited information on the individual, perhaps with a "red flag" attached to data that is suspicious but has not actually been disproven.

Agreed - especially since it's easier to prove a negative than a positive...

> However, in the medieval period, unsourced dates of birth or places of birth or death are usually just guesswork, and should just be scrapped to start with, unless there is some indication that the data might provide a useful clue.

Also agreed

> > 5.
> > A separate place name database, subject to similar rules as regards sources, with different matrices e.g. Saxon shires & hundreds, Domesday, English historic, English administrative pre-1973, registration districts etc. with relevant dates. (A particular bugbear of mine stemming from being a Geographer at heart) (FS - P)
>
> And if you are going to do this, separate lists of office holders, with appropriate links, seem like a good idea.

Agreed

> > 6. ideally linked archive of sources in public domain to permit internal rather than external links. (FS - P)
>
> A huge project in itself. However, as appealing as items 5 and 6 are, spreading the efforts too thin seems like a recipe for disaster. If high quality is the main goal, it is best to start small.

Also agreed, but at this stage I feel we're dealing with ideals rather than practicalities...

> One additional issue that is unavoidable in genealogy, and especially difficult in any thinly documented period or place, is the problem of identification. Many (probably even a majority) of the errors which occur in genealogy are due either directly or indirectly to the incorrect identification of individuals of the same (or similar) names appearing in two or more different records. When genealogists are trying to verify research done by others in the primary sources, the step that they are most likely to overlook is verifying that the John Smiths appearing in two different records were in fact the same man.

You have no idea how many William Haddocks there were within a day's travel of Wearside by 1700 and the problem gets worse in the 19th century. The problem's even worse if you're interested in Hudsons or Robsons....

> When genealogists are researching a large number of individuals with the same surname, one of the problems they face is sorting out who was who, and even when the identifications are all correct, the reasoning behind them is not always adequately spelled out.
> Even for the experienced genealogist writing up his/her results, deciding how much discussion to devote to identification of individuals can be difficult. Some such discussions would amount to "beating a dead horse" (e.g., a man with an unusual name appearing in many records in a given location, with no indications of a namesake in the area), and sometimes the identifications are clear from the other evidence presented (at least to an experienced genealogist), even if they are not explicitly discussed.
>
> A good "reference" website would need a reasonable way of dealing with the identification of individuals. If an identification is well documented, but not reasonably clear in the context of the evidence mentioned (e.g., in cases of significant geographical moves), there should at least be a comment to see such-and-such a source for proof that the John Smith in record #1 is the same as the John Smythe in record #2. Cases involving significant disagreement among scholars regarding identification are much more difficult, and extremely numerous in medieval genealogy. See, for example, the page on Hunroch, count of Ternois, in the Henry Project. Some way needs to be found to deal with such difficult situations, preferably without catering to the fill-in-the-blank mentality of many amateur genealogists. The more that amateurs are forced to think independently about such issues, and the more they are led to the better sources which allow them to do so in an informed manner, then the more they might become experienced enough to make worthwhile contributions to such a project.
>
> There is one obvious question that hasn't been mentioned yet. Who would host such a website and provide the necessary funding? Who would have ultimate control?
> Any profit-based website is likely to eventually get corrupted by forces that are more interested in profit than in accuracy, and any genuinely high-quality genealogical website is going to have to reject the work of many amateurs, some of whom would not be happy that their "ancestors" were rejected (i.e., "angry customers"). I think that some sort of academic-based support would be necessary to ensure the necessary quality.

Again, that's a practicality, which I agree is important, but falls outside the request to create a set of rules... Having said that, my suspicion is these issues might well end up making an organisation that's not in it for profit quite attractive, which might mean Wikipedia or it might mean our friends in Utah.

> Stewart Baldwin

The big issue that emerges from this is the need to allow users to identify what is proven "fact" and what is "hypothesis" requiring subsequent confirmation. The rules I've proposed try to deal with that in a number of different ways (colour coding, etc.; allowing alternates; use of long-form identification, parallel universes etc.) some of which are effectively the same as Stewart's methods (e.g. scratch paper) but ultimately it will be the boundary between the two that the moderators will need to police. If we get that right most of the rest will probably fall into place.

A related issue is how concepts such as Occam's razor or the balance of probabilities should be allowed to be used. Do we just allow users or moderators to assign percentage probabilities to different scenarios with the most likely being accepted pro tem, or does the fact that there's another possibility, however unlikely, mean that the simplest idea should be forever treated as "unproven"? Looking at it another way, what is the burden of proof threshold before a "hypothesis" becomes a fact, or do we have some sort of sliding scale?

Another issue that I suspect will have to be dealt with at some point is the impact of DNA analysis on questions of "legitimacy" of descent and parental identification. Putting it crudely, how do we deal with unacknowledged or unknown bastards? (There's been a recent court case in the UK that has reassigned a clan chiefdom based on DNA evidence even though the illegitimacy happened while the mother was married and her husband accepted the son as his own...)

NB: one wrinkle I didn't address directly is that every date field should have a linked field containing qualifiers such as before, after, about etc.

Regards
James
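
The linked qualifier field mentioned in the NB above, together with the "about" / "say" / "(calc.)" distinctions quoted earlier in the thread, could be sketched roughly as follows. This is only an illustration in Python: the names `Qualifier` and `QualifiedDate` are hypothetical and do not come from FamilySearch or any existing system, and years alone are used to keep it short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Qualifier(Enum):
    """Hypothetical qualifier values stored alongside a date field."""
    EXACT = ""
    BEFORE = "bef."
    AFTER = "aft."
    ABOUT = "ca."       # approximate: known evidence puts the event close to this date
    SAY = "say"         # estimated from general considerations (generation length etc.)
    BETWEEN = "bet."    # "boxed-in" date, needs a second year


@dataclass(frozen=True)
class QualifiedDate:
    year: int
    qualifier: Qualifier = Qualifier.EXACT
    end_year: Optional[int] = None   # used only with BETWEEN
    calculated: bool = False         # True when derived, e.g. from a stated age

    def __post_init__(self):
        # end_year must be present for BETWEEN and absent otherwise
        if (self.qualifier is Qualifier.BETWEEN) != (self.end_year is not None):
            raise ValueError("end_year is required for BETWEEN and only for BETWEEN")

    def label(self) -> str:
        """Render the date the way it might appear in a written account."""
        if self.qualifier is Qualifier.BETWEEN:
            text = f"bet. {self.year} and {self.end_year}"
        elif self.qualifier is Qualifier.EXACT:
            text = str(self.year)
        else:
            text = f"{self.qualifier.value} {self.year}"
        if self.calculated:
            text += " (calc.)"
        return text


print(QualifiedDate(1535, Qualifier.ABOUT, calculated=True).label())
print(QualifiedDate(1520, Qualifier.SAY).label())
```

Keeping the qualifier as a separate field rather than free text means "ca." and "say" cannot be confused when records are compared or merged.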

On 06/07/16 12:12, WJH wrote:
> I'm really thinking about the transition period (which I accept doesn't concern medievalists) where it's not always clear whether a date is OS or NS and so any comparison with other sources needs to bear in mind that that date may be equivalent to one 11 days earlier OR later.

Forget 11 days - it can be a year out if you don't know what date was taken for the start of the year!

-- 
Hotmail is my spam bin. Real address is ianng at austonley org uk
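
To make the 11-days-versus-a-year point concrete, here is a minimal sketch in Python that lists the possible readings of a date whose calendar style is unknown. The function names are hypothetical, and the only alternative year start considered is 25 March (Lady Day); other year starts were used elsewhere and are not covered. The Julian-to-Gregorian step goes through the Julian Day Number.

```python
from datetime import date


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar (O.S.) date to the Gregorian calendar.

    No validation of the input is attempted; this is a sketch.
    """
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Julian Day Number of the Julian-calendar date
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # date.fromordinal() counts from 0001-01-01 (Gregorian), which is JDN 1721426
    return date.fromordinal(jdn - 1721425)


def candidate_readings(year: int, month: int, day: int) -> dict:
    """Return the Gregorian equivalents of a date under each assumed convention."""
    readings = {}
    try:
        readings["N.S. as written"] = date(year, month, day)
    except ValueError:
        pass  # e.g. 29 Feb 1700 exists only in the Julian calendar
    readings["O.S., year from 1 Jan"] = julian_to_gregorian(year, month, day)
    if (month, day) < (3, 25):
        # Under a 25 March year start, 1 Jan to 24 Mar belong to the END of the
        # stated year, i.e. to the following year in 1 January reckoning.
        readings["O.S., year from 25 Mar"] = julian_to_gregorian(year + 1, month, day)
    return readings


for convention, when in candidate_readings(1700, 2, 10).items():
    print(f"{convention}: {when.isoformat()}")
```

For a record dated 10 February 1700 this gives three candidates: 10 Feb 1700, 20 Feb 1700 and 21 Feb 1701, so a comparison between sources has to allow for all of them until the convention used is established.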