Bob has touched upon some classic philosophical issues. Perhaps by exploring them a bit we can arrive at a better grasp of the significance of data and their interpretation.

The first thing that needs to be done is to be clear about the difference between data and facts. Data refer to organized information, usually observational information and (necessarily) a set of associated premises (axioms). Data have nothing to do with truth, but instead are provided with justification. For example, if the birth certificate of person A indicates that the father was B, that information represents a datum. It may be that person C was actually the father, but the birth certificate statement nevertheless remains a datum.

Premises are obviously not proved, but only justified, for otherwise they would not be "premises". The justification of our premises does not refer to their truth, but to the reasons why we select them. We choose premises for a variety of reasons, such as their moral implications, their aesthetics, or their utility. Our data are likewise justified by the reasons why we adopt them and take them as data: what did the birth certificate actually state; how do I know that and how did I find out; are they relevant to the issue in hand?

Data lack truth value. As Bob points out, data may be erroneous for reasons that are subjective, but that is only because we manage to misread a datum. However, it remains a datum. If in the notes I took when inspecting the birth certificate I managed to misspell the father's name, that would be an error on my part, but that misinformation is still a datum. It is known to be incorrect only after I acquire some knowledge of the facts of the matter, but it does not for that reason cease being a datum, for my note remains. Only now I am aware of the fact that the inference I drew from the datum is false. Despite what the certificate claims, DNA evidence might show that "in fact" person C, not person B, was the actual father.
Here we employ different data that point to a fact different from our original inference drawn from the birth certificate. It is a "fact" because we have reason to believe it to be more likely true than what we inferred from reading the certificate.

A "fact" is a statement about something that has "truth value". However, truth value is a relation. Truths are only true in reference to something else. In the special theory of relativity, the size and mass of something depend on its frame of reference. In quantum mechanics (Heisenberg indeterminacy), the observation of data changes the truth of what is observed. In the philosophy of science, all observations entail unproven observational hypotheses (Lakatos).

Establishing the truth of the fact depends on a framework consisting of procedures for ascertaining truth and a body of knowledge that is generally presumed to be true. In serious discourse, the truth of a fact is argued in terms of a specific discipline, such as physical science, historiography, or genealogy. These sciences not only convey a body of facts that can be taken as true, but also a framework for establishing or testing such facts. This frame of reference in relation to which truth is established (so that hypothesis might become theory or fact) represents its environment. The environment of a system is anything with which the system has a relevant causal relation, and obviously, and not just in human sciences, an observer or student of the system must enter into a causal relation with it.

Since we have a causal relation with a system under study, facts are in part socially constructed. This has led to dismay because some have taken it to imply "subjectivism" - the reduction of truth to just fashion or personal whim. This is not so, of course. To insist upon the subjective component in truth does not obviate the truth value of facts, but merely points out that no truth is absolutely self-contained and that each necessarily entails a greater whole of which it is a part.
A truth isolated from context, from the whole, is not for that reason false, but instead is what is often called "one sided". It has truth value, but a limited one. The universal laws in physics are an artifact of laboratory isolation. This does not falsify those laws, but makes them a one-sided aspect of a world that does not reduce to rigid universal laws. As often pointed out, such laws don't really explain anything, for they are only observations of general behavior.

Bob notes that "History is aided by facts when they are available, but useful history can be built based on analysis of subjective information as well". Indeed, a historical theory is socially constructed from facts that the historical profession or some other authority says are valid, or that are supported by some argumentation, and so are taken to be true. He hints at the distinction between a fact and a scientific theory, such as history or genealogy. Both are socially constructed, but their utility is in relation to different things. A fact has utility in relation to a theory; a theory constructed from the facts has utility in relation to society, including our understanding and activity in the world.

A nice way to define the relation of data to the construction of facts and theory is to see the data as constraints on the possible facts or theories we might construct by using them. More accurately, data constrain the probability distribution of the truth value of the possible socially constructed facts and theories.

-- Haines Brown, KB1GRM
Haines Brown <brownh@teufel.hartford-hwp.com> wrote:

>Bob has touched upon some classic philosophical issues. Perhaps by
>exploring them a bit we can arrive at a better grasp of the significance
>of data and their interpretation.
>
>The first thing that needs to be done is to be clear about the
>difference between data and facts.
>...

I find nothing in your summary objectionable. But you omitted one concept I don't fully understand myself in its technical aspects, but still use metaphorically when thinking about data and facts.

That concept is the one of "fuzzy" truth values (and thereby fuzzy logic). Very seldom can we assign perfect truth or perfect falsehood to an alleged fact, and sometimes we aren't all that sure what standard of truth we want to apply to a putative fact.

Hugh has three possible ancestors to his earliest "proven" ancestor. He could, by whatever means he chose, assign a fuzzy truth value to each of them, thereby entertaining all three possibilities at once. If he ever gets further information, the truth values he assigns might change.

In computer terms, since this is a computing newsgroup ...

One of the problems with data display in modern genealogy programs is that there is usually no way to communicate fuzzy truth. In Legacy, I can assign a confidence level to a source for a particular datum, but unless someone digs down into the innards of my data and looks at that confidence level, they'll never see it - it doesn't figure into any of the displays or reports above the obscure footnote level that in fact probably nobody will ever read, including me. So I never bother filling it in.

If I had 3 different possible ancestors, and could assign weights of probable truth to them, I'd kinda like to see some sort of probability-tree display, so that when I show the pedigree of X, I can somehow see the range of possibilities, the confidence level, and of course which choice is currently the most likely.
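The idea of weighting three possible ancestors and revising the weights when further information arrives can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights are treated as subjective probabilities that sum to 1 (which is not the same thing as fuzzy truth values), and the names `Candidate`, `normalize`, `reweigh` and `most_likely`, the labels P1 to P3, and all the numbers are hypothetical. Nothing here reflects how Legacy or any other genealogy program stores confidence levels.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str      # hypothetical label for one possible ancestor
    weight: float  # researcher's current subjective weight (non-negative)


def normalize(candidates):
    """Turn raw weights into shares that sum to 1."""
    total = sum(c.weight for c in candidates)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive weight")
    return {c.name: c.weight / total for c in candidates}


def reweigh(shares, support):
    """Revise the shares after new evidence.

    `support` maps a name to a factor saying how well the new evidence
    fits that candidate; candidates not listed keep a factor of 1.
    """
    revised = {name: share * support.get(name, 1.0)
               for name, share in shares.items()}
    total = sum(revised.values())
    if total <= 0:
        raise ValueError("the evidence rules out every candidate")
    return {name: value / total for name, value in revised.items()}


def most_likely(shares):
    """Name of the candidate with the largest share."""
    return max(shares, key=shares.get)


# Made-up starting weights for three possible ancestors.
candidates = [Candidate("P1", 3.0), Candidate("P2", 2.0), Candidate("P3", 1.0)]
shares = normalize(candidates)

# Made-up new evidence that fits P2 four times better than the others.
updated = reweigh(shares, {"P2": 4.0})

print(most_likely(shares), most_likely(updated))
```

With these made-up numbers the most likely candidate changes once the new evidence is taken into account, while all three possibilities stay on record with their current shares, which is the information a probability-tree display would need.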
I envision a tree display that would display the most likely choice, but perhaps if you hovered on a particular link, it might display multiple trees overlaid, with different colors or densities based on relative likelihoods.

I could imagine that with proper calculation that I don't myself know how to do, that immediate ancestors that one is sure of, would show up dark and boldly colored, and as one works back up the tree towards Adam and Eve, the lower probabilities of the data being factual would show up less bold and dimmer. The lines one has more evidence for would be strong, and the lines with weaker evidence or multiple possibilities would show up weak (and expand into the multiple options being displayed in the appropriate user-interface conditions).

But I haven't thought this through, much less spec'd it out enough that I could expect someone to write software for it. But in all the debates on this forum regarding data models, when we have numerous definitions of the relation "father" as well as the inherent uncertainty of the data which assigns a particular label "father" to a relation, being able to report and display multiple options graphically or hypertextually (or whatever other newfangled means someone might come up with) would seem to be an aspect of the problem worth consideration. After all, the data model should take into consideration the forms of input and output that are needed.

I've probably rambled too long, on only a couple hours sleep, so I'll stop here and let you or others shoot me down. %^)

lojbab
Bob LeChevalier <lojbab@lojban.org> writes:

> Haines Brown <brownh@teufel.hartford-hwp.com> wrote:
> I find nothing in your summary objectionable. But you omitted one
> concept I don't fully understand myself in its technical aspects, but
> still use metaphorically when thinking about data and facts.
>
> That concept is the one of "fuzzy" truth values (and thereby fuzzy
> logic). Very seldom can we assign perfect truth or perfect falsehood
> to an alleged fact, and sometimes we aren't all that sure what
> standard of truth we want to apply to a putative fact.

This gets kinda difficult. Yes, our observations are approximate. I'll never forget studying standard deviation as a neophyte freshman ;-(. But this obscures the difference between objective probability (processes are actually rather random), and subjective probability (we are ignorant or our measurements are inaccurate).

Fuzzy logic is an aspect of set theory that addresses degrees of truth, not probabilities of fact. The difference is often taken as being between subjective truth in the former case, and objective truth in the latter. The latter engages such things as a probabilistic causality that are quite independent of the observer. This is often brought up in the context of quantum mechanics, but it is of general application. Einstein insisted that God does not play dice, but now we all know better. On the other hand, fuzzy logic supports partial membership in a set so that our logical statements can accommodate loose categories.

> Hugh has three possible ancestors to his earliest "proven" ancestor.
> He could, by whatever means he chose, assign a fuzzy truth value to
> each of them, thereby entertaining all three possibilities at once.
> If he ever gets further information, the truth values he assigns might
> change.

Yes, there are many examples. We might identify ourselves as "Americans", as Black, as male, etc. Each represents a truth in a certain conceptual framework.
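The notion of partial membership in a set can be made concrete with a small Python sketch. It assumes the common min/max (Zadeh) operators for combining degrees of membership; the two categories and their degrees are hypothetical values chosen only for illustration.

```python
def fuzzy_and(a, b):
    """Degree to which both statements hold (min operator)."""
    return min(a, b)


def fuzzy_or(a, b):
    """Degree to which at least one statement holds (max operator)."""
    return max(a, b)


def fuzzy_not(a):
    """Degree to which a statement does not hold."""
    return 1.0 - a


# Hypothetical degrees to which one person belongs to two loose
# occupational categories found in the records.
membership = {"farmer": 0.7, "merchant": 0.4}

both = fuzzy_and(membership["farmer"], membership["merchant"])
either = fuzzy_or(membership["farmer"], membership["merchant"])
not_farmer = fuzzy_not(membership["farmer"])

# Unlike probabilities over mutually exclusive alternatives, degrees of
# membership need not add up to 1.
total = sum(membership.values())

print(both, either, not_farmer, total)
```

The point of the contrast is that the two degrees here add up to more than 1, which would make no sense for probabilities assigned to mutually exclusive candidates but is unremarkable for membership in loose, overlapping categories.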
You might have some weak evidence that points to descent from a particular person, but you are not entirely sure. Is this a question of objective truth or of subjective truth where fuzzy logic would apply? Generally there is really only one father (as DNA would show), but we are ignorant of just who it is. So maybe fuzzy logic would be appropriate.

> One of the problems with data display in modern genealogy programs is
> that there is usually no way to communicate fuzzy truth. In Legacy, I
> can assign a confidence level to a source for a particular datum, but
> unless someone digs down into the innards of my data and looks at that
> confidence level, they'll never see it - it doesn't figure into any of
> the displays or reports above the obscure footnote level that in fact
> probably nobody will ever read, including me. So I never bother
> filling it in.

Very interesting. What you are saying is that a printout (on a printed page or in a browser) cannot accurately reflect the reality of the lineage with all its uncertainties. Normally when one prints something, what is printed is static, unambiguous. Is it possible to print uncertainty? A conventional way might be to use a dotted line rather than a solid line. Lines with different colors are another option, although that would be unconventional and would require the display of a color key to indicate degrees of uncertainty.

> I envision a tree display that would display the most likely choice,
> but perhaps if you hovered on a particular link, it might display
> multiple trees overlaid, with different colors or densities based on
> relative likelihoods.

That seems easy to do in CSS, but I suspect it is like your footnotes: people are unaware of the other possibilities until their mouse hovers over a person, fact or relation. For that matter, such a hover could easily cause a pop-up that provides the information about the uncertainty.
But I get the impression your aim is to have the uncertainty immediately obvious rather than depend on the visitor to the site pursuing more information. Of course, nearly all display software does not use CSS, and since most of it is proprietary, there's not much you can do about changing appearances. All I know is that if one can display a lineage in CSS, what you suggest can be done.

Incidentally, I was once interested in how to render a descendant report entirely in CSS. Here's my little experiment: www.hartford-hwp.com/genealogy/Brown/brown-1.html . I didn't try to develop this little rendition experiment because no one seemed particularly interested.

> I could imagine that with proper calculation that I don't myself know
> how to do, that immediate ancestors that one is sure of, would show up
> dark and boldly colored, and as one works back up the tree towards
> Adam and Eve, the lower probabilities of the data being factual would
> show up less bold and dimmer. The lines one has more evidence for
> would be strong, and the lines with weaker evidence or multiple
> possibilities would show up weak (and expand into the multiple options
> being displayed in the appropriate user-interface conditions).

There are surely practical difficulties, but increasing transparency and fading (quite different things) are both easily handled in CSS.

-- Haines Brown, KB1GRM
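As a rough illustration of the fading idea discussed above, the following Python sketch multiplies hypothetical per-link confidence values along one line of descent and turns the running result into CSS `opacity` rules. It assumes the links are independent, so that confidence in a remote ancestor is the product of the confidences of every link leading to it. The selector names (`.gen-1` and so on), the `floor` parameter and the numbers are all made up for the example.

```python
def path_confidences(link_confidences):
    """Cumulative confidence at each generation, working back in time.

    Each input value is the confidence (0 to 1) in one parent-child
    link. Links are assumed independent, so the values are multiplied.
    """
    result = []
    running = 1.0
    for confidence in link_confidences:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")
        running *= confidence
        result.append(running)
    return result


def opacity_rule(selector, confidence, floor=0.15):
    """Build a CSS rule that fades an element according to confidence.

    `floor` keeps very uncertain ancestors faintly visible rather than
    letting them disappear altogether.
    """
    opacity = max(floor, confidence)
    return f"{selector} {{ opacity: {opacity:.2f}; }}"


# Hypothetical confidence in each successive link of one lineage.
links = [1.0, 0.9, 0.5, 0.5]
confidences = path_confidences(links)

rules = [opacity_rule(f".gen-{generation}", confidence)
         for generation, confidence in enumerate(confidences, start=1)]

print("\n".join(rules))
```

Because the values are multiplied, the display can only get dimmer as one moves back up the tree, never brighter, which matches the behavior described in the quoted paragraph. A real renderer would also have to decide how to show several competing parents at once, which this sketch does not attempt.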