Note: The Rootsweb Mailing Lists will be shut down on April 6, 2023. (More info)
RootsWeb.com Mailing Lists
Total: 3/3
    1. Re: think when starting: formats (dates, places, ...)
    2. Charles Ellson
    3. On Sun, 07 Aug 2011 11:11:37 +0100, Ian Goddard <[email protected]> wrote: >Tony Proctor wrote: >> "Ian Goddard" <[email protected]> wrote in message >> news:[email protected] >>> Tony Proctor wrote: >>>> "Ian Goddard" <[email protected]> wrote in message >>>> news:[email protected] >>>>> Tony Proctor wrote: >>>>>> ISO 8601 is definitely good John, if you know the date. As a software >>>>>> architect I was always surprised how few companies - particularly in >>>>>> the US - were aware of it. >>>>> Actually it's a problem when used for genealogy. It mandates Gregorian. >>>>> Once you get back to 1752 (in UK & colonies) dates in sources will be >>>>> given in Julian. If you adhere to the standard you'll have to convert >>>>> them to Gregorian which will confuse anyone comparing your data with the >>>>> sources. >>>>> >>>>>> However, in genealogy - and historical data in general - the date is >>>>>> not always known exactly. As you point out, you may know the 'quarter', >>>>>> say for a registration (as opposed to the BMD event itself), or you may >>>>>> know the month of an event but not the day. You might similarly know >>>>>> the decade (e.g. 1950's) or the century (e.g. 1800's). >>>>>> >>>>>> Any product for storing or representing such data would therefore have >>>>>> to consider arbitrary degrees of uncertainty and to be able to store >>>>>> what you know without having to 'bend the rules' or somehow cheat the >>>>>> product. I did spend some time researching into the development of a >>>>>> 'universal object model' (I won't elaborate - unless asked - but this >>>>>> isn't the same as a storage format or database design) for representing >>>>>> historical data but the paid job got in the way. > >>>>> OK, consider yourself asked! It's something I've given the occasional >>>>> thought to without coming to any useful conclusion. It would be >>>>> straightforward enough to have a date class with subclasses for Julian, >>>>> Gregorian, Britain and colonies etc. But when I considered the >>>>> possibility of devising a data type for object relational databases such >>>>> as Informix or Postgres the need for comparison code crops up. How do >>>>> you collate 01 Jun 1810, 1810, "first quarter of C19th" and "early >>>>> C19th"? >>>>> >>>>> -- >>>>> Ian >>>>> >>>>> The Hotmail address is my spam-bin. Real mail address is iang >>>>> at austonley org uk >>>> SQL databases have made a complete pig's ear of date/time support Ian. >>>> They fail to acknowledge the different between a point in time >>>> (technically called a 'timestamp') and a calendar or clock marker (e.g. a >>>> d.o.b or a wake-up time). The first is independent of any time-zone >>>> change and also independent of any particular date/time representation. >>>> It would normally be stored as an offset - say in milliseconds - from >>>> some absolute (apologies to Einstein there) epoch. The latter would need >>>> to store the elements of the date/time separately (e.g. y/m/d) and >>>> time-zones would not be applicable to it. >>> If I've followed you correctly it looks as if Postgresql provides both: >>> (see http://www.postgresql.org/docs/8.4/static/datatype-datetime.html ) in >>> that it allows for timestamps with and without timezone although it keeps >>> both as an offset time. >>> >>> In addition both Informix and Postgresql provide for user-defined types so >>> it would be perfectly possible to create a data type storing each element >>> separately. Providing, of course, subtracting July from August doesn't >>> give January as a result ;) In other words ensure that you use interval >>> types as required. >>> >>> However I think this is the least of our worries compared to calendar >>> issues and the imprecision of some sources. >>> >>>> Anyway, no relational database that I'm aware of has a single data-type >>>> that could be used for all the cases required in genealogy. It would have >>>> to be some clever composite. >>>> >>>> An object model, though, is a run-time representation of the data rather >>>> than a stored format. If such a standard model existed then it could >>>> allow interoperability between products without mandating a particular >>>> database or storage format. Good news if you want the option of moving >>>> your data between products, or of picking-and-choosing some analysis or >>>> reporting product without changing your database. In effect, it would >>>> allow the software products to specialise as either >>>> database/storage/query or reporting/analysis. Well, that's the theory >>>> although I can't see it happening any time soon. As is usually the case, >>>> vendors want complete control over everything. Standards like that come >>>> too late, if at all. >>>> >>>> Such a model might allow an analysis product to contrast/compare two >>>> different databases, e.g. yours and someone else's. It might also be >>>> applicable to 'cloud computing' (depending on its design) where you could >>>> publish your data in a shared location and allow various disparate >>>> products to access it, subject to security controls of course. > >>> The runtime format can't be cleanly unpicked from storage however. Queries >>> and reporting both need to have workable collating sequences. If you >>> search, for instance, for data with a date between 01 Jan 1810 and 31 Dec >>> 1810 do you return data where the date was simply given and first quarter >>> of the C19th? And where, in a sorted report, do you include something >>> with that sort of date? But a genealogical system would need to handle >>> that sort of material presented at runtime and to be able to store it >>> usefully. >>> >>> -- >>> Ian >>> >>> The Hotmail address is my spam-bin. Real mail address is iang >>> at austonley org uk >> >> With dates in particular Ian, there's a lot that can be done. For instance, >> irrespective of the representation of a date, or the calendar being used >> (e.g. Gregorian, Julian), a date can be compiled into a timestamp which >> would be a simple "long integer" that relational databases excel at using >> for queries. The original representation of the date would probably need to >> be stored for viewing/editing purposes but the database would work from the >> compiled version for both unambiguity and efficiency. >> >> Given that all historical dates have some degree of uncertainty, they all >> implicitly represent an interval. For instance, the start/end of a day, or >> quarter, or year, etc. Hence, any attempt to correlate dates should always >> treat them as potentially-overlapping intervals. > >With a date type as opposed to a date-time type a date is atomic, of course. > >To go back to my original thoughts on coding comparisons for many cases >it's possible providing the S/W doesn't get thrown by situations such as: > >A = B >A = C >B < C > >(A is a lower resolution data such as 1810 and B and C are higher >resolution dates such as Jan 1810 and 23 Apr 1810) > >The real problem is in the vaguer dates such as the catalogued date of >early C14th I've seen given for a document although I suspect it to be >earlier than one which gave itself an exact date in (IIRC) 1295. > >And, of course, a good genealogical dating system shouldn't blink if >given a date such as June 31 if that's what the original document said. > It should just quietly collate it between June 30 and July 1. > It should at the least ask "are you sure?" even if it still accepts it but even then ought to have some kind of flag to indicate that the information is anomolous.

    08/07/2011 02:32:11
    1. Re: think when starting: formats (dates, places, ...)
    2. Ian Goddard
    3. Charles Ellson wrote: > On Sun, 07 Aug 2011 11:11:37 +0100, Ian Goddard > <[email protected]> wrote: >> And, of course, a good genealogical dating system shouldn't blink if >> given a date such as June 31 if that's what the original document said. >> It should just quietly collate it between June 30 and July 1. >> > It should at the least ask "are you sure?" even if it still accepts it > but even then ought to have some kind of flag to indicate that the > information is anomolous. Actually I was just thinking about the indexing mechanism in the background which doesn't talk to the user directly. But you're right, of course; the front end should challenge the user in such a situation. -- Ian The Hotmail address is my spam-bin. Real mail address is iang at austonley org uk

    08/07/2011 03:04:47
    1. Re: think when starting: formats (dates, places, ...)
    2. Wes Groleau
    3. On 08-07-2011 15:32, Charles Ellson wrote: > On Sun, 07 Aug 2011 11:11:37 +0100, Ian Goddard >> And, of course, a good genealogical dating system shouldn't blink if >> given a date such as June 31 if that's what the original document said. >> It should just quietly collate it between June 30 and July 1. >> > It should at the least ask "are you sure?" even if it still accepts it > but even then ought to have some kind of flag to indicate that the > information is anomolous. I wish some systems wouldn't take so literally GEDCOM's dictates that a month without a year is not a date. -- Wes Groleau There are two types of people in the world … http://Ideas.Lang-Learn.us/barrett?itemid=1157

    08/07/2011 06:29:27