Tony Proctor wrote:
> "Ian Goddard" <[email protected]> wrote in message
> news:[email protected]
>> Tony Proctor wrote:
>>> "Ian Goddard" <[email protected]> wrote in message
>>> news:[email protected]
>>>> Tony Proctor wrote:
>>>>> ISO 8601 is definitely good John, if you know the date. As a software
>>>>> architect I was always surprised how few companies - particularly in
>>>>> the US - were aware of it.
>>>>
>>>> Actually it's a problem when used for genealogy. It mandates Gregorian.
>>>> Once you get back to 1752 (in UK & colonies) dates in sources will be
>>>> given in Julian. If you adhere to the standard you'll have to convert
>>>> them to Gregorian, which will confuse anyone comparing your data with
>>>> the sources.
>>>>
>>>>> However, in genealogy - and historical data in general - the date is
>>>>> not always known exactly. As you point out, you may know the
>>>>> 'quarter', say for a registration (as opposed to the BMD event
>>>>> itself), or you may know the month of an event but not the day. You
>>>>> might similarly know the decade (e.g. 1950s) or the century (e.g.
>>>>> 1800s).
>>>>>
>>>>> Any product for storing or representing such data would therefore
>>>>> have to consider arbitrary degrees of uncertainty and be able to
>>>>> store what you know without having to 'bend the rules' or somehow
>>>>> cheat the product. I did spend some time researching the development
>>>>> of a 'universal object model' (I won't elaborate - unless asked - but
>>>>> this isn't the same as a storage format or database design) for
>>>>> representing historical data, but the paid job got in the way.
>>>>
>>>> OK, consider yourself asked! It's something I've given the occasional
>>>> thought to without coming to any useful conclusion. It would be
>>>> straightforward enough to have a date class with subclasses for
>>>> Julian, Gregorian, Britain and colonies, etc. But when I considered
>>>> the possibility of devising a data type for object-relational
>>>> databases such as Informix or Postgres, the need for comparison code
>>>> crops up. How do you collate 01 Jun 1810, 1810, "first quarter of
>>>> C19th" and "early C19th"?
>>>>
>>>> --
>>>> Ian
>>>>
>>>> The Hotmail address is my spam-bin. Real mail address is iang
>>>> at austonley org uk
>>>
>>> SQL databases have made a complete pig's ear of date/time support, Ian.
>>> They fail to acknowledge the difference between a point in time
>>> (technically called a 'timestamp') and a calendar or clock marker
>>> (e.g. a d.o.b. or a wake-up time). The first is independent of any
>>> time-zone change and also independent of any particular date/time
>>> representation. It would normally be stored as an offset - say in
>>> milliseconds - from some absolute (apologies to Einstein there) epoch.
>>> The latter would need to store the elements of the date/time
>>> separately (e.g. y/m/d), and time-zones would not be applicable to it.
>>
>> If I've followed you correctly, it looks as if PostgreSQL provides both
>> (see http://www.postgresql.org/docs/8.4/static/datatype-datetime.html),
>> in that it allows for timestamps with and without time zone, although
>> it keeps both as an offset time.
>>
>> In addition, both Informix and PostgreSQL provide for user-defined
>> types, so it would be perfectly possible to create a data type storing
>> each element separately. Providing, of course, subtracting July from
>> August doesn't give January as a result ;) In other words, ensure that
>> you use interval types as required.
>>
>> However, I think this is the least of our worries compared to calendar
>> issues and the imprecision of some sources.
>>
>>> Anyway, no relational database that I'm aware of has a single data
>>> type that could be used for all the cases required in genealogy. It
>>> would have to be some clever composite.
>>>
>>> An object model, though, is a run-time representation of the data
>>> rather than a stored format. If such a standard model existed then it
>>> could allow interoperability between products without mandating a
>>> particular database or storage format. Good news if you want the
>>> option of moving your data between products, or of picking and
>>> choosing some analysis or reporting product without changing your
>>> database. In effect, it would allow software products to specialise as
>>> either database/storage/query or reporting/analysis. Well, that's the
>>> theory, although I can't see it happening any time soon. As is usually
>>> the case, vendors want complete control over everything. Standards
>>> like that come too late, if at all.
>>>
>>> Such a model might allow an analysis product to contrast/compare two
>>> different databases, e.g. yours and someone else's. It might also be
>>> applicable to 'cloud computing' (depending on its design), where you
>>> could publish your data in a shared location and allow various
>>> disparate products to access it, subject to security controls of
>>> course.
>>
>> The runtime format can't be cleanly unpicked from storage, however.
>> Queries and reporting both need to have workable collating sequences.
>> If you search, for instance, for data with a date between 01 Jan 1810
>> and 31 Dec 1810, do you return data where the date was simply given as
>> "first quarter of the C19th"? And where, in a sorted report, do you
>> include something with that sort of date? But a genealogical system
>> would need to handle that sort of material presented at runtime and be
>> able to store it usefully.
>>
>> --
>> Ian
>>
>> The Hotmail address is my spam-bin. Real mail address is iang
>> at austonley org uk
>
> With dates in particular, Ian, there's a lot that can be done. For
> instance, irrespective of the representation of a date, or the calendar
> being used (e.g. Gregorian, Julian), a date can be compiled into a
> timestamp which would be a simple "long integer" that relational
> databases excel at using for queries. The original representation of
> the date would probably need to be stored for viewing/editing purposes,
> but the database would work from the compiled version for both
> unambiguity and efficiency.
>
> Given that all historical dates have some degree of uncertainty, they
> all implicitly represent an interval. For instance, the start/end of a
> day, or quarter, or year, etc. Hence, any attempt to correlate dates
> should always treat them as potentially-overlapping intervals.

With a date type, as opposed to a date-time type, a date is atomic, of
course.

To go back to my original thoughts on coding comparisons: for many cases
it's possible, providing the S/W doesn't get thrown by situations such as:

A = B
A = C
B < C

(A is a lower-resolution date such as 1810, and B and C are
higher-resolution dates such as Jan 1810 and 23 Apr 1810.)

The real problem is in the vaguer dates, such as the catalogued date of
"early C14th" I've seen given for a document, although I suspect it to be
earlier than one which gave itself an exact date in (IIRC) 1295.

And, of course, a good genealogical dating system shouldn't blink if given
a date such as June 31 if that's what the original document said. It
should just quietly collate it between June 30 and July 1.

--
Ian

The Hotmail address is my spam-bin. Real mail address is iang
at austonley org uk
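Ian's A = B, A = C, B < C example can be made concrete by treating each
uncertain date as an inclusive interval of day ordinals, with overlapping
intervals comparing "equal" (i.e. neither is known to precede the other).
This is a hypothetical Python sketch; the `FuzzyDate` class and its
constructors are illustrative, not from any product mentioned in the
thread:

```python
from datetime import date

class FuzzyDate:
    """An uncertain date stored as an inclusive interval of
    proleptic-Gregorian day ordinals (sketch only)."""

    def __init__(self, start, end):
        self.start, self.end = start.toordinal(), end.toordinal()

    @classmethod
    def year(cls, y):
        return cls(date(y, 1, 1), date(y, 12, 31))

    @classmethod
    def month(cls, y, m):
        # Last day of the month = day before the 1st of the next month.
        last = date(y + m // 12, m % 12 + 1, 1).toordinal() - 1
        return cls(date(y, m, 1), date.fromordinal(last))

    @classmethod
    def day(cls, y, m, d):
        return cls(date(y, m, d), date(y, m, d))

    # Overlapping intervals are "equal": neither definitely precedes.
    def __eq__(self, other):
        return self.start <= other.end and other.start <= self.end

    # Strictly earlier only if the whole interval ends before the other.
    def __lt__(self, other):
        return self.end < other.start

A = FuzzyDate.year(1810)        # "1810"
B = FuzzyDate.month(1810, 1)    # "Jan 1810"
C = FuzzyDate.day(1810, 4, 23)  # "23 Apr 1810"

print(A == B, A == C, B < C)    # → True True True
```

Note the resulting ordering is only partial, exactly as the example shows:
A equals both B and C, yet B still sorts before C.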
"Ian Goddard" <[email protected]> wrote in message
news:[email protected]

[...]

> To go back to my original thoughts on coding comparisons: for many cases
> it's possible, providing the S/W doesn't get thrown by situations such
> as:
>
> A = B
> A = C
> B < C
>
> (A is a lower-resolution date such as 1810, and B and C are
> higher-resolution dates such as Jan 1810 and 23 Apr 1810.)
>
> [...]
>
> And, of course, a good genealogical dating system shouldn't blink if
> given a date such as June 31 if that's what the original document said.
> It should just quietly collate it between June 30 and July 1.

You'll have to excuse me if I misunderstand, Ian.
If "1810" represented anywhere in that year, "Jan 1810" represented
anywhere in that January, and "23 Apr 1810" represented anywhere in that
day, then each could be represented by the two ends of an equivalent
"interval" that are compiled into the internal format used by the
software, i.e. 1810-01-01/1810-12-31, 1810-01-01/1810-01-31, and
1810-04-23/1810-04-23 respectively.

The date analysis then becomes a matter of handling "intervals" where
there may be equality, partial overlap, no overlap, or enclosure (i.e. a
smaller interval being totally inside a bigger interval, such as "23 Apr
1810" being enclosed by "1810").

Tony Proctor
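The four interval relations listed above - equality, partial overlap, no
overlap, and enclosure - can be classified mechanically from the two
(start, end) pairs. A minimal hypothetical sketch in Python, using day
ordinals as the compiled internal form:

```python
from datetime import date

def relation(a, b):
    """Classify two inclusive (start, end) ordinal intervals as one of:
    'equal', 'no overlap', 'encloses', 'enclosed by', 'partial overlap'."""
    if a == b:
        return "equal"
    if a[1] < b[0] or b[1] < a[0]:
        return "no overlap"
    if a[0] <= b[0] and b[1] <= a[1]:
        return "encloses"        # b lies wholly inside a
    if b[0] <= a[0] and a[1] <= b[1]:
        return "enclosed by"     # a lies wholly inside b
    return "partial overlap"

def interval(start, end):
    """Compile a pair of calendar dates into an ordinal interval."""
    return (start.toordinal(), end.toordinal())

year_1810 = interval(date(1810, 1, 1), date(1810, 12, 31))   # "1810"
apr_23    = interval(date(1810, 4, 23), date(1810, 4, 23))   # "23 Apr 1810"

print(relation(year_1810, apr_23))   # → encloses
```

A query such as "between 01 Jan 1810 and 31 Dec 1810" then reduces to
returning every record whose interval relation to the query interval is
anything other than "no overlap".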