Saturday, February 8, 2020

Suspicious Numerals in the Forest of Representations

In speaking of representational ambiguity as it arises in connection with the Miser Project, I realize that such ambiguities should not be surprising. Representational ambiguities are plentiful in the daily lives of those reading this post. We handle them casually there, but they cause trouble when they are carried into computer-processed data.

Here’s a value that is relevant in my world: 800-00-0271. It has the form of a United States Social Security Number (SSN). It might be one, if it’s established that it was so assigned to someone by the US Social Security Administration, and if that is the intended usage.

Calling something a Social Security Number because it has the pattern of one is different from claiming it is the SSN of an identified person. In working around computers, we need to be more careful to distinguish what a data element is intended to designate from the bare data form, taken absent any context.

So far, “800” has not been used at the start of any SSN, and where I see it used, it is on the assumption that this will never change.

Something that 800-00-0271 can be is a Student ID Number appearing on a transcript from Regents College of the University of the State of New York.  I attest that I am that student. 

US educational institutions have tended to rely on students’ actual SSNs as identifiers for student records and other purposes. In the State of New York, at the time I registered, the State requested SSNs but did not require them; it was considered illegal to compel disclosure of an SSN for that purpose at that time. I declined and was provided with a unique identifier having the same form. It has SSN form because the Student ID Numbers in their systems are mostly SSNs and have that format. The “800” part is not found in any actual SSN (so far), so that ID number is safely commingled in the college’s records and databases indexed by Student ID Number, most of which are also SSNs.
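The difference between a value’s form and what it designates can be made concrete. The Python sketch below uses hypothetical names throughout: it checks only whether a string has SSN form, then applies a record-keeper’s reserved-prefix convention like the one described above. The “800” rule encodes this post’s assumption, not any Social Security Administration rule.

```python
import re

# Matches only the *form* of a US SSN: 3 digits, 2 digits, 4 digits.
SSN_FORM = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")

# Hypothetical convention: a prefix the college reserves for non-SSN IDs,
# on the assumption (as in the post) that it will never be issued as an SSN.
RESERVED_PREFIX = "800"

def has_ssn_form(value: str) -> bool:
    """True when the string merely looks like an SSN."""
    return SSN_FORM.fullmatch(value) is not None

def classify_student_id(value: str) -> str:
    """Say what can (and cannot) be concluded from the form alone."""
    match = SSN_FORM.fullmatch(value)
    if match is None:
        return "not SSN-shaped"
    if match.group(1) == RESERVED_PREFIX:
        return "locally assigned ID (SSN form only)"
    # Form cannot establish that the SSA issued this number to this person.
    return "possibly an SSN; needs outside confirmation"

print(classify_student_id("800-00-0271"))
```

Note that a match on the pattern never upgrades a value to “is an SSN”; the best the form can do is rule things out.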

Another case arose recently, with the date February 2, 2020 being represented as 02-02-2020. The question is, taking that as the representation of a date, is the form mm-dd-yyyy or dd-mm-yyyy? It can be either, and the form does not reveal the answer. In this case, it doesn’t matter: both forms are satisfied and can be taken to refer to the same date (given agreement on the calendar). The difference becomes apparent with other dates, which are recorded differently in different international contexts: 03-02-2020 is March 2 under one form and February 3 under the other. Both forms have been used in the USA in the past.
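A small Python sketch can test both readings of such a string. The helper names are hypothetical; the parsing uses the standard `datetime.strptime` with `%m-%d-%Y` and `%d-%m-%Y`, and a reading that names an impossible date (such as month 13) is reported as `None`.

```python
from datetime import date, datetime
from typing import Dict, Optional

FORMS = ("%m-%d-%Y", "%d-%m-%Y")  # mm-dd-yyyy, then dd-mm-yyyy

def read_as(text: str, fmt: str) -> Optional[date]:
    """Parse text with fmt, or return None if that reading is impossible."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None

def readings(text: str) -> Dict[str, Optional[date]]:
    """Map each candidate form to the date it yields (or None)."""
    return {fmt: read_as(text, fmt) for fmt in FORMS}

def is_ambiguous(text: str) -> bool:
    """True when both forms parse but name different days."""
    mm_dd, dd_mm = readings(text).values()
    return mm_dd is not None and dd_mm is not None and mm_dd != dd_mm

print(readings("02-02-2020"))
print(is_ambiguous("02-02-2020"), is_ambiguous("03-02-2020"))
```

Here 02-02-2020 is harmless because both readings agree, 03-02-2020 is genuinely ambiguous, and 13-02-2020 resolves itself because only one reading is a valid date.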

It might be easier now to understand the preference of the international standard (ISO 8601) for recording dates in the form yyyy-mm-dd. Although only about as well received in the USA as the metric system, this form does tend to be used in the internals of data systems. Once a time of day is attached, the time zone becomes relevant. Without one, the expanded form 2020-02-02T02:02 is still representationally ambiguous, assuming, of course, that it is intended to represent a local time somewhere on the planet. Then, what about Daylight Time? In Australia?
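To see how much a missing zone matters, here is a minimal Python sketch that attaches two different offsets to the same zone-less text and compares the resulting instants. The offsets are hard-coded assumptions for this particular date (US Eastern Standard Time, and Australian Eastern Daylight Time during the southern summer); a real system would look them up in a time zone database, which is exactly where Daylight Time rules live.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.fromisoformat("2020-02-02T02:02")  # no zone: not yet an instant

# Hard-coded offsets valid for this date only (assumption, not a lookup).
EST = timezone(timedelta(hours=-5))   # New York, winter
AEDT = timezone(timedelta(hours=11))  # Sydney, southern-summer Daylight Time

in_new_york = naive.replace(tzinfo=EST)
in_sydney = naive.replace(tzinfo=AEDT)

print(in_new_york.astimezone(timezone.utc).isoformat())
print(in_sydney.astimezone(timezone.utc).isoformat())
print(in_new_york - in_sydney)  # same wall-clock text, instants 16 hours apart
```

The same characters “2020-02-02T02:02” denote 07:02 UTC on February 2 in one reading and 15:02 UTC on February 1 in the other.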

[added 2020-02-09T11:03] The “T11:03” here resolves another representational ambiguity. It is in 24-hour time, so there need be no concern whether it is AM or PM (the same minute in the evening would be T23:03).
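For illustration only, a short Python sketch contrasts the 24-hour ISO form with a 12-hour rendering that drops the AM/PM marker. It uses the standard `isoformat(timespec="minutes")` and `strftime("%I:%M")`; the two timestamps are the morning and evening cases from the note above.

```python
from datetime import datetime

morning = datetime(2020, 2, 9, 11, 3)
evening = datetime(2020, 2, 9, 23, 3)

# 24-hour ISO form: the two times are distinct.
print(morning.isoformat(timespec="minutes"))  # 2020-02-09T11:03
print(evening.isoformat(timespec="minutes"))  # 2020-02-09T23:03

# 12-hour clock without an AM/PM marker: the distinction is lost.
print(morning.strftime("%I:%M"), evening.strftime("%I:%M"))  # 11:03 11:03
```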

[added 2020-02-09T10:53] Although it took a few years too many for Microsoft Outlook to deal with local times and time zones, it now does so, and I can record on my calendar a trip starting in one local time (departure) and ending in a different local time (arrival) on the same or a different date. I can also record times from two time zones along a day-calendar page. I use a pair: UTC and whatever local time zone I awoke in on a particular day. The tricky parts are the Daylight Time switch-overs and the fact that the Daylight Time status of a past date is not retained. Recurring items can get a little wonky.
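Outlook’s internals aren’t shown here; this is only a Python sketch of why a recurring item pinned to local wall-clock time drifts against UTC across a Daylight Time switch-over. It assumes an IANA time zone database is available to the standard `zoneinfo` module (on Windows that may mean installing the `tzdata` package), and the helper name is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

def occurrences_in_utc(local_dates, hour=9, minute=0, zone=NEW_YORK):
    """Hypothetical helper: a meeting fixed at local hour:minute, reported in UTC."""
    return [
        datetime(y, m, d, hour, minute, tzinfo=zone).astimezone(timezone.utc)
        for (y, m, d) in local_dates
    ]

# Two Mondays one week apart, straddling the US switch to Daylight Time (2020-03-08).
before, after = occurrences_in_utc([(2020, 3, 2), (2020, 3, 9)])
print(before.isoformat())  # 2020-03-02T14:00:00+00:00
print(after.isoformat())   # 2020-03-09T13:00:00+00:00
```

The meeting never moves on the local calendar, yet its UTC time shifts by an hour, which is why a calendar showing UTC alongside local time can look wonky around the switch.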

[added 2020-02-09T19:10Z] (with the “Z” for Zulu signifying UTC) One case I hadn’t figured out how to manage in all of this is timestamps on my public web sites. I author pages on a local machine, and file timestamps are in my local time, the way they have always been presented on the Windows file system. When I FTP changed pages to the public site, they get a different timestamp. So that my FTP utility can check timestamps and upload only pages newer than those already there, I have to specify a time offset so that it gets the comparison right. I fumble this far too often, and passing in and out of Daylight Time messes me up even more. I would love to just use UTC in this context and have not figured out how to accomplish it. Hmmm. Maybe if I fudge the clock on the local web-site development server?
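One way to take the offset out of the comparison, sketched in Python under stated assumptions: the local modification time is read as seconds since the epoch (as `os.path.getmtime()` returns it, independent of the display time zone), and the server reports remote file times in the UTC `YYYYMMDDHHMMSS[.sss]` form of the FTP `MDTM` command described in RFC 3659. Whether a particular server or FTP utility supports `MDTM` is an assumption, and both function names are hypothetical.

```python
from datetime import datetime, timezone

def remote_mtime_utc(mdtm_value: str) -> datetime:
    """Parse an MDTM-style 'YYYYMMDDHHMMSS[.sss]' value as a UTC timestamp."""
    return datetime.strptime(mdtm_value[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def needs_upload(local_mtime: float, remote_mdtm: str) -> bool:
    """True if the local file (epoch seconds) is newer than the remote copy."""
    local_utc = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return local_utc > remote_mtime_utc(remote_mdtm)

edited = datetime(2020, 2, 9, 19, 10, tzinfo=timezone.utc).timestamp()
print(needs_upload(edited, "20200209190000"))  # remote copy older: upload
print(needs_upload(edited, "20200209192000"))  # remote copy newer: skip
```

Because both sides are compared as UTC instants, neither the local time zone nor Daylight Time enters the comparison. With Python’s `ftplib`, `FTP.sendcmd("MDTM " + name)` returns the raw reply, whose text after the `213 ` code would be the value passed in.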
