Tuesday, January 8, 2019

Hive: derive timezone from local timestamp and UTC(GMT) timestamp


PROBLEM
Derive time zone of the record. Calculate the hour difference (aka offset) between UTC and Local time stamps.
A table contains two column: 
  • UTC unix timestamp
  • Local time zone human readable time


See timezone offset mapping for the United States:  Time Zones in the United States

SOLUTION
In Hive 2.1.1-amzn-0 the following SQL will make a cut. The SQL statement converts to timestamp and extracts an HOUR interval difference.

SELECT cast(from_unixtime(cast(log_time_utc_sec AS bigint)) AS TIMESTAMP)
      -cast(log_time,                                       AS TIMESTAMPAS HR_DIFF
FROM mybigtable
LIMIT 10
;


Elaborating the example: 

SELECT log_time_utc_sec
      ,log_time
     ,hour(cast(from_unixtime(cast(log_time_utc_sec AS bigint)) AS TIMESTAMP)-cast(regexp_replace(log_time, '^((.*?-){2}.*?)-''$1 'AS TIMESTAMP)) AS HR_DIFF
,case hour(cast(from_unixtime(cast(log_time_utc_sec AS bigint)) AS TIMESTAMP)-cast(regexp_replace(log_time, '^((.*?-){2}.*?)-''$1 'AS TIMESTAMP))
       when 10 then 'HST'
       when  9 then 'AKST'
       when  8 then 'PST'
       when  7 then 'MST'
       when  6 then 'CST'
       when  5 then 'EST' 
else CONCAT('UTC-',hour(cast(from_unixtime(cast(log_time_utc_sec AS bigint)) AS TIMESTAMP)-cast(regexp_replace(log_time, '^((.*?-){2}.*?)-''$1 'AS TIMESTAMP)))
end TIMEZONE
FROM mybigtable
LIMIT 10
;