PROBLEM
Derive time zone of the record. Calculate the hour difference (aka offset) between UTC and Local time stamps.
A table contains two column:
- UTC unix timestamp
- Local time zone human readable time
See timezone offset mapping for the United States: Time Zones in the United States
SOLUTION
In Hive 2.1.1-amzn-0 the following SQL will make a cut. The SQL statement converts to timestamp and extracts an HOUR interval difference.
SELECT cast(from_unixtime(cast(log_time_utc_sec AS bigint)) AS TIMESTAMP)
-cast(log_time, AS TIMESTAMP) AS HR_DIFF
FROM mybigtable
LIMIT 10
;
Elaborating the example:
SELECT log_time_utc_sec
,log_time
,hour(cast(from_unixtime(cast(log_time_utc_sec AS bigint)) AS TIMESTAMP)-cast(regexp_replace(log_time, '^((.*?-){2}.*?)-', '$1 ') AS TIMESTAMP)) AS HR_DIFF
,case hour(cast(from_unixtime(cast(log_time_utc_sec AS bigint)) AS TIMESTAMP)-cast(regexp_replace(log_time, '^((.*?-){2}.*?)-', '$1 ') AS TIMESTAMP))
when 10 then 'HST'
when 9 then 'AKST'
when 8 then 'PST'
when 7 then 'MST'
when 6 then 'CST'
when 5 then 'EST'
else CONCAT('UTC-',hour(cast(from_unixtime(cast(log_time_utc_sec AS bigint)) AS TIMESTAMP)-cast(regexp_replace(log_time, '^((.*?-){2}.*?)-', '$1 ') AS TIMESTAMP)))
end TIMEZONE
FROM mybigtable
LIMIT 10
;