PROBLEM:
This is to prevent data loss of fractional datatypes precision during the data transfer into target table, which otherwise will implicit happen. E.g. precision of a decimal datatype will be reduced to fit source value into the target field.
For data migration from source system to another target database, we need to identify precision (p) and scale (s) in of DOUBLE PRECISION, REAL, or FLOAT datatype in order to create target table with explicitly stated datatype DECIMAL (p,s).
SOLUTION:
In this case we migrating data from Netezza NPS 7.2 to Hortonworks Hive 1.2 Main idea is to convent fractional part of a number to string and count its length:
select
max(
length(
trim(
trim(
to_char(
DECIMAL_COL01-int8(DECIMAL_COL01) /* <- chop off integers part: 1234.56789 - 1234 = .56789 */
,'9.999999999999999999999999999999') /* <- append char with zeros: .56789 -> 0.567890000000000000000000000000 */
) /* <- trim blank spaces: */
,'0') /* <- trim leading and trailing '0': .56789 */
)-1) /* <- lenght of a string without '.': */
as MAX01 from TESTDB..MYNUMERICS;
PUTTING THIS ALL TOGETHER:
/* Source table definition */
CREATE TABLE TESTDB..MYNUMERICS
(
DECIMAL_COL01 DOUBLE PRECISION,
DECIMAL_COL02 DOUBLE PRECISION,
DECIMAL_COL03 DOUBLE PRECISION,
DECIMAL_COL04 DOUBLE PRECISION,
DECIMAL_COL05 DOUBLE PRECISION,
DECIMAL_COL06 DOUBLE PRECISION,
DECIMAL_COL07 DOUBLE PRECISION,
DECIMAL_COL08 DOUBLE PRECISION,
DECIMAL_COL09 DOUBLE PRECISION,
DECIMAL_COL10 DOUBLE PRECISION,
DECIMAL_COL11 DOUBLE PRECISION,
DECIMAL_COL12 DOUBLE PRECISION
) DISTRIBUTE ON RANDOM;
SELECT * FROM TESTDB..MYNUMERICS ORDER BY RANDOM() LIMIT 1000;
Get maximum scale (s) for all fractional columns:
/* Get lenght of FRACTIONAL PART part, e.g. how many digits precision after decimal point */
select
max(length(trim(trim(to_char(DECIMAL_COL01-int8(DECIMAL_COL01),'9.999999999999999999999999999999')),'0'))-1) as MAX01 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL02-int8(DECIMAL_COL02),'9.999999999999999999999999999999')),'0'))-1) as MAX02 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL03-int8(DECIMAL_COL03),'9.999999999999999999999999999999')),'0'))-1) as MAX03 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL04-int8(DECIMAL_COL04),'9.999999999999999999999999999999')),'0'))-1) as MAX04 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL05-int8(DECIMAL_COL05),'9.999999999999999999999999999999')),'0'))-1) as MAX05 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL06-int8(DECIMAL_COL06),'9.999999999999999999999999999999')),'0'))-1) as MAX06 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL07-int8(DECIMAL_COL07),'9.999999999999999999999999999999')),'0'))-1) as MAX07 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL08-int8(DECIMAL_COL08),'9.999999999999999999999999999999')),'0'))-1) as MAX08 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL09-int8(DECIMAL_COL09),'9.999999999999999999999999999999')),'0'))-1) as MAX09 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL10-int8(DECIMAL_COL10),'9.999999999999999999999999999999')),'0'))-1) as MAX10 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL11-int8(DECIMAL_COL11),'9.999999999999999999999999999999')),'0'))-1) as MAX11 --- DOUBLE PRECISION,
,max(length(trim(trim(to_char(DECIMAL_COL12-int8(DECIMAL_COL12),'9.999999999999999999999999999999')),'0'))-1) as MAX12 --- DOUBLE PRECISION,
from TESTDB..MYNUMERICS;
/* Get lenght of WHOLE NUMBER part*/
select
max(length(trim(to_char(int8(DECIMAL_COL01),'999999999999999999999999999999')))) as MAX01 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL02),'999999999999999999999999999999')))) as MAX02 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL03),'999999999999999999999999999999')))) as MAX03 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL04),'999999999999999999999999999999')))) as MAX04 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL05),'999999999999999999999999999999')))) as MAX05 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL06),'999999999999999999999999999999')))) as MAX06 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL07),'999999999999999999999999999999')))) as MAX07 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL08),'999999999999999999999999999999')))) as MAX08 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL09),'999999999999999999999999999999')))) as MAX09 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL10),'999999999999999999999999999999')))) as MAX10 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL11),'999999999999999999999999999999')))) as MAX11 --- DOUBLE PRECISION,
,max(length(trim(to_char(int8(DECIMAL_COL12),'999999999999999999999999999999')))) as MAX12 --- DOUBLE PRECISION,
from TESTDB..MYNUMERICS;
Outcome:
The DOUBLE PRECISION columns in this data set is safe to set as DECIMAL(12,10) in target Hive table to guarantee no data loss, which could implicitly happen during the data load transfer.
RELATED LINKS:
- Data types and aliases, IBM® PureData™ System for Analytics, Version 7.2.1
- Hive Data Types: Numeric Types
No comments:
Post a Comment