Wednesday, November 2, 2016

Netezza: data transfer speed, Fluid Query JDBC vs. nz_migrate

TEST CASE:
Compare how fast data can be copied between two Netezza appliances:
  1. Using a Netezza Fluid Query connection, which is a Java JDBC driver
  2. Using nz_migrate, which is nzbackup/nzrestore under the hood

RESULT:
nz_migrate is about 20 times faster than JDBC in ASCII transfer mode, and about 80 times faster in BINARY mode.
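
For reference, a minimal nz_migrate invocation in binary mode might look like the sketch below. The hostnames, database, and table name are placeholders, and the script path and exact options can vary by release (check nz_migrate -h):

[nz@netezza ~]$ # source/target hosts, database, and table below are placeholders
[nz@netezza ~]$ /nz/support/bin/nz_migrate -shost nps_source -thost nps_target \
                    -sdb TESTDB -tdb TESTDB -t TEST_1GB_TABLE -format binary -threads 4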

DETAILS

Copying a 1 GB table from one system to another:


Tuesday, October 25, 2016

Netezza: find ALTERED tables that need GROOM ... VERSIONS




Also see related: Netezza: ALTER TABLE to MODIFY a column datatype


In Aginity, run this SQL to list all ALTERED tables that need GROOM ... VERSIONS:

SELECT  tab.database AS "DATABASE"
       ,tab.schema   AS "SCHEMA"
       ,MAX ( CASE WHEN tab.objclass = 4910              THEN SUBSTR(tab.objname, 3)
                   WHEN tab.objclass IN (4951,4959,4963) THEN NULL
                   ELSE tab.objname
              END ) AS "TABLENAME"
       ,TO_CHAR (NVL(SUM(used_bytes),0), '999,999,999,999,999,999') AS "SIZE (BYTES)"
       ,COUNT(DISTINCT(DECODE(objclass,4959,objid,4963,objid,null))) AS "# OF VERSIONS"
FROM            _V_OBJ_RELATION_XDB AS tab
LEFT OUTER JOIN _V_SYS_OBJECT_DSLICE_INFO ON
(    tab.objid = _V_SYS_OBJECT_DSLICE_INFO.tblid
 AND _V_SYS_OBJECT_DSLICE_INFO.tblid > 200000 )
WHERE tab.objclass IN (4905,4910,4940,4959,4951,4953,4961,4963)
  AND tab.objid > 200000
GROUP BY "DATABASE", "SCHEMA", tab.visibleid
HAVING "# OF VERSIONS" >= 2
ORDER BY "DATABASE", "SCHEMA", "TABLENAME";



On the Netezza host, run the script to identify the altered tables:

[nz@netezza ~]$ /nz/support/contrib/bin/nz_altered_tables

# Of Versioned Tables         18
     Total # Of Versions      38

  Database   |  Schema  |             Table Name       | Size (Bytes)       | # Of Versions
-------------+----------+------------------------------+--------------------+---------------
 TESTDB      | MYUSERSD | FILTER_DATA                  |         16,515,072 |      2
 TESTDB      | MYUSERSD | MASTER_DATA_TESTS            |          7,864,320 |      3
 TESTDB      | MYUSERSD | PREMIUM_TEST_TABLE_DATA      |          6,815,744 |      2
 TESTDB      | MYUSERSD | REPORT_MASTER                |         25,427,968 |      2
 TESTDB      | MYUSERSD | UTILI_TEST_TABLE_DATA        |          6,291,456 |      2
 TESTDB      | MYUSERSD | WORK_TEST_TABLE_DATA         |                  0 |      2
 TSTDB2      | MYUSERSD | AUDIT_IDW_DATALOAD           |         31,195,136 |      2
 TSTDB2      | MYUSERSD | ANOTHER_SUBSET_FILTER_DATA   |         40,501,248 |      2
 TSTDB2      | MYUSERSD | MASTER_MASTER_DATA_TESTS     |          5,898,240 |      3
 TESTDB5     | MYUSERSD | ORDER_ITEMS                  |    110,182,924,288 |      2
 TESTDB5     | MYUSERSD | LARGE_DATA_DIM               |          6,422,528 |      2
 TESTDB5     | MYUSERSD | ORGANIZATION_DIM             |         62,914,560 |      2
 TESTDB67    | ADMIN    | ALLEGED_DATA_SUBSET          |                  0 |      2
 TESTDB67    | ADMIN    | AUDIT_DOWNLOAD               |          1,703,936 |      2
 TESTDB67    | ADMIN    | TESTDATA_TABLE_1             |         30,146,560 |      2
 TESTDB67    | ADMIN    | TESTDATA_TABLE_2             |                  0 |      2
 TESTDB67    | ADMIN    | TESTDATA_TABLE_3             |                  0 |      2
 TESTDB67    | ADMIN    | TESTDATA_TABLE_4             |                  0 |      2
(18 rows)



On the Netezza host, run the script with -groom to GROOM the altered tables:

[nz@netezza ~]$ /nz/support/contrib/bin/nz_altered_tables -groom

# Of Versioned Tables         18
     Total # Of Versions      38

  Database   |  Schema  |             Table Name       | Size (Bytes)       | # Of Versions
-------------+----------+------------------------------+--------------------+---------------
 TESTDB      | MYUSERSD | FILTER_DATA                  |         16,515,072 |      2
 TESTDB      | MYUSERSD | MASTER_DATA_TESTS            |          7,864,320 |      3
 TESTDB      | MYUSERSD | PREMIUM_TEST_TABLE_DATA      |          6,815,744 |      2
 TESTDB      | MYUSERSD | REPORT_MASTER                |         25,427,968 |      2
 TESTDB      | MYUSERSD | UTILI_TEST_TABLE_DATA        |          6,291,456 |      2
 TESTDB      | MYUSERSD | WORK_TEST_TABLE_DATA         |                  0 |      2
 TSTDB2      | MYUSERSD | AUDIT_IDW_DATALOAD           |         31,195,136 |      2
 TSTDB2      | MYUSERSD | ANOTHER_SUBSET_FILTER_DATA   |         40,501,248 |      2
 TSTDB2      | MYUSERSD | MASTER_MASTER_DATA_TESTS     |          5,898,240 |      3
 TESTDB5     | MYUSERSD | ORDER_ITEMS                  |    110,182,924,288 |      2
 TESTDB5     | MYUSERSD | LARGE_DATA_DIM               |          6,422,528 |      2
 TESTDB5     | MYUSERSD | ORGANIZATION_DIM             |         62,914,560 |      2
 TESTDB67    | ADMIN    | ALLEGED_DATA_SUBSET          |                  0 |      2
 TESTDB67    | ADMIN    | AUDIT_DOWNLOAD               |          1,703,936 |      2
 TESTDB67    | ADMIN    | TESTDATA_TABLE_1             |         30,146,560 |      2
 TESTDB67    | ADMIN    | TESTDATA_TABLE_2             |                  0 |      2
 TESTDB67    | ADMIN    | TESTDATA_TABLE_3             |                  0 |      2
 TESTDB67    | ADMIN    | TESTDATA_TABLE_4             |                  0 |      2
(18 rows)

A 'GROOM TABLE <tablename> VERSIONS;' will now be performed on each of the above tables.
========================================================================================
Grooming TESTDB.MYUSERSD.FILTER_DATA    @ 2016-10-24 18:01:53
NOTICE:  Groom will not purge records deleted after transactions that started after transaction 0xcaf2a1, due to the backup at 2016-01-30 00:36:49.
NOTICE:  If this process is interrupted please either repeat GROOM VERSIONS or issue 'GENERATE STATISTICS ON "FILTER_DATA"'
NOTICE:  Groom processed 0 pages; purged 0 records; scan size unchanged; table size unchanged.
GROOM VERSIONS
Elapsed time: 0m1.769s

Grooming TESTDB.MYUSERSD.MASTER_DATA_TESTS      @ 2016-10-24 18:01:54
NOTICE:  Groom will not purge records deleted after transactions that started after transaction 0xcaf2a1, due to the backup at 2016-01-30 00:36:49.
NOTICE:  If this process is interrupted please either repeat GROOM VERSIONS or issue 'GENERATE STATISTICS ON "MASTER_DATA_TESTS"'
NOTICE:  Groom processed 35 pages; purged 0 records; scan size unchanged; table size shrunk by 1 extents.
GROOM VERSIONS
Elapsed time: 0m3.603s

Grooming TESTDB.MYUSERSD.PREMIUM_TEST_TABLE_DATA      @ 2016-10-24 18:01:58
NOTICE:  Groom will not purge records deleted after transactions that started after transaction 0xcaf2a1, due to the backup at 2016-01-30 00:36:49.
NOTICE:  If this process is interrupted please either repeat GROOM VERSIONS or issue 'GENERATE STATISTICS ON "PREMIUM_TEST_TABLE_DATA"'
NOTICE:  Groom processed 0 pages; purged 0 records; scan size unchanged; table size unchanged.
GROOM VERSIONS
Elapsed time: 0m2.188s

Grooming TESTDB.MYUSERSD.REPORT_MASTER     @ 2016-10-24 18:02:00
NOTICE:  Groom will not purge records deleted after transactions that started after transaction 0xcaf2a1, due to the backup at 2016-01-30 00:36:49.
NOTICE:  If this process is interrupted please either repeat GROOM VERSIONS or issue 'GENERATE STATISTICS ON "REPORT_MASTER"'
NOTICE:  Groom processed 106 pages; purged 0 records; scan size shrunk by 2 pages; table size shrunk by 2 extents.
GROOM VERSIONS
Elapsed time: 0m3.110s

....




Refer to the NPS 7.2 documentation: GROOM TABLE and ALTER TABLE.



Friday, September 23, 2016

Netezza: transpose a delimited string into table rows, i.e. split a delimited character field into records


Have this: a record with a pipe-delimited string field

 LONGSTRING
-------------------------------------------------------------
 201|202|203|204|205|206|207|208|209|210|211|212|213|214|215

Need this: one table row per element

 POSITION_IN_STRING_ARRAY | GET_VALUE_EVERY_ELEMENT
--------------------------+-------------------------
                        1 | 201
                        2 | 202
                      ... | ...
                       15 | 215

SOLUTION: This transposes a delimited record into table rows:

SELECT c.cnt                                                           AS POSITION_IN_STRING_ARRAY
      ,sqlx..regexp_extract(a.longstring,'[^|]+',1,cast(c.cnt as int)) AS GET_VALUE_EVERY_ELEMENT
FROM (
              select '201|202|203|204|205|206|207|208|209|210|211|212|213|214|215' as longstring
       ) a
cross join (select row_number () OVER (order by 1) as cnt
              from (SELECT 1 FROM HISTDB.."$hist_log_entry_3" LIMIT 70) v
           ) c
WHERE GET_VALUE_EVERY_ELEMENT IS NOT NULL
;
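
Note: HISTDB.."$hist_log_entry_3" serves only as a convenient table with at least 70 rows to drive row_number(); any table with at least as many rows as the longest string has elements will do, for example the source table itself, as in the full examples below.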




EXAMPLE AND DETAILS

This is a real problem: split a pipe-delimited string into individual elements, producing one table record per element.



SOLUTION #1: Using the Netezza SQL Toolkit array functions array_split() and get_value_varchar().

Proof of concept: 

SELECT
 sqlx..array_split(STR,'|')                                 as THIS_IS_ARRAY
,sqlx..get_value_varchar(sqlx..array_split(STR,'|'), NULL)  as GET_NULL
,sqlx..get_value_varchar(sqlx..array_split(STR,'|'),  1  )  as GET_01  -- array index starts at 1
,sqlx..get_value_varchar(sqlx..array_split(STR,'|'),  2  )  as GET_02
,sqlx..get_value_varchar(sqlx..array_split(STR,'|'), 15  )  as GET_15
,sqlx..array_count(sqlx..array_split(STR,'|'))              as ARRAY_COUNT
--,sqlx..get_value_varchar(sqlx..array_split(STR,'|'), 16 ) as GET_NONEXIST 
--- ERROR [HY000] ERROR:  0 : Invalid Index Id
--,sqlx..get_value_varchar(sqlx..array_split(STR,'|'), '1') as GET_STRING   
--- ERROR [HY000] ERROR:  0 : Invalid Index Name
  from (
       select '201|202|203|204|205|206|207|208|209|210|211|212|213|214|215' as STR) AS F
;

Putting this all together

select F.ROWID AS RECID                     --- Original source record's PK or unique identifier.
      ,c.CNT   AS POSITION_IN_STRING_ARRAY  --- Position of the element in the field array (optional).
      ,sqlx..get_value_varchar(ARR, CAST(DECODE(SIGN(LEN - c.CNT), -1, NULL, c.CNT) AS INT)) AS GET_VALUE_EVERY_ELEMENT
FROM  --- The CAST is a must, otherwise GET_VALUE_VARCHAR fails with: "ERROR: Invalid Index Name"
(select ROWID                                                        --- Get ROWID, or replace it with a PK column
       ,sqlx..array_split(LONGSTRING,'|')                    AS ARR  --- Parse the string into an array only once
       ,sqlx..array_count(sqlx..array_split(LONGSTRING,'|')) AS LEN  --- Get the array length to be used later
   from TESTDB.LONGSTRING
) AS F --- Generate a sequence 1..N, where N covers the longest array in the field LONGSTRING
cross join (select row_number () OVER (order by 1) as cnt
                 FROM (SELECT 1 FROM TESTDB.LONGSTRING limit 70 ) v
) AS C --- Eliminate non-existing array elements: the Cartesian product (1..70) still lists positions with no match.
where DECODE(SIGN(LEN - c.CNT), -1, NULL, c.CNT) is not null
;
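
Why the DECODE(SIGN(...)) guard? Unlike regexp_extract(), get_value_varchar() raises 'ERROR: 0 : Invalid Index Id' for an index beyond the array length (see the commented GET_NONEXIST line in the proof of concept above), so out-of-range positions have to be mapped to NULL before the call rather than filtered out afterwards.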




SOLUTION #2: Using the Netezza SQL Toolkit regular expression function regexp_extract().

Proof of concept: 

SELECT
 sqlx..regexp_match_count(STR,'[^|]+')      as MATCH_COUNT
,sqlx..regexp_extract(STR,'[^|]+',1,NULL)   as GET_NULL
,sqlx..regexp_extract(STR,'[^|]+',1,1)      as GET_01
,sqlx..regexp_extract(STR,'[^|]+',1,2)      as GET_02
,sqlx..regexp_extract(STR,'[^|]+',1,15)     as GET_15
,sqlx..regexp_extract(STR,'[^|]+',1,16)     as GET_16
,sqlx..regexp_extract(STR,'[^|]+',1,222)    as GET_NONEXIST
---,sqlx..regexp_extract(STR,'[^|]+',1,0)   as GET_ZERO   
--- ERROR [HY000] ERROR:  0 : Regex: Illegal Reference Value 0
---,sqlx..regexp_extract(STR,'[^|]+',1,'1') as GET_STRING 
--- ERROR [HY000] ERROR:  0 : Regex: Illegal Option (1)
  from (
       select '201|202|203|204|205|206|207|208|209|210|211|212|213|214|215' as STR) AS F
;


Putting this all together
select a.ROWID AS RECID                    --- Original source record's PK or unique identifier.
      ,c.cnt   AS POSITION_IN_STRING_ARRAY --- Position of the element in the field array (optional).
      ,sqlx..regexp_extract(LONGSTRING,'[^|]+',1,cast(c.cnt as int)) AS GET_VALUE_EVERY_ELEMENT
from (select ROWID, * from TESTDB.LONGSTRING
) a --- Generate a sequence 1..N, where N covers the longest array in the field LONGSTRING
cross join (select row_number () OVER (order by 1) as cnt
                 FROM (SELECT 1 FROM TESTDB.LONGSTRING  limit 70 ) v ) c
--- Eliminate non-existing elements: the Cartesian product (1..70) still lists positions with no match.
WHERE GET_VALUE_EVERY_ELEMENT IS NOT NULL
;
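
Here no index guard is needed: regexp_extract() simply returns NULL when the reference is past the last match (see GET_NONEXIST in the proof of concept above), so a plain WHERE ... IS NOT NULL after the cross join is enough.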




PERFORMANCE COMPARISON:

Notice: limiting the depth of the search in the cross join (the ... LIMIT 70 subquery) reduces the data explosion and improves performance. This was tested on a TF6 with a 14 million record source table. Also, Solution #1 parses each string into an array only once per row, while Solution #2 re-scans the string with regexp_extract() for every candidate position, which likely explains the widening gap below.



SOLUTION #1: array_split() and get_value_varchar().
/* LIMIT  5,000,000 Duration 19.1540956 sec. Records Affected:   6,605,274. */
/* LIMIT 10,000,000 Duration 52.2369878 sec. Records Affected:  43,817,113. */
/* LIMIT 12,000,000 Duration 45.5606059 sec. Records Affected:  79,234,929. */
/* LIMIT 13,000,000 Duration 49.7078431 sec. Records Affected:  98,032,515. */
/* LIMIT 14,000,000 Duration 50.1268671 sec. Records Affected: 106,848,663. */


SOLUTION #2: regexp_extract().
/* LIMIT  5,000,000 Duration  131.9048611 sec. Records Affected:    6,605,274. */
/* LIMIT 10,000,000 Duration  683.4229685 sec. Records Affected:   43,817,113. */
/* LIMIT 12,000,000 Duration 1175.0658919 sec. Records Affected:   79,234,929. */
/* LIMIT 13,000,000 Duration 1433.8532871 sec. Records Affected:   98,032,515. */
/* LIMIT 14,000,000 Duration 1562.2020755 sec. Records Affected:  106,848,663. */