Thursday, April 28, 2016

Netezza: datatype storage CHARACTER v.s. NATIONAL CHARACTER

SUMMARY

There is NO significant storage saving (just 3.7%) when choosing between  NATIONAL CHARACTER (Unicode) and CHARACTER (ASCII) for database table design at Netezza.

I recommend choosing a datatype that pertains to design and scalability of the application.

DETAILS

Compare storage saving using CHARACTER VARYING  instead of NATIONAL CHARACTER VARYING  datatype at Netezza NPS 7.2.1.

NATIONAL CHARACTER
CHARACTER
CREATE TABLE ARCHIVE_TABLE_201405
(
       CMTS_NM     NATIONAL CHARACTER VARYING (30 ),
       INTF_NM     NATIONAL CHARACTER VARYING (30 ),
       MAC_ADDR    NATIONAL CHARACTER VARYING (20 ),
       CM_STA      NATIONAL CHARACTER VARYING (20 ),
       TM_OFST     CHARACTER VARYING (20 ),
       IP_ADDR     NATIONAL CHARACTER VARYING (20 ),
       NODE_KEY           NATIONAL CHARACTER VARYING (50 ),
       BATCH_TS    TIMESTAMP,
       AS_OF_DT    TIMESTAMP,
       DATA_ACQ_ID INTEGER
)
DISTRIBUTE ON (AS_OF_DT, MAC_ADDR);
CREATE TABLE ARCHIVE_TABLE_201405_VARCHAR
(
       CMTS_NM     CHARACTER VARYING (30 ),
       INTF_NM     CHARACTER VARYING (30 ),
       MAC_ADDR    CHARACTER VARYING (20 ),
       CM_STA      CHARACTER VARYING (20 ),
       TM_OFST     CHARACTER VARYING (20 ),
       IP_ADDR     CHARACTER VARYING (20 ),
       NODE_KEY           CHARACTER VARYING (50 ),
       BATCH_TS    TIMESTAMP,
       AS_OF_DT    TIMESTAMP,
       DATA_ACQ_ID INTEGER
)
DISTRIBUTE ON (AS_OF_DT, MAC_ADDR);
ALLOCATED_GB: 38 Gb
USED_GB     : 38 Gb
RECORDS     : 1,369,391,278
ALLOCATED_GB: 36.7 Gb
USED_GB     : 36.7 Gb
RECORDS     : 1,369,391,278


From description below, once could assume that each NVARCHAR (Unicode) character takes 4x larger storage than VARCHAR (ASCII).  The result of the experiment indicates that this is not the case.

VARCHARCHARACTER VARYING, VARCHAR(n), CHARACTER VARYING(n), CHAR VARYING(n)Variable length character string to a maximum length of n. If you do not specify n, the default is an unsized VARCHAR value. There is no blank padding, and the value is stored as entered. The maximum character string size is 64,000.
NVARCHARNATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NATIONAL CHAR VARYING(n), NATIONAL CHARACTER VARYING(n), and NVARCHAR(n)Variable length character string to a maximum length of n. If you do not specify n, the default is an unsized NVARCHAR value. The maximum length is 16,000 characters.


3 comments:

  1. this experiment doesn't look quite ok. For true size measure you should have created both tables with a single column, one with varchar and another with nvarchar, and then compared the difference. Because the relative persent to the whole size might be small, while the percent of diff per column might be more significant.

    ReplyDelete
  2. what is unsized value?

    ReplyDelete
    Replies
    1. It seems Netezza creates table with unused space to be ready for inserts. I noticed when creating a tiny table that might be a few kilobytes, Netezza allocates many megabytes. Inserting more rows didn't seem to allocate more space.

      You may also see unused after doing a DELETE and before GROOM, but I haven't verified it.

      Delete