SUMMARY:
There is NO significant storage saving (just 3.7%) when choosing between NATIONAL CHARACTER (Unicode) and CHARACTER (ASCII) for database table design at Netezza.
I recommend choosing a datatype that pertains to design and scalability of the application.
DETAILS:
Compare storage saving using CHARACTER VARYING instead of NATIONAL CHARACTER VARYING datatype at Netezza NPS 7.2.1.
NATIONAL CHARACTER
|
CHARACTER
|
CREATE TABLE ARCHIVE_TABLE_201405
(
CMTS_NM NATIONAL CHARACTER VARYING (30 ),
INTF_NM NATIONAL CHARACTER VARYING (30 ),
MAC_ADDR NATIONAL CHARACTER VARYING (20 ),
CM_STA NATIONAL CHARACTER VARYING (20 ),
TM_OFST CHARACTER VARYING (20 ),
IP_ADDR NATIONAL CHARACTER VARYING (20 ),
NODE_KEY NATIONAL CHARACTER VARYING (50 ),
BATCH_TS TIMESTAMP,
AS_OF_DT TIMESTAMP,
DATA_ACQ_ID INTEGER
)
DISTRIBUTE ON (AS_OF_DT, MAC_ADDR);
|
CREATE TABLE ARCHIVE_TABLE_201405_VARCHAR
(
CMTS_NM CHARACTER VARYING (30 ),
INTF_NM CHARACTER VARYING (30 ),
MAC_ADDR CHARACTER VARYING (20 ),
CM_STA CHARACTER VARYING (20 ),
TM_OFST CHARACTER VARYING (20 ),
IP_ADDR CHARACTER VARYING (20 ),
NODE_KEY CHARACTER VARYING (50 ),
BATCH_TS TIMESTAMP,
AS_OF_DT TIMESTAMP,
DATA_ACQ_ID INTEGER
)
DISTRIBUTE ON (AS_OF_DT, MAC_ADDR);
|
ALLOCATED_GB: 38 Gb
USED_GB : 38 Gb
RECORDS : 1,369,391,278
|
ALLOCATED_GB: 36.7 Gb
USED_GB : 36.7 Gb
RECORDS : 1,369,391,278
|
From description below, once could assume that each NVARCHAR (Unicode) character takes 4x larger storage than VARCHAR (ASCII). The result of the experiment indicates that this is not the case.
VARCHAR | CHARACTER VARYING, VARCHAR(n), CHARACTER VARYING(n), CHAR VARYING(n) | Variable length character string to a maximum length of n. If you do not specify n, the default is an unsized VARCHAR value. There is no blank padding, and the value is stored as entered. The maximum character string size is 64,000. |
NVARCHAR | NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NATIONAL CHAR VARYING(n), NATIONAL CHARACTER VARYING(n), and NVARCHAR(n) | Variable length character string to a maximum length of n. If you do not specify n, the default is an unsized NVARCHAR value. The maximum length is 16,000 characters. |
REF: Netezza Data types and aliases
this experiment doesn't look quite ok. For true size measure you should have created both tables with a single column, one with varchar and another with nvarchar, and then compared the difference. Because the relative persent to the whole size might be small, while the percent of diff per column might be more significant.
ReplyDeletewhat is unsized value?
ReplyDeleteIt seems Netezza creates table with unused space to be ready for inserts. I noticed when creating a tiny table that might be a few kilobytes, Netezza allocates many megabytes. Inserting more rows didn't seem to allocate more space.
DeleteYou may also see unused after doing a DELETE and before GROOM, but I haven't verified it.