StarQuest Technical Documents
StarSQL Character Conversion and National Language Support
Last Update: 08 March 2011
Product: StarSQL for Linux, UNIX, and Windows (ODBC driver)
Version: v5.3 and later
Article ID: SQV00SQ015
Abstract
This document describes the format of the file that StarSQL for Linux, UNIX, and Windows (the ODBC driver) uses to look up CCSID values and determine which character conversion routine to use when converting character data between disparate systems. It also lists the CCSIDs that StarSQL currently supports.
Solution
StarSQL uses a data-driven architecture to support character conversions, which allows support for specific languages and character encoding schemes to be added without modifying the StarSQL source code. You can specify particular Coded Character Set Identifiers (CCSIDs) to use by setting the TypDefOver settings of the data source definition that StarSQL uses to connect to the host. Refer to the documentation for your version of StarSQL for details about customizing the StarSQL driver data source.
The Conversion Table
StarSQL converts inbound data from the host system according to the conversions defined in the ccsid.csv table that is installed with StarSQL. The ccsid.csv table is platform-specific. It is installed to the \Programs\StarSQL directory on a Windows-based computer or to the $STARSQL/etc/conf subdirectory on a Linux- or UNIX-based computer. The format of the ccsid.csv table is as follows:
Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 |
---|---|---|---|---|---|
CCSID | 'A' for ASCII, 'E' for EBCDIC, 'U' for Unicode | 'S' (for SBCS), 'M' (for MBCS), 'G' (for Graphic) | mapping for the iconv codeset to use for the CCSID | client locale codeset name if different from iconv codeset | an optional, single-byte CCSID to associate with a multi-byte CCSID |
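The six-column layout above can be read with a short script, for example to check which CCSIDs a given installation defines. The following Python sketch is illustrative only. It assumes the file is plain comma-separated text with no header row and that the type letters appear without quotes. The sample rows, including the codeset names, are hypothetical and are not taken from a real ccsid.csv file.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Optional

# Letter codes from columns 2 and 3 of the table description.
ENCODING_SCHEMES = {"A": "ASCII", "E": "EBCDIC", "U": "Unicode"}
CHARSET_TYPES = {"S": "SBCS", "M": "MBCS", "G": "Graphic"}


@dataclass
class CcsidEntry:
    ccsid: int                     # column 1
    scheme: str                    # column 2, expanded
    charset_type: str              # column 3, expanded
    iconv_codeset: str             # column 4
    locale_codeset: Optional[str]  # column 5, None when empty
    sbcs_ccsid: Optional[int]      # column 6, None when empty


def parse_ccsid_table(text: str) -> Dict[int, CcsidEntry]:
    """Parse ccsid.csv-style text into a dict keyed by CCSID."""
    entries: Dict[int, CcsidEntry] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        # Pad short rows so the optional columns can be indexed safely.
        cols = [c.strip() for c in row] + [""] * 6
        entry = CcsidEntry(
            ccsid=int(cols[0]),
            scheme=ENCODING_SCHEMES[cols[1]],
            charset_type=CHARSET_TYPES[cols[2]],
            iconv_codeset=cols[3],
            locale_codeset=cols[4] or None,
            sbcs_ccsid=int(cols[5]) if cols[5] else None,
        )
        entries[entry.ccsid] = entry
    return entries


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
37,E,S,IBM-037,,
930,E,M,IBM-930,,290
1208,U,M,UTF-8,,
"""

table = parse_ccsid_table(SAMPLE)
print(table[930])
```

To inspect a real installation, the same function could be given the contents of the installed file, for example `parse_ccsid_table(open(path).read())`, where `path` points at the platform-specific location described above.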
Contact StarQuest Customer Support to request that additional languages or codesets be added to the ccsid.csv table and conversion routines.
StarSQL uses the ccsid.csv table and conversion routines to convert all inbound data to characters defined by the local code page of the client computer, and to convert single-byte outbound data as necessary to match the CCSID expected by DB2.
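The effect of the inbound and outbound conversions can be illustrated with Python's built-in codecs. This is a minimal sketch of the general idea, not StarSQL's conversion routine: the `HOST_CODECS` mapping and both function names are hypothetical, and the sketch uses Python codec names rather than the iconv codesets listed in ccsid.csv.

```python
# Hypothetical mapping from a few CCSIDs to Python codec names.
HOST_CODECS = {
    37: "cp037",     # EBCDIC, CCSID 037
    500: "cp500",    # International EBCDIC
    1252: "cp1252",  # MS-Windows Latin-1
}


def inbound_to_text(data: bytes, host_ccsid: int) -> str:
    """Decode bytes received from the host into client-side text."""
    return data.decode(HOST_CODECS[host_ccsid])


def outbound_from_text(text: str, host_ccsid: int) -> bytes:
    """Encode client-side text into the CCSID the host expects."""
    return text.encode(HOST_CODECS[host_ccsid])


# "Hello" as single-byte EBCDIC (CCSID 037) bytes.
ebcdic_hello = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])

text = inbound_to_text(ebcdic_hello, 37)
round_trip = outbound_from_text(text, 37)
print(text, round_trip.hex())
```

A CCSID that is missing from the mapping raises `KeyError` in this sketch, which is loosely analogous to the unsupported-CCSID errors that the driver reports.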
StarSQL and DB2 tell each other which CCSIDs they plan to use through the TYPDEFOVR parameters that are sent at connect time. If you encounter a problem when connecting or sending data, review the error messages for information about an unsupported or invalid CCSID in the TYPDEFOVR setting.
Supported CCSIDs
StarSQL supports converting character data between a wide range of CCSID-to-CCSID pairs and CCSID-to-code-page pairs. It supports the Group 1, Group 1A, Group 2, and Unicode character sets as defined by CDRA (IBM Character Data Representation Architecture).
- Group 1 covers the Roman Alphabet Number 1, which includes Australia, Hong Kong, New Zealand, North and South America, and Western Europe.
- Group 1A covers multilingual scripts Cyrillic, Hebrew, Greek, and Turkish. The Latin 2 character set associated with Central Europe is supported in this group.
- Group 2 covers double-byte coding for Japan, Korea, the People’s Republic of China, the Republic of China, and Thailand.
The following table lists the CCSIDs that StarSQL currently supports. Some CCSIDs may not be available on all platforms.
CCSID | Description |
---|---|
037 | Europe EBCDIC (Australia, Brazil, Canada, Netherlands, New Zealand, Portugal) |
256 | Netherlands EBCDIC |
273 | Austria, Germany EBCDIC |
277 | Denmark, Norway EBCDIC |
278 | Finland, Sweden EBCDIC |
280 | Italian EBCDIC |
284 | Spanish EBCDIC |
285 | United Kingdom EBCDIC |
290 | Japanese EBCDIC (SBCS) |
297 | French EBCDIC |
300 | Japanese EBCDIC (DBCS) |
301 | Japanese PC-Data (DBCS including 1880 UDC) |
367 | US ANSI X3.4 ASCII |
420 | Arabic EBCDIC |
423 | Greek EBCDIC |
424 | Hebrew EBCDIC |
437 | USA PC-Data |
500 | International EBCDIC |
813 | ISO 8859-7 ASCII |
819 | ISO 8859-1 ASCII (Latin Alphabet) |
833 | Korean EBCDIC |
834 | Korean EBCDIC (DBCS) |
835 | Traditional Chinese EBCDIC (DBCS) |
836 | Simplified Chinese EBCDIC (extended SBCS) |
837 | Simplified Chinese EBCDIC (MBCS) |
838 | Thailand EBCDIC |
850 | PC-Data MLP 222 Latin-1 |
856 | Hebrew PC-Data |
866 | Cyrillic PC-Data |
870 | Latin-2 EBCDIC |
871 | Iceland EBCDIC |
874 | Thai PC-Data |
875 | Greek EBCDIC |
878 | KOI8-Russian Cyrillic |
880 | Cyrillic EBCDIC |
895 | Japanese (7-bit Latin) |
897 | Japanese PC-Data (SBCS) |
905 | Turkey EBCDIC |
912 | ISO 8859-2 ASCII |
913 | ISO 8859-3 ASCII |
914 | ISO 8859-4 ASCII |
915 | ISO 8859-5 ASCII |
916 | ISO 8859-8 ASCII |
918 | Urdu EBCDIC |
920 | ISO 8859-9 ASCII |
921 | ISO 8859-13 ASCII |
923 | ISO 8859-15 ASCII |
924 | Latin 9 EBCDIC |
930 | Japan EBCDIC (MBCS) |
932 | Japan PC-Data (MBCS) |
933 | Korea EBCDIC (MBCS) |
935 | Simplified Chinese EBCDIC (MBCS) |
936 | Simplified Chinese PC-Data (SBCS) |
937 | Traditional Chinese EBCDIC (SBCS) |
938 | Traditional Chinese PC-Data (MBCS) |
939 | Japan EBCDIC (MBCS) |
943 | Japan PC-Data (MBCS) for Open environment |
949 | Korea PC-Data (MBCS) |
950 | Traditional Chinese PC-Data (mixed for IBM BIG-5) |
951 | IBM KS PC-Data (MBCS) |
954 | Japanese EUC |
964 | Traditional Chinese EUC |
970 | Korean EUC |
1025 | Cyrillic EBCDIC |
1026 | Turkey Latin-5 EBCDIC |
1027 | Japan Latin EBCDIC |
1041 | Japan PC-Data |
1046 | Arabic PC-Data |
1047 | Latin Open System EBCDIC |
1051 | HP emulation |
1088 | Korea KS PC-Data |
1089 | Arabic ISO 8859-6 |
1097 | Farsi EBCDIC |
1112 | Baltic EBCDIC |
1122 | Estonia EBCDIC |
1123 | Ukraine EBCDIC |
1130 | Vietnamese EBCDIC |
1132 | Lao EBCDIC |
1140 | COM Europe ECECP |
1141 | Austria, Germany ECECP |
1142 | Denmark, Norway ECECP |
1143 | Finland, Sweden ECECP |
1144 | Italian ECECP |
1145 | Spanish ECECP |
1146 | United Kingdom ECECP |
1147 | French ECECP |
1148 | International ECECP |
1149 | Iceland ECECP |
1153 | Latin-2 EBCDIC |
1154 | Cyrillic EBCDIC |
1155 | Turkey Latin-5 with euro |
1156 | Baltic, Multilingual with euro |
1157 | Estonia EBCDIC |
1160 | Thai EBCDIC (SBCS) |
1161 | Thai PC-Data (SBCS) |
1167 | KOI8 Russian |
1168 | KOI8 Ukrainian |
1200 | UTF-16 Big Endian with IBM PUA |
1208 | UTF-8 with IBM PUA |
1250 | MS-Windows Latin-2 |
1251 | MS-Windows Cyrillic |
1252 | MS-Windows Latin-1 |
1253 | MS-Windows Greek |
1254 | MS-Windows Turkey |
1255 | MS-Windows Hebrew |
1256 | MS-Windows Arabic |
1257 | MS-Windows Baltic |
1258 | MS-Windows Vietnamese |
1363 | MS-Windows Korean |
1364 | Korean mixed Extended |
1375 | Big-5 extension for HKSCS (MBCS) |
1381 | Simplified Chinese PC-Data mixed (IBM GB) |
1383 | Simplified Chinese EUC |
1386 | Simplified Chinese PC-Data GBK |
1388 | Simplified Chinese EBCDIC (MBCS) |
1390 | Extended Japanese Katakana-Kanji (Extended SBCS) |
1392 | Simplified Chinese PC-Data mixed for GB18030 |
1399 | Extended Japanese Latin-Kanji (Extended SBCS) |
4930 | Korean (Extended DBCS) |
4933 | Simplified Chinese EBCDIC |
4971 | Greek EBCDIC |
5026 | Japanese Katakana EBCDIC |
5035 | Japanese English EBCDIC |
5050 | Japanese EUC |
5123 | Japanese Latin (Extended SBCS) |
5347 | MS-Windows Cyrillic |
5488 | Simplified Chinese PC-Data mixed (fixed) for GB18030 |
8482 | Japanese Katakana |
8612 | Arabic EBCDIC |
9005 | Greek ISO 8859-7:2003 |
9030 | Thai (Extended SBCS) |
12712 | Hebrew EBCDIC |
13121 | Korean (Extended SBCS) |
13124 | Simplified Chinese EBCDIC |
13488 | Unicode UTF-16 |
16684 | Extended Japanese Latin (DBCS) |
16804 | Arabic EBCDIC |
28709 | Traditional Chinese EBCDIC |
DISCLAIMER
The information in technical documents comes without any warranty or applicability for a specific purpose. The author(s) or distributor(s) will not accept responsibility for any damage incurred directly or indirectly through use of the information contained in these documents. The instructions may need to be modified to be appropriate for the hardware and software that has been installed and configured within a particular organization. The information in technical documents should be considered only as an example and may include information from various sources, including IBM, Microsoft, and other organizations.