TDS 7.0 for Nonwestern Languages

TDS 7.0 uses 2-byte Unicode (technically, UCS-2) to transfer character data between servers and clients. Included in "character data" are query text (i.e., SQL), metadata (table names and such), and bona fide data of datatypes nchar, nvarchar, and ntext.
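
To make the two-bytes-per-character point concrete, here is a minimal Python sketch. It is not FreeTDS code. It assumes the little-endian form of UCS-2 and uses Python's utf-16-le codec as a stand-in, which is identical to UCS-2 for characters in the Basic Multilingual Plane. The query text is made up for illustration.

```python
# Illustration only: "utf-16-le" stands in for UCS-2 (assumption: little-endian,
# and only characters from the Basic Multilingual Plane are used).
query = "SELECT name FROM \u03c4\u03b9\u03bc\u03ad\u03c2"  # hypothetical Greek table name

wire_bytes = query.encode("utf-16-le")

# Every character, ASCII or Greek, occupies exactly two bytes.
assert len(wire_bytes) == 2 * len(query)

# An ASCII character carries a zero high byte: 'S' becomes 0x53 0x00.
first_char_bytes = wire_bytes[:2]
print(first_char_bytes.hex())  # prints 5300
```

The same doubling applies to SQL text, metadata, and nchar/nvarchar/ntext data alike, which is why a conversion step is needed before most Unix tools can use the bytes.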

Since most Unix tools and languages do not support UCS-2, FreeTDS allows conversion by the client to other character sets using the iconv standard. Background information on Unicode and how it affects FreeTDS can be found in the appendix. If no iconv library is found, or if it is explicitly disabled, FreeTDS will use its built-in iconv substitute, and will be capable of converting between only ISO-8859-1 and UCS-2.

To learn what character set the client is using, FreeTDS examines the client charset setting in the server's freetds.conf entry. If it finds nothing there, it assumes the client is using ISO-8859-1. That is generally a safe assumption for western languages such as English or French, but produces garbage for other languages.

To list all supported iconv character sets, try iconv(1). The GNU version accepts a --list option:

$ iconv --list

For other systems, consult your documentation (most likely man iconv will give you some hints).

In this example, data from a server named mssql is delivered to the client in the GREEK character set.

Example 5-2. Configuring for GREEK freetds.conf setting

[mssql]
	host = ntbox.mydomain.com
	port = 1433
	tds version = 7.0
	client charset = GREEK
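
The lookup-with-fallback rule described earlier can be sketched in a few lines of Python. This is a simplified illustration, not FreeTDS's actual parser: the function name client_charset_for is hypothetical, the sketch looks only at the named server's own entry, and it assumes that lines starting with # or ; are comments.

```python
DEFAULT_CHARSET = "ISO-8859-1"  # the fallback named in the text

def client_charset_for(conf_text: str, server: str) -> str:
    """Return the 'client charset' of one server entry, or the default."""
    current_section = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):  # assumption: comment markers
            continue
        if line.startswith("[") and line.endswith("]"):
            current_section = line[1:-1].strip()
            continue
        if current_section == server and "=" in line:
            key, value = line.split("=", 1)
            if key.strip().lower() == "client charset":
                return value.strip()
    return DEFAULT_CHARSET

SAMPLE_CONF = """\
[mssql]
\thost = ntbox.mydomain.com
\tport = 1433
\ttds version = 7.0
\tclient charset = GREEK

[plain]
\thost = otherbox.mydomain.com
"""

print(client_charset_for(SAMPLE_CONF, "mssql"))
print(client_charset_for(SAMPLE_CONF, "plain"))
```

The [plain] entry (a made-up second server) has no client charset line, so the sketch falls back to ISO-8859-1, which is the case that produces garbage for non-western data.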

If FreeTDS runs into a character it cannot convert, its behavior varies according to the severity of the problem. On retrieving data from the server, FreeTDS substitutes an ASCII '?' in the character's place, and emits a warning message stating that some characters could not be converted. On sending data to the server, FreeTDS aborts the query and emits an error message. It is wise to ensure that the data contained in the database is representable in the client's character set.
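
The asymmetry between the two directions can be mimicked with Python's codecs. This is only an analogy, not FreeTDS code: Python codecs stand in for iconv, utf-16-le stands in for UCS-2, and the function names from_server and to_server are hypothetical.

```python
UCS2 = "utf-16-le"  # stand-in for UCS-2 (assumption: BMP characters only)

def from_server(wire_bytes: bytes, client_charset: str):
    """Lenient direction: substitute '?' and report that a warning is due."""
    text = wire_bytes.decode(UCS2)
    try:
        return text.encode(client_charset), False
    except UnicodeEncodeError:
        return text.encode(client_charset, errors="replace"), True

def to_server(client_bytes: bytes, client_charset: str) -> bytes:
    """Strict direction: raises UnicodeDecodeError instead of sending bad data."""
    return client_bytes.decode(client_charset).encode(UCS2)

# One Greek letter (Omega) followed by four Cyrillic letters.
mixed = "\u03a9\u043c\u0435\u0433\u0430"
data, warned = from_server(mixed.encode(UCS2), "iso8859_7")
print(data.decode("iso8859_7"), warned)

try:
    to_server(b"\xff", "utf-8")  # 0xFF is never valid in UTF-8
    aborted = False
except UnicodeDecodeError:
    aborted = True
print(aborted)
```

The Greek letter survives the trip to an ISO-8859-7 client, while each Cyrillic letter degrades to '?'. In the other direction nothing is sent at all, which mirrors the aborted query.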

If you have a mix of character data that cannot be contained in a single-byte character set, you may wish to use UTF-8. UTF-8 is a variable-length Unicode encoding that is compatible with ASCII in the range 0 to 127. With UTF-8, you are guaranteed never to have an unconvertible character.

Important

FreeTDS is not fully compatible with multi-byte character sets such as UCS-2. You must use an ASCII-extension charset (e.g., UTF-8, ISO-8859-*)[1]. Extreme care should be taken when testing applications that use these encodings. Specifically, many applications do not expect the number of bytes returned to exceed the declared column size. On the other hand, support of UTF-8 and UCS-2 is a high priority for the developers. Patches and bug reports in this area are especially welcome.
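
The column-size hazard is easy to reproduce. The sketch below is illustrative Python, not FreeTDS code; the column (a hypothetical nvarchar(5)) and its contents are made up. It compares the byte count of the same five characters in a single-byte charset and in UTF-8.

```python
COLUMN_SIZE = 5  # hypothetical nvarchar(5) column

value = "\u03b1\u03b2\u03b3\u03b4\u03b5"  # five Greek letters, alpha to epsilon

single_byte = value.encode("iso8859_7")  # one byte per character
utf8 = value.encode("utf-8")             # two bytes per Greek letter

print(len(value), len(single_byte), len(utf8))  # prints 5 5 10

# A buffer sized to COLUMN_SIZE bytes is too small for the UTF-8 form.
overflows = len(utf8) > COLUMN_SIZE

# A character in the Basic Multilingual Plane needs at most 3 bytes in UTF-8,
# so this bound is always sufficient for data that arrived as UCS-2.
safe_buffer_size = 3 * COLUMN_SIZE
assert len(utf8) <= safe_buffer_size
```

An application that allocates exactly COLUMN_SIZE bytes per value works with an ISO-8859-* client charset and breaks with UTF-8, which is the kind of failure worth testing for.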

In the following example, data from a server named mssql is delivered to the client in the UTF-8 character set.

Example 5-3. Configuring for UTF-8 freetds.conf setting

[mssql]
	host = ntbox.mydomain.com
	port = 1433
	tds version = 7.0
	client charset = UTF-8

Note also that, unlike TDS 4.2, TDS 7.0 and above do not accept a character set specified during login. A TDS 7.0 login packet uses UCS-2.

Notes

[1]

That is, not EBCDIC or other character sets that are not extensions of ASCII.