Unicode

Unicode — Unicode utility functions.

Synopsis

typedef raptor_unichar      librdf_unichar;
int                 librdf_unicode_char_to_utf8         (librdf_unichar c,
                                                         unsigned char *output,
                                                         int length);
int                 librdf_utf8_to_unicode_char         (librdf_unichar *output,
                                                         const unsigned char *input,
                                                         int length);
unsigned char *     librdf_utf8_to_latin1               (const unsigned char *input,
                                                         int length,
                                                         int *output_length);
unsigned char *     librdf_latin1_to_utf8               (const unsigned char *input,
                                                         int length,
                                                         int *output_length);
void                librdf_utf8_print                   (const unsigned char *input,
                                                         int length,
                                                         FILE *stream);

Description

Utility functions to convert between UTF-8, full Unicode code points and ISO Latin-1. Redland uses UTF-8 for all strings (except where noted), but these may need to be converted to other Unicode encodings or downgraded, with loss, to Latin-1.

Details

librdf_unichar

typedef raptor_unichar librdf_unichar;

Unicode codepoint.


librdf_unicode_char_to_utf8 ()

int                 librdf_unicode_char_to_utf8         (librdf_unichar c,
                                                         unsigned char *output,
                                                         int length);

Convert a Unicode character to UTF-8 encoding.

deprecated: Use raptor_unicode_utf8_string_put_char()

If output is NULL, the function calculates the number of bytes the encoded character needs instead of performing the conversion. A caller can use this to allocate a buffer of the right size and then call the function again with that buffer.

c :

Unicode character

output :

UTF-8 string buffer or NULL

length :

buffer size

Returns :

number of bytes written to the output buffer (or, if output is NULL, the number of bytes required), or <0 on failure
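The encoding rules and the NULL-buffer length query can be illustrated with a standalone sketch. It does not call Redland: `sketch_unichar`, `sketch_unichar_to_utf8()` and `sketch_encode_alloc()` are hypothetical names, and the assumption that a code point fits in an `unsigned long` is this sketch's own. It only mirrors the contract documented above (return the byte count, or <0 on failure) and shows the two-call "query, allocate, encode" pattern.

```c
#include <stdlib.h>

/* Assumption for this sketch: an unsigned long holds any code point. */
typedef unsigned long sketch_unichar;

/* Hypothetical helper following the documented contract of
   librdf_unicode_char_to_utf8(): if output is NULL, return the number of
   bytes the UTF-8 encoding needs; otherwise write them and return that
   count. Returns -1 for values above U+10FFFF or a too-small buffer. */
int sketch_unichar_to_utf8(sketch_unichar c, unsigned char *output, int length)
{
  int size;

  if (c < 0x80)
    size = 1;
  else if (c < 0x800)
    size = 2;
  else if (c < 0x10000)
    size = 3;
  else if (c < 0x110000)
    size = 4;
  else
    return -1;                       /* not a Unicode code point */

  if (!output)
    return size;                     /* length query only */
  if (length < size)
    return -1;                       /* caller's buffer is too small */

  switch (size) {
  case 1:
    output[0] = (unsigned char)c;
    break;
  case 2:
    output[0] = (unsigned char)(0xC0 | (c >> 6));
    output[1] = (unsigned char)(0x80 | (c & 0x3F));
    break;
  case 3:
    output[0] = (unsigned char)(0xE0 | (c >> 12));
    output[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    output[2] = (unsigned char)(0x80 | (c & 0x3F));
    break;
  default:                           /* 4 bytes */
    output[0] = (unsigned char)(0xF0 | (c >> 18));
    output[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    output[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    output[3] = (unsigned char)(0x80 | (c & 0x3F));
    break;
  }
  return size;
}

/* The two-call pattern: query the size with a NULL buffer, allocate
   exactly that much, then encode. The caller must free() the result. */
unsigned char *sketch_encode_alloc(sketch_unichar c, int *out_len)
{
  int size = sketch_unichar_to_utf8(c, NULL, 0);
  unsigned char *buf;

  if (size < 0)
    return NULL;
  buf = malloc((size_t)size);
  if (!buf)
    return NULL;
  *out_len = sketch_unichar_to_utf8(c, buf, size);
  return buf;
}
```

For example, U+20AC (euro sign) needs 3 bytes and encodes as `E2 82 AC`. Passing a buffer shorter than the reported size makes the sketch fail rather than truncate.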

librdf_utf8_to_unicode_char ()

int                 librdf_utf8_to_unicode_char         (librdf_unichar *output,
                                                         const unsigned char *input,
                                                         int length);

Convert a UTF-8 encoded buffer to a Unicode character.

deprecated: Use raptor_unicode_utf8_string_get_char(), noting that the argument order has changed to (input, length, output).

If output is NULL, the function calculates the number of bytes that the character occupies in the input buffer without performing the conversion.

output :

Pointer to the Unicode character or NULL

input :

UTF-8 string buffer

length :

buffer size

Returns :

number of bytes used from the input buffer, or <0 on failure
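Decoding one character reverses the encoding: the lead byte announces the sequence length and each continuation byte adds six bits. The standalone sketch below is not Redland's code; `sketch_unichar` and `sketch_utf8_to_unichar()` are hypothetical names, and it keeps the documented argument order (output, input, length). Rejecting overlong forms and values above U+10FFFF is this sketch's choice; the documentation above does not say how the library treats such input.

```c
#include <stddef.h>

/* Assumption for this sketch: an unsigned long holds any code point. */
typedef unsigned long sketch_unichar;

/* Hypothetical helper following the documented contract of
   librdf_utf8_to_unicode_char(): returns how many input bytes the first
   character occupies and, unless output is NULL, stores its code point.
   Returns -1 for truncated, malformed, overlong or out-of-range input. */
int sketch_utf8_to_unichar(sketch_unichar *output,
                           const unsigned char *input, int length)
{
  /* Smallest code point each sequence length may encode (rejects overlongs). */
  static const sketch_unichar min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  sketch_unichar c;
  int size, i;

  if (!input || length < 1)
    return -1;

  /* The lead byte announces the sequence length. */
  if (input[0] < 0x80) {
    size = 1;
    c = input[0];
  } else if ((input[0] & 0xE0) == 0xC0) {
    size = 2;
    c = input[0] & 0x1F;
  } else if ((input[0] & 0xF0) == 0xE0) {
    size = 3;
    c = input[0] & 0x0F;
  } else if ((input[0] & 0xF8) == 0xF0) {
    size = 4;
    c = input[0] & 0x07;
  } else {
    return -1;                 /* stray continuation byte or invalid lead */
  }

  if (length < size)
    return -1;                 /* sequence runs past the end of the buffer */

  /* Each continuation byte must be 10xxxxxx and contributes 6 bits. */
  for (i = 1; i < size; i++) {
    if ((input[i] & 0xC0) != 0x80)
      return -1;
    c = (c << 6) | (input[i] & 0x3F);
  }

  if (c < min_value[size] || c > 0x10FFFF)
    return -1;

  if (output)
    *output = c;
  return size;
}
```

Passing NULL for output still validates the sequence but only reports its length, which is enough to step through a string one character at a time.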

librdf_utf8_to_latin1 ()

unsigned char *     librdf_utf8_to_latin1               (const unsigned char *input,
                                                         int length,
                                                         int *output_length);

Convert a UTF-8 string to ISO Latin-1.

Converts the given UTF-8 string to the ISO Latin-1 subset of Unicode (code points 0x00-0xFF), discarding any characters outside that range.

If the output_length pointer is not NULL, the returned string length will be stored there.

input :

UTF-8 string buffer

length :

buffer size

output_length :

Pointer to variable to store resulting string length or NULL

Returns :

pointer to new ISO Latin-1 string or NULL on failure
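The lossy downgrade can be sketched as "decode each UTF-8 character, keep it if it is at most 0xFF, otherwise drop it". This is a hypothetical standalone function (`sketch_utf8_to_latin1()`), not Redland's implementation. Its other choices are assumptions: it NUL-terminates the result, returns NULL on malformed UTF-8, and its result is released with `free()`. None of this says how the library's own returned string should be freed.

```c
#include <stdlib.h>

/* Hypothetical sketch of the behaviour documented for
   librdf_utf8_to_latin1(): code points 0x00-0xFF become single bytes,
   anything larger is silently dropped. Returns a new NUL-terminated
   buffer (release with free()) or NULL on malformed input or
   allocation failure. */
unsigned char *sketch_utf8_to_latin1(const unsigned char *input, int length,
                                     int *output_length)
{
  unsigned char *out;
  int i = 0, j = 0;

  if (!input || length < 0)
    return NULL;

  /* Latin-1 output is never longer than the UTF-8 input. */
  out = malloc((size_t)length + 1);
  if (!out)
    return NULL;

  while (i < length) {
    unsigned char lead = input[i];
    unsigned long c;
    int size, k;

    if (lead < 0x80)                { size = 1; c = lead; }
    else if ((lead & 0xE0) == 0xC0) { size = 2; c = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { size = 3; c = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { size = 4; c = lead & 0x07; }
    else { free(out); return NULL; }          /* invalid lead byte */

    if (i + size > length) { free(out); return NULL; }  /* truncated */
    for (k = 1; k < size; k++) {
      if ((input[i + k] & 0xC0) != 0x80) { free(out); return NULL; }
      c = (c << 6) | (input[i + k] & 0x3F);
    }

    if (c <= 0xFF)
      out[j++] = (unsigned char)c;   /* representable in Latin-1 */
    /* otherwise: out of range, discarded */
    i += size;
  }

  out[j] = '\0';
  if (output_length)
    *output_length = j;
  return out;
}
```

With the 9-byte UTF-8 input "café €", the sketch yields the 5 Latin-1 bytes "caf\xE9 ": the é survives as 0xE9 and the euro sign is dropped.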

librdf_latin1_to_utf8 ()

unsigned char *     librdf_latin1_to_utf8               (const unsigned char *input,
                                                         int length,
                                                         int *output_length);

Convert an ISO Latin-1 encoded string to UTF-8.

Converts the given ISO Latin-1 string to a UTF-8 encoded string representing the same content. The conversion is lossless.

If the output_length pointer is not NULL, the returned string length will be stored there.

input :

ISO Latin-1 string buffer

length :

buffer size

output_length :

Pointer to variable to store resulting string length or NULL

Returns :

pointer to new UTF-8 string or NULL on failure
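The conversion is lossless because every Latin-1 byte value is also the Unicode code point of the same value: bytes below 0x80 are copied unchanged and the rest become two-byte UTF-8 sequences. The sketch below is a hypothetical standalone function (`sketch_latin1_to_utf8()`), not the library's code. It sizes the output exactly in a first pass; NUL-terminating the result and releasing it with `free()` are assumptions of this sketch.

```c
#include <stdlib.h>

/* Hypothetical sketch of the documented behaviour of
   librdf_latin1_to_utf8(). Returns a new NUL-terminated UTF-8 buffer
   (release with free()) or NULL on bad arguments or allocation failure. */
unsigned char *sketch_latin1_to_utf8(const unsigned char *input, int length,
                                     int *output_length)
{
  unsigned char *out;
  int i, j = 0, needed = 0;

  if (!input || length < 0)
    return NULL;

  /* First pass: ASCII stays 1 byte, 0x80-0xFF needs 2 bytes. */
  for (i = 0; i < length; i++)
    needed += (input[i] < 0x80) ? 1 : 2;

  out = malloc((size_t)needed + 1);
  if (!out)
    return NULL;

  /* Second pass: encode each byte as the code point of the same value. */
  for (i = 0; i < length; i++) {
    unsigned char b = input[i];
    if (b < 0x80) {
      out[j++] = b;
    } else {
      out[j++] = (unsigned char)(0xC0 | (b >> 6));
      out[j++] = (unsigned char)(0x80 | (b & 0x3F));
    }
  }

  out[j] = '\0';
  if (output_length)
    *output_length = j;
  return out;
}
```

The output is at most twice the input length, so a worst-case buffer could also be allocated up front. The counting pass here just avoids over-allocating.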

librdf_utf8_print ()

void                librdf_utf8_print                   (const unsigned char *input,
                                                         int length,
                                                         FILE *stream);

Print a UTF-8 string to a stream.

Pretty-prints the UTF-8 string, writing any character that fails the isprint() test in a pseudo-C escape format: \u followed by hex digits.

input :

UTF-8 string buffer

length :

buffer size

stream :

FILE* stream to print to
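A rough picture of this escaping can be given with a hypothetical standalone function, `sketch_utf8_print()`, which is not Redland's implementation. The exact escape spelling it uses (four hex digits after \u for the Basic Multilingual Plane, eight after \U beyond it) is an assumption, as is stopping at the first malformed sequence. It applies isprint() only to ASCII values, since passing a larger code point to isprint() is undefined behaviour and printing it raw would not produce UTF-8.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical sketch: decode each UTF-8 character, write printable
   ASCII unchanged and escape everything else with hex digits. The escape
   spelling is this sketch's choice, not a statement about Redland. */
void sketch_utf8_print(const unsigned char *input, int length, FILE *stream)
{
  int i = 0;

  while (i < length) {
    unsigned char lead = input[i];
    unsigned long c;
    int size, k;

    if (lead < 0x80)                { size = 1; c = lead; }
    else if ((lead & 0xE0) == 0xC0) { size = 2; c = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { size = 3; c = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { size = 4; c = lead & 0x07; }
    else return;                     /* malformed lead byte: stop */

    if (i + size > length)
      return;                        /* truncated sequence: stop */
    for (k = 1; k < size; k++) {
      if ((input[i + k] & 0xC0) != 0x80)
        return;                      /* bad continuation byte: stop */
      c = (c << 6) | (input[i + k] & 0x3F);
    }

    /* isprint() is only defined for unsigned char values (and EOF);
       restricting it to ASCII means every non-ASCII character is escaped. */
    if (c < 0x80 && isprint((int)c))
      fputc((int)c, stream);
    else if (c < 0x10000)
      fprintf(stream, "\\u%04lX", c);
    else
      fprintf(stream, "\\U%08lX", c);

    i += size;
  }
}
```

For the input bytes `h i \n C3 A9`, this sketch prints `hi\u000A\u00E9`.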