Encoding {base} | R Documentation
Read or set the declared encodings for a character vector.
Encoding(x)
Encoding(x) <- value
x | A character vector.
value | A character vector of positive length.
Character strings in R can be declared to be in "latin1" or "UTF-8". These declarations can be read by Encoding, which returns a character vector of values "latin1", "UTF-8" or "unknown". They can also be set with Encoding(x) <- value, in which case value is recycled as needed and any other values are silently treated as "unknown". As of R 2.8.0, ASCII strings are never marked with a declared encoding, since their representation is the same in all encodings.
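The reading and setting behaviour described above can be sketched as follows. The vector contents and the names enc_marked and enc_other are illustrative choices, not part of the original page.

```r
# Two strings containing the latin1 byte for c-cedilla, plus one pure ASCII string.
x <- c("fa\xE7ile", "abc", "gar\xE7on")

# A single value is recycled across the whole vector.
Encoding(x) <- "latin1"
enc_marked <- Encoding(x)
enc_marked   # "latin1" "unknown" "latin1": the ASCII element is never marked

# A value other than "latin1" or "UTF-8" is silently treated as "unknown".
y <- x
Encoding(y) <- "not-an-encoding"
enc_other <- Encoding(y)
enc_other    # "unknown" "unknown" "unknown"
```

No warning is raised for the unrecognised value, so it is worth checking a declaration with Encoding(x) after setting it.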
There are other ways for character strings to acquire a declared encoding apart from explicitly setting it. Functions scan, read.table, readLines and parse have an encoding argument that is used to declare encodings, iconv declares encodings from its to argument, and console input in suitable locales is also declared. intToUtf8 declares its output as "UTF-8", and output text connections are marked if running in a suitable locale.
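A minimal sketch of two of these routes, using only iconv and intToUtf8. The variable names are illustrative, and the sketch assumes iconv marks its result according to the target encoding, as the example at the end of this page suggests.

```r
# A latin1 byte string, explicitly declared as latin1.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# Converting with iconv yields a string whose declared encoding is the target one.
xx <- iconv(x, "latin1", "UTF-8")
enc_iconv <- Encoding(xx)
enc_iconv    # "UTF-8"

# intToUtf8 builds a string from Unicode code points (231 is c-cedilla)
# and declares its non-ASCII output as UTF-8.
u <- intToUtf8(c(102, 97, 231))
enc_int <- Encoding(u)
enc_int      # "UTF-8"
```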
Most character manipulation functions will set the encoding on output strings if it was declared on the corresponding input. These include chartr, strsplit, strtrim, substr, tolower and toupper as well as sub(useBytes = FALSE) and gsub(useBytes = FALSE). (Also, under some circumstances paste will set an encoding.) Note that such functions do not preserve the encoding, but if they know the input encoding and that the string has been successfully re-encoded to the current encoding, they mark the output with the latter (if it is "latin1" or "UTF-8").
As of R 2.7.0 substr does preserve the encoding, and chartr, tolower and toupper preserve UTF-8 encoding on systems with Unicode wide characters. With their fixed and perl options, strsplit, sub and gsub will give a UTF-8 result if any of the inputs are UTF-8.
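As a rough illustration of how a declared encoding carries through string manipulation, the sketch below starts from a UTF-8 string. The variable names are illustrative, and the results for toupper depend on the platform conditions stated above.

```r
# Build a UTF-8 string from a declared latin1 one.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- iconv(x, "latin1", "UTF-8")

# substr keeps the declared encoding when the result is still non-ASCII.
enc_sub <- Encoding(substr(xx, 1, 3))
enc_sub      # "UTF-8"

# With fixed = TRUE, gsub gives a UTF-8 result when an input is UTF-8.
enc_gsub <- Encoding(gsub("i", "I", xx, fixed = TRUE))
enc_gsub     # "UTF-8"

# toupper should preserve UTF-8 on systems with Unicode wide characters.
Encoding(toupper(xx))

# A purely ASCII piece carries no mark, whatever the input was.
Encoding(substr(xx, 1, 2))   # "unknown"
```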
As of R 2.8.0 paste and sprintf return a UTF-8 encoded element if any of the inputs to that element are UTF-8.
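The element-wise nature of this rule can be seen in a short sketch. The inputs and variable names are illustrative.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- iconv(x, "latin1", "UTF-8")

# One output element built from a latin1 input and a UTF-8 input.
enc_mixed <- Encoding(paste(x, xx))
enc_mixed    # "UTF-8"

# The rule applies per element: only the element with a UTF-8 input is marked.
enc_each <- Encoding(paste(c(xx, "abc"), "z"))
enc_each     # "UTF-8" "unknown"

# sprintf behaves the same way for a UTF-8 argument.
enc_fmt <- Encoding(sprintf("%s!", xx))
enc_fmt      # "UTF-8"
```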
A character vector.
## x is intended to be in latin1
x <- "fa\xE7ile"
Encoding(x)
Encoding(x) <- "latin1"
x
xx <- iconv(x, "latin1", "UTF-8")
Encoding(c(x, xx))
c(x, xx)