list4xt : Mailing list for the XT users community.
[list4xt] Re: Hacking XT to work with non-unicode.
From: Mike Brown (mike@skew.org)
Date: 24/05/2000 - 21:17
> XP> XP supports the following encodings:
> XP>
> XP> UTF-8
> XP> UTF-16
> XP> ISO-8859-1
> XP> US-ASCII
>
> If XP is not raising an error on a 'broken' encoding, but silently
> corrupts the content, then this is a serious bug in XP. I can't believe
> XP is such a ... thing.
>
> You can reproduce this problem easily: just try to transform a tiny
> text file with <text> symbols greater than 128 </text>
Section 4.3.3 of XML 1.0 says:
"Parsed entities which are stored in an encoding other than UTF-8 or
UTF-16 must begin with a text declaration containing an encoding
declaration"
and
"... it is an error ... for an entity which begins with neither a Byte
Order Mark nor an encoding declaration to use an encoding other than
UTF-8."
Therefore, if you are not declaring the encoding, it is not reasonable to
expect an XML parser to know that your file's bytes represent characters
via windows-1251 encoding rather than UTF-8.
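To illustrate the role of the declaration, here is a minimal sketch using Python's standard xml.etree.ElementTree parser, not XP. The sample bytes and the try_parse helper are my own, made up for illustration. Note that Python's parser happens to accept windows-1251 when it is declared, which XP does not, so only the undeclared case mirrors what XP would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the Russian word "Privet" stored as windows-1251 bytes.
BODY = b"<text>\xcf\xf0\xe8\xe2\xe5\xf2</text>"


def try_parse(document: bytes) -> str:
    """Return the root element's text, or a short message if parsing fails."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError as exc:
        return f"parse error: {exc}"


# No declaration: the parser has to assume UTF-8, and these bytes are not
# a valid UTF-8 sequence, so the document is rejected.
undeclared = try_parse(BODY)

# With a declaration, the parser is told how the bytes map to characters.
declared = try_parse(b'<?xml version="1.0" encoding="windows-1251"?>' + BODY)

print(undeclared)
print(declared)
```

The only difference between the two inputs is the declaration, which is the one piece of information the parser can use to pick a decoder.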
If you are declaring the encoding as windows-1251, XP will tell you
"unsupported encoding", as it should.
If you are not declaring the encoding, UTF-8 will usually be assumed, and
the document is then subjected to the requirements of UTF-8. You will
likely get "character not allowed" from XP when it encounters a byte
sequence that is not allowed in UTF-8. If the bytes happen to form legal
UTF-8 sequences, then the document will be parsed but the data will be
'corrupted'. This is not a flaw in XP.
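Both outcomes can be seen at the byte level. The following is a rough sketch in Python rather than anything XP does internally. The byte strings and the decode_as_utf8 helper are assumptions chosen for illustration.

```python
def decode_as_utf8(data: bytes) -> str:
    """Decode bytes as a UTF-8-assuming parser would, or describe the failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# Case 1: windows-1251 bytes for a six-letter Cyrillic word.
# 0xCF starts a two-byte UTF-8 sequence, but 0xF0 is not a continuation byte.
rejected = b"\xcf\xf0\xe8\xe2\xe5\xf2"
rejected_result = decode_as_utf8(rejected)

# Case 2: two windows-1251 characters (U+0420, U+0452) whose bytes happen to
# form one legal UTF-8 sequence, so decoding succeeds with the wrong text.
accepted = b"\xd0\x90"
intended = accepted.decode("windows-1251")   # two characters
misread = decode_as_utf8(accepted)           # one character, U+0410

print(rejected_result)
print(len(intended), len(misread))
```

In the second case nothing signals a problem: the parser sees a well-formed UTF-8 character and has no way to tell that two different characters were meant.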
- Mike
___________________________________________________________
Mike J. Brown, software engineer, Webb Interactive Services
XML/XSL stuff: http://www.skew.org/ http://www.webb.net/