
list4xt : Mailing list for the XT users community.

[list4xt] Re: Hacking XT to work with non-unicode.

From: Mike Brown (mike@skew.org)
Date: 24/05/2000 - 21:17


> XP> XP supports the following encodings:
> XP>
> XP> UTF-8
> XP> UTF-16
> XP> ISO-8859-1
> XP> US-ASCII
>
> If XP is not firing an error on a 'broken' encoding, but silently
> corrupts the content - this is a serious bug in XP then. I can't believe
> XP is such a ... thing.
>
> you can reproduce this problem easily - just try to transform tiny
> text file with <text> symbols greater than 128 </text>

Section 4.3.3 of XML 1.0 says:

"Parsed entities which are stored in an encoding other than UTF-8 or
UTF-16 must begin with a text declaration containing an encoding
declaration"

and

"... it is an error ... for an entity which begins with neither a Byte
Order Mark nor an encoding declaration to use an encoding other than
UTF-8."

Therefore, if you are not declaring the encoding, it is not reasonable to
expect an XML parser to know that your file's bytes represent characters
via windows-1251 encoding rather than UTF-8.
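
To make the rule concrete, here is a rough sketch in Python of how a
parser might pick the encoding for an entity. The function name
assumed_encoding and the regular expression are mine, not anything from
XP, and the sketch only handles ASCII-compatible declarations rather than
the full autodetection described in Appendix F of XML 1.0.

```python
import re

# Simplified matcher for an encoding declaration at the very start of
# the entity, e.g. <?xml version="1.0" encoding="windows-1251"?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def assumed_encoding(data: bytes) -> str:
    """Return the encoding a parser would assume (hypothetical helper)."""
    # A UTF-16 Byte Order Mark settles the question immediately.
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "UTF-16"
    # Skip an optional UTF-8 Byte Order Mark before looking further.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    # An explicit declaration wins if one is present.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no declaration: the entity is treated as UTF-8.
    return "UTF-8"
```

With no declaration, the bytes of a windows-1251 file give the parser
nothing to go on, so assumed_encoding(b"<doc>\xcf\xf0</doc>") falls
through to "UTF-8".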

If you are declaring the encoding as windows-1251, XP will tell you
"unsupported encoding", as it should.

If you are not declaring the encoding, UTF-8 will usually be assumed, and
the document is then subjected to the requirements of UTF-8. You will
likely get "character not allowed" from XP when it encounters a byte
sequence that is not allowed in UTF-8. If the bytes happen to form legal
UTF-8 sequences, then the document will be parsed, but each sequence will
be read as a different character than the one you intended, so the data
will be 'corrupted'. This is not a flaw in XP.
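
Both outcomes can be seen outside of any XML parser. The following is
only an illustration in Python, not XP's code; the sample strings are my
own, and "cp1251" is Python's codec name for windows-1251.

```python
# Case 1: typical windows-1251 text is not legal UTF-8.
raw = "Привет".encode("cp1251")      # bytes as they sit in the file
try:
    raw.decode("utf-8")              # what a parser assuming UTF-8 does
    rejected = False
except UnicodeDecodeError:
    rejected = True                  # comparable to "character not allowed"

# Case 2: two windows-1251 characters whose bytes happen to form one
# legal UTF-8 sequence. No error is raised, but the text changes.
lucky = "Рџ".encode("cp1251")        # the two bytes 0xD0 0x9F
misread = lucky.decode("utf-8")      # decoded as a single character
```

Here rejected ends up True, while misread is the single character "П"
instead of the two characters that were written.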

   - Mike
___________________________________________________________
Mike J. Brown, software engineer, Webb Interactive Services
XML/XSL stuff: http://www.skew.org/ http://www.webb.net/
