list4xt : Mailing list for the XT users community.
Subject: [list4xt] Re: Hacking XT to work with non-unicode.
From: Eric van der Vlist (vdv@dyomedea.com)
Date: 16/05/2000 - 09:39
Paul,
Paul Tchistopolskii wrote:
>
> Cyrillic ( Windows ) ( See MSIE View->Encoding )
The one listed as Windows-1251?
> The idea of that encoding is that anything greater than 128 is a russian
> character ( not double byte ;-)
>
> XT corrupts those 'strange' characters. xsl:output method="text" corrupts
> them in one way method="html" corrupts in another.
>
> This should not be an issue of XP ( because transformation works ) there
> could be something wrong in output handler or at some other stage between
> reading source XML document and writing out the results.
Are you sure?
The list of "officially" supported encodings for XP is:
XP> XP supports the following encodings:
XP>
XP> UTF-8
XP> UTF-16
XP> ISO-8859-1
XP> US-ASCII
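To illustrate why an encoding missing from that list matters, here is a minimal sketch (plain JDK code, not XP or XT code) of what happens when Windows-1251 bytes are decoded as ISO-8859-1. The class and method names are hypothetical, and the Windows-1251 branch assumes the JVM provides that charset, which the code checks before using it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decode raw bytes with the given charset and list the resulting
    // UTF-16 code units as U+XXXX values.
    static String codePoints(byte[] bytes, Charset cs) {
        String s = new String(bytes, cs);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two bytes that are Cyrillic letters in Windows-1251.
        byte[] raw = { (byte) 0xCF, (byte) 0xF0 };

        // Decoded as ISO-8859-1, each byte maps to the Latin-1 character
        // with the same number, so the Cyrillic meaning is lost.
        System.out.println(codePoints(raw, StandardCharsets.ISO_8859_1));

        // Decoded as Windows-1251 (if this JVM supports it), the same
        // bytes land in the Cyrillic block instead.
        if (Charset.isSupported("windows-1251"))
            System.out.println(codePoints(raw, Charset.forName("windows-1251")));
    }
}
```

If the parser falls back to a single-byte encoding it does know, the transformation itself can still "work" while every character above 127 carries the wrong code point.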
> I don't know. You may try to transform the attached xml file with XT ...
> to see what happens.
I have configured the list server software (Listar) to remove
attachments...
> > If your problem is only with the output handler, you may have a look at
> > the XHTML Output Handler I have published.
>
> I don't know what is the problem. Do you ?
Not for sure, but I can suggest a few leads to follow.
XT's character input is straightforward:
(XMLProcessorImpl.java)
  public void characters(char ch[], int start, int length) {
    int need = length + dataBufUsed;
    if (need > dataBuf.length) {
      int newLength = dataBuf.length << 1;
      while (need > newLength)
        newLength <<= 1;
      char[] tem = dataBuf;
      dataBuf = new char[newLength];
      if (dataBufUsed > 0)
        System.arraycopy(tem, 0, dataBuf, 0, dataBufUsed);
    }
    for (; length > 0; length--)
      dataBuf[dataBufUsed++] = ch[start++];
  }
You may still have a problem with XP/SAX (which you can check by adding a
trace).
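A trace of that kind could look roughly like the sketch below. It is only an assumption of how one might do it: it uses the SAX2 DefaultHandler and the JDK's bundled parser instead of XP and XT's own SAX1 handler, and the CharTrace class and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class CharTrace extends DefaultHandler {
    // One entry per characters() callback, kept so the trace can be inspected.
    final List<String> seen = new ArrayList<>();

    // Render a slice of the SAX buffer as U+XXXX values.
    static String hex(char[] ch, int start, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < start + length; i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) ch[i]));
        }
        return sb.toString();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String line = hex(ch, start, length);
        seen.add(line);
        System.err.println("characters: " + line);
    }

    // Parse the given source with the JDK's default SAX parser and
    // return the handler so the recorded trace can be read back.
    static CharTrace trace(InputSource source) throws Exception {
        CharTrace handler = new CharTrace();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        trace(new InputSource(new StringReader("<a>\u041F\u0440</a>")));
    }
}
```

Pointing the InputSource at the real file (a byte stream rather than a Reader) shows which code points the parser produced from the bytes. Values in the U+00C0 to U+00FF range where Cyrillic letters were expected would suggest the damage happens at parse time, before any output handler runs.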
Then there are the output handlers, which perform the final
translation.
The HTMLOutputHandler, for instance, maintains a maxRepresentableChar
value and also sets a media type on the Writer (which may also cause
problems):
  public DocumentHandler init(Destination dest, AttributeList atts)
      throws IOException {
    String mediaType = atts.getValue("media-type");
    if (mediaType == null)
      mediaType = "text/html";
    encoding = atts.getValue("encoding");
    if (encoding == null) {
      // not all Java implementations support ASCII
      writer = dest.getWriter(mediaType, "iso-8859-1");
      // use character references for non-ASCII characters
      maxRepresentableChar = '\u007F';
    }
    else {
      writer = dest.getWriter(mediaType, encoding);
      encoding = dest.getEncoding();
      if (encoding.equalsIgnoreCase("iso-8859-1"))
        maxRepresentableChar = '\u00FF';
      else if (encoding.equalsIgnoreCase("us-ascii"))
        maxRepresentableChar = '\u007F';
    }
    keepOpen = dest.keepOpen();
    if ("no".equals(atts.getValue("indent")))
      indent = false;
    return this;
  }
The maxRepresentableChar value is then used in the characters method:
    if (c <= maxRepresentableChar)
      write(c);
    else
      write(getCharString(c));
    break;
Depending on the media type you are using, this can corrupt your output...
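To make the effect of that threshold concrete, here is a small stand-alone sketch. It is not XT's code: the MaxCharSketch class is hypothetical, and it assumes that getCharString returns a decimal numeric character reference, which I have not verified. It also ignores surrogate pairs.

```java
final class MaxCharSketch {
    // Copy characters up to the threshold as-is; replace anything above
    // it with a decimal character reference (assumed behaviour).
    static String escape(String text, char maxRepresentableChar) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c <= maxRepresentableChar)
                out.append(c);
            else
                out.append("&#").append((int) c).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Threshold used when no encoding is given.
        System.out.println(escape("a\u041F", '\u007F')); // prints a&#1055;
        // Threshold used for iso-8859-1: Latin-1 passes through, Cyrillic does not.
        System.out.println(escape("\u00E9\u041F", '\u00FF'));
    }
}
```

With a threshold like this, a Cyrillic character that was decoded correctly comes out as a reference such as &#1055;, while one that was mis-decoded into the Latin-1 range is written through unchanged, so the two cases look quite different in the output.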
Hope this helps.
Eric
--
------------------------------------------------------------------------
Eric van der Vlist Dyomedea http://adultwebsource.com
http://merchantaccounthighrisk.com - http://wewantpeace2012.org http://thepaymentguru.com
------------------------------------------------------------------------