
list4xt : Mailing list for the XT users community.

[list4xt] Re: Hacking XT to work with non-unicode.

From: Eric van der Vlist (vdv@dyomedea.com)
Date: 16/05/2000 - 09:39


Paul,

Paul Tchistopolskii wrote:
>
> Cyrillic ( Windows ) ( See MSIE View->Encoding )

The one mentioned as Windows-1251 ?

> The idea of that encoding is that anything greater than 128 is a russian
> character ( not double byte ;-)
>
> XT corrupts those 'strange' characters. xsl:output method="text" corrupts
> them in one way method="html" corrupts in another.
>
> This should not be an issue of XP ( because transformation works ) there
> could be something wrong in output handler or at some other stage between
> reading source XML document and writing out the results.

Are you sure? Windows-1251 is not in the list of "officially" supported
encodings for XP:

XP> XP supports the following encodings:
XP>
XP> UTF-8
XP> UTF-16
XP> ISO-8859-1
XP> US-ASCII
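
As an illustration of why the missing encoding matters, here is a minimal sketch (not taken from XP or XT) of what happens when a byte from a Windows-1251 document is decoded as ISO-8859-1, one of the encodings listed above. The class name `MisdecodeDemo` and the sample byte are assumptions chosen for the example; whether XP actually falls back this way is not established here.

```java
import java.nio.charset.StandardCharsets;

final class MisdecodeDemo {
    // Decode raw bytes as ISO-8859-1: each byte maps to the code point of the same value.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Format a single char as U+XXXX for tracing.
    static String codePoint(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        // 0xC6 is the Windows-1251 byte for CYRILLIC CAPITAL LETTER ZHE (U+0416).
        byte[] raw = { (byte) 0xC6 };
        String decoded = decodeAsLatin1(raw);
        // Prints U+00C6 (LATIN CAPITAL LETTER AE), not U+0416.
        System.out.println(codePoint(decoded.charAt(0)));
    }
}
```

If the parser hands over U+00C6 where U+0416 was meant, everything downstream is working on the wrong character, however correct the output handler is.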

> I don't know. You may try to transform the attached xml file with XT ...
> to see what happens.

I have configured the list server software (Listar) to remove
attachments...
 
> > If your problem is only with the output handler, you may have a look at
> > the XHTML Output Handler I have published.
>
> I don't know what is the problem. Do you ?

Not for sure, but I can suggest some leads to follow.

XT's handling of character input is straightforward:

(XMLProcessorImpl.java)
    public void characters(char ch[], int start, int length) {
      int need = length + dataBufUsed;
      if (need > dataBuf.length) {
        int newLength = dataBuf.length << 1;
        while (need > newLength)
          newLength <<= 1;
        char[] tem = dataBuf;
        dataBuf = new char[newLength];
        if (dataBufUsed > 0)
          System.arraycopy(tem, 0, dataBuf, 0, dataBufUsed);
      }
      for (; length > 0; length--)
        dataBuf[dataBufUsed++] = ch[start++];
    }

You may still have a problem at the XP/SAX level, which you can check by
adding a trace.
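
Such a trace could look like the following minimal sketch. It uses the SAX2 `DefaultHandler` and the JAXP parser shipped with the JDK rather than XP and the SAX1 interfaces XT is built on, so it only shows the idea; `CharTraceHandler` and `traceDocument` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

final class CharTraceHandler extends DefaultHandler {
    private final StringBuilder log = new StringBuilder();

    // Record every character the parser delivers as a U+XXXX code.
    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (log.length() > 0) log.append(' ');
            log.append(String.format("U+%04X", (int) ch[i]));
        }
    }

    String trace() {
        return log.toString();
    }

    // Parse an in-memory document and return the trace of its character data.
    static String traceDocument(byte[] xml) {
        try {
            CharTraceHandler handler = new CharTraceHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), handler);
            return handler.trace();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><t>\u0416</t>"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(traceDocument(xml)); // U+0416
    }
}
```

If the codes printed here are already wrong for your document, the problem lies before XT's output handlers.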

Then there are the output handlers, which perform the final
translation.

The HTMLOutputHandler, for instance, maintains a maxRepresentableChar
value and also sets the media type on the Writer (which may create
problems of its own):

  public DocumentHandler init(Destination dest, AttributeList atts)
    throws IOException {
    String mediaType = atts.getValue("media-type");
    if (mediaType == null)
      mediaType = "text/html";
    encoding = atts.getValue("encoding");
    if (encoding == null) {
      // not all Java implementations support ASCII
      writer = dest.getWriter(mediaType, "iso-8859-1");
      // use character references for non-ASCII characters
      maxRepresentableChar = '\u007F';
    }
    else {
      writer = dest.getWriter(mediaType, encoding);
      encoding = dest.getEncoding();
      if (encoding.equalsIgnoreCase("iso-8859-1"))
        maxRepresentableChar = '\u00FF';
      else if (encoding.equalsIgnoreCase("us-ascii"))
        maxRepresentableChar = '\u007F';
    }
    keepOpen = dest.keepOpen();
    if ("no".equals(atts.getValue("indent")))
      indent = false;
    return this;
  }
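
The encoding branches of the method above can be condensed into the following sketch. It is a simplification under stated assumptions: the real code compares the name returned by `dest.getEncoding()`, which may differ from the requested one, and `ThresholdSketch` and `thresholdFor` are hypothetical names. The point is that an encoding such as windows-1251 matches neither branch, so the excerpt never assigns maxRepresentableChar for it.

```java
import java.util.OptionalInt;

final class ThresholdSketch {
    // Returns the threshold the excerpt assigns for a requested encoding,
    // or empty when no branch of the excerpt assigns one.
    static OptionalInt thresholdFor(String encoding) {
        if (encoding == null)
            return OptionalInt.of(0x7F);   // default: ASCII only, written via iso-8859-1
        if (encoding.equalsIgnoreCase("iso-8859-1"))
            return OptionalInt.of(0xFF);
        if (encoding.equalsIgnoreCase("us-ascii"))
            return OptionalInt.of(0x7F);
        // Any other encoding: the field keeps its initial value,
        // which is not shown in the excerpt.
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(thresholdFor("windows-1251").isPresent()); // false
    }
}
```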

The maxRepresentableChar value is then used in the characters method:

  if (c <= maxRepresentableChar)
    write(c);
  else
    write(getCharString(c));
  break;
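
The effect of that test can be tried in isolation with a minimal sketch like the one below. It assumes that getCharString produces a decimal numeric character reference, which the excerpt does not show, and it ignores the other cases of the surrounding switch; `EscapeSketch`, `charRef` and `write` are hypothetical names.

```java
final class EscapeSketch {
    // Assumed stand-in for getCharString: a decimal numeric character reference.
    static String charRef(char c) {
        return "&#" + (int) c + ";";
    }

    // Write chars up to the threshold as-is and escape everything above it.
    static String write(String text, char maxRepresentableChar) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c <= maxRepresentableChar)
                out.append(c);
            else
                out.append(charRef(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded Cyrillic ZHE is above U+00FF and gets escaped.
        System.out.println(write("\u0416", '\u00FF')); // &#1046;
        // A mis-decoded one (U+00C6) is below the threshold and is written unchanged.
        System.out.println(write("\u00C6", '\u00FF').equals("\u00C6")); // true
    }
}
```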

Depending on the media type you are using, it can corrupt your output...

Hope this helps.

Eric

-- 
------------------------------------------------------------------------
Eric van der Vlist       Dyomedea                    http://adultwebsource.com
http://merchantaccounthighrisk.com -           http://wewantpeace2012.org              http://thepaymentguru.com
------------------------------------------------------------------------




Archive generated by hypermail 2b28 on 06/11/2001 - 11:46 CET
