Answers
 
How do I use Bugzilla to report bugs?
 

Please refer to the "Reporting bugs in Bugzilla" documentation.


What happened to xerces.jar?
 

Because this parser is very often used in conjunction with other XML technologies, such as XSLT processors, which also rely on standard APIs like DOM and SAX, xerces.jar was split into two jar files:

  • xmlParserAPIs.jar contains the DOM Level 2, SAX 2.0 and the parsing component of the JAXP 1.2 APIs;
  • xercesImpl.jar contains the implementation of these APIs as well as the XNI API.

For backwards compatibility, we have retained the ability to generate xerces.jar. For instructions, see the installation documentation.


I don't need all the features Xerces provides, but I'm running in an environment where space is at a premium. Is there anything I can do?
 

Partly to address this issue, we have recently begun to distribute compressed jar files instead of our traditionally uncompressed ones. If you still need a smaller jar and do not need things like support for XML Schema or the WML/HTML DOM implementations that Xerces provides, look at the dtdjars and tinyjars targets in our buildfile. The tinyjars target generates a xercesImpl.jar fifty percent smaller than the one we distribute.


How do I turn on DTD validation?
 

You can turn validation on and off via methods available on the SAX2 XMLReader interface. While only the SAXParser implements the XMLReader interface, the methods required for turning on validation are available to both parser classes, DOM and SAX.
The code snippet below shows how to turn validation on. It assumes that parser is an instance of either org.apache.xerces.parsers.SAXParser or org.apache.xerces.parsers.DOMParser.

parser.setFeature("http://xml.org/sax/features/validation", true);
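
For a fuller picture, here is a minimal sketch of the same feature in use. It makes a few assumptions that are not part of the answer above: the parser is obtained through JAXP's SAXParserFactory (which returns whichever parser is configured on the classpath) rather than by instantiating a Xerces class directly, and the class name ValidationDemo and the sample DTD are hypothetical. Validity errors are reported through the ErrorHandler's error() callback, so the sketch collects them in a list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class ValidationDemo {

    /** Parses the document with DTD validation on and returns the validity errors reported. */
    public static List<String> validate(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // The same feature URI as in the snippet above.
        reader.setFeature("http://xml.org/sax/features/validation", true);

        final List<String> errors = new ArrayList<String>();
        // Validity violations arrive as recoverable errors, not as exceptions.
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }
}
```

With an internal DTD that declares a note element containing a single to element, calling ValidationDemo.validate on a conforming document should return an empty list, while a document that uses an undeclared element should return at least one message.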


What international encodings are supported by Xerces-J?
 
  • UTF-8
  • UTF-16 Big Endian, UTF-16 Little Endian
  • IBM-1208
  • ISO Latin-1 (ISO-8859-1)
  • ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin transcription), Serbocroatian, Slovak, Slovenian, Upper and Lower Sorbian]
  • ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]
  • ISO Latin-4 (ISO-8859-4)
  • ISO Latin Cyrillic (ISO-8859-5)
  • ISO Latin Arabic (ISO-8859-6)
  • ISO Latin Greek (ISO-8859-7)
  • ISO Latin Hebrew (ISO-8859-8)
  • ISO Latin-5 (ISO-8859-9) [Turkish]
  • Extended Unix Code, packed for Japanese (euc-jp, eucjis)
  • Japanese Shift JIS (shift-jis)
  • Chinese (big5)
  • Chinese for PRC (mixed 1/2 byte) (gb2312)
  • Japanese ISO-2022-JP (iso-2022-jp)
  • Extended Unix Code, packed for Korean (euc-kr)
  • Russian Unix, Cyrillic (koi8-r)
  • Windows Thai (cp874)
  • Latin 1 Windows (cp1252) (and all other cp125? encodings recognized by IANA)
  • cp858
  • EBCDIC encodings:
    • EBCDIC US (ebcdic-cp-us)
    • EBCDIC Canada (ebcdic-cp-ca)
    • EBCDIC Netherlands (ebcdic-cp-nl)
    • EBCDIC Denmark (ebcdic-cp-dk)
    • EBCDIC Norway (ebcdic-cp-no)
    • EBCDIC Finland (ebcdic-cp-fi)
    • EBCDIC Sweden (ebcdic-cp-se)
    • EBCDIC Italy (ebcdic-cp-it)
    • EBCDIC Spain, Latin America (ebcdic-cp-es)
    • EBCDIC Great Britain (ebcdic-cp-gb)
    • EBCDIC France (ebcdic-cp-fr)
    • EBCDIC Hebrew (ebcdic-cp-he)
    • EBCDIC Switzerland (ebcdic-cp-ch)
    • EBCDIC Roece (ebcdic-cp-roece)
    • EBCDIC Yugoslavia (ebcdic-cp-yu)
    • EBCDIC Iceland (ebcdic-cp-is)
    • EBCDIC Urdu (ebcdic-cp-ar2)
    • Latin 0 EBCDIC
    • EBCDIC Arabic (ebcdic-cp-ar1)
Note: UCS-4 is not yet supported, but it is hoped that support will be available soon.

Parser is not able to access schema documents or external entities available on the Internet
 

The parser may be unable to access external entities or schema documents (imported, included, and so on) that are hosted on the Internet, for example the Schema for Schemas at "http://www.w3.org/2001/XMLSchema.xsd" or the schema at "http://www.w3.org/2001/xml.xsd" that defines the xml:base and xml:lang attributes.
One possible reason is that your proxy settings do not allow the parser to make URL connections through the proxy server. To solve this problem, the application has to set the two system properties "http.proxyHost" and "http.proxyPort" before parsing begins. Other possible reasons are strict firewall settings that do not allow any URL connection to the outside web, or that the server hosting the documents is not currently running.
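
A minimal sketch of setting the two proxy properties follows. The class name ProxySetup is hypothetical, and the host and port your application passes in must come from your own network configuration; the values mentioned in the note below are placeholders only.

```java
// Hypothetical helper class, for illustration only.
class ProxySetup {

    /** Sets the JVM-wide HTTP proxy properties consulted when URL connections are opened. */
    public static void configureProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }
}
```

An application would call something like ProxySetup.configureProxy("proxy.example.com", 8080) once, before the first parse. The same properties can also be supplied on the command line with -Dhttp.proxyHost=... and -Dhttp.proxyPort=... instead of in code.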


Why does the SAX parser lose some character data or why is the data split into several chunks?
 

If you read the SAX documentation, you will find that SAX may deliver contiguous text as multiple calls to characters(), for reasons having to do with parser efficiency and input buffering. It is the programmer's responsibility to deal with that appropriately, e.g. by accumulating text until the next non-characters() event.
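
One way to accumulate the text is sketched below. It is an illustration under stated assumptions, not the only approach: the class name TextAccumulator is hypothetical, the parser is obtained through JAXP, and text is flushed at element boundaries only, which is enough for a handler that receives no other kinds of events between text chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
class TextAccumulator extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<String>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text: just append.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Emits the accumulated text as one string once a non-characters() event arrives.
    private void flush() {
        if (buffer.length() > 0) {
            texts.add(buffer.toString());
            buffer.setLength(0);
        }
    }

    /** Parses the document and returns each contiguous run of text as a single string. */
    public static List<String> collectText(String xml) throws Exception {
        TextAccumulator handler = new TextAccumulator();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }
}
```

However the parser chooses to split its characters() calls, for example around an entity reference such as &amp;, the handler sees each run of text as a single string.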


Is there any way I can determine what encoding an entity was written in, or what XML version the document conformed to, if I'm using SAX?
 

Yes, there is a way, but it is not particularly elegant. SAX 2.0.0 and 2.0.1 offer no way to obtain these pieces of information. The SAX Locator2 interface from the 1.1 extensions (still in alpha at the time of writing) does provide methods for this, but since Xerces is required by Sun TCK rules to support precisely SAX 2.0.0, we cannot ship this interface. However, we can still support the appropriate methods on the objects we provide to implement the SAX Locator interface. Therefore, assuming locator is an instance of the SAX Locator interface that Xerces has passed back in a setDocumentLocator call, you can use a method like this to determine the encoding of the entity currently being parsed:

 
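    import java.lang.reflect.Method;
    import org.xml.sax.Locator;

    // Returns the encoding of the entity currently being parsed,
    // or null if the locator does not expose getEncoding().
    private String getEncoding(Locator locator) {
        String encoding = null;
        try {
            // Look the method up reflectively so that this code
            // compiles without the SAX Locator2 interface.
            Method getEncoding = locator.getClass().getMethod("getEncoding", new Class[]{});
            encoding = (String) getEncoding.invoke(locator, new Object[]{});
        } catch (Exception e) {
            // either this locator object doesn't have this
            // method, or we're on an old JDK
        }
        return encoding;
    }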
        

This code has the advantage that it will compile on JDK 1.1.8, though it will only produce non-null results on JDK 1.2.x and later. Substituting getXMLVersion for getEncoding lets you determine the version of XML to which the instance document conforms.
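
The substitution can be folded into one reflective helper, sketched below. The class name LocatorInfo and the method query are hypothetical, and unlike the method above this sketch uses varargs reflection calls, so it assumes a Java 5 or later compiler. On newer Java runtimes, reflective access to a parser's internal locator class may be refused, in which case the helper returns null just as it does when the method is missing.

```java
import java.lang.reflect.Method;
import org.xml.sax.Locator;

// Hypothetical helper class, for illustration only.
class LocatorInfo {

    /** Invokes a public no-argument String getter on the locator, or returns null if that fails. */
    public static String query(Locator locator, String methodName) {
        try {
            Method method = locator.getClass().getMethod(methodName);
            return (String) method.invoke(locator);
        } catch (Exception e) {
            // The locator lacks this method, or reflective access was refused.
            return null;
        }
    }

    public static String getEncoding(Locator locator) {
        return query(locator, "getEncoding");
    }

    public static String getXMLVersion(Locator locator) {
        return query(locator, "getXMLVersion");
    }
}
```

In a ContentHandler, the locator received in setDocumentLocator would be stored in a field and passed to LocatorInfo.getEncoding or LocatorInfo.getXMLVersion from a later callback such as startElement, when the entity being parsed is known.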




Copyright © 1999-2002 The Apache Software Foundation. All Rights Reserved.