
Contents
 

Note: This document describes the design of XSLTC's TrAX implementation. The XSLTC TrAX API user documentation is kept in a separate document.

The structure of this document is, and should be kept, as follows:

  • A brief introduction to TrAX/JAXP
  • Overall design of the XSLTC TrAX implementation
  • Detailed design of various TrAX components

Abstract
 

JAXP is the Java extension API for XML parsing. TrAX is an API for XML transformations and is included in the later versions of JAXP. JAXP includes two packages, one for XML parsing and one for XML transformations (TrAX):

    javax.xml.parsers
    javax.xml.transform

XSLTC is an XSLT processing engine and fulfills the role of the XML transformation engine behind the TrAX portion of the JAXP API. XSLTC is a provider for the TrAX API and a client of the JAXP parser API.

This document describes the design used for integrating XSLTC translets with the JAXP TrAX API. The heart of the design is a wrapper class around the XSLTC compiler that extends the JAXP SAXTransformerFactory class. This factory delivers translet class definitions (Java bytecodes) wrapped inside TrAX Templates objects. These Templates objects can be used to instantiate Transformer objects that transform XML documents into markup or plain text. Alternatively, a Transformer object can be created directly by the TransformerFactory, but this approach is not recommended with XSLTC, for reasons explained later in this document.


TrAX basics
 

The Java API for XML Processing (JAXP) includes an XSLT framework based on the Transformation API for XML (TrAX). A JAXP transformation application can use the TrAX framework in two ways. The simplest way is:

  • create an instance of the TransformerFactory class
  • from the factory instance and a given XSLT stylesheet, create a new Transformer object
  • call the Transformer object's transform() method, specifying the XML input and a Result object.

    import javax.xml.transform.*;

    public class Compile {

        public void run(Source xsl) throws TransformerConfigurationException {
            // ....
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(xsl);
            // ....
        }
    }

This suits most conventional XSLT processors that transform XML documents in one go. XSLTC needs one extra step to compile the XSL stylesheet into a Java class (a "translet"). Fortunately, TrAX offers another approach that suits XSLTC's two-step transformation model:

  • create an instance of the TransformerFactory class
  • from the factory instance and a given XSLT stylesheet, create a new Templates object (this step will compile the stylesheet and put the bytecodes for the translet class(es) into the Templates object)
  • from the Templates object create a Transformer object (this will instantiate a new translet object).
  • call the Transformer object's transform() method, specifying the XML input and a Result object.

    import javax.xml.transform.*;

    public class Compile {

        public void run(Source xsl) throws TransformerConfigurationException {
            // ....
            TransformerFactory factory = TransformerFactory.newInstance();
            Templates templates = factory.newTemplates(xsl);
            Transformer transformer = templates.newTransformer();
            // ....
        }
    }

Note that the first two steps need be performed only once for each stylesheet. Once the stylesheet is compiled into a translet and wrapped in a Templates object, the Templates object can be used over and over again to create Transformer objects (instances of the translet). The Templates instances can even be serialized and stored on stable storage (i.e. in a memory or disk cache) for later use.

The code below illustrates a simple JAXP transformation application that creates the Transformer directly. Remember that this is not the ideal approach with XSLTC, as the stylesheet is compiled for each transformation.

    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;

    public class Proto {

        public void run(String xmlfile, String xslfile) {
            Transformer transformer;
            TransformerFactory factory = TransformerFactory.newInstance();

            try {
                StreamSource stylesheet = new StreamSource(xslfile);
                transformer = factory.newTransformer(stylesheet);
                transformer.transform(new StreamSource(xmlfile),
                                      new StreamResult(System.out));
            }
            catch (Exception e) {
                // handle errors...
            }
            // : (remainder of the method elided in the original)
        }
    }

This approach seems simple and is probably used in many applications. However, Templates objects are useful when multiple instances of the same Transformer are needed. Transformer objects are not thread safe, so a server that handles requests from several clients is best off creating one global Templates object and then creating from it a Transformer object for each thread handling the requests. This approach is also by far the best for XSLTC, as the Templates object will hold the class definitions that make up the translet and its auxiliary classes. (Note that the bytecodes, and not the actual class definitions, are stored when serializing a Templates object to disk. This is because of class loader security restrictions.) To accommodate this second approach to TrAX transformations, the try-catch block of the above class would be modified as follows (with javax.xml.transform.Templates added to the imports):

    try {
        StreamSource stylesheet = new StreamSource(xslfile);
        Templates templates = factory.newTemplates(stylesheet);
        transformer = templates.newTransformer();
        transformer.transform(new StreamSource(xmlfile),
                              new StreamResult(System.out));
    }
    catch (Exception e) {
        // handle errors...
    }
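
The server scenario described above can be sketched as follows. This is only an illustration under stated assumptions, not part of XSLTC: the class name SharedTemplatesSketch, the inline stylesheet and the pool size of four are all hypothetical. The stylesheet is compiled once into a Templates object, and every worker task creates its own Transformer from it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SharedTemplatesSketch {

    // Hypothetical stylesheet: greets the person named in <greeting><name>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='greeting/name'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> documents) throws Exception {
        // Compile once; the Templates object is shared by all worker threads.
        TransformerFactory factory = TransformerFactory.newInstance();
        final Templates templates =
            factory.newTemplates(new StreamSource(new StringReader(XSL)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String xml : documents) {
                pending.add(pool.submit(() -> {
                    // Transformers are not thread safe, so each task gets its own.
                    Transformer transformer = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    transformer.transform(new StreamSource(new StringReader(xml)),
                                          new StreamResult(out));
                    return out.toString();
                }));
            }
            // Collect the results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling SharedTemplatesSketch.transformAll() with a list of small XML documents returns one output string per document, in the same order as the input list.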

TrAX configuration
 

JAXP's TransformerFactory is configurable in much the same way as the other Java extensions. The API supports configuring the factory by:

  • passing vendor-specific attributes from the application, through the TrAX interface, to the underlying XSL processor
  • registering an ErrorListener that will be used to pass error and warning messages from the XSL processor to the application
  • registering a URIResolver that the application can use to load XSL and XML documents on behalf of the XSL processor (the XSL processor will use this to support the xsl:include and xsl:import elements and the document() function).
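
The last two configuration points in the list above can be tried out with a small program. The sketch below is an assumption-laden illustration rather than XSLTC code: the class name FactoryConfigSketch, the in-memory map of stylesheets and the file name included.xsl are hypothetical. It registers a URIResolver that serves an xsl:include target from memory, and an ErrorListener that collects warnings and rethrows errors.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FactoryConfigSketch {

    static final String XSL_NS = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    static String run(String xml, final List<String> warnings) throws TransformerException {
        // Hypothetical in-memory store of stylesheets, keyed by href.
        final Map<String, String> sheets = new HashMap<>();
        sheets.put("included.xsl",
            "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
          + "<xsl:template match='item'>[<xsl:value-of select='.'/>]</xsl:template>"
          + "</xsl:stylesheet>");
        String main =
            "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
          + "<xsl:include href='included.xsl'/>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'><xsl:apply-templates select='list/item'/></xsl:template>"
          + "</xsl:stylesheet>";

        TransformerFactory factory = TransformerFactory.newInstance();

        // Called by the processor for xsl:include and xsl:import.
        factory.setURIResolver((href, base) -> {
            String text = sheets.get(href);
            // Returning null asks the processor to resolve the URI itself.
            return text == null ? null : new StreamSource(new StringReader(text), href);
        });

        // Warnings are recorded; errors abort the compilation.
        factory.setErrorListener(new ErrorListener() {
            public void warning(TransformerException e) {
                warnings.add(e.getMessage());
            }
            public void error(TransformerException e) throws TransformerException {
                throw e;
            }
            public void fatalError(TransformerException e) throws TransformerException {
                throw e;
            }
        });

        Transformer transformer = factory
            .newTemplates(new StreamSource(new StringReader(main)))
            .newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The resolver never touches the file system here, which is the point of the hook: the application, not the XSL processor, decides where included stylesheets come from.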

The JAXP TransformerFactory can be queried at runtime to discover what features it supports. For example, an application might want to know if a particular factory implementation supports the use of SAX events as a source, or whether it can write out transformation results as a DOM. Such queries are made through the factory's getFeature() method. In the above code, we could add the following before the try-catch block:

    if (!factory.getFeature(StreamSource.FEATURE) || !factory.getFeature(StreamResult.FEATURE)) {
        System.err.println("Stream Source/Result not supported by TransformerFactory\nExiting....");
        System.exit(1);
    }

Other elements in the TrAX API are configurable. A Transformer object can be passed settings that override the default output settings and the settings defined in the stylesheet for indentation, output document type, etc.
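
As a small illustration of such an override, the sketch below lets the stylesheet ask for an XML declaration and then, optionally, lets the application switch it off through the Transformer. The class name OutputOverrideSketch and the stylesheet are hypothetical; only standard javax.xml.transform calls are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputOverrideSketch {

    static String run(boolean omitDeclaration) throws TransformerException {
        // The stylesheet explicitly requests an XML declaration.
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='no'/>"
          + "<xsl:template match='/'><out/></xsl:template>"
          + "</xsl:stylesheet>";

        Transformer transformer = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xsl)))
            .newTransformer();

        if (omitDeclaration) {
            // The setting passed through TrAX takes precedence over xsl:output.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        }

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<in/>")),
                              new StreamResult(out));
        return out.toString();
    }
}
```

With omitDeclaration set to false the result starts with an XML declaration; with true the declaration is left out even though the stylesheet asked for it.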


XSLTC TrAX architecture
 

XSLTC's architecture fits nicely behind the TrAX interface. XSLTC's compiler is put behind the TransformerFactory interface, the translet class definition (either as a set of in-memory Class objects or as a two-dimensional array of bytecodes on disk) is encapsulated in the Templates implementation, and the instantiated translet object is wrapped inside the Transformer implementation. Figure 1 (below) shows this two-layered TrAX architecture:


Figure 1: Translet class definitions are wrapped inside Templates objects

The TransformerFactory implementation also implements the SAXTransformerFactory and ErrorListener interfaces from the TrAX API.

The TrAX implementation has intentionally been kept completely separate from the XSLTC native code. This prevents users of XSLTC's native API from having to include the TrAX code in an application. All the code that makes up our TrAX implementation resides in this package:

    org.apache.xalan.xsltc.trax

Message to all XSLTC developers: Keep it this way! Do not mix TrAX and Native code!


TrAX implementation details
 

The main components of our TrAX implementation are:

TransformerFactory implementation
 

The methods that make up the basic TransformerFactory interface are:

    public Templates newTemplates(Source source);
    public Transformer newTransformer();
    public ErrorListener getErrorListener();
    public void setErrorListener(ErrorListener listener);
    public Object getAttribute(String name);
    public void setAttribute(String name, Object value);
    public boolean getFeature(String name);
    public URIResolver getURIResolver();
    public void setURIResolver(URIResolver resolver);
    public Source getAssociatedStylesheet(Source src, String media, String title, String charset);

And for the SAXTransformerFactory interface:

    public TemplatesHandler   newTemplatesHandler();
    public TransformerHandler newTransformerHandler();
    public TransformerHandler newTransformerHandler(Source src);
    public TransformerHandler newTransformerHandler(Templates templates);
    public XMLFilter newXMLFilter(Source src);
    public XMLFilter newXMLFilter(Templates templates);

And for the ErrorListener interface:

    public void error(TransformerException exception);
    public void fatalError(TransformerException exception);
    public void warning(TransformerException exception);

TransformerFactory basics
 

The very core of XSLTC's TrAX support is the implementation of the basic TransformerFactory interface. This factory class is more or less a wrapper around the XSLTC compiler and creates Templates objects in which compiled translet classes can reside. These Templates objects can then be used to create Transformer objects. In cases where the Transformer is created directly by the factory we will use the Templates class internally. In that way the transformation will appear to be done in one step from the user's point of view, while in reality we use two steps. As described earlier, this is not the best approach when using XSLTC, as it causes the stylesheet to be compiled for each and every transformation.


TransformerFactory attribute settings
 

The getAttribute() and setAttribute() methods only recognise two attributes: translet-name and debug. The latter is obvious - it forces XSLTC to output debug information (dumps the stack in the very unlikely case of a failure). The translet-name attribute can be used to set the default class name for any nameless translet classes that the factory creates. A nameless translet will, for instance, be created when the factory compiles a translet for the identity transformation. There is a default name, GregorSamsa, for nameless translets, so there is no absolute need to set this attribute. (Gregor Samsa is the main character from Kafka's "Metamorphosis" - transformations, metamorphosis - I am sure you see the connection.)


TransformerFactory stylesheet handling
 

The compiler can be passed a stylesheet through various methods in the TransformerFactory interface. A stylesheet is passed in as a Source object that contains either a DOM, a SAX parser or a stream. The getInputSource() method handles all inputs and converts them, if necessary, to SAX. The TrAX implementation contains an adapter that generates SAX events from a DOM, and this adapter is used for DOM input. If the Source object contains a SAX parser, this parser is passed directly to the compiler. If the Source object contains a stream, a SAX parser is instantiated (using JAXP).
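
The three-way dispatch described above can be pictured with a small sketch. It is not the XSLTC getInputSource() method: the class SourceDispatchSketch, its Path enum and the classify() helper are hypothetical names, and the treatment of a SAXSource that carries no XMLReader is an assumption made for this example.

```java
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;

final class SourceDispatchSketch {

    /** The three ways a stylesheet Source can be turned into SAX events. */
    enum Path {
        DOM_ADAPTER,         // replay the DOM tree as SAX events
        SUPPLIED_SAX_PARSER, // use the parser that came with the Source
        NEW_JAXP_PARSER      // create a parser through JAXP and read the stream
    }

    static Path classify(Source source) {
        if (source instanceof DOMSource) {
            return Path.DOM_ADAPTER;
        }
        if (source instanceof SAXSource) {
            // Assumption: without an XMLReader we fall back to a fresh parser.
            return ((SAXSource) source).getXMLReader() != null
                ? Path.SUPPLIED_SAX_PARSER
                : Path.NEW_JAXP_PARSER;
        }
        if (source instanceof StreamSource) {
            return Path.NEW_JAXP_PARSER;
        }
        throw new IllegalArgumentException(
            "Unsupported Source type: " + source.getClass().getName());
    }
}
```

Whatever the input type, the compiler ends up consuming SAX events, which keeps the compiler itself independent of the TrAX Source classes.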


TransformerFactory URI resolver
 

A TransformerFactory needs a URIResolver to locate documents that are referenced in <xsl:import> and <xsl:include> elements. XSLTC has an internal interface that shares the same purpose. This internal interface is implemented by the TransformerFactory:

    public InputSource loadSource(String href, String context, XSLTC xsltc);

This method simply proxies the call on to the resolve() method of any defined URIResolver. The resolve() method returns a Source object, which is converted to SAX events and passed back to the compiler.



Templates design
 
Templates creation
 

The TransformerFactory implementation invokes the XSLTC compiler to generate the translet class and auxiliary classes. These classes are stored inside our Templates implementation in a manner which allows the Templates object to be serialized. By making it possible to store Templates on stable storage we allow the TrAX user to store/cache translet class(es), thus making room for XSLTC's one-compilation-multiple-transformations approach. This was done by giving the Templates implementation an array of byte arrays that contain the bytecodes for the translet class and its auxiliary classes. When the user requests a Transformer instance from the Templates object for the first time, we create one or more Class objects from these byte arrays. Note that this is done only once as long as the Templates object resides in memory. The Templates object then invokes the JVM's class loader with the class definition(s) to instantiate the translet class(es). The translet objects are then wrapped inside a Transformer object, which is returned to the client code:


    // Contains the name of the main translet class
    private String   _transletName = null;

    // Contains the actual class definition for the translet class and
    // any auxiliary classes (representing node sort records, predicates, etc.)
    private byte[][] _bytecodes = null;
    
    /**
     * Defines the translet class and auxiliary classes.
     * Returns a reference to the Class object that defines the main class
     */
    private Class defineTransletClasses() {
        TransletClassLoader loader = getTransletClassLoader();

        try {
            Class transletClass = null;
            final int classCount = _bytecodes.length;
            for (int i = 0; i < classCount; i++) {
                Class clazz = loader.defineClass(_bytecodes[i]);
                if (clazz.getName().equals(_transletName))
                    transletClass = clazz;
            }
            return transletClass; // Could still be 'null'
        }
        catch (ClassFormatError e) {
            return null;
        }
    }

Translet class loader
 

The Templates object will create the actual translet Class object(s) the first time the newTransformer() method is called. (The "first time" means the first time either after the object was instantiated or the first time after it has been read from storage using serialization.) These class(es) cannot be created using the standard class loader since the method:

    Class defineClass(String name, byte[] b, int off, int len);

of the ClassLoader is protected. XSLTC uses its own class loader that extends the standard class loader:

    // Our own private class loader - builds Class definitions from bytecodes
    private class TransletClassLoader extends ClassLoader {
        public Class defineClass(byte[] b) {
            return super.defineClass(null, b, 0, b.length);
        }
    }

This class loader is instantiated inside a privileged code section:

    TransletClassLoader loader = 
        (TransletClassLoader) AccessController.doPrivileged(
            new PrivilegedAction() {
                public Object run() {
                    return new TransletClassLoader();
                }
            }
        );

Then, when the newTransformer() method returns, it passes back an instance of XSLTC's Transformer implementation that contains an instance of the main translet class. (One transformation may need several Java classes - for sort-records, predicates, etc. - but there is always one main translet class.)
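
The class-definition step described in this section can be tried in isolation. The sketch below is a stand-alone illustration, not XSLTC code: DefineClassSketch and ByteArrayClassLoader are hypothetical names, and instead of real translet bytecodes it hand-assembles the smallest possible class file (a public class with no fields or methods) so that the example needs no compiler.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class DefineClassSketch {

    /** Exposes the protected defineClass() method, like TransletClassLoader. */
    static final class ByteArrayClassLoader extends ClassLoader {
        Class<?> define(byte[] b) {
            // A null name lets the JVM take the class name from the bytes.
            return defineClass(null, b, 0, b.length);
        }
    }

    /** Builds a minimal class file; stands in for compiler-generated bytecodes. */
    static byte[] emptyClass(String internalName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE);                           // magic number
        out.writeShort(0);                                  // minor version
        out.writeShort(49);                                 // major version (Java 5)
        out.writeShort(5);                                  // constant pool size + 1
        out.writeByte(1); out.writeUTF(internalName);       // #1 Utf8: class name
        out.writeByte(7); out.writeShort(1);                // #2 Class -> #1
        out.writeByte(1); out.writeUTF("java/lang/Object"); // #3 Utf8: superclass name
        out.writeByte(7); out.writeShort(3);                // #4 Class -> #3
        out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
        out.writeShort(2);                                  // this_class  = #2
        out.writeShort(4);                                  // super_class = #4
        out.writeShort(0);                                  // no interfaces
        out.writeShort(0);                                  // no fields
        out.writeShort(0);                                  // no methods
        out.writeShort(0);                                  // no attributes
        return buffer.toByteArray();
    }

    /** Defines a class with the given name from an in-memory byte array. */
    static Class<?> define(String name) throws IOException {
        byte[] bytecodes = emptyClass(name.replace('.', '/'));
        return new ByteArrayClassLoader().define(bytecodes);
    }
}
```

Calling DefineClassSketch.define("GregorSamsa") yields a Class object whose defining loader is the ByteArrayClassLoader instance, which is the same relationship a translet class has to the loader that defined it.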


Class loader security issues
 

When XSLTC is placed inside a JAR file in the $JAVA_HOME/jre/lib/ext directory, it is loaded by the extensions class loader and not the default (bootstrap) class loader. The extensions class loader does not look for class files/definitions in the user's CLASSPATH. This can cause two problems: A) XSLTC does not find classes for external Java functions, and B) XSLTC does not find translet or auxiliary classes when used through the native API.

Both of these problems are caused by XSLTC internally calling the Class.forName() method. This method will use the current class loader to locate the desired class (be it an external Java class or a translet/aux class). The problem is avoided by forcing XSLTC to use the bootstrap class loader, as illustrated below:


Figure 2: Avoiding the extensions class loader

These are the steps that XSLTC will go through to load a class:

  1. the application requests an instance of the transformer factory
  2. the Java extensions mechanism locates XSLTC as the transformer factory implementation using the extensions class loader
  3. the extensions class loader loads XSLTC
  4. XSLTC's compiler attempts to get a reference to an external Java class, but the call to Class.forName() fails, as the extensions class loader does not use the user's class path
  5. XSLTC attempts to get a reference to the bootstrap class loader, and requests it to load the external class
  6. the bootstrap class loader loads the requested class

Step 5) is only allowed if XSLTC has special permissions. But, remember that this problem only occurs when XSLTC is put in the $JAVA_HOME/jre/lib/ext directory, where it is given all permissions (by the default security file).
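
Steps 4 to 6 amount to a try-then-fall-back lookup, which can be sketched as below. This is a hedged illustration, not the XSLTC source: FallbackLoadSketch is a hypothetical name, and where the text above speaks of the bootstrap class loader, the sketch assumes that the loader which searches the user's class path is the one returned by ClassLoader.getSystemClassLoader().

```java
final class FallbackLoadSketch {

    static Class<?> load(String name) throws ClassNotFoundException {
        try {
            // Step 4: look the class up through the current class loader.
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            // Steps 5 and 6: ask the loader that searches the class path.
            // Assumption: that loader is the system class loader.
            ClassLoader fallback = ClassLoader.getSystemClassLoader();
            return Class.forName(name, true, fallback);
        }
    }
}
```

If the fallback loader cannot find the class either, the second ClassNotFoundException propagates to the caller.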



Transformer detailed design
 

The Transformer class is a simple proxy that passes transformation settings on to its translet instance before it invokes the translet's doTransform() method. The Transformer's transform() method maps directly to the translet's doTransform() method.

Transformer input and output handling
 

The Transformer handles its input in a manner similar to that of the TransformerFactory. It has two methods for creating standard SAX input and output handlers for its input and output files:

    private DOMImpl getDOM(Source source, int mask);
    private ContentHandler getOutputHandler(Result result);

One aspect of the getDOM method is that it handles four different types of Source objects. In addition to the standard DOM, SAX and stream types, it also handles an extended XSLTCSource input type. This input type is a lightweight wrapper around XSLTC's internal DOM-like input tree. This allows the user to create a cache or pool of XSLTC's native input data structures containing the input XML document. The XSLTCSource class is located in:

    org.apache.xalan.xsltc.trax.XSLTCSource

Transformer parameter settings
 

XSLTC's native interface has get/set methods for stylesheet parameters, identical to those of the TrAX API. The parameter handling methods of the Transformer implementation are pure proxies.
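
From the application's side, parameter passing through TrAX looks like the sketch below. The class name ParameterSketch, the stylesheet and the parameter name who are hypothetical; the calls themselves are the standard Transformer methods.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ParameterSketch {

    static String greet(String who) throws TransformerException {
        // The top-level xsl:param supplies a default that the caller may override.
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:param name='who' select=\"'nobody'\"/>"
          + "<xsl:template match='/'>Hi <xsl:value-of select='$who'/></xsl:template>"
          + "</xsl:stylesheet>";

        Transformer transformer = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xsl)))
            .newTransformer();

        if (who != null) {
            // Forwarded by the Transformer to the underlying stylesheet parameter.
            transformer.setParameter("who", who);
        }

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")),
                              new StreamResult(out));
        return out.toString();
    }
}
```

Passing null leaves the parameter unset, so the default declared in the stylesheet is used.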


Transformer output settings
 

The Transformer interface of TrAX has four methods for retrieving and defining the transformation output document settings:

    public Properties getOutputProperties();
    public String getOutputProperty(String name);
    public void setOutputProperties(Properties properties);
    public void setOutputProperty(String name, String value);

There are three levels of output settings. First there are the default settings defined in the XSLT 1.0 spec, then there are the settings defined in the attributes of the <xsl:output> element, and finally there are the settings passed in through the TrAX get/setOutputProperty() methods.


Figure 3: Passing output settings from TrAX to the translet

The AbstractTranslet class has a series of fields that contain the default values for the output settings. The compiler/Output class will compile code into the translet's constructor that updates these values depending on the attributes in the <xsl:output> element. The Transformer implementation keeps an instance of the java.util.Properties class in which it stores all properties that are set by the setOutputProperty() and the setOutputProperties() methods. These settings are written to the translet's output settings fields prior to initiating the transformation.
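
The precedence among the three levels can be modelled with chained java.util.Properties objects. This is only a model of the lookup order, not how XSLTC stores the settings (it uses fields in AbstractTranslet, as described above); the class name OutputLayersSketch and the selection of three default properties are assumptions made for the example.

```java
import java.util.Properties;

final class OutputLayersSketch {

    static Properties effective(Properties fromStylesheet, Properties fromTrax) {
        // Level 1: a few XSLT 1.0 defaults for the xml output method.
        Properties defaults = new Properties();
        defaults.setProperty("method", "xml");
        defaults.setProperty("indent", "no");
        defaults.setProperty("omit-xml-declaration", "no");

        // Level 2: attributes of the xsl:output element shadow the defaults.
        Properties stylesheetLayer = new Properties(defaults);
        stylesheetLayer.putAll(fromStylesheet);

        // Level 3: values set through TrAX shadow everything else.
        Properties traxLayer = new Properties(stylesheetLayer);
        traxLayer.putAll(fromTrax);
        return traxLayer;
    }
}
```

A getProperty() call on the returned object checks the TrAX layer first, then the stylesheet layer, then the defaults, so a later level only needs to hold the values it actually changes.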


Transformer URI resolver
 

The setURIResolver() method of the Transformer interface is used to set a locator for documents referenced by the document() function in XSL. The native XSLTC API has a defined interface for a DocumentCache. The functionality provided by XSLTC's internal DocumentCache interface is somewhat complementary to the URIResolver, and the two can be used side by side. To accomplish this we needed to find out in which ways the translet can load an external document:


Figure 4: Using URIResolver and DocumentCache objects

From the diagram we see that there are four ways:

  • LoadDocument -> .xml
  • LoadDocument -> DocumentCache -> .xml
  • LoadDocument -> URIResolver -> .xml
  • LoadDocument -> DocumentCache -> URIResolver -> .xml
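
The four paths listed above follow from two independent choices: whether a DocumentCache is installed and whether a URIResolver is installed. The sketch below only enumerates the resulting paths; LoadPathSketch and its path() helper are hypothetical and do not load any document.

```java
import java.util.ArrayList;
import java.util.List;

final class LoadPathSketch {

    /** Returns the chain of components a document request passes through. */
    static String path(boolean hasCache, boolean hasResolver) {
        List<String> hops = new ArrayList<>();
        hops.add("LoadDocument");
        if (hasCache) {
            hops.add("DocumentCache"); // consulted first when present
        }
        if (hasResolver) {
            hops.add("URIResolver");   // locates the document when present
        }
        hops.add(".xml");
        return String.join(" -> ", hops);
    }
}
```

Calling path() with each of the four combinations of arguments reproduces the four entries of the list.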




Copyright © 2003 The Apache Software Foundation. All Rights Reserved.