JavaScript Library Code

This second part of the final tutorial takes a look at the XML generator library code. This library also depends on code from the introspection tutorial. The intention is to discuss the problems encountered, and their solutions. Personally, although the specifications were quite simple, I am disappointed that so much code was required to implement them.
Nonetheless, it is all functional, and has been tested. Firstly, I will discuss the few helper functions, then we'll get down to the details of the converter function toXML().

The Helper Functions

As in the closures tutorial, I have placed all the functions in the app namespace. There are three helper functions and one debugging function; I will discuss the debugging function first:
/**
 Emits an error.

 @param text the error message.
*/

app.complain = function (text) {
  if (syger.exists(window, "alert")) {
    window.alert(text);
  }
  return new Error(text);
};
This is a generic debugging function. Clearly, although the JavaScript syntax for our XML structures is quite simple, it is possible to get it wrong. When this happens the app.complain() function is called; it displays an alert (when one is available on the platform) and then creates an Error object. It should be used in your code together with the throw keyword, thus:
  if (some_error_occurred) {
    throw app.complain("Some error occurred");
  }
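To show the throw and catch sides together, here is a minimal sketch. It assumes a stand-in app object and replaces the syger.exists() test with a plain typeof check so that it also runs where there is no window object; firstItem() is a hypothetical caller invented for the illustration.

```javascript
// Stand-in for the tutorial's namespace; only complain() is reproduced here.
var app = {};

app.complain = function (text) {
  // Assumption: a typeof test replaces syger.exists(window, "alert").
  if (typeof window !== "undefined" && typeof window.alert === "function") {
    window.alert(text);
  }
  return new Error(text);
};

// Hypothetical caller that validates its argument.
function firstItem(list) {
  if (!list || !list.push) {
    throw app.complain("list must be an array");
  }
  return list[0];
}

var message = null;
try {
  firstItem("not an array");
} catch (e) {
  // The Error created by complain() carries the original text.
  message = e.message;
}
```

Because complain() returns the Error rather than throwing it, the throw statement stays visible at the point of failure.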
The three helper functions are designed to provide escaping or encoding mechanisms for three distinct circumstances; in text (app.escape()), in attribute values (app.escapeAttribute()), and in a component of a query (app.escapeQueryComponent()).
/**
 Escapes plain text.
 Converts <, & and > to XML entities.
 See http://javascript.crockford.com/remedial.html

 @param text the plain text.
 @returns the modified text.
*/

app.escape = function (text) {
  if (text.match(/\S/g) === null) {
    return text;
  }
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;");
};
The app.escape() method replaces all occurrences of '<', '&', and '>' with their entity equivalents &lt;, &amp; and &gt;. These characters must be converted, since otherwise the XML parser would misread them as markup.
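One detail worth noting is the order of the replacements: the ampersand has to be handled first, otherwise the ampersands introduced by the other entities would be escaped a second time. The sketch below uses two standalone functions, escapeText() and escapeWrongOrder(), which are names made up for this illustration and are not part of the library.

```javascript
// Same replacement chain as app.escape(), without the whitespace shortcut.
function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
             .replace(/>/g, "&gt;");
}

// Deliberately wrong: the ampersand is replaced last.
function escapeWrongOrder(text) {
  return text.replace(/</g, "&lt;").replace(/>/g, "&gt;")
             .replace(/&/g, "&amp;");
}

var good = escapeText("a < b && c > d");
// good is "a &lt; b &amp;&amp; c &gt; d"

var bad = escapeWrongOrder("a < b");
// bad is "a &amp;lt; b", which a browser would display as "a &lt; b"
```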
/**
 Escapes plain text in an attribute.
 Converts <, &, ' and > to XML entities.
 See http://javascript.crockford.com/remedial.html

 @param text the plain text.
 @returns the modified text.
*/

app.escapeAttribute = function (text) {
  if (text.match(/\S/g) === null) {
    return text;
  }
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/'/g, "&#39;");
};
Similarly, attribute values must also be escaped. This function is nearly identical to app.escape() except that it also escapes the single quote (') character, because the toXML() method always delimits attribute values with single quotes. The method uses the numeric reference &#39;, which is equivalent to the &apos; entity, but has the advantage of being recognised by all browsers.
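A short sketch of why the single quote matters: without escaping, a quote inside the value would end the attribute early. The escapeAttribute() function here is a standalone copy of the replacement chain, and the title value is invented for the example.

```javascript
// Same replacement chain as app.escapeAttribute(), without the
// whitespace shortcut.
function escapeAttribute(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
             .replace(/>/g, "&gt;").replace(/'/g, "&#39;");
}

var title = "Tom & Jerry's <best> episodes";

// Attribute values are delimited with single quotes, as toXML() does.
var tag = ["<a title='", escapeAttribute(title), "'>"].join('');
// tag is "<a title='Tom &amp; Jerry&#39;s &lt;best&gt; episodes'>"
```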
The third helper function escapes illegal characters in components of the query part of a hyperlink. This is explained in detail in RFC 2396.
/**
 Escapes plain text in a query (name or parameter).
 Converts illegal characters to %xx format.

 @param text the plain text.
 @returns the modified text.
*/

app.escapeQueryComponent = function (text) {
  if (text.match(/\S/g) === null) {
    return text;
  }
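  // Guard with typeof: in a browser there is no Server object, and
  // referring to an undeclared variable would throw a ReferenceError.
  if (typeof Server !== "undefined" && syger.exists(Server, "URLEncode")) {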
    return Server.URLEncode(text);
  }
  else {
    return encodeURIComponent(text);
  }
};
The function delegates to the built-in method available, depending on the platform. As an example, suppose we have a blog site which can find a specific blog item given the title. If the title is “JavaScript: == considered harmful?” then the correctly escaped query part is:
  title=JavaScript%3A%20%3D%3D%20considered%20harmful%3F
This can be produced using:
var query = ["title", "=", 
  app.escapeQueryComponent("JavaScript: == considered harmful?")].join('');
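The same idea extends to queries with several parameters. The following sketch assumes a hypothetical helper, buildQuery(), which is not part of the library, and calls encodeURIComponent() directly rather than app.escapeQueryComponent().

```javascript
// Hypothetical helper: builds the query part of a URL from a collection
// of name/value pairs, escaping both names and values.
function buildQuery(params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      parts.push(encodeURIComponent(name) + "=" +
                 encodeURIComponent(String(params[name])));
    }
  }
  return parts.join("&");
}

var multi = buildQuery({
  title: "JavaScript: == considered harmful?",
  page: 2
});
// multi is "title=JavaScript%3A%20%3D%3D%20considered%20harmful%3F&page=2"
```

If such a query ends up inside an attribute value, the '&' separator still needs to pass through app.escapeAttribute() afterwards.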

The toXML() Function

The major part of the code lies in the toXML() function, which actually consists of an inner function printXML(). Firstly I'll look at the outer function, then go through the inner function, section by section.
/**
 Converts the element array to a string.
 Example usage:
 <pre>
 var elem = ["p", "Paragraph text"];
 app.toXML(elem, "", "    ");
 </pre>

 @param elem the element to convert.
 @param indent the optional initial indentation to use.
 @param gap the optional indentation increment, 
  the default is two spaces.
*/

app.toXML = function (elem, indent, gap) {
  var hasIndent = typeof indent === "string";
  gap = gap || "  ";

  // The function that does the actual work
  function printXML(elem, indent) {
    // explained later...
  }

  // Do the work
  return printXML(elem, indent)[0];
};
The first two lines take care of the optional arguments. Then comes the definition of the inner function printXML(), which is called immediately; toXML() returns only the first part of its result (the converted text). The reason for this extra layer of complexity is indentation. I like the generated code to look good, not just because it is 'pretty', but also because it is easier to follow. Let me give you an example:
<div class="foo">
  <div class="bar">
    Baz
  </div>
</div>
and the same text without the 'pretty' indentation:
<div class="foo"><div class="bar">Baz</div></div>
For just a few lines of XML it's still understandable, but I wouldn't like to try reading a page full of flattened XML text. Both are perfectly acceptable to the browser's parser, of course.
The dilemma is that this is quite difficult to do. Firstly, the generator has no idea which tags shouldn't be indented. In theory we should be able to indent them all, but in my experience it is best not to do so with a, img, and span tags. When they are surrounded by text, browsers seem to make mistakes. The most notable is a small underline 'tail' visible in anchored text:
<p>Text 
  <a href="#">
    anchored text
  </a>
  more text.
</p>
If I'm lucky, you'll see the defect below, otherwise you'll just have to take my word for it:

Text anchored text more text.

The problem disappears when flat:
<p>Text <a href="#">anchored text</a> more text.</p>
As you can see below:

Text anchored text more text.

So we have to tell toXML() which tags we don't want to indent. This is achieved with the app.nonIndentingXML collection:
/**
 Non indenting elements. Used by toXML to prevent indentation.
*/

app.nonIndentingXML = {"a": true, "img": true, "span": true};
Just add the name of any other tags you do not want to be indented.
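For instance, a sketch that adds two more inline tags to the collection. The choice of em and strong is only an example, and isIndenting() is a hypothetical helper that mirrors the test toXML() applies.

```javascript
// Stand-in for the tutorial's namespace.
var app = {};
app.nonIndentingXML = {"a": true, "img": true, "span": true};

// Add further tags that should stay on the same line as their text.
app.nonIndentingXML["em"] = true;
app.nonIndentingXML["strong"] = true;

// Hypothetical helper: the same lookup that toXML() performs.
function isIndenting(name) {
  return app.nonIndentingXML[name] === undefined;
}

var divIndents = isIndenting("div");  // true
var emIndents = isIndenting("em");    // false
```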
Going back to the toXML() function after this rather long digression: to get indentation to work properly, I need to know both the converted text of the child nodes and whether they performed indentation, before deciding whether the parent node should use indentation. This requires the converting function to return two pieces of information, whereas toXML() returns only one, the converted text. Hence the need for the inner function, which we'll look at next.
  // The function that does the actual work
  function printXML(elem, indent) {

    // Sanity checking...
    if (!elem || !elem.push) {
      throw app.complain("element must be an array");
    }
    var len = elem.length;
    if (len < 1) {
      throw app.complain("element array is empty");
    }
    var name = elem[0];
    if (typeof name !== "string") {
      throw app.complain("element name must be a string");
    }
As explained in the documentation comment, indent is the initial indentation and gap is the increment added at each nesting level. The code first checks that the elem parameter is a non-empty array, and that its first value is a string.
    // Convert the element name and any embedded attributes  
    var identName = null;
    var className = null;
    var openTag = "";
    var hasAttrs = false;  // declared here to avoid creating a global
    var pos = name.indexOf('.');
    if (pos > 0) {
      hasAttrs = true;
      className = name.substring(pos + 1);
      openTag = [openTag, " class='", className, "'"].join('');
      name = name.substring(0, pos);
    }
    pos = name.indexOf('#');
    if (pos > 0) {
      hasAttrs = true;
      identName = name.substring(pos + 1);
      openTag = [openTag, " id='", identName, "'"].join('');
      name = name.substring(0, pos);
    }
Next it parses the name to extract the embedded identifier, and class name.
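The parsing step can be tried in isolation. This sketch wraps the same indexOf()/substring() logic in a hypothetical parseName() function that returns the three parts instead of building the opening tag.

```javascript
// Hypothetical helper: splits "name#id.class" into its parts, using the
// same two-step search as printXML().
function parseName(name) {
  var parts = {name: name, id: null, className: null};
  var pos = parts.name.indexOf('.');
  if (pos > 0) {
    parts.className = parts.name.substring(pos + 1);
    parts.name = parts.name.substring(0, pos);
  }
  pos = parts.name.indexOf('#');
  if (pos > 0) {
    parts.id = parts.name.substring(pos + 1);
    parts.name = parts.name.substring(0, pos);
  }
  return parts;
}

var full = parseName("div#main.content");
// full.name is "div", full.id is "main", full.className is "content"

var classOnly = parseName("p.note");
// classOnly.name is "p", classOnly.id is null, classOnly.className is "note"
```

Because the class is cut off first, the identifier has to come before the class in the name. Written the other way round ("div.content#main"), everything after the dot, including "#main", ends up in the class name.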
    // Build the opening tag
    openTag = ["<", name, openTag].join('');

    // Get the element attributes
    var idx = 1;
    if (len > 1) {
      var attrs = elem[1];
      var type = syger.typeOf(attrs);

      // Try to be sure it's an Object and not an Array
      if (type === "object" && !attrs.push) {
        idx = 2;
        for(var attr in attrs) {
          if (attr === "id" && typeof identName === "string") {
            // Already added
          }
          else if (attr === "class" && typeof className === "string") {
            // Already added
          }
          else {
            var value = attrs[attr];
            type = typeof value;
            if (type === "number") {
              value = value.toString();
            }
            if (type === "string" || type === "number") {
              openTag = [openTag, " ", attr, "='", value, "'"].join('');
            }
          }
        }
      }
    }
At this point, the opening tag can be partially built. The array value after the name, if it exists, can be either the attributes collection or a child node. If it is the collection, the code adds the attributes and their values to the opening tag, but skips the id and class attributes if they have already been written.
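As a rough illustration of the attribute loop on its own, the sketch below uses a hypothetical printAttributes() function. Unlike the library code it filters with hasOwnProperty(), which is an addition of mine, and it leaves out the id and class bookkeeping.

```javascript
// Hypothetical helper: converts an attributes collection to text.
// Only string and number values are written; anything else is skipped.
function printAttributes(attrs) {
  var text = "";
  for (var attr in attrs) {
    if (Object.prototype.hasOwnProperty.call(attrs, attr)) {
      var value = attrs[attr];
      var type = typeof value;
      if (type === "string" || type === "number") {
        text = [text, " ", attr, "='", String(value), "'"].join('');
      }
    }
  }
  return text;
}

var attrText = printAttributes({
  href: "#top",
  tabindex: 3,
  onclick: function () {}   // skipped: neither a string nor a number
});
// attrText is " href='#top' tabindex='3'"
```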
    // Now check the indenting
    var indented = false;
    var result = "";
    if (idx < len) {
      openTag += ">";
      var closeTag = ["</", name, ">"].join('');
Next comes the indenting. This first part simply distinguishes an element that has children from one that is childless. The opening tag and the closing tag can be completed at this point.
      // Print the children nodes
      var nextIndent = hasIndent ? indent + gap : null;
      for ( ; idx < len; idx++) {
        var child = elem[idx];
        if (typeof child === "string") {
          result += child;
        }
        else if (child && child.push) {
          var printed = printXML(child, nextIndent);
          result += printed[0];
          indented = indented || printed[1];
        }
        else {
          throw app.complain("child must be a string or array");
        }
      }
The code iterates through the child nodes, accumulating the converted text and recording whether any of the children used indenting.
      // Print this node
      if (hasIndent && app.nonIndentingXML[name] === undefined) {
        if (indented) {
          result = [indent, openTag, "\n", result, 
                     indent, closeTag, "\n"].join('');
        }
        else {
          result = [indent, openTag, result, closeTag, "\n"].join('');
        }
        indented = true;
      }
      else {
        result = [openTag, result, closeTag].join('');
      }
    }
Now this node can finally be printed (converted to a string). There are two cases: an indenting node is indented, and any other node is not. The first case is a little more complex, because a newline is added after the opening tag only if the children used indentation. This gives a more compact format, while still using indentation.
    else {
      openTag += " />";
      if (hasIndent && app.nonIndentingXML[name] === undefined) {
        indented = true;
        result = [indent, openTag, "\n"].join('');
      }
      else {
        result = openTag;
      }
    }
    return [result, indented];
  }
This last part of the code takes care of childless elements. The opening tag is completed, and the element is checked for indentation. The function then returns the text, indentation flag pair as an array.
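To see the indentation rules working together, here is a condensed, standalone sketch of the converter. It is not the library code: the sanity checks and the #id/.class parsing are left out, hasOwnProperty() filtering is added, and the function and variable names (toXML, nonIndenting) are local to the example.

```javascript
// Tags that are never indented (same default set as app.nonIndentingXML).
var nonIndenting = {"a": true, "img": true, "span": true};

function toXML(elem, indent, gap) {
  var hasIndent = typeof indent === "string";
  gap = gap || "  ";

  // Returns the pair [text, indented].
  function printXML(elem, indent) {
    var name = elem[0];
    var openTag = "<" + name;
    var idx = 1;
    var attrs = elem[1];
    if (attrs && typeof attrs === "object" && !attrs.push) {
      idx = 2;
      for (var attr in attrs) {
        if (Object.prototype.hasOwnProperty.call(attrs, attr)) {
          openTag += " " + attr + "='" + attrs[attr] + "'";
        }
      }
    }
    var indents = hasIndent && nonIndenting[name] === undefined;

    // Childless element
    if (idx >= elem.length) {
      openTag += " />";
      return indents ? [indent + openTag + "\n", true] : [openTag, false];
    }

    // Children: accumulate their text and whether any of them indented
    var result = "";
    var indented = false;
    var nextIndent = hasIndent ? indent + gap : null;
    for (; idx < elem.length; idx++) {
      var child = elem[idx];
      if (typeof child === "string") {
        result += child;
      } else {
        var printed = printXML(child, nextIndent);
        result += printed[0];
        indented = indented || printed[1];
      }
    }
    openTag += ">";
    var closeTag = "</" + name + ">";
    if (!indents) {
      return [openTag + result + closeTag, indented];
    }
    if (indented) {
      return [indent + openTag + "\n" + result + indent + closeTag + "\n", true];
    }
    return [indent + openTag + result + closeTag + "\n", true];
  }

  return printXML(elem, indent)[0];
}

var elem = ["div", {"class": "foo"}, ["div", {"class": "bar"}, "Baz"]];
var pretty = toXML(elem, "");
// pretty is "<div class='foo'>\n  <div class='bar'>Baz</div>\n</div>\n"
var flat = toXML(elem);
// flat is "<div class='foo'><div class='bar'>Baz</div></div>"
```

Note that the inner div stays on one line in the indented output: its only child is text, so no newline is added after its opening tag.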
After this long explanation of what is a very large function, we can move on to the example.
All the scripts in these tutorials are available for download as two compressed archives; Scripts.zip and AspScripts.zip, both distributed under the GNU Lesser General Public License.

Contacts

Syger can be contacted for consultancy work on any of the topics mentioned in this article, by sending an email to info@syger.it.
