XTL


XTL - Xml as a Tcl List

xtl.tcl is a small (~1500 lines) Tcl script containing an XML parser and features for managing XTL formatted data.

An XTL is a formal mapping of XML into a Tcl list. Specifically, an XTL list has an even number of elements forming name/value pairs, where the name represents a tag plus its attributes and the value represents the body.


Consider the following XTL (from the User menus in Ted), followed by the equivalent XML:

{menu +} {
        x String-Downcase
        {x - -A <Control-asciicircum> } String-Upcase
         {menu +  -l List-Ops} {
                x   List-Reverse
                x   List-Sort
        }
        {menu +  -l Tests} {
                x   Invoke-Error
                c   Invoke-Inspect
         }
        # "End of menus"
}

<menu>
      <x>String-Downcase</x>
      <x A='&lt;Control-asciicircum&gt;'>String-Upcase</x>
      <menu l='List-Ops'>
            <x>List-Reverse</x>
            <x>List-Sort</x>
      </menu>
      <menu l='Tests'>
            <x>Invoke-Error</x>
            <c>Invoke-Inspect</c>
      </menu>
      <!-- End of menus -->
</menu>

These two are equivalent, except for the dash prefix on attribute names (see evaluate validation).
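To make the mapping concrete, here is a minimal sketch of how such a list could be turned back into XML. It is not the [toxml] implementation from xtl.tcl: the proc names xmlEscape and xtlToXml are hypothetical, Tcl 8.5 or later is assumed, every "#" tag is emitted as a comment, and the dash prefix on attribute names is simply trimmed.

```tcl
# Hypothetical helper (not part of xtl.tcl): escape XML-special characters.
proc xmlEscape {s} {
    return [string map {& &amp; < &lt; > &gt; ' &apos;} $s]
}

# Hypothetical sketch of an XTL-to-XML converter. Assumes well-formed input.
proc xtlToXml {xtl {indent ""}} {
    set out ""
    foreach {atag body} $xtl {
        # atag is either "tag" or {tag treeflag name val ...}
        set attrs [lassign $atag tag treeflag]
        if {[string index $tag 0] eq "#"} {
            # Assumption: all "#" tags are written out as XML comments.
            append out "$indent<!-- $body -->\n"
            continue
        }
        set open $tag
        foreach {name val} $attrs {
            # Trim the dash prefix used on attribute names in the menu example.
            append open " [string trimleft $name -]='[xmlEscape $val]'"
        }
        if {$treeflag eq "+"} {
            # Non-terminal: the body is itself an XTL list.
            append out "$indent<$open>\n"
            append out [xtlToXml $body "$indent  "]
            append out "$indent</$tag>\n"
        } else {
            append out "$indent<$open>[xmlEscape $body]</$tag>\n"
        }
    }
    return $out
}

set menu {
    {menu +} {
        x String-Downcase
        {x - -A <Control-asciicircum> } String-Upcase
        # "End of menus"
    }
}
puts [xtlToXml $menu]
```

The whole conversion is one foreach over tag/body pairs plus a recursive call for "+" bodies, which is the point of the format.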

General Form

XTL has two general forms:

   TAG BODY
   {TAG treeflag  NAME1 VAL1  NAME2 VAL2 ... } BODY

The first form is simply a tag token with no attributes and a terminal body. The second has a treeflag, which must be either "+" (BODY is a non-terminal, i.e. a subtree) or "-" (BODY is a terminal). Here is a concrete example using HTML tags:

   B {Some Text}
   {IMG - SRC http://foo.bar/img.gif} {}
   {FORM + ACTION http://foo.bar} { {INPUT - NAME FN} {} }
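
Both forms can be taken apart with ordinary list commands. The following is a small sketch, assuming Tcl 8.5 or later; splitAtag is a hypothetical helper name, not an xtl.tcl command.

```tcl
# Hypothetical helper: split an atag into tag, treeflag and attribute list.
# The short form (a bare tag) is reported with the terminal treeflag "-".
proc splitAtag {atag} {
    set attrs [lassign $atag tag treeflag]
    if {$treeflag eq ""} {
        set treeflag -
    }
    return [list $tag $treeflag $attrs]
}

foreach {atag body} {
    B {Some Text}
    {IMG - SRC http://foo.bar/img.gif} {}
    {FORM + ACTION http://foo.bar} { {INPUT - NAME FN} {} }
} {
    lassign [splitAtag $atag] tag treeflag attrs
    puts "tag=$tag treeflag=$treeflag attrs=($attrs)"
}
```

Because the attributes form a flat name/value list, they can also be read as a dict, e.g. [dict get $attrs SRC].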

Well-formed XTL

Exactly four rules determine whether a list is a well-formed XTL:

  1. The number of elements is even (e.g. {atag body atag body}).
  2. The list length of atag is 1 or even (e.g. tag or {tag + nam val nam val}).
  3. When atag has a second element, it is either "-" (for a terminal body) or "+" (for a subtree body).
  4. Tag and attribute names cannot contain space characters.

XTL validation ensures that a Tcl list meets the minimal criteria for the above definitions. It can be performed with the check command.

In addition to the above, tag names starting with a leading "#" get special treatment. A single "#" indicates a comment. The tag "#str" can represent an inline text string. Other uses of "#" are treated as comments. See Future Expansion for more details.
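
The four rules translate almost directly into Tcl. The sketch below is only an illustration of the rules, not the actual check command; the proc name xtlCheck and its messages are hypothetical, Tcl 8.5 or later is assumed, and an empty atag is rejected as an extra assumption since it carries no tag name.

```tcl
# Hypothetical checker (the real check command may behave differently).
# Returns an empty string for well-formed XTL, otherwise a short message.
proc xtlCheck {xtl {path ""}} {
    if {[catch {llength $xtl} n]} {
        return "${path}: not a valid Tcl list"
    }
    if {$n % 2} {
        return "${path}: odd number of elements"                ;# rule 1
    }
    foreach {atag body} $xtl {
        if {[catch {llength $atag} alen] || $alen == 0} {
            return "${path}: malformed or empty atag"
        }
        set tag [lindex $atag 0]
        set here $path/$tag
        if {$alen != 1 && $alen % 2} {
            return "${here}: atag length must be 1 or even"     ;# rule 2
        }
        set treeflag [lindex $atag 1]
        if {$alen > 1 && $treeflag ni {+ -}} {
            return "${here}: treeflag must be + or -"           ;# rule 3
        }
        # Names sit at index 0 (the tag) and at 2, 4, ... (the attributes).
        set names [list $tag]
        foreach {name val} [lrange $atag 2 end] {
            lappend names $name
        }
        foreach name $names {
            if {[string first " " $name] >= 0} {
                return "${here}: name '$name' contains a space" ;# rule 4
            }
        }
        if {$treeflag eq "+"} {
            # A "+" body is itself XTL, so check it recursively.
            set msg [xtlCheck $body $here]
            if {$msg ne ""} {
                return $msg
            }
        }
    }
    return ""
}

puts "ok:  '[xtlCheck {B {Some Text}}]'"
puts "bad: '[xtlCheck {{IMG * SRC x.gif} {}}]'"
```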

Character Entities

When XML gets imported to XTL (using [fromxml]), character entities such as &lt; are automatically converted to regular Tcl characters. When using XTL directly, special characters may be embedded using the equivalent backslash-escapes inside double-quoted values. Even Tcl-special characters can be embedded this way.


  set val {
     x { some value }
     y " \x7d else \x7b "
  }
  puts [lindex $val 3]

Which outputs:

   } else { 

Hint: Hex charcodes can be looked up using Edit/Extra/Show-Key in Ted.
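
The entity conversion described at the start of this section can be approximated in one line. This is a sketch only, covering the five predefined XML entities; decodeEntities is a hypothetical name, and how [fromxml] actually decodes entities (including numeric ones) is not shown here.

```tcl
# Hypothetical helper: decode the five predefined XML entities.
# string map makes a single pass, so "&amp;lt;" becomes "&lt;", not "<".
proc decodeEntities {s} {
    return [string map {&lt; < &gt; > &quot; \" &apos; ' &amp; &} $s]
}

puts [decodeEntities {&lt;Control-asciicircum&gt;}]
```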

Usage

xtl.tcl can be used either from the command line or within a Tcl application. There are no critical dependencies (but see evaluate).

Command-line

xtl.tcl supports command-line functions, specifically:

  Arguments                  Function
  -fromxml F.xml ...         Convert file to xtl using [fromxml].
  -toxml F.xtl ..            Convert file to xml using [toxml].
  -format F.xtl ?F2.xtl      Do [reformat] and [valid] on xtl file.
  -search F.xtl TAGS ...     Do [search] on xtl using TAGS.
  -eval F.xtl F.tcl NS...    Do [evaluate] on xtl using tcl tag handlers.
  -count F.xtl ?N?           Count terminals using [traverse].
  -check F.xtl               Return descriptive message if not well-formed.
  -tags F.xtl                Count tag frequencies using [traverse].
  -speed F.xml               Test speed of xtl and TclXML.
  -test                      Mini self-test.
  -help                      List all procs or give help on one.


  xtl.tcl  -fromxml myfile.xml  -out myfile.xtl
  xtl.tcl  -toxml myfile.xtl  -out myfile2.xml
  diff myfile.xml myfile2.xml

Programming

As XTL is a list specification, it requires no code library to process it. However, a number of access functions are provided anyway (see the XTL Manual, externs, or source for details). These functions are available from the command line (via cmds) as well as programmatically.

Following are a couple of examples of xtl.tcl builtins.


Searching: xtl search $xtl $taglst ...

The [search] command matches tags and returns XTL of the form: {atag body}.


The following example is used against the XTL above:

    foreach {n val} {
        0 menu/menu/x
        1 {menu menu x}
        2 {menu {menu + -l Tests} x}
        3 {* {menu + -l Tests} x}
        4 {* * c}
    } {  puts "$n:[xtl search $xtl $val]" }
    puts ""
    puts A:[xtl search $xtl c -max 10]
    puts B:[xtl search $xtl c -max 10 -retidx 1]
    puts C:[xtl search $xtl c -max 10 -rettag 1]
    puts C:[xtl search $xtl c -max 10 -rettag 2]

which outputs:

0: x   List-Reverse
1: x   List-Reverse
2: x   Invoke-Error
3: x   Invoke-Error
4: c Invoke-Inspect

A: c Invoke-Inspect
B: menu menu c
C: {menu +} {menu +  -l Tests} c

The command-line version of search would use something like:

  xtl.tcl -search  menu.xtl  "menu menu x"
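
To show that such a lookup needs nothing beyond list commands, here is a much-reduced sketch of a path search. It is not the [search] command: xtlFind is a hypothetical name, path elements match on tag name only (or "*" for any tag), attributes and options such as -max are ignored, and only the first match is returned. Tcl 8.5 or later is assumed.

```tcl
# Hypothetical, simplified path search; returns {atag body} or an empty list.
proc xtlFind {xtl path} {
    # Split the path into the tag wanted at this level and the remainder.
    set rest [lassign $path want]
    foreach {atag body} $xtl {
        set tag [lindex $atag 0]
        if {[string index $tag 0] eq "#"} continue         ;# skip comments
        if {$want ne "*" && $want ne $tag} continue
        if {[llength $rest] == 0} {
            return [list $atag $body]
        }
        if {[lindex $atag 1] eq "+"} {
            # Descend only into non-terminal bodies.
            set hit [xtlFind $body $rest]
            if {[llength $hit]} {
                return $hit
            }
        }
    }
    return {}
}

set xtl {
    {menu +} {
        x String-Downcase
        {x - -A <Control-asciicircum> } String-Upcase
        {menu +  -l List-Ops} {
            x   List-Reverse
            x   List-Sort
        }
        {menu +  -l Tests} {
            x   Invoke-Error
            c   Invoke-Inspect
        }
        # "End of menus"
    }
}
puts [xtlFind $xtl {menu menu x}]
puts [xtlFind $xtl {* * c}]
```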

Evaluating: xtl evaluate $xtl $ns ...

The [evaluate] command traverses XTL, calling ${ns}::$tag with val and attributes as arguments. This provides a simple method of validating XTL tags and their attributes directly against Tcl code. Attribute validation is provided via Opts.


Here is an example, again using the above XTL:

namespace eval ::menu::layout {

  proc menu {val tag {treeflag -} args} {
     # Create a menu item.
     Opts p $args {
         { -A     {} "Accelerator for item" }
      } 1
  }

  proc x {val tag {treeflag -} args} {
     # Create a command menu-entry.
     Opts p $args {
         { -l      {} "Label to use for item" }
         { -A     {} "Accelerator for item" }
      } 1
  }

}

# ....

xtl evaluate $xlst ::menu::layout -check 2

For each tag in $xlst, the matching command in namespace ::menu::layout is called using eval [list $val] $atag. This catches undefined tags and checks attributes against Opts definitions.

In this example, two problems are reported: menu was used with an unknown attribute -l, and the tag c has no matching command defined. In the real world, of course, the code would perform actual processing and perhaps further checks, but here is the output:

Args '-l' not in '
         { -A     {} "Accelerator for item" }
      ' from '-l List-Ops' in ::menu::layout::menu
Args '-l' not in '
         { -A     {} "Accelerator for item" }
      ' from '-l Tests' in ::menu::layout::menu
Undefined command '::menu::layout::c' for tag 'c'
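
The dispatch idea itself fits in a few lines. The following sketch is not the [evaluate] implementation and does not use Opts: xtlDispatch and the ::demo namespace are hypothetical names, only undefined tags and errors raised by handlers are collected, and Tcl 8.5 or later is assumed.

```tcl
# Hypothetical sketch: call ${ns}::$tag for every node and collect problems.
proc xtlDispatch {xtl ns} {
    set errors {}
    foreach {atag body} $xtl {
        set tag [lindex $atag 0]
        if {[string index $tag 0] eq "#"} continue          ;# skip comments
        set cmd ${ns}::$tag
        if {[namespace which -command $cmd] eq ""} {
            lappend errors "Undefined command '$cmd' for tag '$tag'"
        } elseif {[catch {$cmd $body {*}$atag} msg]} {
            # Handlers get the body first, then tag, treeflag and attributes.
            lappend errors $msg
        }
        if {[lindex $atag 1] eq "+"} {
            lappend errors {*}[xtlDispatch $body $ns]
        }
    }
    return $errors
}

namespace eval ::demo {
    proc menu {val tag {treeflag -} args} { puts "menu with attrs: $args" }
    proc x    {val tag {treeflag -} args} { puts "entry: $val" }
}

set xtl {
    {menu +} {
        x String-Downcase
        c Invoke-Inspect
        # "End of menus"
    }
}
foreach msg [xtlDispatch $xtl ::demo] {
    puts $msg
}
```

A handler that wants to reject unknown attributes can inspect its args and raise an error, which this sketch then reports alongside the undefined tags.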

The command-line version of validation would use something like:

  xtl.tcl -eval  menu.xtl menu.tcl ::menu::layout -check 2

Note: tag and attribute name checking requires Opts.tcl. Full [evaluate] validation, including type-checking, requires Mod.


XTL Advantages

XTL offers a number of advantages over XML.

Complexity

The most important feature of XTL is complexity reduction. The implementation of xtl.tcl is about 1500 lines of code, making it small enough to include in almost any application. More importantly, access routines are not required to use XTL at all: Tcl itself provides the XTL parser, and processing is simple, fast list traversal.

XTL provides the capabilities of XML while side-stepping its complexity. For importing and exporting, xtl.tcl provides a light-weight XML parser and XML generator. Additional helper functions may also be of use.

Well-Formed

XTL is more naturally Well-Formed than XML. The lack of named end-tags means that invalid nesting order isn't possible. And there is just less clutter and verbosity from markup. Moreover, the Tcl code required to validate or reformat XTL is a dozen lines each.

Context-Free

Whereas XML requires examining the body of a node to determine whether it is a terminal, XTL specifies this explicitly with the treeflag: "+" marks a non-terminal.

Visualization

Programmers are the primary users of XTL, so being able to visualize its format simplifies things. XTL can be typed directly into program code, and an XTL-formatting filter option is provided in the Ted editor (via <Control-Backslash>).

Flexibility

Hand-editing XTL is simple. Editor brace-matching and code-folding make handling even sizeable XTL objects straightforward. Writing code to process XTL is trivial too: a simple foreach loop will easily traverse tag/body pairs.


The following example code traverses an XTL list, and calls cmd for each node.

  proc traverse {xlst cmd} {
      foreach {atag val} $xlst {
           eval $cmd [list $atag $val]
           if {[lindex $atag 1] == "+"} {
                traverse $val $cmd
           }
      }
  }

  proc visit {atag val} { puts "VISIT: $atag $val" }

  traverse $menu visit

The Tcl C API, which is both powerful and stable, can also be used to provide similar access at a low level.
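
The same foreach pattern covers simple statistics. As an illustration, here is a sketch that counts tag frequencies, similar in spirit to the -tags option but not taken from xtl.tcl; countTags is a hypothetical name and Tcl 8.5 or later is assumed for dict.

```tcl
# Hypothetical sketch: accumulate tag frequencies into a dict variable.
proc countTags {xtl countsVar} {
    upvar 1 $countsVar counts
    foreach {atag body} $xtl {
        dict incr counts [lindex $atag 0]
        if {[lindex $atag 1] eq "+"} {
            # Recurse into subtree bodies, sharing the same dict.
            countTags $body counts
        }
    }
}

set menu {
    {menu +} {
        x String-Downcase
        {x - -A <Control-asciicircum> } String-Upcase
        {menu +  -l Tests} {
            x   Invoke-Error
            c   Invoke-Inspect
        }
        # "End of menus"
    }
}
set counts [dict create]
countTags $menu counts
dict for {tag n} $counts {
    puts "$tag: $n"
}
```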

Performance

XTL processing is fast, because Tcl list handling is inherently fast. Importing XML to XTL is slower, but still acceptable.

The following figures measure xtl.tcl on a 1.2 MB XML file (a zap2it.com TV-program listings file), executing on a 1.5 GHz CPU with 1 GB of RAM. All times are in seconds, rounded:

  • 12 - Parse XML and Convert to XTL file.
  • 1 - Load XTL and traverse/count 30K leaf elements.

Compare the above with TclXML:

  • 3 - Load XML file
  • 65 - Load and traverse/count 30K leaf elements.

TclXML has an edge on the initial loading (3 seconds versus 12). When it comes to processing, however, there is no comparison: XTL loads and traverses the same 30K leaf elements in about 1 second, versus 65 for TclXML.

Memory usage is at worst comparable (see issues).

The -speed option may be used from the command line to compare with TclXML, e.g.:


$ ./xtl.tcl -speed chans3.xml
TEST FILE 'chans3.xml': 1170961 bytes
XTC CVT:          12366891 microseconds per iteration
XTC CNT:            1133222 microseconds per iteration
TclXML CNT:     65182270 microseconds per iteration

Issues

There are several issues to be aware of when using XTL with large data sets.

Memory

If a large XTL list places the bulk of its data nested 6 levels deep (as zap2it does), memory usage can be higher than expected. This is due to inner list elements storing redundant, and huge, sub-copies of the text representation.


{?xml - version 1.0 encoding utf-8} {}
{SOAP-ENV:Envelope + xmlns:SOAP-ENV http://schemas.xml...}  {
    {SOAP-ENV:Body +} {
        {ns1:downloadResponse + SOAP-ENV:encodingStyle ...}  {
            {xtvdResponse + xsi:type ns1:xtvdResponse} {
                {xtvdDocument + xsi:type ns1:xtvd} {
                    {xtvd + from 2004-05-12T07:00:00Z to 2004-05-13T07:00:00Z ...} {
                        {stations +} { Finally, some data!!!

Simply unnesting data out of all those enveloping levels can reduce memory to one-third. There are two general ways to do this: use [search] with the -child option, or use [fromxml] with the -extract flag.


Here is the use of search:

   xtl.tcl -search  tv.xtl  "xtvd"  -child 1  -out tv2.xtl

This has the effect of storing only the child nodes of xtvd. Unfortunately, this will also lose all the envelope data at or above xtvd.
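
In plain Tcl, the effect of keeping only the children of one nested tag can be sketched as below. This is an illustration of the idea, not how [search] -child is implemented; xtlChildrenOf is a hypothetical name, the sample data is invented for the example, and a node with an empty body is indistinguishable from "not found" in this simplified form.

```tcl
# Hypothetical sketch: return the body of the first non-terminal node
# named $tag, searching depth-first. Envelope levels above it are dropped.
proc xtlChildrenOf {xtl tag} {
    foreach {atag body} $xtl {
        if {[lindex $atag 1] ne "+"} continue      ;# terminals have no children
        if {[lindex $atag 0] eq $tag} {
            return $body
        }
        set hit [xtlChildrenOf $body $tag]
        if {$hit ne ""} {
            return $hit
        }
    }
    return ""
}

# Invented sample data with two envelope levels around the payload.
set doc {
    {envelope +} {
        {body +} {
            {payload + type demo} {stations {a b} lineups {c d}}
        }
    }
}
puts [xtlChildrenOf $doc payload]
```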

Alternatively, the de-nesting can be performed at the same time as XML translation in [fromxml] using the -extract flag:

 xtl.tcl -fromxml tv.xml -out tv.xtl  -extract "xtvd"
 xtl.tcl -fromxml tv.xml  -out tv.xtl  -extract "xtvd *"

The first causes the xtvd atag+value to be moved to the toplevel. The second moves the children of xtvd to the toplevel, i.e.:

{?xml...?}
{SOAP-ENV ...} {...}
{xtvd ...} {}
{stations +} {...}
{lineups +} {...}

Input/Output

Currently, reads and writes via -in or -out default to utf-8 encoding. The encoding is extracted and used by [fromxml], but all other routines are unaware of, and ignore, embedded encoding info.

Finally, users of -in/-out should avoid using any file name which looks like a channel name, including: stdin, stdout, file1, sock1, etc. The easiest way to ensure this is using a dot in file names.

Future Expansion

The "#" tag prefix, in addition to being used for comments and inline text strings, is reserved for future extensibility. All #* tags should normally be ignored as comments by user processing code.

License

As xtl is part of Mod, it is under the BSD License.

Page last modified on April 15, 2010, at 02:11 PM