MTran

What is MTran?

MTran is a programming language for XML transformation based on MSO (Monadic Second-order Logic) queries.

For more information, please contact: kiki .a.t. kmonos .d.o.t. net.

Some examples

"gather" expressions

The following example gathers all <a> elements in the input document, and simply prints them.

result[
  {gather x :: x in <a> :: {x}}
]

sample input

<html>
  <head>
    <title>Hello, World</title>
  </head>
  <body>
    <a href="http://www.google.com/">Google</a> and
    <a href="http://www.yahoo.com/">Yahoo!</a> are
    two famous search engines.
  </body>
</html>

sample output

<result>
  <a href="http://www.google.com/">Google</a>
  <a href="http://www.yahoo.com/">Yahoo!</a>
</result>
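
For readers who do not know MTran, the effect of this gather expression can be approximated in Python. This is only a rough analogue built on the standard xml.etree.ElementTree module, not a description of how MTran evaluates queries; the function name gather_a is made up for this illustration.

```python
import copy
import xml.etree.ElementTree as ET

SAMPLE_INPUT = """<html>
  <head>
    <title>Hello, World</title>
  </head>
  <body>
    <a href="http://www.google.com/">Google</a> and
    <a href="http://www.yahoo.com/">Yahoo!</a> are
    two famous search engines.
  </body>
</html>"""


def gather_a(doc):
    """Copy every <a> element, in document order, into a new <result> element."""
    result = ET.Element("result")
    for a in doc.iter("a"):        # iter() visits elements in document order
        clone = copy.deepcopy(a)   # keep attributes, text and children
        clone.tail = None          # drop the text that followed <a> in the input
        result.append(clone)
    return result


result = gather_a(ET.fromstring(SAMPLE_INPUT))
print(ET.tostring(result, encoding="unicode"))
```

The MTran version needs no explicit loop: the formula "x in <a>" selects the nodes and the template "{x}" says what to print for each of them.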

"visit" expressions

"Visit" expressions, as their name shows, visits each element in the input satisfying the condition and rewrites them. Other elements are kept unchanged.

{visit x
   :: <a>/@href/x :: "http://proxy/-_-" {x}
   ::    x in <a> :: a[ @target["_blank"] {gather y :: x/y :: {y}} ]
}

sample output (for the same sample input as in the "gather" example)

<html>
  <head>
    <title>Hello, World</title>
  </head>
  <body>
    <a target="_blank" href="http://proxy/-_-http://www.google.com/">Google</a> and
    <a target="_blank" href="http://proxy/-_-http://www.yahoo.com/">Yahoo!</a> are
    two famous search engines.
  </body>
</html>
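
As a point of comparison, the same rewrite can be sketched in Python with the standard xml.etree.ElementTree module. This is an approximation under stated assumptions, not MTran's actual evaluation strategy: the helper name visit_links and its proxy parameter are hypothetical, and the tree is modified in place rather than rebuilt.

```python
import xml.etree.ElementTree as ET


def visit_links(doc, proxy="http://proxy/-_-"):
    """Rewrite every <a> in place; all other nodes stay as they are."""
    for a in doc.iter("a"):
        rewritten = {"target": "_blank"}   # the new attribute is listed first
        for name, value in a.attrib.items():
            # only the href value gets the proxy prefix; other attributes are
            # copied as-is (an existing target would replace "_blank")
            rewritten[name] = proxy + value if name == "href" else value
        a.attrib.clear()
        a.attrib.update(rewritten)
    return doc


doc = ET.fromstring(
    '<body><a href="http://www.google.com/">Google</a> and '
    '<a href="http://www.yahoo.com/">Yahoo!</a> are two famous search engines.</body>'
)
visit_links(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Each "::"-separated pair in the visit expression corresponds to one case of the rewrite: the first prefixes the text under @href, the second rebuilds the <a> element with an extra @target attribute.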

Table of Contents

The example below is a template to add a table of contents to a given input XHTML document. It retrieves the heading elements from the input document, constructs a tree of itemized lists that reflect the hierarchical structure of the input, and prepends it to the original document.

{pred subsection(var1 a, var1 b, var2 B, var2 A) =
   a<b & b in B & ~ex1 x.(a<x & x<b & x in A);
}
 
{visit b :: /<html>/b:<body> ::
  h1["Index"]
  ul[
    {gather h2 :: h2 in <h2>                  :: li[ {h2/_:#} ul[" "
      {gather h3 :: subsection(h2,h3,<h3>,<h2>) :: li[ {h3/_:#} ul[" "
        {gather h4 :: subsection(h3,h4,<h4>,<h3>) :: li[ {h4/_:#} ul[" "
          {gather h5 :: subsection(h4,h5,<h5>,<h4>) :: li[ {h5/_:#} ul[" "
           ]]} ]]} ]]} ]]}
  ]
  {b/_}
}

sample input

<html><head><title>Title</title></head><body>
    <h1>Title</h1>
    <h2>Chapter 1</h2>
    <h3>Section 1.1</h3>   <p>The quick</p>
    <h4>Section 1.1.1</h4> <p>brown fox</p>
    <h3>Section 1.2</h3>   <p>jumps over</p>
    <h2>Chapter2</h2>      <p>the lazy</p>
    <h3>Section 2.1</h3>   <p>dog.</p>
</body></html>

sample output

<html><head><title>Title</title></head><body>
    <h1>Index</h1>
    <ul><li>Chapter 1 <ul>
          <li>Section 1.1 <ul>
            <li>Section 1.1.1 <ul/></li>
          </ul></li>
          <li>Section 1.2 <ul/></li>
        </ul></li>
        <li>Chapter2 <ul>
          <li>Section 2.1 <ul/></li>
        </ul></li> </ul>
    <h1>Title</h1>
    <h2>Chapter 1</h2>
    <h3>Section 1.1</h3>   <p>The quick</p>
    <h4>Section 1.1.1</h4> <p>brown fox</p>
    <h3>Section 1.2</h3>   <p>jumps over</p>
    <h2>Chapter2</h2>      <p>the lazy</p>
    <h3>Section 2.1</h3>   <p>dog.</p>
</body></html>
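
The subsection predicate carries most of the logic here: b is a subsection of a when b comes after a in document order, b belongs to the lower heading level B, and no heading of a's own level A lies between them. A minimal Python sketch of that predicate follows, assuming the standard xml.etree.ElementTree module; the function name subsections and its parameter names are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

SAMPLE_INPUT = """<html><head><title>Title</title></head><body>
    <h1>Title</h1>
    <h2>Chapter 1</h2>
    <h3>Section 1.1</h3>   <p>The quick</p>
    <h4>Section 1.1.1</h4> <p>brown fox</p>
    <h3>Section 1.2</h3>   <p>jumps over</p>
    <h2>Chapter2</h2>      <p>the lazy</p>
    <h3>Section 2.1</h3>   <p>dog.</p>
</body></html>"""


def subsections(nodes, a, b_tag, a_tag):
    """Return each b with a < b, b tagged b_tag, and no a_tag node between a and b."""
    found = []
    for node in nodes[nodes.index(a) + 1:]:   # nodes after a, in document order
        if node.tag == a_tag:
            break                             # the next a_tag heading closes a's section
        if node.tag == b_tag:
            found.append(node)
    return found


nodes = list(ET.fromstring(SAMPLE_INPUT).iter())   # all elements in document order
for h2 in [n for n in nodes if n.tag == "h2"]:
    print(h2.text, [h3.text for h3 in subsections(nodes, h2, "h3", "h2")])
```

The sketch scans a flat list, whereas the MTran template states the same condition declaratively and reuses it at every nesting level by swapping the tag sets.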

MathML Conversion

The following example reads arithmetic expressions using the <plus>, <minus>, and <times> operators in MathML 'content' markup, and converts them into MathML 'presentation' markup. No redundant parentheses are produced.

{
  pred single_arg( var1 op )     = ~ex1 c.(op.1.1=c);
  pred follows( var1 x, var1 y ) =  ex1 p.(p/x & p/y & x<y);
  pred need_paren( var1 ap ) =
     (ap.0 in <plus> | ap.0 in <minus>) & ex1 op. (
       follows(op,ap) & (
          (op in <minus> & single_arg(op))
        | (op in <minus> & op.1 ~= ap)
        | (op in <times>)
     ));
}
 
mrow[
  {visit x
    :: x in <ci>                    :: mi[ {x/_} ]
    :: x in <cn>                    :: mn[ {x/_} ]
    :: x in <apply> & need_paren(x) :: mo["("] {_=x.0} mo[")"]
    :: x in <apply>                 ::         {_=x.0}
    :: x in <minus> & single_arg(x) :: mo["-"] {_=x.1}
    :: x in <plus>  :: {_=x.1} {gather y :: follows(x.1,y) :: mo["+"] {y}}
    :: x in <minus> :: {_=x.1} {gather y :: follows(x.1,y) :: mo["-"] {y}}
    :: x in <times> :: {_=x.1} {gather y :: follows(x.1,y) :: mo["*"] {y}}
  }
]

sample input

<apply>
  <times/>
  <cn>1</cn>
  <apply> <plus/> <cn>2</cn> <cn>3</cn> </apply>
  <apply> <minus/> <cn>4</cn> </apply>
</apply>

sample output

<mrow>
  <mn>1</mn>
  <mo>*</mo>
  <mo>(</mo>
  <mn>2</mn>
  <mo>+</mo>
  <mn>3</mn>
  <mo>)</mo>
  <mo>*</mo>
  <mo>(</mo>
  <mo>-</mo>
  <mn>4</mn>
  <mo>)</mo>
</mrow>
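
The parenthesization rule encoded by need_paren can also be read procedurally. The Python sketch below is a hand-written recursive converter that follows the same rule; it is an illustration under stated assumptions, not MTran's evaluation model, and the names needs_paren, emit and convert are invented here. It assumes every <apply> has an operator as its first child.

```python
import xml.etree.ElementTree as ET

SYMBOL = {"plus": "+", "minus": "-", "times": "*"}


def needs_paren(op, outer_op, outer_argc, position):
    """Mirror need_paren: only <plus>/<minus> applications are ever wrapped."""
    if op not in ("plus", "minus") or outer_op is None:
        return False
    if outer_op == "times":
        return True
    # under <minus>: wrap the operand of a unary minus and every non-first operand
    return outer_op == "minus" and (outer_argc == 1 or position > 0)


def emit(out, tag, text):
    """Append <tag>text</tag> to out."""
    ET.SubElement(out, tag).text = text


def convert(node, out, outer_op=None, outer_argc=0, position=0):
    """Append presentation markup for node to out."""
    if node.tag == "ci":
        emit(out, "mi", node.text)
    elif node.tag == "cn":
        emit(out, "mn", node.text)
    elif node.tag == "apply":
        op, args = node[0].tag, list(node)[1:]
        paren = needs_paren(op, outer_op, outer_argc, position)
        if paren:
            emit(out, "mo", "(")
        if op == "minus" and len(args) == 1:
            emit(out, "mo", "-")               # unary minus
        for i, arg in enumerate(args):
            if i > 0:
                emit(out, "mo", SYMBOL[op])    # infix operator between operands
            convert(arg, out, op, len(args), i)
        if paren:
            emit(out, "mo", ")")


source = ET.fromstring(
    "<apply><times/><cn>1</cn>"
    "<apply><plus/><cn>2</cn><cn>3</cn></apply>"
    "<apply><minus/><cn>4</cn></apply></apply>"
)
mrow = ET.Element("mrow")
convert(source, mrow)
print(ET.tostring(mrow, encoding="unicode"))
```

Where the sketch threads context through function arguments, the MTran version looks the context up with a formula (follows, single_arg), so each visit rule stays a single line.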

Quick Reference (informal)

Basic Syntax

 
Program    ::= Expression
Expression ::= VisitExpression
             | GatherExpression
             | VarExpression
             | XmlLiteral
             | Expression+ 
 
VisitExpression ::= { visit x :: MSOFormula :: Expression
                               :: MSOFormula :: Expression
                                         ...
                               :: MSOFormula :: Expression } 
 
GatherExpression ::= { gather x :: MSOFormula :: Expression } 
 
VarExpression ::= { x } 
 
XmlLiteral ::= elem [ Expression ] 
             | @att [ Expression ] 
             | "string" 
 
MSOFormula ::=  MSOFormula & MSOFormula       // and
             |  MSOFormula | MSOFormula       // or
             |  MSOFormula => MSOFormula      // if-then
             |  MSOFormula <=> MSOFormula     // equivalent
             |  ~ MSOFormula                  // not
             | ex1 x. MSOFormula    // there exists an element x s.t.
             | ex2 X. MSOFormula    // there exists a set of elements X s.t.
             | all1 x. MSOFormula   // every element x satisfies ...
             | all2 X. MSOFormula   // every set of elements X satisfies ...
             | FstTerm in SndTerm
             | FstTerm = FstTerm
             | FstTerm < FstTerm    // position comparison with document order
             | true 
             | false 
             | PathFormula
 
PathFormula ::= x/y       // y is a child of x
              | x//y      // y is a descendant of x
              | x/Y       // shorthand for ex1 y. (y in Y & x/y)
              | x/y/z     // shorthand for x/y & y/z
              | x/y:Y/z   // shorthand for x/y & y in Y & y/z
              | ...       // etc
 
FstTerm ::= x 
          | FstTerm.0    // first child
          | FstTerm.1    // next sibling
 
SndTerm ::= X
          | <elem>     // set of all elements tagged <elem>
          | @att       // set of all attribute nodes with the name "att"
          | <*>        // set of all element nodes
          | @*         // set of all attribute nodes
          | #          // set of all text nodes
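
The ".0" and ".1" suffixes of FstTerm navigate the tree in first-child / next-sibling style, which is how a term such as "op.1.1" in the MathML example reaches the second sibling after op. The Python sketch below shows the idea on an xml.etree.ElementTree tree. It is an approximation: ElementTree children are element nodes only, whereas the node sets above also cover attribute and text nodes, and the name make_navigator is hypothetical.

```python
import xml.etree.ElementTree as ET


def make_navigator(root):
    """Return (first_child, next_sibling) functions for the tree under root."""
    # ElementTree has no parent pointers, so build a child -> parent map once
    parent = {child: p for p in root.iter() for child in p}

    def first_child(x):                        # corresponds to x.0
        if x is None or len(x) == 0:
            return None
        return x[0]

    def next_sibling(x):                       # corresponds to x.1
        p = parent.get(x) if x is not None else None
        if p is None:
            return None                        # the root has no siblings
        siblings = list(p)
        i = siblings.index(x)
        return siblings[i + 1] if i + 1 < len(siblings) else None

    return first_child, next_sibling


root = ET.fromstring("<apply><minus/><cn>4</cn></apply>")
first_child, next_sibling = make_navigator(root)
op = first_child(root)                         # like root.0, the <minus/> element
print(op.tag, next_sibling(op).tag, next_sibling(next_sibling(op)))
```

In this tree the second ".1" step from the operator finds nothing, which is the situation the single_arg predicate tests for.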

Advanced Syntax

macros

 
Expression ::= ... | MacroDef
MSOFormula ::= ... | MacroUse
 
MacroDef ::= { pred MacroName ( ParameterList ) =  MSOFormula ; ... } 
 
MacroUse ::= MacroName ( ArgumentList ) 

Macros are expanded at compile time. Example:

{ pred url(var1 x) =
    <a>/@href/x | <link>/@href/x; }
 
html[body[
  {gather x :: url(x) :: {x} br[]}
]]

implicit gather

Expression ::= ... | { MSOFormula }

Using formulae with one free variable "_" makes it easier to write transformations in the "gather and just print" pattern. For example:

result[ {_ in <a>} ]

means the same as

result[ {gather _ :: _ in <a> :: {_}} ]

Presented by k.inaba (kiki .a.t. kmonos.net) under CC0