NISTIR 7469

Last modified: Wed Nov 28 17:11:23 EST 2007


Rendering UML Activity Diagrams as Human-Readable Text

David Flater
National Institute of Standards and Technology, U.S.A.

Philippe A. Martin
Griffith University, Australia

Michelle L. Crane
Queen's University, Canada

November 2007

Abstract

We describe a modification of the Petri Net Linear Form notation to support the rendering of Unified Modeling Language (UML) Activity Diagrams as human-readable text. This new notation, called the Activity Diagram Linear Form, allows UML Activity Diagrams to be expressed in an alternate form with the superior accessibility, compatibility, and simplicity of use of a plain text representation. For some applications, these benefits greatly outweigh the ćsthetical and pedagogical advantages of a visual representation.

1   Introduction

An Activity Diagram as defined by the Unified Modeling Language (UML) [1] models procedural actions, the sequencing of actions (control flow), and conditions for coordinating behaviors. Basic Activity Diagrams may be elaborated with UML features to describe the object flow between actions (inputs and outputs) and other relevant information.

A human-readable text form of Activity Diagrams may be used for any number of reasons, such as:

In this document, we describe the Activity Diagram Linear Form (ADLF), a modification of the Petri Net Linear Form (PNLF) [3]. Note that PNLF and ADLF were developed for specific applications. No attempt has been made to handle every possible UML feature that may legally appear in an Activity Diagram. Only the notation pertinent to those features that were required in the original application is defined.

This document is structured as follows: Section 2 discusses related work. Section 3 briefly describes PNLF and provides an example. Section 4 describes the differences between ADLF and PNLF and provides examples of the new notation. Section 5 presents two grammars suitable for recognizing ADLF. Finally, Section 6 gives the conclusion.

Henceforth, familiarity with Activity Diagrams [1] and Petri Nets [4] is assumed.

2   Related Work

PNLF was invented by Philippe Martin to address usability and convenience issues that occurred in dealing with both graphical and XML-based Petri Net notations. Inspired by the Conceptual Graph Linear Form [5], it represents both the graph structure (places, transitions, and arcs) and execution state (explicit tokens) of "plain old" (non-extended) Petri Nets.

There are several relevant XML-based interchange formats that are not intended to be human-readable. For Petri Nets there is the Petri Net Markup Language [6]. For UML there is the XML Metadata Interchange (XMI) Specification [7] and the Diagram Interchange Specification [8].

The UML Human-Usable Textual Notation [9] is designed to conform to human-usability criteria. However, it is defined in terms of the Meta Object Facility (MOF), which only directly supports class modelling. While any UML concept can ultimately be abstracted using classes and expressed using the MOF, for Activity Diagrams this would entail significant obfuscation.

Semantic differences between Petri Nets and UML Activity Diagrams are explored by Störrle and Hausmann [10]. These differences are substantial enough that one cannot define a clean notation that merges concepts from Petri Nets and Activity Diagrams; hence the present work is an adaptation of PNLF rather than an extension or application of it.

3   PNLF Notation

In the following discussion of PNLF and plain Petri Nets, the term node refers to either a place or a transition.

The mathematical formalism of plain Petri Nets does not include node names, so node names have no formal meaning in plain Petri Nets. However, node names are commonly used in the graphical representations of Petri Nets to associate nodes informally with types or concepts relevant to the states and processes being modelled. These external types or concepts are separate from the Petri Nets per se and should not be confused with place or transition types that are defined formally within extended Petri Nets such as High-Level Petri Nets [11].

As described by Martin [3], PNLF defines the following:

3.1   Example 1

img src="PetriNet.png" alt="Graphical Petri Net to be translated to PNLF"

Figure 1: Petri Net for Example 1 (used by permission from Wil van der Aalst [12])

The Petri Net in Figure 1 is translated to the following:

[red_yellow *ry]
  { <-(c1 *c1 @),
    ->(red *r @) { ->[*ry], ->[yellow_green *yg] },  
    ->(yellow *y)
      { ->[yellow_red *yr] { ->(*c1), ->(*r) },
        ->[*yg]->(green)->[green_yellow] { ->(*y), ->(c2)->[*yr] }
      }
  }.

This example demonstrates several elements of PNLF style:

More examples using PNLF can be found at the cited location [3].

4   ADLF Notation

While PNLF is suitable for describing Petri Nets, ADLF is used for describing UML Activity Diagrams. To create ADLF, PNLF was adapted in the following ways:

Object nodes in ADLF have the same expressiveness limitations as the "standalone style" notation in UML Activity Diagrams, where the node name typically indicates only the type of the object node. Using the more expressive "pin style" defined by UML [1, 12.3.44], it is possible to handle complications such as when an action inputs or outputs two different parameters of the same type, or when the types of the input and output parameters are different.

Two examples using conceptual Activity Diagrams follow. Although the content and conventions of the diagrams are not important to their suitability as examples, it may aid understanding to know that some simplifications were made:

4.1   Example 2

img src="administer.png" alt="Graphical Activity Diagram to be translated to ADLF"

Figure 2: Activity Diagram for Example 2

The Activity Diagram in Figure 2 is translated to the following:

<InitialNode>-><MergeNode *merge>->("Prepare for election")
  ->["Equipment, voter lists, ballot styles and/or ballots"]-><ForkNode>
    { ->("Prepare for voting (precinct)")-><ForkNode>
        { ->("Gather in-person vote") // Includes early voting.
            ->["Ballots and/or ballot images"]->(Collect *c),
          "Precinct count"->("Count (precinct count)")
            ->["Machine totals"]->0..1(*c)
        },
      ->("Gather absentee / remote votes")->["Ballots and/or ballot images"]
        ->(*c),
      ->("Prepare for voting (central)")->("Wrap up voting (central)" *w)
    };

(*c)->["Ballots, ballot images and/or machine totals"]
  ->("Wrap up voting (precinct)")
    ->["Ballots, ballot images and/or precinct totals"]->(*w)
      ->["Counts" state=certified]->("Wrap up election")-><*merge>.

4.2   Example 3

img src="prepare.png" alt="Graphical Activity Diagram to be translated to ADLF"

Figure 3: Activity Diagram for Example 3

The Activity Diagram in Figure 3 is translated to the following:

<InitialNode>-><ForkNode>
  { ->("Define precincts") // This action refers to configuring the voting system
                           // to realize the precincts as defined by state law.
      ->["Precinct definitions"]-><ForkNode>
        { ->("Train poll workers")-><FlowFinal>,
          ->("Register voters")->["Voter lists"]->(Collect *c1),
          ->("Program election")->["Election definition"]->("Prepare ballots")
            ->["Ballot styles"]-><ForkNode>
              { ->(*c1),
                "Centrally programmed ballot styles"->["Ballot styles"]
                  ->0..1("Configure & calibrate precinct equipment (central)" *cc)
              }
        },
    ->("Maintain equipment in storage")->[Equipment state=old]->(*cc),
    "Need new equipment"->("Procure equipment")->[Equipment state=new]->0..1(*cc)
  };

(*c1)->["Voter lists, ballot styles"]-><ForkNode>
  { ->("Educate / notify / inform voters")-><FlowFinal>,
    ->(Collect *c2),
    "Paper ballots"->("Produce ballots")->[Ballots]->0..1(*c2)
  };

(*cc)->[Equipment state=configured]->("Test precinct equipment (central)")
  ->[Equipment state=tested]->("Transport equipment")->[Equipment state=deployed]
    ->(*c2)->["Equipment, voter lists, ballot styles and/or ballots"].

5   ADLF Grammars

A complete syntactic parser developed using Flex [13] and Bison [14] is available for download [15].2 The semantic analysis/validation has not yet been implemented. A different parser for a subset of ADLF has been developed using JavaCC [16].

Flex and Bison (based on Lex [17] and Yacc [18]) generate an LALR(1) parser while JavaCC generates an LL(1) parser [19]. LR parsers are described as working in a "bottom-up" fashion while LL parsers are described as working in a "top-down" fashion. Additionally, JavaCC supports Extended Backus-Naur Form (EBNF) [20], with optional terms and other features, while Flex/Bison only supports "plain" Backus-Naur Form (BNF) [21]. These differences force one to use different grammatical idioms in order to achieve an elegant implementation. The goal here is to provide near-enough equivalent expressions of ADLF using the idioms that are natural in each environment, thereby enabling elegant implementations using both kinds of parser generators.

The grammars from the two parsers are detailed in the following subsections. Please note that these grammars accept input whose interpretation as an Activity Diagram may violate well-formedness constraints on Activity Diagrams, e.g., the constraint that the edges coming into and out of a decision node must be either all object flows or all control flows [1, Section 12.3.22]. While many such constraints could possibly be enforced in a purely grammatical fashion, validating Activity Diagrams expressed in ADLF is separable from defining ADLF per se and has been deferred to future work.

5.1   Flex/Bison

5.1.1   Tokens

The regular expressions below are written using Lex syntax, wherein the interpretation of special characters depends on context and may be counterintuitive to some readers. For example, [...] gives a set of acceptable characters, but if the first character after the left square bracket is a caret, it signifies all characters except that set. Hence [^^] means any character except caret. The right square bracket is used to close the set and the hyphen is used to specify a range of characters; however, if either of these appears as the first character after the left square bracket, it is instead interpreted as a plain character. So [][=*()<>;,.{}] means any of the characters between the leftmost and rightmost square brackets, including both the left and right square brackets, and [-a-zA-Z0-9_] means any alphanumeric character, hyphen, or underscore.

[ \f\n\r\t\v]+ (Whitespace ignored)
"//".* (C++ style comments ignored)
"/*"((\*[^/])|[^*])*"*/" (C style comments ignored)
"(^"((\^[^)])|[^^])*"^)" ANNOTATIONSTRING
".." DOTDOT
"<-" LEFTARROW
"->" RIGHTARROW
[][=*()<>;,.{}] (Characters tokenized as themselves)
"InitialNode" INITIALNODE
"ForkNode" FORKNODE
"JoinNode" JOINNODE
"DecisionNode" DECISIONNODE
"MergeNode" MERGENODE
"ActivityFinal" ACTIVITYFINAL
"FlowFinal" FLOWFINAL
[0-9]+ NUMBER
(((([a-zA-Z0-9_])|(\\(.|\n)))+)|
 ((([a-zA-Z0-9_])|(\\(.|\n)))
  (([-a-zA-Z0-9_])|(\\(.|\n)))*
  (([a-zA-Z0-9_])|(\\(.|\n)))))
IDENTIFIER
\"(([^"\\])|(\\(.|\n)))*\" IDENTIFIER
'(([^'\\])|(\\(.|\n)))*' IDENTIFIER

5.1.2   Grammar

The following grammar is expressed using Yacc syntax. The start symbol is ADLF.

action: '(' nodeGuts ')'
ADLF: statementVector '.'
annotation: IDENTIFIER '=' IDENTIFIER
| IDENTIFIER '=' NUMBER
| ANNOTATIONSTRING
| IDENTIFIER
annotationVector: annotation
| annotationVector annotation
arrow: leftArrow | rightArrow
branch: arrow chain
branchVector: branch
| branchVector ',' branch
chain: noTreeChain
| noTreeChain tree
controlNode: '<' controlNodeGuts '>'
controlNodeGuts: controlNodeIdentifier
| reference
| controlNodeIdentifier reference
| controlNodeIdentifier annotationVector
| reference annotationVector
| controlNodeIdentifier reference annotationVector
controlNodeIdentifier: INITIALNODE | FORKNODE | JOINNODE | DECISIONNODE
| MERGENODE | ACTIVITYFINAL | FLOWFINAL
leftArrow: LEFTARROW
| multiplicity LEFTARROW
| LEFTARROW IDENTIFIER
| multiplicity LEFTARROW IDENTIFIER
multiplicity: NUMBER
| NUMBER DOTDOT NUMBER
| NUMBER DOTDOT '*'
| '*'
node: controlNode | action | object
nodeGuts: IDENTIFIER
| reference
| IDENTIFIER reference
| IDENTIFIER annotationVector
| reference annotationVector
| IDENTIFIER reference annotationVector
noTreeChain: node
| noTreeChain arrow node
object: '[' nodeGuts ']'
reference: '*' IDENTIFIER | '*' NUMBER
rightArrow: RIGHTARROW
| RIGHTARROW multiplicity
| IDENTIFIER RIGHTARROW
| IDENTIFIER RIGHTARROW multiplicity
statementVector: chain
| statementVector ';' chain
tree: '{' branchVector '}'

5.2   JavaCC

The JavaCC grammar differs slightly from the Flex/Bison grammar defined previously; the language recognized by the former is a proper subset of the language recognized by the latter. Specifically, the JavaCC grammar has the following limitations:

These limitations were deemed acceptable as they do not significantly limit the Activity Diagrams that can be recognized. An activity making use of the left arrow can be rewritten to make use of only the right arrow. Activity Diagrams that require violating the other limitations are ill-formed.

5.2.1   Tokens

The regular expressions below are written using JavaCC syntax. In contrast with Lex, JavaCC requires all character data to be quoted, so character data cannot be confused with operators.

" " | "\t" | "\n" | "\r" | "\f" | "\013" (Whitespace ignored)
"//..." (Java single-line comments ignored)3
"/*...*/" (Java multi-line comments ignored)3
"InitialNode" <INITIAL>
"ForkNode" <FORK>
"JoinNode" <JOIN>
"DecisionNode" <DECISION>
"MergeNode" <MERGE>
"ActivityFinal" <ACTIVITYFINAL>
"FlowFinal" <FLOWFINAL>
";" <SPLIT>
"->" <SEQ>
"." <END>
"{" <PARBEGIN>
"}" <PAREND>
"," <PARSPLIT>
(["0"-"9"])+ <NUMBER>
".." <DOTDOT>
"*" <STAR>
((["a"-"z","A"-"Z","0"-"9","_"]|"\\"~[])+ |
 (["a"-"z","A"-"Z","0"-"9","_"]|"\\"~[])
 (["a"-"z","A"-"Z","0"-"9","_","-"]|"\\"~[])*
 (["a"-"z","A"-"Z","0"-"9","_"]|"\\"~[]))
<IDENTIFIER>
"\"" ( ~["\"","\\"] | "\\"~[] )* "\"" <DOUBLEQUOTED_STRING>
"'" ( ~["'","\\"] | "\\"~[] )* "'" <SINGLEQUOTED_STRING>
"(^" ( "^"~[")"] | ~["^"] )* "^)" <ANNOTATION_STRING>
<EOF> (End of file)

5.2.2   Grammar

The Extended Backus-Naur Form (EBNF) specification below describes ADLF, subject to the limitations described in Section 5.2. The following conventions are used:

The start symbol is Activity.

ActionName ::= Identifier
ActionNode ::= "(" ( ActionNodeInitial | ActionNodeReferral )
ActionNodeInitial ::= ActionName (Label)? (AnnotationSeq)? ")"
ActionNodeReferral ::= Label (AnnotationSeq)? ")"
Activity ::= (FullStatement)+ <END> <EOF>
ActivityFinalNode ::= <ACTIVITYFINAL> (Label)? (AnnotationSeq)?
Annotation ::= ( Identifier ( "=" ( Identifier | <NUMBER> ) )?
| <ANNOTATION_STRING> )
AnnotationSeq ::= (Annotation)+
ControlNode ::= ( ControlNodeInitial | ControlNodeReferral )
ControlNodeInitial ::= ( JoinNode | MergeNode | ActivityFinalNode | FlowFinalNode ) ">"
ControlNodeReferral ::= Label (AnnotationSeq)? ">"
DecisionNode ::= <DECISION> (Label)? (AnnotationSeq)?
DecisionStatement ::= DecisionNode ">"
<PARBEGIN>
StatementSeq ( <PARSPLIT> StatementSeq )*
<PAREND>
FlowFinalNode ::= <FLOWFINAL> (Label)? (AnnotationSeq)?
ForkNode ::= <FORK> (Label)? (AnnotationSeq)?
ForkStatement ::= ForkNode ">"
<PARBEGIN>
StatementSeq ( <PARSPLIT> StatementSeq )*
<PAREND>
FullStatement ::= ( "<" ( InitialNode ">" (MultipleOutgoing)? | ControlNodeReferral )
| ObjectNode
| ActionNode )
( StatementSeq (<SPLIT>)? )?
Guard ::= Identifier
Identifier ::= ( <IDENTIFIER> | <SINGLEQUOTED_STRING> | <DOUBLEQUOTED_STRING> )
InitialNode ::= <INITIAL> (Label)? (AnnotationSeq)?
JoinNode ::= <JOIN> (Label)? (AnnotationSeq)?
Label ::= "*" ( Identifier | <NUMBER> )
Lower ::= <NUMBER>
MergeNode ::= <MERGE> (Label)? (AnnotationSeq)?
MultipleOutgoing :: = <PARBEGIN>
StatementSeq ( <PARSPLIT> StatementSeq )*
<PAREND>
Multiplicity ::= ( Lower <DOTDOT> )? UpperOrOnly
ObjectNode ::= "[" Identifier (Label)? (AnnotationSeq)? "]"
Statement ::= ( "<" ( ForkStatement | DecisionStatement | ControlNode )
| ActionNode (MultipleOutgoing)?
| ObjectNode (MultipleOutgoing)? )
StatementSeq ::= ( (Guard)? <SEQ> (Multiplicity)? Statement )+
UpperOrOnly ::= ( <STAR> | <NUMBER> )

6   Conclusion

We have described a modification of the Petri Net Linear Form (PNLF) notation to support the rendering of UML Activity Diagrams as human-readable text. The Activity Diagram Linear Form (ADLF) can be processed by screen readers and is more usable than XML-based syntaxes. As a textual notation, it is suitable for embedding in text-only documents and it accommodates standard search and copy/paste operations. Finally, ADLF, along with its attendant parsers, is useful during the prototyping of new applications. Instead of interfacing with graphical representations or full-featured interchange formats, one can use the textual format for input/output, allowing the majority of effort to be spent on the core application.

Acknowledgment

The authors thank Paul Black and Conrad Bock for their helpful reviews and suggestions.

Bibliography

[1] OMG Unified Modeling Language Superstructure Specification, version 2.1.1. Document formal/2007-02-05, Object Management Group, February 2007. http://www.omg.org/cgi-bin/doc?formal/2007-02-05.

[2] World Wide Web Consortium. Extensible Markup Language (XML), 2007. http://www.w3.org/XML/.

[3] Philippe A. Martin. The Petri Net Linear Form, 2007. http://www.phmartin.info/wf/pnlf/.

[4] Carl Adam Petri. Kommunikation mit Automaten. PhD thesis, University of Bonn, 1962.

[5] John F. Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, 1984.

[6] Michael Weber. Petri Net Markup Language, 2006. http://www2.informatik.hu-berlin.de/top/pnml/about.html.

[7] OMG Meta Object Facility (MOF) 2.0 / XML Metadata Interchange (XMI) Mapping Specification, version 2.1. Document formal/2005-09-01, Object Management Group, September 2005. http://www.omg.org/cgi-bin/doc?formal/2005-09-01.

[8] OMG Unified Modeling Language Diagram Interchange Specification, version 1.0. Document formal/2006-04-04, Object Management Group, April 2006. http://www.omg.org/cgi-bin/doc?formal/2006-04-04.

[9] OMG Human-Usable Textual Notation (HUTN) Specification, version 1.0. Document formal/2004-08-01, Object Management Group, August 2004. http://www.omg.org/cgi-bin/doc?formal/2004-08-01.

[10] Harald Störrle and Jan Hendrik Hausmann. Towards a formal semantics of UML 2.0 activities. In Software Engineering 2005, Fachtagung des GI-Fachbereichs Softwaretechnik, volume 64 of LNI, pages 117–128. GI, 2005. http://wwwcs.uni-paderborn.de/cs/ag-engels/Papers/2005/SE2005-Stoerrle-Hausmann-ActivityDiagrams.pdf.

[11] Software and system engineering—High-level Petri nets—Part 1: Concepts, definitions and graphical notation. ISO/IEC 15909-1, International Organization for Standardization, December 2004. http://www.iso.org/.

[12] Wil van der Aalst and Kees van Hee. Workflow Management: Models, Methods, and Systems. MIT Press, March 2004. See also http://www.workflowcourse.com/.

[13] Flex: the Fast Lexical Analyzer, 2006. http://flex.sourceforge.net/.

[14] Bison: GNU parser generator, 2006. http://www.gnu.org/software/bison/.

[15] NIST. Human-readable text form for UML activity diagrams (download page for parser), 2007. http://purl.org/net/dflater/org/nist/adlf.

[16] JavaCC, the Java Compiler Compiler, 2006. https://javacc.dev.java.net/.

[17] M. E. Lesk. Lex—a lexical analyzer generator. Computing Science Technical Report 39, Bell Laboratories, October 1975. See also http://dinosaur.compilertools.net/lex/.

[18] Stephen C. Johnson. Yacc—yet another compiler-compiler. Computing Science Technical Report 32, Bell Laboratories, July 1975. See also http://dinosaur.compilertools.net/yacc/.

[19] Theodore S. Norvell. The JavaCC FAQ, 2007. http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm.

[20] Information technology—Syntactic metalanguage—Extended BNF. ISO/IEC 14977, International Organization for Standardization, 1996. http://standards.iso.org/ittf/PubliclyAvaliableStandards/s026153_ISO_IEC_14977_1996(E).zip.

[21] Peter Naur (editor), J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, and M. Woodger. Revised report on the algorithmic language ALGOL 60. Communications of the ACM, 6(1):1–17, 1963.

Footnotes

1 The option to use leftward arrows preserves, to the extent possible, a degree of freedom that is enjoyed by the graphical representation, where arrows may point in any direction. Nevertheless, leftward arrows are neither strictly necessary in theory nor heavily used in practice, and only one of the two grammars provided later in this report supports them.

2 Specific software is identified in this paper to support reproducibility of results. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the software identified is necessarily the best available for the purpose.

3 The description of these tokens requires several lines in the JavaCC jj file. Essentially, single-line comments terminate with a carriage return; multi-line comments terminate with a closing */. Any symbol is permissible within either type of comment.