Language Configuration
A language configuration (LC) is a well-defined configuration of components
that together specify the programming language in use. The phrase “this code
is written in Java” usually assumes much more: a specific version of Java, a
set of Java packages with their concrete versions, etc. In OpenL, each
language configuration is specified explicitly by its name. An LC usually
consists of a Parser, a Binder and a Virtual Machine (VM). Two language
configurations LC1 and LC2 are considered equivalent if the same OpenL code
produces the same result whether it runs within LC1 or LC2. One LC may
extend another LC: we say that “LC2 extends LC1” if any code working in LC1
produces the same result in LC2. For example, LC2=Java 1.4.1_02 + Xerces2.0.4
extends LC1=Java 1.4.1_02.
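To make the “extends” relation concrete, here is a minimal Java sketch. It assumes, purely for illustration, that an LC can be modeled as a name plus a map of pinned component versions, and it approximates “LC2 extends LC1” structurally: LC2 contains every component of LC1 at the same version. This check is only a proxy for the behavioral definition above; it agrees with that definition only under the additional assumption that adding a component never changes the result of code that already ran without it. LanguageConfiguration and extendsConfig are hypothetical names, not OpenL API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Demo entry point (declared first so a single-file launcher picks up its main method).
class LanguageConfigurationDemo {
    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("java", "1.4.1_02");
        LanguageConfiguration lc1 = new LanguageConfiguration("LC1", base);

        Map<String, String> withXml = new LinkedHashMap<>(base);
        withXml.put("xerces", "2.0.4");
        LanguageConfiguration lc2 = new LanguageConfiguration("LC2", withXml);

        System.out.println(lc2.getName() + " extends " + lc1.getName() + ": " + lc2.extendsConfig(lc1));
        System.out.println(lc1.getName() + " extends " + lc2.getName() + ": " + lc1.extendsConfig(lc2));
    }
}

// Hypothetical model of an LC: a name plus the exact version of every component it pins.
class LanguageConfiguration {
    private final String name;
    private final Map<String, String> components; // component name -> exact version

    LanguageConfiguration(String name, Map<String, String> components) {
        this.name = name;
        this.components = Collections.unmodifiableMap(new LinkedHashMap<>(components));
    }

    String getName() {
        return name;
    }

    // Structural proxy for "this extends other": every component of `other`
    // is present here with the same version. Assumes added components never
    // change the result of code that already ran in `other`.
    boolean extendsConfig(LanguageConfiguration other) {
        for (Map.Entry<String, String> entry : other.components.entrySet()) {
            if (!entry.getValue().equals(components.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

With these inputs LC2 extends LC1 but not the other way round, and every configuration trivially extends itself.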
Grammar
OpenL Grammar is context-free. It helps the Parser
produce a Syntax Tree, a tree of named Syntax Nodes. One of the key features
of OpenL is the ability to provide a programmer with a language of his/her
choice, in particular with a grammar of his/her choice. The OpenL development
team uses JavaCC v3.0 to produce its grammar implementations. We chose
JavaCC because we were familiar with it and knew it does the job; JavaCC
also offers a large, freely available Grammar Repository covering many
languages. That said, we would be happy to learn from proponents of other
similar products (for example, ANTLR) if they are convinced that such a
product is better suited for OpenL purposes. It is important to underscore
that in OpenL one grammar implementation does not exclude another
implementation of the same grammar. We would also welcome help from experts
in the following areas: self-correcting grammars, modular grammars,
dynamically built grammars and natural language grammars.
Binder
The Binder uses a Binding Context (a type system) to convert a
Syntax Tree into a Bound Tree,
a tree of Bound Nodes. You may think of this process as compilation, but
it differs in that no byte-code or object-code is produced at this stage.
After successful binding, the OpenL code is valid and ready for some form
of execution by the VM. This approach achieves the
following:
Syntax Tree
A tree of named Syntax Nodes, produced by the
Parser from Source Code. The Binder uses the Syntax Tree to produce a Bound
Tree.
Syntax Node
A Syntax Node is a named Tree Node and a
part of the Syntax Tree. The Binder uses the name of the Syntax Node to
produce a particular Bound Node. Although node names are arbitrary, using
standardized names across different Grammars maximizes the reuse of
language configurations. A table of recommended Syntax Node names is
located here. A Java interface containing the recommended names is
located here.
Example. The expression “x + 5” will be parsed
in some Grammar into the following Syntax Tree:

op.binary.add(+)
    identifier(x)
    literal.integer(5)
Each Syntax Node records the start and end
positions of the node in the source code. This allows OpenL to be used in an
IDE for syntax coloring, code completion, compile-time and run-time error
reporting, and debugging.
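The following Java sketch shows what a named Syntax Node with source positions might look like, together with a tiny hand-written recursive-descent parser (rather than a JavaCC-generated one) for the toy grammar expr := term ('+' term)*, term := identifier | integer. The class names SyntaxNode and TinyParser are hypothetical; only the node names identifier, literal.integer and op.binary.add come from this page. Offsets are 0-based, with the end offset exclusive.

```java
import java.util.Arrays;
import java.util.List;

// Demo entry point: parse "x + 5" and print the tree with source offsets.
class SyntaxTreeDemo {
    public static void main(String[] args) {
        SyntaxNode root = new TinyParser("x + 5").parse();
        System.out.print(root.dump(""));
    }
}

// Hypothetical named tree node that remembers its span in the source text.
class SyntaxNode {
    final String name;  // standardized node name, e.g. "op.binary.add"
    final String text;  // token text shown in the tree, e.g. "+" or "x"
    final int start;    // start offset in the source (inclusive)
    final int end;      // end offset in the source (exclusive)
    final List<SyntaxNode> children;

    SyntaxNode(String name, String text, int start, int end, SyntaxNode... children) {
        this.name = name;
        this.text = text;
        this.start = start;
        this.end = end;
        this.children = Arrays.asList(children);
    }

    // Renders the subtree one node per line, including the positions an IDE would use.
    String dump(String indent) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append(name).append('(').append(text).append(") [")
          .append(start).append(", ").append(end).append(")\n");
        for (SyntaxNode child : children) {
            sb.append(child.dump(indent + "    "));
        }
        return sb.toString();
    }
}

// Hypothetical recursive-descent parser for: expr := term ('+' term)* ; term := identifier | integer
class TinyParser {
    private final String src;
    private int pos = 0;

    TinyParser(String src) {
        this.src = src;
    }

    SyntaxNode parse() {
        SyntaxNode left = parseTerm();
        skipSpaces();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++; // consume '+'
            SyntaxNode right = parseTerm();
            // Left-associative: the new node spans from the first to the last operand.
            left = new SyntaxNode("op.binary.add", "+", left.start, right.end, left, right);
            skipSpaces();
        }
        if (pos != src.length()) {
            throw new IllegalArgumentException("Unexpected character at offset " + pos);
        }
        return left;
    }

    private SyntaxNode parseTerm() {
        skipSpaces();
        int begin = pos;
        if (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return new SyntaxNode("literal.integer", src.substring(begin, pos), begin, pos);
        }
        if (pos < src.length() && Character.isJavaIdentifierStart(src.charAt(pos))) {
            pos++;
            while (pos < src.length() && Character.isJavaIdentifierPart(src.charAt(pos))) pos++;
            return new SyntaxNode("identifier", src.substring(begin, pos), begin, pos);
        }
        throw new IllegalArgumentException("Expected identifier or integer at offset " + begin);
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

For “x + 5” the demo prints the root op.binary.add(+) [0, 5) with the children identifier(x) [0, 1) and literal.integer(5) [4, 5).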
Bound Node
A Bound Node is produced by the Binder, usually (but not necessarily) from a Syntax Node. For example,
the 3 syntax nodes produced by the expression “x + 5” above may be bound
within a Java-like language configuration as follows:
-
the syntax node
literal.integer(5) will be bound into a LiteralBoundNode with type
JavaOpenClass(int) and value Integer(5).
Note. Java primitive values have to be kept as
Objects (here, Integer), but the distinction is preserved at the type level.
-
for the syntax node
identifier(x), the variable x must be resolved as either a local or an
external variable; if this succeeds, a
FieldBoundNode
is created with an accessor to this variable.
-
for
the syntax node op.binary.add(+), the following mechanism may be
used: if x has type double and 5 has type int, the Binder looks in the
binding context for a static method add(double,int). The accessor to
this method is stored in the MethodBoundNode.
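The three binding rules above can be sketched in Java as follows. This is an illustration under simplifying assumptions, not OpenL's Binder: Syntax Nodes are reduced to a name, a token text and children; java.lang.Class stands in for JavaOpenClass; the binding context knows only declared variables and one class whose public static methods implement operators; and overloads are matched by exact parameter types, with no widening. SNode, SimpleBinder, BindingContext and Operators are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Demo entry point: bind the Syntax Tree of "x + 5" with x declared as double.
class BinderDemo {
    public static void main(String[] args) {
        SNode tree = new SNode("op.binary.add", "+",
                new SNode("identifier", "x"), new SNode("literal.integer", "5"));
        BindingContext ctx = new BindingContext(Operators.class);
        ctx.declareVariable("x", double.class);
        MethodBoundNode bound = (MethodBoundNode) new SimpleBinder(ctx).bind(tree);
        System.out.println(bound.method.getName() + " -> " + bound.type); // add -> double
    }
}

// Minimal stand-in for a Syntax Node: name, token text, children.
class SNode {
    final String name, text;
    final SNode[] children;
    SNode(String name, String text, SNode... children) {
        this.name = name; this.text = text; this.children = children;
    }
}

// Every Bound Node carries the static type assigned during binding.
abstract class BoundNode {
    final Class<?> type;
    BoundNode(Class<?> type) { this.type = type; }
}

class LiteralBoundNode extends BoundNode {
    final Object value; // stored boxed (Integer); `type` still says int
    LiteralBoundNode(Class<?> type, Object value) { super(type); this.value = value; }
}

class FieldBoundNode extends BoundNode {
    final String variable; // accessor key for the resolved variable
    FieldBoundNode(String variable, Class<?> type) { super(type); this.variable = variable; }
}

class MethodBoundNode extends BoundNode {
    final Method method; // resolved operator implementation
    final BoundNode[] args;
    MethodBoundNode(Method method, BoundNode... args) {
        super(method.getReturnType()); this.method = method; this.args = args;
    }
}

// Hypothetical operator library placed into the binding context.
class Operators {
    public static double add(double a, int b) { return a + b; }
}

// Hypothetical binding context: declared variables plus one operator library class.
class BindingContext {
    private final Map<String, Class<?>> variables = new HashMap<>();
    private final Class<?> operatorLibrary;
    BindingContext(Class<?> operatorLibrary) { this.operatorLibrary = operatorLibrary; }
    void declareVariable(String name, Class<?> type) { variables.put(name, type); }
    Class<?> findVariable(String name) { return variables.get(name); }

    // Exact-signature lookup of a public static method; null if there is none.
    Method findStaticMethod(String name, Class<?>... paramTypes) {
        try {
            Method m = operatorLibrary.getMethod(name, paramTypes);
            return Modifier.isStatic(m.getModifiers()) ? m : null;
        } catch (NoSuchMethodException e) {
            return null;
        }
    }
}

// Dispatches on the Syntax Node name, applying the three rules described above.
class SimpleBinder {
    private final BindingContext ctx;
    SimpleBinder(BindingContext ctx) { this.ctx = ctx; }

    BoundNode bind(SNode node) {
        switch (node.name) {
            case "literal.integer":
                return new LiteralBoundNode(int.class, Integer.valueOf(node.text));
            case "identifier": {
                Class<?> type = ctx.findVariable(node.text);
                if (type == null) throw new IllegalStateException("Cannot resolve variable " + node.text);
                return new FieldBoundNode(node.text, type);
            }
            case "op.binary.add": {
                BoundNode left = bind(node.children[0]);
                BoundNode right = bind(node.children[1]);
                Method add = ctx.findStaticMethod("add", left.type, right.type);
                if (add == null) throw new IllegalStateException("No add(" + left.type + ", " + right.type + ")");
                return new MethodBoundNode(add, left, right);
            }
            default:
                throw new IllegalArgumentException("No binding rule for node " + node.name);
        }
    }
}
```

Declaring x as int instead makes the lookup for add(int,int) fail with an IllegalStateException, because this simplified context defines only add(double,int); a fuller context would supply more overloads or apply widening rules.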
Please keep in mind that OpenL’s binding
mechanism described above is not hard-coded. You may use it as is,
extend it, modify it or replace it completely. In our reference
implementation org.openl.simple, binding is provided by mapping each
syntax node name to the name of a NodeBinder Java class. This provides a
good starting point for anyone looking to customize the existing binding
mechanism.
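A minimal Java sketch of the name-to-class mapping described above follows. NodeBinder, BinderRegistry and the two binder classes are hypothetical stand-ins (the actual org.openl.simple classes and their signatures are not shown on this page), and bind() returns a description string instead of a real Bound Node so the example stays focused on the lookup. Because binders are found by class name at run time, customizing a node's binding only requires registering a different class name for it.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point: look up binders by Syntax Node name.
class BinderRegistryDemo {
    public static void main(String[] args) {
        BinderRegistry registry = new BinderRegistry();
        registry.register("literal.integer", LiteralIntegerBinder.class.getName());
        registry.register("identifier", IdentifierBinder.class.getName());
        System.out.println(registry.binderFor("literal.integer").bind("5"));
        System.out.println(registry.binderFor("identifier").bind("x"));
    }
}

// Hypothetical binder contract, simplified to return a description of the bound node.
interface NodeBinder {
    String bind(String nodeText);
}

class LiteralIntegerBinder implements NodeBinder {
    public String bind(String nodeText) {
        return "LiteralBoundNode(int, " + Integer.parseInt(nodeText) + ")";
    }
}

class IdentifierBinder implements NodeBinder {
    public String bind(String nodeText) {
        return "FieldBoundNode(" + nodeText + ")";
    }
}

// Maps Syntax Node names to NodeBinder class names and instantiates them on demand.
class BinderRegistry {
    private final Map<String, String> classNames = new HashMap<>();
    private final Map<String, NodeBinder> instances = new HashMap<>();

    // Registering a name again replaces its binder (the customization hook).
    void register(String nodeName, String binderClassName) {
        classNames.put(nodeName, binderClassName);
        instances.remove(nodeName);
    }

    NodeBinder binderFor(String nodeName) {
        NodeBinder cached = instances.get(nodeName);
        if (cached != null) return cached;
        String className = classNames.get(nodeName);
        if (className == null) {
            throw new IllegalArgumentException("No NodeBinder registered for " + nodeName);
        }
        try {
            // Reflective instantiation from the class name via its no-argument constructor.
            NodeBinder binder = (NodeBinder) Class.forName(className).getDeclaredConstructor().newInstance();
            instances.put(nodeName, binder);
            return binder;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

The demo prints LiteralBoundNode(int, 5) and FieldBoundNode(x); registering another class name for literal.integer is enough to swap in a customized binder for that node.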
Bound Tree
A tree of Bound Nodes, produced by the Binder from a
Syntax Tree using a Binding Context. The Bound Tree is used by the
VM for execution or debugging. It may also be used by a Code Generator.
VM
The Virtual Machine (VM) runs or debugs OpenL bound
code. It does so by providing Runner and Debugger objects. There can be
multiple VM instances in your system.
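The sketch below illustrates how a Runner might execute a Bound Tree without generating any byte-code: it simply walks the tree. It assumes the Binder has already resolved “x + 5” (with x of type double) to a call of a static add(double,int) method, as in the Bound Node example; the BoundNode interface, the node classes, MathOps and Runner are hypothetical, and a Debugger is not shown.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Demo entry point: run the bound tree of "x + 5" with x = 2.5.
class RunnerDemo {
    public static void main(String[] args) throws NoSuchMethodException {
        Method add = MathOps.class.getMethod("add", double.class, int.class);
        BoundNode tree = new CallNode(add, new VariableNode("x"), new LiteralNode(5));
        Map<String, Object> vars = new HashMap<>();
        vars.put("x", 2.5);
        System.out.println(new Runner().run(tree, vars)); // prints 7.5
    }
}

// Hypothetical operator implementation the binder is assumed to have resolved.
class MathOps {
    public static double add(double a, int b) { return a + b; }
}

// Each bound node evaluates itself against a frame of variable values.
interface BoundNode {
    Object evaluate(Map<String, Object> frame);
}

class LiteralNode implements BoundNode {
    private final Object value;
    LiteralNode(Object value) { this.value = value; }
    public Object evaluate(Map<String, Object> frame) { return value; }
}

class VariableNode implements BoundNode {
    private final String name;
    VariableNode(String name) { this.name = name; }
    public Object evaluate(Map<String, Object> frame) {
        if (!frame.containsKey(name)) throw new IllegalStateException("Unbound variable " + name);
        return frame.get(name);
    }
}

class CallNode implements BoundNode {
    private final Method method;
    private final BoundNode[] args;
    CallNode(Method method, BoundNode... args) { this.method = method; this.args = args; }

    public Object evaluate(Map<String, Object> frame) {
        Object[] values = new Object[args.length];
        for (int i = 0; i < args.length; i++) values[i] = args[i].evaluate(frame);
        try {
            return method.invoke(null, values); // static method, so no receiver; boxed args are unwrapped
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Call to " + method.getName() + " failed", e);
        }
    }
}

// Hypothetical Runner: evaluates a bound tree; each run gets its own copy of the variables.
class Runner {
    Object run(BoundNode root, Map<String, Object> variables) {
        return root.evaluate(new HashMap<>(variables));
    }
}
```

Because each Runner holds no global state, several of them can coexist, which loosely mirrors the point that a system may contain multiple VM instances.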
Binding Context
Probably the most important (and most
complex) concept for understanding OpenL’s approach. Take, for
example, the code snippet sin(x) + 2 < 4L. If it were Java code, we
would say that:
-
sin() is either one of the methods of this
class or its superclasses (interfaces); x is either a local variable, a
parameter or a variable defined in this class or its superclasses;
-
2 is an integer constant of type int;
-
4L is an integer constant of type long;
-
operators + and < are defined by the Java Language
Specification for primitive types.
If it were C++ code, the context for sin and x
would be extended to include global functions and variables. The operators <
and + could be built-in C++ operators for basic types, or they could be
overloaded. The operator overloading mechanism is part of the C++
specification and cannot be changed (in particular, an implication operator
=> cannot be added).
OpenL allows you to configure all items you
put into your context. Those items include:
-
Variables
-
Methods
-
Operators
-
Types
-
Casts
While the mechanisms used in the
OpenL reference implementation are powerful and convenient (in our
opinion, of course), OpenL’s philosophy does not limit you to those
approaches; it always allows you to extend or override existing
implementations.
In the example above, we put the entire
java.lang.Math class into the binding context, thus allowing access to all
of its math functions (sin, cos, log, etc.). The constants 2 and 4L are
handled by the NodeBinder implementation for the literal.integer node,
according to the Java specification.
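As an illustration of what a literal.integer NodeBinder has to decide, the sketch below assigns the Java type to a decimal integer literal: a trailing L or l makes it long, otherwise it is int. IntegerLiteralBinder and TypedConstant are hypothetical names, and the sketch deliberately ignores hexadecimal, octal and binary literals, underscores, and the special case of 2147483648 written after a unary minus.

```java
// Demo entry point: type the two constants from sin(x) + 2 < 4L.
class IntegerLiteralDemo {
    public static void main(String[] args) {
        for (String literal : new String[] {"2", "4L"}) {
            TypedConstant c = IntegerLiteralBinder.bind(literal);
            System.out.println(literal + " -> " + c.type + " " + c.value);
        }
    }
}

// A constant together with its static type; the value itself is boxed.
class TypedConstant {
    final Class<?> type;
    final Object value;
    TypedConstant(Class<?> type, Object value) { this.type = type; this.value = value; }
}

// Hypothetical binder for decimal literal.integer nodes, following the Java typing rule.
class IntegerLiteralBinder {
    static TypedConstant bind(String literal) {
        boolean isLong = literal.endsWith("L") || literal.endsWith("l");
        String digits = isLong ? literal.substring(0, literal.length() - 1) : literal;
        if (isLong) {
            return new TypedConstant(long.class, Long.parseLong(digits));
        }
        // Integer.parseInt throws NumberFormatException when the value does not fit in an int,
        // mirroring the compile-time error Java reports for such a literal.
        return new TypedConstant(int.class, Integer.parseInt(digits));
    }
}
```

The demo prints one line per literal, showing int for 2 and long for 4L.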
The variable x could be a local variable or a
parameter, or it could be put into the binding context using other OpenL
mechanisms. For example, the constants PI and E from java.lang.Math were
also put into the binding context together with its methods.
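The sketch below shows one way the whole of java.lang.Math could be placed into a binding context with standard Java reflection, so that sin resolves to Math.sin(double) and PI resolves to the constant Math.PI. ClassBindingContext is a hypothetical name; its lookup requires an exact parameter-type match, whereas a real binder would also have to apply Java's widening rules (for example, to accept an int argument for sin).

```java
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo entry point: resolve sin and PI from the context, then evaluate sin(x) + 2 < 4L.
class MathContextDemo {
    public static void main(String[] args) throws IllegalAccessException, InvocationTargetException {
        ClassBindingContext ctx = new ClassBindingContext();
        ctx.importStatics(Math.class);
        Method sin = ctx.findMethod("sin", double.class);
        double x = (Double) ctx.findConstant("PI") / 2;
        double sinX = (Double) sin.invoke(null, x);
        // Plain Java arithmetic: 2 is widened to double for +, 4L to double for <.
        System.out.println(sinX + 2 < 4L);
    }
}

// Hypothetical binding context holding static methods and constants imported from Java classes.
class ClassBindingContext {
    private final Map<String, List<Method>> methods = new HashMap<>();
    private final Map<String, Object> constants = new HashMap<>();

    // Adds every public static method and public static final field of `library`.
    void importStatics(Class<?> library) {
        for (Method m : library.getMethods()) {
            if (Modifier.isStatic(m.getModifiers())) {
                methods.computeIfAbsent(m.getName(), k -> new ArrayList<>()).add(m);
            }
        }
        for (Field f : library.getFields()) {
            int mod = f.getModifiers();
            if (Modifier.isStatic(mod) && Modifier.isFinal(mod)) {
                try {
                    constants.put(f.getName(), f.get(null));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + f, e);
                }
            }
        }
    }

    // Exact parameter-type match only; returns null if no overload matches.
    Method findMethod(String name, Class<?>... argTypes) {
        for (Method m : methods.getOrDefault(name, Collections.<Method>emptyList())) {
            if (Arrays.equals(m.getParameterTypes(), argTypes)) return m;
        }
        return null;
    }

    Object findConstant(String name) {
        return constants.get(name);
    }
}
```

The demo prints true, since sin(PI/2) + 2 is about 3, which is less than 4.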
For more detailed information see
Configuring OpenL (to be done)
OpenClass
A generalization of Class. It allows OpenL to be extended
with data types such as XML, RDBMS/JDBC, RDF, etc.
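To illustrate the idea, here is a minimal Java sketch in which code reads fields through one OpenClass abstraction, whether the type comes from an ordinary Java class or is defined by data (for example, column names read from a JDBC result set or element names from an XML schema). The OpenClass interface and both implementations are hypothetical and much smaller than anything OpenL actually provides; map-backed instances simply stand in for XML elements or database rows.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo entry point: read fields through the same interface for two very different types.
class OpenClassDemo {
    public static void main(String[] args) {
        Point p = new Point();
        p.x = 1.0;
        p.y = 2.0;
        OpenClass pointType = new JavaBackedOpenClass(Point.class);
        System.out.println(pointType.getName() + ".x = " + pointType.getFieldValue(p, "x"));

        OpenClass rowType = new MapBackedOpenClass("book", Arrays.asList("id", "title"));
        Map<String, Object> row = new HashMap<>();
        row.put("id", 42);
        row.put("title", "OpenL");
        System.out.println(rowType.getName() + ".title = " + rowType.getFieldValue(row, "title"));
    }
}

class Point {
    public double x;
    public double y;
}

// Hypothetical generalization of java.lang.Class: a named type with readable fields.
interface OpenClass {
    String getName();
    List<String> getFieldNames();
    Object getFieldValue(Object instance, String fieldName);
}

// Adapter over an ordinary Java class; its fields are the public instance fields.
class JavaBackedOpenClass implements OpenClass {
    private final Class<?> javaClass;
    JavaBackedOpenClass(Class<?> javaClass) { this.javaClass = javaClass; }

    public String getName() { return javaClass.getName(); }

    public List<String> getFieldNames() {
        List<String> names = new ArrayList<>();
        for (Field f : javaClass.getFields()) {
            if (!Modifier.isStatic(f.getModifiers())) names.add(f.getName());
        }
        return names;
    }

    public Object getFieldValue(Object instance, String fieldName) {
        try {
            return javaClass.getField(fieldName).get(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No readable field " + fieldName, e);
        }
    }
}

// Type defined by data rather than by Java code; instances are Maps from field name to value.
class MapBackedOpenClass implements OpenClass {
    private final String name;
    private final List<String> fieldNames;
    MapBackedOpenClass(String name, List<String> fieldNames) {
        this.name = name;
        this.fieldNames = new ArrayList<>(fieldNames);
    }

    public String getName() { return name; }

    public List<String> getFieldNames() { return new ArrayList<>(fieldNames); }

    public Object getFieldValue(Object instance, String fieldName) {
        if (!fieldNames.contains(fieldName)) {
            throw new IllegalArgumentException("No field " + fieldName + " in " + name);
        }
        return ((Map<?, ?>) instance).get(fieldName);
    }
}
```

Code that talks only to OpenClass does not need to know which kind of type it is reading from, which is the property that lets new data sources be plugged in.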
Reference Implementation
OpenL is not only an innovative approach to
programming language configuration but also a tool you can use for your
programming needs right away. The reference implementation, called
org.openl.j for historical reasons, is a ready-to-use language with
Java-like syntax and some extra abilities such as operator overloading.
This reference implementation also serves as the base for several of our
OpenL extensions, such as org.openl.j.science and org.openl.xml.dom.
Namespace
<TBD>