As an example to show how to use the TyS framework, we are going to develop a simple type Calculator, being able to calculate a logical or arithmetical value, depending on type inference. We are going to use a YACC version for Java: BYACC/Java.
This calculator accepts logic and arithmetical statements separated by a form feed. Each statement is a mathematical expression. If its syntax and semantic is correct, it would be evaluated and the resulting value will be shown in the standard output.
The calculator has three types: integer, real and boolean. It supports the following arithmetic operations: addition, subtraction, multiplication, division, and modulus. It also has boolean and, or and not operations.
You can download the full example source code.
At first, we are going to define the lexical components (tokens) of our small language. The next table shows language lexemes and the tokens associated.
Lexical items | Token |
---|---|
'+' '-' '*' '/' '%' '\n' '(' ')' '!' '|' '&' | Its ASCII value |
Integer Constant | CONST_INT |
Real Constant | CONST_REAL |
"true" | TRUE |
"false" | FALSE |
This defines our lexical specification. As we are going to use YACC,
the scanner must be implemented by the yylex
function.
As its specification is pretty simple, we have developed it by hand.
We will use a LALR(1) grammar, resolving the ambiguity conflict of infix operator notation following the conventional operator associativity and precedence:
Operator | Number of Operands | Associativity | Syntax |
---|---|---|---|
| & | 2 | left | infix |
+ - | 2 | left | infix |
/ * % | 2 | left | infix |
! | 1 | none | prefix |
( ) | 1 | none | surrounding the expression |
This is the LALR(1) grammar of the calculator language:
<calc> :: <calc> <stament> | <stament> ; <stament> ::= <expression> '\n' | '\n' ; <expression> ::= <expression> '+'<expression> | <expression> '-' <expression> | <expression> '*' <expression> | <expression> '/'<expression> | <expression> '%' <expression> | <expression> '&' <expression> | <expression> '|' <expression> | '!' <expression> | '+' <expression> | '-' <expression> | '(' expression ')' | CONST_INT | CONST_REAL | TRUE | FALSE ;
In view of the fact that we are going to use
BYACC/Java,
we should define a class that represent the semantic value of every
terminal or non-terminal grammar symbol. These semantic values are
the ones to reference on the type and token sections of the yacc grammar.
The class is called SemanticValue
.
These attributes of the objects SemanticValue
are going to represent
with their attributes the classical values to be assigned to YACC %union
.
The interesting attribute we are going to employ on the semantic analysis is
Type
: it represents the type of every expression.
The complete specification can be obtained from the file
SemanticValue.java.
We are about to specify the semantic specification of our language in a natural (informal)
way. Let's show every operation defined and type inference to be developed.
We also identified each method of the Evaluator
class explained later on.
SV stands for Semantic Value and e1, e2, and e represents language expressions.
Operation | Semantics | Evaluator Method |
---|---|---|
e1 & e2 | e1, e2: SV Boolean type. Resulting type: Boolean. Evaluation: Logical AND. |
logical(e1, e2:SV, operator:char):SV |
e1 | e2 | e1, e2: SV Boolean type. Resulting type: Boolean. Evaluation: Logical OR. |
logical(e1, e2:SV, operator:char):SV |
! e | e: SV Boolean type. Resulting type: Boolean. Evaluation: Logical NOT. |
not(e:SV):SV |
e1 + e2 | e1, e2: SV Real or Integer types. Resulting type: Major of both. Evaluation: Addition. |
arithmetical(e1, e2: SV, operator:char):SV |
e1 - e2 | e1, e2: SV Real or Integer types. Resulting type: Major of both. Evaluation: Subtraction. |
arithmetical(e1, e2: SV, operator:char):SV |
e1 * e2 | e1, e2: SV Real or Integer types. Resulting type: Major of both. Evaluation: Multiplication. |
arithmetical(e1, e2: SV, operator:char):SV |
e1 / e2 | e1, e2: SV Real or Integer types. Resulting type: Major of both. Evaluation: Division. |
arithmetical(e1, e2: SV, operator:char):SV |
e1 % e2 | e1, e2: SV Integer type. Resulting type: Integer. Evaluation: Modulus. |
mod(e1, e2:SV):SV |
+ e | e: SV Real or Integer type. Resulting type: The same as e. Evaluation: Do nothing. |
unarySign(e:SV, operator:char) |
- e | e: SV Real or Integer type. Resulting type: The same as e. Evaluation: Multiplication by -1. |
unarySign(e:SV, operator:char) |
In case previous conditions are not satisfied, a semantic error must be displayed.
It is time to use our TyCC
processor of the TyS
system in
order to automatically develop type checking of the semantic analysis.
We create type specification in a simple file: CalcTypes.txt. A type specification file is composed of three sections:
This section might be used to insert any kind of code between
the %{
and %}
pairs of characters.
This code will be simply copied into the beginning of the destination file
Type.java
generated by TyCC
.
In our example, we have used it import the necessary packages:
%{ import java.lang.io.*; %}
In this section types are declared at the same time as type promotion: rules
that give a type the ability to implicitly convert to another type.
By this declaration, code to automatically make this type promotion is
generated. Our language has three types:
Int
, Real
and Bool
.
For numeric types, integer may be implicitly converted to real type. The same way, Integer may promote to Boolean. We just have to write:
Types = { Int < Real Bool }
The "less than" operator represents promotion. If promotion does not exist, types are separated by white space or tab character. So, why have not we represented Boolean to Integer promotion? Every type has to be introduced in this declaration, but they can just appear once. We will see how our tool offers enough flexibility to solve this limitation.
Implicit type conversion is managed by the method
implicitCast(Type): Type
generated by
TyCC. In case we want to modify how this mechanism works,
we just have to modify this method implementation
in a specific class. In our example, we ought to override
the implicitCast
method of the Int
class.
Types are defined following a syntax very similar to the Java programming language.
At first it is written the reserved word class
, followed
by the unique type identifier. Optionally, after the identifier, a comma-separated
type list could be located between parentheses. Finally, method definition or
embedded code could be written between braces.
ClassDefinition ::= CLASS ID '(' [ID{, ID}] ')' '{' {MethodDefinition|EmbbededCode} '}'
Depending on the target language selected, code will be represented following the specific language rules.
Types following type identifier means that the type is a type constructor. Types that this constructor will be applied to, are the comma-separated identifiers. In this tutorial we are not going to illustrate this capability. For more information, please read the user guide or samples.
In the calculator, we define three types:
class Bool { ... } class Int { ... } class Real { ... }
In order to obtain any type implementation, TyS offers a types
table by means of the Type.typesTable
reference.
Its main method getType
receives as its sole parameter a
type expression, returning the appropriate Type
object.
TyC generates a type object for each type specified. If we want to
directly use a type table, the correct reference is
Type.TypesTableAdapter
.
Our system uses a Type Expression Language (TEL). If you want to know its specification, check out the TyS User's Guide. A type is basically represented by its name and components (attributes) --if any exists.
In our example, we have three types with no components. If would like to obtain the Boolean type, the correct statement is:
Type t=Type.typesTable.getType("Bool");
Type identifiers are case sensitive.
Following the Java syntax, we will define methods of the three types declared.
The modulus operation is just defined for integer type. We will, therefore, define it only
in the Int
type. The rest of types will not have defined this operation as
a correct one. This way, if the mod
message is sent to another type, the
type checker would generate a semantic error.
class Int { Type mod() { return Type.typesTable.getType("Int"); } ... }
What we have done to access the integer type is used the types table. TyS has a type table
from which we can obtain any type implementation by using its type expression. The only
way that the user may interact with the system is by means of the Type
Facade class. Its static attribute typesTable offers every type defined in TyS.
We have grouped every arithmetic operation (except modulus) whose
operators are two into a unique method arithmetical
.
This operation checks the semantic restrictions imposed to three
operators: +
, -
, *
,
and /
. It also infers the resulting type. We will apply it
both to integer and real types.
Type inference is based in promotion. Real type is commonly represented as major
than integer. This way, we define a method majorTyppe
that checks if
two types are compliant and, if so, it infers the major one. It will be defined later on.
class Int { Type arithmetical (Type t){ if (majorType(t) == null) throw new TypeException("Operation arithmetical not allowed between Int and " + t.getName()); return t; } ... } class Real { Type arithmetical (Type t){ if (majorType(t) == null) throw new TypeException("Operation arithmetical not allowed between Real and " + t.getName()); return majorType(t); } ... }
Boolean type does not implement this type. A semantic error would be thrown when trying to use it.
Next method represents -
and +
unary
operators. It is defined in number types (integer and real), returning the same type.
class Int { Type unarySign() { return Type.typesTable.getType("Int"); } ... } class Real { Type unarySign() { return Type.typesTable.getType("Real"); } ... }
In the type declaration section, we could not have defined promotion from integer to
both real and Boolean. Since the specified promotion is performed by the
implicitCast
method of BaseType
(any Type
extends BaseType
), we just have to override this method
implementation in the Int
class.
Note that, in case that the parameter is not Boolean type, we call the base method --that is, the default implicit cast behavior.
class Int { Type implicitCast(Type t) { if (t.getName().equals("Bool")) return Type.typesTable.getType("Bool"); // Default behavior return super.implicitCast(t); } ... } }
This method concerns to calculator expressions evaluation. Depending on the expression value, it would be represented as a logical expression. In our language, integers (but not real numbers) may be represented as booleans.
Thus, we implement a makeBool
method in Integer and Boolean that
makes explicit this conversion and evaluates it. Note that the parameter is not
a Type, but a SemanticValue representing expression value.
class Int { boolean makeBool(SemanticValue v) { return v.dval != 0; } ... } class Bool { boolean makeBool(SemanticValue v) { return v.bval; } ... }
At runtime, we will show the expression value of each of the three types in our
language. We will also display a message representing the type --this is performed
by the info
implemented afterwards.
class Int { void print(SemanticValue v) { System.out.println(info() + ", Value = " + (int)v.dval); } ... } class Real { void print(SemanticValue v) { System.out.println(info() + ", Value = " + v.dval); } ... } class Bool { void print(SemanticValue v) { System.out.println(info() + ", Value = " + v.bval); } ... }
As we have seen, each type identified by the user will be translated into a class representing
the appropriate type implementation. Each type will be derived from the
BaseType
class. Its functionality is refactoring every common
behavior that exists in all the types.
This class should never be declared in the preamble. However, it is defined as another type.
class BaseType { ... }
Since the rest of the operations are going to be defined in the
BaseType
, they will be shared to every type.
This method is used by the arithmetical
method to represent
type promotion. It returns the major type or null
if there
is no promotion.
As we have previously mentioned, type promotion defined by the user is automatically
implemented by the implicitCast
method of this class. So, the
implementation of majorType
is pretty simple:
class BaseType { Type majorType(Type t) { Type ret = implicitCast(t); if (ret == null) ret = t.implicitCast(Type.typesTable.getType(getName())); return ret; } ... }
getName
returns the type expression.
This method returns the type expression in order to trace a specific implementation.
class BaseType { String info() { return "Type -> " + getName(); } ... }
We could have implemented this method in every Type that admits
the logical not operation --in our example, Integer and Boolean types.
However, as it is exactly the same implementation, it always returns the
Boolean type, and type promotion is offered by implicitCast
,
we can develop it at the BaseType
class:
class BaseType { Type not() { Type t = implicitCast(Type.typesTable.getType("Bool")); if (t == null) throw new TypeException("Cannot make operation upon a " + getName()); return t; } ... }
Following the same idea, we may identify the logical binary operators as a
BaseType
method. We just have to check if both
operands can be implicitly converted to Booleans and, if so, this type
is returned. In other occasion, a semantic error is thrown.
class BaseType { Type logical(Type t) { if (implicitCast(Type.typesTable.getType("Bool")) == null) throw new TypeException("Cannot make logical operations upon a " + getName()); if (t.implicitCast(Type.typesTable.getType("Bool")) == null) throw new TypeException("Cannot make logical operations upon a " + getName()); return Type.typesTable.getType("Bool"); } ... }
Now that we have specified the semantic constraints and we have implemented types in TyS, we just have to define the rest of the language processor. This is the Parser.y YACC grammar:
%token CONST_INT CONST_REAL TRUE FALSE %token ERROR %left '&' '|' %left '-' '+' %left '*' '/' '%' %nonassoc '!' %% calc : calc stament | ; stament : expression '\n' { eval.print($1); } | '\n' ; expression : CONST_REAL { $$ = eval.getConst( $1, "Real" ); } | CONST_INT { $$ = eval.getConst( $1, "Int" ); } | TRUE { $$ = eval.getConst( $1, "Bool" ); } | FALSE { $$ = eval.getConst( $1, "Bool" ); } | '+' expression %prec '!' { $$ = eval.unarySign( $2, '+' ); } | '-' expression %prec '!' { $$ = eval.unarySign( $2, '-' ); } | '!' expression %prec '!' { $$ = eval.not( $2 ); } | expression '+' expression { $$ = eval.arithmetical( $1, $3, '+' ); } | expression '-' expression { $$ = eval.arithmetical( $1, $3, '-' ); } | expression '*' expression { $$ = eval.arithmetical( $1, $3, '*' ); } | expression '/' expression { $$ = eval.arithmetical( $1, $3, '/' ); } | expression '%' expression { $$ = eval.mod ( $1, $3 ); } | expression '&' expression { $$ = eval.logical( $1, $3, '&' ); } | expression '|' expression { $$ = eval.logical( $1, $3, '|' ); } | '(' expression ')' { $$ = $2; } ; ... Evaluator eval = new Evaluator(); %%
Remember that every $x is a reference to a SemanticValue
.
What we have added to the parser is an evaluator object.
This object's main aim is:
Evaluator.java is the module were the evaluation is performed. These are all its operations:
This method first checks that type semantics is correct, and then
evaluates the arithmetical binary operation. Type checking is so
simple as calling the arithmetical
method created by TyC.
Here is the code:
SemanticValue aritmetical(SemanticValue v1, SemanticValue v2, char operator) { SemanticValue r = new SemanticValue(v1.dval); try { r.type = v1.type.aritmetical(v2.type);//TypeChecking switch (operator) { case '+' : r.dval += v2.dval; break; case '-' : r.dval -= v2.dval; break; case '/' : r.dval /= v2.dval; break; case '*' : r.dval *= v2.dval; break; default: throw new TypeException("Unknown operator " + operator); } } catch (TypeException e) { System.err.println(e); } return r; }
First we create a new SemanticValue
object to be
returned. Then we call the type checking appropriate operation and
finally the evaluation is performed. Note that our error handler just shows the
semantic error by console and follows processing expressions --more sophisticated
ones could be developed modifying exception handlers.
Following the same scheme, logical
checks semantic
constraints and then evaluates the logical expression.
SemanticValue logical(SemanticValue v1, SemanticValue v2, char operator) { SemanticValue r = new SemanticValue(v1.dval); try { r.type = v1.type.logical(v2.type); //TypeChecking // Evaluation boolean b1 = makeBool(v1); boolean b2 = makeBool(v2);// If typechecking is OK continue with evaluation switch (operator) { case '&' : r.bval = b1 && b2; break; case '|' : r.bval = b1 || b2; break; default: throw new TypeException("Unknown operator " + operator); } . . . . . . }
In this example, we have used TyS, besides for type checking, as part of
expression evaluator. If you remember, makeBool
takes a Semantic
Value and returns, if possible, its respective Boolean. This shows us the high versatility of this system.
SemanticValue mod(SemanticValue v1, SemanticValue v2) { SemanticValue r = new SemanticValue(v1.dval); try { r.type = v1.type.mod(v2.type);//TypeChecking r.dval = (int)v1.dval % (int)v2.dval; .... }
This method checks and evaluates the unary operators +
and -
:
SemanticValue unarySign(SemanticValue v, char operator) { SemanticValue r = new SemanticValue(v.dval); try { r.type = v.type.unarySign();//TypeChecking if (operator == '-') r.dval = -r.dval; ... }
SemanticValue not(SemanticValue v) { SemanticValue r = new SemanticValue(v.dval); try { r.type = v.type.not(); //TypeChecking r.bval = !makeBool(v);//TypeChecking Interpreter Interaction } ... }
Once we have created all the modules, it is time to build our typed calculator.
At first, first compile the parser grammar with BYACC/J. We have to make explicit the use of SemanticValue:
yacc -JSemantic=SemanticValue Parser.y
If everything goes well, a Parser.java file is generated.
Once the parser is generated, we produce the type checker with TyC. In this case, there is no need of specific parameters, simply:
TyCC CalcTypes.txt
A Type.java file is created with our type checker. System's facade is Type
.
Next step is application (language processor) compilation. Everything is Java source, so we have:
javac Type.java Parser.java Evaluator.java
To run the calculator, we write:
java Parser
Sample valid programs are:
2 + 2 3 * 5.6 3 & 4 | true 3 % 2
Wrong ones and, therefore, semantic errors are displayed in these ones:
2 % 2.2 ! 2.3 4 | 2.3
It has been shown how TyS makes possible the isolation of type systems specification (the most important part of semantic analysis). The user, employing object orientation, defines types, its defined operations and promotion. The checker is automatically generated and the language processor just uses it.
As YACC is a well-known parser generator tool, we have done this tutorial with YACC. However, using more sophisticated tools (like ANTLR) demonstrate that the rest of semantic analysis is commonly reduced to symbol table management --see TyS user's guide.