There have been different efforts to bring static type-checking to object-oriented dynamic languages such as Python, Ruby or JavaScript. These languages provide no type annotations, and type checking is postponed until runtime. However, a compiler may infer type information, and use it to detect some type errors statically and to improve the runtime performance of the target code.
Constraint-based type inference is an approach to statically infer types of dynamically typed code. DRuby and StaDyn are two example languages that represent types as constraints, checked by a constraint resolution algorithm. Rubydust also follows this approach, and introduces the idea of generating constraints with dynamic program executions. Runtime variables are wrapped to associate them to type variables, and the wrapper generates constraints when the wrapped value is used. At the end of the execution, constraints are solved to type-check the program. This constraint-based dynamic type inference approach is also followed by Flow, which additionally provides gradual typing for JavaScript.
Refinement types have also been used to type-check dynamic languages. For example, Dependent JavaScript (DJS) is a statically typed dialect of JavaScript that provides a dependent type system, expressive enough to reason about a variety of tricky JavaScript idioms. DJS requires the programmer to annotate types, but some type annotations might be inferred by analyzing the source program.
Our proposal is based on type-checking Python programs with Python itself. Static type checking is obtained with the dynamic execution of a convergent Python type checker. Instead of generating constraints or predicates in a refinement logic, types are represented in a subset of Python that avoids non-termination. The generated type checkers take advantage of the rich meta-programming features of the Python programming language: introspection is used to inspect the structures of objects, classes and modules, and the inheritance graph; the AST of the original program is easily obtained, transformed, compiled into the target type checker, and executed; recursion can be detected dynamically with the use of annotations; and the types of variables, functions, classes and modules can be changed throughout the program execution.
Figures 1 and 2 show a simple example program and (a simplification of) the corresponding type checker generated, respectively. Each variable in the original program in Figure 1 (Counter
, count
, inc
, obj
and sum
) represents a type variable in the generated type checker (Figure 2). Python facilitates this association, since most of the language entities (modules, classes, functions, types...) are first-class objects.
Our type system considers a different type for each single object. The type of an object is represented with another object holding the types of its members. This consideration is especially useful for type-checking structural intercession operations, since the structure of Python objects, classes and modules can be modified at runtime.
class
Counter: count = 0def
inc(self, value): self.count += valuereturn
self.count obj = Counter() sum = obj.inc(1) + obj.inc(0.2)
The code in Figure 2 evaluates types of expressions instead of their values. Python code is used to infer types instead of representing them as constraints or predicates. When the inferred type of an expression is TypeError
, it means that it is not well typed. After executing the type checker, the collection of instantiated TypeError
s is consulted to see if the program is well typed.
Types are saved in a type store (ts), instead of using Python globals
and locals
functions. When the original program uses an undefined variable, the get method of ts returns a TypeError
. Similarly, the get_member
function makes use of Python meta-programing to check if an object, class or module provides a member by inspecting the inheritance tree. The generated type checker should never produce a runtime error, since it is aimed at type-checking the source program.
The inc
method in Figure 2 checks the original inc
method in Figure 1. If the number of parameters is not the expected one, a TypeError
is returned. Otherwise, the type of an addition is computed and assigned to the count
field of the particular Counter
object pointed by self
. In our example, the second invocation to inc
changes the type of obj.count
from int
to float
(in other languages, this side effect is represented with a special kind of constraint).
class
Counter: count = intdef
inc(self, *args)if
len(args) != 1:return
TypeError("Wrong number of args"
) self.count = plus_op(get_member(self,"count"
), args[0])return
get_member(self,"count"
) ts.set("Counter"
, Counter) ts.set("obj"
, ts.get("Counter"
)()) ts.set("sum"
, plus_op(get_member(ts.get("obj"
),"inc"
)(int
), get_member(ts.get("obj"
),"inc"
)(float
) )
In dynamic languages, programmers often give variables flow-sensitive types. In the left-hand side of Figure 3, the dynamic type of obj.count
depends on the dynamic value of condition. We represent this flow-sensitiveness with union types.
The generated code (right-hand side of Figure 3) runs the conditional execution paths sequentially. Before each branch, the type store is cloned. The else body is executed with initial type store, before the if body. After the if-else
statement, the modified types in each execution path are "joined" with union types. After type-checking the code in Figure 3, the type of obj.count
is int
\/float
.
obj = Counter() ts.set("obj"
, ts.get("Counter"
)())if
condition: _ts_1 = ts.clone() obj.inc(1) get_member(ts.get("obj"
),"inc"
)(int
)else
: _ts_2 = ts.clone(); ts.set(_ts_1) obj.inc(0.5) get_member(ts.get("obj"
),"inc"
)(float
) ts = join(_ts_1, _ts_2, ts)
Since type checking is performed via dynamic program execution, termination of the generated code (i.e., type checkers) must be ensured. The following criteria were applied to avoid divergent type checkers:
norecursion
decorator we developed. This decorator dynamically detects and avoids recursive calls when types are being inferred.
str
, len
and range
) are replaced with convergent Python functions that infer the return type depending on the parameter types passed.
As shown in Figure 4, stypy consists of a compiler and an API. The compiler takes the original program and generates a particular type checker. Then, the compiler executes the type checker, which uses the type system implemented as a Python API. The list of type errors, if any, is collected by the compiler and shown to the programmer.
We implemented the type system (API) and tested it with most Python features. We support tricky idioms such as conditional code depending on the type of arguments, special method names (e.g., __add__
, __repr__
and __float__
), adaptation of class and object structures, and flow-sensitive definition of functions and classes. The types of native built-in functions are included in the type system (API), expressing their types with Python code.
We do not support dynamic code evaluation with the exec
and eval
functions. For structural intercession, we type-check programs that add/remove class and object members using identifiers, but not dynamically evaluated strings (i.e., we do not support direct access to __dict__
). Currently, our type system detects different type errors that existing tools (such as Pylint, PyChecker, PyCharm, Pyntch, PyStarch and pyflakes) do not detect.
We are currently developing the compiler. We plan to validate its implementation with mutations of real programs, comparing its accuracy with the existing alternatives.