1 Introduction
Programming language semantics is a subfield of theoretical computer science in which researchers develop formal descriptions of the meaning of computer programs. Over the years, we have seen the development of denotational semantics, where we mathematically model the effect of executing a language construct. Operational semantics formalises the mechanical steps that transform program states given a particular program. As we shall argue in this thesis, analysing dynamic languages, both concretely and abstractly, requires a semantics that bridges the gap between these two styles of semantics if the task is to be feasible.
Abstract interpretation is a unifying theory for program analysis and verification. It lets us ascertain runtime properties of a program by approximating its semantics. The properties of interest are almost always undecidable, so in general they cannot be computed exactly. The task of abstract interpretation can be thought of as over-approximating a set of concrete states in a finite number of steps. The usual semantic domain is replaced by an abstract domain whose elements describe sets of runtime states. Mathematically, such an abstract domain is a partially ordered set (typically a lattice). Its ordering corresponds to the subset ordering on the powerset of concrete states.
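To make the correspondence between abstract ordering and subset ordering concrete, here is a minimal Haskell sketch of a four-point type lattice. The names `AType`, `leq`, `join` and `gamma` are hypothetical and are not the thesis's own domain. The concretisation `gamma` is restricted to a finite sample of values so that subset inclusion can be checked directly. In this lattice, `leq a b` implies that `gamma a` is contained in `gamma b`.

```haskell
-- Hypothetical abstract domain of types (not the thesis's definition).
-- Bot describes no runtime value; Top describes any value.
data AType = Bot | IntT | BoolT | Top
  deriving (Eq, Show)

-- Partial order: Bot below everything, Top above everything,
-- IntT and BoolT incomparable.
leq :: AType -> AType -> Bool
leq Bot _ = True
leq _ Top = True
leq a b = a == b

-- Least upper bound of two abstract values.
join :: AType -> AType -> AType
join a b
  | leq a b = b
  | leq b a = a
  | otherwise = Top

-- Concrete values, and a concretisation restricted to a finite sample.
data Value = I Int | B Bool deriving (Eq, Show)

gamma :: [Value] -> AType -> [Value]
gamma sample t = filter (describes t) sample
  where
    describes Bot _ = False
    describes Top _ = True
    describes IntT (I _) = True
    describes BoolT (B _) = True
    describes _ _ = False
```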
Two distinct needs motivated the development of the present work. First, there is the theoretician’s need for a simple, concise, elegant way of presenting a formal semantics for a programming language and of developing that semantics into various static analyses. We base our approach on a parametric denotational semantics. It is modularised so that the concrete semantics and the abstract interpretation share a common framework that uniformly handles most aspects of the programming language. The use of denotational semantics provides a strong foundation for proving the correctness of an abstract interpretation, and it allows us to focus on the algorithmic details of the analysis.
Second, there is a practical need for program analyses suited to the dynamic languages that have grown in popularity in recent years. Traditionally these languages were called “scripting” languages, as they were mainly used for automating tasks and processing strings. However, with the advent of web applications, languages such as Perl and PHP became popular for web application development. On the client side, web pages make heavy use of JavaScript, a dynamically typed language, to deliver dynamic content to the browser. In recent years, JavaScript has also been increasingly used on the server side.
These languages make it possible to rapidly prototype and validate application models in a real-time read-eval-print loop. Another strength is that programmers do not need to define a class structure upfront. Instead, class structures, and the types of variables in general, are built dynamically. This reduces the initial overhead of software design.
However, these features come at a cost. The lack of a formal, static definition of type information makes dynamically typed languages harder to analyse. This difficulty causes several practical problems.

As applications mature, more effort goes into unit testing and into writing assertions that ensure the type safety of the system. This extra effort can sometimes outweigh the benefit of using a dynamically typed language.

Whereas programmers using statically typed languages enjoy an abundance of development tools, the choice of tools for development in dynamically typed languages is limited, and the tools that do exist lack much of the power of the tools for statically typed languages, owing largely to the difficulty or infeasibility of type analysis for such languages.

The lack of static type structure also has a significant impact on the performance of dynamically typed languages.
With these problems in mind, we have designed a model language whose dynamic features are comparable to those of the aforementioned scripting languages: duck typing, reflection, and partial function application. A notable omission is closure scoping. However, function currying gives the language expressive power comparable to that of languages with closures or lexical scoping.
The two concerns are not independent; they are interconnected. The theoretical need exists because the abstract and concrete meaning of dynamic languages is difficult to describe, since these languages often allow side-effecting, type-altering functions. Because of this complexity, duck-typed languages are interesting test cases for which we formulate a concrete semantics, an abstract interpretation and a proof of correctness. Our Haskell implementation of both the concrete and the abstract interpretation, which appears in the appendix to this thesis, illustrates the practicality of the proposed programming language semantics.
This work is inspired by Haskell’s use of monads, and we assume the reader is familiar with monadic-style Haskell programming. We also assume knowledge of lambda notation, denotational semantics, and order and fixed point theory at the level of Nielson and Nielson’s textbook [1].
In the following section, we discuss related work in the field of language semantics and distinguish our work from it. In section 3, we give a general overview of the proposed semantic framework and analysis. In sections 4 and 5, we formally introduce our framework. In section 6, we develop a model language, adding features gradually, and at each stage we present concrete and abstract analyses of the language in parallel. In sections 7 and 8, we establish formal properties of the analysis. Finally, in section 9, we conclude the thesis and discuss future directions.
2 Related work
Denotational semantics is the starting point of our development of a formal framework. The idea of incorporating monads into denotational definitions was developed by Liang and Hudak [2, 3]. Whereas these works modularise an analytic framework by having multiple layers of monadic transformations, we instead parametrise the definition of a program state.
Action semantics, as advanced by Mosses [4], shares our motivation that a semantics ought to be pragmatic, yet expressive enough to deal with non-trivial, feature-rich languages. While action semantics endeavours to devise a new metalanguage for describing semantics, we constrain ourselves to the language of denotational semantics and seek a formalism that is largely compatible with it.
The idea of constructing formulae with parametric types can be found in Wadler’s work [5]. The present work is a particular application of parametricity to language semantics and analysis.
Regarding the type analysis of dynamic languages, numerous studies [6, 7, 8, 9] consider simple toy languages and their semantics for the purpose of static analysis. A major difference between those languages and the model language presented in this thesis is that our language is designed to capture a critical feature of real-world languages: functions may alter types through side-effecting statements. We point out similarities to and differences from the cited works as we encounter them in this thesis.
Type analysis plays a crucial part in compiling scripting languages, mainly to improve performance. Ancona et al. [10] and Dufour [11] design restricted versions of scripting languages so that static type inference can be performed. We adopt several techniques employed in those projects, such as the use of named memory allocation sites as static references.
An important use case of functions in dynamically typed languages is “mixin” functions [12]. By passing arguments to a mixin function, objects can be extended with extra methods; that is, functionality can be added dynamically. There are model languages and formalisations of mixin functions, such as the works of Anderson et al. [6] and Mens et al. [13]. Whereas those works seek functional models for mixins, we instead define a language (with side-effecting functions) that is expressive enough to program mixin inheritance.
Jensen et al. [14] describe a feature-complete analyser for the JavaScript language. Our work could be extended to provide a semantic foundation for such an analyser. Such a formalisation of the analysis might pave the way for further refinement and improvement.
3 Overview
Our semantic framework comprises two components: one for the syntactic structure, and one that gives meaning to the primitive operations. The two are divided by the following separation of concerns:

What are the semantic operations entailed in a particular syntactic structure? For example, syntactic structure entails a primitive operation .

How do we interpret such semantic operations from a particular point of view? If we were to give a concrete interpretation, we would interpret as updating an environment with a newly defined variable e.g., .
Observe that an interpretation of syntactic structure can remain agnostic of the structure of a program state at a given point. Therefore, once we remove the actual interpretation of primitive operations, what remains of the semantics can be reused for multiple interpretations of the language. Hence, not only are the primitive operations parametrised, but so is the whole definition of the domain of program states. This separation of concerns also helps to define an extensible semantics, to which a new feature can be added with minimal effort.
Now we give a formal definition of our framework.
Definition 1 (Parametric semantics).
A parametric semantics is a quintuple where is a collection of semantic functions for syntactic structures, as outlined below; is a set of representations of computation state, which can be anything that suits a particular analysis; is the set of all values to which an expression can evaluate; is an initial program state; and is the set of primitive operations of the semantics. We assume throughout that different states are incomparable. In other words, is ordered by identity.
Throughout the analysis, these primitive operations are the parameters of our analysis:

takes a state and reports whether it is escaping (i.e., whether or not control flow reaches the successor statement)

interprets the meaning of a branching point when a value and two transformations (one for true and another for false) are given

takes an identifier and a value, and performs assignment

takes an identifier and produces its meaning

takes a constant and produces its meaning

and define the meanings of console I/O operations

defines the meaning of all binary operations given two values

defines the meaning of a return statement given the value to be returned

defines the meaning of dynamic execution of a function declaration

defines the meaning of (possibly partially) applying a function to a list of values

and define the meaning of getting or setting a member of an object

and define the meanings of keywords and , respectively

defines the meaning of instantiating a new object from a particular allocation site
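The primitive operations listed above are the only points where an interpretation touches the state. The following minimal Haskell sketch illustrates that separation of concerns. It packs a few of the operations into a record and writes a syntax-directed evaluator once against that record. It then instantiates the record twice, once concretely and once abstractly. The record `Prims`, its field names, and the tiny expression type are hypothetical stand-ins. The thesis gives the actual types of its operations in section 6.

```haskell
-- Hypothetical record of primitive operations; 's' is the state domain
-- and 'v' the value domain. Results are lists to allow nondeterminism.
data Prims s v = Prims
  { isEscape :: s -> Bool                 -- does control flow stop here?
  , assign   :: String -> v -> s -> [s]   -- meaning of x = v
  , lookupId :: String -> s -> [v]        -- meaning of an identifier
  , constant :: Integer -> v              -- meaning of a numeric constant
  , binop    :: String -> v -> v -> [v]   -- meaning of a binary operator
  }

data Exp = Con Integer | Var String | Bin String Exp Exp

-- Syntax-directed part: written once, agnostic of the state's shape.
evalE :: Prims s v -> Exp -> s -> [v]
evalE p (Con n) _ = [constant p n]
evalE p (Var x) s = lookupId p x s
evalE p (Bin op a b) s =
  [r | va <- evalE p a s, vb <- evalE p b s, r <- binop p op va vb]

-- Concrete interpretation: environments map names to integers.
concrete :: Prims [(String, Integer)] Integer
concrete = Prims
  { isEscape = const False
  , assign = \x v s -> [(x, v) : filter ((/= x) . fst) s]
  , lookupId = \x s -> maybe [] pure (lookup x s)
  , constant = id
  , binop = \op a b -> case op of
      "+" -> [a + b]
      "*" -> [a * b]
      _   -> []
  }

-- Abstract interpretation: every number is approximated by its type.
data AType = IntT deriving (Eq, Show)

abstract :: Prims [(String, AType)] AType
abstract = Prims
  { isEscape = const False
  , assign = \x v s -> [(x, v) : filter ((/= x) . fst) s]
  , lookupId = \x s -> maybe [] pure (lookup x s)
  , constant = const IntT
  , binop = \_ _ _ -> [IntT]
  }
```

The same `evalE` is reused unchanged for both interpretations. Only the record passed to it differs.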
Types of these operations are given in section 6 as we introduce them.
Our model language, as it evolves through this thesis, has a set of features commonly found in scripting languages. In the remainder of the thesis we give the semantics of this language, introducing its components step by step. The aim is to demonstrate that the semantic formalism enables such stepwise development. Each step is incremental in the sense that it does not require revision of the semantic equations developed in earlier steps.
Figure 3.1 is an example of a program written in the model language, which we call the Simple Duck-Typed Language (SDTL). A locally-scoped procedural language with support for higher-order functions (lines 1 to 5) is introduced in Section 6.1. Function currying (lines 7, 8 and 20) is introduced in Section 6.2. Object-oriented features, including duck typing and reflection, are introduced in Section 6.3. Finally, exception handling (lines 38 to 42) is introduced in Section 6.4.
4 Analytic framework
In this section we introduce a monadic construct specifically designed for the purpose of program analysis. We then introduce polymorphic auxiliary functions that are useful in extending theories in a modular manner.
First we define the monadic constructions. We define a type constructor and a bind operator .
Definition 2 (Type constructor).
The type constructor has the following polymorphic definition. is the set of program semantics. It must be an input to the state transformation so that the semantics can be characterised as a fixed point. is given a formal definition in section 5. The type parameter is used in different contexts to extract different information from the semantics.
Observe that a single state can give rise to multiple corresponding successor states. We are essentially modelling a nondeterministic state transformation. This gives us the flexibility to handle both concrete and abstract semantics within a single framework.
Every statement is understood as a state transformer. We distinguish between “normal” and “escaping” statements; the latter yield an “escape” state. For example, when a function returns, the return statement transforms the current state into an escape state. Our “bind” operator relies on a parametric operation that spells out the precise mechanism for escaping the current program execution flow. This function returns true if a state does not continue to the next expression or statement (for example, after a return statement has been encountered). This gives a flexible and general formalisation of control flow, and it handles exceptions as well as function return statements.
In such escaping cases, there is no appropriate value of the type to be associated with the successor states. Hence, we introduce to be assigned to successor states of the escaping states.
Definition 3 (Bind operator).
We define a bind operator .
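The following minimal Haskell sketch shows how such a bind behaves. It assumes that a computation maps the current semantics and a state to a list of successor states, each paired with an optional result. `Nothing` plays the role of the value assigned to successors of escaping states. The names `T`, `runT` and `bindT` are hypothetical, and lists stand in for sets. Bind takes the escape predicate as an explicit argument, mirroring its role as a parameter of the semantics.

```haskell
-- Hypothetical computation type: semantics -> state -> successor states,
-- each with an optional result ('Nothing' marks an escaping successor).
newtype T sem s a = T { runT :: sem -> s -> [(s, Maybe a)] }

-- Bind, parametrised by the escape predicate.
bindT :: (s -> Bool) -> T sem s a -> (a -> T sem s b) -> T sem s b
bindT escaping m f = T $ \sem s ->
  concat [next sem s' mv | (s', mv) <- runT m sem s]
  where
    next sem' s' mv
      | escaping s' = [(s', Nothing)]        -- escaping: skip the continuation
      | Just v <- mv = runT (f v) sem' s'    -- normal: run the continuation
      | otherwise = [(s', Nothing)]          -- no value to continue with

-- Example state: a counter plus a flag recording that 'return' ran.
type St = (Int, Bool)

tick :: T () St Int                          -- yield the counter, then bump it
tick = T $ \_ (n, r) -> [((n + 1, r), Just n)]

ret :: T () St Int                           -- mark the state as escaping
ret = T $ \_ (n, _) -> [((n, True), Nothing)]

choose :: T () St Int                        -- a nondeterministic step
choose = T $ \_ s -> [(s, Just 0), (s, Just 1)]
```

With `snd` as the escape predicate, binding `ret` to any continuation leaves the escaping state untouched. Binding `choose` fans out into one successor per choice.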
Definition 4 (Pointwise ordering of state transformations).
Given ,
iff
Definition 5 (Pointwise ordering of monadic functions).
Given , iff
Theorem 6 (Preservation of monotonicity).
Given monads , and ,
Proof.
When a state is a member of for some and an initial state , there exists an intermediate state from which is derived by . Clearly, such an intermediate state is also a member of by definition of pointwise ordering.
Formally, by the definition of bind operation. Now, . Hence, . ∎
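The pointwise ordering and the monotonicity result can be spot-checked on finite examples. The following Haskell sketch is not a proof. It defines the ordering over a finite sample of initial states, using lists as sets and omitting the semantics argument for brevity. It then checks that enlarging the first argument of bind enlarges the result. The names `ST`, `leqST`, `bindST`, `m1`, `m2` and `cont` are hypothetical.

```haskell
-- Simplified state transformer (semantics argument omitted).
type ST s a = s -> [(s, Maybe a)]

-- Pointwise ordering, checked only on a finite sample of states:
-- f is below g iff every outcome of f s is also an outcome of g s.
leqST :: (Eq s, Eq a) => [s] -> ST s a -> ST s a -> Bool
leqST sample f g = and [all (`elem` g s) (f s) | s <- sample]

-- Bind for the simplified type, with an escape predicate.
bindST :: (s -> Bool) -> ST s a -> (a -> ST s b) -> ST s b
bindST esc m k s = concat [next s' mv | (s', mv) <- m s]
  where
    next s' (Just v) | not (esc s') = k v s'
    next s' _ = [(s', Nothing)]

m1, m2 :: ST Int Int
m1 s = [(s, Just 1)]
m2 s = [(s, Just 1), (s + 1, Just 2)]        -- m1 is pointwise below m2

cont :: Int -> ST Int Int
cont v s = [(s * 10, Just (v + s))]
```

On the sample `[0 .. 5]`, `m1` lies below `m2`, and `bindST (const False) m1 cont` lies below `bindST (const False) m2 cont`, as the theorem predicts.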
Having a monadic structure helps provide modularity. For example, if a particular parametrised operation takes a state but only produces a value, it would be redundant to include a state in the return type merely to match the definition of monadic binding. In such a case, we take a function that returns only a value and lift it for use in the monadic context.
Definition 7 (Monadic functions).
We define the following auxiliary functions to incorporate nonmonadic functions as a part of monadic transformation:

(return for ) is an identity state transformer that takes a constant and lifts it to an identity state transformer with the constant as a return value

(lift for ) lifts a function that takes a state and returns a value to a monadic function

takes a nondeterministic transformation and lifts it to a monadic function
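The three lifting functions can be sketched in Haskell under the same assumed computation type as before: semantics, then state, then a list of successors with optional results. The names `returnT`, `liftT` and `liftNT` are hypothetical. The result type of `liftNT` is an assumption: the description above does not say which value accompanies a lifted nondeterministic transformation, so the sketch uses unit.

```haskell
-- Assumed computation type (see the bind operator above).
newtype T sem s a = T { runT :: sem -> s -> [(s, Maybe a)] }

-- return: identity state transformer yielding a constant.
returnT :: a -> T sem s a
returnT c = T $ \_ s -> [(s, Just c)]

-- lift: a function reading a value from the state becomes a computation.
liftT :: (s -> a) -> T sem s a
liftT g = T $ \_ s -> [(s, Just (g s))]

-- Lift a nondeterministic state transformation; the unit result is an
-- assumption of this sketch.
liftNT :: (s -> [s]) -> T sem s ()
liftNT h = T $ \_ s -> [(s', Just ()) | s' <- h s]
```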
Definition 8 (Record updater).
We model a state as a record with named fields. In this way, an update operation written for a particular set of fields can be reused without redefining it when we add extra dimensions to a domain to accommodate features that are orthogonal to the features of the previous version.
When we have a record with named fields , and when an updater function updates fields , we define a function that takes a record, projects its fields into an n-tuple corresponding to the selected fields (), lets update the tuple, and finally updates the whole record with the updated tuple ().
where is a value for the field of a record
Similarly, we define an operation to update a record and return a value.
where and
Finally, we define a value extractor, that takes a record and selects a value from it.
When is an n-tuple space for chosen fields and is a domain of a record, the functions defined here have the following type signatures:
Example 9 (Record updater example).
To see these functions in use, suppose we have a simple record structure for personal contacts.

updates age field of a contact record.

returns the previous age field value while updating the age field.

extracts age information from a contact record.
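The contact example can be mirrored in Haskell with a pair of projection and injection functions per field selection. The names `Sel`, `upd`, `updRet` and `get`, and the `email` field, are hypothetical. The point to notice is that the updater (for example `(+ 1)` on an `Int`) knows nothing about the rest of the record. It therefore keeps working when the record gains fields.

```haskell
data Contact = Contact { name :: String, age :: Int, email :: String }
  deriving (Eq, Show)

-- A field selection: project the chosen fields to a tuple, inject it back.
data Sel r t = Sel { project :: r -> t, inject :: t -> r -> r }

-- Project, apply the updater, write the updated tuple back.
upd :: Sel r t -> (t -> t) -> r -> r
upd sel f r = inject sel (f (project sel r)) r

-- As upd, but the updater also returns a value.
updRet :: Sel r t -> (t -> (t, a)) -> r -> (r, a)
updRet sel f r = let (t', a) = f (project sel r) in (inject sel t' r, a)

-- Extract the selected fields without changing the record.
get :: Sel r t -> r -> t
get = project

ageSel :: Sel Contact Int
ageSel = Sel age (\a c -> c { age = a })

-- A two-field selection, projecting to a 2-tuple.
nameAgeSel :: Sel Contact (String, Int)
nameAgeSel = Sel (\c -> (name c, age c)) (\(n, a) c -> c { name = n, age = a })
```

For instance, `updRet ageSel (\a -> (a + 1, a))` increments the age and returns the previous one, as in the second item of the example.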
Definition 10 (Singleton lifting).
Another commonly occurring pattern is that functions often return a singleton set. We define a function that takes a function returning a value and lifts it to be a function that returns a singleton set.
For simplicity of notation, we compose this function with the other functions from Definition 8.
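Singleton lifting is a one-liner. The Haskell sketch below composes it with a simple environment updater, so that a deterministic update can be used wherever a nondeterministic transformation (a list of successors) is expected. The names `sing`, `setVar` and `assignND` are hypothetical, and lists stand in for sets.

```haskell
-- Lift a function returning one value to one returning a singleton.
sing :: (a -> b) -> a -> [b]
sing f x = [f x]

type Env = [(String, Int)]

-- Deterministic update: bind x to v, replacing any earlier binding.
setVar :: String -> Int -> Env -> Env
setVar x v env = (x, v) : filter ((/= x) . fst) env

-- The same update, usable as a nondeterministic transformation.
assignND :: String -> Int -> Env -> [Env]
assignND x v = sing (setVar x v)
```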
With these monadic constructs and auxiliary functions in place, we can now define the semantic functions of the model language.
5 Semantic functions
We use syntax nodes as references to the various items constituting the program environment. We assign a unique identifier to every statement and expression in a program so that we can reference it. For that purpose, we define the following syntactic nodes and unique identifier spaces.

is the set of statement nodes.

is the set of expression nodes.

is the set of left-expression nodes.

is the set of statement identifiers.

is the set of expression identifiers.

is the set of alphanumeric identifiers.
Note that we use an sid of a function declaration statement as a reference point for the function defined. The and functions take such an sid and return a list of parameter names, and the arity of the function, respectively.
Where we specifically refer to the identifier of a syntactic construct, we write to mean a statement or expression with id . Where such identifiers are not directly referenced, we omit them for simplicity.
Definition 11 (Semantic functions).
The analytic framework contains the following semantic functions:
and are semantic functions for statements, expressions and left expressions, respectively. is a function space to model the collection of functions in a program. Given the sid of a function declaration site, it gives a statement node for function declaration and a state transformer. Note that in this picture a function “returns” a value by giving a state transformation. Incorporating such a concept as a return value in a itself provides a greater flexibility in describing the effects of executing a statement or an expression at a particular program point.
We define the following auxiliary functions to describe the use of references to syntax nodes.
6 The language under study
We now define the model language, SDTL (Simple Duck-Typed Language).
6.1 The procedural core language
We start off with a procedural language with C-like syntax.
<con> ::= <Num> | <Bool>
<Lexp> ::= ID
<Exp> ::= <con> | <Lexp> | ‘input’ | <Lexp> ‘(’ [<Exp> [,<Exp>]*]? ‘)’ | <Exp> <binop> <Exp> | ‘(’ <Exp> ‘)’
<binop> ::= ‘+’ | ‘-’ | ‘*’ | ‘/’ | ‘>’ | ‘<’ | ‘==’
<Stm> ::= nil
    | <Stm> ‘;’ <Stm>
    | <Exp>
    | ‘output’ <Exp>
    | <Lexp> ‘=’ <Exp>
    | ‘if’ ‘(’ <Exp> ‘)’ ‘{’ <Stm> ‘}’
    | ‘if’ ‘(’ <Exp> ‘)’ ‘{’ <Stm> ‘}’ ‘else’ ‘{’ <Stm> ‘}’
    | ‘while’ ‘(’ <Exp> ‘)’ ‘{’ <Stm> ‘}’
    | ‘function’ Id ‘(’ [Id [, Id]*]? ‘)’ ‘{’ <Stm> ‘}’
    | ‘return’ <Exp>
Here Num and Bool are the syntactic categories for integers and boolean values.
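The grammar above maps directly onto an algebraic data type. In the following Haskell sketch every statement and expression node carries its sid or eid. It also shows how parameter names and arity can be recovered from the sid of a function declaration. All constructor and function names are hypothetical. Parenthesised expressions need no node of their own.

```haskell
type Id = String
type Sid = Int
type Eid = Int

data Con = CNum Integer | CBool Bool deriving (Eq, Show)
data BinOp = Add | Sub | Mul | Div | Gt | Lt | Equal deriving (Eq, Show)
newtype Lexp = LId Id deriving (Eq, Show)

data Exp
  = ECon Eid Con
  | ELexp Eid Lexp
  | EInput Eid
  | ECall Eid Lexp [Exp]
  | EBin Eid Exp BinOp Exp
  deriving (Eq, Show)

data Stm
  = Nil Sid
  | Seq Sid Stm Stm
  | SExp Sid Exp
  | Output Sid Exp
  | Assign Sid Lexp Exp
  | If Sid Exp Stm (Maybe Stm)      -- Nothing: no else branch
  | While Sid Exp Stm
  | FunDecl Sid Id [Id] Stm
  | Return Sid Exp
  deriving (Eq, Show)

-- All function declarations in a program, keyed by their sid.
decls :: Stm -> [(Sid, [Id])]
decls (Seq _ a b) = decls a ++ decls b
decls (If _ _ a mb) = decls a ++ maybe [] decls mb
decls (While _ _ b) = decls b
decls (FunDecl sid _ ps body) = (sid, ps) : decls body
decls _ = []

-- Parameter names and arity of the function declared at a given sid.
paramsOf :: Stm -> Sid -> Maybe [Id]
paramsOf prog sid = lookup sid (decls prog)

arityOf :: Stm -> Sid -> Maybe Int
arityOf prog sid = length <$> paramsOf prog sid
```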
SDTL does not have a separate syntactic category for function and variable declarations. Variables are declared ad hoc whenever they appear as the left expression of an assignment statement. Function declarations are statements themselves, which allows them to appear anywhere in the program.
SDTL supports higher-order functions, which allows a function to be invoked recursively by passing it as an argument to itself. For example, we can define a factorial function in a recursive manner.
Example 12 (Recursively defined factorial function).
In SDTL, the factorial function can be implemented in a recursive way.
In this example, the function takes two arguments. The first is the function pointer to recursively invoke, and the second is the usual argument to the function. This example illustrates that recursive functions are possible even in the absence of lexical scoping or other special scoping rules to allow a function body to refer to the function itself.
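The self-passing pattern can be reproduced in Haskell, where it needs a recursive newtype because Haskell is statically typed. SDTL, being dynamically typed, needs no such wrapper. The sketch below assumes the body `if (n > 1) return f(f, n - 1) * n; else return 1;`, which is consistent with the walkthrough in Example 13. The names `SelfFun`, `factBody`, `fact` and `applySelf` are hypothetical.

```haskell
-- A function that receives itself as its first argument.
newtype SelfFun = SelfFun (SelfFun -> Integer -> Integer)

-- The body never names 'fact': it recurses through its first argument.
factBody :: SelfFun -> Integer -> Integer
factBody (SelfFun f) n
  | n > 1 = f (SelfFun f) (n - 1) * n
  | otherwise = 1

fact :: SelfFun
fact = SelfFun factBody

-- Call a self-passing function, supplying it as its own first argument.
applySelf :: SelfFun -> Integer -> Integer
applySelf g@(SelfFun f) = f g
```

`applySelf fact 2` evaluates `fact(fact, 1) * 2 = 2`, matching the value traced in Example 13.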
Given the availability of higher-order functions, we formulate the meaning of a function as a fixed point (see the definition of ). The semantic functions for SDTL are defined in figure 6.1. We use auxiliary functions and to describe a function call. (Note that we use for the empty sequence, for the set of sequences of any number of values of type , and the notation to denote concatenation of sequences and .)
is a parametrised function that takes a caller’s state at the time of function invocation and an id-to-constant value mapping, and constructs an initial state for a callee. takes both the caller’s state and the callee’s resulting states, and constructs the caller’s states after the function call. These functions are parametrised so that each interpretation can define the exact shape of a program state and how it is manipulated during a function call and return. These functions have the following types:
The types of primitive operations are as follows.
6.1.1 Concrete interpretation
Domain
is a Cartesian product of an environment, an input/output state and a return value. The return-value component is set when a return statement is executed inside a function body. It is part of the program state so that we can signal an escape from the normal program flow. The initial program state is where is an initial IO state.
Functions
We omit a detailed description of the IO environment. Normally, IO can be modelled as a queue of inputs and outputs as they are given and produced during the execution of a program.
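The following Haskell sketch shows one possible concrete domain of this shape. It has an environment, an IO state modelled as a queue of pending inputs plus a list of produced outputs, and an optional return value. It also shows the escape test and one plausible reading of the call and return functions. Several choices are assumptions and are not taken from the thesis: the callee sees only its parameters (in keeping with the locally scoped core), the callee inherits the caller’s IO state, and on return the caller keeps its environment but adopts the callee’s IO state. All names are hypothetical.

```haskell
-- Concrete values; VFun holds the sid of a function declaration.
data Value = VInt Integer | VBool Bool | VFun Int
  deriving (Eq, Show)

type Env = [(String, Value)]

-- IO as a queue of pending inputs and a list of outputs produced so far.
data IOState = IOState { inputs :: [Value], outputs :: [Value] }
  deriving (Eq, Show)

data State = State { env :: Env, io :: IOState, retVal :: Maybe Value }
  deriving (Eq, Show)

initial :: [Value] -> State
initial ins = State [] (IOState ins []) Nothing

-- A state escapes once a return value has been recorded.
escaping :: State -> Bool
escaping = maybe False (const True) . retVal

-- 'input': dequeue the next input; no successor if the queue is empty.
readInput :: State -> [(State, Value)]
readInput s = case inputs (io s) of
  (v : rest) -> [(s { io = (io s) { inputs = rest } }, v)]
  [] -> []

-- 'output': append a value to the produced outputs.
writeOutput :: Value -> State -> State
writeOutput v s = s { io = (io s) { outputs = outputs (io s) ++ [v] } }

-- Call: the callee starts with only its parameters bound (an assumption).
callInit :: State -> [(String, Value)] -> State
callInit caller bindings = State bindings (io caller) Nothing

-- Return: keep the caller's environment, adopt the callee's IO state,
-- and hand back the returned value of each callee outcome.
callReturn :: State -> [State] -> [(State, Maybe Value)]
callReturn caller callees = [(caller { io = io c }, retVal c) | c <- callees]
```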
Example 13 (Concrete interpretation of recursive factorial function).
The program in example 12 is concretely interpreted as follows.

At line 1, updates environment to be assuming the function declaration has a unique id of 1.

At line 5, gives a user input. Assume that the input was 2. gives . In a function call , we first construct the initial state of a function call. gives .

At line 2, evaluating expression yields true. invokes another function call, with initial state .

On the second call fact(f,1), yields false. Hence, invokes which gives final state of .

On the first call, this state is first evaluated to yield value by . Then, f(f,1) * 2 evaluates to 2, which becomes the ultimate return value.

At line 5, after the function call gives as the final state. adds symbol to the environment:

At line 6, evaluates to 2, which is the final output of the program.
6.1.2 Abstract interpretation
At this stage, abstract interpretation looks largely similar to concrete interpretation. Notable differences are that we approximate each constant by its type, and that is a nondeterministic transformation where it collects effects of both branches at a branching point.
Domain
Composition of an abstract domain is similar to that of the concrete counterpart, except that it does not include an IO state. is an undetermined value, which is used to approximate unknown function calls at an initial stage. The initial program state is .
Functions
Function definitions are largely similar to those of the concrete interpretation.
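The key difference at a branching point can be sketched in a few lines of Haskell. Concretely, the boolean selects one branch. Abstractly, the condition is known only by its type, so both branches are taken and their successor states are collected. The names `condA`, `condC` and `assignA` are hypothetical. The abstract version ignores the condition entirely. A more precise analysis might instead flag a condition whose type is not boolean.

```haskell
import Data.List (nub)

data AType = IntT | BoolT deriving (Eq, Show)
type AEnv = [(String, AType)]

-- Abstract branching: run both branches and collect their successors.
condA :: AType -> (AEnv -> [AEnv]) -> (AEnv -> [AEnv]) -> AEnv -> [AEnv]
condA _ thenB elseB s = nub (thenB s ++ elseB s)

-- Concrete branching, for comparison: the boolean picks one branch.
condC :: Bool -> (s -> [s]) -> (s -> [s]) -> s -> [s]
condC b thenB elseB = if b then thenB else elseB

-- Abstract assignment of a type to a variable.
assignA :: String -> AType -> AEnv -> [AEnv]
assignA x t s = [(x, t) : filter ((/= x) . fst) s]
```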
Example 14 (Abstract interpretation of recursive factorial function).
The program in example 12 is abstractly interpreted as follows.

At line 1, updates environment to be assuming the function declaration has a unique id of 1.

At line 5, gives an abstract value . gives . In a function call , we first construct the initial state of the function call. gives

The meaning of this function call is determined via a fixed point iteration by progressively updating the current approximation of the meaning of the function call, starting from a null hypothesis that the function call does not return any state.
Current approximation | Meaning of function call | Note

Fixed point 
Implementation of this fixed point iteration is found in the function in appendix A.4.

This yields

After , and , we have the final state of the program:
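The iteration in this example is a Kleene fixed point computation. The following Haskell sketch replays it in a deliberately simplified form. The hypothesis records only the set of result types of the recursive call, not full states. It assumes the body `if (n > 1) return f(f, n - 1) * n; else return 1;`. Starting from the null hypothesis (no results), one unfolding yields `{IntT}`, and a second unfolding confirms the fixed point. The names `lfp`, `norm` and `factStep` are hypothetical.

```haskell
import Data.List (nub, sort)

data AType = IntT | BoolT deriving (Eq, Ord, Show)

-- Normalise a list used as a finite set.
norm :: Ord a => [a] -> [a]
norm = nub . sort

-- Kleene iteration: apply 'step' until the approximation stops changing.
lfp :: Eq a => (a -> a) -> a -> a
lfp step x = let x' = step x in if x' == x then x else lfp step x'

-- One unfolding of the abstract factorial body, given hypothesis 'h'
-- for the result types of the recursive call. Both branches are taken
-- because the condition is only known to be a boolean.
factStep :: [AType] -> [AType]
factStep h = norm (thenBranch ++ elseBranch)
  where
    thenBranch = [IntT | t <- h, t == IntT]   -- IntT * IntT gives IntT
    elseBranch = [IntT]                       -- return 1
```

Termination follows from the lattice being finite and `factStep` being monotone.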
Example 15 (Abstract interpretation of a while loop).
The following example illustrates an interpretation of a while loop through a fixed point iteration.
The program calculates the sum of a sequence. For the purpose of illustration, we have added a variable that changes its type inside the while loop.

At line 4, we have environment .

As an initial hypothesis, we assume that the statement body of a while loop does not cause any change in the program state for any given initial state. We progressively update this approximation until we meet a fixed point.
Current approximation | Initial state | Final state

Implementation of this fixed point iteration can be found in the function in appendix A.4.

The resulting final states of the program are calculated to be
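The following Haskell sketch computes a similar loop fixed point with a variant of the iteration described above. It collects the set of abstract states reaching the loop head, starting from the empty set, whereas the thesis starts from the hypothesis that the body changes nothing. The variables `i`, `s` and `x` are hypothetical stand-ins for the example’s counter, running sum and type-changing variable. Because the loop condition is abstract, every loop-head state may also exit, so the loop-head states are also the exit states.

```haskell
import Data.List (nub, sort)

data AType = IntT | BoolT deriving (Eq, Ord, Show)
type AEnv = [(String, AType)]   -- keys kept in a fixed order

norm :: Ord a => [a] -> [a]
norm = nub . sort

lfp :: Eq a => (a -> a) -> a -> a
lfp step x = let x' = step x in if x' == x then x else lfp step x'

-- Overwrite the type of an existing variable, preserving key order.
set :: String -> AType -> AEnv -> AEnv
set k t = map (\(k', t') -> if k' == k then (k', t) else (k', t'))

-- Abstract loop body: counter and sum stay integers, 'x' becomes boolean.
body :: AEnv -> AEnv
body = set "x" BoolT . set "s" IntT . set "i" IntT

-- States reaching the loop head: the entry states, plus one more run of
-- the body from any state already known to reach the head.
loopHead :: [AEnv] -> [AEnv]
loopHead entry = lfp (\known -> norm (entry ++ map body known)) []
```

Starting from an entry state in which all three variables are integers, the iteration stabilises after two rounds. It yields two states, one with `x` still an integer and one with `x` a boolean.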
6.2 Function currying
We now introduce function currying to the SDTL language. Introduction of this language feature allows the language to be flexible enough to express what JavaScript programmers would do with lexical scoping.
Example 16 (Function currying).
We take a simple add function, and curry one argument to produce different adders.
If we were to write this in JavaScript, we could have written the following for the same effect.
The introduction of function currying does not change the syntax of the language. Therefore, there is no inherent reason for changing semantic functions. However, we do redefine the semantics of function calls to include an eid as an input, for the reason explained below.
Here we introduce the function. It invokes the function when the function arguments are saturated, or it returns a pointer to a curried function otherwise. Its type is as follows.
6.2.1 Concrete interpretation
We need to extend the definition of to hold curried parameters. This means that the function also needs to be modified to match the new type signature.
Domain
The initial program state is unchanged.
Functions
Here, is the length of a sequence .
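The saturation check can be sketched in Haskell. A function value carries the sid of its declaration together with the arguments curried so far. Applying it either invokes the body, when the total number of arguments equals the arity, or returns a new function value with the extra arguments appended. The names `Value`, `arity`, `invoke` and `apply` are hypothetical. Sid 1 is assumed to declare `add(a, b)`, echoing the adders of Example 16. Over-application is simply rejected here.

```haskell
-- Function values: sid of the declaration plus curried arguments.
data Value = VInt Integer | VFun Int [Value] deriving (Eq, Show)

-- Arity of each declared function (sid 1 is assumed to be add(a, b)).
arity :: Int -> Int
arity 1 = 2
arity _ = 0

-- Bodies of declared functions, applied to saturated argument lists.
invoke :: Int -> [Value] -> Value
invoke 1 [VInt a, VInt b] = VInt (a + b)
invoke sid args = error ("bad call: " ++ show (sid, args))

-- Invoke when saturated, otherwise curry the new arguments.
apply :: Value -> [Value] -> Value
apply (VFun sid curried) args =
  case compare (length allArgs) (arity sid) of
    LT -> VFun sid allArgs             -- not saturated: curry
    EQ -> invoke sid allArgs           -- saturated: call the body
    GT -> error "too many arguments"   -- not handled in this sketch
  where
    allArgs = curried ++ args
apply v _ = error ("not a function: " ++ show v)
```

For example, `apply (VFun 1 []) [VInt 1]` yields the adder `VFun 1 [VInt 1]`, and applying that adder to `[VInt 41]` yields `VInt 42`.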
Example 17 (Concrete interpretation of a function currying).
Consider the program shown in example 16.

At line 1, we are presented with a function declaration. adds the identifier as a reference to a function pointer with no curried value. Assuming has been given a unique id of during parsing, we have in environment .

At lines 5 and 7, we partially apply the function. Since the number of arguments is not saturated, gives for . Similarly, gets .

At line 8, we saturate the parameters, giving , where is an arbitrary number given as user input. Hence the addition is performed. It works similarly for .
6.2.2 Abstract interpretation
Note that curried functions introduce a possibility of creating closures requiring an infinite number of arguments.
Example 18 (Currying loop).
Consider the following program.
If we naively interpret this program, we would not be able to reach a fixed point in analysis. Instead, we would have:
A solution to this problem is to anchor a curried function to a particular language construct. In this case, we can use the eid of the currying expression as a point of reference (or 0 if the function is not curried).
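The following Haskell sketch contrasts the two representations on a hypothetical loop whose body repeatedly curries one more argument onto `g`. The body is written `g = g(1)` at an invented eid 7, because the program of Example 18 is not reproduced here, and saturation is ignored. A naive abstract closure remembers every curried argument, so the set of values reaching the loop head grows forever and the iteration never stabilises. An anchored closure is identified only by (sid, eid), so the iteration converges. A fuller analysis would also keep a joined summary of the curried argument types per anchor; this sketch drops them.

```haskell
import Data.List (nub, sort)

data AType = IntT deriving (Eq, Ord, Show)

-- Naive abstract closure: sid plus the types of all curried arguments.
data Naive = Naive Int [AType] deriving (Eq, Ord, Show)

-- Anchored abstract closure: sid plus the eid of the currying expression
-- (0 if not curried).
data Anchored = Anchored Int Int deriving (Eq, Ord, Show)

norm :: Ord a => [a] -> [a]
norm = nub . sort

-- Iterate to a fixed point, giving up after 'fuel' rounds.
lfpWithin :: Eq a => Int -> (a -> a) -> a -> Maybe a
lfpWithin fuel step x
  | fuel <= 0 = Nothing
  | x' == x = Just x
  | otherwise = lfpWithin (fuel - 1) step x'
  where
    x' = step x

-- Values of g at the loop head: the initial closure of sid 1, plus the
-- result of one more g(1) from any value already known.
stepNaive :: [Naive] -> [Naive]
stepNaive known =
  norm (Naive 1 [] : [Naive sid (IntT : args) | Naive sid args <- known])

stepAnchored :: [Anchored] -> [Anchored]
stepAnchored known =
  norm (Anchored 1 0 : [Anchored sid 7 | Anchored sid _ <- known])
```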