Combining de Bruijn indices and higher-order abstract syntax in Coq

Sorry for the long delay in posting any updates.  This last week, I was invited to give a talk in Sussex on name binding.  In my talk, I was discussing some of the problems with (“strong”) HOAS, specifically negative occurences in the types of constructors, and the problems with using these in Coq.  An audience member asked where he could read more about this problem, and ways around it.  As I’ve already posted the two papers I suggested, I’ll add a third:

By Venanzio Capretta and Amy P. Felty, from TYPES 2006, available from Amy P. Felty’s website:

The use of higher-order abstract syntax is an important approach for the representation of binding constructs in encodings of languages and logics in a logical framework. Formal meta-reasoning about such object languages is a particular challenge. We present a mechanism for such reasoning, formalized in Coq, inspired by the Hybrid tool in Isabelle. At the base level, we define a de Bruijn representation of terms with basic operations and a reasoning framework. At a higher level, we can represent languages and reason about them using higher-order syntax. We take advantage of Coq’s constructive logic by formulating many definitions as Coq programs. We illustrate the method on two examples: the untyped lambda calculus and quantified propositional logic. For each language, we can define recursion and induction principles that work directly on the higher-order syntax.

Coq source files are available here.

FoolProof: a component toolkit for abstract syntax with variable bindings

By Kees Hemerik, available from the Technical University of Eindhoven’s website:

FOOLPROOF is intended as a component toolkit for implementation of formal languages with binding structures. It provides a coherent collection of components for many common language processing tasks, in particular those related to binding structures. FOOLPROOF consists of: a meta-language for specifying signatures with variable bindings; a signature editor for constructing well-formed signatures; a small collection of interfaces for manipulating syntax trees and binding structures at various levels of detail; a set of generic components for processing syntax trees with binding structures, in particular for: copying, substitution, editing, matching, unification and rewriting; a generator which maps signature specifications to signature-specific classes. FOOLPROOF is being implemented in Object Pascal and will eventually take the form of a component library for the Delphi environment.

Nominal rewriting systems

Continuing the rewriting theme, here’s how to rewrite with binders using nominal techniques…

By Maribel Fernandez, Murdoch J. Gabbay and Ian Mackie, from PPDP 2004, available from Murdoch J. Gabbay’s website:

In this paper we present a generalization of first-order rewriting that allows us to deal with terms involving binding operations in an elegant and practical way. We use a nominal approach to binding, in which bound entities are explicitly named (rather than using a nameless syntax such as de Bruijn indices), yet we get a rewriting formalism which respects alpha-conversion and has good properties. This is achieved by adapting the powerful techniques for programming with binders developed by Pitts et al. in the FreshML project to the rewriting framework We extend the syntax of first-order terms with simple constructs to represent binding operations. Terms involving binders can be matched against terms naming bound variables explicitly. Nominal rewriting can be seen as higher-order rewriting with first-order syntax. We show that standard, first-order, rewriting is a particular case of nominal rewriting, and we also show that very expressive higher-order systems such as Klop’s Combinatory Reduction Systems can be easily defined as nominal rewriting. Finally we study confluence and termination properties of nominal rewriting. Keywords: variable binding, alpha-conversion, first and higher-order rewriting.

Term rewriting with variable binders: an initial algebra approach

By Makoto Hamana, from PPDP 2003, available from Makoto Haman’s website:

We present an extension of first-order term rewriting systems, which involves variable binding in the term language. We develop the systems called binding term rewriting systems (BTRSs) in a stepwise manner; firstly we present the term language, then formulate equational logic, and finally define rewrite systems by two styles: rewrite logic and pattern matching styles. The novelty of this development is that we follow an initial algebra approach in an extended notion of Σ-algebras in various functor categories. These are based on Fiore-Plotkin Turi’s presheaf semantics of variable binding and Lüth-Ghani’s monadic semantics of term rewriting systems. We characterise the terms, equational logic and rewrite systems for BTRSs are initial algebras in suitable categories. Finally, we discuss our design choice of BTRSs from a semantic perspective.

Abstracting syntax

By Brian Aydemir, Stephan A. Zdancewic and Stephanie Weirich, University of Pennsylvania Technical Report, available from the UPenn Tech Report website:

Binding is a fundamental part of language specification, yet it is both difficult and tedious to get right. In previous work, we argued that an approach based on locally nameless representation and a particular style for defining inductive relations can provide a portable, transparent, lightweight methodology to define the semantics of binding. Although the binding infrastructure required by this approach is straightforward to develop, it leads to duplicated effort and code as the number of binding forms in a language increases.

In this paper, we critically compare a spectrum of approaches that attempt to ameliorate this tedium by unifying the treatment of variables and binding. In particular, we compare our original methodology with two alternative ideas: First, we define variable binding in the object language via variable binding in a reusable library. Second, we present a novel approach that collapses the syntactic categories of the object language together, permitting variables to be shared between them.

Our main contribution is a careful characterization of the benefits and drawbacks of each approach. In particular, we use multiple solutions to the POPLMARK challenge in the Coq proof assistant to point out specic consequences with respect to the size of the binding infrastructure, transparency of the definitions, impact to the metatheory of the object language, and adequacy of the object language encoding.

Boxes go bananas: encoding higher-order abstract syntax with parametric polymorphism

By Geoffrey Washburn and Stephanie Weirich, from ICFP 2003, available from Stephanie Weirich’s website:

Higher-order abstract syntax is a simple technique for implementing languages with functional programming. Object variables and binders are implemented by variables and binders in the host language. By using this technique, one can avoid implementing common and tricky routines dealing with variables, such as capture avoiding substitution. However, despite the advantages this technique provides, it is not commonly used because it is difficult to write sound elimination forms (such as folds or catamorphisms) for higher-order abstract syntax. To fold over such a data type, one must either simultaneously define an inverse operation (which may not exist) or show that all functions embedded in the data type are parametric. In this paper, we show how first-class polymorphism can be used to guarantee the parametricity of functions embedded in higher-order abstract syntax. With this restriction, we implement a library of iteration operators over data structures containing functionals. From this implementation, we derive “fusion laws” that functional programmers may use to reason about the iteration operator. Finally, we show how this use of parametric polymorphism corresponds to the Schürmann, Despeyroux and Pfenning method of enforcing parametricity through modal types. We do so by using this library to give a sound and complete encoding of their calculus into System F. This encoding can serve as a starting point for reasoning about higher-order structures in polymorphic languages.

Contact details

To contact the site admin with questions or for paper submissions, please e-mail namebinding@gmail.com.  I’ve added a new link to the top of the page, with this information on, too.

Abstract syntax and variable binding

By Marcelo P. Fiore, Gordon Plotkin and Daniele Turi, from LICS 1999, available from the Edinburgh Research Archive website:

We develop a theory of abstract syntax with variable binding. To every binding signature we associate a category of models consisting of variable sets endowed with compatible algebra and substitution structures. The syntax generated by the signature is the initial model. This gives a notion of initial algebra semantics encompassing the traditional one; besides compositionality, it automatically verifies the semantic substitution lemma.

Nominal abstract syntax vs. higher-order abstract syntax

This blog post is a little different from the rest, as it considers two articles from the SIGACT Logic Column.  The first, Logic Column 14, written by Cheney, begins with a critique of higher-order abstract syntax (HOAS), and ends by arguing in favour of nominal abstract syntax, and nominal logic.  Crary and Harper respond in Logic Column 16, with a rebuttal of Cheney’s claims about HOAS, specifically the implementation of HOAS as seen in LF.

Higher-order abstract syntax uses an enriched meta-language, usually the simply typed lambda-calculus, in order to encode languages at the object level.  Name binders at the object level are implemented by binders at the meta-level, using the meta-level lambda.  For instance, a statement like forall x:Nat. p(x) could be encoded in HOAS by the term FORALL (lambda x. P x).  Here FORALL has type (Nat -> Prop) -> Prop, and P has type Nat -> Prop, with lambda being the meta-level binder.  This idea is very old, dating back to Church’s work on simple type theory and his encoding of the universal and existential quantifiers, but HOAS as we know it now was popularised by Pfenning and Elliott (see this prior blog post).

Higher-order abstract syntax has been highly successful: lambdaProlog, Twelf, Delphin, Beluga, LF etc. all use HOAS, and, in effect, at the time Cheney was writing, HOAS was still the “state-of-the-art”, and probably the most widely implemented solution to the “binding problem”. Nominal techniques, and nominal logic in particular, were still new on the scene.

Cheney identifies five areas where he believes HOAS is lacking.  His first and fourth critiques can be lumped together, under the heading “simplicity”:

Higher-order abstract syntax is based on complex semantic and algorithmic foundations (higher-order logic [17], recursive domain equations [77], higher-order unification [46]) so requires a fair amount of ingenuity to learn, implement and analyze [27, 43, 74].

Higher-order language encodings are often so different from their informal “paper” presentations that proving “adequacy” (that is, equivalence of the encoding and the real language) is nontrivial, and elegant-looking encodings can be incorrect for subtle reasons. Hannan [39] developed and proved partial correctness of a closure conversion translation in LF, but did not prove adequacy of the encoding; careful inspection suggests that it is not adequate. Abel [2] investigated an elegant and natural-seeming but inadequate third-order and less elegant but adequate second-order HOAS encoding of the λμ-calculus.

Both of these criticisms are interesting, as they feed directly into what many view as nominal techniques’ greatest advantages: a simple, set-based semantics, where names exist in the denotation independently, with tractable algorithmic content, and straightforward, intuitive encodings, with reasoning close to informal practice.

However, Crary and Harper disagree with the first claim:

All the metatheory of LF can be (and typically is) carried out in first-order logic; higher- order logic is not necessary.

The metatheory of LF is carried out using entirely syntactic means. There is no need to resort to any denotational semantics, including recursive domain equations.

While higher-order unification is used in the Twelf implementation [3] of LF to support type inference and proof search, it is not at all relevant to the methodology of encoding abstract syntax in LF. Moreover, even in regards to the Twelf implementation, experience suggests that understanding the higher-order unification algorithm is not essential; students who do are not familiar with it nevertheless have little difficulty in using Twelf.

And also with the fourth:

A fundamental part of the LF methodology is the notion of an adequate encoding. Syntactic terms and derivations of judgements are represented in LF as certain canonical forms, and the encoding is said to be adequate if there exists an isomorphism (in a sense that can be made precise) between the terms/derivations on the object language and the canonical forms that represent them. This provides a clear criterion for whether an encoding is correct or not.

The column notes that inadequate encodings have appeared in papers, and takes that fact as evidence that it is too hard to develop adequate encodings. To the contrary, we see the existence of incorrect encodings as merely the inevitable consequence of the existence of a correctness criterion.

To the best of our knowledge, nominal logic does not currently enjoy a precise criterion for correctness. Consequently, it is hard formally to justify the statement that anything at all has been encoded in nominal logic. When nominal logic is better understood, we expect that a correctness criterion will be developed for it as well. Once that happens, we expect that that criterion too will be used to identify incorrect encodings.

Cheney’s third and fifth critiques can also be lumped together, under the heading “expressivity”:

Variable names are “second class citizens”; they only represent other object-language expressions and have no existence or meaning outside of their scope. This complicates formalizing languages with generativity (for example, datatype names in ML), program logics with mutable variables such as Hoare logic [51] or dynamic logic [44] and translations such as closure conversion that rely on the ability to test for name-equality.

Higher-order abstract syntax is less expressive than first-order abstract syntax: it apparently cannot deal with situations involving “open terms” mentioning an indefinite number of free variables. For example, HOAS apparently cannot model the behavior of ML-style let-bound polymorphism as usually implemented [41], though a simulation is possible [40].

Again, the standard claim is that nominal techniques handle these features well.  Bound names in nominal encodings can be easily tested for (in)equality.  Indeed, a “standard example” (and by this, I mean an example used more than once!) in the nominal literature is an encoding of alpha-inequivalence, where bound names must be tested by inequality, precisely because it is thought hard to do with HOAS.

However, Crary and Harper disagree:

Moreover, taking alpha-equivalence as built-in does not preclude the isolation of variables in LF (which can be done easily using a hypothetical judgement). Consequently, it is not difficult to encode logics that deal with variables specially or that compare free variables for equality.

The column claims that HOAS cannot address ML-style let-polymorphism due to difficulties with open terms. This is not so. There are at least two perfectly satisfactory formulations of let-polymorphism in LF.

Finally, Cheney argues that using HOAS encodings pollutes the encoded object language with features inherited from the meta-language, whether desirable or not.  For instance, if the meta-language has a weakening property, then this is inherited in the object-language, and thus makes encoding substructural logics, like linear logic, inconvenient:

Properties of the metalanguage (such as weakening and substitution lemmas) are inherited by object languages, whether or not this is desirable; this necessitates modifications to handle logics and languages with unusual behavior. Examples include linear logic [10] and concurrency [86].

Yet, again, however, Crary and Harper disagree:

The column claims that structural properties of the meta-language must be inherited by the object language. This is simply untrue. For example, linear logic can be adequately represented in ordinary LF using a linearity judgement; one need not resort to a linear variant of LF.

A counter rebuttal to Crary and Harper wasn’t provided.  However, both techniques now quite happily live side-by-side.  Nominal implementations, such as Nominal Isabelle, are now extremely popular, and connections between the two approaches are being realised (see the work on higher-order patterns and nominal unification, as well as Gabbay and Hofmann’s nominal renaming sets).

Five axioms of alpha-conversion

By Andrew D. Gordon and Tom Melham, from TPHOLs 1996, available from Andrew D. Gordon’s website:

We present five axioms of name-carrying lambda-terms identified up to alpha-conversion—that is, up to renaming of bound variables. We assume constructors for constants, variables, application and lambda abstraction. Other constants represent a function Fv that returns the set of free variables in a term and a function that substitutes a term for a variable free in another term. Our axioms are (1) equations relating Fv and each constructor, (2) equations relating substitution and each constructor, (3) alpha-conversion itself, (4) unique existence of functions on lambda-terms defined by structural iteration, and (5) construction of lambda abstractions given certain functions from variables to terms. By building a model from de Bruijn’s nameless lambda-terms, we show that our five axioms are a conservative extension of HOL. Theorems provable from the axioms include distinctness, injectivity and an exhaustion principle for the constructors, principles of structural induction and primitive recursion on lambda-terms, Hindley and Seldin’s substitution lemmas and the existence of their length function. These theorems and the model have been mechanically checked in the Cambridge HOL system.

Next Page »