Yet another S-expression crate?

I have been looking for Rust crates that can parse and serialize Lisp S-expressions recently, for use in a project that should be able to act as backend to an Emacs UI. I may go into details about that project at another time, but that topic is of no concern for this post.

I'm thinking about writing a new crate, since the existing crates in this area are (at least to me) unsatisfying. I'll start with a checklist of features I'd want from an S-expression crate, then go into the reasons why the existing crates fall short with respect to these goals, while also presenting significant features I've discovered while skimming their documentation. At the end, I'll propose a new crate, based on code not published on crates.io, which I stumbled upon while searching for an S-expression crate with Serde integration.

Check list

  • A data type that can represent a large subset of the union of the S-expression formats in common use, covering:

    • Keywords
    • Symbols
    • Strings
    • Integers (excluding bignums, at least initially)
    • Floats
    • Proper lists
    • Improper lists (excluding circular lists)

    The goal is to also cover the notational differences between the S-expression formats, for example allowing for both Guile/Racket keyword notation (#:foo) and Emacs Lisp notation (:foo) by providing parser options.

  • A Rust macro that can be used to embed S-expressions into Rust code in a natural way, staying as close to "traditional" S-expression representation as possible within the syntactic constraints imposed by Rust's macro system.

  • A parser that can read arbitrary S-expressions from text and a serializer that can produce text from the S-expression data type. Pretty-printing and customizing the format for various S-expression "dialects" would be nice.

  • Support for Serde serialization and deserialization. Due to the partial misalignment between Serde's data model and the S-expression data model, this will probably come with syntactic restrictions, but should still allow serializing and deserializing arbitrary Rust data types to and from S-expression syntax.
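
To make the first and third items of the checklist more concrete, here is a minimal sketch of what such a data type and a dialect-aware printer could look like. All names (`Value`, `KeywordSyntax`, `to_text`) are made up for illustration and are not taken from any of the crates discussed in this post; the sketch assumes a cons-cell representation so that improper lists fall out naturally.

```rust
/// Hypothetical S-expression value type (illustrative names only).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Nil,
    Symbol(String),
    Keyword(String),
    String(String),
    Int(i64),
    Float(f64),
    /// A pair. Proper lists are chains of pairs ending in `Nil`;
    /// improper lists end in any other value.
    Cons(Box<Value>, Box<Value>),
}

/// Printer option: which keyword notation to emit.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KeywordSyntax {
    /// Emacs Lisp style, `:foo`
    ColonPrefix,
    /// Guile/Racket style, `#:foo`
    HashColon,
}

impl Value {
    /// Build a proper list from the given items.
    pub fn list(items: Vec<Value>) -> Value {
        Value::improper_list(items, Value::Nil)
    }

    /// Build a list whose final cdr is `tail` instead of `Nil`.
    pub fn improper_list(items: Vec<Value>, tail: Value) -> Value {
        items
            .into_iter()
            .rev()
            .fold(tail, |rest, item| Value::Cons(Box::new(item), Box::new(rest)))
    }

    /// Render the value as text, using the given keyword notation.
    pub fn to_text(&self, kw: KeywordSyntax) -> String {
        let mut out = String::new();
        self.write(&mut out, kw);
        out
    }

    fn write(&self, out: &mut String, kw: KeywordSyntax) {
        match self {
            Value::Nil => out.push_str("()"),
            Value::Symbol(name) => out.push_str(name),
            Value::Keyword(name) => {
                out.push_str(match kw {
                    KeywordSyntax::ColonPrefix => ":",
                    KeywordSyntax::HashColon => "#:",
                });
                out.push_str(name);
            }
            Value::String(s) => {
                out.push('"');
                for c in s.chars() {
                    // Only quotes and backslashes are escaped in this sketch.
                    if c == '"' || c == '\\' {
                        out.push('\\');
                    }
                    out.push(c);
                }
                out.push('"');
            }
            Value::Int(n) => out.push_str(&n.to_string()),
            Value::Float(x) => out.push_str(&format!("{:?}", x)),
            Value::Cons(..) => {
                out.push('(');
                let mut cur = self;
                let mut first = true;
                loop {
                    match cur {
                        Value::Cons(car, cdr) => {
                            if !first {
                                out.push(' ');
                            }
                            car.write(out, kw);
                            first = false;
                            cur = &**cdr;
                        }
                        Value::Nil => break,
                        // A non-nil tail means the list is improper.
                        tail => {
                            out.push_str(" . ");
                            tail.write(out, kw);
                            break;
                        }
                    }
                }
                out.push(')');
            }
        }
    }
}
```

With this sketch, a list of the symbol `a` and the keyword `k` with tail `1` would print as `(a #:k . 1)` in the Guile/Racket notation and as `(a :k . 1)` in the Emacs Lisp notation. A real crate would also need symbol escaping, dialect-specific float formatting, and a matching set of parser options.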

Survey of other crates

The following S-expression crates could be found on crates.io at the time of writing:

  • sexp, last updated in 2016. Seems to have a sensible data structure, but lacks:

    • Symbols
    • Keywords
    • Improper lists
    • 64-bit unsigned values
    • A macro for embedding S-expressions in Rust
    • Serde support
  • atoms, forked from sexp, last updated in 2017. Points of critique:

    • It lacks keywords.
    • It introduces a distinction between "code" and "data" that is not present in the S-expression surface syntax. Quote (') and quasiquote (`) are just syntactic sugar in a Lisp parser.
    • The macro provided for embedding S-expressions feels heavyweight and not close to regular S-expression syntax.
    • No Serde support.

    An interesting feature provided by the atoms API is a customizable representation of symbols. It might be worthwhile to take this further and allow reference types to represent the contents of symbols, keywords and strings. This would make it possible to avoid copying strings, and might make "interning" of symbols possible.

    The API for constructing S-expression values seems worth copying.

  • symbolic_expressions, last updated in 2017.

    • Has a very limited data type, just strings, lists and "empty".
    • No embedding macro.
    • No Serde support.
  • asexp, last updated in 2016, lacks:

    • support for keywords and symbols,
    • an embedding macro,
    • and Serde support.

    This crate actually has reverse dependencies, so providing a similar API (or a compatibility layer) may make sense.

  • ess, last updated in 2018, lacks:

    • keyword support,
    • an embedding macro,
    • and Serde support.

    This crate embeds location information in the S-expression data type. This is an interesting feature, but not one I aim for.

  • sexpr, last updated 2016. This crate is almost empty, and has no documentation whatsoever.

The sexpr crate not found on crates.io

In addition to the crates listed above, there is one codebase -- not the one with the same name found on crates.io -- that implements Serde support: sexpr. According to its documentation, it sounds like exactly what I was looking for. However:

  • The sexpr code comes with a lot of documentation, both in the README and API docs, but much of the example code is not working, as the corresponding implementation is incomplete or missing.

  • The sexpr code is centered around a Serde serializer and deserializer implementation. Serde's data model and S-expressions are unfortunately misaligned in several respects, and hence the Serde mechanism cannot be used as a complete S-expression parser or serializer.

    To be more specific, it cannot turn the text format as understood by various Lisp implementations into a "dynamically typed" (think Rust enums) S-expression data type, as the Serde data model has no direct mapping for improper lists, symbols, or keywords. In the other direction, there is no direct mapping of these "problematic" S-expression values to text via the Serde traits.

    However, when using static types (which Serde is primarily designed for), these mismatches can be dealt with in a "natural way", e.g. a Rust map type can be turned into a Lisp "alist", and vice versa.

    My idea for approaching Serde support is the following:

    • Provide a full-featured parser and formatter for S-expressions in the base crate, without Serde integration.

    • Implement Serde support (possibly in a companion crate), making use of parts of the full-featured parser and serializer.

  • The embedding macro is a macro_rules! one-liner, which turns the macro argument into a string and runs it through the Serde-based parser. This could be made more efficient, and better at error reporting, by implementing a procedural macro that operates directly on the token stream passed to it.
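
To illustrate the difference between going through a string and working on the tokens themselves, here is a small sketch of a macro that builds values directly from its token tree. Note that this is a declarative macro_rules! macro, not the procedural macro suggested above, and that the `Value` type and the `sexp!` name are assumptions made up for this example, not the API of sexpr or any other crate.

```rust
/// Hypothetical, cut-down value type for this example only.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Nil,
    Symbol(String),
    Str(String),
    Int(i64),
    Cons(Box<Value>, Box<Value>),
}

impl Value {
    /// Build a proper list; an empty vector yields `Nil`.
    pub fn list(items: Vec<Value>) -> Value {
        items
            .into_iter()
            .rev()
            .fold(Value::Nil, |rest, item| Value::Cons(Box::new(item), Box::new(rest)))
    }
}

// Conversions used by the literal arm of the macro below.
impl From<i64> for Value {
    fn from(n: i64) -> Self {
        Value::Int(n)
    }
}

impl From<&str> for Value {
    fn from(s: &str) -> Self {
        Value::Str(s.to_string())
    }
}

/// Builds a `Value` from tokens, without a string round-trip.
macro_rules! sexp {
    // A parenthesized group becomes a proper list of its elements.
    (($($item:tt)*)) => {
        Value::list(vec![$(sexp!($item)),*])
    };
    // A bare identifier becomes a symbol.
    ($sym:ident) => {
        Value::Symbol(stringify!($sym).to_string())
    };
    // Integer and string literals go through the `From` impls above.
    ($lit:literal) => {
        Value::from($lit)
    };
}
```

With these definitions, `sexp!((define x 42))` would expand to a three-element list of two symbols and an integer, and mistakes such as unbalanced parentheses are reported by the Rust compiler at the call site. The sketch also shows the syntactic constraints mentioned in the checklist: symbols containing hyphens, keywords, negative numbers and dotted pairs do not arrive as single tokens, so supporting them needs additional macro arms or a procedural macro.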