Functional Data Structures
<no subtitle>
# Why "Functional Data Structures"?
Functional programming (FP) can't solve complex problems efficiently without the underlying purely functional data structures (PFDs). With the right data structures, efficient solutions are possible in most scenarios. Functional data structures also allow concurrent usage, without having to introduce mutexes or needing to worry about concurrent mutation. They are a subset of lock-free data structures, though usually not called that, because locks and mutexes are relevant when using mutation, and in functional programming one tries to avoid that as much as feasible.
# Issues
Even though functional data structures are essential for implementing efficient solutions to problems using functional programming, it is often not easy to libraries for a given programming language that implement these data structures. When lacking such a library, it is often difficult to find any practically useful information, on how to implement them.
Often implementing such data structures requires reading academic papers or university lecture scripts, which are meant to go with explanation by a lecturer.
When finding code of already implemented data structures, often there are no comments and no explanations. The picture is radically different from naive imperative data structures for which one can find pseudocode on Wikipedia.
Looking at the academic side of things, I get the impression, that there are many algorithm and data structures lectures for CS students, but those lectures rarely contain anything about PFDs or FP. Usually, they only teach about the standard algorithms. A-star here, Dijkstra there, some graphs and tree algorithms, B++ trees or similar, but nothing about how one would implement such data structures in a purely functional way. I have studied at one of the highest prestige institutes in Germany and have never even heard about PFDs during my studies [1]. I also cannot find many lecture notes of other universities online, that hint at there being lectures about FP including, PFDs.
For example when I searched for information on purely functional AVL trees, I found lecture notes of a lecture at ETH Zürich, but not much else. The Wikipedia page about AVL trees does not mention "functional" or "persistent" at all. Neither does the Wikipedia page about self-balancing binary trees, of which the AVL tree is one. The page about red-black trees does mention "persistent data structures":
Red–black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets that can retain previous versions after mutations. -- https://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Applications_and_related_data_structures
# Situation in GNU Guile
Fortunately, there is a library for GNU Guile, which already has several purely functional data structures (PFDS, not to be confused with the file format PDF.) ready for use: guile-pfds by Ian J. Price. Bless him for doing all that work. This library has helped me many times:
- In my blog post about memoization in pure functions I describe how I use guile-pfds' balanced binary tree implementation as a basis for a functional lookup table implementation, which I then used in a purely functional implementation of a function that calculates the string edit distance or Levenshtein distance of 2 strings.
- I used guile-pfds times in my solutions for Advent of Code puzzles. It is the one most used library I keep using in such puzzle solutions, to stick to the functional programming paradigm. Without those data structures, my solutions might be inefficient, sometimes up to the point of becoming infeasible to calculate.
There are some other projects in the Guile ecosystem implementing one or the other data structure, but nothing as impactful as guile-pfds.
Unfortunately, the development of guile-pfds has stopped and without knowing a lot about functional data structures, it is very difficult to understand the inner workings of the implementations of the data structures in guile-pfds. What's more is, that the author Ian J. Price cannot be reached or remains unresponsive for whatever reason. The data structures don't rot or go bad, but the knowledge about their implementation needs to be rediscovered, in order to progress, or a new project needs to be set up, to further the GNU Guile ecosystem.
# My Efforts
Since I need PFDs in some projects in which I employ FP, and the data structures in guile-pfds are not as complete, as one might wish, and one day I want to be able to add useful data structures to the GNU Guile ecosystem, I started my own repository, in which I try to implement more data structures, or simply reimplement the ones I learn about in a very readable way. I want to understand how they work in detail and I want others to be able to understand my code, so that they are able to contribute or make their own.
The repository is at: guile-data-structures
A note at this point: My repository has way fewer things than Ian J. Price's guile-pfds repository and I am still learning about these data structures. I am not yet a Chris Okasaki, able to figure out on my own how one could design a PFD and write a book about them. I am still wondering how to even get the ideas, for how a traditional mutating data structure can be translated into a purely functional one. Hopefully, over time I will be able to understand it better.
# Weblinks
- If you think that functional data structures are merely interesting for users of mainly functional languages, or think that functional data structures cannot be efficient enough, I recommend watching the following tech talk: CppCon 2017: Juan Pedro Bolivar Puente "Postmodern immutable data structures".
- Here is the link to the exercises from a lecture at ETH Zürich: Computer-supported Modeling and Reasoning — Exercises and Solutions — (Isabelle 2004) which contains solution to the exercises about AVL trees, starting on page 135. (Man, they really had a lot of exercises in that lecture!)