Optimizing a Grid Thing

(Note: All code of this article that I wrote is licensed AGPLv3 or later, and is free/libre software.)

I want to write about an optimization problem I once had in an Advent of Code puzzle, to which I found a good solution.

But first some background information. I want to describe my journey to learning a really cool trick in programming.

# The Chess Engine Crafty

There is an open source chess engine called "Crafty". Nowadays it is far from the strongest engine, but at it's time it was pretty strong and apparently it introduced some new techniques or concepts to the chess programming scene. It was mainly developed by Robert Hyatt, a computer scientist and professor, who used Crafty as a vehicle for teaching his students. [1]

An important aspect is, that the engine is open source. Its C source code can be downloaded on its website (though the issuer of the website's TLS certificate is not or no longer recognized). There are also some git repositories, which claim to have the code, though I didn't check them thoroughly. [2] At some point I took a look at its source code. I had read the term "bitboard" in other places before, but that is where I learned about how to bitboards for highly efficient computation. Such highly efficient computation as is necessary for getting the most out of chess engines, which often evaluate millions of positions per second.

# Bitboards

What are bitboards then? Bitboards are sequences of bits, which represent one specific aspect or attribute of things, possibly out of many different aspects or attributes.

Note, that bitboards are also often called "bitmaps" or "square sets" and that bitboards for specific purposes like masking a row or column, or rank or file, are often called "bit mask" or just "mask".

Taking chess as an example, one bitboard could represent the information, on which squares of the chess board there are pieces. Any piece, no matter which color. Color would be a separate aspect, a separate bitboard. Alternatively, one could represent only the white or only the black pieces each by one bitboard. Another bitboard could represent the information on which squares there are knights or the kings.

Lets take the starting position of a chess game as an example:

r	n	b	q	k	b	n	r
p	p	p	p	p	p	p	p




P	P	P	P	P	P	P	P
R	N	B	Q	K	B	N	R

In this example the black pieces are shown as lower case letters and the white pieces are shown as upper case letters. Assuming knowledge about this representing a chess board, which has a known width of 8 squares and assuming we always start with out bit sequence representation from top left of the board (usually where the square a8 is located), row by row, left to right, we could also write it as: rnbqkbnr-pppppppp-00000000-00000000-00000000-00000000-PPPPPPPP-RNBQKBNR or even less readably without the dashes: rnbqkbnrpppppppp00000000000000000000000000000000PPPPPPPPRNBQKBNR. [3]

This can be represented using the following bitmaps:

White pieces bitmap:

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

Or: 00000000-00000000-00000000-00000000-00000000-00000000-11111111-11111111.
Black pieces bitmap:

1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Or: 11111111-11111111-00000000-00000000-00000000-00000000-00000000-00000000.
Pawns bitmap:

0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0
Or: 00000000-11111111-00000000-00000000-00000000-00000000-11111111-00000000.
Knights bitmap:

0 1 0 0 0 0 1 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 0
Or: 01000010-00000000-00000000-00000000-00000000-00000000-00000000-01000010.
Bishops bitmap:

0 0 1 0 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0
Or: 00100100-00000000-00000000-00000000-00000000-00000000-00000000-00100100.
Rooks bitmap:

1 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1
Or: 10000001-00000000-00000000-00000000-00000000-00000000-00000000-10000001.
Queens bitmap:

0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
Or: 00010000-00000000-00000000-00000000-00000000-00000000-00000000-00010000.
Kings bitmap:

0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0
Or: 00001000-00000000-00000000-00000000-00000000-00000000-00000000-00001000.

We have all the information we need to reconstruct the compound information of the full chess board. We know where what type of piece is located on the board, and we can combine that information with the bitmaps that tell us, where white pieces and where black pieces are located, to find out what color a piece of a specific type has. Also we know where unoccupied or empty squares are. In fact, it turns out, that this representation of a chess board by 8 bitboards is called "Denser Board".

This is only one of many possible representations of the whole board via bitboards. Another idea is to have separate bitboards for black pieces and white pieces, so that the information where a specific white or black piece is, can be looked up by looking at 1 bitboard, instead of having to combine 2 bitboards. It comes at the cost of having to manage a few more bitboards.

But why should we do this? The benefit of representing a chess board position like this is explained as follows:

We can perform low level bit manipulation operations on these sequences of bits, to create updated versions of them. This can be done in very performant ways. For example:

Changing a specific bit to 0: If there is a 1 at the position, update using bb - 2^pos where bb is the current bitboard and pos is the position of the bit from least significant bit (LSB, in this case the square h1) to most significant bit (MSB, in this case the square a8).
Changing a specific bit to 1: If there is a 0 at the position, update using bb + 2^pos.
Get the bit position 1 square up, down, left or right of a square: Remember that the width of the chess board is 8 squares. Up: -8, down: +8, left: +1, right: -1. This is only the position, not the bitmask for a 1 at that position.
Get the bitmask for a 1 at a specific position: 2^pos or 1 leftshift pos.
Get the inverted bitmask: NOT bb, where NOT is the bitwise negation operation.
Get all pieces of a specific type and color: color_bb AND piece_bb, where AND is the bitwise and operation.

Generally speaking, we can derive more complex facts from simpler facts, by performing bit-level operations on these sequences of bits, always keeping in mind the conventions of a chess board's width and the reading from top to bottom, row by row, left to right. One could of course also choose another order. The important thing is to be consistent.

# Interpretation as numbers

These sequences of bits can be treated as integer numbers in the code of the program. For example lets write 10000000-00000000-00000000-00000000-00000000-00000000-00000100-00000101 as a table with the values of each position:

2⁶³	2⁶²	2⁶¹	...	2¹⁰	...	2²	2¹	2⁰
1	0	0	...	1	...	1	0	1

Left is the most significant bit (MSB) as the exponent of 2 is the highest (63) and on the right is the least significant bit, as the exponent is the lowest (0). The number that is represented by the sequence is then calculated: (+ (expt 2 63) (expt 2 10) (expt 2 2) (expt 2 0)) which is 9223372036854776837.

# Usage in FP

These numbers do not require any special data structure. The abstraction can happen exclusively in the procedures working with the numbers, as long as they all are consistent in their interpretation of the sequences of bits. Of course when they are written as a base 10 integer number, they are utterly unreadable.

The fact that we do not need any special data structure to represent a bitboard is very useful, when using a functional language. No need for functional arrays or something like that. Numbers are usually simply created anew, in non-functional as well as in functional languages, without there being an unavoidable performance penalty for the purely functional implementation.

# The Advent of Code puzzle

Now that we know about bitboards, lets take a look at the Advent of Code puzzle 2024, day 15:

The essence is:

A robot pushes boxes around inside a warehouse.
The robot moves up, down, left, right.
We got a list of moves the robot will try to make.
Movements of the robot can be blocked by boxes, if they cannot be moved, or walls.
There is a mechanism to calculate a sum of positions of all boxes.
The sum of positions of all boxes, after the robot has executed all movements given in the moves list, is the solution to the puzzle.

From this we can draw a few conclusions:

We probably need to simulate the movements of the robot, to learn the final positions of all the boxes.
Optional: Maybe there is some clever way to reduce the number of movements, before simulating the movements, but that might be more implementation overhead than it is worth. For example one could probably make opposite direction movements cancel each other out in some cases. I would not follow this idea prematurely, as it might not be needed at all, and would be an optimization.
Since the position of boxes can affect the robot in its movements, we will need to keep track of the positions of boxes while simulating the robot movements.

Additionally, I wanted to use GNU Guile to solve the puzzle. GNU Guile being a Scheme, encourages writing code in functional programming (FP) style. This means, that the perhaps obvious idea of using a mutable 2d-array to store the positions of the robot, walls, and boxes in, is out of the window.

I am no longer quite certain, what my first approach to solving the problem was. I remember it worked for the example input, but was too inefficient for the actual puzzle input. I think it was something along the lines of:

Keep track of the positions of all boxes in a list of boxes.
Keep track of the position of the robot.
For each move of the robot check, whether any boxes are in the way and how they would need to be moved, and whether they could be moved at all.

This is kind of inefficient, because for each move one would go through the list of all boxes, and if any box needs to be moved, go through the list again, to check whether a box could be moved or is blocked by other boxes and/or a wall.

# Bitboard solution

I needed another approach. Something more efficient. That's where the bitboards come into play.

As discussed in the introduction to bitboards above, each bitboard represents exactly one specific thing about the state of something. In the introduction it was an aspect of a chess game position on a chess board, ergo the pieces and their color, one bitboard for each type of piece and one bitboard for each color. In the case of this warehouse of boxes and the robot, we can identify the following aspects:

positions of boxes
position of the robot
positions of walls

So there are 3 bitboards to maintain, one for each of the aspects. Out of those 3 bitboards, we can reconstruct the whole state of the warehouse. As was also mentioned, bitboards can be interpreted as integer numbers. They are simply sequences of bits after all, and when we update a bitboard, we simply create a new one, a new number.

# Imports / Utilities

(use-modules
 ;; receive form
 (srfi srfi-8)
 ;; structs
 (srfi srfi-9)
 ;; for functional structs (not part of srfi-9 directly)
 (srfi srfi-9 gnu)
 ;; cut
 (srfi srfi-26)
 ;; integers as bits
 (srfi srfi-60)
 ;; hash tables
 (srfi srfi-69)
 (ice-9 pretty-print)
 ;; custom libs
 (aoc2024 fileio)
 (aoc2024 pipeline)
 (aoc2024 list-utils)
 (aoc2024 array-utils)
 (aoc2024 bit-integers)
 (aoc2024 timing))

Some quality of life improvements here:

The receive form from srfi-8, for receiving multiple values.
Since we want to do FP, we are also using functional structs from srfi-9 additions of GNU Guile.
The cut macro from srfi-26, which is one of my favorites, and gets probably as close to currying as possible in Scheme.
srfi-60 helps us in treating integers as sequences of bits.
srfi-69 using SRFI hash tables, instead of built-in hash tables, for better portability, in the imaginary case of ever porting the code to another Scheme dialect.
pretty-print as the name indicates, for pretty printing things.
Anything starting with aoc2024 is my own utility modules, accumulated useful functions from previous Advent of Code puzzles of 2024 and earlier:
- fileio for reading files in one function call.
- pipeline for a pipeline/threading macro ->, which makes code more readable and elegant.
- list-utils more list functions.
- array-utils array helper functions.
- bit-integers helper functions for treating integers as bits or bits as integers.
- timing helper functions for timing function calls, to see what performs how well.

With that, we are ready to get started.

# Puzzle input

The example input looks like this:

##########
#..O..O.O#
#......O.#
#.OO..O.O#
#..O@..O.#
#O#..O...#
#O..O..O.#
#.OO.O.OO#
#....O...#
##########

<vv>^<v^>v>^vv^v>v<>v^v<v<^vv<<<^><<><>>v<vvv<>^v^>^<<<><<v<<<v^vv^v>^
vvv<<^>^v^^><<>>><>^<<><^vv^^<>vvv<>><^^v>^>vv<>v<<<<v<^v>^<^^>>>^<v<v
><>vv>v^v^<>><>>>><^^>vv>v<^^^>>v^v^<^^>v^^>v^<^v>v<>>v^v^<v>v^^<^^vv<
<<v<^>>^^^^>>>v^<>vvv^><v<<<>^^^vv^<vvv>^>v<^^^^v<>^>vvvv><>>v^<<^^^^^
^><^><>>><>^^<<^^v>>><^<v>^<vv>>v>>>^v><>^v><<<<v>>v<v<v>vvv>^<><<>^><
^>><>^v<><^vvv<^^<><v<<<<<><^v<<<><<<^^<v<^^^><^>>^<v^><<<^>>^v<v^v<v^
>^>>^v>vv>^<<^v<>><<><<v<<v><>v<^vv<<<>^^v^>^^>>><<^v>>v^v><^^>>^<>vv^
<><^^>^^^<><vvvvv^v<v<<>^v<v>v<<^><<><<><<<^^<<<^<<>><<><^^^>^^<>^>v<>
^^>vv<^v^v<vv>^<><v<^v>^^^>>>^^vvv^>vvv<>>>^<^>>>>>^<<^v>^vvv<>^<><<v>
v^^>>><<^^<>>^v^<v^vv<>v^<<>^<^v^v><^<<<><<^<v><v<>vv>>v><v^<vv<>v^<<^

So there are 2 things in the input. The warehouse:

wall: #
box: O
robot: @
empty space: .

and the robot movements:

right: >
left: <
up: ^
down: v

We want to process them separately, so lets go ahead and get the warehouse data and the movements into separate data structures:

(define-values (field-strings inputs)
  (receive (field inputs)
      (split-at (get-lines-from-file "input")
                (λ (line) (string-null? line)))
    (values field
            (-> (string-join inputs "")
                string-trim-right
                string->list))))

Using define-values, we define 2 values, the field, which is the warehouse state, and inputs which are the robot movements. We use receive to get the whole input file split-at the empty string, which satisfies the predicate string-null? and we make use of get-lines-from-file, which is from my helper module fileio. With field and inputs defined, we employ ->, the pipeline macro from my pipeline helper module, to string-join the lines of the robot movement input and string-trim-right, to get one long string of robot movements and then turn that into a list of characters using string->list.

Remember how I said mutable 2d-arrays are out of the window? Well, not totally. We can still use them, but only if we don't mutate them. What we do next is:

(define grid (-> field-strings
                 (map string->list)
                 (list->array 2)))
(define rows (array-len-in-dim grid 0))
(define cols (array-len-in-dim grid 1))

# Bitboard creation

We convert the list of strings that are the warehouse state into a nested list of characters, by mapping each line with string->list and then turning the nested list into an array using list->array. Then it is trivial to store the number of rows and cols by getting array lengths in dimension 0 and 1. The result will be a 2d-array containing characters representing the warehouse state (., #, O, @).

Having the state as an array allows for accessing in constant O(1) runtime. We iterate over this array and create bitboards, without needing any mutation of the array. Here is the function, that we use for that:

(define grid->bit-integer
  (λ (grid pred indices->position)
    "Create a bit integer from the given array GRID, which has
bits set in all positions for which the corresponding value
of ARR satisfies PRED.

Assumes that GRID is a 2d-array."
    (let* ([shape (list->vector (array-shape grid))]
           [max-dim (- (vector-length shape) 1)])
      (let iter ([result° 0] [indices° #(0 0)])
        (cond
         [indices°
          (let ([position (indices->position indices°)]
                [next-index (array-next-index shape indices° max-dim)])
            (cond
             [(pred (array-cell-ref-vec grid indices°) indices°)
              (iter (set-bit-at-pos result° position) next-index)]
             [else
              (iter result° next-index)]))]
         [else
          result°])))))

The function grid->bit-integer takes the following arguments:

grid, a 2d-array
a predicate pred, whose boolean return value determines at which positions in the bitboard are 1s and 0s.
a function indices->position for translating indices of the array into corresponding "positions" in the bitboard.

grid->bit-integer uses a named let and recursion to iterate through the array. array-next-index is a helper function, that returns the next vector of indices into the array, based on the array shape the previous indices and maximum dimension of the array. This is very convenient, because grid->bit-integer does not itself need to contain any logic for determining the next indices.

In the recursive call (iter (set-bit-at-pos result° position) next-index) set-bit-at-pos is used to functionally update the bitboard result°. result° is never mutated. Instead the new version of it is passed in the recursive call, together with the next array indices next-index.

We need an actual implementation for the functions for converting from row and column indices to positions in the bitboard, which will be used as indices->position argument to grid->bit-integer. To keep things simple, we are sticking to typical row-major order of indices:

(define row-major-2d-indices->position
  (λ (inds)
    "Convert indices of row-major order into a position of the
corresponding bit in a bitmap, mapping lowest indices to the
least significant bits."
    (let ([row (vector-ref inds 0)]
          [col (vector-ref inds 1)]
          [max-pos (- (* rows cols) 1)])
      (- max-pos
         (+ (* row cols) col)))))

This means the first index will be the row ("row-major"), and the second index will be the column. Also we define the reverse function, that returns the indices, given a position in the bitboard:

(define position->row-major-2d-indices
  (λ (pos)
    (receive (r c) (euclidean/ pos cols)
      ;; The higher the position, the further left (most
      ;; significant bit) in the integer we go. A higher
      ;; position should be a lower row index and a lower
      ;; position should be a higher row index, visually
      ;; further down.

      ;; Same goes for columns. A higher position must be
      ;; further left, so we substract from the number of
      ;; available columns.
      (vector (- rows r 1)
              (- cols c 1)))))

In order to refer to characters for walls (blockers), boxes (crates), the robot, and empty spaces more readably, we also define the following:

(define empty-space-char #\.)
(define crate-char #\O)
(define blocker-char #\#)
(define robot-char #\@)

Then we are ready to define the bitboards as follows:

(define robot-bitmap
  (grid->bit-integer grid
                     (λ (elem _indices) (char=? elem robot-char))
                     row-major-2d-indices->position))

(define crates-bitmap
  (grid->bit-integer grid
                     (λ (elem _indices) (char=? elem crate-char))
                     row-major-2d-indices->position))

(define blockers-bitmap
  (grid->bit-integer grid
                     (λ (elem _indices) (char=? elem blocker-char))
                     row-major-2d-indices->position))

# Displaying bitboards

It also makes sense to have a procedure for displaying bitboards in a readable way, as grid, instead of one long sequence of bits:

(define display-bitmap-ex
  (lambda* (bitmap #:key (translate id))
    (display-bitmap bitmap
                    (* rows cols)
                    cols
                    #:translate translate)))

display-bitmap-ex is named -ex because it extends an existing display-bitmap function of the bit-integers module. It provides a simpler interface, relying on rows and cols being defined in the module that display-bitmap-ex is defined in. It also makes use of the identity function id:

(define id (λ (x) x))

For displaying bitboards properly, it would be good to not merely display 0s and 1s, but the actual characters, which we know from the puzzle description, as well. For this purpose we define a few translation functions:

(define crates-translate-bits
  (λ (elem row-ind col-ind)
    (cond
     ;; optional conditions
     [(= row-ind 0) blocker-char]
     [(= row-ind (- rows 1)) blocker-char]
     [(= col-ind 0) blocker-char]
     [(= col-ind (- cols 1)) blocker-char]
     ;; mandatory conditions for correct display
     [(= elem 0) empty-space-char]
     [(= elem 1) crate-char]
     [else elem])))

(define blockers-translate-bits
  (λ (elem row-ind col-ind)
    (cond
     [(= elem 0) empty-space-char]
     [(= elem 1) blocker-char]
     [else elem])))

(define robot-translate-bits
  (λ (elem row-ind col-ind)
    (cond
     [(= elem 0) empty-space-char]
     [(= elem 1) robot-char]
     [else elem])))

These will be used by display-bitmap-ex, for the keyword argument translate.

Lets see how our bitboards look for the initial warehouse state. The example puzzle input for the warehouse looks like this:

##########
#..O..O.O#
#......O.#
#.OO..O.O#
#..O@..O.#
#O#..O...#
#O..O..O.#
#.OO.O.OO#
#....O...#
##########

10 rows, 10 columns, a 10x10 grid from which we derive the following bitboards:

Walls bitboard:

INPUT:      WALLS:      BOXES:      EMPTY:
##########  1111111111  0000000000  0000000000
#..O..O.O#  1000000001  0001001010  0110110100
#......O.#  1000000001  0000000100  0111111010
#.OO..O.O#  1000000001  0011001010  0100110100
#..O@..O.#  1000000001  0001000100  0110011010
#O#..O...#  1010000001  0100010000  0001101110
#O..O..O.#  1000000001  0100100100  0011011010
#.OO.O.OO#  1000000001  0011010110  0100101000
#....O...#  1000000001  0000010000  0111101110
##########  1111111111  0000000000  0000000000

The bitboards are bigger than the ones of a chess board, but that shall not stop us.

# Get relative positions

Readability is key, and since the robot movement simulation will move the robot in any of the 4 directions often, and we need to refer to movements in the code, we also create functions for determining the positions of the bits in the bitboards, that are 1 move up, down, left, or right of any given position, for better readability:

(define pos-relative-up (λ (pos) (+ pos cols)))
(define pos-relative-down (λ (pos) (- pos cols)))
(define pos-relative-left (λ (pos) (+ pos 1)))
(define pos-relative-right (λ (pos) (- pos 1)))

Actually, defining all these little functions, what we are doing is building an abstraction layer around the low level details of bitboards. When we calculate the position to the right of some position in the bitboard, we could be at the right edge and then there is no position that should be considered "right" of the current position. Considering this, the above defined functions for determining the positions up, down, left, right relative to the current position are naive. What we actually need are functions, that will not return a bogus position, but will indicate, when there is no position in a specific direction. These are defined as follows:

(define pos-relative-up-safe
  (λ (pos)
    (let ([up (pos-relative-up pos)])
      (if (and (same-col? pos up) (not (negative? up)))
          up
          #f))))

(define pos-relative-down-safe
  (λ (pos)
    (let ([down (pos-relative-down pos)])
      (if (and (same-col? pos down)
               (< (receive (q r) (euclidean/ down cols) q)
                  rows))
          down
          #f))))

(define pos-relative-left-safe
  (λ (pos)
    (let ([left (pos-relative-left pos)])
      (if (and (same-row? pos left) (not (negative? left)))
          left
          #f))))

(define pos-relative-right-safe
  (λ (pos)
    (let ([right (pos-relative-right pos)])
      (if (and (same-row? pos right))
          right
          #f))))

These will return the position in the specific direction or #f, if there is no such position.

# Find robot position

To determine the robot starting position, we do the following:

(define robot? (λ (elem) (char=? elem robot-char)))

(define robot-start-pos
  (row-major-2d-indices->position
   (array-index-of grid robot?)))

array-index-of is a helper function from my array-utils module, which returns the first indices, where the predicate robot? is satisfied.

# Find empty spaces for movement of boxes

The description of the puzzle states, that a box can only be moved, if there is no wall in that direction, or other boxes, which are themselves blocked by a wall in that direction. There needs to be some empty space, behind the box(es) so that they can be moved. To determine whether such an empty space exists we will need a function:

(define find-empty-space-in-direction
  (λ (robot-pos direction crates-bitmap blockers-bitmap)
    "Look for an empty space moving from the ROBOT-POS in the
given DIRECTION.

DIRECTION is expected to be a function that given a position
returns the next position relative to the one given safely.
Safely meaning that it will return false, if the next
position would be out of bounds."
    (let ([empty-space-bitmap
           (bitwise-not
            (bitwise-ior crates-bitmap blockers-bitmap))])
      (let iter ([next-pos (direction robot-pos)])
        (cond
         [next-pos
          (cond
           ;; Empty space found? Then a move in the
           ;; direction is possible.
           [(= (get-bit-value-at-pos empty-space-bitmap next-pos) 1)
            next-pos]
           ;; Is there a blocker at the position? Then no
           ;; move in the direction is possible.
           [(= (get-bit-value-at-pos blockers-bitmap next-pos) 1)
            #f]
           ;; Otherwise there is a crate at the position
           ;; and we need to check the next position.
           [else (iter (direction next-pos))])]
         [else #f])))))

find-empty-space-in-direction takes the position of the robot, robot-pos, the direction in which the boxes shall be moved, which is one of pos-relative-up-safe, pos-relative-down-safe, pos-relative-left-safe, and pos-relative-right-safe, the bitboard of boxes crates-bitmap and the bitboard of walls blockers-bitmap. It returns #f, if there is no empty space in the specified direction.

We also define a few predicates to check whether there is a box, a wall or nothing at a given bitboard position:

(define crate-at-position?
  (λ (pos crates-bitmap)
    (= (get-bit-value-at-pos crates-bitmap pos) 1)))

(define blocker-at-position?
  (λ (pos blockers-bitmap)
    (= (get-bit-value-at-pos blockers-bitmap pos) 1)))

(define empty-at-position?
  (λ (pos crates-bitmap blockers-bitmap)
    (= (get-bit-value-at-pos (bitwise-ior blockers-bitmap crates-bitmap)
                             pos)
       0)))

These simply take the corresponding bitboard and the position and use bitboard helper functions to check the condition.

# Moving boxes

We will need to also have a function for actually "moving" a box/crate to simulate the robot movements:

(define move-crate
  (λ (pos-old pos-new crates-bitmap)
    (-> crates-bitmap
        ((cut unset-bit-at-pos <> pos-old))
        ((cut set-bit-at-pos <> pos-new)))))

And a function for moving the robot, which checks for walls and whether boxes/crates can be moved, so that a robot movement can be executed, or if it cannot, be ignored/skipped:

(define move
  (λ (robot-pos direction crates-bitmap blockers-bitmap)
    "Move the robot into a given DIRECTION, if possible."
    (let ([next-pos (direction robot-pos)])
      (cond
       [(blocker-at-position? next-pos blockers-bitmap)
        (values robot-pos crates-bitmap)]
       ;; If the next position is empty, then simply go
       ;; there. Crates stay the same.
       [(empty-at-position? next-pos crates-bitmap blockers-bitmap)
        (values next-pos crates-bitmap)]
       ;; If there is a crate, move it to the next free
       ;; spot, by calculating a new crates-bitmap.
       [else
        (let ([next-free-pos
               (find-empty-space-in-direction robot-pos
                                              direction
                                              crates-bitmap
                                              blockers-bitmap)])
          (cond
           ;; If there is a next free position in that
           ;; direction, then we move the crate there and
           ;; move the robot to where the crate was.
           [next-free-pos
            (values next-pos
                    (move-crate next-pos
                                next-free-pos
                                crates-bitmap))]
           ;; Otherwise do not change anything.
           [else (values robot-pos crates-bitmap)]))]))))

move takes as arguments:

the robot position in the bitboards robot-pos
the direction in which to move according to the movement instructions found in the puzzle input
the bitboard containing the information about the boxes/crates crates-bitmap
the bitboard containing the information about the walls/blockers blockers-bitmap

The direction is actually a function, not one of the direction indicating characters ^, v, <, >. To call move we still need to translate from those characters of the puzzle input, to the actual direction functions. We do this as follows:

(define direction-table
  (alist->hash-table
   `((#\^ . ,pos-relative-up-safe)
     (#\v . ,pos-relative-down-safe)
     (#\< . ,pos-relative-left-safe)
     (#\> . ,pos-relative-right-safe))))

(define move-char->direction
  (λ (char)
    (hash-table-ref direction-table char)))

We use a non-functional hash table, but just like we did with the array initially, we only initialize the hash table, and then perform only lookups. We never mutate it. This usage is acceptable, when trying to stay in the realm of functional programming.

# Calculating a score

Since the puzzle requires to calculate a sum of weighted coordinates of box positions, we also need to implement that. We do this using the score function, which we define as follows:

(define score
  (λ (crates-bitmap)
    (simple-format #t "calculate score of crates:\n")
    (display-bitmap-ex crates-bitmap
                       #:translate crates-translate-bits)
    (let ([max-pos (- (* rows cols) 1)])
      (let iter-pos ([pos° 0] [result° 0])
        (cond
         [(<= pos° max-pos)
          (let ([val (get-bit-value-at-pos crates-bitmap pos°)])
            (cond
             [(= val 1)
              (let* ([indices (position->row-major-2d-indices pos°)]
                     [row-ind (vector-ref indices 0)]
                     [col-ind (vector-ref indices 1)])
                (iter-pos (+ pos° 1)
                          (+ result°
                             (+ (* row-ind 100)
                                col-ind))))]
             [else (iter-pos (+ pos° 1)
                             result°)]))]
         [else result°])))))

This score function iterates through the crates-bitmap, and calculates the weighted coordinates for each of the boxes. Looks a bit verbose still, and could probably be split up, refactored a bit.

To be able to calculate the score using score, the last thing we need to do is to actually simulate the robot movements, iterating through the robot movement inputs. When the last movement is processed, we will have the final position of all the boxes, and can run score. The result will be the solution of the puzzle. Here is how we do it:

(define solution
  (-> (let iter-inputs ([robot-pos° robot-start-pos]
                        [inputs° inputs]
                        [crates-bitmap° crates-bitmap])
        (cond
         [(null? inputs°)
          (simple-format
           #t "robot at: ~s\n"
           (position->row-major-2d-indices robot-pos°))
          (simple-format #t "grid of crates:\n")
          (display-bitmap-ex crates-bitmap°
                             #:translate crates-translate-bits)
          crates-bitmap°]
         [else
          (let ([direction (move-char->direction (car inputs°))])
            (receive (next-robot-pos next-crates-bitmap)
                (move robot-pos°
                      direction
                      crates-bitmap°
                      blockers-bitmap)
              (iter-inputs next-robot-pos
                           (cdr inputs°)
                           next-crates-bitmap)))]))
      score))

(simple-format #t "score: ~s\n" solution)

# Result

The time has come to check, whether implementing this on top of bitboards has been worth it. We need to time things.

Instead of once defining solution at the top level of the program, we define a solve function, which we can call many times to solve the puzzle. We also comment out other display... and simple-format calls, to avoid having the program slow down due to excessive printing.

(define solve
  (λ ()
    (-> (let iter-inputs ([robot-pos° robot-start-pos]
                          [inputs° inputs]
                          [crates-bitmap° crates-bitmap])
          (cond
           [(null? inputs°) crates-bitmap°]
           [else
            (let ([direction (move-char->direction (car inputs°))])
              (receive (next-robot-pos next-crates-bitmap)
                  (move robot-pos° direction crates-bitmap° blockers-bitmap)
                (iter-inputs next-robot-pos (cdr inputs°) next-crates-bitmap)))]))
        score)))

We can now time our solve function call by using the measure-time function from the imported timing helper module as follows:

(measure-time (λ () (solve)) #:iterations 1000)

measure-time expects a thunk as an argument, which it will internally call. Note, that timing is easy, because our program does not have any side effects. If there were many different warehouses with different robots and boxes arrangements, we could effortlessly parallelize calculating the solutions to all of them, because no part of our program would interfere with the state of any other concurrent calculations.

On my current machine this finishes with:

21.686015 time units

So it took roughly 22 seconds to solve the puzzle 1000 times.

If I just run once and use the time to measure the time the whole initialization of GNU Guile and running the program takes, I get the following result, which matches what one would expect assuming insignificant initialization overhead:

real    0m0.218s
user    0m0.292s
sys     0m0.131s

# A performance concern

There was one concern I had when implementing this and that was the following:

Bitboards for chess are usually have 64 bits, as there are 64 squares on a chess board. That is neatly contained in one 64 bit unsigned integer. Calculations with such numbers are insanely fast on modern CPUs, as they are heavily optimized for. Would bitboards still be fast, if there are more bits?

Someone not familiar with Scheme might even be worried about number type range overflows, but fortunately in Scheme we have integers of arbitrary size and do not need to worry about that. However, the bitboards are much bigger in our case.

We can still solve the puzzle a whopping 1000 times in 22s, using a single CPU core, or finish running once in some 0.02s, so I think the concern I had can be laid to rest.

# Outlook

Even though the bitboard approach proved to be quite performant, the approach is actually still optimizable. For example maybe there is a way to improve the function find-empty-space-in-direction to run faster. I imagine, if one had bitboard for each position, having all bits set in one of the directions for all of the directions, so called ray masks, and used those in a clever way, maybe one could figure out, whether crates on these rays are movable into the direction. I don't know whether there is an algorithm like that, I am just guessing at this point.

I asked LLMs a little and got told, that this might be a well known problem from chess programming (of course), named occluded ray fill or some variation of it. However, I couldn't get the LLM to give me a faster algoritm than find-empty-space-in-direction, for the case of having one single robot pushing crates.

# Conclusions

There are a few things to note about the implementation.

Functional programming: At no point in the whole implementation did we write any classes. Our data, the state of the warehouse and the robot position, live separately from the behavior, the functions, that deal with creating updated versions of the data. We implement functions that take data and other functions as input and return data as output.
Building blocks: Our building blocks are not classes, objects and mutation. Our building blocks are merely functions and data. Or in other words, we strive to define things in terms of calculations rather than actions.
Global state and immutability: None of the functions writes any global state. At most we define and initialize immutable global state once and then never change it. If we somehow managed to parallelize parts of this program, we would face no concurrency issues whatsoever, because no part of the program mutates anything, that another concurrent unit of execution could see or be interested in. Code written in this style is often trivially parallelizable, when the problem at hand is not inherently sequential. Therefore using this approach we keep our options for using multiple cores open.
Locality: Most functions rely exclusively on their arguments. Some of them also read immutable global state. We could make even the immutable global state disappear, by adding a few more arguments to the functions, which read global state. This style of writing functions makes the code easier to understand, because one only has to look at a single, pure function, and can understand it, without looking at other parts of the code. This wouldn't be possible, if there was some global state that was mutated and relied upon by various parts of the code.
Testability: The code makes it trivial to write tests for it. Simple calls of functions, input, output, check result, done little to no mocking is needed here.
Elegant code: Each of the functions by itself is kind of simple and short. Some have a few more lines, but that is due to code formatting and comments, and a few lines of debug output could also still be removed. The code is easy to follow and read, provided one is familiar with Scheme or GNU Guile.
Efficiency: We managed to stick to the functional programming approach, and as we can see, FP can solve complex problems efficiently.

# Footnotes

[1] Man, do I wish my teachers and professors had used something as interesting as chess. I am a chess player after all, and computer programming is my passion. That would have combined 2 interests of mine into one activity. I find this to be a highly motivating approach to learn about a topic. For example Xiaolong Dictionary uses this approach, combining learning Mandarin on one hand and computer programming on the other hand, while actually learning about tkinter and a bunch of other things using Python.

[2]	lazydroid/crafty-chess, MartinMSPedersen/Crafty-Chess

[3]	In fact, this notation is similar to FEN notation, which is an optimized notation expressing the this information and some more, like castling rights and en passant. For more detail see the definition on Wikipedia.

Disclaimer: This is a link to an external website. That website contains contents not created by the author of this website. The author is not responsible for content, statements, or expressed opinions of others appearing on the linked website. It is in the nature of the external websites, that what others create on the external website is not under the author's control. Furthermore the author does expressedly not subscribe to, accept, or adapt as their own view any content, statements, opinions or ideas of others on the linked website. The author is not liable for linked external content, which was not created by the author themselves.

Author:	Zelphir Kaltstahl
Last modified:	2026-06-11

Hans-Werner
Roitzsch