devlog

Representing the AST in C

Dhruv Anand

May 2026 Part 2 of 2

The scanner was the easy part. Every character goes in, tokens come out — it is basically a state machine, and state machines in C are comfortable territory. The parser is where things get harder, because the parser produces a tree, and trees have nodes of different shapes.

In Java, Nystrom solves this with a class hierarchy. Every node type is a subclass of Expr. The visitor pattern lets you walk the tree without a giant switch statement. It is elegant, and it works because Java has inheritance.

C has no inheritance. So I had to figure out how to fake it.

The problem

Consider a binary expression like 1 + 2. The AST node for this needs to store an operator and two sub-expressions — the left and right operands. A literal like 42 just needs a value. A unary expression like -x needs an operator and one operand.

These are different shapes. In Java you make them different classes. In C, your options are:

  1. A giant struct with fields for everything, most of them unused for any given node type
  2. A tagged union — a struct with a type field and a union of the possible payloads
  3. Void pointers with manual casting (please no)

I went with the tagged union. It is the idiomatic C approach and it is actually not bad once you stop fighting it.

The implementation

First, the node types:

typedef enum {
  EXPR_BINARY,
  EXPR_GROUPING,
  EXPR_LITERAL,
  EXPR_UNARY,
} ExprType;

Then the payload structs for each shape:

typedef struct Expr Expr;

typedef struct {
  Expr* left;
  Token operator;
  Expr* right;
} BinaryExpr;

typedef struct {
  Expr* expression;
} GroupingExpr;

typedef struct {
  Token value;
} LiteralExpr;

typedef struct {
  Token operator;
  Expr* right;
} UnaryExpr;

And the tagged union that ties them together:

struct Expr {
  ExprType type;
  union {
    BinaryExpr   binary;
    GroupingExpr grouping;
    LiteralExpr  literal;
    UnaryExpr    unary;
  } as;
};

To construct a node, I have small helper functions:

Expr* newBinary(Expr* left, Token operator, Expr* right) {
  Expr* expr = malloc(sizeof(Expr));
  expr->type = EXPR_BINARY;
  expr->as.binary.left = left;
  expr->as.binary.operator = operator;
  expr->as.binary.right = right;
  return expr;
}

And to walk the tree, a switch on the type tag:

void printExpr(Expr* expr) {
  switch (expr->type) {
    case EXPR_BINARY:
      printf("(");
      printExpr(expr->as.binary.left);
      printf(" %.*s ", expr->as.binary.operator.length,
                       expr->as.binary.operator.start);
      printExpr(expr->as.binary.right);
      printf(")");
      break;
    case EXPR_LITERAL:
      printf("%.*s", expr->as.literal.value.length,
                     expr->as.literal.value.start);
      break;
    case EXPR_UNARY:
      printf("(%.*s", expr->as.unary.operator.length,
                      expr->as.unary.operator.start);
      printExpr(expr->as.unary.right);
      printf(")");
      break;
    case EXPR_GROUPING:
      printf("(group ");
      printExpr(expr->as.grouping.expression);
      printf(")");
      break;
  }
}

It is more verbose than the visitor pattern, but it is also completely transparent. There is no magic. You can read this and know exactly what is happening.

What I lost and what I gained

The visitor pattern in Java lets you add new operations (evaluate, print, resolve) without touching the node types. With the switch approach in C, every new operation means another function with another switch. That is more code, but it is also easier to grep and harder to get wrong.

What I gained is clarity about memory. Every Expr* is a heap allocation. I own it. Eventually I need to free it. Java would GC this quietly — in C, I will have to write a freeExpr function and call it at the right time. That is annoying but it is also the entire point of this exercise.

What is next

With the scanner producing tokens and the parser producing an AST, the next step is evaluation — actually walking the tree and computing a result. That means representing Lox values in C, which brings its own set of decisions.