http://apocalisp.wordpress.com/2009/08/21/structural-pattern-matching-in-java/

One of the great features of modern programming languages is structural pattern matching on algebraic data types.
Once you’ve used this feature, you don’t ever want to program without it. You will find this in languages like Haskell and Scala.

In Scala, algebraic types are provided by case classes. For example:

    sealed trait Tree
    case object Empty extends Tree
    case class Leaf(n: Int) extends Tree
    case class Node(l: Tree, r: Tree) extends Tree

To define operations over this algebraic data type, we use pattern matching on its structure:

    import scala.math.max

    def depth(t: Tree): Int = t match {
      case Empty => 0
      case Leaf(n) => 1
      case Node(l, r) => 1 + max(depth(l), depth(r))
    }

When I go back to a programming language like, say, Java, I find myself wanting this feature. Unfortunately, Java as of this writing (2009) does not provide algebraic data types.
However, a great many hacks have been invented over the years to emulate them, knowingly or not.

The Ugly: Interpreter and Visitor

What I have used most throughout my career to emulate pattern matching in languages that lack it is a couple of hoary old hacks.
These venerable and well-respected practices are a pair of design patterns from the GoF book: Interpreter and Visitor.

The Interpreter pattern really does describe an algebraic structure, and it provides a method of reducing (interpreting) the structure.
However, there are a couple of problems with it.
The interpretation is coupled to the structure, with a "context" passed from term to term, and each term must know how to mutate the context appropriately.
That's minus one point for tight coupling, and minus one for relying on mutation.

The Visitor pattern addresses the former of these concerns.
Given an algebraic structure, we can define an interface with one "visit" method per type of term, and have each term accept a visitor object that implements this interface, passing it along to the subterms.
This decouples the interpretation from the structure, but still relies on mutation.
Minus one point for mutation, and minus one for the fact that Visitor is incredibly crufty.
For example, to get the depth of our tree structure above, we have to implement a TreeDepthVisitor.
A good IDE that generates boilerplate for you is definitely recommended if you take this approach.

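To make the cruft concrete, here is a minimal sketch of what a TreeDepthVisitor might look like. The text above only names the class; the TreeVisitor interface, the accept methods, the VisitorSketch wrapper, and the two counters are assumptions made for illustration. Note how the answer only exists as mutated state inside the visitor.

```java
// Hypothetical sketch: every name except TreeDepthVisitor is made up for illustration.
final class VisitorSketch {
    // One "visit" method per type of term.
    interface TreeVisitor {
        void visitEmpty(Empty e);
        void visitLeaf(Leaf l);
        void visitNode(Node n);
    }

    static abstract class Tree {
        abstract void accept(TreeVisitor v);
    }

    static final class Empty extends Tree {
        void accept(TreeVisitor v) { v.visitEmpty(this); }
    }

    static final class Leaf extends Tree {
        final int n;
        Leaf(int n) { this.n = n; }
        void accept(TreeVisitor v) { v.visitLeaf(this); }
    }

    static final class Node extends Tree {
        final Tree left, right;
        Node(Tree left, Tree right) { this.left = left; this.right = right; }
        void accept(TreeVisitor v) { v.visitNode(this); }
    }

    // The depth is accumulated by mutating fields while the tree is walked.
    static final class TreeDepthVisitor implements TreeVisitor {
        private int current = 0; // number of Nodes above the term being visited
        private int deepest = 0; // deepest level seen so far

        public void visitEmpty(Empty e) { /* an empty tree adds no depth */ }

        public void visitLeaf(Leaf l) { deepest = Math.max(deepest, current + 1); }

        public void visitNode(Node n) {
            current++;
            deepest = Math.max(deepest, current);
            n.left.accept(this);  // pass the visitor along to the subterms
            n.right.accept(this);
            current--;
        }

        int depth() { return deepest; }
    }

    static int depth(Tree t) {
        TreeDepthVisitor v = new TreeDepthVisitor();
        t.accept(v);
        return v.depth();
    }
}
```

That is roughly forty lines to express what the Scala version says in five, and the visitor object cannot be safely reused or shared while a traversal is in progress.
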
On the plus side, both of these patterns provide some enforcement of the exhaustiveness of the pattern match.
For example, if you add a new term type, the Interpreter pattern will enforce that you implement the interpretation method.
For Visitor, as long as you remember to add a visitation method for the new term type to the visitor interface, you will be forced to update your implementations accordingly.

The Bad: Instanceof

An obvious approach that's often sneered at is runtime type discovery. A quick and dirty way to match on types is to simply check for the type at runtime and cast:

    public static int depth(Tree t) {
        if (t instanceof Empty)
            return 0;
        if (t instanceof Leaf)
            return 1;
        if (t instanceof Node)
            return 1 + Math.max(depth(((Node) t).left), depth(((Node) t).right));
        throw new RuntimeException("Inexhaustive pattern match on Tree.");
    }

There are some obvious problems with this approach.
For one thing, it bypasses the type system, so you lose any static guarantees that it's correct.
And there's no enforcement of the exhaustiveness of the matching.
But on the plus side, it's both fast and terse.

The Good: Functional Style

There are at least two approaches that we can take to approximate pattern matching in Java more closely than the above methods.
Both involve utilising parametric polymorphism and functional style.
Let's consider them in order of increasing preference, i.e. less preferred method first.

Safe and Terse: Disjoint Union Types

The first approach is based on the insight that algebraic data types represent a disjoint union of types.
Now, if you've done any amount of programming in Java with generics, you will have come across (or invented) the simple pair type, which is a conjunction of two types:

    public abstract class P2<A, B> {
        public abstract A _1();
        public abstract B _2();
    }

A value of this type can only be created if you have both a value of type A and a value of type B. So (conceptually, at least) it has a single constructor that takes two values.
The disjunction of two types is a similar idea, except that a value of type Either<A, B> can be constructed with either a value of type A or a value of type B:

    public final class Either<A, B> {
        ...
        public static <A, B> Either<A, B> left(A a) { ... }
        public static <A, B> Either<A, B> right(B b) { ... }
        ...
    }

Encoded as a disjoint union type, then, our Tree data type above is: Either<Empty, Either<Leaf, Node>>

Let's see that in context. Here's the code.

    // Assumes static imports of Either.left and Either.right (the factory methods above).
    public abstract class Tree {
        // Constructor private so the type is sealed.
        private Tree() {}

        public abstract Either<Empty, Either<Leaf, Node>> toEither();

        public static final class Empty extends Tree {
            public Either<Empty, Either<Leaf, Node>> toEither() {
                return left(this);
            }

            public Empty() {}
        }

        public static final class Leaf extends Tree {
            public final int n;

            public Either<Empty, Either<Leaf, Node>> toEither() {
                return right(Either.<Leaf, Node>left(this));
            }

            public Leaf(int n) { this.n = n; }
        }

        public static final class Node extends Tree {
            public final Tree left;
            public final Tree right;

            public Either<Empty, Either<Leaf, Node>> toEither() {
                return right(Either.<Leaf, Node>right(this));
            }

            public Node(Tree left, Tree right) {
                this.left = left; this.right = right;
            }
        }
    }

The neat thing is that Either<A, B> can be made to return both Iterable<A> and Iterable<B> in methods left() and right(), respectively.
One of them will be empty and the other will have exactly one element.
So our pattern matching function will look like this:

    public int depth(Tree t) {
        Either<Empty, Either<Leaf, Node>> eln = t.toEither();
        for (Empty e : eln.left())
            return 0;
        for (Either<Leaf, Node> ln : eln.right()) {
            for (Leaf leaf : ln.left())
                return 1;
            for (Node node : ln.right())
                return 1 + Math.max(depth(node.left), depth(node.right));
        }
        throw new RuntimeException("Inexhaustive pattern match on Tree.");
    }

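For the loops above to work, left() and right() have to hand back something iterable. A minimal sketch of such an Either follows. It is not Functional Java's actual class, and the field layout, the EitherSketch wrapper, and the describe helper are assumptions for illustration only.

```java
import java.util.Collections;

// Hypothetical minimal Either: exactly one of the two Iterables is non-empty.
final class EitherSketch {
    static final class Either<A, B> {
        private final A a;            // meaningful only when isLeft is true
        private final B b;            // meaningful only when isLeft is false
        private final boolean isLeft;

        private Either(A a, B b, boolean isLeft) {
            this.a = a;
            this.b = b;
            this.isLeft = isLeft;
        }

        static <A, B> Either<A, B> left(A a) { return new Either<A, B>(a, null, true); }

        static <A, B> Either<A, B> right(B b) { return new Either<A, B>(null, b, false); }

        // One element if this is a left value, otherwise empty.
        Iterable<A> left() {
            return isLeft ? Collections.singletonList(a) : Collections.<A>emptyList();
        }

        // One element if this is a right value, otherwise empty.
        Iterable<B> right() {
            return isLeft ? Collections.<B>emptyList() : Collections.singletonList(b);
        }
    }

    // Same "for loop as pattern match" trick as in depth(), on a simpler type.
    static String describe(Either<Integer, String> e) {
        for (Integer i : e.left())
            return "left " + i;
        for (String s : e.right())
            return "right " + s;
        throw new IllegalStateException("An Either is always left or right.");
    }
}
```

The compiler cannot see that one of the two loops always returns, which is why the trailing throw is still required.
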
That's terse and readable, as well as type-safe.
The only issue with this is that the exhaustiveness of the patterns is not enforced, so we're still only discovering that error at runtime.
So it's not all that much of an improvement over the instanceof approach.

Safe and Exhaustive: Church Encoding

Alonzo Church was a pretty cool guy. Having invented the lambda calculus, he discovered that you could encode data in it.
We've all heard that every data type can be defined in terms of the kinds of operations that it supports. Well, what Church discovered is much more profound than that.
A data type IS a function. In other words, an algebraic data type is not just a structure together with an algebra that collapses the structure.
The algebra IS the structure.

Consider the boolean type. It is a disjoint union of True and False. What kinds of operations does this support?
Well, you might want to do one thing if it's True, and another if it's False.
Just like with our Tree, where we wanted to do one thing if it's a Leaf, and another thing if it's a Node, etc.

But it turns out that the boolean type IS the condition function. Consider the Church encoding of booleans:

    true = λa.λb.a
    false = λa.λb.b

So a boolean is actually a binary function. Given two terms, a boolean will yield the former term if it's true, and the latter term if it's false.

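The same idea can be sketched directly in Java. The names below (ChurchBoolSketch, ChurchBool, apply, not) are made up for illustration. One caveat is that Java evaluates both arguments before the call, so this selects between values rather than between unevaluated branches.

```java
// Hypothetical sketch of Church booleans: a boolean is a function of two arguments.
final class ChurchBoolSketch {
    interface ChurchBool {
        // Yields the first argument for "true" and the second for "false".
        <T> T apply(T ifTrue, T ifFalse);
    }

    // true = λa.λb.a
    static final ChurchBool TRUE = new ChurchBool() {
        public <T> T apply(T ifTrue, T ifFalse) { return ifTrue; }
    };

    // false = λa.λb.b
    static final ChurchBool FALSE = new ChurchBool() {
        public <T> T apply(T ifTrue, T ifFalse) { return ifFalse; }
    };

    // Negation needs no if statement: it just swaps the two arguments.
    static ChurchBool not(final ChurchBool b) {
        return new ChurchBool() {
            public <T> T apply(T ifTrue, T ifFalse) { return b.apply(ifFalse, ifTrue); }
        };
    }
}
```

Nowhere in this sketch is there a Java boolean or a conditional. Choosing a branch is simply what applying the value does.
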
What does this mean for our Tree type? It too is a function:

    empty = λa.λb.λc.a
    leaf x = λa.λb.λc.b x
    node l r = λa.λb.λc.c l r

You can see that this gives you pattern matching for free. The Tree type is a function that takes three arguments:

A value to yield if the tree is empty.
A unary function to apply to an integer if it's a leaf.
A binary function to apply to the left and right subtrees if it's a node.

The type of such a function looks like this (Scala notation):

    T => (Int => T) => (Tree => Tree => T) => T

Or equivalently:

    (Empty => T) => (Leaf => T) => (Node => T) => T

Translated to Java, we need this method on Tree:

    public abstract <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c);

The F interface is a first-class function from Functional Java. If you haven't seen that before, here it is:

    public interface F<A, B> { public B f(A a); }

Now our Tree code looks like this:

    public abstract class Tree {
        // Constructor private so the type is sealed.
        private Tree() {}

        public abstract <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c);

        public static final class Empty extends Tree {
            public <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) {
                return a.f(this);
            }

            public Empty() {}
        }

        public static final class Leaf extends Tree {
            public final int n;

            public <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) {
                return b.f(this);
            }

            public Leaf(int n) { this.n = n; }
        }

        public static final class Node extends Tree {
            public final Tree left;
            public final Tree right;

            public <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) {
                return c.f(this);
            }

            public Node(Tree left, Tree right) {
                this.left = left; this.right = right;
            }
        }
    }

And we can do our pattern matching on the calling side:

    public int depth(Tree t) {
        return t.match(constant(0), constant(1), new F<Node, Integer>() {
            public Integer f(Node n) {
                return 1 + Math.max(depth(n.left), depth(n.right));
            }
        });
    }

This is almost as terse as the Scala code, and very easy to understand.
Everything is checked by the type system, and we are guaranteed that our patterns are exhaustive.
This is an ideal solution.

By the way, if you're wondering what constant(0) means, it's a method that returns a function F<A, Integer> that always returns 0, ignoring the argument.

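For the curious, here is a rough sketch of how constant could be written, together with the whole match-based Tree in one compilable piece. The constant shown here is hand-rolled for illustration rather than copied from Functional Java, MatchSketch is a hypothetical wrapper class, and the lambda in depth assumes Java 8 or later, where any single-method interface such as F can be written that way.

```java
// Sketch only: F mirrors the interface shown earlier; everything else is illustrative.
final class MatchSketch {
    interface F<A, B> { B f(A a); }

    // Returns a function that ignores its argument and always yields b.
    static <A, B> F<A, B> constant(final B b) {
        return new F<A, B>() {
            public B f(A ignored) { return b; }
        };
    }

    static abstract class Tree {
        // Private constructor: only the nested classes below can extend Tree.
        private Tree() {}
        abstract <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c);
    }

    static final class Empty extends Tree {
        <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) { return a.f(this); }
    }

    static final class Leaf extends Tree {
        final int n;
        Leaf(int n) { this.n = n; }
        <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) { return b.f(this); }
    }

    static final class Node extends Tree {
        final Tree left, right;
        Node(Tree left, Tree right) { this.left = left; this.right = right; }
        <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) { return c.f(this); }
    }

    // One handler per case; leaving one out is a compile error, not a runtime surprise.
    static int depth(Tree t) {
        return t.<Integer>match(
            constant(0),
            constant(1),
            n -> 1 + Math.max(depth(n.left), depth(n.right)));
    }
}
```

The explicit <Integer> on the call is there to keep type inference simple in this sketch. With the lambda in place of the anonymous class, the call site is about as short as the Scala match.
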
Conclusion

With some slightly clever use of generics and a little help from our friends Church and Curry, we can indeed emulate structural pattern matching over algebraic data types in Java, to the point where it's almost as nice as a built-in language feature.

So throw away your Visitors and set fire to your GoF book.