http://apocalisp.wordpress.com/2009/08/21/structural-pattern-matching-in-java/

One of the great features of modern programming languages is structural pattern matching on algebraic data types.
Once you've used this feature, you don't ever want to program without it. You will find this in languages like Haskell and Scala.

In Scala, algebraic types are provided by case classes. For example:

sealed trait Tree
case object Empty extends Tree
case class Leaf(n: Int) extends Tree
case class Node(l: Tree, r: Tree) extends Tree

To define operations over this algebraic data type, we use pattern matching on its structure:

def depth(t: Tree): Int = t match {
  case Empty => 0
  case Leaf(_) => 1
  case Node(l, r) => 1 + math.max(depth(l), depth(r))
}

When I go back to a programming language like, say, Java, I find myself wanting this feature. Unfortunately, algebraic data types aren't provided in Java.
However, a great many hacks have been invented over the years to emulate them, knowingly or not.


The Ugly: Interpreter and Visitor

What I have used most throughout my career to emulate pattern matching in languages that lack it are a couple of hoary old hacks.
These venerable and well respected practices are a pair of design patterns from the GoF book: Interpreter and Visitor.

The Interpreter pattern really does describe an algebraic structure, and it provides a method of reducing (interpreting) the structure.
However, there are a couple of problems with it.
The interpretation is coupled to the structure, with a "context" passed from term to term, and each term must know how to mutate the context appropriately.
That's minus one point for tight coupling, and minus one for relying on mutation.

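To make the coupling and the mutation concrete, here is a minimal sketch of an Interpreter-style depth computation. The original text gives no Interpreter code, so DepthContext, interpret and InterpreterDemo are hypothetical names chosen for illustration.

```java
// Hypothetical Interpreter-style encoding; all names here are illustrative.
final class DepthContext {
  int current = 0; // depth of the term currently being interpreted
  int deepest = 0; // deepest level seen so far
}

abstract class Tree {
  // The interpretation lives inside the structure: every term mutates the context.
  abstract void interpret(DepthContext ctx);
}

final class Empty extends Tree {
  void interpret(DepthContext ctx) { /* an empty tree adds no depth */ }
}

final class Leaf extends Tree {
  final int n;
  Leaf(int n) { this.n = n; }
  void interpret(DepthContext ctx) {
    ctx.deepest = Math.max(ctx.deepest, ctx.current + 1);
  }
}

final class Node extends Tree {
  final Tree left, right;
  Node(Tree left, Tree right) { this.left = left; this.right = right; }
  void interpret(DepthContext ctx) {
    ctx.current++;                                    // entering this node
    ctx.deepest = Math.max(ctx.deepest, ctx.current);
    left.interpret(ctx);
    right.interpret(ctx);
    ctx.current--;                                    // leaving this node
  }
}

final class InterpreterDemo {
  static int depth(Tree t) {
    DepthContext ctx = new DepthContext();
    t.interpret(ctx);
    return ctx.deepest;
  }

  public static void main(String[] args) {
    Tree t = new Node(new Leaf(1), new Node(new Leaf(2), new Empty()));
    System.out.println(depth(t));
  }
}
```

Adding a second operation, say a sum of the leaves, would mean adding another method to every term class, which is the tight coupling complained about above.
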
The Visitor pattern addresses the former of these concerns.
Given an algebraic structure, we can define an interface with one "visit" method per type of term, and have each term accept a visitor object that implements this interface, passing it along to the subterms.
This decouples the interpretation from the structure, but still relies on mutation.
Minus one point for mutation, and minus one for the fact that Visitor is incredibly crufty.
For example, to get the depth of our tree structure above, we have to implement a TreeDepthVisitor.
A good IDE that generates boilerplate for you is definitely recommended if you take this approach.

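For a sense of the cruft, here is a rough sketch of what a TreeDepthVisitor might look like. Only the name TreeDepthVisitor comes from the text; the TreeVisitor interface, the visit method names and depthOf are assumptions. In this variant the visitor drives the traversal of subterms itself, which is one of several common ways to arrange it.

```java
// Hypothetical Visitor encoding of the Tree type; names are illustrative.
interface TreeVisitor {
  void visitEmpty(Empty e);
  void visitLeaf(Leaf l);
  void visitNode(Node n);
}

abstract class Tree {
  abstract void accept(TreeVisitor v);
}

final class Empty extends Tree {
  void accept(TreeVisitor v) { v.visitEmpty(this); }
}

final class Leaf extends Tree {
  final int n;
  Leaf(int n) { this.n = n; }
  void accept(TreeVisitor v) { v.visitLeaf(this); }
}

final class Node extends Tree {
  final Tree left, right;
  Node(Tree left, Tree right) { this.left = left; this.right = right; }
  void accept(TreeVisitor v) { v.visitNode(this); }
}

// The visit methods return void, so the result is accumulated by mutating a field.
final class TreeDepthVisitor implements TreeVisitor {
  private int depth = 0;

  public void visitEmpty(Empty e) { depth = 0; }

  public void visitLeaf(Leaf l) { depth = 1; }

  public void visitNode(Node n) {
    n.left.accept(this);
    int leftDepth = depth;   // save the left result before it is overwritten
    n.right.accept(this);
    depth = 1 + Math.max(leftDepth, depth);
  }

  static int depthOf(Tree t) {
    TreeDepthVisitor v = new TreeDepthVisitor();
    t.accept(v);
    return v.depth;
  }
}
```

Compare this with the five-line Scala depth function: the logic is the same three cases, spread over an interface, three accept methods and a stateful visitor class.
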
On the plus side, both of these patterns provide some enforcement of the exhaustiveness of the pattern match.
For example, if you add a new term type, the Interpreter pattern will enforce that you implement the interpretation method.
For Visitor, as long as you remember to add a visitation method for the new term type to the visitor interface, you will be forced to update your implementations accordingly.


The Bad: Instanceof

An obvious approach that's often sneered at is runtime type discovery. A quick and dirty way to match on types is to simply check for the type at runtime and cast:

// max is assumed to be a static import of Math.max.
public static int depth(Tree t) {
  if (t instanceof Empty)
    return 0;
  if (t instanceof Leaf)
    return 1;
  if (t instanceof Node)
    return 1 + max(depth(((Node) t).left), depth(((Node) t).right));
  throw new RuntimeException("Inexhaustive pattern match on Tree.");
}

There are some obvious problems with this approach.
For one thing, it bypasses the type system, so you lose any static guarantees that it's correct.
And there's no enforcement of the exhaustiveness of the matching.
But on the plus side, it's both fast and terse.


The Good: Functional Style

There are at least two approaches that we can take to approximate pattern matching in Java more closely than the above methods.
Both involve utilising parametric polymorphism and functional style.
Let's consider them in order of increasing preference, i.e. less preferred method first.


Safe and Terse: Disjoint Union Types

The first approach is based on the insight that algebraic data types represent a disjoint union of types.
Now, if you've done any amount of programming in Java with generics, you will have come across (or invented) the simple pair type, which is a conjunction of two types:

public abstract class P2<A, B> {
  public abstract A _1();
  public abstract B _2();
}

A value of this type can only be created if you have both a value of type A and a value of type B. So (conceptually, at least) it has a single constructor that takes two values.
The disjunction of two types is a similar idea, except that a value of type Either<A, B> can be constructed with either a value of type A or a value of type B:

public final class Either<A, B> {
  ...
  public static <A, B> Either<A, B> left(A a) { ... }
  public static <A, B> Either<A, B> right(B b) { ... }
  ...
}

Encoded as a disjoint union type, then, our Tree data type above is: Either<Empty, Either<Leaf, Node>>

Let's see that in context. Here's the code.

// left and right are assumed to be static imports of Either.left and Either.right.
public abstract class Tree {
  // Constructor private so the type is sealed.
  private Tree() {}

  public abstract Either<Empty, Either<Leaf, Node>> toEither();

  public static final class Empty extends Tree {
    public Either<Empty, Either<Leaf, Node>> toEither() {
      return left(this);
    }

    public Empty() {}
  }

  public static final class Leaf extends Tree {
    public final int n;

    public Either<Empty, Either<Leaf, Node>> toEither() {
      return right(Either.<Leaf, Node>left(this));
    }

    public Leaf(int n) { this.n = n; }
  }

  public static final class Node extends Tree {
    public final Tree left;
    public final Tree right;

    public Either<Empty, Either<Leaf, Node>> toEither() {
      return right(Either.<Leaf, Node>right(this));
    }

    public Node(Tree left, Tree right) {
      this.left = left; this.right = right;
    }
  }
}

The neat thing is that Either<A, B> can be made to return both Iterable<A> and Iterable<B> in methods left() and right(), respectively.
One of them will be empty and the other will have exactly one element.
So our pattern matching function will look like this:

public int depth(Tree t) {
  Either<Empty, Either<Leaf, Node>> eln = t.toEither();
  // Each loop body runs at most once: only one side of an Either is non-empty.
  for (Empty e: eln.left())
    return 0;
  for (Either<Leaf, Node> ln: eln.right()) {
    for (Leaf leaf: ln.left())
      return 1;
    for (Node node: ln.right())
      return 1 + max(depth(node.left), depth(node.right));
  }
  throw new RuntimeException("Inexhaustive pattern match on Tree.");
}

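The depth function above relies on left() and right() being Iterable. As a rough sketch of how that could work, here is a simplified stand-in for Either. It is not Functional Java's actual class, and the field layout and the EitherDemo class are assumptions made for illustration.

```java
import java.util.Collections;

// Simplified stand-in for a disjoint union; not Functional Java's actual class.
final class Either<A, B> {
  private final A a;            // set only for a left value
  private final B b;            // set only for a right value
  private final boolean isLeft;

  private Either(A a, B b, boolean isLeft) {
    this.a = a; this.b = b; this.isLeft = isLeft;
  }

  static <A, B> Either<A, B> left(A a) { return new Either<A, B>(a, null, true); }

  static <A, B> Either<A, B> right(B b) { return new Either<A, B>(null, b, false); }

  // One element if this is a left value, otherwise empty.
  Iterable<A> left() {
    return isLeft ? Collections.singletonList(a) : Collections.<A>emptyList();
  }

  // One element if this is a right value, otherwise empty.
  Iterable<B> right() {
    return isLeft ? Collections.<B>emptyList() : Collections.singletonList(b);
  }
}

final class EitherDemo {
  // "Pattern match" by looping: exactly one of the two loops has an element.
  static String describe(Either<Integer, String> e) {
    for (Integer i : e.left())
      return "number " + i;
    for (String s : e.right())
      return "text " + s;
    throw new IllegalStateException("unreachable: one side is always present");
  }

  public static void main(String[] args) {
    System.out.println(describe(Either.<Integer, String>left(42)));
    System.out.println(describe(Either.<Integer, String>right("hi")));
  }
}
```

Note that the compiler still demands the final throw, even though it can never be reached: nothing in the types tells it that the two loops together cover every case.
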
That's terse and readable, as well as type-safe.
The only issue with this is that the exhaustiveness of the patterns is not enforced, so we're still only discovering that error at runtime.
So it's not all that much of an improvement over the instanceof approach.


Safe and Exhaustive: Church Encoding

Alonzo Church was a pretty cool guy. Having invented the lambda calculus, he discovered that you could encode data in it.
We've all heard that every data type can be defined in terms of the kinds of operations that it supports. Well, what Church discovered is much more profound than that.
A data type IS a function. In other words, an algebraic data type is not just a structure together with an algebra that collapses the structure.
The algebra IS the structure.

Consider the boolean type. It is a disjoint union of True and False. What kinds of operations does this support?
Well, you might want to do one thing if it's True, and another if it's False.
Just like with our Tree, where we wanted to do one thing if it's a Leaf, and another thing if it's a Node, etc.

But it turns out that the boolean type IS the condition function. Consider the Church encoding of booleans:

true  = λa.λb.a
false = λa.λb.b


So a boolean is actually a binary function. Given two terms, a boolean will yield the former term if it's true, and the latter term if it's false.
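
As a small illustration of that idea in Java, here is a sketch of Church booleans as objects whose only method picks one of two arguments. ChurchBoolean, choose and ChurchBooleans are hypothetical names, not part of any library mentioned here.

```java
// Hypothetical Java rendering of Church booleans; names are illustrative.
interface ChurchBoolean {
  // A boolean is a function of two terms that yields one of them.
  <T> T choose(T ifTrue, T ifFalse);
}

final class ChurchBooleans {
  // true = λa.λb.a
  static final ChurchBoolean TRUE = new ChurchBoolean() {
    public <T> T choose(T ifTrue, T ifFalse) { return ifTrue; }
  };

  // false = λa.λb.b
  static final ChurchBoolean FALSE = new ChurchBoolean() {
    public <T> T choose(T ifTrue, T ifFalse) { return ifFalse; }
  };

  // Negation needs no if statement: it just swaps the two arguments.
  static ChurchBoolean not(final ChurchBoolean b) {
    return new ChurchBoolean() {
      public <T> T choose(T ifTrue, T ifFalse) { return b.choose(ifFalse, ifTrue); }
    };
  }

  public static void main(String[] args) {
    System.out.println(TRUE.choose("yes", "no"));
    System.out.println(not(TRUE).choose("yes", "no"));
  }
}
```

One difference from the lambda calculus terms: Java evaluates both arguments before choose is called, so this encodes the selection but not lazy branching.
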
What does this mean for our Tree type? It too is a function:

empty = λa.λb.λc.a
leaf  = λa.λb.λc.λx.b x
node  = λa.λb.λc.λl.λr.c l r


You can see that this gives you pattern matching for free. The Tree type is a function that takes three arguments:

    A value to yield if the tree is empty.
    A unary function to apply to an integer if it's a leaf.
    A binary function to apply to the left and right subtrees if it's a node.


The type of such a function looks like this (Scala notation):

T => (Int => T) => (Tree => Tree => T) => T

Or equivalently:

(Empty => T) => (Leaf => T) => (Node => T) => T

Translated to Java, we need this method on Tree:

public abstract <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c);

The F interface is a first-class function from Functional Java. If you haven't seen that before, here it is:

public interface F<A, B> { public B f(A a); }

Now our Tree code looks like this:

public abstract class Tree {
  // Constructor private so the type is sealed.
  private Tree() {}

  public abstract <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c);

  public static final class Empty extends Tree {
    public <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) {
      return a.f(this);
    }

    public Empty() {}
  }

  public static final class Leaf extends Tree {
    public final int n;

    public <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) {
      return b.f(this);
    }

    public Leaf(int n) { this.n = n; }
  }

  public static final class Node extends Tree {
    public final Tree left;
    public final Tree right;

    public <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) {
      return c.f(this);
    }

    public Node(Tree left, Tree right) {
      this.left = left; this.right = right;
    }
  }
}

And we can do our pattern matching on the calling side:

// constant and max are assumed to be static imports.
// On compilers older than Java 8, the constant calls may need explicit type arguments.
public int depth(Tree t) {
  return t.match(constant(0), constant(1), new F<Node, Integer>(){
    public Integer f(Node n) {
      return 1 + max(depth(n.left), depth(n.right));
    }
  });
}

This is almost as terse as the Scala code, and very easy to understand.
Everything is checked by the type system, and we are guaranteed that our patterns are exhaustive, since match cannot be called without a handler for every case.
This is an ideal solution.

By the way, if you're wondering what constant(0) means, it's a method that returns a function F<A, Integer> that always returns 0, ignoring the argument.

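To tie the pieces together, here is a self-contained sketch that can be compiled without Functional Java. F and constant below are stand-ins written for this example, not the library's source, and the Functions and ChurchTreeDemo class names are assumptions. The constant calls carry explicit type arguments so that the code does not depend on the compiler's inference.

```java
// Stand-ins for the Functional Java pieces used above; not the library's source.
interface F<A, B> { B f(A a); }

final class Functions {
  // Returns a function that ignores its argument and always yields b.
  static <A, B> F<A, B> constant(final B b) {
    return new F<A, B>() {
      public B f(A ignored) { return b; }
    };
  }
}

abstract class Tree {
  // Private constructor: only the nested classes below can extend Tree.
  private Tree() {}

  abstract <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c);

  static final class Empty extends Tree {
    <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) { return a.f(this); }
  }

  static final class Leaf extends Tree {
    final int n;
    Leaf(int n) { this.n = n; }
    <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) { return b.f(this); }
  }

  static final class Node extends Tree {
    final Tree left, right;
    Node(Tree left, Tree right) { this.left = left; this.right = right; }
    <T> T match(F<Empty, T> a, F<Leaf, T> b, F<Node, T> c) { return c.f(this); }
  }
}

final class ChurchTreeDemo {
  static int depth(Tree t) {
    return t.match(
        Functions.<Tree.Empty, Integer>constant(0),
        Functions.<Tree.Leaf, Integer>constant(1),
        new F<Tree.Node, Integer>() {
          public Integer f(Tree.Node n) {
            return 1 + Math.max(depth(n.left), depth(n.right));
          }
        });
  }

  public static void main(String[] args) {
    Tree t = new Tree.Node(new Tree.Leaf(1),
        new Tree.Node(new Tree.Leaf(2), new Tree.Empty()));
    System.out.println(depth(t));
  }
}
```

Dropping any one of the three arguments to match is a compile error, which is where the exhaustiveness guarantee comes from.
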
Conclusion

With some slightly clever use of generics and a little help from our friends Church and Curry,
we can indeed emulate structural pattern matching over algebraic data types in Java, to the point where it's almost as nice as a built-in language feature.

So throw away your Visitors and set fire to your GoF book.