What will programming languages look like one hundred years from now? Where will all of those wasted cycles end up going?
I think it is safe to say that the programming language of the future, if it exists at all, will involve some kind of artificial intelligence. This post is about why I think that theorem provers will be standard in languages of the future.
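The original listing for solve did not survive; as a stand-in, here is a minimal sketch of the interface the next paragraph describes, with bytes standing in for the type A and brute force standing in for real search (both are my assumptions, not the author's):

```python
import random
from typing import Callable, Optional

def solve(predicate: Callable[[bytes], bool], size: int) -> Optional[bytes]:
    """Return a random value of at most `size` bytes satisfying
    `predicate`, or None if no such value exists.  (Brute-force
    stand-in for the hypothetical solver.)"""
    matches = [n.to_bytes(length, "big")
               for length in range(size + 1)
               for n in range(256 ** length)
               if predicate(n.to_bytes(length, "big"))]
    return random.choice(matches) if matches else None
```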

This simple function takes two arguments. The first is a predicate distinguishing between desirable (True) and undesirable (False) values for A. The second is a size restriction on A (e.g. number of bytes). The function returns a random value of A, if one exists, meeting two constraints:
- It satisfies the predicate.
- It is no larger than the size constraint.

Also, the solve function is guaranteed to terminate whenever the predicate terminates.
First I will try to convince you that the solve function is more important than any of your petty opinions about syntax, object-orientation, type theory, or macros. After that I will make a fool of myself by explaining how to build the solve function with today’s technology.
Why it matters
It can find fixpoints:
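The original example is lost; a hedged sketch: a fixpoint of a function f is any x with f(x) == x, so a one-line predicate suffices (the function f and the brute-force solve stand-in are my own):

```python
def solve(predicate, size):
    # Brute-force stand-in: search the non-negative ints of at most
    # `size` bytes.
    for n in range(256 ** size):
        if predicate(n):
            return n
    return None

def f(x):
    # A hypothetical function with a fixpoint.
    return x // 2 + 1

# A fixpoint of f is any x satisfying f(x) == x.
fixpoint = solve(lambda x: f(x) == x, size=1)
```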

It can invert functions:
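Again the listing is lost; as a sketch, inverting a function g at a value y is just asking solve for a preimage x with g(x) == y (the toy "cipher" below is my own invention):

```python
def solve(predicate, size):
    # Brute-force stand-in: search the non-negative ints of at most
    # `size` bytes.
    for n in range(256 ** size):
        if predicate(n):
            return n
    return None

def encrypt(x):
    # A hypothetical forward-only function to invert.
    return (x * 7 + 3) % 256

# Inverting encrypt at 59 means finding x with encrypt(x) == 59.
plaintext = solve(lambda x: encrypt(x) == 59, size=1)
```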

It can solve Project Euler problems:
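We don't know which problem the author used; as one hedged illustration, Project Euler problem 9 (the Pythagorean triple with a + b + c = 1000) phrased as a predicate. The pair encoding and the candidate-count cap on the brute-force stand-in are my simplifications:

```python
def solve(predicate, bound):
    # Brute-force stand-in; for speed, the size restriction is a
    # candidate count here rather than a byte count.
    for n in range(bound):
        if predicate(n):
            return n
    return None

def is_special_triple(n):
    # Decode one integer into the pair (a, b); c is forced by the sum.
    a, b = divmod(n, 500)
    c = 1000 - a - b
    return 0 < a < b < c and a * a + b * b == c * c

n = solve(is_special_triple, 500 * 500)
a, b = divmod(n, 500)
```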

It can check that two functions are equal:
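The original listing is gone, but the later mention of "the Kadane example" suggests it compared a naive maximum-subarray sum with Kadane's algorithm. A sketch: ask solve for an input on which the two disagree; getting None back means they agree everywhere in the searched domain.

```python
def solve(predicate, size):
    # Brute-force stand-in: search byte strings of at most `size` bytes.
    for length in range(size + 1):
        for n in range(256 ** length):
            candidate = n.to_bytes(length, "big")
            if predicate(candidate):
                return candidate
    return None

def max_subarray_naive(xs):
    # O(n^2): try every contiguous slice (the empty slice counts as 0).
    best = 0
    for i in range(len(xs)):
        for j in range(i, len(xs) + 1):
            best = max(best, sum(xs[i:j]))
    return best

def max_subarray_kadane(xs):
    # O(n): Kadane's algorithm.
    best = here = 0
    for x in xs:
        here = max(0, here + x)
        best = max(best, here)
    return best

def differ(data):
    xs = [b - 128 for b in data]          # bytes -> signed values
    return max_subarray_naive(xs) != max_subarray_kadane(xs)

counterexample = solve(differ, size=2)     # None: no disagreement found
```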

So it’s useful for detecting the introduction of bugs when you are optimizing things.
In fact, the solve function can find a more efficient implementation on your behalf.
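A hedged reconstruction of the idea: enumerate candidate programs for a tiny machine, accept the first that matches a slow reference on sample inputs. The machine, its opcodes, and "shortest program first" as the crude speed proxy are all my assumptions:

```python
def solve(predicate, size):
    # Brute-force stand-in; shortest candidates are tried first, which
    # doubles as the crude "speed check".
    for length in range(size + 1):
        for n in range(256 ** length):
            candidate = n.to_bytes(length, "big")
            if predicate(candidate):
                return candidate
    return None

def run(program, x):
    # Tiny machine: each byte transforms the accumulator.
    acc = x
    for op in program:
        if op == 0:   acc += 1        # add one
        elif op == 1: acc *= 2        # double
        elif op == 2: acc *= acc      # square
        else: return None             # illegal opcode
    return acc

def slow_quadruple(x):
    total = 0
    for _ in range(4):                # reference: four additions
        total += x
    return total

def matches_reference(program):
    return all(run(program, x) == slow_quadruple(x) for x in range(16))

fast_program = solve(matches_reference, size=2)   # two doublings
```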

The speed check is crude, but the idea is there.
Keeping the size constraint reasonable prevents the solve function from just creating a giant table mapping inputs to outputs.
Curry and Howard tell us that programs and proofs are one and the same thing. If our solve function can generate programs, then it can also generate mathematical proofs.
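The proof example is lost; as a hypothetical illustration in Lean, the term below is simultaneously a program of type (A → B) → A → B and a proof of that proposition, which is exactly the identification solve would exploit:

```lean
-- Under Curry–Howard, this lambda term is both the "apply" program
-- and a proof that (A → B) → A → B holds.
theorem apply_is_proof (A B : Prop) : (A → B) → A → B :=
  fun h a => h a
```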

If the proof is ugly, we can decrease the search size, and we will get a more elegant proof.
The solve function can find bugs:
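The listing is lost; a sketch of the idea with a deliberately buggy function of my own: ask solve for an input on which the buggy version and a trusted specification disagree.

```python
def solve(predicate, size):
    # Brute-force stand-in: search byte strings of at most `size` bytes.
    for length in range(size + 1):
        for n in range(256 ** length):
            candidate = n.to_bytes(length, "big")
            if predicate(candidate):
                return candidate
    return None

def is_sorted_buggy(xs):
    # Bug: the loop stops one pair early.
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 2))

def is_sorted(xs):
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

# A witness input on which the two checks disagree exposes the bug.
bug_input = solve(lambda xs: is_sorted_buggy(xs) != is_sorted(xs), 2)
```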

The solve function will never get people to stop arguing, but it will at least change the dynamic vs. static types argument from a pragmatic one to an artistic one.
One last example:
Test-driven development advocates writing tests which are sufficient to construct the missing parts of a program. So why write the program at all?
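A hedged sketch of "the tests are the program": treat unit_tests as the predicate and let solve search for a program passing them. The tiny byte-coded machine is my own stand-in for a real program space:

```python
def solve(predicate, size):
    # Brute-force stand-in over byte-encoded programs.
    for length in range(size + 1):
        for n in range(256 ** length):
            candidate = n.to_bytes(length, "big")
            if predicate(candidate):
                return candidate
    return None

def run(program, x):
    # One byte per instruction: 0 -> add 1, 1 -> double, 2 -> square.
    acc = x
    for op in program:
        if op == 0:   acc += 1
        elif op == 1: acc *= 2
        elif op == 2: acc *= acc
        else: return None
    return acc

def unit_tests(program):
    # The tests fully specify the desired behavior.
    return (run(program, 2) == 4 and
            run(program, 3) == 9 and
            run(program, 5) == 25)

program = solve(unit_tests, size=1)       # solve writes the program
```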

In fact, unit_tests can be replaced with any assertion about the desired program: e.g. that it type checks under Hindley-Milner, that it terminates within a certain number of steps, that it does not deadlock within the first X cycles of the program’s execution, and so on.
Are you excited yet? Programming in the future is awesome!
Implementation
Always start with the obvious approach:
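The original listing is lost; a plausible reconstruction of the obvious approach, using the names num and decode that the complexity discussion below refers to:

```python
def solve(predicate, size):
    # The obvious approach: enumerate every bit pattern of at most
    # `size` bytes, decode it, and test it against the predicate.
    for num in range(2 ** (size * 8)):
        candidate = decode(num, size)
        if predicate(candidate):
            return candidate
    return None

def decode(num, size):
    # Fixed-width big-endian byte string.
    return num.to_bytes(size, "big")
```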

Correct, but useless. If the predicate consisted of only one floating point operation, the Sequoia supercomputer would take 17 minutes to solve a mere 8 bytes.
The complexity of solve is clear. The variable num can be nondeterministically chosen from the range in linear time (size * 8), decode takes linear time, and predicate takes polynomial time in most of our examples from above. So solve is usually in NP, and no worse than NP-complete as long as our predicate is in P.
It’s a hard problem. Were you surprised? Or did you get suspicious when the programmers of the future started exhibiting godlike powers?^{1}
Thankfully, a lot of work has been put into solving hard problems.
Today’s sat solvers can solve problems with 10 million variables. That’s 1.2 megabytes of search space, which is large enough for almost all of the examples above, if we’re clever enough. (The Kadane example is the definite exception, since the predicate takes superpolynomial time.)
The Cook-Levin theorem gives us a procedure for writing the solve function more efficiently.
- Imagine a large number of processors, each with its own memory, lined up and connected so that the output state of each processor and memory becomes the input state of the next processor and memory. The state of the entire assembly is determined solely by the state of the first processor. The state of the whole system is static.
- Represent each (unchanging) bit in the assembly with a boolean variable, and generate constraints on those variables by examining the logic gates connecting the bits.
- Assign values to some of the variables in a way that corresponds to the first processor containing the machine code of the predicate.
- Likewise, assign values so that the accumulator register of the last processor contains the value True.
- Apply a sat solver to the variables and their constraints.
- Extrapolate a solution by examining the first processor’s total state.
I call this approach “solving the interpreter trace” because the imaginary processors act as an interpreter for the predicate, and we ask the sat solver to trace out the processor execution.
The approach is elegant, but it has three major problems:
- The formula given to the sat solver is enormous, even for small predicates and input sizes. (It’s polynomial, but the coefficient is large.)
- The formula is highly symmetrical, which means the sat solver will perform a lot of redundant computation.
- The meaning of bits in later processors is highly dependent on the value of bits in earlier processors (especially if the predicate starts off with a loop). This will force our sat solver to work a problem from beginning to end, even when a different order (such as end to beginning) would be more intelligent.
We can get rid of these problems if we compile our predicate directly into a boolean formula. Compilation is easy enough if our predicate contains neither loops nor conditionals.
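The author's before/after pair is lost; here is a hypothetical straight-line predicate and its compiled form. Each output bit becomes a fixed boolean formula over the input bits, so no interpreter trace is needed:

```python
from itertools import product

# Hypothetical straight-line predicate over a 3-bit input x:
#   y = x ^ (x << 1), truncated to 3 bits.
# Compiled, each bit of y is a fixed formula over the bits of x:
#   y0 = x0,  y1 = x1 xor x0,  y2 = x2 xor x1
def compiled(x0, x1, x2):
    return (x0, x1 != x0, x2 != x1)

# "Solving" the formula: find input bits whose compiled output
# matches the target y = 0b110, i.e. (y0, y1, y2) = (0, 1, 1).
target = (False, True, True)
for x0, x1, x2 in product([False, True], repeat=3):
    if compiled(x0, x1, x2) == target:
        x = x0 + 2 * x1 + 4 * x2
        break
```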

Actually, conditionals aren’t that hard either.
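The original conditional and its compiled form are lost; the sketch below is a hypothetical reconstruction showing the principle the next paragraph relies on. A conditional assignment compiles to a single mux constraint on one variable (here named w2 to match the discussion), and when both branches force the same value the constraint collapses to a unit clause:

```python
from itertools import product

# Compiling "w2 = t if b else e" without a trace: one variable w2
# plus a mux constraint relating it to b, t, and e.
def mux_constraint(b, t, e, w2):
    # w2 must equal t when b holds and e otherwise.
    return w2 == (t if b else e)

# If both branches happen to produce 0, the constraint simplifies to
# the unit clause "w2 == 0", which a sat solver assigns immediately.
solutions = [
    (b, w2)
    for b, w2 in product([0, 1], repeat=2)
    if mux_constraint(b, t=0, e=0, w2=w2)
]
```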

A sat solver would immediately assign w2 the value 0. If we were solving over an interpreter trace, w2 wouldn’t be a single variable, but would be one of two variables depending on whether b was True or False.
By compiling the predicate, we have enabled the solver to work from end to beginning (if it so chooses).
Can we handle loops?
One approach is to unroll the loop a finite number of times.
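The original loop and its unrolled form are lost; a hedged sketch of the transformation, with a longer_loop_needed flag recording when the unroll budget was exhausted (the predicate itself is my invention):

```python
# A predicate whose truth depends on a loop...
def predicate_with_loop(x):
    total, i = 0, 0
    while i < x:
        total += i
        i += 1
    return total == 6

# ...and the same predicate unrolled a fixed number of times.  The
# flag records whether the unroll budget was too small.
def predicate_unrolled(x, unrolls=3):
    total, i = 0, 0
    for _ in range(unrolls):
        if i < x:
            total += i
            i += 1
    longer_loop_needed = i < x
    return total == 6 and not longer_loop_needed
```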

With loops and conditionals, we are Turing complete. Function calls can be inlined up until recursion. Tail-recursive calls can be changed to while loops, and the rest can be reified as loops around stack objects with explicit push and pop operations. These stack objects will introduce symmetry into our sat formulas, but at least it will be contained.
When solving, we assume the loops make very few iterations, and increase our unroll depth as that assumption is violated. The solver might then look something like this:
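The solver listing is lost; here is a hypothetical driver following the text's description. The names max_unroll_count and longer_loop_needed come from the surrounding discussion; compile_unrolled and sat_solve are placeholder callables, and the result of max_unroll_count's static analysis is passed in as a number:

```python
def solve_with_unrolling(compile_unrolled, sat_solve, max_unroll_count):
    # Assume the loops make few iterations; retry with more unrolls
    # whenever that assumption is violated.
    depth = 1
    while True:
        formula = compile_unrolled(depth, allow_longer_loop=False)
        solution = sat_solve(formula)
        if solution is not None:
            return solution
        if depth >= max_unroll_count:
            break
        depth = min(depth * 2, max_unroll_count)
    # Last resort: permit longer_loop_needed to be true.  If even this
    # fails, no solution exists.
    return sat_solve(compile_unrolled(max_unroll_count,
                                      allow_longer_loop=True))
```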

max_unroll_count does static analysis to figure out the maximum number of unrolls that are needed. The number of unrolls will either be a constant (and so can be found out by doing constant reduction within the predicate), or it will somehow depend on the size of the predicate argument (and so an upper bound can be found by doing inference on the predicate).
The solver is biased toward finding solutions that use fewer loop iterations, since each loop iteration sets another boolean variable to 1, and thus cuts the solution space down by half.
If the solver finds a solution, then we return it. If not, then we try again, this time allowing longer_loop_needed to be true. If it still can’t find a solution, then we know no solution exists, since i and j were set to arbitrary values. By “arbitrary”, I mean that, at compilation time, no constraints will connect the later usages of i and j (there are none in this example) with the earlier usages.
I admit that this approach is ugly, but the alternative, solving an interpreter trace, is even more expensive. The hacks are worth it, at least until somebody proves P == NP.
Some of the examples I gave in the first section used eval. Partial evaluation techniques can be used to make these examples more tractable.
I’ve only talked about sat solvers. You can probably get better results with an smt solver or a domain-specific constraint solver.
In thinking about this problem, I’ve realized that there are several parallels between compilers and sat solvers. Constant reduction in a compiler does the same work as the unit clause heuristic in a sat solver. Dead code removal corresponds to early termination. Partial evaluation reduces the need for symmetry breaking. Memoization corresponds to clause learning. Is there a name for this correspondence? Do compilers have an analogue for the pure symbol heuristic? Do sat solvers have an analogue for attribute grammars?
Today
If you want to use languages which are on the evolutionary path toward the language of the future, you should consider C# 4.0, since it is the only mainstream language I know of that comes with a built-in theorem prover.
Update (2013-11-24):
I am happy to report that I am not alone in having these ideas. “Search-assisted programming”, “solver-aided languages”, “computer-augmented programming”, and “satisfiability-based inductive program synthesis” are some of the names used to describe these techniques. Emina Torlak has developed an exciting language called Rosette, which is a dsl for creating solver-aided languages. Ras Bodik has also done much work combining constraint solvers and programming languages. The ExCAPE project focuses on program synthesis. Thanks to Jimmy Koppel for letting me know these people exist.
^{1}: Even many computer scientists do not seem to appreciate how different the world would be if we could solve NP-complete problems efficiently. I have heard it said, with a straight face, that a proof of P = NP would be important because it would let airlines schedule their flights better, or shipping companies pack more boxes in their trucks! One person who did understand was Gödel. In his celebrated 1956 letter to von Neumann, in which he first raised the P versus NP question, Gödel says that a linear or quadratic-time procedure for what we now call NP-complete problems would have “consequences of the greatest magnitude.” For such a procedure “would clearly indicate that, despite the unsolvability of the Entscheidungsproblem, the mental effort of the mathematician in the case of yes-or-no questions could be completely replaced by machines.” But it would indicate even more. If such a procedure existed, then we could quickly find the smallest Boolean circuits that output (say) a table of historical stock market data, or the human genome, or the complete works of Shakespeare. It seems entirely conceivable that, by analyzing these circuits, we could make an easy fortune on Wall Street, or retrace evolution, or even generate Shakespeare’s 38th play. For broadly speaking, that which we can compress we can understand, and that which we can understand we can predict. — Scott Aaronson