2024-05-09

The Penultimate Interview Question

To be clear, this is an excessively nerdy troll interview question. It is designed to sound solvable in an interview, but is actually proven entirely impossible. If you just want to see the parts needed for presenting the problem in an interview, read Introduction to Lambda Calculus, Operational Semantics, and then the beginning of each problem linked below.

Motivation
The Problem
Introduction to Lambda Calculus
Not the Problem
Operational Semantics
Problem 1
Problem 2
Problem 3
Problem 4
Problem 5
Bonus Problem

Motivation

Tech interview questions tend to be one of two kinds, the excessively technical questions, or the no right answer questions. The technical sort of question likely has little bearing on the day to day work. Being able to implement a binary search tree is unlikely to help one figure out why some div refuses to center. But figuring out if someone is actually good at the job is hard, so ability to solve leet code problems is a go to proxy metric.

The no right answer questions tend to be along the lines of, can you explain this gap in your resume? Interviewers will even ask, without a hint of irony, what is your greatest weakness? I have my doubts about the utility of no right answer questions, but far be it from me to challenge tradition.

At some point, I was pondering Gödel’s incompleteness theorem, when it occurred to me, we could do better. We could craft a question where not only are there no right answers, but any answer given can be incontrovertibly demonstrated incorrect. It goes without say that such a question will be excessively technical.

It took me some time to hit upon a suitable question. The question needs to appear answerable in an interview, and only reveal itself to be a trap once sprung. One day I realized, I had a perfect candidate - lambda calculus. It is the ultimate interview question. Unless there is somehow a way to make a better one. Just in case, I will call it the penultimate interview question. (Ok, fine, I just like the word penultimate)

The Problem

Lambda calculus was developed in the 1930s by Logician Alonzo Church as a tool to investigate computability and the foundations of mathematics. The system is so simple to define, it is easy to think you have missed something. But on closer examination, it turns out to be devilishly complex. It’s like a cursed Conway’s game of life. In fact, it turns out to be impossible to always determine if two lambda calculus expressions are equivalent. The problem of determining the equivalence of lambda calculus expressions is equivalent to the halting problem. This is the question we will pose in the interview - write a program which determines if two lambda calculus expressions are equivalent.

Introduction to Lambda Calculus

We will start by presenting the interviewee with a short introduction to the syntax. Lambda calculus has variables, function definitions, and function application. A function is defined as λ<variable>.<body>. When the function is applied to an argument, every instance of the variable in the function body is replaced with the argument. The act of replacing variables with an argument is called reduction. To apply a function to an argument, we simply place the argument after the function. Below is the lambda calculus syntax in Backus-Naur form.

<expression> ::= <variable> | <function> | <application> | (<expression>)
<function> ::= λ<variable>.<expression>
<application> ::= <expression> <expression>

Note:

I always put parenthesis to disambiguate associativity and scope of function bodies. If a function body is not wrapped in parentheses, it is only one variable.
In many places, including the original work, functions are called abstractions.

To make this less abstract, we can present a few examples of lambda expressions and reductions:

λx.x - the identity function. Here x is both the variable and the expression body.
(λx.x) a -> a - the identity function applied to a reduces to just a.
λx.(x x) - this is a function which applies the argument to that same argument.
(λx.(x x)) a -> a a - the above function applied to a reduces to a applied to a.
λs.(λz.(s (s z))) - this function takes two arguments and applies the first argument two times to the second argument. Technically λs takes one argument and returns the λz function which takes one argument. This trick of making multi argument functions is called currying.
(λs.(λz.(s (s z))) x) y -> (λz.(x (x z))) y -> x (x y) - the above function applied to x and y reduces to x applied to y twice.

Not the Problem

Note that we have used parenthesis to disambiguate the expressions. While there are conventions around the scope of functions and the associativity of applications, we are only interested in hitting the interviewee with tricky things which are impossible. Getting the details of scope and associativity is tricky, but uninteresting.

Another uninteresting problem is expressions which are the same except for variable names. λx.x is equivalent to λy.y, but we will be careful to never present expressions which reduce to such a pair. Checking for this sort of equivalence would take precious interview time, but is, without a doubt, solvable.

The trickiest and least interesting problem to avoid is name conflicts. Lambda calculus notation has a significant issue with name conflicts. To give an example of a name conflict, consider the expression λz.(λs.(λz.(s z)) z). The first thing you might notice is that there is a λz inside the body of the outer λz. In situations like this, the inner λ overrides the outer λ. But this situation has another problem. Applying the λs function to z will put an outer scope z into the scope of the inner λz. To solve this, we would need to do a rename, perhaps to λz.(λs.(λz'.(s z')) z) which reduces to λz.(λz'.(z z')). We will be careful to only present expressions where the most naive approach to name conflicts will suffice.

Operational Semantics

To keep the interview on track, we are going to specify a reduction strategy. Below are the rules. Note the => represents a single reduction step, and everything can “reduce” to itself

x => x, meaning variables reduce to themselves
λx.M => λx.M' if M => M', meaning we reduce inside functions.
(λx.M)N => M'[x := N'] if M => M' and N => N' (M'[x := N'] means all the x-s in M' have been replaced with N'), meaning we reduce both the function and the argument, and then apply, all in one step.
MN => M'N' if M => M' and N => N' (and M is not a lambda), meaning we reduce both side of an application if we can’t apply because the left side is not in the form of a function.

The reason we must specify a reduction strategy is to make sure the interviewee always finds the normal form if it exists and finds it with the runtime properties we expect. The Church-Rosser theorem asserts that there can only be one normal form for an expression. However, the order in which expressions are reduced can change if the normal form will be found. For example, consider the expression λx.y ((λx.xx)(λx.xx)). If you always try to completely evaluate the argument before substituting, you will never find the normal form. If on the other hand, you reduce the outer application first, this will converge in one step to y.

The reduction strategy above is the maximal parallel β-reduction strategy. I selected it because it is one of the strategies which will always find the normal form if it exists and because it fits with my impatient nature.

The prompts

To start, we will let the interviewee test their solution with the example expressions, but it will quickly get harder from there.

Problem 1

λx.(x x) λx.(x x)
(λx.x λx.x) λx.x

This is our first trick, the omega combinator. The first expression can reduce, but the reduction is to itself. Some implementations will keep reducing this forever. The second expression looks similar, but simply reduces to λx.x. While we wait for their program to never finish, we can have the interviewee try to understand what is happening. Once they realize what has gone wrong, they quite possibly will have their program stop reducing when the reduction causes no change. This sets us up for our next trick.

Problem 2

((λx.(λy.((x y) x)) λa.(λb.((a b) a))) λx.(λy.((x y) x)))
(((λx.(λy.(λz.(((x y) z) x))) λa.(λb.(λc.(((a b) c) a)))) λm.(λn.(λp.(((m n) p) m)))) λx.(λy.(λz.(((x y) z) x))))

Both those expressions are cycles. The first is a 2-cycle and the second is a 3-cycle. The previous problem was technically a 1-cycle. While there are many clever cycle detection algorithms, even an easy solution, such as maintaining a set of previously seen expressions, should do. The next problem will take things in a different direction.

Problem 3

(λf.((f f) f) λf.((f f) f))
(λf.(f f) λf.((f f) (f f)))

The above two expressions both grow forever. The 2nd starts slower, but then grows faster. These expressions are analogous to fork bombs. While it may be possible to use induction to recognize many such diverging expression, I suspect most interviewees will simply put some arbitrary cap on the number of reduction steps, the time to reduce, or the expression size. But of course, putting an arbitrary cap has a problem.

Problem 4

This is the point where the interviewee has unknowingly chose their own adventure. If the interviewee has capped reductions steps, jump to 4A. If they have chosen to cap expression size, go to 4b. If they have chosen to cap reduction time, go to 4C.

Problem 4A - Arbitrary Number of Steps

b
(((λg.(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((g g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) λf.(λa.(f a))) λx.x) b)

Those two expressions are equivalent, but the 2nd expression will take 100 steps to evaluate. To make it take more or less steps, add or remove gs. Each g adds one step. There are several components to the 2nd expression. Lets first review them individually, and then consider them together. The simplest lambda term is of course the lone variable b. Slightly more complicated, we have the id function λx.x. It takes a single argument and returns it. The first interesting term is λf.(λa.(f a))), which I am going to call the apply function. The apply function takes a two arguments and applies the first to the second. Now if you apply the apply function to another function, you get back a function which is equivalent to the argument function. So if you apply the apply function to itself, you get back an apply function. How this differs from simply using the id function is that this takes two reduction steps to get back the original function. The final component is the function which takes an argument and applies it to itself a bunch of times with left associativity. Lets consider it with just a few gs - λg.((g g) g). g g is an application, not an function, so it can not be applied to the outer g as is. If g g reduces to a function, it can then be applied to the outer g. If we add one more g, λg.(((g g) g) g), the same logic applies to the new outermost g. (g g) g is an application, so it can not be applied to the outermost g. When we combine λg.((g g) g) with the apply function, we get the apply function applied to itself repeatedly. It will reduce to a single apply function, but it will require n steps to reduce, where n is the number of self applications. When we pass the ID function to the apply function, we get the ID function. The final step is to pass b to the ID function, which then gives us b.

Problem 4B - Arbitrary Expression Size

b
(((((λs.(λz.(s (s z))) λe.(λo.(e (e o)))) λr.(λy.(r (r y)))) λa.(λb.(a (a b)))) λx.x) b)

Those two expressions are equivalent. The 2nd takes 3 steps to reduce, yet, on step 2, is 391903 characters long. It is an embedding of ((2^2)^2)^2 applied to the ID function and b. If the interviewee set a size cap we can always increase it by making one of those numbers bigger or adding yet another exponentiation.

To understand how this works, we need to explain embeddings. The idea of embeddings in lambda calculus is to create lambda expressions which shares important properties with the embedded phenomena.

To embed natural numbers, we can use church numerals. A church numeral n, takes two arguments and applies the first argument to the second argument n times. Zero is λs.(λz.z), 1 is λs.(λz.(s z)), 2 λs.(λz.(s (s z))), etc.

The easiest arithmetic operation to perform on church numerals is exponentiation. One number applied to another number reduces to the second raised to the power of the first.

I think the best way to explain why is to simply demonstrate it. Below are the steps in reduction of 2^3 and 3^2. I use different variables for the two different numbers to avoid name conflicts.

Three raised to two:

3^2
(λr.(λy.(r (r y))) λs.(λz.(s (s (s z)))))
λy.(λs.(λz.(s (s (s z)))) (λs.(λz.(s (s (s z)))) y))
λy.(λz.(λz.(y (y (y z))) (λz.(y (y (y z))) (λz.(y (y (y z))) z))))
λy.(λz.(y (y (y (y (y (y (y (y (y z))))))))))
9

Two raised to three:

2^3:
(λs.(λz.(s (s (s z)))) λr.(λy.(r (r y))))
λz.(λr.(λy.(r (r y))) (λr.(λy.(r (r y))) (λr.(λy.(r (r y))) z)))
λz.(λy.(λy.(λy.(z (z y)) (λy.(z (z y)) y)) (λy.(λy.(z (z y)) (λy.(z (z y)) y)) y)))
λz.(λy.(z (z (z (z (z (z (z (z y)))))))))
9

Now that we have exponentiation, we can use it to create a very large number. To collapse this number we can apply it to the ID function which will reduce the number itself to an ID function. We then apply that to b to get the final form.

Problem 4C - Arbitrary Run Time

λs.(λz.z)
(λx.((λn.(λm.((m λm.(λp.(p λt.(λf.t)) ((m λp.((λl.(λr.(λb.((b l) r))) (λp.(p λt.(λf.f)) p)) (λn.(λs.(λz.((n s) (s z)))) (λp.(p λt.(λf.f)) p)))) ((λl.(λr.(λb.((b l) r))) λs.(λz.z)) λs.(λz.z))))) n)) x) x) λs.(λz.(s (s (s (s (s (s (s (s (s (s (s z)))))))))))))

Those two expressions are equivalent. The first is an embedding of zero. The second is an embedding of f(n) = n - n applied to 11. The 2nd expression takes 35-40 seconds to evaluate on my laptop with the reducer I wrote (though only 2 seconds if I omit name conflict checks). Adding another application of s to z at the end of the second expression would make it take significantly longer.

To understand how this works, and why it is so slow, we need to understand how subtraction works on church numerals. Subtraction is the hardest arithmetic operation I know of. To understand it, we first need embeddings of booleans, ordered pairs, the successor function, and the predecessor function.

The embedding of booleans is relatively simple. True is a function which takes two arguments and returns the first one - λt.(λf.t). False takes two arguments and returns the second - λt.(λf.f).

Pair is a function which takes two items, for left and right, and a boolean for selecting an item from the pair - λl.(λr.(λb.((b l) r))). A function for selecting the first item of a pair is simple a function which takes a pair and applies it to true - λp.(p λt.(λf.t)). Similarly, the function for getting the second item of a pair takes a pair and applies it to false, λp.(p λt.(λf.f)).

The successor function, λn.(λs.(λz.((n s) (s z)))), takes a number n, an s, and a z. It then swaps n’s s with another s and swaps n’s z for s z. This reduces to a new church numeral with one more application of s to z.

The predecessor function is a little bit of a mind bender. It takes a number n as you might expect. What is swapped in for n’s s and z is interesting. For n’s z, we swap in a pair of zeros:

((λl.(λr.(λb.((b l) r)))
        λs.(λz.z))
        λs.(λz.z))

For n’s s, we swap in a function which takes a pair of numbers, selects the second number, then returns a new pair where the first item is the selected number unchanged, and the second item is the successor of the selected number:

λp.
    ((λl.(λr.(λb.((b l) r)))
        (λp.(p λt.(λf.f)) p))
        (λn.(λs.(λz.((n s) (s z)))) (λp.(p λt.(λf.f)) p)))

What this does is make nth application of what was previously s return a pair, (n-1, n). The final step is to take the first item of the pair resulting from the applications. All together:

λm.(
   λp.(p λt.(λf.t)) (
       (m
           λp.(
               (λl.(λr.(λb.((b l) r)))
                   (λp.(p λt.(λf.f)) p)) (λn.(λs.(λz.((n s) (s z))))
                   (λp.(p λt.(λf.f)) p))
               )
           ((λl.(λr.(λb.((b l) r)))
               λs.(λz.z))
               λs.(λz.z)
       )
   )
)

You can see on the second line, λp.(p λt.(λf.t)) for getting the first item.

Now that we have a predecessor function, we can do repeated applications of predecessor to get subtraction. To get m - n, we apply m to the predecessor function and n

Problem 5, Arbitrary time, steps, and size

Something noticeable about 4A and 4C is they shrink most of the time they run. Should the interviewee get the idea of only canceling the calculation if the calculation is exceeding some combinations of limits, we can create an expression which allows us to arbitrarily increase any metric. The below two expressions are equivalent.

b
((((((((λg.(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((g g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) g) λf.(λa.(f a))) λx.x) λe.(λo.(e (e o)))) ((λf.(λx.(f λy.((x x) y)) λx.(f λy.((x x) y))) λf.(λv.((((λl.(λm.(λn.((l m) n))) (λn.((n λx.(λt.(λf.f))) λt.(λf.t)) v)) λx.(λs.(λz.(s z)))) λx.((λn.(λm.(λs.(n (m s)))) v) (f (λm.(λp.(p λt.(λf.t)) ((m λp.((λl.(λr.(λb.((b l) r))) (λp.(p λt.(λf.f)) p)) (λn.(λs.(λz.((n s) (s z)))) (λp.(p λt.(λf.f)) p)))) ((λl.(λr.(λb.((b l) r))) λs.(λz.z)) λs.(λz.z)))) v)))) λx.x))) λs.(λz.(s (s (s z)))))) λr.(λy.(r (r y)))) λa.(λb.(a (a b)))) λx.x) b)

This combines elements of 4A and 4B and a factorial function which is very expensive. In sudo code, the numerical component is returnValueAfterNSteps(200, 2)^3!^2^2 It will step very slowly at the start, then quickly go through 200, steps, then explode in size, before collapsing to just b. Increasing the argument to the factorial function, λs.(λz.(s (s (s z)))), would drastically increase runtime and max size. Increasing one of the other numbers, such as λr.(λy.(r (r y))) would drastically increase max size. adding more gs will add more steps. No combination of limits can be set.

This prompt introduces a factorial function. To understand that, I need to explain how to embed multiplication, if-else, and recursion.

Multiplication is relatively direct - λn.(λm.(λs.(n (m s)))). It is a function which takes two numbers and an s. All the ss in n are replaced with m s. The m’s internal ss are replaced with the outer s. Then n’s z replaces the z of the remnant of the innermost m. That innermost remnant of m then replaces the z of the second to innermost remnant of m. To show it in action, here are steps of reducing 2*2:

((λn.(λm.(λs.(n (m s)))) λs.(λz.(s (s z)))) λs.(λz.(s (s z))))
λs.(λs.(λz.(s (s z))) (λs.(λz.(s (s z))) s))
λs.(λz.(λz.(s (s z)) (λz.(s (s z)) z)))
λs.(λz.(λz.(s (s z)) (s (s z))))
λs.(λz.(s (s (s (s z)))))

Another embedding we will need for factorial is an if-else function. That is a function which takes a boolean and two expressions. If the boolean is true, it returns the first expression. Otherwise it returns the second expression. Our embedding of boolean is already such a function. We are just going to put a wrapper around it:

λl.(λm.(λn.((l m) n)))

To use the if-else function, we will need a test for zero. To test if a number n is zero, we can replace z with true, and replace the ss with a function which ignores the argument and returns false. All together:

λn.(
   (n
       λx.(λt.(λf.f)))
       λt.(λf.t)
   )

The final piece we need for factorial is a way to embed the recursive definition. To do this, we use something called a fix-point combinator. The most famous fix-point combinator is the Y-combinator λf.(λx.(f (x x)) λx.(f (x x))). It takes a function f, and creates λx.(f (x x)) applied to itself. Those xs inside will be replaced with the expression itself, λx.(f (x x)). Since those two xs are applied, one to the other, this results in recreating the original pattern, but as an argument to f. The way the recursion stops is when f returns a constant without evaluating x x. This however will only work with lazy evaluation. We can make a small change to the Y-combinator to make it evaluate lazily even when evaluated by an eager reducer. We just wrap x x in a dummy lambda, λy.((x x) y). That expression is irreducible from the outside, but once it is applied to something, the wrapper goes away, exposing the reducible x x. All together, our fix-point combinator is:

λf.(
    λx.(f λy.((x x) y))
    λx.(f λy.((x x) y))
)

To use the fix-point combinator to imbed recursion, we need to apply it to a function with the right properties. This function should take another function, and some value. If the value matches the base case, return some fixed value. Otherwise, make a call to the passed in function. We also again need to take steps to keep things lazy. Each path needs to be wrapped in a dummy function which ignores it’s argument and returns the expression itself. Together the form is:

λf.(λv.((((
    λl.(λm.(λn.((l m) n)))
        <predicate of v>)
        λx.<base expression>)
        λx.<expression using f and v>)
    )
    λx.x -- this is just a dummy value to open the selected path
)

For factorial, our predicate is testing if our value v is zero:

(λn.((n λx.(λt.(λf.f))) λt.(λf.t)) v)

Our base expression is simply one:

λs.(λz.(s z))

Our common case expression is to multiply v by f of the predecessor of v:

(λn.(λm.(λs.(n (m s)))) -- multiplication function
    v) -- our value
    (f -- recursive call
        (λm.(λp.(p λt.(λf.t)) ((m λp.((λl.(λr.(λb.((b l) r))) (λp.(p λt.(λf.f)) p)) (λn.(λs.(λz.((n s) (s z)))) (λp.(p λt.(λf.f)) p)))) ((λl.(λr.(λb.((b l) r))) λs.(λz.z)) λs.(λz.z)))) -- predecessor function
            v)) -- our value

All together, the function which we will pass to our fix-point combinator is:

λf.(λv.((((λl.(λm.(λn.((l m) n))) (λn.((n λx.(λt.(λf.f))) λt.(λf.t)) v)) λx.(λs.(λz.(s z)))) λx.((λn.(λm.(λs.(n (m s)))) v) (f (λm.(λp.(p λt.(λf.t)) ((m λp.((λl.(λr.(λb.((b l) r))) (λp.(p λt.(λf.f)) p)) (λn.(λs.(λz.((n s) (s z)))) (λp.(p λt.(λf.f)) p)))) ((λl.(λr.(λb.((b l) r))) λs.(λz.z)) λs.(λz.z)))) v)))) λx.x))

Bonus problem

Should the interviewee see through our tricks and call us out, we can present them a different problem. Does the below reduce to one when applied to any church numeral (other than zero)?

(λf.(λx.(f λy.((x x) y)) λx.(f λy.((x x) y))) λg.(λn.((((λl.(λm.(λn.((l m) n))) (λm.(λp.(p λt.(λf.t)) ((m λp.((λl.(λr.(λb.((b l) r))) (λp.(p λt.(λf.f)) p)) λt.(λf.f))) ((λl.(λr.(λb.((b l) r))) λt.(λf.t)) λt.(λf.t)))) n)) λx.(λs.(λz.(s z)))) λx.(g (λm.(λt.(((λp.(p λu.(λv.(λw.u))) t) (λp.(p λu.(λv.(λw.v))) t)) (λp.(p λu.(λv.(λw.w))) t)) ((m λt.(((λl.(λm.(λr.(λt.(((t l) m) r)))) (λb.((b λt.(λf.f)) λt.(λf.t)) (λp.(p λu.(λv.(λw.u))) t))) ((((λp.(p λu.(λv.(λw.u))) t) λn.(λs.(λz.((n s) (s z))))) λx.x) (λp.(p λu.(λv.(λw.v))) t))) ((λn.(λm.(λs.(λz.((n s) ((m s) z))))) λs.(λz.(s (s (s z))))) (λp.(p λu.(λv.(λw.w))) t)))) (((λl.(λm.(λr.(λt.(((t l) m) r)))) λt.(λf.t)) λs.(λz.z)) λs.(λz.(s z))))) n))) λx.x)))

It is an embedding of the Collatz conjecture

Leaving Note

Developing these prompts was challenging. My main resource was Types and Programming Languages by Pierce. I did my experimentation and prompt generation in Haskell. These are the general notes and I also threw together a quick cycle detection module to verify my expressions cycled as I expected.