DEV Community

Maow


A Deep Dive into Lambdas

Introduced in Java 1.8 (Java 8), the lambda expression (a.k.a. closure, anonymous function) is an important piece of syntax: it keeps code clean, reduces repetition, and adds expressive power.

It gives the language the ability to store a function as a value, which in turn makes it possible to implement higher-order functions, i.e. functions that take or return other functions. (Sibling JVM languages such as Kotlin and Scala have lambdas of their own, and their compilers can build on the same runtime machinery described below.)

But how does it all work underneath the surface?

Functional Interfaces

As Java is a statically typed language, a lambda needs to have a type. That type is a functional interface, whose single abstract method fixes the lambda's parameter and return types: Function<String, String>, for example, is analogous to String apply(String).

What exactly makes a functional interface so special?

They have a single restriction: they may contain any number of default or static methods, but exactly one abstract method. This is why functional interfaces are sometimes referred to as "SAM (Single Abstract Method) interfaces."

This restriction exists because the compiler needs to know which method the lambda implements, for instance, get in Supplier or apply in Function. With more than one abstract method, there would be no way to tell which of them a lambda body was meant to implement.

Small example of a custom functional interface:

// The @FunctionalInterface annotation is
// used to tell the compiler to enforce the SAM restriction.
@FunctionalInterface
interface TokenProvider {
    ApiToken get();
}
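As a quick illustration, the sketch below assigns lambdas to Function<String, String>, Supplier<String>, and a copy of the TokenProvider interface. The original snippet never defines ApiToken, so it is stubbed out here as a hypothetical record; the class and field names are likewise made up for the example (records need Java 16+).

```java
import java.util.function.Function;
import java.util.function.Supplier;

class FunctionalInterfaces {
    // Hypothetical stand-in: the article never defines ApiToken.
    record ApiToken(String value) {}

    // Same shape as the article's TokenProvider interface.
    @FunctionalInterface
    interface TokenProvider {
        ApiToken get();
    }

    // Function<String, String> is analogous to String apply(String).
    static final Function<String, String> SHOUT = s -> s + "!";

    // Supplier<String> is analogous to String get().
    static final Supplier<String> GREETING = () -> "Hello, World!";

    // A lambda can target a custom SAM interface just like the built-in ones.
    static final TokenProvider PROVIDER = () -> new ApiToken("example-token");

    public static void main(String[] args) {
        System.out.println(SHOUT.apply("hi"));         // this lambda body implements apply
        System.out.println(GREETING.get());            // this one implements get
        System.out.println(PROVIDER.get().value());
    }
}
```

If you added a second abstract method to TokenProvider, the compiler would reject both the @FunctionalInterface annotation and the PROVIDER lambda, which is exactly the SAM restriction at work.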

The Low-Level

In order to talk about how lambdas really work, we first need a brief explanation of the Java Virtual Machine (hereafter referred to as the JVM).

When code in a JVM-based language (Java, Kotlin, etc.) is compiled, the compiler outputs files with the .class extension, typically one per class. Class files contain what is called bytecode. Bytecode is a term for compiled code, stored as raw bytes, that is executed by a virtual machine rather than directly by your CPU.

A virtual machine (not to be confused with VirtualBox, VMware, QEMU, etc.) is a program that processes and executes bytecode, optionally compiling that bytecode into optimized machine code that runs directly on the CPU. Each virtual machine handles its own bytecode language: the JVM handles JVM bytecode, which is designed specifically for the Java platform and contains information about classes, fields, methods, etc.

Bytecode Instructions

In the JVM, each method's code is made up of single-purpose operations called "instructions." Each instruction has a "mnemonic," which is that instruction's name in the bytecode language, and may take a set of arguments (operands).

For a simple example:

ldc "Hello, World!"

This is a single instruction with the mnemonic ldc, short for "load constant."
When it's run, it takes its argument (a constant from the class file's constant pool, such as a string, int, or float) and pushes it onto the "operand stack," a stack data structure that the JVM uses to hold the values being operated on.

When your code is compiled, each method body is turned into a list of instructions. When a method is run, the JVM starts at its first instruction and works its way down the list until it reaches a return (or throw) instruction, or a jump (such as the conditional jump an if statement compiles to), which sends the JVM to a different instruction from which it continues reading down the list.
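To make the operand stack and jumps concrete, here is a toy interpreter. It is a simplified model, not how the JVM is actually implemented: the mnemonics are borrowed from JVM bytecode, but the class names, the inline constants, and the absolute jump targets are hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class ToyStackMachine {
    // A tiny, hypothetical instruction set borrowing JVM mnemonics. Unlike the
    // real JVM, LDC carries its constant directly (not a constant-pool index)
    // and IFEQ carries an absolute target index (not a relative branch offset).
    enum Op { LDC, IADD, IFEQ, IRETURN }

    record Insn(Op op, int operand) {}

    static int run(List<Insn> code) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack
        int pc = 0;                                // index of the current instruction
        while (true) {
            Insn insn = code.get(pc);
            switch (insn.op()) {
                case LDC -> {                      // push a constant
                    stack.push(insn.operand());
                    pc++;
                }
                case IADD -> {                     // pop two ints, push their sum
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    pc++;
                }
                case IFEQ ->                       // pop; jump if the value is 0
                    pc = (stack.pop() == 0) ? insn.operand() : pc + 1;
                case IRETURN -> {                  // pop the result and stop
                    return stack.pop();
                }
            }
        }
    }

    // Equivalent to: return (0 == 0) ? 40 + 2 : 1;
    static final List<Insn> PROGRAM = List.of(
            new Insn(Op.LDC, 0),     // 0
            new Insn(Op.IFEQ, 4),    // 1: taken, since the popped value is 0
            new Insn(Op.LDC, 1),     // 2: skipped
            new Insn(Op.IRETURN, 0), // 3: skipped
            new Insn(Op.LDC, 40),    // 4
            new Insn(Op.LDC, 2),     // 5
            new Insn(Op.IADD, 0),    // 6
            new Insn(Op.IRETURN, 0)  // 7
    );

    public static void main(String[] args) {
        System.out.println(run(PROGRAM)); // prints 42
    }
}
```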

Lambda Bodies

You would probably be surprised to learn that the body of each lambda you declare in a class actually becomes a private method in that class (a static one, unless the lambda uses this). These lambda body methods are generated by the compiler and contain all the code placed in your lambda body.
With javac, they follow a specific naming scheme: lambda$<originating method>$<index>, where <index> is a counter starting at 0.
So, if the first lambda in a class is declared in its main method, its body becomes the lambda$main$0 method.
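One way to see these generated methods is with reflection. The sketch below assumes the class is compiled with javac (other compilers, such as Eclipse's, may choose different names); LambdaBodies and its methods are hypothetical names for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

class LambdaBodies {
    // The only lambda in this class. javac is expected to compile its body
    // into a private static synthetic method named lambda$makeGreeting$0.
    static Supplier<String> makeGreeting(String value) {
        return () -> value;
    }

    // Lists compiler-generated lambda body methods. Written with a plain loop
    // (no lambdas) so that it does not add lambda bodies of its own.
    static List<String> lambdaBodyNames() {
        List<String> names = new ArrayList<>();
        for (Method m : LambdaBodies.class.getDeclaredMethods()) {
            int mods = m.getModifiers();
            if (m.isSynthetic() && m.getName().startsWith("lambda$")
                    && Modifier.isPrivate(mods) && Modifier.isStatic(mods)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(makeGreeting("Hello, World!").get());
        System.out.println(lambdaBodyNames()); // with javac: [lambda$makeGreeting$0]
    }
}
```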

Keep all of this in mind! It will be very important later on.

Method Handles and java.lang.invoke

Java's Invocation API (java.lang.invoke) was introduced back in Java 1.7. It allows you to create "method handles," which are invokable references to a method or constructor. These differ from the Reflection API in many ways, the most well-known difference being that access checks are performed on method handle creation, rather than on each call like with Reflection.
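A minimal method handle sketch: it looks up String.concat and Math.max once (which is when access is checked) and then invokes the handles. The class and helper names are hypothetical; the java.lang.invoke calls are standard API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleDemo {
    // Looked up once; access checks happen here, at creation time.
    static final MethodHandle CONCAT;
    static final MethodHandle MAX;

    static {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        try {
            // String.concat(String) is an instance method: the receiver becomes
            // the handle's first parameter, so its type is (String,String)String.
            CONCAT = lookup.findVirtual(String.class, "concat",
                    MethodType.methodType(String.class, String.class));
            // Math.max(int, int) is static, so its type is (int,int)int.
            MAX = lookup.findStatic(Math.class, "max",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String concat(String a, String b) {
        try {
            // invokeExact requires the call's static types to match the
            // handle's type exactly, hence the (String) cast.
            return (String) CONCAT.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    static int max(int a, int b) {
        try {
            return (int) MAX.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("Hello, ", "World!")); // Hello, World!
        System.out.println(max(3, 7));                   // 7
    }
}
```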

Another type provided by the Invocation API is CallSite, which holds a method handle (called its "target"). Simple as it is, CallSite is very important, for reasons that will be explained shortly.
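A small sketch of the two most common CallSite flavors, using MethodHandles.constant to build trivial targets. CallSiteDemo and call are hypothetical names.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

class CallSiteDemo {
    // A ConstantCallSite's target can never change.
    static final CallSite FIXED =
            new ConstantCallSite(MethodHandles.constant(String.class, "fixed"));

    // A MutableCallSite can be re-pointed at a new target of the same type.
    static final MutableCallSite MUTABLE =
            new MutableCallSite(MethodHandles.constant(String.class, "first"));

    // dynamicInvoker() returns a handle that always calls the site's current target.
    static final MethodHandle MUTABLE_INVOKER = MUTABLE.dynamicInvoker();

    static String call(MethodHandle handle) {
        try {
            return (String) handle.invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(FIXED.getTarget()));  // fixed
        System.out.println(call(MUTABLE_INVOKER));    // first
        MUTABLE.setTarget(MethodHandles.constant(String.class, "second"));
        System.out.println(call(MUTABLE_INVOKER));    // second
    }
}
```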

invokedynamic

The core component of lambdas is the invokedynamic instruction. Like the other invoke instructions (invokestatic, invokevirtual, invokespecial, and invokeinterface), it calls a method.

However, unlike the other invoke instructions, the first time a given invokedynamic instruction is executed, the JVM calls a "bootstrap method." The bootstrap method returns a CallSite object that becomes permanently "bound" (linked) to that invokedynamic instruction, so it only needs to be retrieved once. The CallSite's target method handle is then invoked, on this and every later execution, and if it returns a value, that value is pushed onto the operand stack.

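This lets library code (the bootstrap method) decide at runtime what an instruction should call, instead of that decision being baked into the class file at compile time. For lambdas, that means the JDK can change how function objects are created without existing class files having to be recompiled, while the JVM can still optimize calls through an already-linked call site much like ordinary method calls.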
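Java source can't contain an invokedynamic instruction directly, so the sketch below only simulates the linkage behavior described above: a bootstrap method with the standard (Lookup, String, MethodType) shape is called once, its CallSite is cached, and every later call goes straight to the site's target. ToyIndy, greet, and the caching field are hypothetical; in real bytecode, the JVM performs this linking itself.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ToyIndy {
    static int bootstrapCalls = 0;

    // Standard bootstrap-method shape: (Lookup, String, MethodType) -> CallSite.
    // This one simply links the call to the static method named by `name`.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        bootstrapCalls++;
        return new ConstantCallSite(lookup.findStatic(ToyIndy.class, name, type));
    }

    static String greet(String who) {
        return "Hello, " + who + "!";
    }

    // Simulates ONE invokedynamic instruction (single-threaded sketch):
    // the site is linked on first use and reused afterwards.
    private static CallSite site;

    static String invokeSite(String arg) {
        try {
            if (site == null) {
                site = bootstrap(MethodHandles.lookup(), "greet",
                        MethodType.methodType(String.class, String.class));
            }
            return (String) site.getTarget().invokeExact(arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeSite("World"));
        System.out.println(invokeSite("JVM"));
        System.out.println("bootstrap calls: " + bootstrapCalls); // 1
    }
}
```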

But, invokedynamic is not the only piece of the puzzle.

The Lambda Metafactory

The java.lang.invoke package contains a very low-level class called LambdaMetafactory. Its job is to create method handles that return "function objects," which are objects that implement functional interfaces.

The way this is done is somewhat complicated, so bear with me.

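When javac compiles a lambda expression, it emits an invokedynamic instruction whose bootstrap method is LambdaMetafactory.metafactory (or, for serializable lambdas and a few other special cases, altMetafactory). That static method returns a CallSite after being given the following parameters (a few are left out here for simplicity):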

  • The name of the method to be implemented, for instance, "get" when implementing a Supplier.
  • The descriptor of the method to be implemented, meaning the return type and parameter types. For a Supplier<String>'s method, there would be a String return type but no parameter types.
  • The descriptor of the CallSite, where the parameter types are the captured local variables and the return type is the interface that should be implemented.
  • A handle to the implementation method, which is called in the interface's method. This is equivalent to the lambda body.
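The sketch below calls LambdaMetafactory.metafactory by hand to build the same kind of Supplier<String> as () -> value. It passes all six arguments, including the two left out of the list above: the caller's Lookup and the specialized method type of get. The hand-written lambdaBody method stands in for a compiler-generated lambda body; ManualMetafactory and lambdaBody are hypothetical names.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Supplier;

class ManualMetafactory {
    // Hand-written stand-in for a compiler-generated lambda body.
    // Its single parameter receives the captured value.
    private static String lambdaBody(String captured) {
        return captured;
    }

    // Roughly what `() -> value` asks the JVM to do via invokedynamic.
    @SuppressWarnings("unchecked")
    static Supplier<String> makeSupplier(String value) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle body = lookup.findStatic(ManualMetafactory.class, "lambdaBody",
                    MethodType.methodType(String.class, String.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "get",                                               // method to implement
                    MethodType.methodType(Supplier.class, String.class), // call site: captured String -> Supplier
                    MethodType.methodType(Object.class),                 // erased descriptor of Supplier.get
                    body,                                                // implementation (the "lambda body")
                    MethodType.methodType(String.class));                // get() specialized to return String
            // Invoking the target creates a function object that holds `value`.
            return (Supplier<String>) site.getTarget().invokeExact(value);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Supplier<String> msg = makeSupplier("Hello, World!");
        System.out.println(msg.get());                // Hello, World!
        System.out.println(msg.getClass().getName()); // a generated $$Lambda class; exact name varies by JDK
    }
}
```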

The most interesting part is the method handle that the metafactory puts into that call site.

It generates a class at runtime. That class implements the functional interface: its fields store any captured values, and its interface method calls the lambda body method, passing those values along. The metafactory then returns a CallSite whose method handle creates instances of that class (for a lambda that captures nothing, OpenJDK simply hands back one shared instance every time).
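A quick way to observe this, assuming OpenJDK: the JLS deliberately leaves the identity of lambda objects unspecified, so the shared-instance behavior is an implementation detail rather than a guarantee. FunctionObjects and its methods are hypothetical names.

```java
import java.util.function.Supplier;

class FunctionObjects {
    static Supplier<String> nonCapturing() {
        return () -> "Hello, World!";   // captures nothing
    }

    static Supplier<String> capturing(String value) {
        return () -> value;             // captures `value`
    }

    public static void main(String[] args) {
        // On OpenJDK, a non-capturing lambda's call site hands back one shared
        // instance, so this is expected to print true (not guaranteed by the JLS).
        System.out.println(nonCapturing() == nonCapturing());
        // Each evaluation of a capturing lambda creates a new function object
        // that stores its own copy of `value`, so this prints false.
        System.out.println(capturing("a") == capturing("a"));
        System.out.println(capturing("a").get() + capturing("b").get()); // ab
    }
}
```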

The class's bytecode is generated with the help of ObjectWeb ASM, a library that makes parsing and writing JVM bytecode relatively easy. The JDK ships its own internal, repackaged copy of ASM, which suits this job because it is fast and feature-complete. (More recent JDK releases have been migrating this code to the JDK's own ClassFile API.)

Finally, here is roughly what one of these function objects looks like when decompiled (the exact name and structure vary between JDK versions):

final class Test$$Lambda$1 implements Supplier<String> {
    private final String arg$1;

    private Test$$Lambda$1(String arg$1) {
        this.arg$1 = arg$1;
    }

    public String get() {
        return Test.lambda$main$0(arg$1);
    }
}
// Oh no, it's hideous.

That function object was generated from the following code:

import java.util.function.Supplier;

public class Test {
    public static void main(String[] args) {
        String value = "Hello, World!";
        Supplier<String> msg = () -> value;
        msg.get();
    }
}

End

That is basically all there is to the creation and invocation of lambdas. I'm personally fascinated by the unexpected complexity of Java's lambdas and the careful design that went into dynamic invocation, so I felt the need to share this information in a (somewhat) easy-to-digest way.

I hope you found this either fun, interesting, or useful. Bye!
