Kyle Carter

Posted on Jan 18, 2022 • Edited on Sep 8, 2023 • Originally published at blog.scaledcode.com

Effective Java: Implement Serializable With Great Caution

#java #effective #serializable #architecture

In the last topic, we covered why we should avoid using the built-in serialization framework in Java. A big part of that serialization system is the _Serializable_ interface. This interface appears to deliver the magic promised by Java's serialization: simply add it (it requires no methods to be implemented) and all of a sudden you have serialization. Unfortunately, it is not that simple. While implementing the interface does enable serialization, there are many related concerns that the developer of the program must keep in mind. This post covers some of those concerns.

A major cost of implementing the Serializable interface is the loss of encapsulation of the class's internal data structure and thus a decrease in flexibility. Once you implement Serializable, the serialized output becomes part of your code's API and must not be changed without care. One way to reduce this risk is to create a custom serialized form of your data (an idea that will be discussed in a future item). If you don't take this mitigating action, then by default all of your private and package-private fields become part of the API.

If you change the internal structure of your class and someone then uses the new code to read an old object byte stream, deserialization will fail. You can try to account for internal changes by using ObjectOutputStream.putFields and ObjectInputStream.readFields, but these are far from a clean solution. Thus, if you are going to use Java serialization, you need to carefully design a serialized form of the class that you are ready to support for the long term.
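As a rough sketch of what that workaround can look like, the example below keeps an old serialized layout alive after the internal representation has changed. It uses ObjectOutputStream.putFields for writing and its reading counterpart, ObjectInputStream.readFields. The Timeout class, its "seconds" and "millis" fields, and the roundTrip helper are all hypothetical names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class PutFieldsDemo {

    // Hypothetical class: assume it once stored "int seconds" and now stores "long millis".
    static final class Timeout implements Serializable {
        private static final long serialVersionUID = 1L;

        // Pins the serialized form to the old layout: a single int named "seconds".
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("seconds", int.class)
        };

        private long millis; // current internal representation, never written directly

        Timeout(long millis) {
            this.millis = millis;
        }

        long millis() {
            return millis;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            ObjectOutputStream.PutField fields = out.putFields();
            fields.put("seconds", (int) (millis / 1000)); // translate to the old form
            out.writeFields();
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            millis = fields.get("seconds", 0) * 1000L; // translate back to the new form
        }
    }

    // Serializes and deserializes in memory, wrapping checked exceptions for brevity.
    static Timeout roundTrip(Timeout t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Timeout) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Timeout(5_000)).millis());
    }
}
```

Note the price of compatibility in this sketch: the old form only has second precision, so any sub-second part of millis is lost on the way through the stream. That kind of compromise is part of why this approach is far from clean.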

An example of the limitations on the evolution of a class imposed by serialization is the stream unique identifier, also known as the serial version UID. Every serializable class has an identifier that specifies which serialized version it is. You can set it manually by declaring a static final long field named serialVersionUID. If you don't specify one, it will be generated for you at runtime by applying a cryptographic hash (SHA-1) to the structure of your class. The generated value stays the same only as long as you don't change anything about that structure. The class name, the interfaces it implements, most of its members, and even synthetic members generated by the compiler all affect this identifier. So even if you make a change that shouldn't affect the serialization of the class, you could still be presented with an InvalidClassException at runtime when trying to read old data.
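A minimal sketch of declaring the identifier explicitly is shown below. The Point class and the helper methods are hypothetical and exist only for illustration. ObjectStreamClass.lookup is used to read back the identifier that the serialization runtime will actually use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerialVersionDemo {

    // Hypothetical value class used only for illustration.
    static final class Point implements Serializable {
        // Declared explicitly, so adding a method or recompiling does not change it.
        private static final long serialVersionUID = 1L;

        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int x() {
            return x;
        }

        int y() {
            return y;
        }
    }

    // Returns the identifier the serialization runtime associates with Point.
    static long effectiveUid() {
        return ObjectStreamClass.lookup(Point.class).getSerialVersionUID();
    }

    // Serializes and deserializes in memory, wrapping checked exceptions for brevity.
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(effectiveUid() + " " + copy.x() + "," + copy.y());
    }
}
```

With the field removed, effectiveUid() would instead return the hash computed from the class structure, which is the value that can shift under you after an otherwise harmless change.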

Another major cost of implementing Serializable is that it increases the likelihood of bugs and security holes. This was covered fairly extensively in the previous topic. Much of the concern comes down to deserialization acting as a backdoor through which instances of your class can be created. Because this constructor is hidden, it is easy to forget that you must validate the invariants of your class in this case too.
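To make the hidden constructor concrete, here is a small sketch of a class that re-checks its invariant inside readObject. The Range class and its lo <= hi invariant are hypothetical. The tamperedBytes helper simulates a malicious or corrupted stream by breaking the invariant from the enclosing class before serializing, which is only a stand-in for hand-crafted bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ValidatingReadObjectDemo {

    // Hypothetical class with the invariant lo <= hi.
    static final class Range implements Serializable {
        private static final long serialVersionUID = 1L;

        private int lo;
        private int hi;

        Range(int lo, int hi) {
            if (lo > hi) {
                throw new IllegalArgumentException("lo > hi");
            }
            this.lo = lo;
            this.hi = hi;
        }

        // Deserialization bypasses the constructor, so the check is repeated here.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (lo > hi) {
                throw new InvalidObjectException("lo > hi");
            }
        }
    }

    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates a tampered stream: the invariant is broken before the bytes are produced.
    static byte[] tamperedBytes() {
        Range r = new Range(1, 2);
        r.hi = 0; // legal here only because the enclosing class can reach private members
        return serialize(r);
    }

    // Returns true if deserializing the given bytes is rejected by the invariant check.
    static boolean isRejected(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (InvalidObjectException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid rejected:    " + isRejected(serialize(new Range(1, 2))));
        System.out.println("tampered rejected: " + isRejected(tamperedBytes()));
    }
}
```

Without the readObject method, the tampered bytes would deserialize into a Range that no constructor call could ever have produced.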

Yet another cost of serializability is the increased testing burden when making changes to the class. If you want a robust program, you need to verify not only that the business logic is sound, that previous bugs haven't regressed, and that your code is performant, but also that the serialized form still works across the versions of the class you have released. You can mitigate some of this burden by using a custom serialized form and by minimizing the number of versions of your class that exist in the wild, but the burden is still there.

Sometimes implementing Serializable cannot be avoided, but the decision should not be taken lightly. A class may need it because it participates in a framework that relies on serialization for object transmission or persistence, or because it is a component of another Serializable class. Once the decision to implement the interface is made, it is our responsibility to do it safely. Within the core libraries, value classes such as BigInteger and Instant have historically implemented Serializable, as have the collection classes. Classes that represent active, executing entities, such as Thread and thread pools, have not.

Classes designed for inheritance should rarely implement Serializable, and new interfaces should rarely extend it. If you break either rule, you put a heavy burden on the future users of your classes and interfaces. You may need to violate this rule if your class or interface exists solely to participate in a framework that requires serializability. Some examples from the core library are Throwable and Component. Throwable implements Serializable so that exceptions can be passed via RMI. Component implements it so that GUIs can be sent, saved, and restored (even though this ability was rarely used).

If you choose to implement a class that is both built for extensibility and serializable, there are a few items to be aware of. If there are any invariants that must hold for your fields, then you must ensure that no subclass can override the finalize method. You can do this by overriding it yourself and marking it as final. If you don't, you leave your class open to a finalizer attack. Also, if you have invariants that would be violated if the fields were reset to their default values, then you must add a readObjectNoData method:

// Called when the stream contains no data for this class in the object's hierarchy
private void readObjectNoData() throws InvalidObjectException {
  throw new InvalidObjectException("Stream data required");
}

This method was added in Java 4 to account for the edge case where a serializable superclass is added to an existing serializable class.

When you decide not to implement Serializable on a class built for extensibility, you also need to consider whether a subclass might reasonably need to implement it. This matters because deserializing a subclass instance requires the non-serializable superclass to have an accessible parameterless constructor. If there is no such constructor, subclasses must fall back on other patterns, such as the serialization proxy pattern, to succeed.
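The sketch below illustrates what that constructor is used for. Base and Sub are hypothetical classes: Base is not serializable, Sub is. On deserialization the runtime runs Base's parameterless constructor to initialize the superclass part and restores only Sub's own fields from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class NoArgSuperclassDemo {

    // Hypothetical extensible class that deliberately does not implement Serializable.
    static class Base {
        private final String label;

        // Accessible parameterless constructor: this is what deserialization of a subclass runs.
        protected Base() {
            this("default");
        }

        Base(String label) {
            this.label = label;
        }

        String label() {
            return label;
        }
    }

    // Hypothetical serializable subclass.
    static final class Sub extends Base implements Serializable {
        private static final long serialVersionUID = 1L;

        private final int value;

        Sub(String label, int value) {
            super(label);
            this.value = value;
        }

        int value() {
            return value;
        }
    }

    // Serializes and deserializes in memory, wrapping checked exceptions for brevity.
    static Sub roundTrip(Sub s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Sub) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Sub copy = roundTrip(new Sub("custom", 7));
        // The superclass state comes from Base(), not from the original object.
        System.out.println(copy.label() + " " + copy.value());
    }
}
```

The copy keeps value but loses the custom label, because nothing from Base is written to the stream. A subclass that needs the superclass state preserved has to handle that itself.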

Finally, inner (non-static) classes should not implement Serializable. They are implemented using compiler-generated synthetic fields that store references to the enclosing instance and the values of local variables from the enclosing scope. How these fields correspond to the class definition is unspecified, so the default serialized form of an inner class is ill-defined and should be avoided. Static member classes, however, don't have this issue.
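One visible symptom of those synthetic fields can be sketched as follows. All class names here are hypothetical. The inner class drags its enclosing instance into the object graph, so serializing it fails when the enclosing class is not itself serializable, while an equivalent static member class serializes without trouble.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class InnerClassDemo {

    // The enclosing class is deliberately not Serializable.
    private final String name = "outer";

    // Inner (non-static) class: holds a hidden reference to its enclosing instance.
    class Inner implements Serializable {
        private static final long serialVersionUID = 1L;

        String describe() {
            return "inner of " + name; // reads state of the enclosing instance
        }
    }

    // Static member class: no hidden reference, only the state it declares.
    static final class Nested implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String name;

        Nested(String name) {
            this.name = name;
        }

        String describe() {
            return "nested with " + name;
        }
    }

    static Inner newInner() {
        return new InnerClassDemo().new Inner();
    }

    // Returns false if some object in the graph is not serializable.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inner:  " + canSerialize(newInner()));
        System.out.println("nested: " + canSerialize(new Nested("outer")));
    }
}
```

Making the enclosing class Serializable would silence the failure but would also pull the whole enclosing object into the stream, which is rarely what you want.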

In summary, correctly implementing the Serializable interface is full of pitfalls. Unless you have a high level of control over your environment, with versioning and data inputs tightly constrained, you will be fighting an uphill battle. This only gets more challenging once inheritance is involved.
