Serialization is the process of converting a data structure into a format, such as a series of bytes, that can be stored or transferred to other systems, machines, or applications. Deserialization is the reverse: reconstructing a data structure from a series of bytes, as when parsing a YAML or JSON configuration file.
When data is trusted, meaning it was created or is maintained by you, there is often little risk in deserializing it. Consider a scenario where your application loads a YAML configuration file: your application needs to deserialize this data and parse strings to set various configuration parameters. If you or your organization controls access to this configuration file, then there is little risk that it contains malicious data.
However, if your application collects data from a user, such as their first and last name, the risks increase. Your application will serialize this data into YAML to send to another service where it gets deserialized and processed. An attacker could store malicious data hidden within user fields, and if your application doesn’t use a proper library to safely deserialize it, then an attacker could run arbitrary code, steal data, or otherwise abuse your application.
This often happens because a developer doesn't realize that a user has control over the data that they later will be deserializing for use in other parts of the application.
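As a rough illustration of the scenario above, the sketch below uses Ruby's standard yaml library. The AdminFlag class and the field names are hypothetical and exist only for this example. It shows YAML.safe_load accepting plain user data while refusing a payload whose field asks the parser to build a Ruby object.

```ruby
require "yaml"

# Hypothetical class, standing in for any class loaded into the process.
class AdminFlag; end

# Ordinary user data: only strings, so safe_load accepts it.
plain = YAML.safe_load("first_name: Alice\nlast_name: Smith\n")

# Hypothetical malicious input: the last_name field carries a YAML tag
# asking the parser to instantiate a Ruby object instead of a string.
payload = <<~YAML
  first_name: Alice
  last_name: !ruby/object:AdminFlag {}
YAML

rejected = false
begin
  YAML.safe_load(payload)
rescue Psych::DisallowedClass
  # safe_load refuses classes that are not explicitly permitted.
  rejected = true
end

puts plain.inspect
puts "payload rejected: #{rejected}"
```

A loader that does not restrict classes would instead try to build the object named in the tag, which is the opening an attacker looks for.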
While some static analyzers can use pattern matching to see if you are using specific unsafe deserialization libraries (for example, looking for calls to Marshal.load(), which is inherently unsafe), these tools very easily miss scenarios where the source of the data is not easily observed. Also, not every use of these methods is unsafe, so you can end up with many false positives.
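To make the false-positive problem concrete, here is a minimal sketch with two Marshal.load call sites that look identical to a pattern matcher. The Account class is hypothetical, and the "request" bytes are simulated locally rather than read from a real request.

```ruby
# Hypothetical class standing in for anything loaded into the Ruby process.
class Account
  attr_reader :role

  def initialize(role)
    @role = role
  end
end

# Call site 1: the bytes were produced by this same process, so loading
# them back is low risk.
trusted_bytes = Marshal.dump(Account.new("member"))
from_cache = Marshal.load(trusted_bytes)

# Call site 2: in a real application these bytes would arrive from a
# request. They are simulated here with a locally built value.
request_bytes = Marshal.dump(Account.new("admin"))
from_request = Marshal.load(request_bytes)

# Both calls rebuild whatever class the bytes name.
puts from_cache.class
puts from_request.role
```

The two calls differ only in where the bytes came from, and that is information a tool matching on the method name alone does not have.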
The best way to understand what your code is doing is to watch it execute at runtime. Doing this while the code is still in local development provides insight into its behavior, so you can identify security vulnerabilities and performance issues at the earliest possible moment.
In this example, we’ll analyze the runtime behavior of the OWASP Railsgoat project, which is a vulnerable Ruby on Rails application. This project demonstrates the OWASP Top-10 and is a great project for educating developers and security teams.
AppMap can already record web applications using a technique called remote recording. This project also supports Docker, so we'll run the application in a container and remote record it with AppMap.
Remote recording is helpful if your project lacks extensive tests, or when you want to record a specific user flow or functional test that isn’t represented within an already existing test.
To get started, make sure you already have AppMap installed in your project. You can follow the AppMap docs for setup instructions based on your application language.
To enable AppMap in our Ruby project container, we’re going to add APPMAP=true to the docker-compose.yml file to enable AppMap when this container starts running.
version: '2'
services:
  web:
    build: .
    command: bash -c "rm -f tmp/pids/server.pid && APPMAP=true bundle exec rails s -p 3000 -b '0.0.0.0'"
    volumes:
      - .:/myapp
    ports:
      - "3000:3000"
After saving this file, we’ll run docker-compose build to rebuild this container with AppMap enabled. After the container is built, running docker-compose up starts the service and forwards port 3000 from localhost into the container.
At this point, we can access http://localhost:3000 to navigate around this Rails application. If your Docker configuration is set up differently and you are not forwarding ports, you’d simply connect to the IP address of the Docker container to interact with the application.
NOTE - You'll need Node.js to run the commands below.
One way to record AppMaps is by running your test suite (if one exists): run npx @appland/appmap record test in the root of your project to kick off your tests with AppMap enabled.
In this scenario, we'll run a remote recording instead.
Run npx @appland/appmap record remote and you’ll be guided through the process of connecting to the running application to record your interaction.
You can also use the VS Code extension and click the record button or use the command palette to start a remote recording.
Whichever way we kick off a remote recording, we’ll enter the URL where our application is running: either localhost if you forwarded a port, or the IP address of your Docker container.
With the recording started, I'll navigate through the user password reset flow. It’s important to keep each remote recording focused on a single user interaction so that it doesn't grow needlessly large.
Going back to VS Code, I'll stop the AppMap recording and enter a name for the AppMap we recorded, calling it "User Password Reset". The AppMap then opens in VS Code.
The AppMap is scanned immediately by AppMap Analysis (you can enable early access here), which locates our security issue: a deserialization of untrusted data that would allow an attacker to exploit my web application.
By clicking on the issue, I'm taken to the trace view in the AppMap where this issue occurs, and I can see the parameters sent and the return values.
From here we can see that the code making this call is the PasswordResetsController#reset_password function, and I can navigate directly to that line of code from inside the AppMap.
According to the OWASP Railsgoat project the issue with this code is:
During the forgot password flow, after the user clicks on the reset email link, the application verifies the token then adds a Marshaled user object which is posted during the password reset.
As mentioned above, Marshal.load is inherently unsafe. The Ruby Standard Library documentation warns about this:
By design, ::load can deserialize almost any class loaded into the Ruby process. In many cases this can lead to remote code execution if the Marshal data is loaded from an untrusted source.
As a result, ::load is not suitable as a general purpose serialization format and you should never unmarshal user supplied input or other untrusted data.
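The flow described in the Railsgoat quote can be sketched roughly as follows. This is not the Railsgoat source: the User struct, the helper names, and the Base64 encoding of the form field are assumptions made for illustration.

```ruby
require "base64"

# Hypothetical stand-in for the application's user model.
User = Struct.new(:id, :email)

# When rendering the reset form, the user object is marshaled and
# embedded in a form field (hypothetical helper).
def encode_for_form(user)
  Base64.strict_encode64(Marshal.dump(user))
end

# When the form is posted back, the field is decoded and unmarshaled.
# Nothing here checks who produced the bytes (hypothetical helper).
def decode_from_form(field)
  Marshal.load(Base64.strict_decode64(field))
end

honest_field = encode_for_form(User.new(1, "alice@example.com"))

# The field travels through the browser, so the client can replace it
# with any marshaled object, here a different user record.
tampered_field = encode_for_form(User.new(2, "victim@example.com"))

puts decode_from_form(honest_field).email
puts decode_from_form(tampered_field).email
```

The server cannot tell the two fields apart, because Marshal.load rebuilds whatever object the posted bytes describe.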
To fix this issue, we should first investigate whether we need to accept a Marshaled user object from the UI at all. A safer solution is to serialize this user data as JSON before sending it to the backend. By default, JSON.parse produces only plain data types (hashes, arrays, strings, numbers, booleans, and nil) rather than arbitrary Ruby objects, so we can read the values we need as strings to reset the user password. Another option is to use YAML.safe_load and add the specific classes to be deserialized to an allow list.
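Both options might look something like the sketch below. The field names (user_id, reset_requested_on) and values are hypothetical, and a real fix would still need to verify that the posted user matches the reset token.

```ruby
require "json"
require "yaml"
require "date"

# Option 1: send plain data as JSON instead of a marshaled object.
# The user_id field is a hypothetical example.
form_value = JSON.generate({ "user_id" => 42 })
data = JSON.parse(form_value)
# Validate the one value we need; Integer() raises on unexpected input.
user_id = Integer(data.fetch("user_id"))

# Option 2: YAML.safe_load with an explicit allow list.
# Date is the only non-basic class permitted in this example.
yaml = "reset_requested_on: 2023-01-15\n"
settings = YAML.safe_load(yaml, permitted_classes: [Date])

# Without the allow list, the same document is rejected.
rejected = false
begin
  YAML.safe_load(yaml)
rescue Psych::DisallowedClass
  rejected = true
end

puts user_id
puts settings["reset_requested_on"].class
puts "rejected without allow list: #{rejected}"
```

In both cases the application decides which types may come out of deserialization, rather than leaving that choice to whoever supplied the bytes.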
In practice, depending on a static analyzer to find this problem brings both false negatives (missed vulnerabilities) and false positives (false alarms), and both are serious problems. False positives are especially damaging because they generate so many alarms that developers start ignoring everything, including true positives. Only with runtime analysis can you see the true source of the data and understand the risk.