As part of our general security automation toolkit, we use a number of third-party cloud-based services that let us take advantage of the expertise and specialisation of those partners.
The services we use are:
- Probely
- Ghost Inspector
- Buildkite
- Sentry
Probely is a security scanning service that offers a variety of different scans. Lightning scans check endpoints for basic issues and we run these against production and staging environments before the work day starts and just before it ends.
We use the OWASP extension to the more in-depth scan and we run that several times a week. This scan is more akin to a manual penetration test (and there is a human element to the process, so it isn't fully automated). When we are asked for the result of our last penetration test, I now supply the most recent PDF export of the Probely report. So far this has kept auditors and researchers happy.
When we first ran Probely it discovered a problem in our AWS and application setup. We felt it was important to get a clean report, so we resolved the issue, and in general we try to resolve any issue it raises within two weeks of its discovery. Recently, though, we found that very few Probely customers have zero issues in their scans, and clients have been just as suspicious of a completely clean report. If I had known that, I might have left a low-priority issue in.
Having corrected various HTTP proxy issues, the most pernicious issues that Probely now finds for us are XSS-style issues. This has changed some of the ways we use our features.
We started using Ghost Inspector as a way of improving our integration testing and removing some of the headless tests that happen in our test runner system.
It has been such a life saver in terms of catching regressions that it would be terrifying not to have it or an equivalent system. It's rare now that we ship a regression all the way to production and have to revert. The vast majority are caught in the testing process.
Ghost Inspector allows you to capture test scenarios via a browser plugin, but the resulting CSS selectors tend to be very specific and brittle. The system comes into its own if you actually program the steps yourself.
Test steps can be abstracted into modules and reshared. The system has a pretty good screenshot comparison tool and keeps a complete history of runs and failures (again providing a great audit trail). Videos are available of what happened during each test run. It also provides a fake email system, which is something we thought we were going to have to build ourselves.
Ghost Inspector has changed some of the ways we develop the system. We have screens that are specifically designed to expose information to Ghost Inspector for verification. We've massively improved the structure of our DOM to make it more machine readable.
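To make "machine readable" concrete, here is a minimal sketch of the idea in Python. It assumes the pages mark verifiable values with a `data-test` attribute; that attribute name, the `HookCollector` class and the sample markup are all hypothetical illustrations rather than a description of our actual markup. The point is that a test can look values up by a stable name instead of by a long, layout-dependent CSS selector.

```python
from html.parser import HTMLParser


class HookCollector(HTMLParser):
    """Collect the text inside elements that carry a data-test attribute.

    The data-test attribute name is an assumption made for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.hooks = {}        # hook name -> collected text
        self._current = None   # hook currently being read, if any

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("data-test")
        if name is not None:
            self._current = name
            self.hooks[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.hooks[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current hook.
        self._current = None


def read_hooks(html):
    """Return a dict of hook name -> stripped text for the given HTML."""
    collector = HookCollector()
    collector.feed(html)
    return {name: text.strip() for name, text in collector.hooks.items()}


# Hypothetical page fragment: the classes can change freely with the layout,
# while the data-test names stay stable for the tests.
PAGE = """
<div class="col-md-4 panel panel-default">
  <span data-test="account-status">Active</span>
  <span data-test="invoice-total">42.00</span>
</div>
"""

hooks = read_hooks(PAGE)
```

In a browser-based tool the equivalent selector would be something like `[data-test="invoice-total"]`, which survives a redesign far better than a recorded selector chain.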
We also have some classic hot spots in our code, like primary keys used as page parameters, where it would be too hard to switch to opaque identifiers. Instead, we have regression tests that try to reach different pages by direct access to confirm that the access control system is working as intended. Barring a big rewrite, this is the only way we could guarantee that this potential security hole is not in fact a problem.
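The shape of such a direct-access check can be sketched in a few lines of Python. This is an illustration only, not our Ghost Inspector tests: the URL template, the record ids and the set of status codes treated as "denied" are all assumptions, and `fetch_status` stands in for whatever HTTP client is logged in as a user who should not see the records.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: the application answers a forbidden request with one of these
# codes. An app that redirects to a login page would need 302 added here.
DENIED_STATUSES = {401, 403, 404}


def foreign_record_urls(template: str, record_ids: Iterable[int]) -> List[str]:
    """Build direct-access URLs for records owned by some other account."""
    return [template.format(id=record_id) for record_id in record_ids]


def find_access_leaks(
    fetch_status: Callable[[str], int], urls: Iterable[str]
) -> List[Tuple[str, int]]:
    """Return (url, status) for every URL that was NOT denied."""
    leaks = []
    for url in urls:
        status = fetch_status(url)
        if status not in DENIED_STATUSES:
            leaks.append((url, status))
    return leaks


# Usage with a stand-in for the real HTTP client (hypothetical host and ids).
urls = foreign_record_urls("https://app.example.test/invoices/{id}", [101, 102])
canned_responses = {
    "https://app.example.test/invoices/101": 403,
    "https://app.example.test/invoices/102": 200,  # a leak
}
leaks = find_access_leaks(canned_responses.__getitem__, urls)
```

A regression suite would fail the run whenever `leaks` is non-empty.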
We've also seen that a lot of our clients use manual testing processes when checking our platform for vulnerabilities. Often they share their process when they do discover bugs, and we've been able to turn those into Ghost Inspector tests. We've been able to build a suite of tests based on the different expertise of the various companies we've worked with. It is now like we have a little team of virtual security researchers on our side!
Buildkite is the glue that sticks all our automation together. While advertised as a continuous integration (CI) platform, it is actually a general task automation system and is useful for any kind of scheduled or triggered task.
Buildkite does manage all our CI builds but it also handles deployments, refreshes of our pre-production environment and other tasks.
Within its pipelines it will deploy software and then use Ghost Inspector to verify the results of the deployment.
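A post-deployment verification step of this kind could look roughly like the following Python sketch. It assumes Ghost Inspector's v1 REST API, where a suite is executed by requesting `/suites/<suiteId>/execute/` with an `apiKey` parameter. The shape of the response used in `failed_tests` (a `code` field plus a `data` list of results with `name` and `passing`) is my assumption and should be checked against the API documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.ghostinspector.com/v1"


def suite_execute_url(suite_id, api_key, start_url=None):
    """Build the URL that triggers a suite run, optionally overriding the start URL."""
    params = {"apiKey": api_key}
    if start_url:
        params["startUrl"] = start_url
    query = urllib.parse.urlencode(params)
    return f"{API_ROOT}/suites/{suite_id}/execute/?{query}"


def failed_tests(response):
    """Return the names of failing tests from a decoded API response.

    Assumed response shape: {"code": "SUCCESS", "data": [{"name": ..., "passing": ...}]}
    """
    if response.get("code") != "SUCCESS":
        raise RuntimeError(f"suite execution failed: {response.get('message')}")
    return [
        result.get("name", "unknown")
        for result in response.get("data", [])
        if not result.get("passing")
    ]


def verify_deployment(suite_id, api_key, start_url=None):
    """Run the suite and return the failing test names (empty list means success)."""
    url = suite_execute_url(suite_id, api_key, start_url)
    with urllib.request.urlopen(url) as reply:
        return failed_tests(json.load(reply))
```

A pipeline step would call `verify_deployment` with credentials taken from the environment and exit non-zero if the returned list is not empty, which fails the build.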
Before we switched to Buildkite we were using Jenkins. This is a venerable tool, but there were two major issues for us with trying to run our own CI service.
Firstly, the truth is that running Jenkins effectively is hard. It relies on filesystem access; to get auditable controls you need to run plugins, which need to be updated and maintained; and the pipeline functionality is a late addition to the system rather than a core element.
Secondly, after attending a few secops conferences it was clear that Jenkins is one of the top targets researchers look for when hunting for bug bounties (the number one is, unsurprisingly, WordPress).
This means we had a high-profile target that we didn't have confidence in our ability to secure. If you are not using Jenkins, a number of drive-by hackers will just pass you by and move on to other organisations that are. We needed to make a change.
One of the key differences between Buildkite and CI systems like CircleCI is that you bring your own agents to Buildkite. We can run jobs in our own AWS accounts and assign our own security policies to the agents. This means we can completely isolate one cluster of agents without affecting the clusters that deploy to ECS, which need far more permissions.
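As an illustration of how steps get routed to differently privileged agents, the sketch below generates a small pipeline in Python. Buildkite steps can target agents by `queue` through the `agents` attribute, and pipelines can be supplied as JSON as well as YAML. The queue names, labels and commands here are hypothetical, and this is not our actual pipeline.

```python
import json


def build_pipeline(test_queue="isolated-tests", deploy_queue="deploy"):
    """Return a pipeline whose steps run on differently privileged agents.

    Queue names are assumptions: the test queue would map to agents with
    minimal IAM permissions, the deploy queue to agents allowed to update ECS.
    """
    return {
        "steps": [
            {
                "label": "Tests",
                "command": "make test",  # hypothetical command
                "agents": {"queue": test_queue},
            },
            "wait",  # later steps only start once the tests have passed
            {
                "label": "Deploy",
                "command": "make deploy",  # hypothetical command
                "agents": {"queue": deploy_queue},
                "branches": "main",
            },
        ]
    }


def render_pipeline(**queues):
    """Serialise the pipeline as JSON, ready to be uploaded by an agent."""
    return json.dumps(build_pipeline(**queues), indent=2)
```

The output of `render_pipeline()` could be piped to `buildkite-agent pipeline upload` from an initial step. Untrusted code then only ever runs on the low-privilege queue.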
We do also use CircleCI when what we want is pure CI or lightweight automation and the contents of the repo are low risk.
Sentry is primarily an exception reporting and aggregation service. However, because it also records who in your team has looked at an exception, it can be used to confirm that you make periodic reviews of the errors that occur on your system.
You should probably use an error aggregation service anyway, but it is worth looking at how you can also use it to deliver security objectives, for example by explicitly ignoring common but irrelevant errors so that unexpected issues stand out.
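One way to ignore known noise explicitly is a filter applied before events are sent. The sketch below is written in the shape of the `before_send` hook of Sentry's Python SDK, which receives an event and a hint and returns either the event or `None` to drop it. The list of ignored exception types is hypothetical, and the decision to look only at the most recent exception in a chain is a choice made for this sketch.

```python
from typing import Any, Dict, Optional

# Hypothetical list of exception types judged to be common but irrelevant.
IGNORED_TYPES = {"BrokenPipeError", "ClientDisconnected"}


def drop_known_noise(
    event: Dict[str, Any], hint: Optional[Dict[str, Any]] = None
) -> Optional[Dict[str, Any]]:
    """Return None for events whose latest exception is on the ignore list."""
    values = event.get("exception", {}).get("values", [])
    # Sentry orders chained exceptions oldest first, so the last entry is
    # the one that was actually raised to the top.
    if values and values[-1].get("type") in IGNORED_TYPES:
        return None
    return event


# With the SDK installed, the hook would be registered along these lines:
#   sentry_sdk.init(dsn="...", before_send=drop_known_noise)
```

Keeping the ignore list in code means every suppression is reviewed and versioned, so anything that still reaches Sentry is by definition unexpected.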