This video is part of TechXchange: Developing High-Quality Software and TechXchange: Talks
What you’ll learn:
- What is fuzz testing?
- How fuzz testing works.
- What does fuzz testing have to do with Ada programming?
Fuzz testing, or fuzzing, is a way to automatically test applications. It can find errors from memory leaks to buffer overflows. It has garnered interest around safety and security and can be a complement to other testing methodologies, including unit testing.
AdaCore is known for its Ada and SPARK compilers that are used to develop safe and secure software for applications like avionics. The company is now looking into fuzz testing.
In the video, I talked with Paul Butcher, Senior Software Engineer at AdaCore, to find out more about fuzz testing.
Wong: Creating applications that run properly is always a challenge for developers, and testing is part of that development process. There’s a technique called fuzzing that probably some of you have heard about, but not most of you.
So, Paul, what is fuzzing? And why is it important to developers?
Butcher: A nice way of explaining fuzz testing is to consider how it differs from the more traditional forms of testing that people may be more familiar with. One particular form would be unit testing, where we take inputs and feed them into an application, and then we compare the outputs against a set of expected outputs. We call this verification testing.
Fuzz testing is different in the sense that it isn’t actually interested in the output of your system. It’s more about the behavior of your system. It’s an automated form of testing: we subject an application to a large number of inputs, then look at the application and try to detect any anomalies that it might be displaying. These tend to be memory safety issues.
Wong: So how does fuzz testing actually work? You say it’s automatic. Do I just flip a switch?
Butcher: It’s not quite as straightforward as flipping a switch, but the tools that are being developed are trying to make the setting up of fuzz-testing campaigns as simple as possible.
There are two main components to fuzz testing. One of them is the generation of input data, and clearly, we need to be doing this as fast as we possibly can. The way fuzz testing tends to generate the input data is that you provide your fuzzer with an initial set of input data, which we call the starting corpus, and then the fuzzer will mutate that data.
There are lots of different strategies for mutating data. It can be as simple as flipping individual bits within the binary representation of that data, or we can do more sophisticated things, like changing the data with an understanding of what that data represents.
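As a rough illustration of the simplest strategy mentioned here, the following Python sketch flips bits in a seed input. The function names are made up for this example and do not come from any particular fuzzer; real engines layer many more mutation stages on top of this idea.

```python
import random


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    mutated = bytearray(data)
    mutated[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(mutated)


def walking_bit_flips(seed: bytes):
    """Yield every single-bit mutation of seed, one bit position at a time."""
    for bit_index in range(len(seed) * 8):
        yield flip_bit(seed, bit_index)


def random_bit_flips(seed: bytes, count: int, rng: random.Random) -> bytes:
    """Flip `count` randomly chosen bits (an empty seed is returned unchanged)."""
    mutated = seed
    if not seed:
        return mutated
    for _ in range(count):
        mutated = flip_bit(mutated, rng.randrange(len(seed) * 8))
    return mutated
```

A two-byte seed yields 16 single-bit variants from `walking_bit_flips`. A structure-aware mutator would instead parse the input and change individual fields while keeping the overall format valid.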
Once we’ve got our mutation strategy in place, we’re injecting this data into our application as fast as we can. We then need a mechanism of detecting whether the application has done something that it shouldn’t, whether it’s operating in an unknown state or whether it’s completely crashed or whether it’s a hung process.
What a lot of fuzzers tend to do is look out for things like core dump files that have been raised through segmentation faults, or they can do a simple check for whether the application has overrun, so you can have a timeout on that process.
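A minimal sketch of this kind of detection, assuming a POSIX system and a target that reads its input on standard input: the harness runs the target once, treats a timeout as a hang, and treats termination by a signal as a crash. `DEMO_TARGET` and `run_once` are hypothetical names used only for this illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for an application under test: it hangs on inputs
# starting with "H", aborts on inputs starting with "C", and otherwise exits.
DEMO_TARGET = [
    sys.executable, "-c",
    "import os, sys, time\n"
    "data = sys.stdin.buffer.read()\n"
    "if data.startswith(b'H'): time.sleep(60)\n"
    "if data.startswith(b'C'): os.abort()\n",
]


def run_once(command, data: bytes, timeout_s: float = 5.0) -> str:
    """Feed data to the target on stdin and classify what happened."""
    try:
        result = subprocess.run(
            command, input=data, capture_output=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "hang"   # the process overran its time budget and was killed
    if result.returncode < 0:
        return "crash"  # on POSIX, a negative code means killed by a signal
    if result.returncode != 0:
        return "error"  # the target exited by itself with a failure status
    return "ok"
```

Production fuzzers avoid the cost of starting a new process for every input, but the classification logic follows the same outline.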
There’s a concept in fuzz testing known as smart fuzz testing, which is where, in addition to the concept of just randomly generating the input data, we want to understand which bits of input data are of interest to us. And there’s lots of different ways of doing this.
A common fuzz-testing tool is called American Fuzzy Lop. I consider this the de facto fuzzing engine for a lot of fuzz-testing solutions. It allows you to instrument your application under test around the basic blocks. You instrument during the compilation phase, and when the program is executing and these instrumentation points are hit, they write data into a shared memory area.
The result is that the fuzzing engine has an understanding of when a test case file has just found a new path through the control flow of the application. It will say, “You made progress. You got deeper into that control flow. I’m going to put you back on the queue and I’m going to subject you to a full mutation phase.” These are the ways of achieving this concept of smart fuzzing. You can also do it with symbolic execution capabilities and theorem provers, but the instrumented approach is often seen as the method of choice because it’s the simplest to implement.
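The feedback loop described here can be sketched in a few lines of Python. This is a toy model and not how AFL is implemented: a Python set stands in for the shared memory area, each `coverage.add(...)` call stands in for an instrumented basic block, and the target, mutator, and function names are invented for the example.

```python
import random


def target(data: bytes, coverage: set) -> None:
    """Toy program under test with a fault hidden behind nested checks."""
    coverage.add("entry")
    if len(data) > 0 and data[0] == ord("F"):
        coverage.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            coverage.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                coverage.add("FUZ")
                raise RuntimeError("simulated crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_corpus, max_iterations=200_000, rng_seed=0):
    """Return the first crashing input found, or None if the budget runs out."""
    rng = random.Random(rng_seed)
    queue = list(seed_corpus)
    seen = set()
    for seed in queue:                     # record what the seeds already reach
        coverage = set()
        try:
            target(seed, coverage)
        except RuntimeError:
            return seed
        seen |= coverage
    for _ in range(max_iterations):
        child = mutate(rng.choice(queue), rng)
        coverage = set()
        try:
            target(child, coverage)
        except RuntimeError:
            return child                   # anomaly detected
        if not coverage <= seen:           # reached a block never seen before
            seen |= coverage
            queue.append(child)            # "put you back on the queue"
    return None
```

Starting from the seed `b"AAAA"`, the loop keeps any input that gets one check deeper, so the three-byte prefix is solved one byte at a time. Purely random four-byte inputs would need on the order of 256 cubed attempts to hit the same fault.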
Wong: So when should developers be using fuzzing? It sounds like early in the development process, and is it with respect to individual functions, like unit testing, or the end application, or something in between?
Butcher: It’s an interesting question because, from my experience, I’ve come across companies that will have kind of security testing components of their software development lifecycle or security testing teams, and they will tend to be working on a baseline of the developed application towards the end of the lifecycle. We can do this with fuzz testing.
We can look at the API of our applications and fuzz test at that level, but what we’re finding is that the smaller the scope of your fuzzing campaign, the more benefit you can get from it. But in order to do that, you need to incorporate it much earlier in the lifecycle almost to a point where you offer the capability to the developer at the point where your subcomponent that you’re working on is compiling. This is a good time to subject it to a fuzz-testing campaign. At this point, we’re kind of fuzz testing at the unit level rather than the application level.
Wong: So does fuzzing replace or complement other test methodologies?
Butcher: I would say it’s a complementary form of testing.
It’s traditionally used as a security testing capability because it’s very good at finding memory issues, for example, buffer overflows. If you’ve adopted a more traditional form of unit testing, you’ll tend to do things like a boundary value analysis, where you’ll take your input data and look at the smallest values, the biggest values, maybe add one and take one away from them, and maybe a mid-range value.
You can gain some confidence in the assurance of whatever requirement you’re trying to implement with that approach, but there are clearly huge numbers of permutations of data that you’re missing. So fuzz testing tries to plug that gap by trying to find the weird and wonderful corner-case scenarios in your application that can actually result in one of these memory safety issues.
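To make the gap concrete, here is a contrived Python sketch. The function `scale_reading` and its fault are invented for illustration: every in-range boundary value passes, yet one interior value triggers a divide-by-zero that only a broader sweep of inputs uncovers.

```python
import random

LOOKUP = [10, 20, 30, 40]


def scale_reading(raw: int) -> int:
    """Hypothetical routine; valid raw inputs are 0..255."""
    bucket = raw // 64        # 0..3 for valid input
    divisor = raw - 100       # latent fault: zero when raw == 100
    return LOOKUP[bucket] * 1000 // divisor


def failing_inputs(inputs):
    """Return the inputs that make scale_reading raise a runtime error."""
    failures = []
    for raw in inputs:
        try:
            scale_reading(raw)
        except (IndexError, ZeroDivisionError):
            failures.append(raw)
    return failures


# Boundary value analysis: smallest, smallest + 1, mid-range, largest - 1, largest.
BOUNDARY_VALUES = [0, 1, 127, 254, 255]


def random_inputs(count: int, rng_seed: int = 0):
    """Generate `count` pseudo-random in-range inputs."""
    rng = random.Random(rng_seed)
    return [rng.randrange(256) for _ in range(count)]
```

Here `failing_inputs(BOUNDARY_VALUES)` comes back empty, while a few thousand random inputs are enough to surface the failing value 100. Real input spaces are far larger, which is why fuzzers rely on mutation and coverage feedback instead of uniform random sampling.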
This is really important because we clearly want to find the bugs in our software applications, but if those bugs can be exploitable, then they become security concerns. It’s really important that we find them and patch them.
Wong: So what is AdaCore doing with fuzz testing?
Butcher: That’s another really good question. And I guess people may immediately question why a fuzzing solution would be appropriate for a memory-safe language like Ada.
It’s traditionally used with more memory-unsafe languages like C and C++ that are more prone to have things like buffer overflows. We found that what we can do is leverage the Ada runtime checking capability with our fuzz-testing campaigns. The runtime checking capability in Ada is quite a powerful feature. So on one hand, the programming language supports aspects like strong typing, and we can pick up a lot of potential safety and security issues during compilation via a static-analysis approach.
In addition, we can test security dynamically using the Ada runtime. The runtime will detect where we are reading off the end of arrays, where we’ve got some potential buffer overflow issues, divide by zero, and all sorts of different range checking and constraint checking. But we need to drive values into those checks for them to be worthwhile.
It’s great to have them there, but we need to feed them with data in order for them to actually detect the issues in our code, and we can do that with fuzz testing. We’ve found, through experiments, that we can use the American Fuzzy Lop (AFL) fuzzing engine with Ada code. Quite early on in this sort of experimental journey, we developed our own compiler pass that performs the instrumentation that AFL needs to get this smart gray-box fuzz-testing awareness. We have since gone on and developed a method of producing a fully automated test harness for Ada applications.
What we wanted to provide our customers with is the capability to be able to load up their Ada applications and our tool will analyze that code base and tell them which of their subprograms is appropriate to be the starting point for a fuzzing campaign. We do that by analyzing the parameter types associated with those subprograms.
Then we can generate a full fuzz-test harness, including the starting corpus generation. And moving on from that, we can then execute the fuzzing campaign, and, combined with AdaCore’s coverage analysis tool, we can provide them with dynamic statement coverage as the fuzzing session is executing.
Wong: Okay. What is AFL?
Butcher: AFL stands for American Fuzzy Lop.
It’s a fuzzing engine. I believe it came out of Google Labs originally. It’s evolved over the years. It’s been around for a while now. AFL++ is the latest version. It’s what’s known as a smart gray-box fuzzing engine. It will run through GCC or LLVM compilers, and it also has the option to run through hardware emulators, particularly QEMU.
It’s smart in the sense that it takes this approach of instrumenting the software application under test around the basic blocks to get an understanding of when it’s found new paths through the control flow.
Wong: So where is fuzz testing headed in the future?
Butcher: When you go right back to looking at the history of fuzz testing, it literally was all about random injection testing, which was very much a kind of a brute force approach. It’s gone off in many different directions since then. There’s a lot of research that’s gone on into this discipline of software testing.
There are all sorts of theories about the correlation between application states and application input data. There’s some really interesting research around a capability called Red Queen that’s been implemented within AFL. I think they’ve called it CmpLog. It is where you run another compiler pass over your application, around all of the comparison statements, looking for constant values.
For example, the code says: if X equals ten, then do something. You theorize that there’s a high probability that the constant value of ten, if you add it into your input data, may get you past that bottleneck in the control flow. So you pull it out of the control flow and put it in the input data in all different places, and then you analyze to see whether you’ve made it past that point.
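The idea can be sketched in Python as follows. This is a simplified model of the technique as described in the interview, not the CmpLog implementation: the toy target records the constant it compares against (standing in for the extra compiler pass), and the harness then tries that constant at every offset of the input. `MAGIC`, the field offset, and all function names are assumptions made for the example.

```python
import struct

MAGIC = 0x1337BEEF  # hypothetical constant buried in a comparison


def target(data: bytes, constants_seen: set) -> bool:
    """Toy target: compares a little-endian 32-bit field at offset 4 to MAGIC."""
    if len(data) < 8:
        return False
    (field,) = struct.unpack_from("<I", data, 4)
    # Stands in for instrumentation that logs comparison constants.
    constants_seen.add(struct.pack("<I", MAGIC))
    return field == MAGIC


def splice_constant(data: bytes, constant: bytes):
    """Yield copies of data with the constant written over each offset in turn."""
    for offset in range(len(data) - len(constant) + 1):
        yield data[:offset] + constant + data[offset + len(constant):]


def solve_comparisons(seed: bytes):
    """Return an input that passes the comparison, or None if none is found."""
    constants = set()
    if target(seed, constants):
        return seed
    for constant in constants:
        for candidate in splice_constant(seed, constant):
            if target(candidate, set()):
                return candidate   # made it past the bottleneck
    return None
```

For the seed `b"HEADabcdTAIL"`, nine candidate inputs are enough to pass a check that blind byte mutation would need around 2^32 attempts to guess.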
Something that we’ve been looking at is complementing fuzz testing with symbolic analysis. We’re not the only people looking at it. There are other companies out there that are doing this as well. There’s plenty of symbolic execution engines that are publicly available. The one we’re looking at is called SymCC.
When the fuzzer has found a new input that is of interest, that is, one that has found a new path through the control flow, we pass it into the symbolic executor. To do that, we compile the application under test with an additional compiler pass that allows us to symbolically execute that application. This is probably a whole different talk altogether.
The subject of symbolic execution is quite complicated, but it allows you to concretely execute those test inputs whilst also working out what that input data would have needed to be to take you through any diverging paths in the control flow. The output is additional test-case input data that will find even more branches. Then we feed all of that back into the fuzzer. Again, it’s all about increasing the coverage and touching more of the user code base.
That’s an area that I think is going to be really quite exciting in fuzz testing in the future.
Wong: Excellent, Paul. Thanks very much for filling us in on fuzz testing and where it’s headed.
Butcher: No problem at all, Bill. Thanks. Lovely to talk to you.