Testers often get caught in the trap of analyzing just the GUI and not looking beyond it. It is often helpful to go beyond the black box view and explore a little further. This is typically a grey area between black box and white box, wherein a tester tries to understand the application a little further without access to the source code.
One such area is testing software by modifying/corrupting the contents of its associated files. This is commonly called “File Fuzzing”. In this post, I’ll pick up an interesting case of testing by corrupting a blank file.
Yes! You read that correctly! – Corrupting the contents of a blank file! You might wonder: how can a blank file have contents? Let’s jump directly into a practical example.
Microsoft Excel supports “.xls” files (there are others too, but let’s stick to this one and not worry about how many different versions of xls it supports). So, an xls file is an associated file for the MS Excel software. Let’s see what a “blank” xls file might look like.
To create a blank xls file or empty xls file:
- Right-click in Windows Explorer and click “New->Microsoft Excel Worksheet”, OR
- Open MS Excel from the Programs menu, click File->Save As.., and give the file a name of your choice.
I find the first one quicker. So, here is a snapshot of how it looks in Windows Explorer:
A blank file? Still 12 KB!! More on this later; first, let’s see this file in MS Excel:
So, it’s essentially a “blank” file as far as its purpose as a spreadsheet is concerned. There’s no data.
But did you get a hint? It’s still a 12 KB file. There’s definitely some data in it. Let’s open it in a hex editor and have a look:
So, you can see hundreds of rows of data. This data is used by MS Excel to render the file. When you edit the file, your data is added in the proprietary Excel format.
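If you want to reproduce that hex view without a dedicated hex editor, a few lines of Python will do. This is a sketch; the sample bytes below are illustrative, not copied from an actual xls file:

```python
def hexdump(data, width=16):
    """Format raw bytes the way a hex editor would: offset, hex, ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<47}  {ascii_part}")
    return "\n".join(lines)

# Illustrative sample bytes; to inspect a real file, read it with
# open("blank.xls", "rb").read() instead.
sample = bytes([0x09, 0x08, 0x10, 0x00, 0x00, 0x06, 0x05, 0x00])
print(hexdump(sample))
```

Pointing this at a “blank” xls file shows the same kind of non-empty byte stream the hex editor does.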
Coming to the testing front, this sequence of bytes may not make any sense to you until you start looking at the Excel file format. These values will typically be a sequence of records, wherein each record contains a sequence of bytes telling its type, its length, and the data contained in it. The software reads the file and parses it fully to render it (or parses, renders, parses, renders and so on…).
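To make the record idea concrete, here is a sketch of a minimal type-length-value walker. It assumes the BIFF-style layout used by .xls files – a 2-byte record type followed by a 2-byte length, both little-endian – and the sample stream below is hypothetical:

```python
import struct

def parse_records(data):
    """Walk a byte stream as a sequence of [type][length][value] records:
    2-byte type, 2-byte length (little-endian), then `length` payload bytes."""
    records = []
    pos = 0
    while pos + 4 <= len(data):
        rtype, rlen = struct.unpack_from("<HH", data, pos)
        payload = data[pos + 4 : pos + 4 + rlen]
        records.append((rtype, rlen, payload))
        pos += 4 + rlen
    return records

# Hypothetical two-record stream: a record of type 0x0809 carrying 2 data
# bytes, followed by a record of type 0x000A carrying none.
stream = b"\x09\x08\x02\x00\xab\xcd" + b"\x0a\x00\x00\x00"
for rtype, rlen, payload in parse_records(stream):
    print(f"type=0x{rtype:04x} len={rlen} data={payload.hex()}")
```

A real parser would also validate each length against the remaining bytes – which is exactly the assumption a fuzzer tries to violate.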
As you can imagine, there can be an infinite number of test cases for these values, e.g. around the boundary values of each record length allowed by the software and those constrained by the number of bytes/bits.
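For instance, a simple mutation pass could take the original bytes and rewrite a record’s length field with boundary values. A sketch, where the field offset and the boundary values are illustrative choices:

```python
import struct

def mutate_length_fields(data, offsets, boundary_values=(0, 1, 0xFFFE, 0xFFFF)):
    """For each 2-byte length-field offset, yield a mutated copy of the
    file bytes with that field overwritten by a boundary value."""
    for off in offsets:
        for val in boundary_values:
            mutated = bytearray(data)
            struct.pack_into("<H", mutated, off, val)
            yield off, val, bytes(mutated)

# Toy input: one record whose length field sits at offset 2.
original = b"\x09\x08\x02\x00\xab\xcd"
for off, val, mutated in mutate_length_fields(original, offsets=[2]):
    print(f"offset={off} value={val:#06x} -> {mutated.hex()}")
```

Each mutated copy would then be saved to disk and fed to the application under test.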
I took Excel as just an example; you can try this with any multimedia file, office productivity suite file, or archive file, and you will find similar observations – the key is to locate such files for your software.
So, next time someone asks how many test cases can you conduct by utilizing an “empty” file, you know what the answer should be!
Site Admin, www.testingperspective.com
Very intelligent observation.
Thanks for explaining the technique of File Fuzzing!
A few tests that come to my mind immediately include:
1. Read and update the user (author) information.
2. Read the application (Excel) information.
3. Read and update the document information (number and names of sheets, fonts and blank data).
4. Open the file in a different operating system (different version and different language/ locale)
5. Archive the file and read the above information.
Inder P Singh
Thanks for sharing your thoughts.
One essential aspect of file fuzzing is playing with assumptions. We need to break down the binary data into TLV (Type-Length-Value) records. For each one of them, we need to come up with test cases and then an automation set-up for executing the tests.
The article is really nice and was interesting enough to read till the end. However, I have a question. Why is it important to test this particular aspect? If a s/w has a file which is important for it to run, and someone tampers with its template/format, it is bound to misbehave – so how exactly is it useful to test such cases?
I understand that corruption of any single file can cause any s/w to stop working, but my point is: can we foolproof a file against corruption with the help of testing?
Please guide me if I’m not understanding the core of the issue here.
The impact would depend on the kind of software and on whether the file can be user-supplied or forced in some way via other mechanisms, thereby making it a valid entry point.
If it is server software, for example, that takes a user-supplied file to process, but hangs/crashes on a corrupt file, it is a case of DoS (Denial of Service), rendering it non-functional for legitimate users.
If it is a local application, making it crash is not that serious (although it needs to be handled), but if the corruption is carefully crafted to, say, cause an exploitable buffer overflow, then it can be serious. E.g. with exploitation, a shell could be spawned on the local system which listens on a port, so a remote attacker can execute arbitrary commands.
If the above happens on a server machine, you can imagine the seriousness.
What if the attack is focused on anti-malware software? The DoS itself would lead to a non-functional security system, leaving your system prone to attack from other malware. What if such a file is sent as an attachment in an email, gets scanned by an anti-malware solution on an email server, and causes a DoS? From then on, no malware would be detected and cleaned from the emails on that server.
I hope I answered your question.
Thanks Rahul. That really answers my questions, the one which I asked and the ones which I just thought of. 🙂