How does an innocent-looking interface element lead to disastrous consequences such as disclosure of credit card information, remote control of a machine, or hijacking of user identities? Data becoming code via cleverly crafted payloads is one of the key security issues and is at the heart of many security attacks.
These thoughts form the basis of my new tutorial, which I am going to present at STeP-IN SUMMIT 2015. It discusses the subject from the ground up, so that attendees come to understand the core of this problem while appreciating its consequences through live demonstrations.
For the benefit of my readers, most of whom will not be there to attend the tutorial, I am sharing the slide deck at the end of this blog post. I am also writing down the key ideas below for reference. Most of them are questions, and debatable ones.
The Eye to Behold
- Look at the text below and identify everything that you see and interpret:
<strong> int i = 1;</strong>
- The usual answers that I get are at the high-level, abstract layers that the industry has become so used to. We forget to see the simple things, and hence fail to build the knowledge that needs them as its base.
- Some basic questions to ask:
  - Did you see the space?
  - Two kinds of spaces?
  - The = and ;
  - Did you think about the constraint that int is a keyword?
  - Which language is it?
  - Size of memory?
  - Little Endian? Big Endian?
  - What comes on right vs left?…
- A key question is – “Do you have the eye to behold?” Seeing is not beholding. Beholding needs to be cultured and practiced.
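A couple of these questions can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: `int` is taken to be a 4-byte signed integer (common in C and Java, but not guaranteed by every language), and the `tokenize` and `byte_layouts` helpers are hypothetical names, not part of any tool mentioned here.

```python
import re
import struct

STATEMENT = "int i = 1;"

def tokenize(statement):
    """Naively split a statement into words, numbers, and punctuation."""
    # \w+ matches keywords, identifiers, and digits; [^\w\s] matches one symbol.
    # Whitespace is silently dropped, much like the eye skips over it.
    return re.findall(r"\w+|[^\w\s]", statement)

def byte_layouts(value):
    """Return the 4-byte little-endian and big-endian encodings of value."""
    little = struct.pack("<i", value)  # least significant byte first
    big = struct.pack(">i", value)     # most significant byte first
    return little, big

tokens = tokenize(STATEMENT)
little, big = byte_layouts(1)

print(tokens)        # ['int', 'i', '=', '1', ';']
print(little.hex())  # 01000000
print(big.hex())     # 00000001
```

The same ten characters carry a keyword, an identifier, two symbols, a literal, and an implied memory layout, none of which is visible unless you look for it.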
Compiled vs Interpreted Code
- Is Compiled Code interpreted?
- Can we see the processor of a machine as a virtual machine, just like the JVM? Why does it matter that it runs opcodes (machine code) instead of bytecode?
- What are the implications of this understanding?
- What does this understanding help us with?
- Is SQL an interpreted language?
- Can HTML be treated as one?
- What about a simple path string? Can we treat the code that makes sense of a path string as a VM?
- What about the system command modules in various languages?
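The path-string question can be explored with a short Python sketch. It assumes a hypothetical upload directory `/var/www/uploads` and hypothetical helper names (`resolve`, `is_inside_base`); the point is only that the `..` segments behave like instructions that a path resolver executes.

```python
import posixpath

BASE_DIR = "/var/www/uploads"  # hypothetical directory, for illustration only

def resolve(user_path):
    """Join a user-supplied path to BASE_DIR and let the path 'interpreter' run."""
    # normpath evaluates the '.' and '..' segments, much like executing opcodes.
    return posixpath.normpath(posixpath.join(BASE_DIR, user_path))

def is_inside_base(resolved):
    """Check whether the resolved path is still within BASE_DIR."""
    return posixpath.commonpath([BASE_DIR, resolved]) == BASE_DIR

benign = resolve("report.txt")
crafted = resolve("../../../etc/passwd")

print(benign, is_inside_base(benign))    # /var/www/uploads/report.txt True
print(crafted, is_inside_base(crafted))  # /etc/passwd False
```

Both inputs are "just strings", yet the second one steers the resolver out of the intended directory. Seen this way, the code that interprets the path is a tiny machine and the path is its program.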
Code – Data = ?
- Or Code = Data + ?
- Or Data + ? = Code
- Most of the time, the ? is a simple ‘ or ” or ; or something similar.
- The moment we understand that any data is on its way to becoming code once it breaks out of the data cage, we start appreciating injection and overflow attacks in a much more generic way.
- A simple example: ‘Testing’ is a string not because of anything to do with the word Testing, but because of the quotes that enclose it. Just put a quote after Test, giving ‘Test’ing’, and see what happens.
- Now Test is still data, but ing has broken the data cage. An interpreter would run it (after bytecode conversion) as part of its code, not as data.
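The broken data cage can be reproduced with Python's built-in `sqlite3` module. This is a minimal sketch, not the demonstration from the tutorial: the `users` table, the `build_query` helper, and the payloads are assumptions made up for illustration.

```python
import sqlite3

def build_query(name):
    """Unsafe on purpose: paste the data straight into the SQL text."""
    return "SELECT name FROM users WHERE name = '" + name + "'"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# 1. A stray quote: Test stays data, ing' escapes and is parsed as SQL.
try:
    conn.execute(build_query("Test'ing"))
    broke_parser = False
except sqlite3.Error:
    broke_parser = True  # the interpreter tried to read ing' as code

# 2. A crafted payload: the escaped part is now valid SQL.
payload = "x' OR '1'='1"
unsafe_rows = conn.execute(build_query(payload)).fetchall()

# 3. Parameter binding keeps the payload inside the data cage.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (payload,)
).fetchall()

print(broke_parser)         # True
print(sorted(unsafe_rows))  # [('alice',), ('bob',)]
print(safe_rows)            # []
```

In the first case the escaped text is nonsense, so the interpreter only complains. In the second it is meaningful SQL, so every row comes back. With parameter binding the same payload is compared as a plain string and matches nothing.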
I will demonstrate these concepts via several different attacks, which are usually studied separately, as if they were fundamentally different from each other. The core problem behind them all is this: data can become code.