Ichhadhari Data: A Tester’s View of Shape-Shifting Data in Variables

It took a lot of writing to reach this point. I had initially thought of writing on Sampling as a core test design technique. I soon realised that talking directly about inputs would run into the same constraints on discussion that most existing test design texts face. As I want to supplement those texts rather than repeat what they already explain well, I made a digression and wrote four articles explaining the AEIOU model further, in terms of various concepts that are critical to a test. This is the fifth article in the series.

The basis for my style of thinking is what I learned from various specialised areas in testing. So, I seek your patience and openness to have a look at input data from my perspective.

The title image has a context and is related to the title. Have fun doing some homework, if you didn’t understand it.

Data and Variable

For some precise distinctions between variable and data, I am using information from SQL and Relational Theory, 2nd Edition by C.J. Date. I have taken the liberty to use the word ‘data’ instead of ‘value’ as it helps in understanding concepts in this article.

Data is a vague term. It can mean anything from a single value for one variable to the values of all variables in a given context.

For the purpose of this article, consider data as a given value for a single variable.

If it helps to call it a data item or value, please go ahead and think in that manner and focus on the concepts being discussed.

A variable is a holder for a representation of a value. A variable has a location in time and space.

A variable is updatable: it can be assigned a value. A programmer can associate it with a name, e.g. age. However, we need to think about variables in a more open-ended manner, including un-named ones. This is where thinking in terms of imaginary/notional names comes into play.

Basic/Built-in Data Types

Type is an ambiguous, open-ended concept. There are various ways to look at the type of data.

Let’s look at it in the context of type in programming languages, which are often categorised into strongly-typed (e.g. C++) and loosely-typed (e.g. Python). A strongly-typed programming language establishes a stricter relationship between the type of a variable/container and the type(s) of values that it can hold. This inter-relationship is not the focus of this section; however, it gives us a base to discuss what one can think of as built-in types of data across languages. This notion of built-in types can be seen even in loosely-typed languages like Python, where type is associated with the object only, rather than with the container, which does not have a type of its own.

  • Integer
  • Floating-point number
  • String
  • Character
  • Boolean
  • Complex number
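As a small sketch of the Python behaviour mentioned above, where the type lives on the object rather than on the name:

```python
# In a loosely-typed language like Python, the type travels with the
# object; the name (container) can point at anything.
x = 1
print(type(x))   # <class 'int'>
x = "one"
print(type(x))   # <class 'str'> — same name, no type of its own
```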

The Default Value

A variable can be in an initialised or an uninitialised state.

If it is initialised as a part of internal logic before external input is welcomed, this value can be considered as the default.

There can also be a default value assigned to uninitialised variables in some implementations.

Some languages don’t separate declaration from definition. Some of these languages allow an undefined, uninitialised variable name to be used and can resort to unintuitive logic: such a variable is considered “undefined”, and undefined could mean 0 in a numerical context or a blank string in a string context. Other languages, like Python, will raise a NameError exception in such a situation.
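To make the Python case concrete, a minimal sketch:

```python
# Referencing a name that was never assigned raises NameError in Python,
# rather than silently defaulting to 0 or "" as some languages do.
try:
    print(undeclared_total)  # never assigned anywhere
except NameError as exc:
    print(exc)  # name 'undeclared_total' is not defined
```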

NULL, null, None, or any such corresponding value assigned by default in many implementations, is of interest. In the compiled-language world, an uninitialised variable could instead mean consuming what is called a garbage value.

A tester needs to clearly differentiate between the variables that s/he wants to keep at a controlled value vs those whose value changes across tests. So, beyond the defaults, a constantly assigned and controlled value is of interest too.

Container Constraints

The basic built-in data types can at times impose size restrictions. These size restrictions can be explicit or implicit.

For example, an integer can be 1/2/4/8 bytes in some languages depending on how you declare it. In implementations where ‘short’/‘long’ kind of qualifiers are used, if one is not provided, what is the default?

Similar declarations can put a limit on the maximum number of decimal places a number can take, e.g. float vs double. So, the declaration impacts the maximum precision of a number.

Such restrictions could also be put on the maximum size of a string.

Another thing to look for is whether a number can hold a positive/negative value. This is restricted in some languages with explicit signed/unsigned qualifiers. If one is not specified, what is the default behaviour of the container? Can it hold signed values or not?
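These container constraints can be probed even from Python, using the struct module to emulate fixed-size containers; a minimal sketch:

```python
import struct

# An unsigned byte ("B") accepts 0..255 and rejects anything outside:
struct.pack("B", 255)                  # fits
try:
    struct.pack("B", 256)              # one past the container's capacity
except struct.error as exc:
    print(exc)

# Signedness decides whether negatives fit at all:
struct.pack("b", -1)                   # signed byte: fine
try:
    struct.pack("B", -1)               # unsigned byte: rejected
except struct.error as exc:
    print(exc)

# Precision: a float32 container cannot hold 0.1 exactly:
f32 = struct.unpack("f", struct.pack("f", 0.1))[0]
print(f32)  # 0.10000000149011612
```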

Wrongly used containers are difficult to locate in a black-box test. Code reviews, or to some extent static code analysers, can help.

Composite/Custom Data Types

The notion of type extends further to composite data types like mapping types (dictionaries), arrays, lists and so on. It also extends to custom types which one can define using structures or classes. I will not go into them, as the focus right now is on simple values and breaking down their concepts.

My assumption is that only when one looks deeper into simpler types can one make sense of composite types in a deeper manner. So, I’ll revisit this in detail in a future article.

Continuous vs Discrete Variables

There is another way to look at type of variables:

Continuous Variables: These variables are numeric in nature and have a range of values that they can take. The delta/precision between two consecutive values is consistent. Some texts consider only ranges with theoretically infinite numeric values (floating points) as continuous, but such categorisation is of little help to a tester for the most part. So, if a range has, let’s say, a delta of 0.1, rather than treating it as discrete, we assume it to be continuous. For the decimal places themselves, we test for transformations separately (e.g. rounding/truncation etc.).

Discrete Nominal Variables: These variables contain values which are independent of each other. For example, the variable first_name can be considered in this category. Boolean values (True/False) can be considered here as well. If one were to consider 1/0 as True/False, that is also a case of a discrete nominal variable.

Discrete Ordinal Variables: These variables contain values (often non-numeric) where there is a notion of relationship/ordering/sequencing amongst different values. For example, the variable student_grade, which can take A/B/C/D as values, can be considered in this category (because, let’s say, A > B > C > D in its domain meaning). If one were to replace the grades with the numbers 1/2/3/4, from my perspective it still helps to think of such a variable as a discrete ordinal variable despite the apparently continuous nature of the values.

Constraints

In mathematics, the domain of a function is the set of inputs accepted by the function. Maths, as always, goes deeper into categorisations of this, which frankly, at this point in my life, are way beyond what my mind can take interest in or make sense of. Also, in maths (correct me in the comments if I am wrong), the values which a function does not accept, or generates errors for, are not considered a part of its domain.

In practice, during testing, we are not really testing a singular function, but rather the totality of a mix of many functions, which more or less is a black box, whichever level of testing we are at. I’ve discussed this aspect in my previous articles, where I argue that there’s always a black box and the test shade is always gray.

Following are examples of domain restrictions to consider:

  • Container Constraints, as discussed in the corresponding section above.
  • Logical Constraints put restrictions, through logic, on top of the base container. E.g. although an 8-byte unsigned int can itself contain a pretty big number, when a developer uses it to hold the “age” of a person, there will be restrictions via conditional logic. If this age field is used in the context of the allowable age to apply for a driver’s licence, there will be further logical restrictions.

The same variable and its value can be used in multiple contexts, and the rules differ.

  • Format Constraints govern the domain meaning of the content part of data. E.g. a string can have a simple length rule as a container restriction. However, by default it can have a length of 0, which is not suitable when the string represents a user name. Similarly, an email variable will have restrictions on its contents. Programmers often use regular expressions to validate textual content.

Content restrictions can also follow white-listing or black-listing strategies. White-listing uses a pre-defined set of allowed characters; black-listing tries to ban disallowed characters.

  • Relational Constraints come into play when the concept of domain is extended to multiple variables. The value of one variable can be constrained by the value of another. Two variables can be mutually exclusive or mutually inclusive, or there could be a range imposition. E.g. a variable representing the number of days is constrained by another variable which represents the month.
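A minimal Python sketch of logical and relational constraints layered on top of a container (the function names are mine, for illustration):

```python
import calendar

def validate_age_for_licence(age: int) -> None:
    # Logical constraint: the container could hold a huge number,
    # but the domain "age" is restricted by conditional logic.
    if not 0 <= age <= 150:
        raise ValueError("age outside logical range")
    if age < 18:
        raise ValueError("below allowable licence age")

def validate_day(year: int, month: int, day: int) -> None:
    # Relational constraint: the valid range of 'day' depends on
    # the values of 'month' and 'year'.
    last_day = calendar.monthrange(year, month)[1]
    if not 1 <= day <= last_day:
        raise ValueError(f"day must be 1..{last_day} for month {month}")

validate_day(2024, 2, 29)  # fine: 2024 is a leap year
```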

References – Connected Data

Often there is a notion of connected data, and it gives rise to what is called the reference problem.

E.g. a user can choose to buy a book. For this, s/he provides a code, BK005. The string BK005 is data in itself, but is more like a key to the actual data represented by, or connected to, this ‘code’ BK005. You can also see this as a basis for state management in web applications.

It’s interesting for a tester to explore the connection part, beyond the domain of the code/key itself.

A tester focused on functionality might be happy establishing the continuous nature of the domain of a code. A tester focused on security will question the whole notion of continuity and predictability of the code and its relation to a back-end object. In short, from a security testing perspective, one looks for indirect referencing as a security feature, which in the context of this article means treating the ‘code/key’ as a discrete nominal variable rather than a continuous one.

You know nothing, Jon Snow

We don’t really know, in a large part of testing, how a value travels and which variable contains it. We just imagine it. This is true at every stage, although the number of these imaginary variables reduces.

Let’s take an example. If you have understood the previous section, you’ll be able to understand the following much better.

You are choosing the number of items to order on an e-commerce application, as a part of testing the purchase flow.

Note: I’ve made a slight change to the HTML of Amazon’s website to demonstrate my example here.

Following is a screenshot of this option:

You see, the interface does not really give you a variable name. Although you can visually associate it with the label and create an imaginary variable in your mind:

Quantity = “One”

At this point, as a tester, what do you think is the type of this variable? Integer or String? Continuous? Discrete?

On second thought, are you sure?

Above is the actual HTML (used to construct the DOM in the browser). Hence, this is what will be used for form submission to eventually exercise the functionality that adds the value to the shopping cart.

What you realise is that “Quantity” was just a label. The actual “variable” is the name attribute of the drop-down list. Its name is “qty”.

qty = “One”

That sorts out the variable part. However, what about the value itself? Is it “One”, which you saw in the GUI?

No.

“One” is only the visible option text. The value is 1, which you see as “1” in the value attribute.

qty = “1”

That’s more like it.
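A minimal sketch of what the browser actually submits for a form field named qty, as above:

```python
from urllib.parse import urlencode

# The submitted pair uses the name attribute and the value attribute;
# the visible label "One" never leaves the browser.
payload = urlencode({"qty": "1"})
print(payload)  # qty=1
```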

Next thought: is “1” an integer, or a string that happens to contain an integer?

At what stage of the logic will it become an integer? Which layer validates it? If at this stage it is an integer, is the conversion logic int() or parseInt() kind of functionality? (Does it accept only a pure integer string, or does it parse from the beginning and stop at the first non-digit?) If one provides a float, an int()-style logic could reject it, but a parseInt-style logic could take the integer part of it.
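The difference matters to a tester. A Python sketch, with a hypothetical helper parse_int_prefix standing in for JavaScript’s parseInt-style behaviour:

```python
import re

def parse_int_prefix(s: str):
    # parseInt-style: consume leading digits (with optional sign),
    # ignore whatever follows; None if nothing numeric leads the string.
    m = re.match(r"\s*[+-]?\d+", s)
    return int(m.group()) if m else None

print(parse_int_prefix("12.5kg"))  # 12: takes the integer prefix
try:
    int("12.5kg")                  # int()-style: all-or-nothing
except ValueError as exc:
    print(exc)
```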

It helps to treat even the Value as a temporary thing, because of the processing and transformations on its journey away from, and towards, a test actor.

Those who say that they test as per written-down requirements, please stop and think. Is it humanly possible for anyone to document to this extent for all variables?

The journey of this data has just started. It is still on the client side. It could travel through a chain of such variables, then be submitted in a network packet, on to a web server, travel a chain of application logic points, and so on. You can either view this as a chain of Input/Output data, or as data changing its costumes of Representation/Value/Length/Type.

You can only imagine these variables. And practically, most of the time, you’ll think of many of these variables as just one vague variable in your mind.

A variable is just a thought.

The ReVoLT Mnemonic

The ReVoLT mnemonic is inspired by the TLV abbreviation that I came across in literature discussing Fuzzing, an automated security testing technique. I’ve added Representation as a critical part to the said abbreviation to develop the ReVoLT mnemonic.

When one thinks of data/value in the context of a single variable, there are four primary aspects:

  • Representation: the format/representation of a given value in a certain protocol, or at a certain interface layer.
  • Value: an individual constant, e.g. the integer 10. By itself it has no notion of its location in time or space. For a given variable, a value can be changed. There are other containers which look like variables but are constrained never to change their values (they are called constants).
  • Length: it can be thought of in terms of the units of transient or permanent storage that a value takes. It can also be seen simply as the count of characters. It should be interpreted in relation to the length of the container at various layers (visual/syntactical/thresholds).
  • Type: data has a kind. This kind/category can be thought of independently for a value itself. It should also be seen in the context of the variable/container that is supposed to hold it. We need to consider any implicit/explicit transformations that a value has to go through to be held in a variable/container, or the errors thereof.

Although the value itself has type and length as its properties, in many protocols they are captured in separate data. Such data can be thought of as meta-data: data about data.

Have a look at the following examples for the integer 1.

int i = 1; Here, the value is 1. As it is not in quotes on the RHS, in the context of the RHS it is a number. The type of 1 as data can be inferred as a number, which is assigned to a container/variable of type int, whose meaning is dependent on the language (and maybe even the version). This assignment is subjected to static type checking. At this stage the length appears to be one digit. The representation of 1 is the human representation.

When we take the above value of 1 and imagine it in memory, it is just a sequence of bytes. The notion of type in itself is gone. So, it’s in the hands of the calling logic whether to treat it as a number or not. The length (how many bytes/bits) is again determined by the creator (the language which one used). Let’s assume 4 bytes; so, is the representation 00 00 00 01 or 01 00 00 00? That’s a question of Little-Endian (more popular, and opposite to the human representation) or Big-Endian packing. For negative numbers, it gets much more involved.
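Python’s struct module makes the endianness question above easy to observe; a quick sketch:

```python
import struct

# The same value 1 in a 4-byte container, packed both ways:
print(struct.pack("<i", 1).hex())   # little-endian: 01000000
print(struct.pack(">i", 1).hex())   # big-endian:    00000001

# Negative numbers bring two's complement into the picture:
print(struct.pack(">i", -1).hex())  # ffffffff
```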

The farther you move from the point of implementation, the more the notion of “Type”, and even the name of the type, starts moving towards its domain meaning.

If you were to imagine this:

int age = 12;

becomes more and more on the lines of:

age = 12

Forget the technicalities here. Think of how this value travels over a network, or is seen in interfaces, to understand what I am saying. In simple words, the idea of int being the container goes under wraps, and age emerges as the concept of Age in the higher layers.

Testers often forget the container perspective of a value during testing, as they are dealing with this higher-level “Age” variable (and think of the type as “Age”) rather than a variable named age of type int. This is a pretty subtle thing to explain and put into words.

  • When the value 1 is included as a part of a binary data protocol format, often found in networking as well as file formats, the Representation problem is similar to the one discussed above. W.r.t. the other three parameters, it is pretty common to have the TLV aspects as three different sets of bytes to define what we call a record: Type-Bytes, Length-Bytes, Value-Bytes. E.g. the first byte might tell you that the variable type is “Age”, the next 2 bytes tell you “4 bytes” as the length, and the next 4 bytes contain the actual value. In text-based protocols, this can be sorted out with delimiters, as seen in the next point.
  • In textual formats of information representation like JSON/XML, or even in the body of HTML, the notion of type as a domain variable starts becoming more evident. So, one can see “age”: “1” kind of entries, where the notion of the type of the variable moves to a more high-level domain meaning, and the data itself might need some kind of type conversion at times before use.
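A minimal sketch of the TLV record described in the first bullet (the type tag 0x01 for “Age” is my own assumption, for illustration):

```python
import struct

def pack_tlv(type_tag: int, value: bytes) -> bytes:
    # 1 byte of type, 2 big-endian bytes of length, then the value bytes.
    return struct.pack(">BH", type_tag, len(value)) + value

# "Age" (tag 0x01) holding the 4-byte big-endian integer 1:
record = pack_tlv(0x01, struct.pack(">I", 1))
print(record.hex())  # 01000400000001
```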

As a tester, it will be of tremendous help when you understand the grammar/packing of data. It is similar to syntax in programming; however, it gets a little tricky in binary protocols. I have always encouraged testers to learn programming to create this appetite. However, the message gets lost almost every time, because most testers confuse learning programming with doing test automation. How many times should it happen to you before it occurs to you?

  • In the user interface, how the value 1 appears also has dependencies. If it is a value to be obtained from the user, for example, the way it is handled in a text box vs a drop-down list vs a radio-button/check-box could be different. The relationship between Representation, Type, Length and Value becomes technically more vague, as such interfaces are supposed to be human-centric and hence intuitive, rather than being closer to the underlying technical complexity.

The Representation aspect of a value, beyond the simple stuff, is a lesser-understood concept amongst testers. There is value in digging deeper into it. For example, one could think of the various representations of a number across number systems. Looking at the data, it at times helps to ask: do you see hex-encoded data? Does that look like a Base64 string? That is a 13-digit string; what can it possibly represent? I have seen it often. And so on, so forth.
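A quick Python sketch of the same small value wearing different representations:

```python
import base64

n = 255
# The same value across representations a tester may run into:
print(bin(n))                                      # 0b11111111
print(hex(n))                                      # 0xff
print(base64.b64encode(str(n).encode()).decode())  # MjU1
```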

Breaking and Respecting Constraints

I find that in testing, a more general, open-ended view can help us much better. It goes as follows:

When we think of input, there should not be a tightly coupled relationship between how we choose input values and what the supposed domain of what we test is. Whether the test object fulfils its promise of a domain is something we need to establish and find out. So, while we consider its domain, that’s just one input to our thinking.

We should think of providing non-domain values too, to test a test object.

We should also think of values that can generate errors/exceptions, as many of these errors/exceptions are themselves a part of the overall functionality of the black box.

While maths focuses on a precise definition of the domain, testing can benefit from an ambiguous, open-ended interpretation of it.

Your actual test input set will consist of a combination of respecting and breaking constraints on Representation, Value, Length and Type for a given variable.

As a tester, I can carry only a vague notion of what to expect as a result of the input values and the interactions. I need to keep my mind open to surprises.

In a real test, there is a lot happening in the moment and in retrospection. Expectations are fluid.

Some existing texts try to draw a line between positive and negative testing. I don’t; these two terms don’t exist in my vocabulary. If it helps you to map some parts of this article to these terms, please go ahead.

One key aspect of testing is to have precise control over when to follow a protocol and when not to. This precise control determines which layer of the test object is the target of the test, and in what manner.

Closing Thoughts

When I think about data in the context of an input variable, I go with the following thought process:

  • I try to understand the domain of a variable at the primary interface I plan to use, and think of input values.
  • I study the interface constraints and think about how I can bypass them. From a functionality standpoint, it’s mostly overkill. However, from a security standpoint, as well as for general know-how of the test object, this is key. For interest’s sake, I look upwards to see the final translation if it helps. (See my article on the GOLF Heuristic for these two aspects.)
  • I focus on what inputs I can provide. I shortlist them, keeping failure modes as the primary focus in choosing representative values. This is what I call sampling. Whether a test object will fail cannot be known upfront; only an actual test can tell that. The concept of failure itself is open. E.g. a test object might throw an error as expected. Is the error vague? Usability issue. Too revealing? Security issue.
  • I use systematic approaches as my base. My exploration is built on top of them. I don’t consider exploration as something detached from systematic test design. I am ambidextrous – I reuse what I know, and I explore to go beyond that. Because of this, I am neither as good as many out there whose expertise is in systematic test design, nor as good as those who are great at pure-play exploration. But guess what: think of me as a good bridge. I think of myself as the missing link; hence this article series.
  • I proceed with a vague notion of expectations. If something is documented, I take that as input. However, I don’t confine myself to it. There have been many cases where I observed something but didn’t understand it, so I had to study to know its meaning.

Sorry that this article was long. It would have been longer if I had not stopped myself from elaborating many points.

That’s all for now.

