Uncaging the Meaning of Sampling and its Role for Testers

I started writing on the subject of sampling 5-6 weeks back. I soon realised that there were many missing pieces: I had started talking about sampling without building the larger context. So I wrote many more articles, primarily to cover the concepts of the Test Transaction, Interactions, and the shape-shifting, wandering view of data as input.

It helped me. By the time I returned to write further on sampling, having captured that larger context, I was able to arrange my thoughts on sampling much better.

What is Sampling?

It refers to selecting a smaller representation from a larger set.

Representation of what?

That’s what this article is about.

A simple notion to keep in mind is that you are controlling a test object with a Test Transaction – a combination of interactions involving input/output. This means you will want to test various paths through its logic, its input classification logic, and the associated risks of failure. The key thing is the failure focus, which is at the heart of sampling.

Whether conducting such a test with the chosen sample(s) ends up uncovering something undesirable, and whether that is a problem or not, is usually an aftermath. Testers often try to capture this part in the Expected Results of test cases. However, one should note that sampling techniques are heuristics. Their power lies in acknowledging that they cannot really determine what to expect all the time. Such attempts also fail to acknowledge a multi-perspective approach to determining whether there is a problem. That is a subject of its own, and I hope to write a separate series about it at some point.

It is obviously wrong to assume that everything you need can ever be specified in exact detail. That would possibly need the same effort it takes to create the test object itself – in fact, more. So a tester manages this practicality by sampling the various input variables and then incorporating them into a minimum number of tests. Life is short. The testing budget is even shorter.

This is easier said than done. Sampling is the most difficult thing to do in testing. I am in the same boat as you, part of the same struggle, trying to make sense of it.

Another less-discussed area is the treatment of the samples collected as an outcome, which takes the subject in quite a different direction.

As AI becomes more prominent, I think sampling will remain a human-dominant thinking space for years to come, at least for those who value quality. I have decided to go deeper into it after spending 20 years in testing. You can join this journey too.

Reading my article series is not the only way to do this. Many testers whom I respect and read have already produced a large body of work in this area. Read them. You can read mine alongside as well, as I am going to fill in some gaps that I think still exist in the work done so far in the testing space. For example, the usual methods of sampling mostly disregard interaction as an input and hence cannot find some classes of bugs. I will try to expand on this idea much later in this series.

For the same reason, I am not going to follow the traditional way of discussing test design through named techniques. I am going to follow a problem-first approach, keeping the techniques, their names, and everything else open-ended for interpretation.

Sampling is NOT just about Input Data

A tester, knowingly or unknowingly, uses sampling in various activities. In my opinion, the more we do it knowingly and consciously, the more we turn it into a skill.

Thinking of sampling in terms of input data, as popularised by Equivalence Class Partitioning and Boundary Value Analysis, is pretty common in the testing community. However, based on my discussions with hundreds of testers in workshops across the world, this is where most testers stop their thinking. Let’s dig a little deeper.

Sampling Applied to Input

When we as testers think of input, rather than starting from data, we should start from the concept of interactions. Please refer to my articles on the Input/Output models.

As a part of a test, you will do some explicit interactions, which in turn result in many implicit interactions within and/or outside the test object.

These interactions have various variables associated with them: for example, temporality, parallelism, ratios, existence (did it happen or not?), and so on.

Which interactions are in focus, and which of them you include in a test, falls within the domain of sampling.

Sampling as applied to Interactions is pretty evident in performance and security testing.

Across these interactions, data is involved. Check my articles on the shape-shifting view and the Wanderer mental model of data.

Equivalence is mostly a notion, a mental construct, a practical compromise.

Thinking in terms of partitioning the universal space of data is a basic premise for managing that infinity. Finding representatives is a core need too. However, equivalence, boundaries, typicality, etc., are pretty ambiguous and notional concepts in practice. A tester needs to keep his/her mind open in this regard.
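To make that premise concrete, here is a minimal sketch of partitioning and boundary selection for a hypothetical "age" field with an assumed valid range of 18-60. The ranges and representative choices are my own illustrative assumptions, not a recipe for any real test object:

```python
# A minimal sketch: partitioning a hypothetical "age" input (assumed valid range 18-60)
# into equivalence classes and picking boundary representatives.
partitions = {
    "below_valid": range(-1, 18),   # notionally "equivalent" invalid lows
    "valid": range(18, 61),         # notionally "equivalent" valid values
    "above_valid": range(61, 150),  # notionally "equivalent" invalid highs
}

def boundary_representatives(r: range) -> list[int]:
    """Pick values at and just inside the edges of a partition."""
    return sorted({r.start, r.start + 1, r[-1] - 1, r[-1]})

for name, r in partitions.items():
    print(name, boundary_representatives(r))

# The "equivalence" of every value inside a partition is a working assumption;
# the test object may well disagree with it (e.g. some age handled specially).
```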

Writings on testing by testers have a functionality and end-user perspective bias. This bias is clearly seen in the way core sampling techniques are discussed in these texts.

A typical tester puts all special characters in a single partition, for example. This is how it starts. Sadly, this is also where his/her mental block happens. S/he almost never gets to the questions: what is special about an individual special character? Can a special character be in a partition of its own?
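As one way to picture that question, here is a small, hedged sketch in which a few "special" characters each get a partition of their own, justified by a distinct suspected failure mode. The characters and the reasons attached to them are my assumptions about plausible risks, not facts about any particular test object:

```python
# A sketch: instead of one "special characters" bucket, each character gets its
# own partition, with a distinct suspected failure mode (assumptions, not facts).
special_char_partitions = {
    "'":      "string termination in SQL or quoting bugs",
    "<":      "HTML/markup injection or escaping bugs",
    "\\":     "escape-sequence handling",
    "%":      "format-string or URL-encoding handling",
    "\0":     "null-byte truncation in lower layers",
    "\u200b": "zero-width space: trimming and length validation",
    "é":      "non-ASCII normalisation and encoding",
}

for ch, why in special_char_partitions.items():
    print(repr(ch), "->", why)
```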

Furthermore, there is a one-thing-at-a-time bias in most testing done by testers. Typical automation also ignores this aspect.

This applies to data as well as interactions. Multiple samples of the same nature (equivalence) are discouraged. That is a functional tester’s bias. In a performance test, a tester WILL need multiple samples from the same partitioned set, spread across interactions and users, as sketched below.
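A hedged sketch of that contrast: a functional view picks one representative per partition, while a performance view deliberately draws many values from the same partition and spreads them across simulated users. The terms and the user count are made up for illustration:

```python
import random

# One "valid search terms" partition, sampled repeatedly so that each simulated
# user gets its own value. Terms and the number of users are invented.
valid_search_terms = ["shoes", "laptop", "blue kettle", "usb-c cable", "desk"]

functional_pick = valid_search_terms[0]                  # one representative
performance_picks = [random.choice(valid_search_terms)   # many from the same class
                     for _ in range(100)]                 # one per simulated user

print("functional:", functional_pick)
print("performance (first 5):", performance_picks[:5])
```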

The state of a test object should not be thought of as static, yet that is the basic, implicit assumption in most test design seen in the wild. When a certain interaction takes place can have an inter-relationship with other interactions happening in the system at that time. However, as interactions themselves are not really sampled, this aspect gets lost.

Sampling based on Output

Output-based Equivalence Class Partitioning is included in some texts, mostly as an extension of the input-oriented ECP, and in many of those texts it is dropped almost as soon as it is introduced. This is ironic, because output is the basis on which equivalence is decided even in input-based ECP.

Output/outcomes are in fact critical to sampling.

I would go so far as to say that output is more important than input when thinking of sampling in testing. This is because a tester’s version of sampling should have a failure bias: which samples can push the test object into its failure modes?

A simple, concrete example of this is covering known error codes/exceptions, then working backwards from there to select samples.
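A minimal sketch of this "working backwards" idea: list the error responses the test object is known to produce, then attach at least one sample intended to provoke each. The error codes and sample inputs below are hypothetical, chosen only to show the direction of travel:

```python
# Sketch: output-first sampling. Start from known error codes / exceptions and
# work backwards to inputs expected to trigger them. All values are hypothetical.
known_failure_modes = {
    "ERR_EMPTY_INPUT":   {"query": ""},
    "ERR_TOO_LONG":      {"query": "x" * 10_001},
    "ERR_INVALID_CHARS": {"query": "\0"},
    "ERR_RATE_LIMITED":  {"note": "needs repeated interactions, not just data"},
    "ERR_UPSTREAM_DOWN": {"note": "needs an interaction with a dead dependency"},
}

for code, sample in known_failure_modes.items():
    print(code, "<-", sample)

# Note how the last two entries cannot be reached by data sampling alone;
# they require sampling interactions, which is a recurring theme of this series.
```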

Sampling Applied to Output

Another perspective on this is to apply sampling to the output itself (which, again, includes interactions and data). This is especially seen in a performance test, where you collect multiple readings of what you are measuring. Let’s say you collected 10,000 samples of response time for a given interaction with different data. Are they equivalent? Does the choice of data make them further categorisable, for example, a search interaction that used different words? Is it fine to average these 10,000 readings, or should we categorise and average based on word groups? When did these readings take place? What was the user load? What was the ramp-up/ramp-down pattern? Was the user load constant? Should we consider the average, the median, or percentiles? What about outliers: ignore them, or study them separately?
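To make those questions concrete, here is a hedged sketch of how differently the same readings can look depending on how we summarise them. The numbers are synthetic, and the grouping key (a "word category" of the search term) is an assumption about how the data might be categorisable:

```python
import random
import statistics

random.seed(1)

# Synthetic response times (ms) for one interaction, grouped by an assumed
# "word category" of the search term. The distributions below are invented.
readings = {
    "short_words":  [random.gauss(120, 15) for _ in range(5000)],
    "long_phrases": [random.gauss(300, 60) for _ in range(4990)] +
                    [random.gauss(2500, 200) for _ in range(10)],  # a few outliers
}

all_readings = readings["short_words"] + readings["long_phrases"]

print("overall mean   :", round(statistics.mean(all_readings), 1))
print("overall median :", round(statistics.median(all_readings), 1))
for group, values in readings.items():
    p95 = statistics.quantiles(values, n=100)[94]
    print(f"{group:13} median={statistics.median(values):7.1f}  p95={p95:7.1f}")

# A single overall average hides both the slower "long_phrases" group and its
# outliers; whether to average, split, or study outliers separately is itself
# a sampling decision about the output.
```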

If you are thinking the above is irrelevant for you because you are doing functional testing, consider this. It happens more explicitly in automated tests in the wild these days, and during human testing, one might end up making this mistake unknowingly:

Retry-on-fail is often positioned as a desirable feature these days. Let’s say a test failed once. You try it a second time and it passes. The prevalent bias is to believe the second result and mark the test as passed. Do you realise the problem here? The moment the second execution was triggered, the tester entered the domain of statistical significance without realising it. Are 2 samples enough? How many samples should be collected, and how many times should the test pass, before we say it has passed?
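Here is a small sketch of what the tester has implicitly stepped into: each execution is a sample from an unknown pass rate. The numbers are illustrative, and the crude normal-approximation interval is only one possible framing, not the definitive way to decide:

```python
import math

def pass_rate_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Crude normal-approximation confidence interval for the true pass rate."""
    p = passes / runs
    margin = z * math.sqrt(p * (1 - p) / runs)
    return max(0.0, p - margin), min(1.0, p + margin)

# "Failed once, passed on retry": 1 pass out of 2 runs.
print(pass_rate_interval(1, 2))    # roughly (0.0, 1.0): two samples say almost nothing
# 18 passes out of 20 runs narrows things, but still admits real flakiness.
print(pass_rate_interval(18, 20))
```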

Sampling Elsewhere

What is discussed so far is still a very partial view of sampling.

A tester can see sampling applied in various other activities relevant to him/her:

  • Choice of what to automate
  • User personas
  • Sampling of samples (e.g. payloads in security testing)
  • Sampling of integrated third-party services at the back-end e.g. sources of data.
  • Test data preparation in environment to determine what to pre-populate
  • Proof of concepts and piloting
  • Studying reported defects to find problem patterns/root causes
  • Creating case studies
  • Surveys
  • A/B Testing

Further Thoughts

We often hear – “Anybody can do testing”. This sentence is correct with one little change:

Anybody can do EASY/CHEAP testing.

If, while reading this article, you were thinking “where is the time and budget for this style of thinking?”, please think again as a tester. Are you done with testing because the budget is over, or because you have run out of ideas in the time available? What if you come across a company/team/management that believes in deeper testing? Do you plan to start exploring deeper testing only when someone asks you to? Is your relationship with testing conditional on opportunity? Then, of course, my style of thinking about testing is completely irrelevant to you.

I am a dreamer. I’ll continue to remain one.

Your testing skill and how much of that skill you can apply in a given problem context are two different things. The former should be the motivation for learning, the latter its application. More and more testers are treating testing the other way round. I hope that changes.

In my next article, I will elaborate on the multiple factors that impact sampling. As a heads-up, when I count them now, I have noted more than 60 of them.

That’s all for now.
