LLMs and Prompting for Practical Application in Testing

Dear Testers,

Let us translate and project. That’s what we do when dealing with something new, completely unknown, or only scarcely known.

The context here is the use of LLMs in testing.

I’m going to project based on my knowledge of automation framework design. A test automation framework is typically built in layers. For example, say one wants to build a feature set in a framework on top of Selenium:

1. The Selenium library is the raw layer, providing a pre-defined, atomic command-query API to interact with a browser.
2. At the same layer, or the next one, sit other existing/new framework features, e.g. test engine functionality, data-driven testing, etc.
3. One layer above step 1 is where the first abstraction/wrapper API in the framework comes into play, refining functionality – e.g. auto-wait (based on explicit waiting) when one wants to click a button (think of contextual polling).
4. At still further layers, one thinks of higher-level wrapping in the context of the business domain: App-Page-Section-Widget-Dialog models, etc.
5. At a still further layer, one can think of test expression abstractions like Gherkin, keywords, natural-language DSLs, etc.
6. Still up the ladder, you think of layers on top of the abstractions in step 5, creating APIs/GUIs/CLIs that accommodate them.
7. In parallel to all these layers, you think of integrations with other modules, services, etc.
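The auto-wait wrapper in step 3 can be sketched generically: a polling loop that retries a condition until a timeout, with a thin “click” API on top. This is a minimal sketch of the pattern, not Selenium itself – the names `wait_until`, `auto_click`, and `StubButton` are illustrative, and the stub stands in for a real browser element.

```python
import time

def wait_until(condition, timeout=5.0, interval=0.1):
    """Layer-2 core: poll `condition` until it returns truthy or the
    timeout elapses. The caller says *what* to wait for; this layer
    owns *how* to poll (contextual polling)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout:.1f}s")

class StubButton:
    """Stand-in for a browser element that only becomes clickable
    after a few polls (simulating a page that is still rendering)."""
    def __init__(self, ready_after=3):
        self.polls = 0
        self.ready_after = ready_after
        self.clicked = False

    def is_clickable(self):
        self.polls += 1
        return self.polls >= self.ready_after

    def click(self):
        self.clicked = True

def auto_click(button, timeout=5.0):
    """The refined wrapper API: wait for clickability, then click.
    Test code calls this instead of the raw click."""
    wait_until(button.is_clickable, timeout=timeout)
    button.click()

button = StubButton(ready_after=3)
auto_click(button)
print(button.clicked)  # True: the click happened only once the element was ready
```

The point of the layer is that test code never contains ad-hoc sleeps; the polling policy lives in exactly one place.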

ChatGPT, or any other LLM, should be thought of in a similar way when considering its use in testing:

– Consider ATOMIC prompts as the stage-1 equivalent of raw API calls.
– What layers can you build on top of that for refinement? What would compounding the atomic prompts mean in a testing context? Filtering? Refinement?
– What is the abstraction you want to build?
– What is your focus when it comes to false negatives and false positives? Do you have a mode to switch between which matters more, FNs or FPs?
– If the LLM by itself is not interactive and does not ask questions, can you build a layer in your prompt-engineering framework that does?
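To make the analogy concrete, the same layering can be sketched for prompts. This is a hypothetical sketch: `call_llm` is a stub standing in for any real LLM client, and the layer names (`generate_test_ideas`, `filtered_test_ideas`) are illustrative, not from any existing framework.

```python
def call_llm(prompt: str) -> str:
    """Layer 0: the raw, atomic prompt call. Stubbed here with a
    canned response to keep the sketch self-contained; in practice
    this would call an actual LLM API."""
    return ("1. Empty username\n"
            "2. SQL injection in username\n"
            "NOTE: unverified ideas, review before use")

def generate_test_ideas(feature: str) -> list[str]:
    """Layer 1: wrap the atomic call in a task-specific template and
    parse the free text into structured data (the 'refinement' layer)."""
    raw = call_llm(f"List negative test ideas for: {feature}")
    return [line.split(". ", 1)[1]
            for line in raw.splitlines()
            if line[:1].isdigit()]

def filtered_test_ideas(feature: str,
                        banned_terms: tuple = ("SQL",)) -> list[str]:
    """Layer 2: compounding/filtering - drop ideas the team already
    covers elsewhere. A stricter filter trades false positives for
    false negatives; relaxing it does the opposite."""
    return [idea for idea in generate_test_ideas(feature)
            if not any(term in idea for term in banned_terms)]

print(filtered_test_ideas("login form"))  # ['Empty username']
```

An interactive layer (the last bullet) would sit on top again: inspect the parsed output and, if it is ambiguous or empty, issue a follow-up atomic prompt asking a clarifying question before returning.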

Thoughts?
