Testing AI Chatbots Using Semantic Relevance: A Smarter Validation Approach

🤔 If your chatbot has to answer hundreds of user questions… do you really test all of them through the UI?
If your answer is “yes”, then you already know the pain: 

  • Repeating the same steps over and over 
  • Waiting for UI rendering 
  • Dealing with flaky UI behavior 
  • Spending hours just typing questions 

Think about it. 
When a user asks a question in a chatbot UI: 

  • The UI captures the question 
  • The UI sends it to a backend POST API 
  • The backend processes it (LLM / rules / embeddings) 
  • The backend sends back a response 
  • The UI simply renders the text 

So what actually needs testing here? 

  • UI layout? ❌ (covered elsewhere) 
  • Chatbot understanding + correctness of the response? ✅ 

That’s where my solution begins. 

Do we really need the UI to test the chatbot’s intelligence? No. Skip it. 
Instead of: 
Open chatbot → Type question → Click send → Read response → Validate 

I asked: Why not directly hit the chatbot API with questions? 

If real intelligence lives in the API, test it there. 

How Did I Implement This?  

Here’s the high-level approach I followed.

✅ Tooling 

  • Playwright (APIRequestContext) – for backend API testing 
  • POST API – the same endpoint used by chatbot UI 
  • Semantic relevance validation – instead of strict text matching 

How Does the Test Flow Work? 

Let’s break it down step by step 👇 

I created a list of real user-like questions, for example: 

  • “How do I reset my password?” 
  • “What is your refund policy?”
  • “Can I change my delivery address?” 
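In code, this question list can live in a small data table. The expectedGist values below are illustrative stand-ins I wrote for this sketch, not real product answers:

```typescript
// Data-driven question set. The expectedGist strings are hypothetical
// placeholders describing what a correct answer should convey.
interface ChatCase {
  question: string;
  expectedGist: string;
}

const chatCases: ChatCase[] = [
  {
    question: 'How do I reset my password?',
    expectedGist: 'Use the Forgot Password link to receive a reset email.',
  },
  {
    question: 'What is your refund policy?',
    expectedGist: 'Refunds are issued within the stated return window.',
  },
  {
    question: 'Can I change my delivery address?',
    expectedGist: 'The delivery address can be updated before the order ships.',
  },
];
```

A nice side effect: each entry doubles as documentation of what the bot is supposed to know.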

Using Playwright’s APIRequestContext, each question is sent as a POST request to the chatbot endpoint. 
💡 This simulates: 
“User asked a question” — without opening the UI 

The API returns the chatbot’s response text — clean, fast, and without UI noise. 

• No loaders 
• No animations 
• No flaky selectors 

Just pure response content
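A minimal sketch of that step. The endpoint URL, payload shape, and answer field are assumptions; adapt them to your chatbot’s actual API contract:

```typescript
// Hypothetical endpoint -- replace with your chatbot's real URL.
const CHATBOT_ENDPOINT = 'https://example.com/api/chat';

// Minimal structural type matching Playwright's APIRequestContext.post(),
// so the helper stays framework-light and easy to stub.
interface ApiClient {
  post(url: string, options: { data: unknown }): Promise<{ json(): Promise<any> }>;
}

// Build the request body the same way the UI would (assumed shape).
function buildPayload(question: string) {
  return { message: question, sessionId: 'qa-run' };
}

// Send one question and return the raw answer text.
async function askChatbot(api: ApiClient, question: string): Promise<string> {
  const res = await api.post(CHATBOT_ENDPOINT, { data: buildPayload(question) });
  const body = await res.json();
  return body.answer; // assumed response field
}

// In a Playwright spec, the built-in `request` fixture fits ApiClient:
//   test('password FAQ', async ({ request }) => {
//     const answer = await askChatbot(request, 'How do I reset my password?');
//   });
```

Typing the client structurally, rather than importing Playwright types directly, also makes the helper trivial to stub in unit tests.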

Here comes the most important part.

Chatbots (especially AI-based ones): 

  • Rephrase answers 
  • Change sentence structure 
  • Improve wording over time 

So, a strict check like response === expectedText doesn’t scale. ❌ 

✅ Semantic Relevance to the Rescue 

Instead of matching exact text, I: 

  • Compared the chatbot response with the expected intent/answer 
  • Used semantic relevance scoring 
  • Asserted that the response is contextually correct, not textually identical 
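How you score relevance is up to you: embedding models, an LLM-as-judge, or an NLP library. As a dependency-free illustration, here is cosine similarity over bag-of-words vectors; a real suite would usually swap in an embedding model, and the 0.5 threshold is an assumption you would tune against your own data:

```typescript
// Count word occurrences in a normalized string.
function bagOfWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z']+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two texts' word-count vectors: 1 means
// identical vocabulary distribution, 0 means no words in common.
function cosineSimilarity(a: string, b: string): number {
  const va = bagOfWords(a);
  const vb = bagOfWords(b);
  let dot = 0, na = 0, nb = 0;
  for (const [word, count] of va) {
    dot += count * (vb.get(word) ?? 0);
    na += count * count;
  }
  for (const [, count] of vb) nb += count * count;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Assertion helper: is the response close enough to the expected gist?
function isRelevant(response: string, expectedGist: string, threshold = 0.5): boolean {
  return cosineSimilarity(response, expectedGist) >= threshold;
}
```

Bag-of-words is deliberately crude; the point is the shape of the check — score against an expected gist and assert a threshold — which stays the same when you plug in real embeddings.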

🎯 This makes the tests: 

  • Stable 
  • Future-proof 
  • AI-friendly 

That’s the real win: the tests validate meaning, so they keep passing as the model’s wording evolves. 

🚀 Benefits I Observed 

  • 100s of questions tested in minutes 
  • Zero UI dependency 
  • No manual typing 
  • Faster CI execution 
  • Easy to add new questions

Adding a new test is as simple as: “Add one more question to the list” 
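A hypothetical driver ties it together; ask and isRelevant here stand in for your API client and scoring check, whatever their implementations:

```typescript
// Run every case through the chatbot API and collect the questions whose
// answers fail the relevance check. `ask` and `isRelevant` are supplied by
// the caller, so the loop stays independent of any particular client/scorer.
async function runSuite(
  cases: { question: string; expectedGist: string }[],
  ask: (question: string) => Promise<string>,
  isRelevant: (response: string, expectedGist: string) => boolean,
): Promise<string[]> {
  const failures: string[] = [];
  for (const c of cases) {
    const answer = await ask(c.question);
    if (!isRelevant(answer, c.expectedGist)) {
      failures.push(c.question);
    }
  }
  return failures;
}
```

An empty failures array means every answer was contextually on target.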

🧠 Final Thought 

So, the next time someone asks: 
“How do you test hundreds of chatbot questions?” 
My answer is simple: 

Skip the UI. Hit the API. Validate the meaning, not the text. 

 

 
