Testing AI Chatbots Using Semantic Relevance: A Smarter Validation Approach

🤔 If your chatbot has to answer hundreds of user questions… do you really test all of them through the UI?
If your answer is “yes”, then you already know the pain: 

  • Repeating the same steps over and over 
  • Waiting for UI rendering 
  • Dealing with flaky UI behavior 
  • Spending hours just typing questions 

Think about it. 
When a user asks a question in a chatbot UI: 

  • The UI captures the question 
  • The UI sends it to a backend POST API 
  • The backend processes it (LLM / rules / embeddings) 
  • The backend sends back a response 
  • The UI simply renders the text 

So what actually needs testing here? 

  • UI layout? ❌ (covered elsewhere) 
  • Chatbot understanding + correctness of the response? ✅ 

That’s where my solution begins. 

Do we really need the UI to test the chatbot’s intelligence? No. Skip it. 
Instead of: 
Open chatbot → Type question → Click send → Read response → Validate 

I asked: Why not directly hit the chatbot API with questions? 

If real intelligence lives in the API, test it there. 

How Did I Implement This?  

Here’s the high-level approach I followed.

✅ Tooling 

  • Playwright (APIRequestContext) – for backend API testing 
  • POST API – the same endpoint used by chatbot UI 
  • Semantic relevance validation – instead of strict text matching 

How Does the Test Flow Work? 

Let’s break it down step by step 👇 

I created a list of real user-like questions, for example: 

  • “How do I reset my password?” 
  • “What is your refund policy?”
  • “Can I change my delivery address?” 
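In code, this question list can live in a small data table. The expectedGist values below are illustrative stand-ins I wrote for this sketch, not real product answers:

```typescript
// Data-driven question set. The expectedGist strings are hypothetical
// placeholders describing what a correct answer should convey.
interface ChatCase {
  question: string;
  expectedGist: string;
}

const chatCases: ChatCase[] = [
  {
    question: 'How do I reset my password?',
    expectedGist: 'Use the Forgot Password link to receive a reset email.',
  },
  {
    question: 'What is your refund policy?',
    expectedGist: 'Refunds are issued within the stated return window.',
  },
  {
    question: 'Can I change my delivery address?',
    expectedGist: 'The delivery address can be updated before the order ships.',
  },
];
```

A nice side effect: each entry doubles as documentation of what the bot is supposed to know.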

Using Playwright’s APIRequestContext, each question is sent as a POST request to the chatbot endpoint. 
💡 This simulates: 
“User asked a question” — without opening the UI 

The API returns the chatbot’s response text — clean, fast, and without UI noise. 

• No loaders 
• No animations 
• No flaky selectors 

Just pure response content
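A minimal sketch of that step. The endpoint URL, payload shape, and answer field are assumptions; adapt them to your chatbot’s actual API contract:

```typescript
// Hypothetical endpoint -- replace with your chatbot's real URL.
const CHATBOT_ENDPOINT = 'https://example.com/api/chat';

// Minimal structural type matching Playwright's APIRequestContext.post(),
// so the helper stays framework-light and easy to stub.
interface ApiClient {
  post(url: string, options: { data: unknown }): Promise<{ json(): Promise<any> }>;
}

// Build the request body the same way the UI would (assumed shape).
function buildPayload(question: string) {
  return { message: question, sessionId: 'qa-run' };
}

// Send one question and return the raw answer text.
async function askChatbot(api: ApiClient, question: string): Promise<string> {
  const res = await api.post(CHATBOT_ENDPOINT, { data: buildPayload(question) });
  const body = await res.json();
  return body.answer; // assumed response field
}

// In a Playwright spec, the built-in `request` fixture fits ApiClient:
//   test('password FAQ', async ({ request }) => {
//     const answer = await askChatbot(request, 'How do I reset my password?');
//   });
```

Typing the client structurally, rather than importing Playwright types directly, also makes the helper trivial to stub in unit tests.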

Here comes the most important part.

Chatbots (especially AI-based ones): 

  • Rephrase answers 
  • Change sentence structure 
  • Improve wording over time 

So, a strict check like response === expectedText doesn’t scale. ❌ 

✅ Semantic Relevance to the Rescue 

Instead of matching exact text, I: 

  • Compared the chatbot response with the expected intent/answer 
  • Used semantic relevance scoring 
  • Asserted that the response is contextually correct, not textually identical 
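How you score relevance is up to you: embedding models, an LLM-as-judge, or an NLP library. As a dependency-free illustration, here is cosine similarity over bag-of-words vectors; a real suite would usually swap in an embedding model, and the 0.5 threshold is an assumption you would tune against your own data:

```typescript
// Count word occurrences in a normalized string.
function bagOfWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z']+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two texts' word-count vectors: 1 means
// identical vocabulary distribution, 0 means no words in common.
function cosineSimilarity(a: string, b: string): number {
  const va = bagOfWords(a);
  const vb = bagOfWords(b);
  let dot = 0, na = 0, nb = 0;
  for (const [word, count] of va) {
    dot += count * (vb.get(word) ?? 0);
    na += count * count;
  }
  for (const [, count] of vb) nb += count * count;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Assertion helper: is the response close enough to the expected gist?
function isRelevant(response: string, expectedGist: string, threshold = 0.5): boolean {
  return cosineSimilarity(response, expectedGist) >= threshold;
}
```

Bag-of-words is deliberately crude; the point is the shape of the check — score against an expected gist and assert a threshold — which stays the same when you plug in real embeddings.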

🎯 This makes the tests: 

  • Stable 
  • Future-proof 
  • AI-friendly 

That’s the real win: the tests validate meaning, so they keep passing as the model’s wording evolves. 

🚀 Benefits I Observed 

  • 100s of questions tested in minutes 
  • Zero UI dependency 
  • No manual typing 
  • Faster CI execution 
  • Easy to add new questions

Adding a new test is as simple as: “Add one more question to the list” 
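A hypothetical driver ties it together; ask and isRelevant here stand in for your API client and scoring check, whatever their implementations:

```typescript
// Run every case through the chatbot API and collect the questions whose
// answers fail the relevance check. `ask` and `isRelevant` are supplied by
// the caller, so the loop stays independent of any particular client/scorer.
async function runSuite(
  cases: { question: string; expectedGist: string }[],
  ask: (question: string) => Promise<string>,
  isRelevant: (response: string, expectedGist: string) => boolean,
): Promise<string[]> {
  const failures: string[] = [];
  for (const c of cases) {
    const answer = await ask(c.question);
    if (!isRelevant(answer, c.expectedGist)) {
      failures.push(c.question);
    }
  }
  return failures;
}
```

An empty failures array means every answer was contextually on target.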

🧠 Final Thought 

So, the next time someone asks: 
“How do you test hundreds of chatbot questions?” 
My answer is simple: 

Skip the UI. Hit the API. Validate the meaning, not the text. 

 

 
