Ensuring Safe and Accurate ChatGenie Chatbot Interactions with Llama3

July 26, 2024 12:54 PM

LLMs powered by transformer models have taken the world by storm, with products like ChatGPT reaching over a million users within the first month of release and surpassing 180 million users by May 2024. While these technologies are incredibly useful, ensuring their safety and accuracy is paramount.

Given LLMs' vast knowledge and capabilities, enterprises are concerned about the potential misuse of AI-implemented features to generate content that falls outside the scope of business operations or could be harmful, thus damaging the brand’s reputation. Hallucinations, where the AI generates incorrect or nonsensical information, are a critical concern and a significant reason why enterprises are hesitant to fully integrate AI into their workflows and processes.

At ChatGenie, we prioritize safety and accuracy in our Copilot capabilities, specifically in our Chatbot implementation. While foundational models come with built-in prompt guards to ensure safety, we have developed our prompt guard agent as an additional layer to guarantee the safety of our Chatbot capabilities. This agent ensures that our Chatbot can detect and address the following:

Offensive Content: Filters out profanity, hate speech, sexual content, and harassment.
Sensitive Information Protection: Protects personal data and maintains confidentiality.
Misinformation Prevention: Prevents the spread of false information and avoids biases.
Security: Guards against injection attacks and ensures secure data handling.
User Experience: Avoids irrelevant, off-topic, overly complex, or repetitive inquiries.
Legal Compliance: Adheres to regulations like GDPR and CCPA.
Ethical Considerations: Ensures cultural and emotional sensitivity in responses.

Here’s an animated sample interaction of ChatGenie Prompt Guard Agent in action.

ChatGenie Prompt Guard detects sensitive information. The prompt guard protects the exposure of sensitive information like transaction-related information and personal information.

ChatGenie Prompt Guard detects multiple, off-topic, and complex inquiries. The prompt guard prioritizes relevant and manageable inquiries.

To further improve the quality and maintain the accuracy of our Chatbot responses, we have implemented a response refinement agent that serves as a QA for every response generated by our Chatbot. The design considerations for this refinement agent include:

Accuracy: Ensuring responses are factually correct and relevant.
Clarity: Maintaining concise, simple, and unambiguous language.
Tone and Style: Keeping a consistent, appropriate tone that reflects empathy and professionalism.
Relevance and Context Awareness: Providing contextually appropriate responses that maintain conversational flow.
User Satisfaction: Aiming for helpfulness and encouraging user engagement.

To optimize performance, we chose Llama3 for these agents due to its advanced natural language understanding, high-quality language generation, and efficient performance. Its smaller context length compared to GPT-4 suits the simpler tasks of prompt guarding and response refinement, offering faster processing times. Additionally, Llama3 includes robust Filipino language training data, making it effective for localized content. Its advanced filtering capabilities, compliance with data protection regulations, and adherence to ethical standards ensure safe and respectful interactions.

Want to bring ChatGPT-level technology to your Messenger and Instagram interactions? With ChatGenie, you don’t need to worry about complicated flow editors—our AI handles all responses seamlessly. Contact us using the form below to set up a meeting.

‍

Back to Blog

The Agentic AI Evaluation Playbook: How We Compared GPT-5.2, Claude Sonnet 4.5, and Qwen for Enterprise Deployment [Compressed Version]

May 20, 2026 9:25 AM

ChatGenie's production customer support automation platform was built on Microsoft Azure. Our Orchestrator Agent ran on GPT-5.2 — tuned from a GPT-4o baseline — and it had earned its place: 99% resolution accuracy, a 77% reduction in support OPEX, and a customer team rightsized from 39 to 9 agents for our flagship enterprise client. When we migrated to Amazon EKS and AWS Bedrock, we lost access to Azure AI Foundry and with it, our path to GPT-5.2. We needed a replacement sourced entirely from Amazon Bedrock. This article documents what we found.

The Agentic AI Evaluation Playbook: How We Compared GPT-5.2, Claude Sonnet 4.5, and Qwen for Enterprise Deployment [Complete Version]

May 20, 2026 9:25 AM

When we set out to evaluate large language models for ChatGenie's core agentic workflow, we quickly realized that the benchmarks published by model providers — MMLU, HumanEval, MATH — told us almost nothing useful. We were not building a trivia assistant. We were running a production customer support automation system with real enterprise clients, live conversation volume, and service-level commitments that could not absorb experimental failure. The question we needed to answer was not "which model is smartest?" It was "which model behaves most reliably inside a constrained, multi-agent pipeline operating under production conditions?

How We Cut Latency by 50% by Simplifying Our Agentic Architecture

January 22, 2026 2:04 PM

When we first designed ChatGenie's agentic system for customer chat operations, we followed a principle that seemed intuitive: separate concerns into separate agents. Intent classification? That's one agent. Policy enforcement? Another agent. Response generation? Yet another.The result was a five-agent core chain that was clean, modular, and easy to reason about. It was also slow.Each agent in the chain required a separate LLM call. Five agents meant five round-trips to the model. In customer chat, where users expect near-instant responses, this cumulative latency was becoming a problem. Users would see typing indicators for seconds before receiving a response. Containment rates suffered as impatient users escalated to human agents.We needed to rethink our architecture.

View Blog