Sakha: Building a Chatbot That Cares

A digital friend, designed to check in on you and help you improve your mood

NLP
chat-bot
LLM
Published

May 4, 2025

source code

Introduction

In today’s fast-paced world, many of us silently wrestle with stress, loneliness, and emotional overwhelm. Therapy is a powerful avenue for healing, but not everyone needs or can access that level of care. Sometimes, all it takes is a gentle check-in, a thoughtful nudge, or the simple comfort of feeling seen.

That’s where Sakha began: not as a replacement for professional help, but as a small, caring presence always within reach, always listening. A friend in your pocket, ready to ask how you’re doing, suggest something that might lift your mood, or simply sit with you in a moment of stillness.

The vision was simple yet ambitious: create a chatbot that doesn’t just respond, but genuinely cares. One that offers warmth, understanding, and personalized encouragement without judgment, pressure, or pretense.

The name Sakha, meaning “friend” in Sanskrit, wasn’t chosen lightly. It reflected the heart of what I wanted this project to embody: a companion who listens, nudges, and supports without judgment. Someone who remembers that you enjoy walks when you’re anxious, or that music lifts your mood when you’re feeling off.

Scope of the Project

Sakha isn’t trying to replace a human friend. It doesn’t claim to have all the answers or the emotional depth of years-long relationships. But what it does aim to be is something quite intentional: a steady, caring companion that’s always available, especially in moments when reaching out to a real person feels hard.

Instead of aiming for general-purpose conversation or entertainment, Sakha focuses on a specific emotional use-case:
> Supporting users in their day-to-day mental well-being through gentle check-ins, helpful activity nudges, and empathetic conversation.

Here’s what Sakha is designed to do well:

  • Recognize emotional states and adapt its responses accordingly.
  • Suggest meaningful actions like personalized self-care activities, not generic tips.
  • Hold continuity across a session and maintain contextual memory, enabling more nuanced and human-like responses.
  • Handle sensitive situations with care, offering support resources rather than pretending to be a therapist.
  • Balance modularity and empathy, using structured flows powered by LangGraph without ever sounding robotic.

It’s a limited scope, yes, but it’s a foundation, one that can evolve over time as we explore more complex interactions, while keeping things simple and meaningful in the beginning.


Building Sakha: One Decision at a Time

Where Do You Even Start?

I didn’t set out with a 100-step master plan. Like most projects, Sakha started with a question: how do I make a chatbot feel like a friend, not just in tone but in behavior? I knew I needed something flexible, expressive, and capable of holding a conversation in a way that didn’t feel robotic. That naturally pointed me toward LLMs.

Choosing the Mind – LLMs and How I Picked Mine

With dozens of options, the choice wasn’t easy. GPT-4? Claude? LLaMA? Each model had tradeoffs in terms of access, pricing, and openness. Eventually, I chose to use open-source LLMs hosted through Groq’s API. Why?

  • Speed: Groq’s inference speeds were impressive.
  • Flexibility: It supports open-source models like LLaMA3.3, which means I’m not locked into one ecosystem.
  • Future-ready: I can always self-host these models later if needed, thanks to their open-source nature.

I wrapped this through LangChain, which makes it easy to swap out models during experimentation and build proofs-of-concept quickly without worrying about underlying LLM plumbing.
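
To make that concrete, here’s a minimal sketch of how the model layer can be wired up through LangChain’s Groq integration. The model name, temperature, and system prompt below are illustrative assumptions, not Sakha’s exact configuration.

```python
# Minimal sketch: an open-source model served by Groq, wrapped through LangChain.
# Assumes GROQ_API_KEY is set in the environment; model name and temperature are
# illustrative, not Sakha's actual settings.
from langchain_groq import ChatGroq
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatGroq(
    model="llama-3.3-70b-versatile",  # any Groq-hosted open-source model can be swapped in
    temperature=0.7,                  # a slightly warmer temperature for conversational tone
)

reply = llm.invoke([
    SystemMessage(content="You are Sakha, a warm, supportive companion."),
    HumanMessage(content="I'm feeling off today."),
])
print(reply.content)
```

Because the model sits behind LangChain’s chat interface, swapping in a different Groq-hosted model (or a self-hosted one later) is a one-line change.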


LangGraph – Giving Structure to Conversations

Once I picked the brain, I needed to figure out how to guide it. Not every conversation should be flat and linear; some need state, branching logic, and even tool use.

That’s where LangGraph came in.

  • It lets me create multi-step workflows.
  • I can embed conditional logic into conversations (e.g., “Is the user in distress?” → switch flow).
  • It gives me conversational memory for continuity.
  • And it integrates smoothly with external tools and APIs.

LangGraph powers Sakha’s conversation graph, the engine that routes how chats progress depending on context and user state.
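
Here’s a rough sketch of what such a graph can look like. The node names mirror the components described later in this post, but the state fields and routing logic are simplified assumptions for illustration.

```python
# Sketch of a LangGraph conversation graph with a supervisor that routes the flow.
# Node names mirror Sakha's components; the state fields and routing logic are
# simplified assumptions.
from typing import List, TypedDict
from langgraph.graph import StateGraph, END

class ConversationState(TypedDict):
    messages: List[dict]   # running chat history for this session
    flow: str              # e.g. "normal_chat", "activity_suggestion", "crisis"

def supervisor(state: ConversationState) -> ConversationState:
    # In Sakha this is an LLM-based decision; here the state passes through untouched.
    return state

def chat_engine(state: ConversationState) -> ConversationState:
    # Build a prompt from the state and call the LLM (omitted in this sketch).
    return state

def crisis_handler(state: ConversationState) -> ConversationState:
    # Attach helpline information to the response (omitted in this sketch).
    return state

builder = StateGraph(ConversationState)
builder.add_node("supervisor", supervisor)
builder.add_node("chat_engine", chat_engine)
builder.add_node("crisis_handler", crisis_handler)
builder.set_entry_point("supervisor")
builder.add_conditional_edges(
    "supervisor",
    lambda state: "crisis_handler" if state["flow"] == "crisis" else "chat_engine",
)
builder.add_edge("chat_engine", END)
builder.add_edge("crisis_handler", END)
conversation_graph = builder.compile()
```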


Flask – Powering Interaction with the Bot

I used Flask not just to power Sakha’s simple UI but also to serve the core APIs that make chatting with the bot possible.

On the frontend, I kept things minimal, just basic HTML and JavaScript for quick interactions. But behind the scenes, Flask handled the important parts: receiving user input, sending it to the LLM pipeline, and returning thoughtful responses.

It was the ideal choice because:

  • It’s lightweight and easy to set up.
  • It integrates seamlessly with Python, which is what the rest of Sakha is built on.
  • It made building both a simple web UI and a chat API straightforward and fast.

In short, Flask gave me just enough without getting in the way, letting me focus on what mattered most: the experience of the conversation.
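
For a feel of that layer, here’s a minimal sketch assuming Flask-SocketIO for the websocket channel; the event names, the /reminder route, and the process_message() helper are hypothetical stand-ins for Sakha’s actual code.

```python
# Minimal sketch of the Flask layer: a websocket handler for chat messages plus a
# plain HTTP endpoint the scheduler can hit. Flask-SocketIO, the event names, and
# process_message() are assumptions for illustration.
from flask import Flask, jsonify, request
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

def process_message(user_id: str, text: str, flow: str = "normal_chat") -> str:
    """Hypothetical bridge into the Conversation Processor (stubbed here)."""
    return f"(Sakha would respond to {text!r} in the {flow!r} flow)"

@socketio.on("user_message")
def handle_user_message(data):
    # Hand the raw text to the conversation pipeline and stream the reply back.
    reply = process_message(user_id=data["user_id"], text=data["text"])
    emit("sakha_reply", {"text": reply})

@app.route("/reminder", methods=["POST"])
def reminder():
    # Triggered by the scheduler when an activity reminder is due.
    payload = request.get_json()
    reply = process_message(user_id=payload["user_id"], text="", flow="reminder")
    return jsonify({"text": reply})

if __name__ == "__main__":
    socketio.run(app, port=5000)
```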


Under the Hood: Core Components That Make Sakha Work

At its heart, Sakha is more than just a chatbot; it’s a thoughtfully layered system designed to feel human, stay context-aware, and respond with care. Instead of relying on a single prompt-response loop, I built Sakha around a modular architecture where each component has a clear responsibility.

Here’s a quick overview of the main building blocks:

  • Conversation Processor – The entry point for all incoming user messages. It handles preprocessing of the user_input and streams it through the conversation graph.
  • Conversation Graph (LangGraph) – A dynamic flow engine that guides conversations using stateful, branching logic. This is where Sakha’s adaptability comes from.
  • Conversation State – A session-unique state object passed through the entire graph, allowing context to persist and evolve throughout the interaction.
  • Supervisor – A smart dispatcher that decides what kind of conversation Sakha should engage in, whether it’s a normal chat, an activity suggestion, or even a crisis intervention.
  • Crisis Handler – A specialized module triggered when signs of emotional distress are detected. It provides professional, 24/7 toll-free helpline numbers that users can reach out to.
  • Chat Engine – The actual brain behind the conversations, powered by the LLM and dynamically generated prompts based on context and flow.
  • User, Checkpoint, and Memory Managers – These handle persistent storage: tracking user info, conversation history, and flow progress across sessions.
  • Response Templates – Modular prompt templates that ensure Sakha’s tone stays consistent and compassionate, while allowing for personalization based on user data.

Each of these pieces plays a role in making Sakha feel less like a machine and more like a friend who listens, remembers, and responds with care.
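
To make the Conversation State a little more concrete, here’s a rough sketch of the kind of session object that gets threaded through the graph, expanding on the minimal state from the LangGraph sketch above. The exact field names are assumptions inferred from the components listed here.

```python
# Rough sketch of the session-unique state threaded through the conversation graph.
# Field names are assumptions inferred from the components described above.
from typing import List, Optional, TypedDict

class ConversationState(TypedDict, total=False):
    user_id: str                      # who we're talking to (User Manager)
    thread_id: str                    # current session / checkpoint identifier
    messages: List[dict]              # raw chat history for this session
    summary: Optional[str]            # structured summary once the history grows long
    flow: str                         # directive chosen by the Supervisor
    activity_preferences: List[str]   # loaded from persistent storage at session start
    pending_reminder: Optional[dict]  # reminder set during an activity_suggestion flow
```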


Flow chart of internal components

A Day in the Life of a Message

So, what actually happens when you type “I’m feeling off today” into Sakha?

While the reply might seem instant and effortless, under the hood, a lot of thought goes into how that message is received, processed, and responded to with care, context, and relevance.

Here’s how a single message flows through Sakha’s system:

  1. UI → Server
    The message begins its journey in the lightweight frontend, a simple HTML/JavaScript interface. When you hit send, the text is forwarded over a websocket to the Flask server, which acts as the bridge between the user and Sakha’s internal logic.

  2. Flask Endpoint → Conversation Processor
    Once the message reaches the server, it’s handed off to the Conversation Processor, the orchestrator of the system. It performs preprocessing and streams the message into the conversation graph, ultimately receiving a response that’s passed back to the server.

    If a follow-up or reminder is due, Sakha’s scheduler triggers dedicated Flask endpoints (/reminder, /followup) which initiate their flows independently.

    For new sessions, the server also initializes a fresh Conversation Graph and Conversation State, which are then passed to the processor for handling.

  3. Conversation Processor → Conversation Graph (LangGraph)
    The Conversation Processor injects the input into the Conversation Graph, powered by LangGraph. This graph consists of modular nodes and conditional transitions, allowing Sakha to intelligently route the conversation depending on user intent and state.

  4. Supervisor Node → Chat Flow Decision
    Early in the graph, the Supervisor node evaluates the input and context. It determines whether this message should enter a general chat flow, trigger an activity suggestion, or, if necessary, be flagged for the Crisis Handler.

  5. Flow Execution → Chat Engine
    Once the appropriate flow is selected, the relevant nodes are executed. These call into the Chat Engine, which builds the final LLM prompt using conversation state and sends it to the model (e.g., LLaMA3.3 via Groq API). The response is parsed and formatted.

  6. Server → UI
    Finally, the crafted response is returned through the websocket and rendered in the chat UI. To the user, it feels like a natural, thoughtful message from a friend who’s really listening.


This flow isn’t just functional; it’s designed for empathy. Each step is intentional, focused on making the interaction feel personal, seamless, and emotionally aware.
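
Tying steps 2–5 together, here’s roughly what the Conversation Processor’s hand-off could look like, assuming the compiled graph from the earlier LangGraph sketch and a mutable per-session state dict; the function is illustrative, not Sakha’s exact implementation.

```python
# Rough sketch: fold the raw user input into the session state, run it through the
# compiled conversation graph, and return the reply appended by the chat engine node.
# The graph, state shape, and function signature are illustrative assumptions.
def process_message(graph, state: dict, user_input: str) -> str:
    state.setdefault("messages", []).append({"role": "user", "content": user_input.strip()})
    result = graph.invoke(state)        # run the LangGraph flow end-to-end
    state.update(result)                # keep the evolved context for the next turn
    return state["messages"][-1]["content"]
```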


Zooming In: How Sakha Handles Every Message

Let’s take a closer look at what happens after a message enters Sakha’s system: how it decides what kind of response to generate, which flow to trigger, and how it tracks everything in between.

Supervisor: Making the First Decision

Every message first arrives at the Supervisor node within the LangGraph-powered conversation graph. This is an LLM-based decision point that examines both the latest user input and the ongoing conversation state to determine the appropriate direction for the chat.

Here’s what the Supervisor might decide:

  • Crisis Detected → The flow is routed to the Crisis Handler, which shares 24/7 toll-free helpline numbers, ensuring the user has access to professional support.
  • No Crisis → The message is routed to the Chat Engine, along with a flow directive: normal_chat, activity_suggestion, reminder, or follow_up.
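
A rough sketch of how that LLM-based decision might look is below; the prompt wording and the fallback behaviour are assumptions, and only the flow labels come from Sakha’s design.

```python
# Rough sketch of an LLM-based supervisor decision. The prompt wording and the
# fallback behaviour are assumptions; only the flow labels mirror Sakha's design.
from langchain_groq import ChatGroq

FLOWS = {"crisis", "normal_chat", "activity_suggestion", "reminder", "follow_up"}

supervisor_llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.0)

def decide_flow(user_input: str, summary: str = "") -> str:
    prompt = (
        "You route messages for a well-being companion.\n"
        f"Conversation summary so far: {summary or 'none'}\n"
        f"Latest user message: {user_input}\n"
        "Reply with exactly one label: crisis, normal_chat, activity_suggestion, "
        "reminder, or follow_up."
    )
    label = supervisor_llm.invoke(prompt).content.strip().lower()
    return label if label in FLOWS else "normal_chat"   # safe fallback if the LLM drifts
```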


Chat Engine: Generating Context-Aware Responses

The Chat Engine is where Sakha’s intelligence comes to life. It’s responsible for crafting thoughtful, structured responses by leveraging multiple subcomponents:

  1. Summarizer
    If the current context is too long for the LLM window, the summarizer condenses it and stores the summary in the conversation state, preserving emotional and contextual continuity.

  2. Chat Flow Manager
    Based on the Supervisor’s directive, this module retrieves the appropriate chat flow object, which contains prompts and logic for one of the following flows:

    Current Chat Flows:

    • normal_chat: A general conversation mode where Sakha checks in on the user’s well-being. If signs of a low mood are detected, the Supervisor may switch the flow to activity_suggestion.

    • activity_suggestion: Here, Sakha suggests mood-boosting activities tailored to the user’s emotional state and preferences. A key feature is a RAG (Retrieval-Augmented Generation) module using Qdrant to find past activities the user engaged in under similar circumstances (indexed by a situation embedding). If the user agrees, Sakha sets a reminder. If not, it may shift the flow back to normal_chat.

    • reminder: Triggered externally by APScheduler when a reminder is due. Sakha gently motivates the user to carry out the scheduled activity. If the user declines or changes their mind, the Supervisor can redirect to normal_chat.

    • follow_up: Also triggered by APScheduler, but after the activity window has passed. Sakha asks for feedback, which is stored in memory using Qdrant and appended to the activity log.

  3. Response Manager
    Each flow returns a structured response using Sakha’s consistent, modular Response Templates. The Response Manager handles these to:

    • Store reminders and feedback in the appropriate database via the Reminder Manager.
    • Log activity outcomes and mood feedback in long-term memory.
    • Format the response for final delivery.

Finally, once the response is generated and sent back, the conversation history and any relevant updates are written back to the conversation state, ready for the next interaction.
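
For the scheduled flows, here’s a minimal sketch of how the reminder and follow-up triggers could be queued with APScheduler. The /reminder and /followup endpoints come from the flow described earlier, but the use of HTTP callbacks, the URLs, and the timing offsets are assumptions.

```python
# Minimal sketch: queue the reminder and follow-up triggers with APScheduler.
# The HTTP-callback approach, URLs, and timing offsets are illustrative assumptions.
from datetime import datetime, timedelta

import requests
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
scheduler.start()

def schedule_activity(user_id: str, activity_id: str, start_at: datetime, duration_min: int):
    payload = {"user_id": user_id, "activity_id": activity_id}
    # Nudge the user when the activity is due...
    scheduler.add_job(requests.post, "date", run_date=start_at,
                      args=["http://localhost:5000/reminder"], kwargs={"json": payload})
    # ...and ask how it went once the activity window has passed.
    scheduler.add_job(requests.post, "date",
                      run_date=start_at + timedelta(minutes=duration_min),
                      args=["http://localhost:5000/followup"], kwargs={"json": payload})
```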

Making Sakha Personal: Memory, Preferences, and Context

One of Sakha’s core goals is to feel personal, not just in tone, but in how it remembers and responds based on your past experiences. While it’s still early in development, the foundation for a memory-aware system is already in place.

Activity Preferences and Session Context

When a user first registers, their activity preferences are collected and stored in a Postgres database. These preferences, like enjoying walks, listening to music, or journaling, are retrieved every time a new conversation begins and injected into the conversation state, allowing Sakha to suggest relevant activities right from the start.
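
A small sketch of what that lookup could look like is below, assuming psycopg2 and a hypothetical activity_preferences table; the schema and connection details are illustrative.

```python
# Small sketch: pull stored activity preferences at session start so they can be
# injected into the conversation state. Table, columns, and DSN are hypothetical.
import psycopg2

def load_preferences(user_id: str) -> list[str]:
    conn = psycopg2.connect("dbname=sakha user=sakha")   # illustrative connection string
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT activity_name FROM activity_preferences WHERE user_id = %s",
                (user_id,),
            )
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()

# At session start: state["activity_preferences"] = load_preferences(user_id)
```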

Structured Summarization for Long Conversations

To manage lengthy conversations and keep the LLM within token limits, Sakha uses a summarizer component after every 10 exchanges to generate a summary. Rather than producing freeform summaries, Sakha uses a structured template to ensure consistency and easy updating:

Summary Format:

1. Main Topic:
2. Key Things the User Said:
3. Key Things Sakha Said or Did:
4. Decisions or Plans Made:
5. Unresolved Topics or Follow-Ups:

This summary is stored in the conversation state and passed across nodes during the conversation flow, allowing Sakha to remain context-aware even when the raw message history grows too large.
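
Here’s a rough sketch of how that summarization step might be wired up, reusing the Groq-hosted model from earlier; the prompt wording and the decision to keep only the last exchange verbatim are assumptions.

```python
# Rough sketch of the structured summarizer: once the history grows past a threshold,
# condense it into the fixed template. Prompt wording and trimming policy are assumptions.
from langchain_groq import ChatGroq

SUMMARY_TEMPLATE = """Summarize the conversation so far using exactly this format:
1. Main Topic:
2. Key Things the User Said:
3. Key Things Sakha Said or Did:
4. Decisions or Plans Made:
5. Unresolved Topics or Follow-Ups:"""

summarizer_llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.2)

def maybe_summarize(state: dict, every_n: int = 10) -> None:
    """Replace a long raw history with a structured summary stored in the state."""
    messages = state.get("messages", [])
    if len(messages) < every_n:
        return
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    previous = state.get("summary") or "none yet"
    prompt = f"{SUMMARY_TEMPLATE}\n\nPrevious summary: {previous}\n\nNew messages:\n{transcript}"
    state["summary"] = summarizer_llm.invoke(prompt).content
    state["messages"] = messages[-2:]   # keep only the latest exchange verbatim
```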

Memories Through Feedback

After each activity, Sakha checks in through the follow-up flow to ask how it went. The responses, whether the activity was completed, enjoyed, skipped, or why, are stored as memories using Qdrant, a vector store optimized for similarity search.

To make these memories searchable and context-aware, Sakha uses the BAAI/bge-small-en-v1.5 model to convert the user’s situation into an embedding vector, which becomes the index key for Qdrant. The stored memory includes:

  • user_id, thread_id, activity_id
  • user_situation: the emotional or contextual input from the user
  • activity_name, duration
  • Whether the activity was completed
  • A score of enjoyment
  • Any reason for skipping
  • A timestamp of when the feedback was submitted

This structure enables Sakha to later suggest activities that have worked in similar emotional contexts, not just random options.
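
Here’s a sketch of what that storage and lookup could look like with Qdrant and the BGE embedding model; the collection name, client setup, and helper signatures are illustrative, while the payload fields follow the list above.

```python
# Sketch: store activity feedback as memories in Qdrant, keyed by an embedding of the
# user's situation, and retrieve similar memories later. Collection name, client setup,
# and function signatures are illustrative; payload fields follow the list above.
import uuid
from datetime import datetime, timezone

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-small-en-v1.5")   # 384-dimensional embeddings
client = QdrantClient(url="http://localhost:6333")

client.recreate_collection(
    collection_name="activity_memories",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

def store_feedback(user_id, thread_id, activity_id, user_situation, activity_name,
                   duration, completed, enjoyment, skip_reason=None):
    vector = encoder.encode(user_situation).tolist()
    client.upsert(
        collection_name="activity_memories",
        points=[PointStruct(id=str(uuid.uuid4()), vector=vector, payload={
            "user_id": user_id, "thread_id": thread_id, "activity_id": activity_id,
            "user_situation": user_situation, "activity_name": activity_name,
            "duration": duration, "completed": completed, "enjoyment": enjoyment,
            "skip_reason": skip_reason,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })],
    )

def similar_memories(user_situation, limit=3):
    vector = encoder.encode(user_situation).tolist()
    return client.search(collection_name="activity_memories",
                         query_vector=vector, limit=limit)
```

The activity_suggestion flow can then call something like similar_memories() to ground its recommendations in what actually helped before.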

What’s Missing Today

At this stage, Sakha doesn’t support conversational memory across sessions. That means users can’t yet say things like “Remember what we did last week?” or “Let’s do the walk again,” and expect Sakha to recall past chats or actions explicitly.

Currently, memory is limited to:

  • In-session context
  • Activity preferences
  • Feedback-driven memories for activity suggestions

But this limitation is by design. The goal is to first get structured memory and recall right, ensuring that Sakha offers helpful, accurate nudges based on feedback, before expanding into long-term conversational memory.


Testing Sakha: What We’re Evaluating and How

Before Sakha is released more broadly, it’s essential to understand not just whether it works, but how well it supports, listens, and responds to real users. The current plan is to manually test interactions with a mix of real users and AI-generated simulated conversations, with a human evaluator scoring each chat based on the following rubric.

This rubric is designed to balance emotional intelligence, usability, personalization, and safety: everything that makes Sakha feel like more than a script.


🌱 Sakha Chatbot Evaluation Rubric

Each criterion below is rated from 1 to 5, based on what we’re looking for:

  • Emotional Intelligence & Encouragement – Does Sakha express empathy, validate the user’s feelings, and offer uplifting, compassionate responses without being preachy or robotic?
  • Tone Balance (Including Humor) – Is the tone warm, kind, and emotionally appropriate? Is humor used carefully to lighten the mood without undermining the situation?
  • Clarity & Simplicity of Responses – Are the responses easy to understand, jargon-free, and structured clearly for the user’s mental/emotional state?
  • Personalization & Contextual Awareness – Does Sakha adapt to user preferences, past feedback, or the current mood? Does it feel like a personal assistant, not just a generic chatbot?
  • Memory Usage (Preferences & Feedback Recall) – Is previously stored data (e.g., activity preferences, feedback on past activities) reflected appropriately in recommendations or tone?
  • Adaptability to User Mood & Engagement – Can Sakha shift gears, be it from light conversation to deeper support or vice versa, based on mood signals or disengagement?
  • Privacy & Boundary Respect – Does Sakha avoid prying questions, handle sensitive topics gracefully, and make the user feel emotionally safe throughout the interaction?
  • Encouragement Toward Self-care or Action – Are nudges toward activity or self-care natural, timely, and well-aligned with user readiness or preferences?
  • Crisis Sensitivity & Handling – Does Sakha detect possible signs of distress and route the user appropriately (e.g., to the crisis handler)? Does it avoid triggering or insensitive responses?
  • Response Consistency & Identity – Are Sakha’s tone, values, and personality consistent throughout conversations, even across sessions?
  • Overall Flow & Conversation Coherence – Do the messages feel like part of a thoughtful, coherent interaction rather than disjointed replies?

🧾 Overall Evaluation

  • Strengths:
    (What aspects of Sakha’s performance were particularly impressive or emotionally resonant?)

  • Areas for Improvement:
    (Where did it fall short? Any awkwardness, missed cues, or technical slips?)

  • Final Rating:
    (Sum of all individual scores. Used for comparative tracking as testing progresses.)


This rubric will evolve with usage. Early testing will be fully manual, with notes and reflections gathered from test users and evaluators. In the long run, the plan is to partially automate this process using LLMs as scoring agents on synthetic data.


What’s Next & Closing Thoughts

Sakha started as an idea to build a chatbot that doesn’t just reply, but connects. From managing structured flows to handling complex emotional contexts, every piece of Sakha was designed with empathy and adaptability in mind. But this is only the beginning.

What’s Next:

  • User Testing & Iteration: With the core in place, the next step is structured testing using the evaluation rubric above. This feedback loop will guide refinement both in how Sakha speaks and how it reasons.
  • Deeper Memory & Personalization: One limitation today is the lack of long-term memory in natural conversation. The next version will allow users to refer to past chats, explore past activity feedback, and reflect over time.
  • Multimodal and Voice Interfaces: Conversations don’t always have to be typed. Voice input and lightweight emotion detection are on the roadmap for future iterations.
  • Open Source + Deployment: Sakha will likely be open-sourced with clear documentation, so others can build on it or adapt it to their own domains, whether mental health, coaching, or companionship.

At its heart, Sakha is a human-first system: less about flashy features, more about thoughtful design. It’s far from perfect, but it’s trying. And sometimes, that’s exactly what we need: something (or someone) that listens, remembers, and tries to help, gently.

Thanks for reading. 💙