Business Chatbot Using Dialogflow and OpenAI Assistant API

July 22, 2024

By Young Kim | Senior Software Engineer II

This blog post will explore the journey of building an experimental chatbot named Toothbot(™) for the fictional Dr. Toothfairy’s dental clinic website, highlighting the evolution from using Dialogflow CX to integrating OpenAI’s generative AI capabilities.

Initial Approach: Dialogflow CX

The project began with Google’s Dialogflow CX, chosen for its advanced features for building conversational AI interfaces. Dialogflow CX is particularly well-suited to complex conversations because of its flow-based model, scalability, and visual flow builder. The initial plan was to use Dialogflow CX to handle:

  • Basic FAQs: Providing information like business hours, address, insurance policies, etc.
  • Form Processing: Managing “Appointment” and “Emergency” forms through flows and state machines.

However, challenges arose during the implementation:

  • Complexity of Intent Creation: Defining specific intents for every potential user query proved to be tedious and time-consuming.
  • User Difficulty: Dr. Toothfairy found the process of setting up and managing intents in Dialogflow CX to be quite demanding.

Shift Towards Generative AI

The emergence of generative AI models, particularly ChatGPT, presented a compelling alternative. These models offered a more user-friendly approach, as they could generate human-like responses without the need for meticulously crafted intents. One significant advantage of integrating OpenAI was that multilingual support came for free: the chatbot could understand and respond in multiple languages without any extra configuration. However, relying solely on generative AI for a business website chatbot presented its own set of challenges:

  • Hallucinations: Ensuring that the chatbot’s responses remained factually accurate and aligned with the clinic’s specific information.
  • Control Over Responses: Maintaining a balance between the generative capabilities of ChatGPT and the need for structured and reliable responses for specific tasks.

Merging Dialogflow and OpenAI: A Hybrid Approach

To leverage the strengths of both technologies, a hybrid approach was adopted:

  • Dialogflow CX: Continued to manage the structured aspects of the chatbot, handling “Appointment” and “Emergency” form processing due to the requirement for specific user-provided information. This decision was also driven by the need for guaranteed, consistent responses for welcoming patients and providing examples of how to interact with the chatbot.
  • GPT-3.5 (OpenAI): Employed to handle all other user conversations, providing more dynamic and engaging responses to general inquiries. A notable side effect of this integration was multilingual support, which improved user interaction by allowing GPT-based responses in multiple languages.

Technical Implementation of the Hybrid Model

Dialogflow Setup:

  • Dialogflow CX was set up to primarily manage two nested flows dedicated to “Appointment” and “Emergency” form processing.
  • All other user interactions were directed to a Google Cloud Function acting as the fulfillment webhook, which served as a bridge to OpenAI’s Chat Completions API (gpt-3.5-turbo).

Fulfillment Google Function:

  • This function, written in TypeScript using libraries like “@google-cloud/functions-framework” and “openai,” handled the communication between Dialogflow and OpenAI.
  • It received user queries from Dialogflow, sent them to the GPT-3.5 Turbo model via the OpenAI API, and returned the generated responses to Dialogflow for delivery to the user.
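
The bridging step described above can be sketched roughly as follows. This is a minimal illustration, not the project’s actual function: the request and response shapes cover only the Dialogflow CX webhook fields used here, and the names `handleFulfillment`, `CompleteFn`, and `SYSTEM_PROMPT` (including its wording) are assumptions introduced for the example. The OpenAI call is passed in as a function so the sketch stays self-contained.

```typescript
// Minimal shapes for the parts of a Dialogflow CX webhook exchange used here.
interface CxWebhookRequest {
  text?: string; // raw user utterance, present when the turn was typed text
  sessionInfo?: { session?: string };
}

interface CxWebhookResponse {
  fulfillmentResponse: { messages: { text: { text: string[] } }[] };
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical abstraction over the OpenAI call, so the bridge can run offline.
type CompleteFn = (messages: ChatMessage[]) => Promise<string>;

// Assumed wording; the real system prompt is not given in the post.
const SYSTEM_PROMPT =
  "You are Toothbot, the assistant for Dr. Toothfairy's dental clinic.";

// Build the message list sent to the chat model for one user turn.
function buildMessages(userText: string): ChatMessage[] {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: userText.trim() },
  ];
}

// Wrap a reply in the structure Dialogflow CX expects back from a webhook.
function toWebhookResponse(reply: string): CxWebhookResponse {
  return { fulfillmentResponse: { messages: [{ text: { text: [reply] } }] } };
}

// Receive the Dialogflow request body, ask the model, and return the reply.
async function handleFulfillment(
  body: CxWebhookRequest,
  complete: CompleteFn
): Promise<CxWebhookResponse> {
  const userText = body.text ?? "";
  if (userText.trim() === "") {
    return toWebhookResponse("Sorry, I did not catch that. Could you rephrase?");
  }
  const reply = await complete(buildMessages(userText));
  return toWebhookResponse(reply);
}
```

In a deployment along the lines the post describes, `complete` would presumably wrap a chat completion request for the gpt-3.5-turbo model, and `handleFulfillment` would be called from an HTTP handler registered with the functions framework.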

Incorporating Business-Specific Information:

  • To ground GPT’s responses in the context of the dental clinic, an embedding strategy was implemented using Supabase as a vector database.
  • Dr. Toothfairy provided documents containing clinic information, which were processed and stored as embeddings in the Supabase database.
  • When a user interacted with the chatbot, the Google function retrieved relevant information from the Supabase database based on the user’s query and included it as context in the prompt sent to GPT.
  • This approach helped to minimize hallucinations and ensure that GPT’s responses were tailored to the clinic’s specific details.
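
The retrieval step above can be illustrated with a small in-memory sketch. It is not the Supabase implementation: a plain array stands in for the vector database, the embeddings are assumed to have been computed already, and `DocChunk`, `topMatches`, and `buildGroundedPrompt` are hypothetical names. What it shows is the general technique of ranking stored chunks by cosine similarity to the query embedding and placing the best matches in the prompt.

```typescript
// A stored piece of clinic documentation with its precomputed embedding.
interface DocChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topMatches(query: number[], chunks: DocChunk[], k: number): DocChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Assemble a prompt that restricts the model to the retrieved context.
function buildGroundedPrompt(question: string, context: DocChunk[]): string {
  const contextBlock = context.map((c) => `- ${c.text}`).join("\n");
  return [
    "Answer using only the clinic information below.",
    "If the answer is not there, say you do not know.",
    "",
    "Clinic information:",
    contextBlock,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Telling the model to admit when the context lacks an answer is one common way to reduce hallucinations; the exact prompt wording here is an assumption.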

Evolution with GPT-4 and Assistant API

The release of GPT-4 and OpenAI’s Assistant API marked a significant advancement, prompting a second iteration of the chatbot:

  • Dedicated AI Agents: The Assistant API enabled the creation of specialized AI agents within the OpenAI platform, each tailored to a specific task. This led to the development of:
    • Toothbot: An agent focused on answering general patient questions.
    • AppointmentAssistant: An agent designed to handle appointment scheduling.
    • EmergencyAssistant: An agent dedicated to assisting patients with emergency situations.
  • Streamlined Architecture:
    • The Assistant API’s built-in data retrieval capabilities eliminated the need for the external Supabase database. Clinic information (in MDX format) was uploaded directly to the “Toothbot” assistant, which could then use it without relying on an external source. This streamlined the system and improved performance by removing the data transfer to, and retrieval from, a separate database.
    • A key advantage of the Assistant API is that the complex state machines and form-processing logic, including the Dialogflow intents and entities used to gather form fields, can be replaced with simple instructions given directly to the agents. This simplifies form filling and makes the system more flexible and easier to maintain, bringing it close to a no-code solution.
    • Multilingual support was extended to every part of the conversation, so that all of the chatbot’s interactions could be conducted in multiple languages, significantly improving the user experience. Under the hybrid approach, the native Dialogflow intents had not been multilingual.
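
To make the idea of instructions replacing form state machines more concrete, the three agents could be described with plain data along these lines. Everything specific here is an assumption for illustration: the field lists, the instruction wording, the model name (the post only says GPT-4), and the `AssistantSpec` shape, which loosely mirrors the name, model, instructions, and tools an assistant is configured with. The retrieval tool is written as `file_search`; earlier versions of the API called it `retrieval`. No API call is made in this sketch.

```typescript
// Plain-data description of one assistant; this is configuration only.
interface AssistantSpec {
  name: string;
  model: string;
  instructions: string;
  tools: { type: "file_search" }[];
}

const MODEL = "gpt-4-turbo"; // assumption: the post only says "GPT-4"

// Turn a list of required fields into form-gathering instructions,
// standing in for the intents, entities, and state machine used before.
function formInstructions(formName: string, fields: string[]): string {
  return [
    `You collect the information needed for the clinic's ${formName} form.`,
    `Ask for these fields one at a time: ${fields.join(", ")}.`,
    "If an answer is missing or unclear, ask again politely.",
    "When every field is filled, read the details back and ask the patient to confirm.",
  ].join("\n");
}

const assistants: AssistantSpec[] = [
  {
    name: "Toothbot",
    model: MODEL,
    instructions: "Answer general patient questions using the uploaded clinic documents.",
    tools: [{ type: "file_search" }], // uses the uploaded clinic files
  },
  {
    name: "AppointmentAssistant",
    model: MODEL,
    // Hypothetical field list; the real form fields are not given in the post.
    instructions: formInstructions("Appointment", [
      "full name",
      "phone number",
      "preferred date and time",
    ]),
    tools: [],
  },
  {
    name: "EmergencyAssistant",
    model: MODEL,
    // Hypothetical field list; the real form fields are not given in the post.
    instructions: formInstructions("Emergency", [
      "full name",
      "phone number",
      "description of the problem",
    ]),
    tools: [],
  },
];
```

Adding or removing a form field then means editing a list and its instruction text, with no flow or state machine to rewire, which is the maintainability gain described above.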

Future Direction: Implementing Streaming for Enhanced User Experience

One potential area for improvement is the OpenAI API’s streaming capability. Streaming lets the first words of a reply appear while the rest is still being generated, which reduces perceived latency and makes Toothbot feel more responsive and natural. However, the current architecture presents a challenge: the existing client-Dialogflow-fulfillment setup makes it difficult to deliver partial responses, which are essential for streaming. Addressing this may require restructuring the architecture, for example by exploring alternative client or fulfillment methods that are more compatible with streaming.
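
As a rough sketch of what handling partial responses involves, the snippet below folds streamed chunks into a growing reply and passes each intermediate version to a callback. The chunk shape follows the `choices[].delta.content` layout of streamed chat completion chunks; `appendDelta`, `relayStream`, and the `onPartial` callback are hypothetical names, and how partial text would reach the browser is exactly the open architectural question, so it is left abstract.

```typescript
// Only the part of a streamed chat completion chunk that this sketch reads.
interface ChatChunk {
  choices: { delta: { content?: string | null } }[];
}

// Append the text carried by one chunk (if any) to the reply so far.
function appendDelta(soFar: string, chunk: ChatChunk): string {
  const piece = chunk.choices[0]?.delta?.content ?? "";
  return soFar + piece;
}

// Consume a stream of chunks, reporting the partial reply whenever it grows.
async function relayStream(
  stream: AsyncIterable<ChatChunk>,
  onPartial: (text: string) => void
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    const next = appendDelta(text, chunk);
    if (next !== text) {
      text = next;
      onPartial(text); // e.g. push to the client over a channel that supports it
    }
  }
  return text;
}
```

A webhook that must return one complete response per request has no natural place for `onPartial`, which is why the client or fulfillment path would likely need to change.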

Is Dialogflow CX Still Relevant?

While integrating OpenAI’s generative AI, especially GPT-4 and the Assistant API, has significantly advanced Toothbot’s capabilities, it is worth asking whether Dialogflow CX still has a place in the architecture. Given its strengths in structured conversations and form processing, are there scenarios where Dialogflow CX, rather than being replaced outright, could be combined with newer technologies to improve Toothbot further?

The answer is “yes.” Because generative outputs can sometimes be unpredictable and less accurate, a hybrid model that keeps Dialogflow CX for the parts of a conversation that need guaranteed, consistent responses could leverage the best of both worlds. The potential synergies between these technologies could open new avenues for innovation.

Conclusion

The project to create Toothbot, the dental clinic chatbot, showcases a successful integration and evolution of technologies. Starting with Dialogflow CX, the limitations of intent-based systems for complex interactions led to the incorporation of OpenAI’s generative capabilities. The hybrid approach, merging Dialogflow’s structured form processing with ChatGPT’s conversational prowess, provided a balanced solution. The release of GPT-4 and the Assistant API further streamlined the architecture, allowing for the creation of specialized AI agents and the elimination of the external vector database dependency. This evolution highlights the importance of adaptability and the potential of integrating emerging technologies to create effective and user-friendly conversational AI solutions.