LLM Chatbot with personal data in minutes

In this article, I’ll guide you through creating a chatbot that draws on your own data. Let’s transform your data into interactive conversations!

𝐏𝐨𝐰𝐞𝐫 𝐨𝐟 𝐲𝐨𝐮𝐫 𝐨𝐰𝐧 𝐝𝐚𝐭𝐚
Imagine having a chatbot that understands your documents and your content. That’s exactly what we’re building today. By leveraging your own data, such as PDFs, this chatbot will provide tailored answers and insights.

𝐏𝐚𝐫𝐭 𝟏: 𝐒𝐞𝐭𝐭𝐢𝐧𝐠 𝐔𝐩 𝐭𝐡𝐞 𝐄𝐧𝐯𝐢𝐫𝐨𝐧𝐦𝐞𝐧𝐭
Before we delve into the magic, ensure you have these files ready:
1) app.py: Here goes the code for the actual app
2) embeddings.py: To create vector embeddings of the textual data
3) “data” directory: Add your PDF files here
4) .env file: Add your Hugging Face API token here
5) requirements.txt: Lists all the libraries required
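
To make item 4 a bit more concrete, here is a minimal sketch of how the token in the .env file can end up in the environment, using only the standard library. The variable name HUGGINGFACEHUB_API_TOKEN is an assumption (it is the name LangChain's Hugging Face integrations look for), and load_env_file is a hypothetical helper. In a real project, load_dotenv() from the python-dotenv package does the same job more robustly.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Variables already set in the environment are not
    overwritten. Returns a dict of the pairs found in the file.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop optional surrounding quotes around the value.
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

With a .env file containing a line such as HUGGINGFACEHUB_API_TOKEN=your_token_here, calling load_env_file() at the top of app.py makes the token available through os.environ.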

𝐈𝐦𝐩𝐨𝐫𝐭𝐬

After setting up these files, type 𝘱𝘪𝘱 𝘪𝘯𝘴𝘵𝘢𝘭𝘭 -𝘳 𝘳𝘦𝘲𝘶𝘪𝘳𝘦𝘮𝘦𝘯𝘵𝘴.𝘵𝘹𝘵 in your terminal and hit Enter before proceeding to the next step.

𝐏𝐚𝐫𝐭 𝟐: 𝐂𝐫𝐞𝐚𝐭𝐢𝐧𝐠 𝐕𝐞𝐜𝐭𝐨𝐫 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠𝐬
Let’s kick things off with embeddings.py. This script is where the magic begins: it generates the embeddings with a sentence transformers model and saves them locally in a vector store DB.
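
A minimal sketch of what such a script could look like is shown here. Several things in it are assumptions rather than details from this article: FAISS as the vector store, the all-MiniLM-L6-v2 model, the chunk sizes, the vectorstore/db_faiss output folder, and the helper names find_pdfs and build_vector_store. It also assumes the langchain-community, langchain-text-splitters, sentence-transformers, faiss-cpu and pypdf packages are installed.

```python
from pathlib import Path

# Assumed locations and model name; adjust them to your own project.
DATA_DIR = "data"
DB_DIR = "vectorstore/db_faiss"
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def find_pdfs(data_dir=DATA_DIR):
    """Return the PDF files directly inside data_dir, sorted by path."""
    return sorted(p for p in Path(data_dir).glob("*.pdf") if p.is_file())


def build_vector_store(data_dir=DATA_DIR, db_dir=DB_DIR):
    """Load the PDFs, split them into chunks, embed the chunks, save the index.

    Returns the number of chunks that were embedded.
    """
    pdf_paths = find_pdfs(data_dir)
    if not pdf_paths:
        raise FileNotFoundError(f"No PDF files found in {data_dir!r}")

    # Heavy third-party imports are kept local so that importing this
    # module stays cheap.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # One Document per PDF page.
    documents = []
    for pdf_path in pdf_paths:
        documents.extend(PyPDFLoader(str(pdf_path)).load())

    # Overlapping chunks keep sentences that straddle a boundary searchable.
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(documents)

    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local(db_dir)
    return len(chunks)
```

Calling build_vector_store() once at the bottom of the script is enough to write the index to disk.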

𝐏𝐚𝐫𝐭 𝟑: 𝐒𝐞𝐭𝐭𝐢𝐧𝐠 𝐔𝐩 𝐭𝐡𝐞 𝐂𝐡𝐚𝐭𝐛𝐨𝐭
Now, let’s dive into app.py, where the good stuff happens.

𝐥𝐨𝐚𝐝_𝐥𝐥𝐦(): This function loads the language model for the chatbot via the Hugging Face API. I am using Mistral’s Instruct model in this case.
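
One possible shape for this function is sketched here. It assumes the langchain-huggingface package and its HuggingFaceEndpoint class. The exact repo ID, the temperature, and the token limit are my guesses, since only "Mistral's Instruct model" is named above, and the HUGGINGFACEHUB_API_TOKEN variable name is an assumption too.

```python
import os

# Assumed model repo; any hosted instruct model you have access to would do.
REPO_ID = "mistralai/Mistral-7B-Instruct-v0.2"


def load_llm(repo_id=REPO_ID, temperature=0.5, max_new_tokens=512):
    """Return a LangChain LLM backed by the hosted Hugging Face API."""
    token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
    if not token:
        raise RuntimeError(
            "HUGGINGFACEHUB_API_TOKEN is not set; add it to your .env file"
        )

    # Imported here so the token check above runs even without the package.
    from langchain_huggingface import HuggingFaceEndpoint

    return HuggingFaceEndpoint(
        repo_id=repo_id,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
        huggingfacehub_api_token=token,
    )
```

Failing early on a missing token gives a clearer error than a rejected API call later in the chain.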

𝐜𝐫𝐞𝐚𝐭𝐞_𝐩𝐫𝐨𝐦𝐩𝐭_𝐭𝐞𝐦𝐩(): This creates the custom prompt template that adds a personal touch to your chatbot’s responses. Fill in the blank with however you want your agent to behave, for example an expert on history or medicine.
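
As an illustration, a template with such a blank might look like the sketch here. The wording of the prompt, the persona parameter, and the build_prompt_text helper are my own assumptions. The {context} and {question} placeholders follow the convention LangChain's question answering chains commonly use, and PromptTemplate comes from langchain-core.

```python
def build_prompt_text(persona):
    """Return the prompt text with the persona filled in.

    The {context} and {question} placeholders are left untouched so the
    chain can fill them in at query time.
    """
    return (
        f"You are {persona}. Use the context below to answer the question.\n"
        "If the answer is not in the context, say that you don't know.\n\n"
        "Context: {context}\n"
        "Question: {question}\n\n"
        "Helpful answer:"
    )


def create_prompt_temp(persona="an expert on history"):
    """Wrap the prompt text in a LangChain PromptTemplate."""
    # Local import: only needed when the template object is actually built.
    from langchain_core.prompts import PromptTemplate

    return PromptTemplate(
        template=build_prompt_text(persona),
        input_variables=["context", "question"],
    )
```

Changing the persona string is the quickest way to experiment with how the agent behaves.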

𝐂𝐫𝐞𝐚𝐭𝐢𝐧𝐠 𝐫𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥 𝐜𝐡𝐚𝐢𝐧: This step creates a retrieval-based question answering chain using LangChain.
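
Conceptually, a retrieval-based chain finds the stored chunks closest to the question, places them in the prompt as context, and hands that prompt to the LLM. The toy sketch here is not LangChain code. It only illustrates the first two steps in plain Python, with made-up sentences and hand-written 3-dimensional vectors standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, k=2):
    """Return the texts of the k chunks most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Place the retrieved chunks into the prompt as context."""
    context = "\n".join(chunks)
    return f"Context: {context}\nQuestion: {question}\nHelpful answer:"


# Made-up example data: (chunk text, pretend embedding).
store = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Late payments incur a 2% fee.", [0.8, 0.3, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of the question below
prompt = build_prompt("When are invoices due?", retrieve(query_vector, store))
```

Here the two payment-related chunks land in the prompt and the unrelated one is left out, which is the whole point of retrieval: the LLM only sees the parts of your data that matter for the question.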

𝐂𝐡𝐚𝐢𝐧𝐥𝐢𝐭 𝐒𝐞𝐭𝐮𝐩: This step sets up the Chainlit server to receive messages and pass them to the chain, which generates the response. Here we start with an empty context, handle the user’s message, and add formatting to the generated response.
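
The formatting part could look something like the sketch here. It assumes the chain returns a dict with a "result" string and, optionally, a "source_documents" list whose items carry a metadata dict with "source" and "page" keys. format_response is a hypothetical helper name. In a Chainlit app it would typically be called inside the function decorated with @cl.on_message, just before the reply is sent.

```python
def format_response(result):
    """Turn a chain result into the text shown to the user.

    Appends a de-duplicated list of sources below the answer, or a short
    note when the chain returned no source documents.
    """
    answer = result.get("result", "").strip()
    sources = []
    for doc in result.get("source_documents", []):
        meta = getattr(doc, "metadata", {}) or {}
        label = str(meta.get("source", "unknown source"))
        if "page" in meta:
            label += f" (page {meta['page']})"
        if label not in sources:
            sources.append(label)
    if not sources:
        return answer + "\n\nNo sources found."
    return answer + "\n\nSources:\n" + "\n".join(f"- {s}" for s in sources)
```

Showing the sources next to the answer makes it easier to check the chatbot against your own documents.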

𝐑𝐮𝐧 𝐭𝐡𝐞 𝐜𝐡𝐚𝐭𝐛𝐨𝐭
1) Type 𝘱𝘺𝘵𝘩𝘰𝘯 𝘦𝘮𝘣𝘦𝘥𝘥𝘪𝘯𝘨𝘴.𝘱𝘺 in the terminal and hit Enter. This step needs to be done only once to create the embeddings.
2) Once the above step is complete, run the app by executing
𝘤𝘩𝘢𝘪𝘯𝘭𝘪𝘵 𝘳𝘶𝘯 𝘢𝘱𝘱.𝘱𝘺 -𝘸

And finally, the chatbot is up and running at localhost:8000.

Now you can tinker with it and explore its capabilities: play around with different prompt templates, fine-tune it, try multimodal input, use different LLMs, and do lots of cool stuff with it.
