Kevin Naidoo

4 comments

Elon Musk
May 19, 2024

Thank you Kevin Naidoo for this useful guide.

If someone wants to develop a chatbot whose knowledge base is drawn from the content of a website (posts, articles, and comments), how can they dynamically update the chain's memory?

By the way, can you tell me about the pricing of OpenAI API usage, with some examples for the gpt-3.5-turbo model?

Thank you!

3 replies
Kevin Naidoo
Author
May 22, 2024

Hi Elon Musk. RAG essentially means taking text data and vectorizing it so that it can be searched.

In the case of a website, you need to scrape the text of the website and then follow the RAG process as described in this article.

You don't always need to use RAG. If you use Langchain tools, for example, you can simply query an API or scrape a page and then provide that data to the model in real-time.

RAG helps narrow large bodies of text down to the relevant chunks. Remember that you pay for each token, and models usually limit the maximum number of tokens you can send in a prompt, so a RAG system minimizes the number of tokens sent.
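To illustrate the idea of narrowing text down (and of updating the knowledge base when new content is published), here is a minimal sketch in plain Python. It is not the FAISS setup from the article: `ToyIndex`, `tokenize`, and `cosine` are hypothetical names, and word counts stand in for real embeddings, so treat it as an illustration of the retrieval step only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory index; a real system would use a vector store."""

    def __init__(self):
        self.chunks = []  # list of (text, word-count vector) pairs

    def add(self, text):
        # New website content can be added at any time without rebuilding.
        self.chunks.append((text, Counter(tokenize(text))))

    def search(self, query, k=2):
        # Rank all chunks by similarity to the query and keep the top k.
        q = Counter(tokenize(query))
        ranked = sorted(self.chunks, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


index = ToyIndex()
index.add("Our store ships orders within two business days.")
index.add("Refunds are available within 30 days of purchase.")
index.add("The blog covers PHP, Python, and machine learning.")

# A post published later is simply added to the existing index.
index.add("Shipping to Canada now takes five business days.")

# Only the best-matching chunk is sent to the model, not the whole site.
top = index.search("How long does shipping to Canada take?", k=1)
print(top)
```

Only the chunks returned by `search` would go into the prompt, which is where the token savings come from.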

OpenAI uses a token-based costing system. A token is roughly 4 characters (in English).

You are billed at different rates for text fed into the model and text generated by the model.

GPT-3.5 Turbo is generally the most cost-effective model. It's not the best model, but for most use cases it's usually fine.

With GPT-3.5 Turbo, expect to pay $0.50 per 1M tokens for text you pass into the model and $1.50 per 1M tokens for text the model generates (at the time of writing).
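As a rough worked example of those numbers, the sketch below estimates a bill from token counts. The rates are the ones quoted above, the 4-characters-per-token rule is only an approximation for English, and the function names and the traffic figures (1,000 requests of 1,500 input and 300 output tokens each) are made up for illustration.

```python
import math

# Rates quoted above for GPT-3.5 Turbo; check current pricing before relying on them.
INPUT_USD_PER_MILLION = 0.50
OUTPUT_USD_PER_MILLION = 1.50
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text


def estimate_tokens(text):
    """Approximate the token count from character length."""
    return max(1, math.ceil(len(text) / CHARS_PER_TOKEN))


def estimate_cost(input_tokens, output_tokens):
    """Estimated cost in USD for the given token counts."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical traffic: 1,000 requests, each 1,500 tokens in and 300 tokens out.
requests = 1_000
cost = estimate_cost(1_500 * requests, 300 * requests)
print(f"Estimated cost: ${cost:.2f}")  # 0.75 for input + 0.45 for output
```

For exact counts you would use the model's real tokenizer rather than the character-based estimate.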

Elon Musk
May 23, 2024

Hi Kevin Naidoo, thanks for your comprehensive response!

I am developing a chatbot for a WordPress website and planning to use your script as the chatbot backend. However, I'm curious about how to implement the asynchronous function effectively.

Since the chatbot will be used on a public website, I need to account for many calls to the chatbot in a short period. Do you have any recommendations on how to approach this?

Thanks, Elon

Kevin Naidoo
Author
May 23, 2024

Elon Musk, for chat, web sockets generally make the most sense. PHP does have web socket support via:

openswoole.com

Open Swoole might introduce more complexity than you need. Depending on how many connections you want to handle per second or minute, it might be better to use regular AJAX and HTTP requests.

PHP 8 with PHP-FPM can handle these fairly efficiently, or in the Python world, FastAPI is a good choice.
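On the Python side, one common way to cope with bursts of requests is to cap how many model calls run at once. The sketch below uses only the standard library's `asyncio`; `fake_model_call` is a placeholder for the real API call, and the limit of 3 is an arbitrary assumption. In a FastAPI app the same pattern would sit inside an async route handler.

```python
import asyncio

MAX_CONCURRENT_CALLS = 3  # assumed limit; tune to your API rate limits


async def fake_model_call(question):
    """Placeholder for the real (slow, network-bound) model request."""
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


async def answer(question, limiter, stats):
    # The semaphore lets at most MAX_CONCURRENT_CALLS requests proceed at once;
    # the rest wait here without blocking the event loop.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            return await fake_model_call(question)
        finally:
            stats["active"] -= 1


async def handle_burst(questions):
    limiter = asyncio.Semaphore(MAX_CONCURRENT_CALLS)
    stats = {"active": 0, "peak": 0}
    # gather preserves the order of the incoming questions in its results.
    replies = await asyncio.gather(
        *(answer(q, limiter, stats) for q in questions)
    )
    return replies, stats["peak"]


replies, peak = asyncio.run(handle_burst([f"question {i}" for i in range(10)]))
print(len(replies), "replies, peak concurrency:", peak)
```

All ten questions are answered, but never more than three calls are in flight at the same time.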

FYI: I do have a few tutorials on this blog relating to FastAPI and to using Qdrant as a vector store instead of FAISS. FAISS may not work so well for more advanced real-time chatbots.