Deploying Hugging Face Chat UI with the Hugging Face Text Generation Inference Server


Before we dive into deploying the Hugging Face Chat UI, let's first explore the Hugging Face Text Generation Inference Server. We'll start with a practical walkthrough of its API endpoints. This initial exploration shows which text generation parameters are available and how they change the model's output.

Start The Hugging Face Inference Server

In this section, we launch the Hugging Face Text Generation Inference Server with 8-bit quantization (bitsandbytes) enabled. Loading the weights in 8 bits reduces the GPU memory the model requires. For the full procedure, please refer to the detailed setup instructions provided in this link

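export model=mistralai/Mistral-7B-v0.1
export volume=$PWD/data
# The image name was missing from the original command; the tag below is an assumption, so pin the TGI release you actually use.
docker run --gpus all --shm-size 1g -p 8080:80 -v "$volume":/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id "$model" --quantize bitsandbytes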

Discover Hugging Face Inference Server endpoints

Call the default generate endpoint

curl --location '' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'
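The same request can also be issued from Python. The following is a minimal sketch using only the standard library. The base URL `http://localhost:8080` is an assumption derived from the `-p 8080:80` port mapping, and `build_generate_request` and `generate` are hypothetical helper names; `/generate` and the `generated_text` response field are the route and field TGI exposes for non-streaming generation.

```python
import json
import urllib.request

# Assumption: TGI is reachable on the host port published with `-p 8080:80`.
TGI_BASE_URL = "http://localhost:8080"


def build_generate_request(prompt, max_new_tokens=20, base_url=TGI_BASE_URL):
    """Build (without sending) a POST request for the /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=20, base_url=TGI_BASE_URL, timeout=60):
    """Send the request and return the generated text from the JSON response."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["generated_text"]
```

With a server running, `generate("What is Deep Learning?")` would return the completion as a string. Keeping the request builder separate from the network call makes the payload easy to inspect.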
Call the streaming endpoint

curl --location '' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'
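TGI's streaming route, `/generate_stream`, returns server-sent events, one `data:` line per generated token. The sketch below shows one way to reassemble the text from such lines. The helper names are hypothetical, and the event shape (a `token` object with `text` and `special` fields) is an assumption about the stream format that you should check against your TGI version.

```python
import json


def parse_sse_line(line):
    """Return the decoded JSON event from one SSE line, or None for non-data lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):])


def collect_text(lines):
    """Concatenate the text of every non-special token found in the stream."""
    pieces = []
    for line in lines:
        event = parse_sse_line(line)
        if event is None:
            continue
        token = event.get("token", {})
        # Special tokens (for example end-of-sequence markers) are skipped.
        if not token.get("special", False):
            pieces.append(token.get("text", ""))
    return "".join(pieces)
```

In practice, `lines` would be the decoded lines read incrementally from the HTTP response, so each token can be displayed as soon as it arrives.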
Call the generate endpoint while activating sampling

curl --location '' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":100, "do_sample":true, "top_k":50 }}'
Call the generate endpoint while changing temperature

curl --location '' \
--header 'Content-Type: application/json' \
--data '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":50, "do_sample":true, "top_k":50, "temperature":0.2 }}'
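To build intuition for the `do_sample`, `top_k`, and `temperature` parameters used in the two requests above, here is a small conceptual sketch in pure Python. It is not TGI's implementation; it only illustrates the general technique of keeping the `top_k` highest-scoring tokens, dividing their logits by the temperature, and sampling from the resulting softmax distribution. The function names and the toy logits are hypothetical.

```python
import math
import random


def top_k_temperature_probs(logits, top_k=50, temperature=1.0):
    """Return {token: probability} after top-k filtering and temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the top_k highest-scoring tokens.
    kept = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # Dividing by a temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [(token, score / temperature) for token, score in kept]
    # Numerically stable softmax over the surviving tokens.
    peak = max(score for _, score in scaled)
    weights = [(token, math.exp(score - peak)) for token, score in scaled]
    total = sum(weight for _, weight in weights)
    return {token: weight / total for token, weight in weights}


def sample_token(logits, top_k=50, temperature=1.0, rng=random):
    """Draw one token at random according to the filtered, rescaled distribution."""
    probs = top_k_temperature_probs(logits, top_k, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

With toy logits such as `{"a": 2.0, "b": 1.0, "c": 0.0}`, a temperature of 0.2 concentrates far more probability on `"a"` than a temperature of 1.0, which is why low temperatures give more deterministic answers.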
For more generation strategies, please refer to this link:

Monitoring with Health, Info, and Metrics API Endpoints

Ensuring System Health

curl --location ''
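Because loading a model can take a while, it can be handy to poll the health route until the server is ready. The sketch below is one possible approach, assuming the server is published at `http://localhost:8080` (from the `-p 8080:80` mapping) and that `/health` answers with HTTP 200 once the model is loaded. `is_healthy` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request

# Assumption: TGI is reachable on the host port published with `-p 8080:80`.
TGI_BASE_URL = "http://localhost:8080"


def is_healthy(base_url=TGI_BASE_URL, timeout=5):
    """Return True when GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx answers all count as unhealthy.
        return False


def wait_until_healthy(base_url=TGI_BASE_URL, attempts=30, delay=2.0,
                       check=is_healthy, sleep=time.sleep):
    """Poll the health check up to `attempts` times, pausing `delay` seconds between tries."""
    for _ in range(attempts):
        if check(base_url):
            return True
        sleep(delay)
    return False
```

The `check` and `sleep` parameters exist only so the polling loop can be exercised without a running server.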
Retrieving Server Information

curl --location ''

Accessing Performance Metrics Endpoint

curl --location ''
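The metrics endpoint returns plain text in the Prometheus exposition format. If you want to inspect a few values without a full Prometheus setup, a small parser along the following lines may be enough. This is a simplified sketch: it assumes samples carry no trailing timestamp, and `parse_prometheus_text` is a hypothetical helper name.

```python
def parse_prometheus_text(text):
    """Map each sample (metric name plus optional labels) to its float value."""
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamp, so the value is the last space-separated field.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples
```

Feeding it the body returned by the metrics route yields a dictionary you can filter by metric name, for example to watch request counts or latency buckets.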

Install Hugging Face Chat UI

Clone the Repository

Start by cloning the Hugging Face Chat UI repository:

git clone
Configure the Environment

After cloning the repository, you'll need to set up your environment by editing the .env file. This involves specifying the correct IP addresses for your MongoDB instance and the Hugging Face Text Generation Inference Server.

Editing MongoDB Configuration:

Locate and edit the MONGODB_URL in the .env file to point to your MongoDB instance. Replace ${MONGO_DB_IP} with the actual IP address of your MongoDB server.
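If you prefer to script this substitution rather than edit the file by hand, the sketch below shows one way to fill `${...}` placeholders with Python's `string.Template`. The helper name `fill_placeholders` is hypothetical, and the `mongodb://<ip>:27017` form of the example value is an assumption (the standard MongoDB URI scheme with the default port), not something taken from the Chat UI repository.

```python
from string import Template


def fill_placeholders(env_text, values):
    """Replace ${NAME} placeholders in env_text, leaving unknown ones untouched."""
    return Template(env_text).safe_substitute(values)


# Assumed example line; adapt it to the MONGODB_URL entry in your own .env file.
example = "MONGODB_URL=mongodb://${MONGO_DB_IP}:27017"
print(fill_placeholders(example, {"MONGO_DB_IP": "192.0.2.10"}))
```

`safe_substitute` is used so that placeholders you have not supplied yet, such as the inference server address, are left in place instead of raising an error.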

Setting Up Text Generation Inference Server Connection:

In the same .env file, ensure that the Hugging Face Text Generation Inference Server is correctly configured. Below is a JSON configuration snippet that you'll need to adjust based on your setup. Note that the MODELS variable holds the list of your models' configurations:

MODELS=`[
    {
      "name": "mistralai/Mistral-7B-Instruct-v0.1-local",
      "displayName": "mistralai/Mistral-7B-Instruct-v0.1-name",
      "description": "Mistral 7B is a new Apache 2.0 model, released by Mistral AI that outperforms Llama2 13B in benchmarks.",
      "websiteUrl": "",
      "preprompt": "",
      "chatPromptTemplate" : "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
      "parameters": {
        "temperature": 0.1,
        "top_p": 0.95,
        "repetition_penalty": 1.2,
        "top_k": 50,
        "max_new_tokens": 1024,
        "stop": ["</s>"]
      },
      "endpoints": [{
        "type" : "tgi",
        "url": "http://${TEXT_GENERATION_INFERENCE_SERVER}:80/"
      }],
      "promptExamples": [
        {
          "title": "Assist in a task",
          "prompt": "How do I make a delicious lemon cheesecake?"
        }
      ]
    }
]`
Build the Chat UI Docker image

DOCKER_BUILDKIT=1 docker build -t hugging-face-ui .
Run MongoDB

docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
Run the Hugging Face Chat UI

docker run -p 3000:3000 hugging-face-ui
