Hi folks.
I've been working on setting up and managing the TensorRT-LLM and Triton backend scripts to build the Llama2-7b model in FP16, INT8, and INT4.
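For reference, here's a minimal sketch of how the three engine builds can be driven from Python. The flag names (`--use_weight_only`, `--weight_only_precision`, etc.) follow the `examples/llama/build.py` script from the TensorRT-LLM release I worked with; they change between versions, so treat the exact arguments and paths as assumptions and check them against your local checkout.

```python
# Sketch: drive TensorRT-LLM's examples/llama/build.py for three precisions.
# Flag names are assumptions based on one TensorRT-LLM release; verify them
# against the build.py in your checkout before running.
import subprocess

MODEL_DIR = "./llama-2-7b-hf"  # hypothetical path to the HF checkpoint

builds = {
    "fp16": ["--dtype", "float16"],
    "int8": ["--dtype", "float16",
             "--use_weight_only", "--weight_only_precision", "int8"],
    "int4": ["--dtype", "float16",
             "--use_weight_only", "--weight_only_precision", "int4"],
}

for name, extra_flags in builds.items():
    cmd = [
        "python", "build.py",
        "--model_dir", MODEL_DIR,
        "--output_dir", f"./engines/llama2-7b-{name}",
        *extra_flags,
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop early if any build fails
```

Each engine lands in its own output directory, which keeps the Triton model repository layout straightforward later on.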
I benchmarked the INT4 build and measured an inference speed of approximately 100 tokens per second.
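To make that number concrete, this is roughly how I'd compute decode throughput. The `run_generation` callable here is a hypothetical stand-in for whatever you actually benchmark through (a Triton client request, the TensorRT-LLM Python runner, etc.), not a real API:

```python
# Sketch: measure throughput in generated tokens per second.
# `run_generation` is a hypothetical stand-in for the real inference call.
import time

def tokens_per_second(run_generation, prompt: str, max_new_tokens: int) -> float:
    start = time.perf_counter()
    output_token_ids = run_generation(prompt, max_new_tokens)  # generated token ids
    elapsed = time.perf_counter() - start
    return len(output_token_ids) / elapsed

if __name__ == "__main__":
    # Dummy generator just to show the call shape; swap in the real client.
    fake = lambda prompt, n: list(range(n))
    print(f"{tokens_per_second(fake, 'Hello', 128):.1f} tok/s")
```

In practice you'd want a few warmup runs and an average over several requests, since the first request after engine load is usually slower.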
Comments
Hello @mattick27, great work. I love the hard work you put into this. Above all, thanks for the link; it helps to understand what you are referring to.