Added drop of pending updates on bot start, reset command, AnswerChat method, GPU offload, limit on response length, context reduced to 2048, flash attention, 4 parallel decode queues, --keep of the original 810 tokens (the starting prompt)

2024-12-26 03:24:56 +01:00
parent 296d150282
commit 4167c75279
5 changed files with 49 additions and 20 deletions
--- a/.env
+++ b/.env
@@ -1,2 +1,2 @@
 MODEL_PATH=./model
-MODEL_NAME=Qwen2.5-7B-Instruct-Q8_0.gguf
+MODEL_NAME=Qwen2.5-7B-Instruct-Q8.gguf