We open-sourced our internal tooling at
Let me know what you think! :D
Why do things gotta fall in your lap 15 minutes after you need them, every time?
I started a discussion where we can talk about this on my LR scheduler benchmark Space. Just go to my Hugging Face Space and click Community in the top right corner.
What are you using to run your local models? llama.cpp, Ollama, vLLM?
Yo, I personally love the Qwen2.5-Coder line of models. I use it to adversarially review code from other models pretty frequently. With your setup you could grab Qwen/Qwen2.5-Coder-14B-Instruct-GGUF and run the q5_0 quantized version. As far as configs go, you could set the following (quick sketch of wiring it up after the list):
Temperature: 0.6
Top_P: 1.0
Min_P: 0
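Here's a minimal sketch of loading that model with those sampling settings via llama-cpp-python, assuming you go the llama.cpp route. The repo id is the one above; the exact GGUF filename inside the repo is my guess, so check the repo's file list:

```python
# Rough sketch using llama-cpp-python (pip install llama-cpp-python huggingface-hub).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2.5-Coder-14B-Instruct-GGUF",
    filename="*q5_0.gguf",  # glob for the q5_0 quant -- adjust if the repo names it differently
    n_ctx=8192,             # context window; tune to your VRAM
    n_gpu_layers=-1,        # offload all layers to GPU if they fit
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Review this function for bugs:\n\ndef add(a, b): return a - b"}],
    temperature=0.6,
    top_p=1.0,
    min_p=0.0,
)
print(out["choices"][0]["message"]["content"])
```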
Alternatives you could use:
DeepSeek-Coder-V2-Lite-Instruct
Qwen2.5-Coder-7B Q8
For 32B-class models to fit on your hardware you would have to use q3 quants, and the quality is not going to be the greatest. Alternatively, you could look into a service like Modal. They offer free GPU credits monthly. You can run an app as a shell and use Ollama through their GPU-as-a-service (rough sketch below). That gives you access to a range of GPUs with enough VRAM for the specific models you're looking for. But if completely local is what you want, the models I've listed above should fit your needs.
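Here's a minimal sketch of the Modal route, assuming you've run `pip install modal` and `modal setup`. The app name, GPU type, and model tag are placeholders, so swap in whatever you actually want:

```python
# Rough sketch: run Ollama on a Modal GPU container.
import subprocess
import time

import modal

image = (
    modal.Image.debian_slim()
    .apt_install("curl")
    .run_commands("curl -fsSL https://ollama.com/install.sh | sh")  # Ollama's install script
)

app = modal.App("ollama-gpu", image=image)

@app.function(gpu="A10G", timeout=60 * 60)
def run(prompt: str) -> str:
    # Start the Ollama server in the background, give it a moment to come up,
    # then pull the model and do a one-shot generation against it.
    subprocess.Popen(["ollama", "serve"])
    time.sleep(5)
    subprocess.run(["ollama", "pull", "qwen2.5-coder:14b"], check=True)
    out = subprocess.run(
        ["ollama", "run", "qwen2.5-coder:14b", prompt],
        check=True, capture_output=True, text=True,
    )
    return out.stdout

@app.local_entrypoint()
def main():
    print(run.remote("Write a binary search in Python."))
```

For the "app as a shell" part, `modal shell your_file.py::run` should drop you into an interactive shell inside that same GPU container so you can poke at Ollama directly.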
Haha, just the tip of the iceberg, hey? I've been stuck in the library rabbit hole for a good while now and it honestly changes the game entirely.
I feel this mistake in all of my hidden dimensions.