QwQ-32B-Preview: This experimental model focuses on enhancing AI reasoning abilities. While still under development, with known issues such as language mixing and recursive reasoning loops, it shows promise on math and coding tasks. Early testing via the Hugging Face demo suggests QwQ-32B performs roughly on par with OpenAI's o1-preview, though users note it is highly verbose in its reasoning process. Multiple quantized versions are available, including Q4_K_M (fits in 24GB VRAM) and Q3_K_S (fits in 16GB VRAM).
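To see why those quant levels map to those VRAM tiers, a back-of-the-envelope size estimate helps. The sketch below assumes rough bits-per-weight figures for llama.cpp-style quants (Q4_K_M at roughly 4.85 bpw, Q3_K_S at roughly 3.5 bpw; actual averages vary by model) and ignores KV-cache and activation overhead:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weights-only size of a quantized model in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N = 32e9  # QwQ-32B parameter count

# Assumed bits-per-weight values, for illustration only.
print(round(quantized_size_gb(N, 4.85), 1))  # Q4_K_M: ~19.4 GB, fits a 24GB card
print(round(quantized_size_gb(N, 3.5), 1))   # Q3_K_S: ~14.0 GB, fits a 16GB card
```

In practice, leave a few GB of headroom on top of the weights for context (KV cache), so these estimates are close to the usable limit on each card.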
SmolVLM: This 2B-parameter vision-language model is built for on-device inference, beating comparable models on GPU RAM usage and token throughput. It can even be fine-tuned on Google Colab, making it accessible to users without high-end hardware.
The tinygrad team announced the upcoming TinyCloud launch, which will give contributors access to 54 GPUs across 9 tinybox reds by year's end. Backed by a custom driver for stability, access will require nothing more than an API key—no complexity, just raw GPU horsepower.
Sonnet in Aider now supports reading PDFs, making it a more versatile assistant for developers. Users report the new feature works smoothly, saying it "effectively interprets PDF files" and improves their workflow.
QwQ website: https://qwenlm.github.io/blog/qwq-32b-preview/
QwQ-32B-preview Demo: https://huggingface.co/spaces/Qwen/QwQ-32B-preview
QwQ-32B-Preview model: https://huggingface.co/Qwen/QwQ-32B-Preview