Last Updated on February 26, 2026
If you’re interested in private LLMs for Microsoft Word, the recent INTELLECT-2 model is worth exploring. This 32-billion-parameter model is the first of its kind to be trained via globally distributed reinforcement learning. Unlike conventional centralized training, INTELLECT-2 uses fully asynchronous reinforcement learning across a dynamic, permissionless network of compute contributors. The team behind it also made significant adjustments to the standard GRPO training recipe and data-filtering techniques, which were essential for keeping training stable and meeting the model’s objectives. The result is a notable improvement over its QwQ-32B base model.
Experience the convenience of running INTELLECT-2 directly within Microsoft Word with LocPilot. Host the model locally to access advanced LLM features while maintaining data privacy and eliminating monthly fees. This direction is at the core of our Local LLM Benchmarks for Microsoft Word, where we explore the move toward 100% data security on your intranet.
Check out our demo video to see it in action. The demo is powered by GPTLocalhost, which offers the same core features for individual use. LocPilot in Word is the Intranet edition of GPTLocalhost designed for enterprise users and team collaboration.
For more creative uses of local and private LLMs in Microsoft Word, explore additional demos available on our channel at @LocPilot.
Technical Profile: Why Intellect-2? (Download Size: 19.85 GB)
Intellect-2 is a 32B parameter model that serves as a landmark proof-of-concept for decentralized AI development. Unlike models built in massive, power-hungry centralized data centers, Intellect-2 was refined through globally distributed reinforcement learning (RL) using the GRPO-based training technique. By pooling heterogeneous compute resources from around the world, this model demonstrates that high-level “reasoning” capabilities can be matured outside of traditional corporate clusters.
- Inference-Heavy Training: Shifting from traditional pre-training, Intellect-2 utilized a 1:4 training-to-inference compute ratio, spending significantly more resources on generating and verifying “thought samples” to ensure logical consistency in complex drafting.
- Scalable Reasoning Architecture: Built upon the QwQ-32B base, it utilizes verifiable rewards to improve performance in math and coding, proving that RL-tuning of 32B models is now technologically viable across a distributed network.
Ultimately, Intellect-2 represents a new potential direction toward AGI. By replacing the requirement for single, multi-gigawatt data centers with distributed, spare compute capacity, this architecture provides a blueprint for scaling intelligence while easing the power and environmental bottlenecks of traditional AI development.
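To make the GRPO-based, verifiable-rewards idea above concrete, here is a toy sketch (not the actual Prime Intellect training code) of the group-relative advantage step at the heart of GRPO: several completions are sampled for the same prompt, each is scored with a verifiable reward (e.g., 1.0 if a math answer checks out, 0.0 otherwise), and each completion’s advantage is its reward normalized against its own group, so no separate value network is needed.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages for one prompt's sampled completions.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation. This is the normalization
    GRPO uses in place of a learned value/critic model.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Verifiable rewards for 4 sampled completions of one math prompt:
# three reached the correct answer, one did not.
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = grpo_advantages(rewards)
print(advantages)  # the failed completion gets a negative advantage
```

In the real recipe these advantages weight a clipped policy-gradient update; the sketch only shows why “verifiable rewards” (exact answer checks in math and code) fit RL so well: the reward signal is cheap, objective, and trivially parallelizable across a distributed worker pool.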
Deployment Reminders: Running Intellect-2 Locally
Our primary testing was conducted on an M1 Max (64 GB). VRAM requirements differ depending on whether you simply run the model locally or contribute compute to the inference/training network through which it was developed:
- Running the Model for Inference (locally): If you intend to run the pre-trained Intellect-2 model yourself, the necessary VRAM depends on the model quantization level (GGUF format) you choose to download.
- To run the model at maximum speed entirely in GPU VRAM, you need VRAM capacity slightly larger than the model file size (e.g., a 24GB GPU can run many quantized versions, but may run out of memory with higher-precision quantizations of the full 32B model or at long context lengths).
- Inference Workers: A machine equipped with 4× NVIDIA RTX 3090 GPUs (each typically having 24GB of VRAM) was cited as a sufficient example for contributing to the 32B-parameter model training run. The model uses sharding techniques (like PyTorch FSDP2) to distribute weights across available GPUs.
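The sizing rule above can be sketched as a back-of-the-envelope check. This is only a rough heuristic, not an exact calculation: the `overhead_gb` figure is an assumed placeholder for KV cache and activations, and real usage varies with context length and batch size.

```python
def fits_in_vram(model_file_gb, vram_gb, overhead_gb=4.0):
    """Rough rule of thumb for full GPU offload of a GGUF model.

    The quantized model file plus headroom for the KV cache and
    activations must fit in VRAM. overhead_gb is an assumed
    placeholder; actual overhead grows with context length.
    """
    return model_file_gb + overhead_gb <= vram_gb

# The INTELLECT-2 download cited above is ~19.85 GB:
print(fits_in_vram(19.85, 24))  # tight on a single 24 GB GPU
print(fits_in_vram(19.85, 64))  # comfortable in 64 GB of unified memory
```

The same arithmetic explains the worker spec: 4× RTX 3090 gives 96 GB of aggregate VRAM, enough for sharding higher-precision 32B weights plus optimizer and cache state across the GPUs.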
It is worth noting that, following our evaluation, Prime Intellect has released newer models, as listed below. Interested users are encouraged to try them as well.
- INTELLECT-3 (2025-11): INTELLECT-3 is a 106B parameter Mixture-of-Experts model trained with both SFT and RL on top of the GLM 4.5 Air base model. It achieves state-of-the-art performance for its size across math, code, science and reasoning benchmarks.
The Local Advantage
Running your LLM models locally via LocPilot ensures:
- Air-Gapped Security: Operate entirely within your intranet — no external connections.
- Cost Savings: Eliminate subscription fees for the entire team — no ongoing costs.
- Model Flexibility: Easily host and switch models to suit your use cases — no vendor lock-in.