Last Updated on March 2, 2026
What is Reasoning-Based Creative Writing?
In professional drafting, Creative Writing often requires more than just “generating text”—it requires the model to follow a logical thread, maintain a consistent voice, and understand subtext. This is where “Reasoning” models excel. Unlike standard models that predict the next word immediately, reasoning models like Reka Flash 3 use a “thinking” phase to:
- Plan the Narrative: Outlining a scene’s structure before writing a single word.
- Maintain Tone: Ensuring a technical report stays formal or a story stays atmospheric throughout.
- Check Instructions: Verifying that all user constraints are met during the generation process.
In the past, conventional NLP techniques could not handle these complex creative tasks. With the arrival of recent LLMs such as Reka Flash 3, a 21B-parameter model built from scratch, Private AI for Word has reached a new milestone. This capability falls within the coverage of our Local LLM Benchmarks for Microsoft Word, where we explore the move toward 100% data security on your intranet.
With LocPilot in Word, you can run Reka Flash 3 right inside Microsoft Word. Because you host the model locally, you get advanced LLM features while keeping your data private and avoiding monthly subscription fees.
Take a quick look at our demo video to see how it works. The demo is powered by GPTLocalhost, which offers the same core features for individual use. LocPilot in Word is the Intranet edition of GPTLocalhost designed for enterprise users and team collaboration. For a quick demo of LocPilot, please click here.
For more creative uses of local and private LLMs in Microsoft Word, explore additional demos available on our channel at @LocPilot.
Technical Profile: Why Reka Flash 3 for Word? (Download Size: 13.61 GB)
Reka Flash 3 is a compact powerhouse that bridges the gap between small on-device models and massive cloud-based assistants. These capabilities are available in Microsoft Word through GPTLocalhost, which runs as a Word Add-in and enables local LLM support directly in your document.
- “Budget Forcing” Reasoning: Reka Flash 3 uses specialized <reasoning> tags to show its “thinking” process. In the Word Add-in, you can watch the model plan its creative approach before it outputs the final text.
- 21B Parameter Intelligence: Despite its compact size, it performs competitively with proprietary models like OpenAI’s o1-mini. It is currently considered one of the best models in its size category for instruction following.
- Efficient Local Deployment: At 4-bit quantization, Reka Flash 3 fits into just 11GB of VRAM, making it a perfect fit for consumer-grade GPUs or Apple Silicon Macs.
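For readers who call the model outside the Add-in, the <reasoning> tags described above can be separated from the final text with a few lines of Python. This is a minimal sketch, assuming the raw output wraps its thinking in <reasoning>...</reasoning>; the function name split_reasoning is hypothetical and is not part of GPTLocalhost or LocPilot.

```python
import re

# Assumption: thinking is wrapped in <reasoning>...</reasoning> in the raw output.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
OPEN_TAG = "<reasoning>"


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output.

    Hypothetical helper for illustration only.
    """
    # Collect every complete reasoning block.
    parts = [m.strip() for m in REASONING_RE.findall(raw)]
    # Remove the complete blocks; what is left is the candidate answer.
    rest = REASONING_RE.sub("", raw)

    # If generation stopped mid-thought, an opening tag has no closing tag.
    # Treat everything after it as reasoning rather than as the answer.
    open_idx = rest.find(OPEN_TAG)
    if open_idx != -1:
        parts.append(rest[open_idx + len(OPEN_TAG):].strip())
        rest = rest[:open_idx]

    return "\n".join(parts), rest.strip()


if __name__ == "__main__":
    raw = "<reasoning>Plan: open on the storm.</reasoning>\nThe rain came sideways."
    reasoning, answer = split_reasoning(raw)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

Handling the unclosed tag separately keeps a truncated thinking phase from being pasted into the document as if it were the finished draft.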
Deployment Reminders: Running Reka Flash 3 Locally
Our primary testing was conducted on an M1 Max (64 GB), which is more than sufficient. This model is suitable for low-latency or on-device deployments, fitting within the VRAM of many consumer-grade GPUs with 12GB or more, such as the NVIDIA RTX 3060 (12GB version) or RTX 4070, or Apple Silicon Macs with unified memory.
- Context Window Consideration: The model has a 32,000-token context length. Running with a large context will increase the memory required for the KV cache (Key-Value cache), potentially adding a few more gigabytes of memory usage.
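The memory figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory from the parameter count and bit width, and KV cache size from the context length. The layer count, KV head count, and head dimension are placeholder values chosen for illustration, not Reka Flash 3's published architecture, and real runtimes add overhead that this estimate ignores.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # assumption: 16-bit cache entries
) -> float:
    """Approximate KV cache size for one sequence, in decimal gigabytes."""
    # The leading 2 counts one key tensor plus one value tensor per layer.
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    )
    return total_bytes / 1e9


if __name__ == "__main__":
    # 21B parameters at 4 bits per weight.
    weights = weight_memory_gb(21e9, 4)

    # Placeholder architecture values (assumptions, not the model's real config).
    cache = kv_cache_gb(n_layers=40, n_kv_heads=8, head_dim=128, context_tokens=32_000)

    print(f"Weights:  {weights:.2f} GB")
    print(f"KV cache: {cache:.2f} GB at full context")
    print(f"Total:    {weights + cache:.2f} GB before runtime overhead")
```

With these placeholder values the weights come to about 10.5 GB and the full-context cache to about 5.2 GB, which is in line with the "few more gigabytes" noted above. Substituting the real architecture values from the model's configuration file would give a tighter estimate.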
The Local Advantage
Running your LLMs locally via LocPilot ensures:
- Air-Gapped Security: Operate entirely within your intranet — no external connections.
- Cost Savings: Eliminate subscription fees for the entire team — no ongoing costs.
- Model Flexibility: Easily host and switch models to suit your use cases — no vendor lock-in.