Last Updated on March 2, 2026
Introduction
In the landscape of Private AI for Word, the ability to transform and polish professional prose without compromising data security is a high-priority requirement. Consider the recently released Gemma 3 QAT (Quantization-Aware Trained) models. This quantization technique significantly reduces memory usage while preserving quality, allowing you to run sophisticated models like Gemma 3 27B locally – even on a single consumer-grade GPU. For legal, medical, and corporate professionals, moving to a private Microsoft Copilot alternative ensures that sensitive drafts remain secure. This direction is at the core of our Local LLM Benchmarks for Microsoft Word, where we explore the move toward 100% data ownership on your intranet.
As part of our continuous performance testing, we have evaluated Google’s Gemma-3-27B-IT-QAT. This powerful model is well suited to high-fidelity text rewriting, and its capabilities can now be accessed directly inside Microsoft Word. By using LocPilot as a local Word Add-in, you can run the model from within Word, providing a 100% private drafting experience on your desktop.
Demo
Here’s a quick demo, powered by GPTLocalhost, which offers the same core features for individual use. LocPilot is the intranet edition of GPTLocalhost, designed for enterprise users and team collaboration. For a quick demo of LocPilot, please click here.
Another demo below shows Gemma-3 summarizing an article.
For more creative uses of local and private LLMs in Microsoft Word, explore additional demos available on our channel at @LocPilot.
Technical Profile: Why Gemma-3-27B-IT-QAT? (Download Size: 16.43 GB)
When selecting a private AI for Word for rewriting tasks, the QAT (Quantization-Aware Training) approach behind Gemma-3 offers a significant edge. Based on this post, the model delivers several key advantages:
- Accessibility to Powerful AI on Local Hardware: The main advantage is the ability to run a large, powerful 27-billion-parameter model on a single consumer GPU, such as an NVIDIA RTX 3090 (24GB VRAM). The standard, unquantized version of the model typically requires much more memory (around 54 GB), which is generally available only in expensive, high-end data center GPUs.
- Versatile Capabilities: The base Gemma 3 27B instruction-tuned model (which the QAT version is based on) offers a wide range of state-of-the-art capabilities:
- Long Context Support: It handles a large context window of up to 128,000 tokens, enabling it to process long documents or conversations.
- Multilingual Support: It has out-of-the-box support for over 35 languages and exposure to over 140 languages during pre-training.
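The memory figures above follow from simple arithmetic: each of the 27 billion parameters occupies 2 bytes in bfloat16, versus roughly 0.5 bytes after int4 quantization. A minimal back-of-the-envelope sketch (the per-parameter byte counts are nominal; real runtimes add overhead for activations, the KV cache, and layers kept at higher precision, which is why the actual download is 16.43 GB rather than the raw int4 figure):

```python
def model_memory_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Estimate raw weight memory in decimal GB (1 GB = 1e9 bytes)."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

# Unquantized bfloat16: 2 bytes per parameter.
bf16_gb = model_memory_gb(27, 2.0)   # 54.0 GB -- needs data-center hardware
# QAT int4: ~0.5 bytes per parameter.
int4_gb = model_memory_gb(27, 0.5)   # 13.5 GB -- fits a 24 GB consumer GPU

print(f"bf16: {bf16_gb:.1f} GB, int4: {int4_gb:.1f} GB")
```

This is why the QAT release fits comfortably on an RTX 3090’s 24 GB of VRAM while the unquantized weights do not.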
Deployment: Grounded Performance on Mac M1 Max
Our tests were performed on a Mac M1 Max (64GB RAM), which comfortably accommodates the model’s 16.43 GB footprint. Thanks to the Unified Memory architecture of Apple Silicon, the Gemma-3-27B-IT-QAT model generates text smoothly. As noted above, you can also run the model on an NVIDIA RTX 3090 (24GB VRAM).
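Local inference servers (such as llama.cpp, Ollama, or LM Studio) typically expose an OpenAI-compatible chat endpoint on localhost, which is the kind of interface a Word add-in can call without any traffic leaving the machine. A minimal sketch of building such a request, assuming the port, path, and model tag shown (these are illustrative placeholders, not LocPilot’s actual configuration):

```python
import json
from urllib import request

def build_rewrite_request(text: str, base_url: str = "http://localhost:8080"):
    """Build an OpenAI-style chat-completions request for a local server.

    The base URL and model tag are assumptions for illustration; substitute
    whatever your own inference server reports.
    """
    payload = {
        "model": "gemma-3-27b-it-qat",
        "messages": [
            {"role": "system",
             "content": "Rewrite the text to be clear and professional."},
            {"role": "user", "content": text},
        ],
        "temperature": 0.3,
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_rewrite_request("pls fix this sentence grammar")
# urllib.request.urlopen(req) would send it -- to localhost only.
```

Because the endpoint resolves to the local machine, the draft text never traverses the network, which is the core of the privacy argument.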
By leveraging Private AI for Word, you eliminate the risks associated with third-party servers. Your document remains on your local disk, and the “brain” processing your text runs locally on your NPU or GPU. This keeps your private assistant secure, and it can also be faster and more predictable than cloud-based alternatives: no network latency, rate limits, or service outages.
The Local Advantage
Running your LLM models locally via LocPilot ensures:
- Air-Gapped Security: Operate entirely within your intranet — no external connections.
- Cost Savings: Eliminate subscription fees for the entire team — no ongoing costs.
- Model Flexibility: Easily host and switch models to suit your use cases — no vendor lock-in.