Why a local AI?
By decision of the Directorate, sensitive data within the meaning of the Swiss data protection law (LPD) must not go through the cloud. Commercial AIs (ChatGPT, Claude, Gemini) send your files to external servers. With LM Studio + Gemma 4, everything stays on your computer.
- Confidential. No data is sent. The model works offline.
- For sensitive data (LPD) and professional secrecy, as a complement to the institutional Microsoft Copilot Chat.
- Energy-efficient. A few dozen watts compared with about 700 W for an Nvidia H100 GPU server.
Let us be honest. On Artificial Analysis, Gemma 4 scores around 19 to 20 out of 100. The best cloud models are around 57 to 61. Local AI does not aim for raw performance. Its value lies in confidentiality, being free and working offline. It is more than enough to summarise, translate, reformulate or analyse documents.
Install LM Studio
- Download the free application from lmstudio.ai.
- Install then open LM Studio. No account required.
LM Studio’s interface changes regularly and differs a little between Windows and Mac. The button names quoted here are indicative. Rely on the function being described.
Which model: E4B or 12B?
Both are recommended and on par for everyday use. The 12B offers slightly higher quality at the cost of speed and memory. The choice depends on your machine.
| Gemma 4 E4B | Gemma 4 12B | |
|---|---|---|
| Strength | Light and fast | Slightly higher quality |
| Memory | Modest | Higher |
| Long PDFs | Very comfortable | Slow on a small machine |
| Choose if | Modest machine or long PDFs | Well-equipped machine |
When in doubt, pick E4B. On loading, LM Studio shows a memory estimate and warns you if it is too heavy. You can keep both installed.
Download the model
- Open the model search (the magnifying glass, often called “Discover”).
- Type gemma 4 then choose the model you decided on (E4B or 12B).
- Start the download of the default version offered. The software picks the right version for your computer.
- Once downloaded, open a conversation with this model.
![<em>📸 [Screenshot: the model search screen]</em>](https://wp.unil.ch/iaunil/files/2025/07/capture-decran-2026-06-15-a-12.03.16-1024x640.jpg)
Increase the context size
The context is the amount of text the model reads at once. More context means more memory. By default it is too small for a whole PDF. Aim for about 32,000 for long documents. For an exceptionally long document, you can go higher (for example 64,000) if memory allows; the answer will then be slower.
- Load the model. Open its settings through the small cog.
- Raise the context length (often “Context Length”) to about 32,000.
- Apply (a “reload” type button appears). If LM Studio warns that it is too heavy, lower the value or switch to E4B.
![<em>📸 [Screenshot: the loaded model's settings opened via the cog, with the context length]</em>](https://wp.unil.ch/iaunil/files/2025/07/capture-decran-2026-06-15-a-17.58.09-1024x640.jpg)
Useful symptom. If the context is too small, the model often answers as if no document had been attached (“I cannot see any document”). This is not a malfunction. Increase the context then send your request again.
Working confidentially
Drag your document into the conversation (.pdf, .docx, .txt formats). Type your request, for example “Summarise this report in 5 points”. Nothing leaves your computer.
![<em>📸 [Screenshot: a conversation with an attached document]</em>](https://wp.unil.ch/iaunil/files/2025/07/capture-decran-2026-06-15-a-20.06.45-1024x640.jpg)
The model thinks before answering. On a modest machine, expect 1 to 2 minutes sometimes. This is normal. To go faster on simple tasks, turn off the thinking with the “Think” button (below the message box).
Why Gemma 4?
- Among the best small local models. These are open models that run on a personal computer.
- Designed for your computer. No specialised graphics card required.
- Open source (Apache 2.0 licence) and more than 140 languages. Official details.
Alternatives
Other tools exist: Ollama, GPT4All or Hugging Face. To get started, LM Studio remains the most accessible. With plenty of memory, larger models become feasible. The IT Centre can also make local models available.