Why a local AI?
By decision of the Directorate, sensitive data within the meaning of the Swiss data protection law (LPD) must not go through the cloud. Commercial AIs (ChatGPT, Claude, Gemini) send your files to external servers. With LM Studio, everything stays on your computer.
- Confidential. No data is sent. The model works offline.
- For sensitive data (LPD) and professional secrecy, as a complement to the institutional Microsoft Copilot Chat.
Let us be honest. On Artificial Analysis, Gemma 4 scores around 19 to 20 out of 100 and Qwen3.5 9B around 25. The best cloud models are around 57 to 61. These are above all small local models, designed to run on a personal computer. Their value: confidentiality, free of charge and offline. That is enough to summarise, translate, rephrase or analyse documents.
Install LM Studio
Check your machine first. On Mac, LM Studio requires an Apple Silicon chip (M1 to M4), so a Mac from late 2020 or newer. Intel Macs are not supported. On PC, most computers from the last few years will do, ideally with 16 GB of RAM.
- Download the free application from lmstudio.ai.
- Install then open LM Studio. No account required.
LM Studio’s interface changes regularly and differs a little between Windows and Mac. The button names quoted here are indicative. Rely on the function being described.
Which model should you choose?
All three are recommended for everyday use. Qwen3.5 9B (Chinese, by Alibaba) is the most capable; Gemma 4 (by Google) remains a safe choice, lighter as E4B or a little finer as 12B. The choice depends on your machine.
| Qwen3.5 9B | Gemma 4 E4B | Gemma 4 12B | |
|---|---|---|---|
| Strength | Most capable | Light and fast | Slightly higher quality |
| Memory | Medium | Modest | Higher |
| Long PDFs | Very comfortable (large context) | Very comfortable | Slow on a small machine |
| Choose if | You want the best quality | Modest machine or long PDFs | Well-equipped machine |
When in doubt, pick Gemma 4 E4B. On loading, LM Studio shows a memory estimate and warns you if it is too heavy. You can keep several models installed.
Download the model
- Open the model search (the magnifying glass, often called “Discover”).
- Type the name of your chosen model (qwen3.5 or gemma 4).
- Start the download of the default version offered. The software picks the right version for your computer.
- Once downloaded, open a conversation with this model.
![<em>📸 [Screenshot: the model search screen]</em>](https://wp.unil.ch/iaunil/files/2025/07/capture-decran-2026-06-15-a-12.03.16-1024x640.jpg)
Increase the context size
The context is the amount of text the model reads at once. More context means more memory. By default it is too small for a whole PDF. Aim for about 32,000 for long documents. For an exceptionally long document, you can go higher (for example 64,000) if memory allows; the answer will then be slower.
- When loading the model, set the context length (often “Context Length”) to about 32,000.
- It can also be changed afterwards via the small wheel (a “reload” button appears).
- If LM Studio warns that it is too heavy, lower the value or switch to Gemma 4 E4B.
![<em>📸 [Screenshot: the loaded model's settings opened via the cog, with the context length]</em>](https://wp.unil.ch/iaunil/files/2025/07/capture-decran-2026-06-15-a-17.58.09-1024x640.jpg)
Useful symptom. If the context is too small, the model often answers as if no document had been attached (“I cannot see any document”). This is not a malfunction. Increase the context then send your request again.
Working confidentially
Drag your document into the conversation (.pdf, .docx, .txt formats). Type your request, for example “Summarise this report in 5 points”. Nothing leaves your computer.
![<em>📸 [Screenshot: a conversation with an attached document]</em>](https://wp.unil.ch/iaunil/files/2025/07/capture-decran-2026-06-15-a-20.06.45-1024x640.jpg)
The model thinks before answering. On a modest machine, expect several minutes sometimes. This is normal. To go faster on simple tasks, turn off the thinking with the “Think” button (below the message box).
Why these models?
- Among the most powerful small local models. These are open models that run on a personal computer.
- Designed for your computer. A dedicated graphics card helps but is not required.
- Open source (Apache 2.0 licence) and multilingual. Official details (Gemma 4) and Qwen3.5 9B
Alternatives
Other tools exist: Ollama, GPT4All or Hugging Face. To get started, LM Studio remains the most accessible. With plenty of memory, larger models become feasible. The IT Centre can also make local models available.
Local is not necessarily greener
Counterintuitive. Without a dedicated graphics card, local AI does not consume less than a request to a commercial LLM (ChatGPT and the like). A processor (CPU) is slow for AI: summarising a document can take it several minutes at 30 W, as much energy as the few seconds of the optimised hardware behind those services. On a consumer computer, the benefit of local AI is confidentiality. With a dedicated graphics card (GPU) and a well-chosen model, however, local AI can become 10 to 1,000 times less energy-hungry while keeping a satisfactory answer quality.