Private AI

AI on your own hardware, inside your own datacentre

Closed loop. Fixed costs. Full control.

A dedicated LLM environment we set up for you on hardware in your own datacentre. Your IP does not leave the building, you pay for capacity instead of per token, and you stay in charge of who sees what.

When Private AI is the right fit

Not every business wants its AI on a hyperscaler. Not every business case survives per-token billing. And some organisations simply will not let their IP leave the building, regardless of any vendor SLA. Think engineering drawings, patent-relevant knowledge or client files.

For those organisations we build a private AI environment. Your own hardware in your own datacentre, an open-weight LLM that runs entirely within that environment, and an inference runtime we set up and run for you.

Three reasons organisations choose this route

IP stays within your own walls

Engineering data, client files, patent-relevant research: none of it leaves your network. No external API call, no external logging, no questions about a vendor's data retention. Air-gapped if that is the requirement.

Predictable cost instead of per-token billing

At high inference volume, per-token costs grow linearly with usage: the more successful the solution, the higher the bill. With dedicated hardware, costs are fixed: you invest once in capacity, and from there the solution runs without a per-token charge for each use.
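
To make the trade-off concrete, here is a minimal break-even sketch. Every figure in it (hardware price, monthly running cost, token volume, price per million tokens) is a made-up placeholder rather than a quote, and the function names are ours for illustration; the real numbers come out of the scoping phase.

```python
from typing import Optional


def monthly_per_token_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Monthly bill under per-token pricing."""
    return tokens_per_month / 1_000_000 * price_per_million


def break_even_months(
    hardware_cost: float,
    monthly_running_cost: float,
    tokens_per_month: float,
    price_per_million: float,
) -> Optional[float]:
    """Months until the one-time hardware investment is recovered.

    Returns None when per-token billing is cheaper than simply running
    the hardware, i.e. the investment never pays back at this volume.
    """
    monthly_saving = (
        monthly_per_token_cost(tokens_per_month, price_per_million)
        - monthly_running_cost
    )
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Placeholder inputs (assumptions, not real prices):
# 200,000 hardware, 4,000 per month for power/hosting/maintenance,
# 2 billion tokens per month at 10 per million tokens.
print(break_even_months(200_000, 4_000, 2_000_000_000, 10.0))  # 12.5

# At a low volume of 100 million tokens per month there is no payback.
print(break_even_months(200_000, 4_000, 100_000_000, 10.0))  # None
```

The second call shows the other side of the argument: below a certain volume, dedicated hardware does not recover its cost.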

Full control over model and governance

You pick which model you run (Llama, Mistral, DeepSeek or another open-weight model), you set the update cycle, and you can meet compliance requirements that go beyond EU data residency. Think defence, critical infrastructure, or IP-heavy R&D.

How we set it up

1. Scoping and hardware selection

Together we work out the inference volume you expect, the models you need, and the hardware that fits best: GPU-class, on-prem servers, edge deployment, or a combination.

2. Installation and model deployment

We install the inference runtime, deploy the chosen LLM, and connect it to your existing systems: Active Directory, SharePoint, ERP, PDM, depending on where your data lives.

3. Building agents and use cases

On top of the private LLM we build the same agents as on a hyperscaler: HR, Legal, Finance, or industry-specific (work preparation, knowledge access, case handling). Once the environment is live, the same promise applies as on other stacks: first agent in production within 60 days.

4. Operation and knowledge transfer

For the first few months we run the environment alongside your IT team and transfer the knowledge. After that, you decide whether to take over operations yourself or have us keep running it.

Three deployment models side by side

Which one fits your situation?

Hyperscaler (Microsoft · Google · AWS)

  • Best for: Fast time-to-value, existing stack
  • Cost model: Per-token or capacity-based
  • IP control: Data residency you can pick, no physical isolation

Volentis (EU-sovereign SaaS)

  • Best for: EU sovereignty without running your own infrastructure
  • Cost model: Subscription
  • IP control: EU data residency, no CLOUD Act exposure

Private AI (own hardware, open-weight LLM)

  • Best for: IP criticality, high inference volume
  • Cost model: Fixed hardware investment, no per-token billing
  • IP control: Fully inside your network, optionally air-gapped

Who this fits

Private AI is not the default choice. It fits organisations that:

  • Are IP-sensitive: engineering, R&D, defence, critical infrastructure, life sciences
  • Expect high inference volumes where per-token billing undermines the business case
  • Have stricter data requirements than EU residency (e.g. air-gapped, classified, or contractually fixed)
  • Already have their own datacentre or colocation capacity to host in
  • Deliberately want to avoid hyperscaler vendor lock-in
Honest

When Private AI is not the right match

Honestly: if time-to-value is the priority, or if you are just starting with AI and do not yet know what volume you will run, a hyperscaler or Volentis is usually a better first step. Private AI pays for itself at volume and at high IP criticality, not on a first pilot.

Frequently asked questions about Private AI

Which LLM runs on a Private AI environment?

We typically work with open-weight models like Llama, Mistral or DeepSeek. Which one we pick depends on your use cases (multilingual, code, reasoning), your hardware capacity, and your preference. Our advice is neutral. We have no vendor stake in any specific model.

What hardware is needed?

For most enterprise use cases a few GPU servers (NVIDIA L40S, H100 or equivalent) are enough. For high concurrency or heavier models we scale up. Exact specs are set in the scoping phase, based on your expected inference volume.
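
As a rough illustration of why model size and precision drive the GPU count, the sketch below estimates memory for the model weights alone and adds a flat overhead factor for KV cache and activations. The 1.2 overhead factor and the 70-billion-parameter example are illustrative assumptions; the 48 GB and 80 GB figures are the commonly listed memory sizes of the L40S and H100. Real sizing also depends on context length, concurrency and the inference runtime, which is why it is settled during scoping.

```python
import math

# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Memory for the weights in GB (1 billion params * bytes per param)."""
    return params_billion * BYTES_PER_PARAM[precision]


def gpus_needed(
    params_billion: float,
    precision: str,
    gpu_memory_gb: float,
    overhead: float = 1.2,  # assumed headroom for KV cache and activations
) -> int:
    """Minimum GPU count whose combined memory fits weights plus overhead."""
    required_gb = weights_gb(params_billion, precision) * overhead
    return math.ceil(required_gb / gpu_memory_gb)


# A hypothetical 70B-parameter model:
print(weights_gb(70, "fp16"))       # 140.0 GB of weights at 16-bit
print(gpus_needed(70, "fp16", 80))  # 3 GPUs with 80 GB each
print(gpus_needed(70, "fp16", 48))  # 4 GPUs with 48 GB each
print(gpus_needed(70, "int4", 48))  # 1 GPU with 48 GB when quantised to 4-bit
```

This is a lower bound on memory only; it says nothing about throughput, so high-concurrency workloads may need more GPUs than the figure it returns.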

What does a Private AI project cost?

The hardware is a one-time investment that depends on your expected inference volume and chosen models. Building agents on top of the private LLM follows our standard Build & Run rates. In the scoping phase you get a tailored, substantiated estimate of the total cost of ownership over 3 to 5 years, including a comparison with hyperscaler alternatives, so you can make an informed choice.

What about model updates?

Open-weight models improve quickly. We track the releases of Llama, Mistral, DeepSeek and others, and propose upgrades when one adds real value. You decide; we carry out the upgrade once you sign off.

What if we want to move to a hyperscaler later?

No vendor lock-in. The agents we build on top of your private LLM largely run on the same frameworks that work on Microsoft, Google or AWS. A later move is not a rebuild. We are set up for it from day one.

Does Private AI fit your situation?

An exploratory call where we look at your use case, your expected inference volume, and whether a private environment is genuinely the right choice. No sales pitch, just honest advice, even if the answer turns out to be 'hyperscaler' or 'Volentis'.