Host your own AI Server using Proxmox and Ollama and connect PHPStorm to it
This is a guide on how to set up an AI server on Proxmox, connect it to PHPStorm, and use Tailscale for remote access.
I have a rather beefy server at home that I use mostly as my Windows gaming machine. It runs Proxmox because, from the beginning, I knew I wanted a server where I could run VMs and containers remotely.
You may not know, but I travel a lot with my van, and I can't bring all my equipment with me.
For context, here are the hardware specs of my server:
- CPU: AMD Ryzen 9 5900X 12-Core Processor
- RAM: 32GB (I need to upgrade it to 128GB or something)
- GPU: Nvidia RTX 3070 + Nvidia RTX 3060 Ti
- Storage: 2TB NVMe + 2x 60GB SSD
- Create a new VM
- Install required Nvidia CUDA Drivers and Toolkit
- Docker Installation
- Use Ollama for running AI models
- OpenWebUI Installation
- Tailscale
- PHPStorm Integration
- Update Ollama and OpenWebUI
- Conclusion
Create a new VM
We begin by setting up an Ubuntu VM on Proxmox, allocating the necessary resources for your AI server. You can download an ISO and create a new VM from it with the console, but I prefer to use a script to automate the process.
The script comes from the community-scripts repository, and it's a simple bash script that you can run on your Proxmox server.
Run the following script to deploy the Ubuntu VM:
bash -c "$(wget -qLO - [Proxmox script](https://github.com/community-scripts/ProxmoxVE/raw/main/vm/ubuntu2404-vm.sh))"
The script will ask you for some information; here is what I used:
- I allocated 512 GB of storage because LLM models are huge.
- I gave it 24576 MB (24 GB) of RAM. I tried 32 GB, but it made my server crash, so keep some RAM for the host. Ollama is great at distributing the load across the CPU and GPUs, which gives me plenty of room to test large models.
- Do not start the VM immediately after installation.
After the installation is finished, in the Cloud-Init tab:
- Configure the user, set a password, and/or add your public SSH key.
- I chose to use DHCP for the network configuration because I will only access it from my Tailscale network.
- Click on Regenerate image
Before starting the VM, go to the Hardware tab and add your GPU to the VM.
- Add -> PCI Device -> Raw Device -> Your GPU
- Check All functions
- In the advanced options, check ROM-Bar
- Start the VM.
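If your GPU does not show up as a raw PCI device, or the VM refuses to start with it attached, make sure IOMMU (Intel VT-d or AMD-Vi) is enabled in the BIOS and on the Proxmox host kernel. A quick check, run on the host and not the VM:
dmesg | grep -e DMAR -e IOMMU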
After that, you have a fresh Ubuntu VM ready to be used.
I took some time to install ZSH and Oh-My-Zsh, but do as you wish.
Install required Nvidia CUDA Drivers and Toolkit
Now it's time to configure this server to work well with AI models.
Start by confirming that your GPU is detected with:
lspci | grep -i nvidia
Now we need to install Nvidia's CUDA Toolkit and drivers.
The instructions for your machine can be found on the CUDA downloads page.
For our Ubuntu 24.04, we will use the following commands:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
sudo apt-get install -y cuda-drivers
Now we need to reboot the server to apply the changes:
sudo reboot
Set the environment variables:
export PATH=/usr/local/cuda-12/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
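These exports only last for the current shell session. To make them permanent, append them to your shell profile (~/.zshrc in my case since I use ZSH, or ~/.bashrc):
echo 'export PATH=/usr/local/cuda-12/bin${PATH:+:${PATH}}' >> ~/.zshrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.zshrc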
And verify the installation:
nvcc --version
nvidia-smi
That last command shows the GPU usage and the CUDA version.
Later you can watch it live with:
watch -n 0.5 nvidia-smi
There is also a command called nvtop that you can install with:
sudo apt-get install nvtop
Docker Installation
For some software that we will use, we need to install Docker.
Follow the steps detailed in the Docker installation guide, or copy and paste the following commands:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install Docker
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Confirm the installation by running:
sudo docker run hello-world
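If you do not want to prefix every Docker command with sudo, you can also add your user to the docker group (log out and back in for it to take effect). The commands in this guide keep the sudo prefix, so this step is optional:
sudo usermod -aG docker $USER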
Install Nvidia Container Toolkit
I don't use Ollama in a Docker container, but you may want to. Or maybe you need to run some other containers that need access to the GPU. This section ensures that you can use the GPU in a container in the future.
See the Nvidia Container Toolkit documentation and follow the instructions for your distribution.
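At the time of writing, the Ubuntu steps from that documentation boil down to roughly the following (double-check the page above in case the repository setup has changed), followed by a quick test that a container can see the GPUs:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Let Docker use the Nvidia runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Containers started with --gpus all should now see the cards
sudo docker run --rm --gpus all ubuntu nvidia-smi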
Use Ollama for running AI models
Ollama is a powerful tool for running AI models on your server. It provides a simple command-line interface to run various AI models with ease.
It also comes with an API that can be used by other applications.
One feature that I like is that it can distribute the load across multiple GPUs and the CPU. I try to run models that fit entirely on the GPUs; the rule of thumb is to pick models that fit in the combined memory of all your GPUs.
This is our true AI server.
I decided to install Ollama directly on the VM, but you can also use a Docker container.
Install Ollama with the provided script:
curl -fsSL https://ollama.com/install.sh | sh
Basic Usage of Ollama
Run various AI models with commands such as:
ollama run llama3.2 --verbose
ollama run deepseek-r1:14b --verbose
ollama run deepseek-r1:8b --verbose
I like to use the --verbose flag to see stats like tokens per second and samples per second after every answer.
You can also just pull the models without running them:
ollama pull mistral
You can also see how the load is distributed on the GPUs and CPUs with:
ollama ps
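Ollama's HTTP API listens on port 11434, and it is what PHPStorm and OpenWebUI will talk to later. A quick way to test it from the VM (assuming you already pulled llama3.2):
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false}'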
Make Ollama listen outside of localhost
This step is needed later so that PHPStorm can connect to Ollama from another machine.
sudo systemctl edit ollama.service
Add the following lines at the beginning of the file:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
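After saving the override, restart Ollama so it picks up the new environment, and check that it is listening on all interfaces:
sudo systemctl restart ollama
sudo ss -tlnp | grep 11434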
OpenWebUI Installation
OpenWebUI is a web interface that gives you a ChatGPT interface to interact with your AI server.
See the OpenWebUI GitHub repository for detailed installation instructions.
Here is a quick guide to install OpenWebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
You can now access OpenWebUI at http://your-server-ip:3000.
The first account created will be the admin account.
Tailscale
Tailscale is a personal VPN that makes it easy to connect your devices over the internet without any port forwarding.
To install Tailscale, the easiest way is to go to your Tailscale admin console and get the command to install it on your server.
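If you prefer doing it from the terminal, Tailscale's one-line install script does the same thing; run it on the VM and then authenticate the machine:
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up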
Then the next step is to tell Tailscale to serve OpenWebUI:
sudo tailscale serve --bg 3000
You can now access OpenWebUI from anywhere with your Tailscale Magic DNS with HTTPS.
Example: https://aiserver.tail1234567.ts.net
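You can check what Tailscale is serving at any time with:
sudo tailscale serve status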
PHPStorm Integration
For PHPStorm to connect to Ollama, we need to install the official PHPStorm AI plugin.
In the plugin configuration, you can add your Ollama server with its IP (or Tailscale hostname) and port.
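Before configuring the plugin, you can check that your development machine reaches Ollama over Tailscale; the /api/tags endpoint returns the models installed on the server (replace the hostname with your own MagicDNS name or the server's IP):
curl http://aiserver.tail1234567.ts.net:11434/api/tags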
Update Ollama and OpenWebUI
To update OpenWebUI automatically, you can use Watchtower.
Here is a command to update OpenWebUI once:
sudo docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui
And here is a command to update OpenWebUI every hour:
sudo docker run -d \
  --restart always \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower --interval 3600 open-webui
Since we installed Ollama with the install script, updating it means running the same script again.
Here is a command to update Ollama every week:
sudo crontab -e
Add the following line to the file:
0 0 * * 0 curl -fsSL https://ollama.com/install.sh | sh
Conclusion
And voilĂ !
You now have a personal AI server that you can access from anywhere with PHPStorm and OpenWebUI.
You should dig into the documentation of Ollama and OpenWebUI to see all the features they offer.
Examples of advanced features you can explore:
- Image generation
- Indexing documents with embeddings
- Chat with ChatGPT and local models at the same time
- Use multimodal models like llama3.2-vision