Building a Local AI Development Server with Framework Desktop

CutSec Framework Server

TL;DR

Working with AI and LLMs goes beyond ChatGPT, Gemini, and Claude. Running large language models (LLMs) locally eliminates cloud dependencies, keeps sensitive data on-premises, and provides the computational muscle needed for AI-assisted security research. This post summarizes my experience building a dedicated AI development server using the Framework Desktop with AMD’s Ryzen AI Max+ 395 processor — a system capable of running 70B parameter models entirely in local memory.

The complete installation guide is available as a PDF download. Future improvements and developed tools will be published to the CutSec GitHub repositories.

AI-Assisted Planning

To manage and speed up the server deployment, this build was planned and troubleshot with assistance from Claude, Anthropic’s AI assistant. The build conversation spanned hardware selection, partition layouts, GPU memory configuration, and debugging issues like LUKS boot prompts and Qdrant API changes. When something didn’t work as expected, Claude helped diagnose the problem and propose solutions, significantly speeding up the process.

Any capable AI assistant, such as ChatGPT or Gemini, can fill this role. The key benefit is having a knowledgeable collaborator available to work through technical decisions, generate configuration files, and troubleshoot errors in real time. For complex builds like this one, AI assistance reduces the time spent searching documentation and forums, letting you focus on the actual work.

Future development efforts on this platform will continue using AI collaboration, and we’ll share tooling and improvements through the CutSec GitHub repositories.

Why Local LLMs for Cybersecurity?

Cloud-based AI services are convenient, but they come with trade-offs: data leaves your control, API costs accumulate, and internet connectivity becomes a dependency. For cybersecurity work, especially ICS/OT assessments where client data sensitivity is paramount, local inference offers significant advantages.

The Framework Desktop with its 128GB of unified memory provides enough headroom to run production-quality models locally. The AMD Radeon 8060S integrated GPU shares system memory, eliminating the traditional VRAM bottleneck that limits most local LLM setups.

Hardware Overview

The build centers on Framework’s Desktop system with the AMD Ryzen AI Max+ 395 processor. Key specifications:

  • CPU/GPU: AMD Ryzen AI Max+ 395 with integrated Radeon 8060S
  • Memory: 128GB DDR5 unified memory (shared between CPU and GPU)
  • Storage: Dual 4TB NVMe drives — one for system/development, one for AI models and datasets
  • GPU Memory Allocation: 96GB configured for GPU tasks via GTT

Installation Summary

The installation follows eight phases, from hardware assembly through a working RAG (Retrieval-Augmented Generation) pipeline:

Phases 1-2: Hardware and BIOS — Standard assembly with BIOS defaults suitable for the AMD platform. No special virtualization settings required.

Phase 3: OS Installation — Kubuntu 24.04 LTS with the HWE (Hardware Enablement) stack for newer kernel support. LUKS encryption protects sensitive partitions while leaving the AI models partition unencrypted for performance.
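To illustrate the split between encrypted and unencrypted storage, the fragments below show what the relevant `/etc/crypttab` and `/etc/fstab` entries can look like. Device names, UUIDs, and mount points here are hypothetical placeholders, not values from the guide:

```
# /etc/crypttab -- unlock the LUKS-encrypted system volume at boot
cryptroot  UUID=<system-partition-uuid>  none  luks,discard

# /etc/fstab -- the AI models partition mounts plainly for throughput
/dev/nvme1n1p1  /models  ext4  defaults,noatime  0  2
```

Keeping the models partition outside LUKS avoids decryption overhead on multi-gigabyte model reads; the models themselves are public artifacts, so only assessment data needs encryption at rest.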

Phase 4: GPU Configuration — The Vulkan/Mesa stack works out of the box. GTT (Graphics Translation Table) memory is configured via kernel parameter to allow the GPU to utilize up to 96GB of system memory for model inference.
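As a sketch of the kernel-parameter approach: the `amdgpu` module's `gttsize` parameter takes megabytes, so 96GB works out to 98304. The `ttm.pages_limit` value (4KB pages, so 96GB ≈ 25165824) is a commonly paired setting; verify both names and values against the full guide and the amdgpu module documentation for your kernel:

```shell
# /etc/default/grub -- allow the iGPU to claim up to 96GB of system RAM via GTT
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.gttsize=98304 ttm.pages_limit=25165824"
```

After editing, run `sudo update-grub` and reboot for the parameters to take effect.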

Phase 5: Core Tools — Development essentials, security tools (nmap, netcat, tcpdump, SecLists), Docker, Tailscale for remote access, and `pass` for GPG-encrypted credential management that works over SSH.
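One pattern that makes `pass` workable over SSH is a small wrapper that falls back to an environment variable when no GPG agent is available. The helper below is a hypothetical sketch, not a function from the guide; the `api/anthropic` entry name and `ANTHROPIC_API_KEY` variable are illustrative:

```shell
# Hypothetical helper: fetch a secret from pass, falling back to an
# environment variable when pass or the GPG agent is unavailable
# (e.g. a non-interactive SSH session).
get_secret() {
  pass show "$1" 2>/dev/null || printenv "$2"
}

# Usage: store the key once with `pass insert api/anthropic`, then:
#   key="$(get_secret api/anthropic ANTHROPIC_API_KEY)"
```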

Phase 6: LLM Stack — Ollama provides the primary interface for model management and inference. Open WebUI delivers a browser-based chat interface. The system comfortably runs 7B models at 50+ tokens/second and 70B quantized models at usable speeds.
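A minimal smoke test of the stack looks like the following, run on the server itself. The model tag is an example and port 11434 is Ollama's default; adjust both to your setup:

```shell
# Pull a model, then query Ollama's local HTTP API directly
ollama pull llama3.1:70b

curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:70b",
  "prompt": "Summarize the Purdue model in two sentences.",
  "stream": false
}'
```

Open WebUI talks to the same API, so confirming a response here verifies the inference path before layering the browser interface on top.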

Phase 7: Claude Code — Anthropic’s CLI tool for AI-assisted development, configured with secure API key storage via `pass`.

Phase 8: RAG Pipeline — Qdrant vector database with local embeddings via Ollama’s nomic-embed-text model. This enables semantic search across ICS/OT documentation, PCAPs, and vendor manuals.
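As a sketch of the wiring between the two services: Qdrant collections must be created with the embedding model's vector size, and nomic-embed-text produces 768-dimensional vectors. The collection name below is an example, and ports 6333 and 11434 are the Qdrant and Ollama defaults; check the current Qdrant API docs, since (as noted above) its API has changed between versions:

```shell
# Create a collection sized for nomic-embed-text embeddings
curl -s -X PUT http://localhost:6333/collections/icsot-docs \
  -H 'Content-Type: application/json' \
  -d '{"vectors": {"size": 768, "distance": "Cosine"}}'

# Generate an embedding for a document chunk via Ollama
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Example chunk of vendor manual text."}'
```

The returned vector is then upserted into the collection as a point, and queries follow the same embed-then-search pattern.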

Key Capabilities

The completed system provides:

  • Local inference of 70B+ parameter models with no cloud dependency
  • RAG pipeline for searching technical documentation using natural language
  • Secure remote development via Tailscale (SSH and RDP)
  • Encrypted storage for sensitive assessment data
  • Foundation for ICS/OT security tooling development

Download

The complete installation guide with step-by-step instructions is available here:

Framework Desktop AI Workstation Installation Guide

Future improvements and related security tools will be published to the CutSec GitHub repositories.

Go forth and do good things,
Don C. Weber