OpenAI gpt-oss models optimized for NVIDIA RTX

The OpenAI gpt-oss models have been released with full NVIDIA RTX GPU optimizations, bringing fast, local AI inference to millions of developers and enthusiasts. NVIDIA’s collaboration with OpenAI ensures these reasoning models, gpt-oss-20b and gpt-oss-120b, run efficiently on RTX AI PCs and workstations without requiring cloud access.

From cloud to PC: RTX-optimized AI reasoning

These open-weight models are designed for advanced reasoning tasks such as web search, in-depth research, document comprehension, and coding assistance. Built with a mixture-of-experts architecture, they offer adjustable reasoning effort levels and chain-of-thought capabilities. Optimizations for RTX GPUs deliver up to 256 tokens per second on a GeForce RTX 5090. This performance allows complex tasks to run quickly while maintaining high model quality through MXFP4 precision, which requires fewer resources than traditional formats.

Run OpenAI gpt-oss models with Ollama

The easiest way to use the OpenAI gpt-oss models locally is with the Ollama app. This popular tool for AI integration now supports OpenAI’s open-weight models out of the box. On RTX AI PCs with at least 24GB of VRAM, Ollama offers seamless setup, PDF and text file integration in chats, multimodal prompt support, and customizable context lengths for long documents. Users can interact via a friendly UI or run models through command line and SDK for application integration.

More ways to accelerate locally on RTX

Beyond Ollama, developers can run these models with llama.cpp or the GGML tensor library, both enhanced for RTX performance. NVIDIA’s latest contributions include CUDA Graphs for reduced processing overhead and improved algorithms to minimize CPU usage. Windows developers can also try the models through Microsoft AI Foundry Local, currently in public preview. This on-device AI inference solution uses ONNX Runtime optimized for CUDA, with TensorRT support for RTX coming soon.

The launch of these OpenAI gpt-oss models marks another step in democratizing AI reasoning capabilities. By optimizing them for RTX hardware, NVIDIA and OpenAI are empowering developers to build intelligent, responsive AI applications that work instantly and privately on local devices. This fusion of open-source flexibility and hardware acceleration sets the stage for the next wave of AI-powered creativity and productivity.