How to Deploy GPT-OSS on a Voltage Park GPU Server

Voltage Park operates a massive fleet of high-performance NVIDIA GPUs, highly optimized for hosting open-source models like GPT-OSS.
GPT-OSS comes in two highly efficient sizes:
- 20B
- 120B
Both models can use tools, such as web browsing, to improve output accuracy.
The 20B parameter model, hosted on Voltage Park's cloud H100 servers with maximum inference optimizations and batch sizes, costs as little as $0.10 per million input tokens and $0.20 per million output tokens at continuous saturation.
The 120B parameter model, hosted on Voltage Park's AI cloud, costs as little as $0.20 per million input tokens and $0.60 per million output tokens at continuous saturation.
In this tutorial, we'll deploy GPT-OSS 120B using both the Ollama and vLLM inference engines.
Steps for deploying GPT-OSS on our GPU server
The tutorial linked above is broken down into the following steps for deploying GPT-OSS:
- Deploy a GPU server
- Access your GPU server
- Deploy GPT-OSS
- Recommended: Securing your environment
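Once you have shell access to the server, the "Deploy GPT-OSS" step roughly comes down to one of the following commands. This is a minimal sketch based on each engine's published usage, not taken from the tutorial itself; the `gpt-oss:120b` Ollama tag and `openai/gpt-oss-120b` Hugging Face model ID are assumptions, and the full tutorial covers flags and tuning:

```shell
# Option 1: Ollama — pull the 120B model and run a quick prompt
ollama pull gpt-oss:120b
ollama run gpt-oss:120b "Summarize the attention mechanism in two sentences."

# Option 2: vLLM — start an OpenAI-compatible HTTP server on port 8000
pip install vllm
vllm serve openai/gpt-oss-120b --host 0.0.0.0 --port 8000
```

Either path leaves you with a locally hosted endpoint; vLLM's server speaks the OpenAI API format, so existing client code can usually point at it with only a base-URL change.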
What is the history of RL models?
Large language models (LLMs) are changing how we approach artificial intelligence. In roughly six years, they have moved from experimental research tools to production-ready systems that power everything from chatbots to code generation platforms.
From 2019 to 2024, foundation model companies generally focused on scaling pretraining, releasing increasingly large models trained on ever-larger text corpora. Distilling those models' outputs to train smaller, more efficient models also grew in popularity.
In September 2024, OpenAI launched its o1 model, introducing a new compute paradigm: reinforcement learning. For the first time, models could generate tokens in a "scratchpad" to "think" before delivering a result. When given tools to interact with, these models produce more informed, accurate outputs with fewer hallucinations.
How much does o1 cost?
- o1 costs $15.00 per million input tokens and $60.00 per million output tokens.
- o3, announced in December 2024, costs $2.00 per million input tokens and $8.00 per million output tokens.
- GPT-OSS costs about $0.20 per million input tokens and $0.60 per million output tokens, though this depends on your environment, batch size, and saturation.
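To make these per-token prices concrete, here is a small sketch that estimates the bill for a hypothetical workload; the 10M input / 2M output token volumes are made-up numbers for illustration only:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Estimate cost in USD, given prices per million tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
workload = (10_000_000, 2_000_000)

print(f"o1:      ${cost_usd(*workload, 15.00, 60.00):,.2f}")  # o1:      $270.00
print(f"GPT-OSS: ${cost_usd(*workload, 0.20, 0.60):,.2f}")    # GPT-OSS: $3.20
```

At these rates, the same workload on self-hosted GPT-OSS costs roughly two orders of magnitude less than on o1, which is the economic argument for serving open-weight models at high saturation.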