How to Deploy GPT-OSS on a Voltage Park GPU Server

Voltage Park operates a massive fleet of high-performance NVIDIA GPUs, highly optimized for hosting open-source models like GPT-OSS.
GPT-OSS comes in two highly efficient sizes:
- 20B
- 120B
Both models can use tools, such as web browsing, to improve output accuracy.
The 20B parameter model, hosted on Voltage Park's cloud H100 servers with maximum inference optimizations and batch sizes, costs as little as $0.10 per million input tokens and $0.20 per million output tokens at continuous saturation.
The 120B parameter model, hosted on Voltage Park's AI cloud, costs as little as $0.20 per million input tokens and $0.60 per million output tokens at continuous saturation.
In this tutorial, we'll deploy GPT-OSS 120B using both the Ollama and vLLM inference engines.
Steps for deploying GPT-OSS on our GPU server
The tutorial linked above is broken down into the following steps for deploying GPT-OSS:
- Deploy a GPU server
- Access your GPU server
- Deploy GPT-OSS
- Recommended: Securing your environment
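Once you have shell access to the server, the "Deploy GPT-OSS" step roughly comes down to one of the following commands. This is a minimal sketch based on each engine's published usage, not taken from the tutorial itself; the `gpt-oss:120b` Ollama tag and `openai/gpt-oss-120b` Hugging Face model ID are assumptions, and the full tutorial covers flags and tuning:

```shell
# Option 1: Ollama — pull the 120B model and run a quick prompt
ollama pull gpt-oss:120b
ollama run gpt-oss:120b "Summarize the attention mechanism in two sentences."

# Option 2: vLLM — start an OpenAI-compatible HTTP server on port 8000
pip install vllm
vllm serve openai/gpt-oss-120b --host 0.0.0.0 --port 8000
```

Either path leaves you with a locally hosted endpoint; vLLM's server speaks the OpenAI API format, so existing client code can usually point at it with only a base-URL change.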
What is the history of RL models?
Large language models (LLMs) are changing how we approach artificial intelligence. In roughly six years, they have moved from experimental research tools to production-ready systems that power everything from chatbots to code generation platforms.
From 2019 to 2024, foundation model companies generally focused on scaling pretraining, releasing increasingly large models trained on ever-larger text corpora. Distilling those models' outputs to train smaller, more efficient models also grew in popularity.
In September 2024, OpenAI launched its o1 model, introducing a new compute paradigm: reinforcement learning. For the first time, models could generate tokens in a "scratchpad" to "think" before delivering a result. When given tools to interact with, these models produce more informed, accurate outputs with fewer hallucinations.
How much does o1 cost?
- o1 costs $15.00 per million input tokens and $60.00 per million output tokens.
- o3, announced in December 2024, costs $2.00 per million input tokens and $8.00 per million output tokens.
- GPT-OSS costs about $0.20 per million input tokens and $0.60 per million output tokens, though this depends on your environment, batch size, and saturation.
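To make these per-token prices concrete, here is a small sketch that estimates the bill for a hypothetical workload; the 10M input / 2M output token volumes are made-up numbers for illustration only:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Estimate cost in USD, given prices per million tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
workload = (10_000_000, 2_000_000)

print(f"o1:      ${cost_usd(*workload, 15.00, 60.00):,.2f}")  # o1:      $270.00
print(f"GPT-OSS: ${cost_usd(*workload, 0.20, 0.60):,.2f}")    # GPT-OSS: $3.20
```

At these rates, the same workload on self-hosted GPT-OSS costs roughly two orders of magnitude less than on o1, which is the economic argument for serving open-weight models at high saturation.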