☁️ SaaS Model Red-Teaming Guide
This guide shows how to evaluate SaaS-hosted LLMs from providers like OpenAI, Groq, Anthropic, Mistral, and others using DetoxIO’s red-team engine.
You can test these models through the `dtx redteam run` CLI command, provided your environment is properly configured and the agent is mapped to the correct backend.
🧠 What Are SaaS Models?
SaaS models are externally hosted large language models served via cloud APIs. These include:
| Provider | Sample Models |
|---|---|
| OpenAI | gpt-4, gpt-4o, gpt-4o-mini |
| Groq | llama-3.1-8b-instant, mixtral-8x7b |
| Anthropic | claude-3-opus, claude-instant-1 |
| Mistral | mistral-7b-instruct, mixtral-8x7b |
| Together | Many open models via proxy |
⚙️ Requirements
Before testing:
- ✅ Your model must be accessible through a supported provider (via LiteLLM or a DTX plugin).
- ✅ Required API keys must be set via environment variables, e.g.:
export OPENAI_API_KEY=sk-...
export GROQ_API_KEY=groq-...
export GEMINI_API_KEY=AIzaSyCG-...
- ✅ The model must be registered to a known agent (e.g., `groq`, `gemini`, `openai`, `anthropic`, `litellm`).
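Because a missing key usually only surfaces once a run has started, a quick pre-flight check can save time. The sketch below is a minimal POSIX shell helper; `missing_api_keys` is a hypothetical name and is not part of dtx. It assumes only the environment variable names shown in the requirements above.

```shell
# Hypothetical helper (not part of dtx): print the name of every listed
# environment variable that is unset or empty; return 1 if any is missing.
# Pass only trusted variable names, since the lookup uses eval.
missing_api_keys() {
  _status=0
  for _name in "$@"; do
    # Indirect lookup in POSIX sh: read the variable whose name is in $_name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "$_name"
      _status=1
    fi
  done
  return "$_status"
}
```

For example, `missing_api_keys OPENAI_API_KEY GROQ_API_KEY || echo "set the keys listed above first"` prints the names of any unset keys before you start a run.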
🚀 Run Examples
🔹 Test Groq + LLaMA 3.1
dtx redteam run \
--agent groq \
--url llama-3.1-8b-instant \
--dataset stingray \
--max_prompts 100 \
--html report_groq.html
🔹 Test OpenAI + GPT-4o mini
dtx redteam run \
--agent openai \
--url gpt-4o-mini \
--dataset stingray \
--max_prompts 30 \
--html report_openai.html
🔹 Test Gemini + 2.0 Flash
dtx redteam run \
--agent gemini \
--url gemini-2.0-flash \
--dataset stingray \
--max_prompts 30 \
--html report_gemini.html
🔹 Test via LiteLLM Proxy
dtx redteam run \
--agent litellm \
--url mistral/mixtral-8x7b \
--dataset stingray \
--max_prompts 50 \
--html report_proxy.html
Note: When using LiteLLM, the `--url` format is `<provider>/<model>` (e.g., `groq/mixtral-8x7b` or `openai/gpt-4`).
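To illustrate the `<provider>/<model>` convention, here is a small sketch that validates such a string and splits it at the first slash. The function names are hypothetical helpers written for this guide, not dtx or LiteLLM commands, and the check is only a shape test: it does not confirm that the provider or model actually exists.

```shell
# Hypothetical helpers: work with a LiteLLM-style "<provider>/<model>" string.

# Succeeds only if the string has a non-empty part on each side of the first slash.
is_litellm_url() {
  case "$1" in
    */*) [ -n "${1%%/*}" ] && [ -n "${1#*/}" ] ;;
    *) return 1 ;;
  esac
}

# Everything before the first slash.
litellm_provider() { printf '%s\n' "${1%%/*}"; }

# Everything after the first slash (may itself contain slashes).
litellm_model() { printf '%s\n' "${1#*/}"; }
```

With `groq/mixtral-8x7b`, `litellm_provider` prints `groq` and `litellm_model` prints `mixtral-8x7b`; a bare `gpt-4` fails `is_litellm_url` because it has no provider prefix.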
📝 Tips
- Use `--max_prompts` to limit evaluation scope.
- Add `--json report.json` or `--html report.html` to save results.
- Add `--fail-fast` to stop on the first unsafe response.
- For more control, build agents using `DtxRunnerConfigBuilder()` in Python.
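If you want to compare several providers with the same dataset and prompt limit, one option is to generate the commands in a loop. The sketch below only prints the commands (a dry run) and uses only the flags shown in the run examples; `dtx_command` is a hypothetical helper, and the agent/model pairs, the `stingray` dataset, and the limit of 30 are taken from those examples.

```shell
# Hypothetical helper: print (not run) the dtx command for one agent/model pair.
# $1 = agent, $2 = model passed to --url, $3 = prompt limit
dtx_command() {
  printf 'dtx redteam run --agent %s --url %s --dataset stingray --max_prompts %s --html report_%s.html\n' \
    "$1" "$2" "$3" "$1"
}

# Agent/model pairs written as "agent:model", taken from the run examples.
for pair in groq:llama-3.1-8b-instant openai:gpt-4o-mini gemini:gemini-2.0-flash; do
  dtx_command "${pair%%:*}" "${pair#*:}" 30
done
```

Review the printed commands first; once they look right and the API keys are set, you can run them one at a time. Each run writes its own `report_<agent>.html`, so the reports do not overwrite one another.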