# Domain-Based Routing
This guide shows how to use fine-tuned classification models for intelligent routing based on academic and professional domains. Domain routing uses specialized models (ModernBERT, Qwen3-Embedding, EmbeddingGemma) with LoRA adapters to classify queries into categories such as math, physics, law, and business.
## Key Advantages
- **Efficient**: Fine-tuned models with LoRA adapters provide fast inference (5-20 ms) with high accuracy
- **Specialized**: Multiple model options (ModernBERT for English, Qwen3 for multilingual/long-context, Gemma for a small footprint)
- **Multi-task**: LoRA enables multiple classification tasks (domain + PII + jailbreak) to run against a shared base model
- **Cost-effective**: Lower latency than LLM-based classification, with no API costs
## What Problem Does It Solve?
Generic classification approaches struggle with domain-specific terminology and nuanced differences between academic/professional fields. Domain routing provides:
- **Accurate domain detection**: Fine-tuned models distinguish between math, physics, chemistry, law, business, etc.
- **Multi-task efficiency**: LoRA adapters enable simultaneous domain classification, PII detection, and jailbreak detection in one base-model pass
- **Long-context support**: Qwen3-Embedding handles up to 32K tokens (vs. ModernBERT's 8K limit)
- **Multilingual routing**: Qwen3 is trained on 100+ languages, while ModernBERT is optimized for English
- **Resource optimization**: Expensive reasoning is enabled only for domains that benefit from it (math, physics, chemistry)
## When to Use
- Educational platforms with diverse subject areas (STEM, humanities, social sciences)
- Professional services requiring domain expertise (legal, medical, financial)
- Enterprise knowledge bases spanning multiple departments
- Research assistance tools needing academic domain awareness
- Multi-domain products where classification accuracy is critical
## Configuration

Configure the domain classifier in your `config.yaml`:
```yaml
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: true
    threshold: 0.7
    use_cpu: true
    pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"

categories:
  - name: math
    system_prompt: "You are a mathematics expert. Provide step-by-step solutions."
    model_scores:
      - model: qwen3
        score: 1.0
        use_reasoning: true
  - name: physics
    system_prompt: "You are a physics expert with deep understanding of physical laws."
    model_scores:
      - model: qwen3
        score: 0.7
        use_reasoning: true
  - name: computer science
    system_prompt: "You are a computer science expert with knowledge of algorithms and data structures."
    model_scores:
      - model: qwen3
        score: 0.6
        use_reasoning: false
  - name: business
    system_prompt: "You are a senior business consultant and strategic advisor."
    model_scores:
      - model: qwen3
        score: 0.7
        use_reasoning: false
  - name: health
    system_prompt: "You are a health and medical information expert."
    semantic_cache_enabled: true
    semantic_cache_similarity_threshold: 0.95
    model_scores:
      - model: qwen3
        score: 0.5
        use_reasoning: false
  - name: law
    system_prompt: "You are a knowledgeable legal expert."
    model_scores:
      - model: qwen3
        score: 0.4
        use_reasoning: false

default_model: qwen3
```
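Each category entry follows the same shape. Below is a minimal annotated sketch using only fields that appear in the configuration above, plus `pii_detection_enabled`, which appears in the optimization examples later in this guide. Queries that the classifier cannot place in any category with confidence above `threshold` are expected to fall back to `default_model`:

```yaml
- name: chemistry                            # Label the classifier must emit
  system_prompt: "You are a chemistry expert."
  semantic_cache_enabled: true               # Optional per-category cache override
  semantic_cache_similarity_threshold: 0.9
  pii_detection_enabled: true                # Optional, shown in the examples below
  model_scores:
    - model: qwen3                           # Candidate backend model
      score: 0.6                             # Routing preference weight
      use_reasoning: true                    # Reasoning mode for this domain
```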
## Supported Domains

- **Academic**: math, physics, chemistry, biology, computer science, engineering
- **Professional**: business, law, economics, health, psychology
- **General**: philosophy, history, other
## Features
- **PII Detection**: Automatically detects and handles sensitive information
- **Semantic Caching**: Caches similar queries for faster responses
- **Reasoning Control**: Enables or disables reasoning per domain
- **Custom Thresholds**: Adjusts cache sensitivity per category
## Example Requests
```bash
# Math query (reasoning enabled)
curl -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoM",
    "messages": [{"role": "user", "content": "Solve: x^2 + 5x + 6 = 0"}]
  }'

# Business query (reasoning disabled)
curl -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoM",
    "messages": [{"role": "user", "content": "What is a SWOT analysis?"}]
  }'

# Health query (high cache threshold)
curl -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoM",
    "messages": [{"role": "user", "content": "What are symptoms of diabetes?"}]
  }'
```
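To check which backend actually served a request, you can inspect the response body. This assumes the router returns a standard OpenAI-compatible completion object whose `model` field reflects the chosen backend:

```bash
# Send a math query and print the serving model plus the answer
curl -s -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoM",
    "messages": [{"role": "user", "content": "Solve: x^2 + 5x + 6 = 0"}]
  }' | jq '{model: .model, answer: .choices[0].message.content}'
```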
## Real-World Use Cases
### 1. Multi-Task Classification with LoRA (Efficient)

- **Problem**: Domain classification, PII detection, and jailbreak detection are all needed on every request
- **Solution**: LoRA adapters run all three tasks in one base-model pass instead of three separate models (sketched below)
- **Impact**: ~3x faster than running three full models, with <1% parameter overhead per task
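As a rough sketch, the multi-task setup could extend the `classifier` block from the configuration above with a third entry. The `jailbreak_model` key and its model path are hypothetical here, shown only to illustrate the shared-base-model pattern; check the reference configuration for the exact field names:

```yaml
classifier:
  category_model:      # Task 1: domain classification
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
  pii_model:           # Task 2: PII detection
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: true
    threshold: 0.7
  # Task 3: jailbreak detection (hypothetical key, for illustration only)
  jailbreak_model:
    model_id: "models/jailbreak_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.8
```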
### 2. Long Document Analysis (Specialized - Qwen3)

- **Problem**: Research papers and legal documents exceed ModernBERT's 8K-token limit
- **Solution**: Qwen3-Embedding supports up to 32K tokens without truncation
- **Impact**: Accurate classification of full documents, with no information loss from truncation
### 3. Multilingual Education Platform (Specialized - Qwen3)

- **Problem**: Students ask questions in 100+ languages, while ModernBERT is limited to English
- **Solution**: Qwen3-Embedding, trained on 100+ languages, handles multilingual routing (see the example below)
- **Impact**: A single model serves global users with consistent quality across languages
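For illustration, here is the math request from earlier rewritten in Spanish; this assumes the classifier is configured with Qwen3-Embedding rather than the English-optimized ModernBERT shown in the configuration above:

```bash
# Spanish math query; should still route to the math category
curl -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoM",
    "messages": [{"role": "user", "content": "Resuelve: x^2 + 5x + 6 = 0"}]
  }'
```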
### 4. Edge Deployment (Specialized - Gemma)

- **Problem**: Mobile/IoT devices can't run large classification models
- **Solution**: EmbeddingGemma-300M with Matryoshka embeddings (128-768 dims)
- **Impact**: A ~5x smaller model that runs on edge devices in <100 MB of memory
### 5. STEM Tutoring Platform (Efficient Reasoning Control)

- **Problem**: Math and physics need reasoning, but history and literature don't
- **Solution**: The domain classifier routes STEM → reasoning models and humanities → fast models (see the snippets in the next section)
- **Impact**: 2x better STEM accuracy and 60% cost savings on non-STEM queries
## Domain-Specific Optimizations

The snippets below are abbreviated category entries; in the full configuration, `score` and `use_reasoning` live under `model_scores` as shown earlier.

### STEM Domains (Reasoning Enabled)

```yaml
- name: math
  use_reasoning: true  # Step-by-step solutions
  score: 1.0           # Highest priority
- name: physics
  use_reasoning: true  # Derivations and proofs
  score: 0.7
- name: chemistry
  use_reasoning: true  # Reaction mechanisms
  score: 0.6
```
### Professional Domains (PII + Caching)

```yaml
- name: health
  semantic_cache_enabled: true
  semantic_cache_similarity_threshold: 0.95  # Very strict
  pii_detection_enabled: true
- name: law
  score: 0.4  # Conservative routing
  pii_detection_enabled: true
```
### General Domains (Fast + Cached)

```yaml
- name: business
  use_reasoning: false  # Fast responses
  score: 0.7
- name: other
  semantic_cache_similarity_threshold: 0.75  # Relaxed
  score: 0.7
```
## Performance Characteristics
| Domain | Reasoning | Cache Threshold | Avg Latency | Use Case |
|---|---|---|---|---|
| Math | ✅ | 0.85 | 2-5s | Step-by-step solutions |
| Physics | ✅ | 0.85 | 2-5s | Derivations |
| Chemistry | ✅ | 0.85 | 2-5s | Mechanisms |
| Health | ❌ | 0.95 | 500ms | Safety-critical |
| Law | ❌ | 0.85 | 500ms | Compliance |
| Business | ❌ | 0.80 | 300ms | Fast insights |
| Other | ❌ | 0.75 | 200ms | General queries |
## Cost Optimization Strategy

- **Reasoning Budget**: Enable reasoning only for STEM (~30% of queries) → ~60% cost reduction (see the worked example below)
- **Caching Strategy**: High threshold for sensitive domains → 70% hit rate
- **Model Selection**: Lower scores for low-value domains → cheaper models
- **PII Detection**: Only for health/law → reduced processing overhead
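The headline reasoning-budget number can be sanity-checked with rough arithmetic. Assuming, for illustration, that a reasoning-mode response costs about 5x a standard one (this multiplier is an assumption, not a measured figure): enabling reasoning everywhere costs 5.0 units per query, while enabling it only for the ~30% STEM share costs 0.3 × 5 + 0.7 × 1 = 2.2 units, a reduction of roughly 56%, in line with the ~60% figure above.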
## Reference

See `bert_classification.yaml` for the complete configuration.