Tutorial: Hyperparameter Tuning
Systematically optimize your topic models for best performance.
Duration: ~15 minutes Prerequisites: Tutorial: Model Validation & Evaluation
Key Hyperparameters
Three main hyperparameters control model quality and training:
num_topics (K): How many topics to discover
learning_rate (lr): Optimization step size
batch_size: Documents per training iteration
Lesser-used but important:
num_iterations: Training steps (usually 100-1000)
random_seed: Reproducibility
num_topics: The Critical Parameter
The number of topics affects results the most.
Too few topics:
❌ Vague, broad themes
❌ Everything maps to few topics
❌ Loss of granularity
Too many topics:
❌ Redundant/overlapping topics
❌ Low coherence
❌ Noise capture
Optimal:
✓ Coherent, interpretable themes
✓ No obvious redundancy
✓ Good downstream performance
Selecting num_topics:
# Strategy: try multiple values
num_topics_to_try = [5, 10, 15, 20, 30, 50, 75, 100]
results = {}
for k in num_topics_to_try:
print(f"Training with {k} topics...")
model = PF(counts, vocab, num_topics=k, batch_size=64)
model.train_step(num_steps=200, lr=0.01)
coherence_df = model.compute_topic_coherence()
coherence = coherence_df['coherence'].values
results[k] = {
'coherence_mean': coherence.mean(),
'coherence_std': coherence.std(),
'coherence_min': coherence.min(),
'model': model
}
# Analyze results
import pandas as pd
df = pd.DataFrame(results).T
print(df)
# Best by coherence
best_k = df['coherence_mean'].idxmax()
print(f"\nOptimal num_topics: {best_k}")
Practical guideline:
Small corpus (<10k docs): Start with K=5-20
Medium corpus (10-100k docs): K=20-50
Large corpus (100k+ docs): K=50-200
learning_rate: Optimization Speed
Controls how fast the model learns.
Too low (0.001):
Learning is very slow
Many iterations needed
More stable but inefficient
Too high (0.5):
Learning is erratic
Loss may increase
Unstable convergence
Just right (0.01-0.05):
Steady decrease in loss
Converges in reasonable time
Reproducible results
Finding optimal learning rate:
lrs_to_try = [0.001, 0.005, 0.01, 0.05, 0.1]
for lr in lrs_to_try:
print(f"\nTraining with learning_rate={lr}")
model = PF(counts, vocab, num_topics=20, batch_size=64)
model.train_step(num_steps=200, lr=lr)
Default recommendation: Start with 0.01
batch_size: Gradient Stability
Batch size affects gradient noise and GPU utilization.
Too small (16):
Very noisy gradients
Unstable training
Each iteration fast but need many
Not efficient on GPU
Too large (1024):
Stable gradients
Few iterations needed
Slower per-iteration
May not fit in GPU memory
Balanced (64-256):
Good stability
Good GPU utilization
Efficient training
Choosing batch_size:
# Rule of thumb: experiment with powers of 2
batch_sizes = [32, 64, 128, 256, 512]
for bs in batch_sizes:
model = PF(counts, vocab, num_topics=20, batch_size=bs)
import time
t0 = time.time()
model.train_step(num_steps=50, lr=0.01)
elapsed = time.time() - t0
print(f"batch_size={bs:3d}: {elapsed:.1f}s")
# Find sweet spot of speed/quality
With GPU:
Start with batch_size=256 or 512
Increase until GPU memory error
Then reduce by half
Systematic Hyperparameter Search
Grid search over parameter combinations:
from itertools import product
# Parameter grid
param_grid = {
'num_topics': [10, 20],
'learning_rate': [0.01, 0.05],
'batch_size': [64, 128]
}
best_score = -float('inf')
best_params = None
# Grid search
for (k, lr, bs) in product(*param_grid.values()):
params = {'num_topics': k, 'learning_rate': lr, 'batch_size': bs}
model = PF(counts, vocab, num_topics=k, batch_size=bs)
model.train_step(num_steps=200, lr=lr)
# Evaluate
coherence_df = model.compute_topic_coherence()
score = coherence_df['coherence'].mean()
print(f"K={k}, lr={lr}, bs={bs}: coherence={score:.3f}")
if score > best_score:
best_score = score
best_params = params
print(f"\nBest parameters: {best_params}")
print(f"Best coherence: {best_score:.3f}")
Warning: Grid search is expensive. With 100 combinations and 100 iterations each:
100 models × 100 iterations × 1 minute per 100 iters = 167 hours (!!)
Solution: Use random search or limit combinations
Or better: use GPU (10-40x faster)
Random Search (More Efficient)
import numpy as np
# Sample 20 random combinations from space
n_trials = 20
results = []
for trial in range(n_trials):
# Random parameters
k = np.random.choice([10, 15,20, 30, 50])
lr = np.random.uniform(0.001, 0.1) # log scale recommended
bs = np.random.choice([32, 64, 128, 256])
model = PF(counts, vocab, num_topics=k, batch_size=bs)
model.train_step(num_steps=200, lr=lr)
coherence_df = model.compute_topic_coherence()
results.append({
'num_topics': k,
'learning_rate': lr,
'batch_size': bs,
'coherence': coherence_df['coherence'].mean()
})
print(f"Trial {trial+1}/{n_trials}: coherence={results[-1]['coherence']:.3f}")
# Best configuration
best_idx = np.argmax([r['coherence'] for r in results])
best_config = results[best_idx]
print(f"Best: {best_config}")
Practical Tuning Strategy
Step 1: Find good num_topics (most important)
# Try 5 values: rough search
for k in [10, 20, 35, 50, 75]:
model = PF(counts, vocab, num_topics=k, batch_size=64)
model.train_step(num_steps=200, lr=0.01)
coherence_df = model.compute_topic_coherence()
print(f"K={k}: {coherence_df['coherence'].mean():.3f}")
Step 2: Refine around best K
# If K=35 was best, try nearby
best_k = 35
for k in range(30, 41, 1): # 30-40
model = PF(counts, vocab, num_topics=k, batch_size=64)
model.train_step(num_steps=200, lr=0.01)
coherence_df = model.compute_topic_coherence()
print(f"K={k}: {coherence_df['coherence'].mean():.3f}")
Step 3: Tune lr and batch_size
# With best K, try different lr values
best_k = 35 # from previous step
for lr in [0.005, 0.01, 0.02, 0.05]:
model = PF(counts, vocab, num_topics=best_k, batch_size=64)
model.train_step(num_steps=200, lr=lr)
coherence_df = model.compute_topic_coherence()
print(f"lr={lr}: {coherence_df['coherence'].mean():.3f}")
Step 4: Final validation
# Train final model with best parameters
final_model = PF(counts, vocab, num_topics=35, batch_size=128)
final_model.train_step(num_steps=500, lr=0.02) # More steps for final
# Validate
coherence_df = final_model.compute_topic_coherence()
print(f"Final model coherence: {coherence_df['coherence'].mean():.3f}")
final_model.summary())
Early Stopping
Stop training when loss plateaus:
model = PF(counts, vocab, num_topics=20, batch_size=64)
loss_history = []
patience = 10 # Stop if no improvement for 10 iterations
best_loss = float('inf')
patience_counter = 0
for epoch in range(100):
params = model.train_step(num_steps=10, lr=0.01)
current_loss = model.Metrics.loss[-1] if model.Metrics.loss else float('nan')
loss_history.append(current_loss)
print(f"Epoch {epoch+1}: loss={current_loss:.1f}")
# Check for improvement
if current_loss < best_loss - 1.0: # Improvement threshold
best_loss = current_loss
patience_counter = 0
print(" ✓ Improvement!")
else:
patience_counter += 1
print(f" No improvement ({patience_counter}/{patience})")
if patience_counter >= patience:
print("Early stopping!")
break
Documenting Experiments
Track your hyperparameter explorations:
import logging
logging.basicConfig(
filename='hyperparameter_log.txt',
level=logging.INFO,
format='%(asctime)s - %(message)s'
)
for k in [20, 30, 50]:
model = PF(counts, vocab, num_topics=k, batch_size=64)
model.train_step(num_steps=200, lr=0.01)
coherence_df = model.compute_topic_coherence()
logging.info(f"K={k}: coherence={coherence_df['coherence'].mean():.3f}")
Common Mistakes & Solutions
Mistake: Tuning learning_rate too aggressively
Solution: It’s usually not the bottleneck. Focus on K first.
Mistake: Grid search over too many combinations
Solution: Use random search or tune one parameter at a time.
Mistake: Not tracking which configurations you’ve tried
Solution: Keep a log with timestamps and results.
Mistake: Overfitting to coherence on one dataset
Solution: Validate on held-out documents, multiple datasets.
Mistake: Not using GPU
Solution: Enable GPU - changes game for hyperparameter search!
Tuning Checklist
✓ Focus on num_topics first (most impact) ✓ Try at least 5 different values ✓ Use GPU to enable faster experimentation ✓ Document all trials ✓ Validate on held-out data ✓ Use early stopping when possible ✓ Final training: more iterations than tuning
Next Steps
Want to understand models better? See Fundamentals
Ready to use your model? See How-To Guides
Need production setup? See Contributing Guide
Summary
Num_topics is the most important parameter
Learning rate usually fine at 0.01
Batch size affects speed, not much else
Use GPU to enable rapid experimentation
Track all experiments for reproducibility
Stop training when loss plateaus