.. _tutorial_hyperparameters:

================================================================================
Tutorial: Hyperparameter Tuning
================================================================================

Systematically optimize your topic models for best performance.

**Duration**: ~15 minutes
**Prerequisites**: :doc:`tutorial_validation`

Key Hyperparameters
===================

Three main hyperparameters control model quality and training:

1. **num_topics (K)**: How many topics to discover
2. **learning_rate (lr)**: Optimization step size
3. **batch_size**: Documents per training iteration

Lesser-used but important:

- **num_iterations**: Training steps (usually 100-1000)
- **random_seed**: Reproducibility

num_topics: The Critical Parameter
===================================

The number of topics affects results the most.

**Too few topics**:

- ❌ Vague, broad themes
- ❌ Everything maps to few topics
- ❌ Loss of granularity

**Too many topics**:

- ❌ Redundant/overlapping topics
- ❌ Low coherence
- ❌ Noise capture

**Optimal**:

- ✓ Coherent, interpretable themes
- ✓ No obvious redundancy
- ✓ Good downstream performance

Selecting num_topics:

.. code-block:: python

   # Strategy: try multiple values
   num_topics_to_try = [5, 10, 15, 20, 30, 50, 75, 100]

   results = {}
   for k in num_topics_to_try:
       print(f"Training with {k} topics...")
       model = PF(counts, vocab, num_topics=k, batch_size=64)
       model.train_step(num_steps=200, lr=0.01)

       coherence_df = model.compute_topic_coherence()
       coherence = coherence_df['coherence'].values
       results[k] = {
           'coherence_mean': coherence.mean(),
           'coherence_std': coherence.std(),
           'coherence_min': coherence.min(),
           'model': model
       }

   # Analyze results
   import pandas as pd
   df = pd.DataFrame(results).T
   print(df)

   # Best by coherence
   best_k = df['coherence_mean'].idxmax()
   print(f"\nOptimal num_topics: {best_k}")

**Practical guideline**:

- **Small corpus** (<10k docs): Start with K=5-20
- **Medium corpus** (10-100k docs): K=20-50
- **Large corpus** (100k+ docs): K=50-200

learning_rate: Optimization Speed
==================================

Controls how fast the model learns.

**Too low (0.001)**:

- Learning is very slow
- Many iterations needed
- More stable but inefficient

**Too high (0.5)**:

- Learning is erratic
- Loss may increase
- Unstable convergence

**Just right (0.01-0.05)**:

- Steady decrease in loss
- Converges in reasonable time
- Reproducible results

Finding optimal learning rate:

.. code-block:: python

   lrs_to_try = [0.001, 0.005, 0.01, 0.05, 0.1]

   for lr in lrs_to_try:
       print(f"\nTraining with learning_rate={lr}")
       model = PF(counts, vocab, num_topics=20, batch_size=64)
       model.train_step(num_steps=200, lr=lr)

**Default recommendation**: Start with 0.01

batch_size: Gradient Stability
===============================

Batch size affects gradient noise and GPU utilization.

**Too small (16)**:

- Very noisy gradients
- Unstable training
- Each iteration fast but need many
- Not efficient on GPU

**Too large (1024)**:

- Stable gradients
- Few iterations needed
- Slower per-iteration
- May not fit in GPU memory

**Balanced (64-256)**:

- Good stability
- Good GPU utilization
- Efficient training

Choosing batch_size:

.. code-block:: python

   # Rule of thumb: experiment with powers of 2
   batch_sizes = [32, 64, 128, 256, 512]

   for bs in batch_sizes:
       model = PF(counts, vocab, num_topics=20, batch_size=bs)

       import time
       t0 = time.time()
       model.train_step(num_steps=50, lr=0.01)
       elapsed = time.time() - t0

       print(f"batch_size={bs:3d}: {elapsed:.1f}s")
       # Find sweet spot of speed/quality

With GPU:

- Start with batch_size=256 or 512
- Increase until GPU memory error
- Then reduce by half

Systematic Hyperparameter Search
=================================

Grid search over parameter combinations:

.. code-block:: python

   from itertools import product

   # Parameter grid
   param_grid = {
       'num_topics': [10, 20],
       'learning_rate': [0.01, 0.05],
       'batch_size': [64, 128]
   }

   best_score = -float('inf')
   best_params = None

   # Grid search
   for (k, lr, bs) in product(*param_grid.values()):
       params = {'num_topics': k, 'learning_rate': lr, 'batch_size': bs}

       model = PF(counts, vocab, num_topics=k, batch_size=bs)
       model.train_step(num_steps=200, lr=lr)

       # Evaluate
       coherence_df = model.compute_topic_coherence()
       score = coherence_df['coherence'].mean()

       print(f"K={k}, lr={lr}, bs={bs}: coherence={score:.3f}")

       if score > best_score:
           best_score = score
           best_params = params

   print(f"\nBest parameters: {best_params}")
   print(f"Best coherence: {best_score:.3f}")

**Warning**: Grid search is expensive. With 100 combinations and 100 iterations each:

.. code-block:: text

   100 models × 100 iterations × 1 minute per 100 iters = 167 hours (!!)

   Solution: Use random search or limit combinations
   Or better: use GPU (10-40x faster)

Random Search (More Efficient)
==============================

.. code-block:: python

   import numpy as np

   # Sample 20 random combinations from space
   n_trials = 20
   results = []

   for trial in range(n_trials):
       # Random parameters
       k = np.random.choice([10, 15,20, 30, 50])
       lr = np.random.uniform(0.001, 0.1)  # log scale recommended
       bs = np.random.choice([32, 64, 128, 256])

       model = PF(counts, vocab, num_topics=k, batch_size=bs)
       model.train_step(num_steps=200, lr=lr)

       coherence_df = model.compute_topic_coherence()
       results.append({
           'num_topics': k,
           'learning_rate': lr,
           'batch_size': bs,
           'coherence': coherence_df['coherence'].mean()
       })

       print(f"Trial {trial+1}/{n_trials}: coherence={results[-1]['coherence']:.3f}")

   # Best configuration
   best_idx = np.argmax([r['coherence'] for r in results])
   best_config = results[best_idx]
   print(f"Best: {best_config}")

Practical Tuning Strategy
=========================

**Step 1: Find good num_topics** (most important)

.. code-block:: python

   # Try 5 values: rough search
   for k in [10, 20, 35, 50, 75]:
       model = PF(counts, vocab, num_topics=k, batch_size=64)
       model.train_step(num_steps=200, lr=0.01)
       coherence_df = model.compute_topic_coherence()
       print(f"K={k}: {coherence_df['coherence'].mean():.3f}")

**Step 2: Refine around best K**

.. code-block:: python

   # If K=35 was best, try nearby
   best_k = 35
   for k in range(30, 41, 1):  # 30-40
       model = PF(counts, vocab, num_topics=k, batch_size=64)
       model.train_step(num_steps=200, lr=0.01)
       coherence_df = model.compute_topic_coherence()
       print(f"K={k}: {coherence_df['coherence'].mean():.3f}")

**Step 3: Tune lr and batch_size**

.. code-block:: python

   # With best K, try different lr values
   best_k = 35  # from previous step

   for lr in [0.005, 0.01, 0.02, 0.05]:
       model = PF(counts, vocab, num_topics=best_k, batch_size=64)
       model.train_step(num_steps=200, lr=lr)
       coherence_df = model.compute_topic_coherence()
       print(f"lr={lr}: {coherence_df['coherence'].mean():.3f}")

**Step 4: Final validation**

.. code-block:: python

   # Train final model with best parameters
   final_model = PF(counts, vocab, num_topics=35, batch_size=128)
   final_model.train_step(num_steps=500, lr=0.02)  # More steps for final

   # Validate
   coherence_df = final_model.compute_topic_coherence()
   print(f"Final model coherence: {coherence_df['coherence'].mean():.3f}")
   final_model.summary())

Early Stopping
==============

Stop training when loss plateaus:

.. code-block:: python

   model = PF(counts, vocab, num_topics=20, batch_size=64)

   loss_history = []
   patience = 10  # Stop if no improvement for 10 iterations
   best_loss = float('inf')
   patience_counter = 0

   for epoch in range(100):
       params = model.train_step(num_steps=10, lr=0.01)
       current_loss = model.Metrics.loss[-1] if model.Metrics.loss else float('nan')
       loss_history.append(current_loss)

       print(f"Epoch {epoch+1}: loss={current_loss:.1f}")

       # Check for improvement
       if current_loss < best_loss - 1.0:  # Improvement threshold
           best_loss = current_loss
           patience_counter = 0
           print("  ✓ Improvement!")
       else:
           patience_counter += 1
           print(f"  No improvement ({patience_counter}/{patience})")

       if patience_counter >= patience:
           print("Early stopping!")
           break

Documenting Experiments
=======================

Track your hyperparameter explorations:

.. code-block:: python

   import logging

   logging.basicConfig(
       filename='hyperparameter_log.txt',
       level=logging.INFO,
       format='%(asctime)s - %(message)s'
   )

   for k in [20, 30, 50]:
       model = PF(counts, vocab, num_topics=k, batch_size=64)
       model.train_step(num_steps=200, lr=0.01)
       coherence_df = model.compute_topic_coherence()

       logging.info(f"K={k}: coherence={coherence_df['coherence'].mean():.3f}")

Common Mistakes & Solutions
============================

**Mistake**: Tuning learning_rate too aggressively

*Solution*: It's usually not the bottleneck. Focus on K first.

**Mistake**: Grid search over too many combinations

*Solution*: Use random search or tune one parameter at a time.

**Mistake**: Not tracking which configurations you've tried

*Solution*: Keep a log with timestamps and results.

**Mistake**: Overfitting to coherence on one dataset

*Solution*: Validate on held-out documents, multiple datasets.

**Mistake**: Not using GPU

*Solution*: Enable GPU - changes game for hyperparameter search!

Tuning Checklist
================

✓ Focus on num_topics first (most impact)
✓ Try at least 5 different values
✓ Use GPU to enable faster experimentation
✓ Document all trials
✓ Validate on held-out data
✓ Use early stopping when possible
✓ Final training: more iterations than tuning

Next Steps
==========

- Want to understand models better? See :doc:`../fundamentals/index`
- Ready to use your model? See :doc:`../how_to_guides/index`
- Need production setup? See :doc:`../contributing_guide/index`

Summary
=======

1. Num_topics is the most important parameter
2. Learning rate usually fine at 0.01
3. Batch size affects speed, not much else
4. Use GPU to enable rapid experimentation
5. Track all experiments for reproducibility
6. Stop training when loss plateaus