How-To Guides
Practical recipes for common topic modeling tasks.
Note
Individual how-to guides for specific tasks are coming soon. For now, please refer to:
Tutorials for complete step-by-step examples
Fundamentals for understanding core concepts
Getting Started for a quick introduction
Common Topics
Data & Input
Loading text files and creating document-term matrices
Tokenization, cleaning, and preprocessing
Working with sparse matrix formats
Creating and managing vocabularies
Training & Configuration
Mini-batch vs full-batch training
GPU acceleration and multi-GPU setups
Hyperparameter tuning and validation
Reproducibility with seeds
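To illustrate how mini-batching and seeding interact, the sketch below shuffles document indices with a seeded NumPy generator and yields row slices of a sparse matrix. The helper `iter_minibatches` is hypothetical and not part of this library; the library's own training loop may batch differently.

```python
import numpy as np
from scipy import sparse


def iter_minibatches(dtm, batch_size, seed):
    """Yield (row_indices, sub_matrix) pairs in a seeded random order.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(dtm.shape[0])
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, dtm[idx]


# Small synthetic count matrix: 10 documents, 6 terms.
counts = np.random.default_rng(0).poisson(0.5, size=(10, 6))
dtm = sparse.csr_matrix(counts)

# The same seed gives the same batch order on every run.
batches_a = [idx.tolist() for idx, _ in iter_minibatches(dtm, 4, seed=42)]
batches_b = [idx.tolist() for idx, _ in iter_minibatches(dtm, 4, seed=42)]
print(batches_a == batches_b)
```

Full-batch training corresponds to setting `batch_size` to the number of documents, so every step sees the whole matrix.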
Results & Analysis
Extracting topic distributions and top words
Interpreting and visualizing results
Model evaluation and coherence metrics
Exporting results for downstream analysis
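As a rough sketch of the extraction step, the example below ranks words per topic and normalizes per-document topic weights using plain NumPy. The arrays `topic_word` and `doc_topic` are invented toy values standing in for fitted model output, and `top_words` is a hypothetical helper; the names and shapes of the real model attributes may differ.

```python
import numpy as np

# Toy values standing in for fitted, non-negative model output.
vocab = np.array(["goal", "match", "team", "vote", "party", "election"])
topic_word = np.array([
    [5.0, 4.0, 3.0, 0.1, 0.2, 0.1],   # topic 0
    [0.1, 0.2, 0.3, 4.0, 3.0, 5.0],   # topic 1
])
doc_topic = np.array([
    [2.0, 0.5],   # document 0
    [0.1, 3.0],   # document 1
])


def top_words(topic_word, vocab, n=3):
    """Return the n highest-weighted words for each topic."""
    # argsort is ascending, so reverse each row before taking the first n.
    top_idx = np.argsort(topic_word, axis=1)[:, ::-1][:, :n]
    return [vocab[row].tolist() for row in top_idx]


# Normalize each document's topic weights so that they sum to 1.
doc_topic_share = doc_topic / doc_topic.sum(axis=1, keepdims=True)
dominant_topic = doc_topic_share.argmax(axis=1)

print(top_words(topic_word, vocab))
print(dominant_topic)
```

The resulting lists and arrays can be written to CSV or a data frame for downstream analysis.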
Troubleshooting
Handling data issues and edge cases
GPU memory problems
Training failures and convergence issues
Improving topic quality
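One common data issue is documents that end up empty after preprocessing, which may cause problems during training. The sketch below removes all-zero rows and columns from a document-term matrix; `drop_empty` is a hypothetical helper, not a function from this library.

```python
import numpy as np
from scipy import sparse


def drop_empty(dtm, vocab):
    """Remove documents and terms whose counts are all zero.

    Returns the cleaned matrix, the reduced vocabulary, and the
    original indices of the documents that were kept.
    """
    dtm = sparse.csr_matrix(dtm)
    doc_lengths = np.asarray(dtm.sum(axis=1)).ravel()
    term_counts = np.asarray(dtm.sum(axis=0)).ravel()
    doc_idx = np.flatnonzero(doc_lengths > 0)
    term_idx = np.flatnonzero(term_counts > 0)
    cleaned = dtm[doc_idx][:, term_idx]
    return cleaned, np.asarray(vocab)[term_idx], doc_idx


# Toy matrix: document 1 is empty and term "b" never occurs.
raw = np.array([[1, 0, 2],
                [0, 0, 0],
                [0, 0, 3]])
cleaned, kept_vocab, kept_docs = drop_empty(raw, ["a", "b", "c"])
print(cleaned.toarray(), kept_vocab, kept_docs)
```

Returning the kept document indices makes it possible to map results back to the original corpus.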
Tips & Best Practices
General Workflow
Prepare Data: Load and clean the text, then create a document-term matrix
Train Model: Start with basic PF before trying more advanced variants
Evaluate: Check topic quality with metrics and manual inspection
Extract & Analyze: Get top words, topic distributions, and visualizations
Improve: Adjust the hyperparameters or switch model type if needed
Performance Optimization
Use a GPU for datasets with more than 100,000 documents (see Tutorial: GPU Acceleration)
Filter rare words to reduce vocabulary size
Use sparse matrices for large inputs
Start with fewer topics for testing, then scale up
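The vocabulary-filtering tip can be sketched as a document-frequency cutoff on a sparse matrix. The helper `filter_rare_terms` and the threshold `min_df=2` are illustrative assumptions, not defaults of this library.

```python
import numpy as np
from scipy import sparse


def filter_rare_terms(dtm, vocab, min_df=2):
    """Keep only terms that occur in at least min_df documents."""
    dtm = sparse.csr_matrix(dtm)
    # Document frequency: number of documents with a non-zero count per term.
    doc_freq = np.asarray((dtm > 0).sum(axis=0)).ravel()
    keep = np.flatnonzero(doc_freq >= min_df)
    return dtm[:, keep], np.asarray(vocab)[keep]


# Toy matrix: 3 documents, 4 terms with document frequencies 3, 1, 2, 0.
raw = np.array([[1, 0, 2, 0],
                [1, 1, 0, 0],
                [3, 0, 1, 0]])
small_dtm, small_vocab = filter_rare_terms(raw, ["w0", "w1", "w2", "w3"])
print(small_dtm.shape, small_vocab)
```

Because the matrix stays in sparse format throughout, only the non-zero counts are stored, which is what keeps large inputs manageable.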
Model Selection
PF: Start here; it is a simple, unsupervised baseline
SPF: Use if you have domain knowledge (keywords/seeds)
CPF/CSPF: Use if documents have metadata (authors, dates, etc.)
ETM: Use if you have pre-trained word embeddings
TBIP: Use for discovering ideological positions (political text)
Learn More
For theoretical foundations: Fundamentals
For complete tutorials: Tutorials
For API details: API Reference
For examples: Examples & Applications