.. _how_to_guides:

================================================================================
How-To Guides
================================================================================

Practical recipes for common topic modeling tasks.

.. note::

   Individual how-to guides for specific tasks are coming soon.
   For now, please refer to:

   - :doc:`../tutorials/index` for complete step-by-step examples
   - :doc:`../fundamentals/index` for understanding core concepts
   - :doc:`../getting_started/index` for a quick introduction

Common Topics
=============

**Data & Input**

- Loading text files and creating document-term matrices
- Tokenization, cleaning, and preprocessing
- Working with sparse matrix formats
- Creating and managing vocabularies
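
Until the dedicated guides are available, the first and last items above can be illustrated with a minimal sketch. It assumes scikit-learn is installed and uses its ``CountVectorizer``; the sample documents are invented for illustration, and this package may provide its own loading helpers.

.. code-block:: python

   from scipy import sparse
   from sklearn.feature_extraction.text import CountVectorizer

   # Toy corpus, invented for illustration only.
   docs = [
       "The cat sat on the mat.",
       "Dogs and cats are pets.",
       "The stock market fell today.",
   ]

   # Lowercase the text and drop English stop words while counting tokens.
   vectorizer = CountVectorizer(lowercase=True, stop_words="english")

   # Rows are documents, columns are vocabulary terms; the result is sparse.
   dtm = vectorizer.fit_transform(docs)

   # The vocabulary is aligned with the matrix columns.
   vocab = vectorizer.get_feature_names_out()

   print(dtm.shape)
   print(list(vocab))

Keeping ``vocab`` alongside the matrix matters later, because column indices are the only link between counts and the words they represent.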

**Training & Configuration**

- Mini-batch vs full-batch training
- GPU acceleration and multi-GPU setups
- Hyperparameter tuning and validation
- Reproducibility with seeds
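
As a rough illustration of how seeds and mini-batches interact, the sketch below shuffles document indices with a seeded NumPy generator and splits them into batches. The helper ``iter_minibatches`` is hypothetical and not part of this package; the models here may expose their own seed and batch-size parameters.

.. code-block:: python

   import numpy as np


   def iter_minibatches(n_docs, batch_size, seed):
       """Yield shuffled index arrays that together cover every document once."""
       rng = np.random.default_rng(seed)  # seeded generator for reproducible order
       order = rng.permutation(n_docs)
       for start in range(0, n_docs, batch_size):
           yield order[start:start + batch_size]


   # Two runs with the same seed produce identical batches.
   batches_a = list(iter_minibatches(n_docs=10, batch_size=4, seed=0))
   batches_b = list(iter_minibatches(n_docs=10, batch_size=4, seed=0))

   for batch in batches_a:
       print(batch)

Full-batch training corresponds to setting ``batch_size`` equal to ``n_docs``, which yields a single batch containing every document.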

**Results & Analysis**

- Extracting topic distributions and top words
- Interpreting and visualizing results
- Model evaluation and coherence metrics
- Exporting results for downstream analysis
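
Extracting top words usually reduces to sorting each row of a topic-word matrix. The sketch below assumes such a matrix is available as a NumPy array of shape ``(n_topics, n_terms)``; the weights, the vocabulary, and the ``top_words`` helper are all hypothetical and not taken from this package's API.

.. code-block:: python

   import numpy as np

   # Hypothetical vocabulary and topic-word weights, for illustration only.
   vocab = np.array(["market", "stock", "cat", "dog", "price", "pet"])
   topic_word = np.array([
       [0.9, 0.8, 0.0, 0.1, 0.7, 0.0],
       [0.0, 0.1, 0.9, 0.8, 0.0, 0.7],
   ])


   def top_words(topic_word, vocab, n=3):
       """Return the n highest-weighted words for each topic."""
       # argsort is ascending, so reverse each row before taking the first n.
       idx = np.argsort(topic_word, axis=1)[:, ::-1][:, :n]
       return [vocab[row].tolist() for row in idx]


   top = top_words(topic_word, vocab, n=3)
   for k, words in enumerate(top):
       print(f"Topic {k}: {', '.join(words)}")
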

**Troubleshooting**

- Handling data issues and edge cases
- GPU memory problems
- Training failures and convergence issues
- Improving topic quality
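
One common data issue is a document that becomes empty after preprocessing. The sketch below is a generic SciPy approach, not a function of this package: it drops all-zero rows from a sparse document-term matrix and records which documents were kept. The small matrix is invented for illustration.

.. code-block:: python

   import numpy as np
   from scipy import sparse

   # Toy document-term matrix; the second document has no remaining tokens.
   dtm = sparse.csr_matrix(np.array([
       [2, 0, 1],
       [0, 0, 0],
       [0, 3, 0],
   ]))

   # Total token count per document, flattened to a 1-D array.
   row_totals = np.asarray(dtm.sum(axis=1)).ravel()

   # Keep only documents with at least one token.
   keep = row_totals > 0
   dtm_clean = dtm[keep]

   # Original row indices of the kept documents, for re-aligning metadata.
   kept_ids = np.flatnonzero(keep)

   print(dtm_clean.shape, kept_ids)

Tracking ``kept_ids`` keeps any per-document metadata, such as authors or dates, aligned with the filtered matrix.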

Tips & Best Practices
=====================

**General Workflow**

1. **Prepare Data**: Load and clean the text, then build a document-term matrix
2. **Train Model**: Start with basic PF before trying advanced variants
3. **Evaluate**: Check topic quality with metrics and manual inspection
4. **Extract & Analyze**: Get top words, distributions, and visualizations
5. **Improve**: Adjust hyperparameters or model type if needed

**Performance Optimization**

- Use a GPU for datasets with more than 100k documents (see :doc:`../tutorials/tutorial_gpu`)
- Filter rare words to reduce vocabulary size
- Use sparse matrices for large inputs
- Start with fewer topics for testing, then scale up
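
The effect of filtering rare words can be sketched with scikit-learn's ``CountVectorizer`` and its ``min_df`` parameter. This assumes scikit-learn is used for preprocessing, which may differ from this package's own pipeline; the documents and the threshold of 2 are illustrative choices.

.. code-block:: python

   from sklearn.feature_extraction.text import CountVectorizer

   # Toy corpus, invented for illustration only.
   docs = [
       "topic models find themes in text",
       "topic models need clean text",
       "rare zebra words inflate the vocabulary",
       "clean text helps topic models",
   ]

   # Baseline: keep every token that appears anywhere in the corpus.
   full = CountVectorizer().fit(docs)

   # Pruned: keep only tokens that appear in at least two documents.
   pruned = CountVectorizer(min_df=2).fit(docs)

   print(len(full.vocabulary_), "terms before filtering")
   print(len(pruned.vocabulary_), "terms after filtering")
   print(sorted(pruned.vocabulary_))

On real corpora a large share of the vocabulary occurs in only one or two documents, so a modest ``min_df`` can shrink the matrix considerably while removing little topical signal.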

**Model Selection**

- **PF**: Start here; it is a simple, unsupervised baseline
- **SPF**: Use if you have domain knowledge (keywords/seeds)
- **CPF/CSPF**: Use if documents have metadata (authors, dates, etc.)
- **ETM**: Use if you have pre-trained word embeddings
- **TBIP**: Use to discover ideological positions in political text

Learn More
==========

- For theoretical foundations: :doc:`../fundamentals/index`
- For complete tutorials: :doc:`../tutorials/index`
- For API details: :doc:`../api/index`
- For examples: :doc:`../examples_guide/index`