.. _api:

================================================================================
API Reference
================================================================================

Complete API documentation for poisson-topicmodels.

.. note::

   For auto-generated class and method documentation from docstrings, see
   :doc:`../models`.

Module Organization
====================

.. code-block:: python

   from poisson_topicmodels import (
       # Models
       PF,       # Poisson Factorization (unsupervised)
       SPF,      # Seeded Poisson Factorization (guided)
       CPF,      # Covariate Poisson Factorization (with metadata)
       CSPF,     # Covariate Seeded Poisson Factorization (both)
       ETM,      # Embedded Topic Models (with embeddings)
       TBIP,     # Text-Based Ideal Points (author positions)
       STBS,     # Structured Text-Based Scaling (topic-specific ideal points)
       # Base classes
       NumpyroModel,
       Metrics,
   )

Model API Pattern
=================

All models follow the same workflow:

**1. Initialize**

.. code-block:: python

   model = PF(counts=dtm, vocab=vocab, num_topics=10, batch_size=32)

**2. Train**

.. code-block:: python

   params = model.train_step(num_steps=200, lr=0.01, random_seed=42)

**3. Summarize**

.. code-block:: python

   model.summary()              # Formatted text summary of the fitted model

**4. Extract**

.. code-block:: python

   topics, e_theta = model.return_topics()         # Dominant topic per doc + proportions
   beta = model.return_beta()                       # Word–topic DataFrame (words × topics)
   top_words = model.return_top_words_per_topic(n=10)  # dict {topic_id: [words]}

**5. Evaluate**

.. code-block:: python

   coherence_df = model.compute_topic_coherence()   # NPMI coherence per topic
   diversity = model.compute_topic_diversity()       # Fraction of unique top words (0–1)

**6. Visualize**

.. code-block:: python

   model.plot_model_loss()               # Training loss curve
   model.plot_topic_prevalence()         # Mean topic prevalence bar chart
   model.plot_topic_correlation()        # Cosine-similarity heatmap
   model.plot_document_topic_heatmap()   # Document × topic heatmap
   model.plot_topic_wordclouds()         # Wordcloud per topic

Common Parameters
=================

**Data**

- ``counts`` (csr_matrix): Document-term matrix (documents × terms)
- ``vocab`` (ndarray): Vocabulary terms, shape ``(num_words,)``
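
As a minimal sketch of these two inputs, the following builds a CSR document-term matrix and an aligned vocabulary array from a toy, pre-tokenized corpus. The corpus and variable names are illustrative assumptions; any tool that yields a ``csr_matrix`` whose columns line up with the vocabulary should serve equally well.

.. code-block:: python

   import numpy as np
   from scipy.sparse import csr_matrix

   # Toy, pre-tokenized corpus (illustrative data only).
   docs = [
       ["tax", "budget", "tax"],
       ["health", "care", "budget"],
       ["care", "health", "health"],
   ]

   # Sorted vocabulary; index i of vocab corresponds to column i of counts.
   vocab = np.array(sorted({word for doc in docs for word in doc}))
   word_to_col = {word: col for col, word in enumerate(vocab)}

   # One (row, col, 1) triplet per token; csr_matrix sums duplicate entries.
   rows = [r for r, doc in enumerate(docs) for _ in doc]
   cols = [word_to_col[word] for doc in docs for word in doc]
   counts = csr_matrix(
       (np.ones(len(rows), dtype=np.int64), (rows, cols)),
       shape=(len(docs), len(vocab)),
   )

   print(counts.shape)      # (3, 4)
   print(counts.toarray())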

**Model configuration**

- ``num_topics`` (int): Number of topics to discover (PF, CPF, TBIP, STBS, ETM)
- ``keywords`` (dict): Seed words per topic (SPF, CSPF)
- ``residual_topics`` (int): Extra unsupervised topics (SPF, CSPF)
- ``X_design_matrix`` (ndarray | DataFrame): Covariates (document-level for CPF/CSPF, author-level for STBS)
- ``authors`` (ndarray): Author labels per document (TBIP, STBS)
- ``embeddings_mapping`` (dict): Word → embedding vector (ETM)
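
The exact structure expected for ``keywords`` is described with the SPF and CSPF models. Purely as an assumption for illustration, the sketch below uses a dict that maps a topic label to a list of seed words and checks that every seed word occurs in the vocabulary before a model is built. The topic labels, the words, and the helper ``missing_seed_words`` are hypothetical and not part of the library.

.. code-block:: python

   # Hypothetical seed-word dictionary: topic label -> list of seed words.
   keywords = {
       "economy": ["tax", "budget"],
       "health": ["care", "hospital"],
   }
   vocab = ["budget", "care", "health", "tax"]


   def missing_seed_words(keywords, vocab):
       """Return {topic: [seed words absent from vocab]} for topics with gaps."""
       known = set(vocab)
       missing = {}
       for topic, words in keywords.items():
           absent = [w for w in words if w not in known]
           if absent:
               missing[topic] = absent
       return missing


   print(missing_seed_words(keywords, vocab))  # {'health': ['hospital']}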

**Training**

- ``num_steps`` (int): Training iterations
- ``lr`` (float): Learning rate (step size for optimizer)
- ``batch_size`` (int): Documents per training step
- ``random_seed`` (int): Reproducibility seed (supported by PF/SPF/CPF/CSPF/ETM)

Common Methods (all models)
============================

**``train_step(num_steps, lr, random_seed=None, ...)``**
   Train the model via Stochastic Variational Inference (SVI).

   Returns: ``dict`` of estimated parameters.

   Note: ``TBIP`` and ``STBS`` currently expose ``train_step(num_steps, lr)``
   without a ``random_seed`` argument.

**``return_topics()``**
   Returns ``(categories, E_theta)`` — dominant topic per document and
   document-topic proportions.
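
   As a sketch of how the two return values relate, assume the dominant topic
   is the column with the largest proportion in each row (the library may
   label topics or break ties differently). The ``e_theta`` array below is
   made up for illustration:

   .. code-block:: python

      import numpy as np

      # Made-up document-topic proportions: 3 documents x 2 topics.
      e_theta = np.array([
          [0.9, 0.1],
          [0.2, 0.8],
          [0.6, 0.4],
      ])

      # Dominant topic = index of the largest proportion in each row.
      dominant = e_theta.argmax(axis=1)
      print(dominant.tolist())  # [0, 1, 0]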

**``return_beta()``**
   Returns a ``pd.DataFrame`` of word–topic associations (words × topics).

**``return_top_words_per_topic(n=10)``**
   Returns a ``dict`` mapping topic identifiers to their top-n words.

**``summary(n_top_words=5)``**
   Prints a formatted summary of the fitted model, including loss,
   top words, and model-specific details.

**``compute_topic_coherence(texts=None, metric='c_npmi', top_n=10)``**
   Computes per-topic coherence scores (NPMI or UMass).

   Returns: ``pd.DataFrame`` with topic and coherence columns.
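
   To make the score concrete, here is a minimal sketch of textbook NPMI
   averaged over pairs of top words, using whole-document co-occurrence.
   The function ``npmi_coherence`` and the toy corpus are hypothetical, and
   the library's ``c_npmi`` metric may use a different co-occurrence window
   or smoothing:

   .. code-block:: python

      import math
      from itertools import combinations


      def npmi_coherence(top_words, docs):
          """Mean NPMI over all pairs of top_words, using document co-occurrence."""
          doc_sets = [set(doc) for doc in docs]
          n_docs = len(doc_sets)

          def prob(*words):
              # Fraction of documents containing every word in `words`.
              return sum(all(w in d for w in words) for d in doc_sets) / n_docs

          scores = []
          for w1, w2 in combinations(top_words, 2):
              p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
              if p12 == 0.0:
                  scores.append(-1.0)   # words never co-occur: minimum NPMI
              elif p12 == 1.0:
                  scores.append(0.0)    # degenerate 0/0 case, treated as neutral here
              else:
                  scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
          return sum(scores) / len(scores)


      docs = [
          ["tax", "budget"],
          ["tax", "budget"],
          ["health", "care"],
          ["health", "care"],
      ]
      print(npmi_coherence(["tax", "budget"], docs))   # close to 1.0
      print(npmi_coherence(["tax", "health"], docs))   # -1.0

   Scores near 1 indicate words that almost always appear together, scores
   near -1 words that never do.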

**``compute_topic_diversity(top_n=25)``**
   Fraction of unique words across all topics' top-n lists. Range 0–1.
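
   The definition above can be sketched in a few lines. The helper
   ``topic_diversity`` and the top-word lists are hypothetical and only
   mirror the stated definition, not the library's code:

   .. code-block:: python

      def topic_diversity(top_words_per_topic):
          """Fraction of unique words among all topics' top-word lists."""
          all_words = [w for words in top_words_per_topic.values() for w in words]
          return len(set(all_words)) / len(all_words)


      # Made-up top words: "budget" is shared by both topics.
      top_words = {
          0: ["tax", "budget", "deficit"],
          1: ["health", "care", "budget"],
      }
      print(topic_diversity(top_words))  # 5 unique / 6 total = 0.8333...

   A value of 1 means no word is shared between topics; lower values point
   to redundant topics.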

**``plot_model_loss(window=10, save_path=None)``**
   Line chart of training loss (raw + smoothed). Returns ``(fig, ax)``.
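
   The smoothing method is not specified here. Assuming a simple moving
   average over ``window`` steps, the smoothed curve could be computed as in
   this sketch (``moving_average`` and the loss values are illustrative):

   .. code-block:: python

      import numpy as np


      def moving_average(values, window=10):
          """Simple moving average; output has len(values) - window + 1 points."""
          kernel = np.ones(window) / window
          return np.convolve(values, kernel, mode="valid")


      # Made-up, steadily decreasing loss values for illustration.
      loss = np.array([10.0, 8.0, 6.0, 4.0, 2.0])
      print(moving_average(loss, window=2).tolist())  # [9.0, 7.0, 5.0, 3.0]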

**``plot_topic_prevalence(save_path=None)``**
   Horizontal bar chart of mean topic prevalence. Returns ``(fig, ax)``.

**``plot_topic_correlation(save_path=None)``**
   Cosine-similarity heatmap between topics. Returns ``(fig, ax)``.
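
   As a sketch of the quantity being plotted, and assuming similarity is
   taken between the topic columns of a word-topic matrix, cosine similarity
   can be computed as follows (the ``beta`` values are made up):

   .. code-block:: python

      import numpy as np

      # Made-up word-topic weights: 3 words x 2 topics.
      beta = np.array([
          [1.0, 0.0],
          [1.0, 1.0],
          [0.0, 1.0],
      ])

      # Normalize each topic column to unit length, then take dot products.
      unit = beta / np.linalg.norm(beta, axis=0, keepdims=True)
      similarity = unit.T @ unit

      print(similarity.round(2).tolist())  # [[1.0, 0.5], [0.5, 1.0]]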

**``plot_document_topic_heatmap(n_docs=50, sort_by_topic=False, save_path=None)``**
   Document × topic heatmap. Returns ``(fig, ax)``.

**``plot_topic_wordclouds(n_words=50, figsize=(16,12), save_path=None)``**
   One wordcloud per topic. Returns ``(fig, axes)``.

SPF-specific Methods
====================

**``plot_seed_effectiveness(save_path=None)``**
   Grouped bar chart comparing mean seed vs. non-seed word weights per topic.

   Returns: ``(fig, axes)``.

CPF-specific Methods
====================

**``return_covariate_effects()``**
   Point estimates of the covariate effect matrix λ (covariates × topics).

   Returns: ``pd.DataFrame``.

**``return_covariate_effects_ci(ci=0.95)``**
   Covariate effects with Bayesian credible intervals.

   Returns: ``pd.DataFrame`` with columns ``covariate, topic, mean, lower, upper``.
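
   How the intervals are derived is not specified here. As a minimal sketch,
   assuming posterior draws of a single effect are available and an
   equal-tailed interval is wanted, the bounds are the ``(1 - ci) / 2`` and
   ``1 - (1 - ci) / 2`` quantiles. The simulated ``samples`` are a stand-in,
   not library output:

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(0)
      # Stand-in posterior draws for a single covariate-topic effect.
      samples = rng.normal(loc=0.5, scale=0.1, size=10_000)

      ci = 0.95
      alpha = 1.0 - ci
      # Equal-tailed interval: cut alpha/2 probability mass from each tail.
      lower, upper = np.quantile(samples, [alpha / 2, 1.0 - alpha / 2])

      print(f"mean={samples.mean():.2f}, lower={lower:.2f}, upper={upper:.2f}")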

**``plot_cov_effects(ci=0.95, topics=None, save_path=None)``**
   Forest plot of covariate effects with credible intervals.

   Returns: ``(fig, axes)``.

CSPF-specific Methods
=====================

Inherits all SPF methods (seeded topics) plus all CPF methods (covariate effects):

- ``return_covariate_effects()``
- ``return_covariate_effects_ci(ci=0.95)``
- ``plot_cov_effects(ci=0.95, ...)``

TBIP-specific Methods
=====================

**``return_ideal_points()``**
   Returns a ``pd.DataFrame`` with columns ``author, ideal_point, std``,
   sorted by ideal point.

**``return_ideological_words(topic, n=10)``**
   Top-n words with the strongest ideological loading (η) for a given topic.

   Returns: ``pd.DataFrame`` with columns ``word, eta, direction``.

**``plot_ideal_points(selected_authors=None, show_ci=False, ci=0.95, save_path=None)``**
   1-D scatter of author ideal points with optional credible intervals.

   Returns: ``(fig, ax)``.

STBS-specific Methods
=====================

**``return_ideal_points()``**
   Returns a ``pd.DataFrame`` with columns ``author, topic, ideal_point, std``
   for topic-specific author positions.

**``return_ideal_covariates()``**
   Returns a ``pd.DataFrame`` with columns ``covariate, topic, iota, std``
   for covariate effects on ideological positions.

**``plot_author_topic_heatmap(...)``**
   Heatmap of mean normalized author-topic intensities.

   Returns: ``(fig, ax)``.

**``plot_ideol_points(...)``**
   Dot plot of author ideology by topic, with optional grouping overlays.

   Returns: ``(fig, ax)``.

**``plot_iota_credible_intervals(ci=0.95, ...)``**
   Credible-interval plot for covariate-topic ideology coefficients.

   Returns: ``(fig, ax)``.

ETM-specific Methods
====================

ETM overrides ``return_topics()`` and ``return_beta()`` to use its neural encoder
and embedding-based topic–word computation. It adds no public methods beyond
the common set.

Metrics Dataclass
=================

``Metrics`` tracks training diagnostics per model instance:

- ``loss`` (list): ELBO loss per training step
- ``coherence_scores`` (pd.DataFrame | None): Per-topic coherence if computed
- ``diversity`` (float | None): Topic diversity if computed
- ``reset()``: Clear all stored metrics
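
The fields above could be modeled as in the following sketch. ``MetricsSketch`` is an illustrative stand-in written from the field list, not the library's source, and it types ``coherence_scores`` loosely to stay free of a pandas dependency.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Any, List, Optional


   @dataclass
   class MetricsSketch:
       """Illustrative stand-in for ``Metrics``; not the library's source."""

       loss: List[float] = field(default_factory=list)
       coherence_scores: Optional[Any] = None   # a pd.DataFrame once computed
       diversity: Optional[float] = None

       def reset(self) -> None:
           """Clear all stored metrics."""
           self.loss = []
           self.coherence_scores = None
           self.diversity = None


   metrics = MetricsSketch()
   metrics.loss.extend([120.5, 98.2])
   metrics.diversity = 0.8
   metrics.reset()
   print(metrics)  # MetricsSketch(loss=[], coherence_scores=None, diversity=None)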

Error Handling
==============

Model constructors validate their inputs and raise ``ValueError`` with a descriptive message when validation fails:

.. code-block:: python

   try:
       model = PF(counts, vocab, num_topics=10, batch_size=32)
   except ValueError as e:
       print(f"Invalid input: {e}")

Type Hints
==========

All functions include type hints for IDE support and static analysis.

Performance Notes
=================

- Use sparse matrices (CSR format) for large vocabularies
- GPU acceleration requires a GPU-enabled JAX installation; set ``JAX_PLATFORMS=gpu`` to select the GPU backend explicitly
- ``batch_size`` trades memory for speed: larger batches use more memory per step but process more documents per step
- See :doc:`../tutorials/tutorial_gpu` for optimization

API Stability
=============

- Public API (what you import) is stable
- Internal implementation may change
- Breaking changes are documented in the release notes

Next Steps
==========

- Auto-generated docs: :doc:`../models`
- Learn models: :doc:`../fundamentals/index`
- Train models: :doc:`../tutorials/index`
- Solve tasks: :doc:`../how_to_guides/index`