poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference

poisson-topicmodels is a modern Python package for probabilistic topic modeling using Bayesian inference, built on JAX and NumPyro.

It enables researchers and practitioners to extract interpretable semantic structure from text data through advanced topic modeling techniques with transparent GPU acceleration and reproducible results.

Quick Links

Getting Started – 5-minute introduction and basic examples
Installation – Install poisson-topicmodels from PyPI or source
Fundamentals – Learn core concepts and model variants
Tutorials – Step-by-step guides for different use cases
API Reference – Complete API documentation with examples
How-To Guides – Practical recipes for common tasks
Examples & Applications – Real-world examples and applications
Testing Guide – How to test your code
Contributing Guide – Contribute to the project
Release Notes & Changelog – Version history and changelog

Key Features

✨ Modern Probabilistic Inference: Built on NumPyro for automatic differentiation, probabilistic programming, and integration with cutting-edge Bayesian methods.
✨ Advanced Topic Models: Goes beyond LDA with guided topic discovery, covariate effects, ideal point estimation, and word embeddings, all with principled Bayesian inference.
✨ GPU Acceleration: Leverages JAX for transparent GPU computation, essential for large-scale corpus analysis.
✨ Reproducible & Scalable: Mini-batch SVI training with built-in seed control for exact reproducibility.
✨ Research-Friendly API: Purpose-built for computational social science and NLP researchers.

The Package at a Glance

The poisson-topicmodels library provides seven topic models:

Model	Use Case	Key Feature
Poisson Factorization (PF)	Unsupervised baseline	Fast, interpretable word-topic associations
Seeded PF (SPF)	Guided discovery	Incorporate domain knowledge via keyword priors
Covariate PF (CPF)	Covariate effects	Model topics influenced by document metadata
Covariate Seeded PF (CSPF)	Guided + covariates	Combine keyword guidance with external factors
Text-Based Ideal Points (TBIP)	Ideal point estimation	Estimate author positions from legislative/social text
Structured Text-Based Scaling (STBS)	Topic-specific ideal points + covariates	Topic-specific ideal points with author-level covariates
Embedded Topic Models (ETM)	Modern embeddings	Integrate pre-trained word embeddings
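
For readers new to Poisson factorization, the idea behind the PF row can be sketched as a generative simulation: each document gets nonnegative topic intensities, each topic gets nonnegative word intensities, and every count is Poisson with a rate given by their product. The snippet below is a NumPy-only illustration, not the package's implementation; the function name `simulate_pf_counts` and the Gamma hyperparameters are assumptions chosen for the example.

```python
import numpy as np


def simulate_pf_counts(num_docs, vocab_size, num_topics, seed=0):
    """Draw a document-term matrix from a basic Poisson factorization model.

    Hypothetical helper for illustration; the Gamma(0.3, 1.0) priors are
    assumed values, not the package's defaults.
    """
    rng = np.random.default_rng(seed)
    # Document-topic intensities: one nonnegative row per document.
    theta = rng.gamma(shape=0.3, scale=1.0, size=(num_docs, num_topics))
    # Topic-word intensities: one nonnegative row per topic.
    beta = rng.gamma(shape=0.3, scale=1.0, size=(num_topics, vocab_size))
    # The rate for word v in document d is the inner product of theta[d] and beta[:, v].
    rates = theta @ beta
    counts = rng.poisson(rates)
    return counts, theta, beta


counts, theta, beta = simulate_pf_counts(num_docs=100, vocab_size=500, num_topics=10)
print(counts.shape)  # (100, 500)
```

Fitting the model runs this process in reverse: given only `counts`, inference recovers plausible values for `theta` and `beta`.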

Core Capabilities:

✓ Stochastic Variational Inference (SVI) with mini-batch training
✓ Transparent GPU acceleration via JAX
✓ Reproducible results with seed control
✓ Type hints and comprehensive API documentation
✓ >70% test coverage with continuous integration
✓ Clear error messages and input validation
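
To make the interplay of mini-batching and seed control concrete, here is a small sketch of how a seeded random generator yields the same sequence of document batches on every run. It uses NumPy as a stand-in for JAX's key-based random numbers, and `minibatch_indices` is a hypothetical helper, not part of the package. The `scale` value shows the usual SVI correction: a batch log-likelihood is multiplied by the ratio of corpus size to batch size so that it estimates the full-corpus term.

```python
import numpy as np


def minibatch_indices(num_docs, batch_size, num_steps, seed):
    """Return the document indices drawn at each training step (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Sample each batch without replacement so no document repeats within a step.
    return [
        rng.choice(num_docs, size=batch_size, replace=False)
        for _ in range(num_steps)
    ]


num_docs, batch_size = 100, 32
run_a = minibatch_indices(num_docs, batch_size, num_steps=5, seed=42)
run_b = minibatch_indices(num_docs, batch_size, num_steps=5, seed=42)

# The same seed gives the same batches, step for step.
same_batches = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
print(same_batches)  # True

# Factor that rescales a batch log-likelihood to the size of the full corpus.
scale = num_docs / batch_size
print(scale)  # 3.125
```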

Quick Start Example

import numpy as np
from scipy.sparse import csr_matrix
from poisson_topicmodels import PF

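# Prepare data: a synthetic document-term matrix (100 docs x 500 terms) and its vocabulary
rng = np.random.default_rng(0)  # seed the synthetic data so the example is repeatable
counts = csr_matrix(rng.poisson(2, (100, 500)).astype(np.float32))
vocab = np.array([f'word_{i}' for i in range(500)])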

# Initialize and train model
model = PF(counts, vocab, num_topics=10, batch_size=32)
params = model.train_step(num_steps=200, lr=0.01, random_seed=42)

# Summarize and inspect
model.summary()
top_words = model.return_top_words_per_topic(n=10)
for topic_id, words in top_words.items():
    print(f"Topic {topic_id}: {', '.join(words)}")

# Evaluate and visualize
print(f"Topic diversity: {model.compute_topic_diversity():.3f}")
model.plot_model_loss()
model.plot_topic_prevalence()

See Getting Started for a detailed walkthrough.
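
As a rough guide to reading the topic diversity score printed above, one common definition is the fraction of distinct words among the top words of all topics: 1.0 means no topic shares a top word with another, while values near 0 mean the topics largely repeat each other. The sketch below implements that definition on a plain dictionary shaped like the `top_words` mapping in the example; whether `compute_topic_diversity` uses exactly this formula is an assumption, and `topic_diversity` is a hypothetical standalone function.

```python
def topic_diversity(top_words):
    """Fraction of distinct words among the top words of all topics.

    `top_words` maps a topic id to its list of top words. This is one
    common definition and may differ from the package's own metric.
    """
    all_words = [word for words in top_words.values() for word in words]
    if not all_words:
        return 0.0
    return len(set(all_words)) / len(all_words)


# Two toy topics that share one word ("vote") out of six top words.
example = {
    0: ["tax", "budget", "vote"],
    1: ["health", "care", "vote"],
}
print(f"{topic_diversity(example):.3f}")  # 0.833
```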

Community & Contributing

We welcome contributions! For guidelines, see the Contributing Guide.

🐛 Found a bug? Open an issue
💡 Have a feature request? Start a discussion
📚 Want to contribute? Check out our contribution guidelines

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use poisson-topicmodels in your research, please cite:

@software{prostmaier2026poisson,
  title={poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference},
  author={Prostmaier, Bernd and Grün, Bettina and Hofmarcher, Paul},
  year={2026},
  url={https://github.com/BPro2410/topicmodels_package}
}