poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference
poisson-topicmodels is a modern Python package for probabilistic topic modeling using Bayesian inference, built on JAX and NumPyro.
It enables researchers and practitioners to extract interpretable semantic structure from text data through advanced topic modeling techniques with transparent GPU acceleration and reproducible results.
Quick Links
Getting Started: 5-minute introduction and basic examples
Installation: Install poisson-topicmodels from PyPI or source
Fundamentals: Learn core concepts and model variants
Tutorials: Step-by-step guides for different use cases
API Reference: Complete API documentation with examples
How-To Guides: Practical recipes for common tasks
Examples & Applications: Real-world examples and applications
Testing Guide: How to test your code
Contributing Guide: Contribute to the project
Release Notes & Changelog: Version history and changelog
Key Features
- ✨ Modern Probabilistic Inference
Built on NumPyro for automatic differentiation, probabilistic programming, and integration with cutting-edge Bayesian methods.
- ✨ Advanced Topic Models
Beyond LDA: guided topic discovery, covariate effects, ideal point estimation, and word embeddings, all with principled Bayesian inference.
- ✨ GPU Acceleration
Leverages JAX for transparent GPU computation, which matters most when analyzing large corpora.
- ✨ Reproducible & Scalable
Mini-batch SVI training with built-in seed control for exact reproducibility.
- ✨ Research-Friendly API
Purpose-built for computational social science and NLP researchers.
The Package at a Glance
The poisson-topicmodels library provides multiple topic modeling approaches:
| Model | Use Case | Key Feature |
|---|---|---|
| Poisson Factorization (PF) | Unsupervised baseline | Fast, interpretable word-topic associations |
| Seeded PF (SPF) | Guided discovery | Incorporate domain knowledge via keyword priors |
| Covariate PF (CPF) | Covariate effects | Model topics influenced by document metadata |
| Covariate Seeded PF (CSPF) | Guided + covariates | Combine keyword guidance with external factors |
| Text-Based Ideal Points (TBIP) | Ideal point estimation | Estimate author positions from legislative/social text |
| Structured Text-Based Scaling (STBS) | Topic-specific ideal points + covariates | Topic-specific ideal points with author-level covariates |
| Embedded Topic Models (ETM) | Modern embeddings | Integrate pre-trained word embeddings |
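To make the PF baseline in the table more concrete, here is a minimal NumPy sketch of the textbook Poisson factorization generative model: Gamma-distributed document-topic and topic-word intensities, with Poisson counts whose rate is their product. It does not call the package. The function name `simulate_pf` and the `shape` and `rate` values are illustrative assumptions, and the package's actual priors and parameterization may differ.

```python
import numpy as np


def simulate_pf(num_docs, vocab_size, num_topics, seed=0, shape=0.3, rate=0.3):
    """Draw a synthetic corpus from a basic Poisson factorization model.

    Hypothetical helper for illustration only; not part of poisson-topicmodels.
    """
    rng = np.random.default_rng(seed)
    # Document-topic intensities, shape (num_docs, num_topics).
    theta = rng.gamma(shape, 1.0 / rate, size=(num_docs, num_topics))
    # Topic-word intensities, shape (num_topics, vocab_size).
    beta = rng.gamma(shape, 1.0 / rate, size=(num_topics, vocab_size))
    # The count for word v in document d is Poisson with rate
    # sum_k theta[d, k] * beta[k, v].
    rates = theta @ beta
    counts = rng.poisson(rates)
    return counts, theta, beta


counts, theta, beta = simulate_pf(num_docs=100, vocab_size=500, num_topics=10)
print(counts.shape, counts.mean())
```

Fitting a PF model amounts to inverting this process: given only `counts`, infer plausible values of `theta` and `beta`.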
Core Capabilities:
✓ Stochastic Variational Inference (SVI) with mini-batch training
✓ Transparent GPU acceleration via JAX
✓ Reproducible results with seed control
✓ Type hints and comprehensive API documentation
✓ >70% test coverage with continuous integration
✓ Clear error messages and input validation
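The link between mini-batch training and seed control can be illustrated with a short sketch. It assumes, for illustration only, that each SVI step samples a batch of document indices from a seeded random generator; the helper `minibatch_indices` is hypothetical and does not reflect how the package samples batches internally.

```python
import numpy as np


def minibatch_indices(num_docs, batch_size, num_steps, seed):
    """Yield one array of distinct document indices per training step.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_steps):
        # Sample a batch of documents without replacement for this step.
        yield rng.choice(num_docs, size=batch_size, replace=False)


run_a = list(minibatch_indices(num_docs=100, batch_size=32, num_steps=5, seed=42))
run_b = list(minibatch_indices(num_docs=100, batch_size=32, num_steps=5, seed=42))

# With the same seed, both runs visit exactly the same batches in the same order.
same_batches = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
print(same_batches)
```

Because every stochastic choice flows from one seed, two runs with identical data and settings follow the same optimization path, which is what makes the results repeatable.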
Quick Start Example
import numpy as np
from scipy.sparse import csr_matrix
from poisson_topicmodels import PF
# Prepare data: document-term matrix and vocabulary
counts = csr_matrix(np.random.poisson(2, (100, 500)).astype(np.float32))
vocab = np.array([f'word_{i}' for i in range(500)])
# Initialize and train model
model = PF(counts, vocab, num_topics=10, batch_size=32)
params = model.train_step(num_steps=200, lr=0.01, random_seed=42)
# Summarize and inspect
model.summary()
top_words = model.return_top_words_per_topic(n=10)
for topic_id, words in top_words.items():
    print(f"Topic {topic_id}: {', '.join(words)}")
# Evaluate and visualize
print(f"Topic diversity: {model.compute_topic_diversity():.3f}")
model.plot_model_loss()
model.plot_topic_prevalence()
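The example above uses random counts as a stand-in for a real corpus. As a rough sketch of how a document-term matrix and vocabulary might be built from raw text, the following uses naive lowercase whitespace tokenization. The helper `build_dtm` is hypothetical and not part of the package; a real pipeline would usually add stop-word removal and frequency filtering.

```python
from collections import Counter

import numpy as np


def build_dtm(docs):
    """Build a dense document-term count matrix and a vocabulary array.

    Hypothetical helper for illustration only.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted vocabulary gives every word a stable column index.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {word: j for j, word in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)), dtype=np.float32)
    for i, toks in enumerate(tokenized):
        for word, n in Counter(toks).items():
            counts[i, index[word]] = n
    return counts, np.array(vocab)


docs = ["taxes and budget", "budget vote on taxes taxes", "health care vote"]
counts, vocab = build_dtm(docs)
print(counts.shape, vocab)
```

To match the input format used above, the dense array can be wrapped with `scipy.sparse.csr_matrix(counts)` before being passed to `PF` together with `vocab`.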
See Getting Started for a detailed walkthrough.
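The quick start reports a topic diversity score without defining it. A common definition in the topic modeling literature is the fraction of unique words among the top words of all topics, with values near 1 indicating distinct topics and values near 0 indicating redundant ones. The sketch below implements that definition on a topic-word weight matrix. Whether `compute_topic_diversity()` uses exactly this formula or this `top_n` default is an assumption, and the function `topic_diversity` here is hypothetical.

```python
import numpy as np


def topic_diversity(beta, top_n=25):
    """Fraction of unique words among the top_n words of every topic.

    beta: array of shape (num_topics, vocab_size) with topic-word weights.
    Hypothetical helper; the package's own metric may be defined differently.
    """
    # Column indices of the top_n highest-weight words in each topic.
    top = np.argsort(-beta, axis=1)[:, :top_n]
    return np.unique(top).size / top.size


# Three topics that each favor a different word: fully diverse.
distinct = np.eye(3, 6)
# Three identical topics: the same top words repeat across topics.
redundant = np.tile(np.arange(6.0), (3, 1))

print(topic_diversity(distinct, top_n=1))
print(topic_diversity(redundant, top_n=2))
```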
Community & Contributing
We welcome contributions! For guidelines, see the Contributing Guide.
🐛 Found a bug? Open an issue
💡 Have a feature request? Start a discussion
📚 Want to contribute? Check out our contribution guidelines
License
This project is licensed under the MIT License. See the LICENSE file for details.
Citation
If you use poisson-topicmodels in your research, please cite:
@software{prostmaier2026poisson,
title={poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference},
author={Prostmaier, Bernd and Grün, Bettina and Hofmarcher, Paul},
year={2026},
url={https://github.com/BPro2410/topicmodels_package}
}