Testing Guide

How to run the poisson-topicmodels test suite and write tests for your own code.

Running Tests

Run all tests:

pytest tests/

Run specific test file:

pytest tests/test_pf.py

Run with coverage (requires the pytest-cov plugin):

pytest tests/ --cov=poisson_topicmodels

Test Categories

Unit Tests: Function/class behavior

Located in: tests/test_*.py
Fast, focused
Test individual components

Integration Tests: Multiple components together

Located in: tests/test_integration.py
Slower, broader scope
Test workflows

Comprehensive Tests: All models and features

Located in: tests/test_models_comprehensive.py
Slowest
Test all variations

Test Coverage

Coverage target: >90%

View coverage report:

pytest tests/ --cov=poisson_topicmodels --cov-report=html
open htmlcov/index.html  # View in browser

Writing Tests

Test structure:

import pytest
import numpy as np
from scipy.sparse import csr_matrix
from poisson_topicmodels import PF

class TestPF:
    """Tests for Poisson Factorization model."""

    @pytest.fixture
    def sample_data(self):
        """Create sample data for testing."""
        counts = csr_matrix(
            np.random.poisson(2, (50, 100)).astype(np.float32)
        )
        vocab = np.array([f'word_{i}' for i in range(100)])
        return counts, vocab

    def test_initialization(self, sample_data):
        """Test model initializes correctly."""
        counts, vocab = sample_data
        model = PF(counts, vocab, num_topics=10)
        assert model.num_topics == 10
        assert model.vocab.shape[0] == 100

    def test_training(self, sample_data):
        """Test model trains successfully."""
        counts, vocab = sample_data
        model = PF(counts, vocab, num_topics=5, batch_size=16)
        params = model.train_step(num_steps=10, lr=0.01)
        assert params is not None

    def test_return_topics(self, sample_data):
        """Test topic extraction."""
        counts, vocab = sample_data
        model = PF(counts, vocab, num_topics=5, batch_size=16)
        model.train_step(num_steps=10, lr=0.01)
        categories, e_theta = model.return_topics()
        assert e_theta.shape == (50, 5)

Continuous Integration

Tests run automatically on:

Every push to main/develop
Every pull request
Scheduled nightly runs

Via GitHub Actions:

Python 3.11, 3.12, 3.13
CPU and GPU (if available)
Multiple OS: Linux, macOS, Windows

View results on the repository's GitHub Actions tab.

Testing Best Practices

For users:

Run tests after installation: pytest tests/test_imports.py
Before reporting bugs: run full test suite
Document any test failures

For developers:

Write tests for new features
Aim for >90% code coverage
Test edge cases and error conditions
Document test purpose

Test Organization

File	Purpose
test_imports.py	Check packages import correctly
test_input_validation.py	Validate input handling
test_pf.py	Poisson Factorization tests
test_spf.py	Seeded models tests
test_integration.py	End-to-end workflows
test_models_comprehensive.py	All models, all variants
test_training_and_outputs.py	Training process & outputs

Common Test Patterns

Checking shape:

categories, e_theta = model.return_topics()
assert e_theta.shape == (num_docs, num_topics)

Checking values:

beta = model.return_beta()
assert (beta.values >= 0).all()  # Non-negative

Checking errors:

with pytest.raises(ValueError):
    model = PF(invalid_counts, vocab, num_topics=5)
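
A slightly fuller sketch of the same pattern, which also checks the error message. Here check_num_topics is a hypothetical validator used only for illustration; it is not part of poisson-topicmodels, and the library's actual error messages may differ.

```python
import pytest


def check_num_topics(num_topics):
    """Hypothetical validator, standing in for a model's input checks."""
    if not isinstance(num_topics, int) or num_topics <= 0:
        raise ValueError(
            f"num_topics must be a positive integer, got {num_topics!r}"
        )
    return num_topics


def test_rejects_bad_num_topics():
    # match= is a regular expression searched against the exception message.
    with pytest.raises(ValueError, match="positive integer"):
        check_num_topics(0)
```

Matching on the message guards against a test passing because some unrelated ValueError was raised earlier in the call.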

Parametrized tests:

@pytest.mark.parametrize("num_topics", [5, 10, 20])
def test_different_topics(self, sample_data, num_topics):
    """Test with different numbers of topics."""
    counts, vocab = sample_data
    model = PF(counts, vocab, num_topics=num_topics, batch_size=16)
    model.train_step(num_steps=5, lr=0.01)
    _, e_theta = model.return_topics()
    assert e_theta.shape[1] == num_topics

Debugging Tests

Run with verbose output:

pytest tests/ -v  # Verbose
pytest tests/ -vv # Very verbose
pytest tests/ -s  # Show print statements

Stop at first failure:

pytest tests/ -x

Debug with pdb:

pytest tests/ --pdb  # Drop to debugger on failure

Performance Testing

Time specific tests:

pytest tests/ --durations=10  # 10 slowest tests

Mark slow tests:

@pytest.mark.slow
def test_large_model(self):
    """This test is slow."""
    ...

Run only fast tests:

pytest tests/ -m "not slow"
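
Custom markers such as slow produce a PytestUnknownMarkWarning unless they are registered. A minimal sketch of registering them from conftest.py, assuming the project does not already declare them in pytest.ini; the marker descriptions are illustrative, not the project's own wording.

```python
# conftest.py (sketch): register custom markers so pytest recognizes them.
# The descriptions below are illustrative assumptions.

CUSTOM_MARKERS = [
    "slow: test takes a long time to run",
    "gpu: test requires a GPU",
]


def pytest_configure(config):
    """Pytest hook that runs at startup; adds each marker to the 'markers' ini list."""
    for marker in CUSTOM_MARKERS:
        config.addinivalue_line("markers", marker)
```

Once registered, the markers show up in the output of pytest --markers, and running with --strict-markers turns any misspelled marker into an error.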

Testing GPU Code

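Tests can check for a GPU at runtime and skip when none is present:

import jax
import pytest

def test_with_gpu_if_available():
    # jax.devices() always lists at least the CPU, so check the active backend instead.
    if jax.default_backend() != "gpu":
        pytest.skip("GPU not available")

    # Test GPU-specific code
    ...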

Or mark GPU tests:

@pytest.mark.gpu
def test_gpu_training(self):
    """Only run on GPU."""
    ...

Run GPU tests:

pytest tests/ -m gpu
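
Because jax.devices() with no arguments also lists CPU devices, a GPU check has to ask for the GPU backend explicitly. A minimal sketch of a reusable helper; gpu_available is a hypothetical name, not something the project provides.

```python
def gpu_available() -> bool:
    """Return True only if JAX is installed and reports at least one GPU device."""
    try:
        import jax
    except ImportError:
        return False
    try:
        # jax.devices("gpu") raises RuntimeError when no GPU backend is present.
        return len(jax.devices("gpu")) > 0
    except RuntimeError:
        return False
```

A helper like this can feed a skip condition, for example pytest.mark.skipif(not gpu_available(), reason="GPU not available"), so that gpu-marked tests are skipped rather than failed on CPU-only machines.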

Troubleshooting Tests

Tests fail on import:

Install development dependencies:

pip install -e ".[dev]"
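
To confirm which package is missing before reinstalling, a quick check from Python can help. This is a sketch using only the standard library; is_importable is a hypothetical helper name.

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if the named top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


# Package names as used elsewhere in this guide; pytest is needed to run the suite.
for name in ("poisson_topicmodels", "pytest"):
    status = "ok" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```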

Random test failures:

Set seed for reproducibility:

np.random.seed(42)
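
np.random.seed sets global state, which other code running in the same session can disturb. A sketch of an alternative using NumPy's default_rng, which keeps the seed local to the data it generates; make_counts is a hypothetical helper, and the shape mirrors the sample fixture shown earlier in this guide.

```python
import numpy as np


def make_counts(seed: int = 42, shape=(50, 100)) -> np.ndarray:
    """Draw a Poisson count matrix from a locally seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.poisson(2, size=shape).astype(np.float32)


first = make_counts()
second = make_counts()
# Same seed, same draws: the data does not depend on global random state.
assert np.array_equal(first, second)
```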

Memory issues during testing:

Run with smaller datasets, or deselect tests marked large:

pytest tests/ -m "not large"

Tests hang:

Add a timeout (requires the pytest-timeout plugin):

pytest tests/ --timeout=60  # 60 second timeout per test

Contributing Tests

When submitting code:

Write tests for new functionality
Ensure all tests pass
Maintain or increase coverage (aim for >90%)
Document test purpose

See Contributing Guide for full guidelines.

Test Configuration

Tests are configured in:

pytest.ini – Pytest settings
conftest.py – Shared fixtures
.github/workflows/ – CI configuration

Examples from conftest:

import numpy as np
import pytest
from scipy.sparse import csr_matrix

@pytest.fixture
def random_seed():
    """Ensure reproducibility."""
    np.random.seed(42)
    return 42

@pytest.fixture
def sample_dtm():
    """Reusable sample data."""
    counts = csr_matrix(
        np.random.poisson(2, (100, 500)).astype(np.float32)
    )
    vocab = np.array([f'word_{i}' for i in range(500)])
    return counts, vocab

Testing Progress

Latest test results:

Total tests: 150+
Coverage target: >90%
CI status: Passing on main

Contribution Checklist

When adding code:

✓ Write unit tests
✓ Write integration tests
✓ All tests pass locally
✓ Coverage doesn’t decrease
✓ Document tested behavior
✓ Test on Python 3.11, 3.12, 3.13

Next Steps

Run tests: pytest tests/
Write tests: Follow patterns above
CI/CD: Automated via GitHub Actions
Contribute tests: See Contributing Guide