Ideal Points Models (TBIP & STBS)
This package provides two related ideal-point models:
Text-Based Ideal Points (TBIP): estimates one latent position per author
Structured Text-Based Scaling (STBS): estimates topic-specific positions per author and links them to author-level covariates
Both are commonly used in political science, social-media analysis, and author-level stance estimation.
What Are Ideal Points?
Ideal points are latent coordinates representing positions on abstract dimensions:
Example: Political polarization
Left-leaning politicians tend to use words such as: “equality”, “justice”, “workers”, “government”
Right-leaning politicians tend to use words such as: “freedom”, “liberty”, “business”, “market”
Model estimates position on left-right spectrum from text
Example: Product stance
Critics use words: “broken”, “poor quality”, “disappointed”
Supporters use words: “amazing”, “excellent”, “recommend”
Model estimates critic vs. supporter position
Model Intuition
TBIP works by:
Discovering topics in text corpus
Analyzing word usage patterns within topics
Inferring author positions that explain language variation
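The steps above can be made concrete with the Poisson factorization from the original TBIP formulation, where the expected count of word v in document d is the sum over topics k of theta[d, k] * beta[k, v] * exp(x[author(d)] * eta[k, v]). The NumPy sketch below simulates that rate using made-up sizes and random parameters. It illustrates the structure only; all variable names are illustrative, and the package's internal parameterization may differ.

```python
import numpy as np


def tbip_rate(theta, beta, eta, ideal_points, author_of_doc):
    """Expected word counts per (document, word) under the TBIP formulation."""
    x_doc = ideal_points[author_of_doc]  # ideal point of each document's author, shape (D,)
    # Ideology-adjusted topics per document: beta * exp(x * eta), shape (D, K, V)
    adjusted = beta[None, :, :] * np.exp(x_doc[:, None, None] * eta[None, :, :])
    # Weight the adjusted topics by document intensities and sum over topics
    return np.einsum("dk,dkv->dv", theta, adjusted)


rng = np.random.default_rng(0)
num_docs, num_topics, vocab_size = 6, 3, 8

theta = rng.gamma(1.0, 1.0, size=(num_docs, num_topics))   # document-topic intensities
beta = rng.gamma(1.0, 1.0, size=(num_topics, vocab_size))  # neutral topic-word intensities
eta = rng.normal(0.0, 0.5, size=(num_topics, vocab_size))  # ideological word adjustments
ideal_points = np.array([-1.0, 1.0])                       # one position per author
author_of_doc = np.array([0, 1, 0, 1, 0, 1])               # author index of each document

rate = tbip_rate(theta, beta, eta, ideal_points, author_of_doc)
simulated_counts = rng.poisson(rate)
```

When every ideal point is zero, the exponential term equals one and the rate reduces to plain Poisson factorization (theta @ beta). The ideal point therefore only captures how word choice within a topic shifts between authors.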
Higher-dimensional spaces are also possible (not just a 1D left-right axis):
2D: (left-right, authoritarian-libertarian)
3D+: Custom dimensions discovered from data
TBIP vs. STBS at a glance:
TBIP: one ideal-point coordinate per author (simpler and faster)
STBS: one coordinate per author-topic pair, plus regression on author covariates
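To illustrate the structural difference, the sketch below builds an author-by-topic matrix of ideal points whose prior mean is a linear function of author covariates. The coefficient name iota follows the output column of return_ideal_covariates() used later in this guide. The intercept column, the Gaussian noise, and all sizes are assumptions made for illustration, not a description of the package internals.

```python
import numpy as np

rng = np.random.default_rng(1)
num_authors, num_topics = 4, 3

# Author-level design matrix: intercept plus a hypothetical party indicator
party_r = np.array([0, 1, 0, 1])
X = np.column_stack([np.ones(num_authors), party_r])  # shape (authors, covariates)

# One coefficient vector per topic (made-up values)
iota = rng.normal(0.0, 1.0, size=(X.shape[1], num_topics))

# Prior mean of each author's topic-specific ideal point
prior_mean = X @ iota  # shape (authors, topics)

# Author-specific deviation around the covariate-implied mean (assumed Gaussian)
ideal_points = prior_mean + 0.1 * rng.normal(size=prior_mean.shape)
```

Authors with identical covariates share the same prior mean in every topic. Their estimated positions can still differ through the author-specific deviation.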
When to Use TBIP vs STBS
Use TBIP when:
✓ You have author-attributed text (speeches, tweets, reviews)
✓ You assume polarization or position variation
✓ You want to estimate latent author positions
✓ You’re interested in discourse analysis
Use STBS when:
✓ You want positions to vary by topic (not a single global axis)
✓ You have author-level metadata/covariates (e.g., party, tenure, demographics)
✓ You want covariate effects on ideology to be estimated jointly with topics
Don’t use either model if:
✗ Text is anonymous or unattributed
✗ No meaningful position variation expected
✗ You only care about topics, not author positions
Basic Usage (TBIP)
from poisson_topicmodels import TBIP
import numpy as np
# Author IDs indicating who wrote each document
author_ids = np.array([0, 1, 0, 2, 1, 0, ...]) # 3 authors
model = TBIP(
counts=counts,
vocab=vocab,
authors=author_ids,
num_topics=10,
batch_size=32,
)
params = model.train_step(num_steps=200, lr=0.01)
# Extract results
ideal_points_df = model.return_ideal_points() # DataFrame: author, ideal_point, std
print(ideal_points_df)
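The snippet above assumes that counts and vocab already exist. As a minimal sketch, they could be built from raw text as follows, using a toy corpus and whitespace tokenization. Whether the model expects a dense array or a SciPy sparse matrix is an assumption to check against the API reference; real corpora are usually stored sparsely.

```python
from collections import Counter

import numpy as np

# Toy corpus: one string per document, plus the author of each document
docs = [
    "workers demand equality and justice",
    "free market and business liberty",
    "justice for workers",
]
author_ids = np.array([0, 1, 0])

# Naive whitespace tokenization (a real pipeline would also remove stop words, etc.)
tokenized = [doc.lower().split() for doc in docs]

# Sorted vocabulary and a word -> column lookup
vocab = np.array(sorted({tok for toks in tokenized for tok in toks}))
word_index = {word: j for j, word in enumerate(vocab)}

# Dense document-term matrix: rows are documents, columns are vocabulary words
counts = np.zeros((len(docs), len(vocab)), dtype=np.int64)
for i, toks in enumerate(tokenized):
    for word, n in Counter(toks).items():
        counts[i, word_index[word]] = n
```

The number of rows in counts must equal the length of author_ids, since each document needs exactly one author.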
Basic Usage (STBS)
from poisson_topicmodels import STBS
import numpy as np
import pandas as pd
# Author per document (length = number of documents)
authors_doc = np.array(["author_a", "author_b", "author_a", ...])
# Author-level covariates with one row per unique author.
# Row order must match np.unique(authors_doc).
unique_authors = np.unique(authors_doc)
X_author = pd.DataFrame(
{
"party_r": [0, 1, ...],
"tenure_years": [4, 12, ...],
},
index=unique_authors,
)
model = STBS(
counts=counts,
vocab=vocab,
num_topics=10,
authors=authors_doc,
X_design_matrix=X_author,
batch_size=32,
)
model.train_step(num_steps=200, lr=0.01)
# Topic-specific author ideal points
stbs_ideal_points = model.return_ideal_points()
# Columns: author, topic, ideal_point, std
# Covariate effects on ideology by topic
stbs_covariate_effects = model.return_ideal_covariates()
# Columns: covariate, topic, iota, std
# STBS-specific visualization helpers
model.plot_author_topic_heatmap()
model.plot_ideol_points()
model.plot_iota_credible_intervals()
Interpreting Ideal Points
1D Case (single position axis):
ideal_points_df = model.return_ideal_points()
print(ideal_points_df)
# author ideal_point std
# 0 author_A -2.30 0.15
# 1 author_B 0.00 0.12
# 2 author_C 1.50 0.18
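A quick way to read the std column is to turn it into an approximate credible interval and check whether two authors' intervals overlap. The sketch below uses the example values printed above and assumes the posterior for each ideal point is approximately normal, which is an assumption about the model's variational family rather than something stated in this guide.

```python
from statistics import NormalDist

# (author, ideal_point, std) taken from the example output above
rows = [
    ("author_A", -2.30, 0.15),
    ("author_B", 0.00, 0.12),
    ("author_C", 1.50, 0.18),
]

# Two-sided 95% interval under a normal approximation
z = NormalDist().inv_cdf(0.975)
intervals = {name: (mean - z * std, mean + z * std) for name, mean, std in rows}


def overlaps(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.2f}, {high:.2f}]")
```

Non-overlapping intervals suggest that two authors are separated on the estimated axis. Overlapping intervals mean their relative order should not be over-interpreted.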
Visualization (built-in):
# Publication-ready 1-D scatter with optional credible intervals
fig, ax = model.plot_ideal_points(show_ci=True, ci=0.95)
# Or manually:
import matplotlib.pyplot as plt
df = model.return_ideal_points()
plt.scatter(df['ideal_point'], range(len(df)))
for i, row in df.iterrows():
    plt.annotate(row['author'], (row['ideal_point'], i))
plt.xlabel('Ideal Point (left ← → right)')
plt.show()
Practical Example: Political Speeches (TBIP)
# Analyze legislative speeches
# Documents: individual speeches
# Authors: legislators
# Goal: estimate left-right position from language
from poisson_topicmodels import TBIP
# Load speech dataset
speeches = load_speeches() # one row per speech; load_speeches() is a placeholder for your own loader
legislator_ids = speeches['legislator'].values # legislator who gave each speech
counts = speech_dtm # document-term matrix (num_speeches x vocab_size)
model = TBIP(
counts=counts,
vocab=vocab,
authors=legislator_ids,
num_topics=20,
batch_size=64,
)
model.train_step(num_steps=200, lr=0.01)
# Get positions
ideal_points_df = model.return_ideal_points()
model.summary()
# Built-in visualization with credible intervals
model.plot_ideal_points(show_ci=True)
# Ideological words for one topic (topic 0 here; pick the topic most relevant to your question)
print(model.return_ideological_words(topic=0, n=15))
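# Compare with known party affiliation (one entry per author, in the same order as ideal_points_df)
import matplotlib.pyplot as plt
ideal_points = ideal_points_df['ideal_point'].values
# legislator_ids_to_parties is a placeholder lookup returning 0 = Democrat, 1 = Republican
parties = legislator_ids_to_parties(ideal_points_df['author'].values)
for party_id, party in enumerate(['Democrat', 'Republican']):
    mask = parties == party_id
    plt.hist(ideal_points[mask], alpha=0.5, label=party)
plt.xlabel('Ideal Point (left ← → right)')
plt.legend()
plt.show()
# Expected: the parties separate along the axis (e.g., Democrats mostly negative and
# Republicans mostly positive, or the reverse, since the sign of the scale is arbitrary)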
Validating Ideal Points
Compare with known positions:
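# If ground truth is available (one value per author, in the same order as the model output)
import numpy as np
true_positions = get_known_positions()
estimated = model.return_ideal_points()['ideal_point'].values
# Correlation should be strong in magnitude; its sign depends on the arbitrary orientation of the scale
correlation = np.corrcoef(true_positions, estimated)[0, 1]
print(f"Correlation: {correlation:.3f}") # |r| > 0.7 is a reasonable rule of thumb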
# Spearman rank correlation (order matters)
from scipy.stats import spearmanr
rank_corr, p_value = spearmanr(true_positions, estimated)
print(f"Rank correlation: {rank_corr:.3f}, p={p_value:.4f}")
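Because the orientation of the estimated axis is arbitrary, a strongly negative correlation with the reference can be as good a result as a strongly positive one. A minimal sketch of aligning the sign before reporting or plotting, using made-up numbers and a hypothetical helper named align_sign:

```python
import numpy as np


def align_sign(estimated, reference):
    """Flip the estimated scale if it is negatively correlated with the reference."""
    estimated = np.asarray(estimated, dtype=float)
    r = np.corrcoef(reference, estimated)[0, 1]
    return -estimated if r < 0 else estimated


# Made-up values: the model recovered the ordering, but with the axis reversed
reference = np.array([-1.0, -0.5, 0.2, 0.9])
estimated = np.array([1.8, 0.7, -0.1, -2.1])

aligned = align_sign(estimated, reference)
r_aligned = np.corrcoef(reference, aligned)[0, 1]
print(f"Correlation after alignment: {r_aligned:.3f}")
```

Flipping the sign changes nothing about the fit. It only makes plots and comparisons across runs easier to read.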
Qualitative inspection:
# Read documents from extreme authors (get_top_docs is a placeholder for your own document lookup)
df = model.return_ideal_points().sort_values('ideal_point')
leftmost_author = df.iloc[0]['author'] # lowest ideal point
rightmost_author = df.iloc[-1]['author'] # highest ideal point
print(f"Leftmost author (ID {leftmost_author}):")
print(f"Top documents: {get_top_docs(leftmost_author, n=3)}")
print(f"\nRightmost author (ID {rightmost_author}):")
print(f"Top documents: {get_top_docs(rightmost_author, n=3)}")
Topic usage patterns:
# Which words distinguish the extremes the most?
for topic_id in range(min(3, model.num_topics)):
    ideo = model.return_ideological_words(topic=topic_id, n=5)
    print(f"\nTopic {topic_id} ideological words:")
    print(ideo)
Relationship to Other Models
TBIP/STBS vs. PF: Adds author position estimation
PF: Discovers topics only
TBIP/STBS: Discover topics AND author positions
TBIP vs. STBS: Different structure
TBIP: Single latent position per author
STBS: Topic-specific latent position per author with author-level covariates
Typical workflow:
Start with PF or SPF to understand topics
Add TBIP for a compact ideological axis per author
Upgrade to STBS when you need topic-specific ideal points and covariate effects
Implementation Details
Identification: Ideal points are identified only up to a sign flip (either orientation of the axis fits the data equally well); only relative positions are meaningful.
Centering: Model centers ideal points at 0 by default (mean = 0).
Scaling: Values are on arbitrary scale; interpret using relative differences.
Multiple dimensions: Discovered dimensions may not have a clear interpretation. This is normal; inspect the word distributions associated with each dimension to understand it.
STBS covariate alignment: STBS expects author-level covariates where row order
matches np.unique(authors) from the document-level authors input.
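The alignment requirement is easy to get wrong when the covariates come from a separate file. The sketch below reorders a covariate table to match np.unique of the document-level authors and fails loudly if an author has no covariates. The author names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Author of each document (document order)
authors_doc = np.array(["smith", "jones", "smith", "adams"])

# Covariates indexed by author, in arbitrary order (e.g., as loaded from a CSV)
covariates = pd.DataFrame(
    {"party_r": [1, 0, 1], "tenure_years": [12, 4, 7]},
    index=["smith", "jones", "adams"],
)

# np.unique returns the sorted unique authors: adams, jones, smith
unique_authors = np.unique(authors_doc)

# Every author in the corpus needs a covariate row
missing = set(unique_authors) - set(covariates.index)
if missing:
    raise ValueError(f"No covariates for authors: {sorted(missing)}")

# Reorder rows to match np.unique(authors_doc)
X_author = covariates.reindex(unique_authors)
```

Using reindex instead of building the table by hand avoids silently pairing an author with another author's covariates.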
Troubleshooting
Problem: Ideal points don’t seem meaningful
Solution:
- Check author IDs are correct
- Ensure sufficient documents per author
- Inspect topics and words
- Try different num_topics or num_dimensions
- Increase training iterations
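To check whether each author has enough documents, count documents per author before training. The threshold below is purely illustrative; a sensible minimum depends on document length and the number of topics.

```python
import numpy as np

# Author index of each document (made-up example)
author_ids = np.array([0, 1, 0, 2, 1, 0, 0, 1])
min_docs = 2  # illustrative threshold, not a recommendation from the package

# Number of documents written by each author
authors, doc_counts = np.unique(author_ids, return_counts=True)

# Authors whose ideal points rest on very little text
sparse_authors = authors[doc_counts < min_docs]
print(f"Authors with fewer than {min_docs} documents: {sparse_authors.tolist()}")
```

Estimates for authors flagged here typically come with a large std and are best interpreted with caution or excluded.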
Problem: Positions don’t match known affiliations
Solution:
- Known affiliations might not align with language patterns
- Try different num_dimensions
- Check if covariate (e.g., party) matches topic structure
- Language use might reveal different dimensions than official positions
Problem: Training is slow
Solution:
- Reduce number of topics
- Increase batch size
- Reduce vocabulary (remove rare words)
- Use GPU: export JAX_PLATFORMS=gpu
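Reducing the vocabulary can be done directly on the document-term matrix. The sketch below drops words that appear in fewer than a minimum number of documents. It uses a small dense toy matrix and an illustrative threshold; the same column mask applies to a sparse matrix after converting the document frequencies to a flat array.

```python
import numpy as np

# Toy document-term matrix (3 documents x 4 words) and its vocabulary
counts = np.array([
    [2, 0, 1, 1],
    [1, 0, 0, 1],
    [3, 1, 0, 0],
])
vocab = np.array(["market", "rare_a", "rare_b", "tax"])
min_doc_freq = 2  # illustrative threshold

# Number of documents in which each word occurs at least once
doc_freq = (counts > 0).sum(axis=0)

# Keep only words that are frequent enough, and keep vocab in sync with counts
keep = doc_freq >= min_doc_freq
counts_pruned = counts[:, keep]
vocab_pruned = vocab[keep]
```

After pruning, it is worth checking that no document has become empty, since documents with zero total count carry no information about their author.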
Next Steps
Embedded Topic Models (ETM) - Exploring ETM with embeddings
Tutorials - Advanced techniques
API Reference - Complete TBIP/STBS API reference