Ideal Points Models (TBIP)
Text-Based Ideal Points (TBIP) is a specialized model for estimating the latent positions (ideal points) of authors from their language use. It is commonly used in political science and social media analysis.
What Are Ideal Points?
Ideal points are latent coordinates representing positions on abstract dimensions:
Example: Political polarization
Left politicians use words: “equality”, “justice”, “workers”, “government”
Right politicians use words: “freedom”, “liberty”, “business”, “market”
Model estimates position on left-right spectrum from text
Example: Product stance
Critics use words: “broken”, “poor quality”, “disappointed”
Supporters use words: “amazing”, “excellent”, “recommend”
Model estimates critic vs. supporter position
Model Intuition
TBIP works by:
Discovering topics in text corpus
Analyzing word usage patterns within topics
Inferring author positions that explain language variation
Higher-dimensional spaces are possible (not just a 1D left-right axis):
2D: (left-right, authoritarian-libertarian)
3D+: Custom dimensions discovered from data
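The steps above can be made concrete with the Poisson rate that the published TBIP formulation (Vafa, Naidu, and Blei, 2020) places on each word count: the count of word v in document d has rate sum_k theta[d, k] * beta[k, v] * exp(x[a_d] * eta[k, v]), where a_d is the author of document d. The sketch below is a NumPy illustration with made-up toy arrays; it does not use the `poisson_topicmodels` API, and the function name `tbip_rate` is hypothetical.

```python
import numpy as np

def tbip_rate(theta, beta, eta, x, author_ids):
    """Poisson rate for every (document, word) pair under a 1D TBIP-style model."""
    # exp(x_a * eta_kv): how strongly the author's position tilts word v in topic k
    tilt = np.exp(x[author_ids][:, None, None] * eta[None, :, :])  # (docs, topics, words)
    return np.einsum("dk,dkv->dv", theta, beta[None, :, :] * tilt)

rng = np.random.default_rng(0)
num_docs, num_topics, num_words = 4, 2, 5

theta = rng.gamma(1.0, 1.0, size=(num_docs, num_topics))   # document-topic intensities
beta = rng.gamma(1.0, 1.0, size=(num_topics, num_words))   # neutral topic-word intensities
eta = rng.normal(0.0, 1.0, size=(num_topics, num_words))   # ideological offset of each word per topic
x = np.array([-1.0, 1.0])                                  # toy ideal points, one per author
author_ids = np.array([0, 1, 0, 1])                        # author of each document

rate = tbip_rate(theta, beta, eta, x, author_ids)
counts = rng.poisson(rate)                                 # simulated document-term matrix
print(rate.shape, counts.shape)
```

When every ideal point is 0 the tilt term equals 1 and the rate reduces to `theta @ beta`, which is plain Poisson factorization; the author positions only matter through the words whose `eta` is far from zero.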
When to Use TBIP
Use TBIP when:
✓ You have author-attributed text (speeches, tweets, reviews)
✓ You assume polarization or position variation
✓ You want to estimate latent author positions
✓ You’re interested in discourse analysis
Don’t use if:
✗ Text is anonymous or unattributed
✗ No meaningful position variation expected
✗ You only care about topics, not author positions
Basic Usage
from poisson_topicmodels import TBIP
import numpy as np
# Author IDs indicating who wrote each document
author_ids = np.array([0, 1, 0, 2, 1, 0, ...]) # 3 authors
model = TBIP(
counts=counts,
vocab=vocab,
authors=author_ids,
num_topics=10,
batch_size=32,
)
params = model.train_step(num_steps=200, lr=0.01)
# Extract results
ideal_points_df = model.return_ideal_points() # DataFrame: author, ideal_point, std
print(ideal_points_df)
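The snippet above takes `counts`, `vocab`, and `author_ids` as given. As a minimal sketch of how such inputs might be prepared, the example below builds a dense document-term matrix and integer author codes from toy documents using only NumPy. The documents and author names are made up, the tokenizer is a plain whitespace split, and whether TBIP expects a dense or sparse matrix is an assumption to check against the API reference.

```python
import numpy as np

# Toy corpus: one string per document, one author name per document
docs = [
    "workers deserve justice and equality",
    "free markets reward business",
    "equality for workers",
    "liberty and free business",
]
author_names = ["alice", "bob", "alice", "carol"]

# Vocabulary: sorted unique tokens (whitespace tokenization, for illustration only)
tokenized = [doc.split() for doc in docs]
vocab = np.array(sorted({tok for toks in tokenized for tok in toks}))
word_index = {word: j for j, word in enumerate(vocab)}

# Document-term count matrix, shape (num_docs, vocab_size)
counts = np.zeros((len(docs), len(vocab)), dtype=np.int64)
for i, toks in enumerate(tokenized):
    for tok in toks:
        counts[i, word_index[tok]] += 1

# Integer author codes aligned with the rows of `counts`
authors, author_ids = np.unique(author_names, return_inverse=True)
print(counts.shape, author_ids)
```

The key invariant is that row i of `counts` and entry i of `author_ids` describe the same document.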
Interpreting Ideal Points
1D Case (single position axis):
ideal_points_df = model.return_ideal_points()
print(ideal_points_df)
# author ideal_point std
# 0 author_A -2.30 0.15
# 1 author_B 0.00 0.12
# 2 author_C 1.50 0.18
Visualization (built-in):
# Publication-ready 1-D scatter with optional credible intervals
fig, ax = model.plot_ideal_points(show_ci=True, ci=0.95)
# Or manually:
import matplotlib.pyplot as plt
df = model.return_ideal_points()
plt.scatter(df['ideal_point'], range(len(df)))
for i, row in df.iterrows():
    plt.annotate(row['author'], (row['ideal_point'], i))
plt.xlabel('Ideal Point (left ← → right)')
plt.show()
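If you plot manually, intervals can be approximated from the `ideal_point` and `std` columns. The sketch below assumes the posterior for each author is roughly Gaussian, so an interval is the mean plus or minus a normal quantile times the standard deviation; how `plot_ideal_points` computes its own intervals is not covered here. The numbers are the toy values from the 1D example table.

```python
from statistics import NormalDist

import numpy as np

# Toy values copied from the 1D example table
ideal_point = np.array([-2.30, 0.00, 1.50])
std = np.array([0.15, 0.12, 0.18])

ci = 0.95
z = NormalDist().inv_cdf(0.5 + ci / 2)   # about 1.96 for a 95% interval
lower = ideal_point - z * std
upper = ideal_point + z * std

# Two authors are clearly separated when their intervals do not overlap
overlap_ab = lower[1] <= upper[0]
print(np.round(lower, 2), np.round(upper, 2), overlap_ab)
```

The `lower` and `upper` arrays can be passed to `plt.errorbar` as asymmetric or symmetric error bars around each point.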
Practical Example: Political Speeches
# Analyze legislative speeches
# Documents: individual speeches
# Authors: legislators
# Goal: estimate left-right position from language
from poisson_topicmodels import TBIP
# Load speech dataset
speeches = load_speeches() # hypothetical loader; one row per speech
legislator_ids = speeches['legislator'].values # who said each speech
counts = speech_dtm # document-term matrix
model = TBIP(
counts=counts,
vocab=vocab,
authors=legislator_ids,
num_topics=20,
batch_size=64,
)
model.train_step(num_steps=200, lr=0.01)
# Get positions
ideal_points_df = model.return_ideal_points()
model.summary()
# Built-in visualization with credible intervals
model.plot_ideal_points(show_ci=True)
# Ideological words for one topic (topic 0 here; pick the topic you care about)
print(model.return_ideological_words(topic=0, n=15))
# Compare with known party affiliation
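# One party code per author, in the same row order as ideal_points_df
parties = legislator_ids_to_parties(ideal_points_df['author'].values)
ideal_points = ideal_points_df['ideal_point'].values
import matplotlib.pyplot as plt
for party_id, party in enumerate(['Democrat', 'Republican']):
    mask = parties == party_id
    plt.hist(ideal_points[mask], alpha=0.5, label=party)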
plt.xlabel('Ideal Point (left ← → right)')
plt.legend()
plt.show()
# Expected: the two parties separate along the axis. The sign is arbitrary,
# so Democrats may land on the negative or the positive side.
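Ideal points are estimated per author, while metadata such as party usually arrives per document, so the two need to be aligned before masking. The following sketch collapses per-document labels to one label per author with NumPy. All arrays are toy values, the party names are placeholders, and it assumes the ideal points are ordered by integer author ID.

```python
import numpy as np

# Hypothetical per-document metadata
legislator_ids = np.array([0, 1, 0, 2, 1, 0])   # author of each speech
speech_party = np.array([0, 1, 0, 1, 1, 0])     # party code of the speaker of each speech

# Hypothetical per-author estimates, ordered by author ID
ideal_points = np.array([-1.2, 0.9, 0.4])

# np.unique returns the index of each author's first document,
# which is enough to look up one party label per author
authors, first_doc = np.unique(legislator_ids, return_index=True)
author_party = speech_party[first_doc]

for party_id, party in enumerate(["Party A", "Party B"]):
    mask = author_party == party_id
    print(party, ideal_points[mask].mean())
```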
Validating Ideal Points
Compare with known positions:
# If ground truth available
true_positions = get_known_positions()
estimated = model.return_ideal_points()['ideal_point'].values
# Correlation should be high in magnitude; its sign is arbitrary because the axis can flip
correlation = np.corrcoef(true_positions, estimated)[0, 1]
print(f"Correlation: {correlation:.3f}") # Ideally |correlation| > 0.7
# Spearman rank correlation (compares the ordering only)
from scipy.stats import spearmanr
rank_corr, p_value = spearmanr(true_positions, estimated)
print(f"Rank correlation: {rank_corr:.3f}, p={p_value:.4f}")
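Because the direction of the estimated axis is arbitrary, a strongly negative correlation is as informative as a strongly positive one. A minimal sketch of aligning the estimated axis with the reference before reporting agreement, using toy numbers:

```python
import numpy as np

true_positions = np.array([-1.0, -0.5, 0.2, 0.8, 1.5])   # toy reference scores
estimated = np.array([1.1, 0.4, -0.1, -0.9, -1.6])       # toy estimates with the axis flipped

correlation = np.corrcoef(true_positions, estimated)[0, 1]

# Flip the estimated axis when it runs opposite to the reference
aligned = estimated if correlation >= 0 else -estimated
aligned_corr = np.corrcoef(true_positions, aligned)[0, 1]

print(f"raw: {correlation:.3f}, aligned: {aligned_corr:.3f}")
```

Flipping changes only the sign of the correlation, so reporting the absolute value gives the same number without modifying the estimates.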
Qualitative inspection:
# Read documents from extreme authors
df = model.return_ideal_points().sort_values('ideal_point') # sort so the extremes come first and last
leftmost_author = df.iloc[0]['author']
rightmost_author = df.iloc[-1]['author']
print(f"Leftmost author (ID {leftmost_author}):")
print(f"Top documents: {get_top_docs(leftmost_author, n=3)}")
print(f"\nRightmost author (ID {rightmost_author}):")
print(f"Top documents: {get_top_docs(rightmost_author, n=3)}")
Topic usage patterns:
# Which words distinguish the extremes the most?
for topic_id in range(min(3, model.num_topics)):
    ideo = model.return_ideological_words(topic=topic_id, n=5)
    print(f"\nTopic {topic_id} ideological words:")
    print(ideo)
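To build intuition for what "ideological words" means, the sketch below ranks the words of a single topic at three ideal points using the TBIP-style intensity beta * exp(x * eta). This is an illustration under that formulation with made-up values; it is not a description of how `return_ideological_words` is implemented, and `top_words` is a hypothetical helper.

```python
import numpy as np

vocab = np.array(["tax", "cut", "relief", "fair", "share", "wealthy"])
beta_k = np.array([2.0, 1.0, 0.8, 0.9, 0.7, 0.6])     # neutral intensities for one topic (toy)
eta_k = np.array([0.0, 1.2, 0.9, -1.0, -0.8, -1.3])   # ideological offsets for the same topic (toy)

def top_words(x, n=3):
    """Words with the highest intensity beta * exp(x * eta) at ideal point x."""
    intensity = beta_k * np.exp(x * eta_k)
    return vocab[np.argsort(-intensity)[:n]].tolist()

for x in (-1.0, 0.0, 1.0):
    print(x, top_words(x))
```

Words with `eta` near zero (here "tax") stay prominent at every position, while words with large positive or negative `eta` rise only at one end of the axis.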
Relationship to Other Models
TBIP vs. PF: Adds author position estimation
PF: Discovers topics only
TBIP: Discovers topics AND author positions
TBIP vs. CPF: Different covariate handling
CPF: Document-level continuous covariates
TBIP: Author-level latent positions
Typical workflow:
Start with PF or SPF to understand topics
If interested in author positions, add TBIP
Optional: compare with CPF using author dummies as covariates
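For the optional comparison, author dummies are a one-hot encoding of the author IDs. A minimal NumPy sketch follows; how the resulting matrix is passed to CPF, and whether a reference column should be dropped, are assumptions to check against the CPF documentation.

```python
import numpy as np

author_ids = np.array([0, 1, 0, 2, 1, 0])   # author of each document
num_authors = author_ids.max() + 1

# One row per document, one column per author, exactly one 1 per row
author_dummies = np.eye(num_authors, dtype=np.float32)[author_ids]

# Optionally drop the first column so author 0 acts as the reference category
covariates = author_dummies[:, 1:]
print(author_dummies.shape, covariates.shape)
```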
Implementation Details
Identification: The sign of the ideal points is arbitrary (the axis can flip, so either end may correspond to "left" or "right"); only relative positions are meaningful.
Centering: Model centers ideal points at 0 by default (mean = 0).
Scaling: Values are on arbitrary scale; interpret using relative differences.
Multiple dimensions: Discovered dimensions may not have clear interpretations. This is normal; inspect the word distributions to understand them.
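Since sign and scale are arbitrary, it can help to put estimates on a fixed convention before comparing runs. The sketch below standardizes toy ideal points and orients the axis with an anchor author whose side is known; the anchor index is a hypothetical choice, not something the model provides.

```python
import numpy as np

ideal_points = np.array([-2.30, 0.00, 1.50])   # toy estimates
anchor = 2   # hypothetical: index of an author known to sit on the "right"

# Center and rescale so one unit equals one standard deviation across authors
z = (ideal_points - ideal_points.mean()) / ideal_points.std()

# Orient the axis so the anchor author has a positive score
if z[anchor] < 0:
    z = -z

print(np.round(z, 2))
```

Standardizing preserves the ordering and the relative spacing of authors, which are the quantities that carry meaning.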
Troubleshooting
Problem: Ideal points don’t seem meaningful
Solution:
- Check author IDs are correct
- Ensure sufficient documents per author
- Inspect topics and words
- Try different num_topics or num_dimensions
- Increase training iterations
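The check for sufficient documents per author can be sketched with `np.bincount`. The threshold below is a hypothetical value to tune for your corpus, and dropping sparse authors is one option among several (merging authors or collecting more text are others).

```python
import numpy as np

author_ids = np.array([0, 1, 0, 2, 1, 0])   # author of each document
min_docs = 2                                # hypothetical threshold

docs_per_author = np.bincount(author_ids)
sparse_authors = np.flatnonzero(docs_per_author < min_docs)
print("Documents per author:", docs_per_author)
print("Authors below threshold:", sparse_authors)

# Keep only documents by well-covered authors, then re-index authors to 0..K-1
keep = ~np.isin(author_ids, sparse_authors)
kept_authors, new_author_ids = np.unique(author_ids[keep], return_inverse=True)
```

The same `keep` mask must be applied to the rows of the document-term matrix so that documents and author IDs stay aligned.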
Problem: Positions don’t match known affiliations
Solution:
- Known affiliations might not align with language patterns
- Try different num_dimensions
- Check whether the covariate (e.g., party) matches the topic structure
- Language use might reveal different dimensions than official positions
Problem: Training is slow
Solution:
- Reduce number of topics
- Increase batch size
- Reduce vocabulary (remove rare words)
- Use GPU: export JAX_PLATFORMS=gpu
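Reducing the vocabulary can be done before the model is constructed. A minimal sketch that drops words appearing in fewer than a minimum number of documents, using a toy dense matrix and a hypothetical threshold:

```python
import numpy as np

counts = np.array([
    [2, 0, 1, 0],
    [1, 0, 0, 1],
    [3, 1, 1, 0],
])
vocab = np.array(["tax", "zeitgeist", "reform", "budget"])
min_doc_freq = 2   # hypothetical threshold

# Number of documents in which each word appears at least once
doc_freq = (counts > 0).sum(axis=0)
keep = doc_freq >= min_doc_freq

counts_small = counts[:, keep]
vocab_small = vocab[keep]

# Pruning can leave documents empty; flag them so they can be dropped as well
nonempty = counts_small.sum(axis=1) > 0
print(vocab_small, counts_small.shape, nonempty)
```

If any documents are dropped, remove the matching entries from the author ID array too.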
Next Steps
Embedded Topic Models (ETM) - Exploring ETM with embeddings
Tutorials - Advanced techniques
API Reference - Complete TBIP API reference