.. _ideal_points: ================================================================================ Ideal Points Models (TBIP & STBS) ================================================================================ This package provides two related ideal-point models: - **Text-Based Ideal Points (TBIP)**: estimates one latent position per author - **Structured Text-Based Scaling (STBS)**: estimates **topic-specific** positions per author and links them to author-level covariates Both are commonly used in political science, social-media analysis, and author-level stance estimation. What Are Ideal Points? ======================= Ideal points are latent coordinates representing positions on abstract dimensions: **Example**: Political polarization - Left politicians use words: "equality", "justice", "workers", "government" - Right politicians use words: "freedom", "liberty", "business", "market" - Model estimates position on left-right spectrum from text **Example**: Product stance - Critics use words: "broken", "poor quality", "disappointed" - Supporters use words: "amazing", "excellent", "recommend" - Model estimates critic vs. supporter position Model Intuition =============== TBIP works by: 1. Discovering topics in text corpus 2. Analyzing word usage patterns within topics 3. Inferring author positions that explain language variation Higher-dimensional spaces possible (not just 1D left-right): - 2D: (left-right, authoritarian-libertarian) - 3D+: Custom dimensions discovered from data TBIP vs. STBS at a glance: - **TBIP**: one ideal-point coordinate per author (simpler and faster) - **STBS**: one coordinate per author-topic pair, plus regression on author covariates When to Use TBIP vs STBS ========================= Use **TBIP** when: ✓ You have author-attributed text (speeches, tweets, reviews) ✓ You assume polarization or position variation ✓ You want to estimate latent author positions ✓ You're interested in discourse analysis Use **STBS** when: ✓ You want positions to vary by topic (not a single global axis) ✓ You have author-level metadata/covariates (e.g., party, tenure, demographics) ✓ You want covariate effects on ideology to be estimated jointly with topics Don't use either model if: ✗ Text is anonymous or unattributed ✗ No meaningful position variation expected ✗ You only care about topics, not author positions Basic Usage (TBIP) ================== .. code-block:: python from poisson_topicmodels import TBIP import numpy as np # Author IDs indicating who wrote each document author_ids = np.array([0, 1, 0, 2, 1, 0, ...]) # 3 authors model = TBIP( counts=counts, vocab=vocab, authors=author_ids, num_topics=10, batch_size=32, ) params = model.train_step(num_steps=200, lr=0.01) # Extract results ideal_points_df = model.return_ideal_points() # DataFrame: author, ideal_point, std print(ideal_points_df) Basic Usage (STBS) ================== .. code-block:: python from poisson_topicmodels import STBS import numpy as np import pandas as pd # Author per document (length = number of documents) authors_doc = np.array(["author_a", "author_b", "author_a", ...]) # Author-level covariates with one row per unique author. # Row order must match np.unique(authors_doc). unique_authors = np.unique(authors_doc) X_author = pd.DataFrame( { "party_r": [0, 1, ...], "tenure_years": [4, 12, ...], }, index=unique_authors, ) model = STBS( counts=counts, vocab=vocab, num_topics=10, authors=authors_doc, X_design_matrix=X_author, batch_size=32, ) model.train_step(num_steps=200, lr=0.01) # Topic-specific author ideal points stbs_ideal_points = model.return_ideal_points() # Columns: author, topic, ideal_point, std # Covariate effects on ideology by topic stbs_covariate_effects = model.return_ideal_covariates() # Columns: covariate, topic, iota, std # STBS-specific visualization helpers model.plot_author_topic_heatmap() model.plot_ideol_points() model.plot_iota_credible_intervals() Interpreting Ideal Points ========================= **1D Case** (single position axis): .. code-block:: python ideal_points_df = model.return_ideal_points() print(ideal_points_df) # author ideal_point std # 0 author_A -2.30 0.15 # 1 author_B 0.00 0.12 # 2 author_C 1.50 0.18 **Visualization** (built-in): .. code-block:: python # Publication-ready 1-D scatter with optional credible intervals fig, ax = model.plot_ideal_points(show_ci=True, ci=0.95) # Or manually: import matplotlib.pyplot as plt df = model.return_ideal_points() plt.scatter(df['ideal_point'], range(len(df))) for i, row in df.iterrows(): plt.annotate(row['author'], (row['ideal_point'], i)) plt.xlabel('Ideal Point (left ← → right)') plt.show() Topic-Word-Author Relationships (TBIP) ====================================== TBIP discovers how words vary across author positions: .. code-block:: python # Get word-topic associations beta = model.return_beta() # DataFrame # Top words globally top_words = model.return_top_words_per_topic(n=10) # Ideological words per topic — shows which words load most on # the ideological dimension ideo_words = model.return_ideological_words(topic=0, n=10) print(ideo_words) # Columns: word, eta, direction # direction: 'positive' or 'negative' end of the axis Practical Example: Political Speeches (TBIP) ============================================ .. code-block:: python # Analyze legislative speeches # Documents: individual speeches # Authors: legislators # Goal: estimate left-right position from language from poisson_topicmodels import TBIP # Load speech dataset speeches = load_speeches() # (num_speeches, num_documents) legislator_ids = speeches['legislator'].values # who said each speech counts = speech_dtm # document-term matrix model = TBIP( counts=counts, vocab=vocab, authors=legislator_ids, num_topics=20, batch_size=64, ) model.train_step(num_steps=200, lr=0.01) # Get positions ideal_points_df = model.return_ideal_points() model.summary() # Built-in visualization with credible intervals model.plot_ideal_points(show_ci=True) # Ideological words for the most political topic print(model.return_ideological_words(topic=0, n=15)) # Compare with known party affiliation parties = legislator_ids_to_parties(legislator_ids) import matplotlib.pyplot as plt for party_id, party in enumerate(['Democrat', 'Republican']): mask = parties == party_id plt.hist(ideal_points[mask], alpha=0.5, label=party) plt.xlabel('Ideal Point (left ← → right)') plt.legend() plt.show() # Expected: Democrats mostly negative, Republicans mostly positive Validating Ideal Points ======================= **Compare with known positions**: .. code-block:: python # If ground truth available true_positions = get_known_positions() estimated = model.return_ideal_points()['ideal_point'].values # Correlation should be high correlation = np.corrcoef(true_positions, estimated)[0, 1] print(f"Correlation: {correlation:.3f}") # Should be > 0.7 ideally # Spearman rank correlation (order matters) from scipy.stats import spearmanr rank_corr, p_value = spearmanr(true_positions, estimated) print(f"Rank correlation: {rank_corr:.3f}, p={p_value:.4f}") **Qualitative inspection**: .. code-block:: python # Read documents from extreme authors df = model.return_ideal_points() leftmost_author = df.iloc[0]['author'] # sorted by ideal_point rightmost_author = df.iloc[-1]['author'] print(f"Leftmost author (ID {leftmost_author}):") print(f"Top documents: {get_top_docs(leftmost_author, n=3)}") print("\nRightmost author (ID {rightmost_author}):") print(f"Top documents: {get_top_docs(rightmost_author, n=3)}") **Topic usage patterns**: .. code-block:: python # Which words distinguish the extremes the most? for topic_id in range(min(3, model.num_topics)): ideo = model.return_ideological_words(topic=topic_id, n=5) print(f"\nTopic {topic_id} ideological words:") print(ideo) Relationship to Other Models ============================= **TBIP/STBS vs. PF**: Adds author position estimation - PF: Discovers topics only - TBIP/STBS: Discover topics AND author positions **TBIP vs. STBS**: Different structure - TBIP: Single latent position per author - STBS: Topic-specific latent position per author with author-level covariates **Typical workflow**: 1. Start with PF or SPF to understand topics 2. Add TBIP for a compact ideological axis per author 3. Upgrade to STBS when you need topic-specific ideal points and covariate effects Implementation Details ====================== **Identification**: Ideal points can be flipped in sign (both left and right position work); only relative order is meaningful. **Centering**: Model centers ideal points at 0 by default (mean = 0). **Scaling**: Values are on arbitrary scale; interpret using relative differences. **Multiple dimensions**: Discovered dimensions may not have clear interpretations. This is normal—inspect word distributions to understand. **STBS covariate alignment**: STBS expects author-level covariates where row order matches ``np.unique(authors)`` from the document-level ``authors`` input. Troubleshooting =============== **Problem**: Ideal points don't seem meaningful *Solution*: - Check author IDs are correct - Ensure sufficient documents per author - Inspect topics and words - Try different num_topics or num_dimensions - Increase training iterations **Problem**: Positions don't match known affiliations *Solution*: - Known affiliations might not align with language patterns - Try different num_dimensions - Check if covariate (e.g., party) matches topic structure - Language use might reveal different dimensions than official positions **Problem**: Training is slow** *Solution*: - Reduce number of topics - Increase batch size - Reduce vocabulary (remove rare words) - Use GPU: ``export JAX_PLATFORMS=gpu`` Next Steps ========== - :doc:`embedded_models` - Exploring ETM with embeddings - :doc:`../tutorials/index` - Advanced techniques - :doc:`../api/index` - Complete TBIP/STBS API reference