.. _ideal_points:

================================================================================
Ideal Points Models (TBIP)
================================================================================

**Text-Based Ideal Points (TBIP)** is a specialized model for estimating **latent positions**
(ideal points) of authors based on their language use. Commonly used in political science
and social media analysis.

What Are Ideal Points?
=======================

Ideal points are latent coordinates representing positions on abstract dimensions:

**Example**: Political polarization

- Left politicians use words: "equality", "justice", "workers", "government"
- Right politicians use words: "freedom", "liberty", "business", "market"
- Model estimates position on left-right spectrum from text

**Example**: Product stance

- Critics use words: "broken", "poor quality", "disappointed"
- Supporters use words: "amazing", "excellent", "recommend"
- Model estimates critic vs. supporter position

Model Intuition
===============

TBIP works by:

1. Discovering topics in text corpus
2. Analyzing word usage patterns within topics
3. Inferring author positions that explain language variation

Higher-dimensional spaces possible (not just 1D left-right):

- 2D: (left-right, authoritarian-libertarian)
- 3D+: Custom dimensions discovered from data

When to Use TBIP
================

Use TBIP when:

✓ You have author-attributed text (speeches, tweets, reviews)
✓ You assume polarization or position variation
✓ You want to estimate latent author positions
✓ You're interested in discourse analysis

Don't use if:

✗ Text is anonymous or unattributed
✗ No meaningful position variation expected
✗ You only care about topics, not author positions

Basic Usage
===========

.. code-block:: python

   from poisson_topicmodels import TBIP
   import numpy as np

   # Author IDs indicating who wrote each document
   author_ids = np.array([0, 1, 0, 2, 1, 0, ...])  # 3 authors

   model = TBIP(
       counts=counts,
       vocab=vocab,
       authors=author_ids,
       num_topics=10,
       batch_size=32,
   )

   params = model.train_step(num_steps=200, lr=0.01)

   # Extract results
   ideal_points_df = model.return_ideal_points()  # DataFrame: author, ideal_point, std
   print(ideal_points_df)

Interpreting Ideal Points
=========================

**1D Case** (single position axis):

.. code-block:: python

   ideal_points_df = model.return_ideal_points()
   print(ideal_points_df)
   #        author  ideal_point       std
   # 0    author_A        -2.30      0.15
   # 1    author_B         0.00      0.12
   # 2    author_C         1.50      0.18

**Visualization** (built-in):

.. code-block:: python

   # Publication-ready 1-D scatter with optional credible intervals
   fig, ax = model.plot_ideal_points(show_ci=True, ci=0.95)

   # Or manually:
   import matplotlib.pyplot as plt
   df = model.return_ideal_points()
   plt.scatter(df['ideal_point'], range(len(df)))
   for i, row in df.iterrows():
       plt.annotate(row['author'], (row['ideal_point'], i))
   plt.xlabel('Ideal Point (left ← → right)')
   plt.show()


Topic-Word-Author Relationships
===============================

TBIP discovers how words vary across author positions:

.. code-block:: python

   # Get word-topic associations
   beta = model.return_beta()  # DataFrame

   # Top words globally
   top_words = model.return_top_words_per_topic(n=10)

   # Ideological words per topic — shows which words load most on
   # the ideological dimension
   ideo_words = model.return_ideological_words(topic=0, n=10)
   print(ideo_words)
   # Columns: word, eta, direction
   # direction: 'positive' or 'negative' end of the axis

Practical Example: Political Speeches
=====================================

.. code-block:: python

   # Analyze legislative speeches
   # Documents: individual speeches
   # Authors: legislators
   # Goal: estimate left-right position from language

   from poisson_topicmodels import TBIP

   # Load speech dataset
   speeches = load_speeches()  # (num_speeches, num_documents)
   legislator_ids = speeches['legislator'].values  # who said each speech
   counts = speech_dtm  # document-term matrix

   model = TBIP(
       counts=counts,
       vocab=vocab,
       authors=legislator_ids,
       num_topics=20,
       batch_size=64,
   )

   model.train_step(num_steps=200, lr=0.01)

   # Get positions
   ideal_points_df = model.return_ideal_points()
   model.summary()

   # Built-in visualization with credible intervals
   model.plot_ideal_points(show_ci=True)

   # Ideological words for the most political topic
   print(model.return_ideological_words(topic=0, n=15))

   # Compare with known party affiliation
   parties = legislator_ids_to_parties(legislator_ids)

   import matplotlib.pyplot as plt
   for party_id, party in enumerate(['Democrat', 'Republican']):
       mask = parties == party_id
       plt.hist(ideal_points[mask], alpha=0.5, label=party)
   plt.xlabel('Ideal Point (left ← → right)')
   plt.legend()
   plt.show()
   # Expected: Democrats mostly negative, Republicans mostly positive

Validating Ideal Points
=======================

**Compare with known positions**:

.. code-block:: python

   # If ground truth available
   true_positions = get_known_positions()
   estimated = model.return_ideal_points()['ideal_point'].values

   # Correlation should be high
   correlation = np.corrcoef(true_positions, estimated)[0, 1]
   print(f"Correlation: {correlation:.3f}")  # Should be > 0.7 ideally

   # Spearman rank correlation (order matters)
   from scipy.stats import spearmanr
   rank_corr, p_value = spearmanr(true_positions, estimated)
   print(f"Rank correlation: {rank_corr:.3f}, p={p_value:.4f}")

**Qualitative inspection**:

.. code-block:: python

   # Read documents from extreme authors
   df = model.return_ideal_points()
   leftmost_author = df.iloc[0]['author']   # sorted by ideal_point
   rightmost_author = df.iloc[-1]['author']

   print(f"Leftmost author (ID {leftmost_author}):")
   print(f"Top documents: {get_top_docs(leftmost_author, n=3)}")
   print("\nRightmost author (ID {rightmost_author}):")
   print(f"Top documents: {get_top_docs(rightmost_author, n=3)}")

**Topic usage patterns**:

.. code-block:: python

   # Which words distinguish the extremes the most?
   for topic_id in range(min(3, model.num_topics)):
       ideo = model.return_ideological_words(topic=topic_id, n=5)
       print(f"\nTopic {topic_id} ideological words:")
       print(ideo)

Relationship to Other Models
=============================

**TBIP vs. PF**: Adds author position estimation

- PF: Discovers topics only
- TBIP: Discovers topics AND author positions

**TBIP vs. CPF**: Different covariate handling

- CPF: Document-level continuous covariates
- TBIP: Author-level latent positions

**Typical workflow**:

1. Start with PF or SPF to understand topics
2. If interested in author positions, add TBIP
3. Optional: compare with CPF using author dummies as covariates

Implementation Details
======================

**Identification**: Ideal points can be flipped in sign (both left and right position
work); only relative order is meaningful.

**Centering**: Model centers ideal points at 0 by default (mean = 0).

**Scaling**: Values are on arbitrary scale; interpret using relative differences.

**Multiple dimensions**: Discovered dimensions may not have clear interpretations.
This is normal—inspect word distributions to understand.

Troubleshooting
===============

**Problem**: Ideal points don't seem meaningful

*Solution*:
- Check author IDs are correct
- Ensure sufficient documents per author
- Inspect topics and words
- Try different num_topics or num_dimensions
- Increase training iterations

**Problem**: Positions don't match known affiliations

*Solution*:
- Known affiliations might not align with language patterns
- Try different num_dimensions
- Check if covariate (e.g., party) matches topic structure
- Language use might reveal different dimensions than official positions

**Problem**: Training is slow**

*Solution*:
- Reduce number of topics
- Increase batch size
- Reduce vocabulary (remove rare words)
- Use GPU: ``export JAX_PLATFORMS=gpu``

Next Steps
==========

- :doc:`embedded_models` - Exploring ETM with embeddings
- :doc:`../tutorials/index` - Advanced techniques
- :doc:`../api/index` - Complete TBIP API reference