---
type: mathematical_concept
id: temperature_parameter_001
created: 2024-02-05
modified: 2024-02-05
tags:
  - active-inference
  - optimization
  - exploration
aliases:
  - inverse-temperature
  - precision
  - exploration-control
---

Temperature Parameter

Overview

The temperature parameter controls the balance between exploration and exploitation in decision making. It is usually expressed through its inverse, the precision $\gamma$ (equivalently $\beta = 1/T$): low precision (high temperature) spreads probability across policies, while high precision (low temperature) concentrates it on the best ones. In active inference, $\gamma$ acts as a precision parameter that modulates the relative weighting of different policies.

Links to:

Mathematical Formulation

The temperature appears in the softmax transformation for policy selection:

$$P(\pi) = \sigma\big(-\gamma\, G(\pi)\big) = \frac{\exp(-\gamma\, G(\pi))}{\sum_{\pi'} \exp(-\gamma\, G(\pi'))}$$

where:

  • $\pi$ indexes a policy (a candidate sequence of actions)
  • $G(\pi)$ is the expected free energy of policy $\pi$
  • $\gamma$ is the precision (inverse temperature) that scales differences in $G$
  • $\sigma(\cdot)$ is the softmax function

Implementation

```python
import numpy as np


def compute_policy_probabilities(
    expected_free_energy: np.ndarray,
    temperature: float = 1.0,
    min_prob: float = 1e-10
) -> np.ndarray:
    """Compute policy probabilities using softmax with temperature.
    
    Args:
        expected_free_energy: EFE values G(pi) for each policy (lower is better)
        temperature: Precision (inverse temperature) gamma; larger values give
            more deterministic, exploitative policy selection
        min_prob: Minimum probability floor for numerical stability
        
    Returns:
        Policy probability distribution P(pi)
    """
    # Scale the negated EFE by the precision gamma
    scaled_efe = -temperature * expected_free_energy
    
    # Softmax with the max-subtraction trick for numerical stability
    logits = scaled_efe - np.max(scaled_efe)
    probabilities = np.exp(logits)
    probabilities = np.maximum(probabilities, min_prob)
    probabilities /= np.sum(probabilities)
    
    return probabilities
```

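A minimal usage sketch (the EFE values are made-up numbers for illustration, and it assumes the `compute_policy_probabilities` function above is in scope) showing how the precision γ controls the sharpness of the distribution:

```python
import numpy as np

# Hypothetical expected free energies for four policies (lower is better)
G = np.array([2.0, 1.5, 3.0, 1.6])

# Sweep the precision gamma (passed as `temperature` above)
for gamma in (0.5, 1.0, 4.0):
    p = compute_policy_probabilities(G, temperature=gamma)
    print(f"gamma = {gamma}: {np.round(p, 3)}")

# Low gamma (high temperature): probabilities stay close to uniform (exploration)
# High gamma (low temperature): mass concentrates on the lowest-G policy (exploitation)
```
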
Links to:

Properties

  1. Exploration Control

    • High temperature (low γ) → More uniform policy distribution (exploration)
    • Low temperature (high γ) → More deterministic selection (exploitation)
    • Links to exploration_strategies
  2. Scale Dependence

    • The effect of γ depends on the scale of the expected free energies (see the worked identity after this list)
  3. Dynamic Behavior

    • γ is typically increased (temperature lowered) over the course of learning, as in the annealing schedule below

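A worked identity for the scale-dependence property above: multiplying every expected free energy by a constant $c$ is indistinguishable from multiplying the precision by $c$, so a sensible value of $\gamma$ depends on the scale (units) of $G$:

$$\sigma\big(-\gamma\,(c\,G(\pi))\big) = \sigma\big(-(\gamma c)\,G(\pi)\big)$$

By contrast, adding the same constant to every $G(\pi)$ leaves the distribution unchanged, which is exactly what the max-subtraction step in the implementation exploits for numerical stability.
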
Applications

Active Inference

  1. Policy Selection

  2. Learning Control

Optimization Methods

  1. Simulated Annealing

  2. Stochastic Search

Analysis Methods

  1. Parameter Tuning (see the entropy-sweep sketch after this list)

  2. Visualization

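One hedged sketch of a tuning and visualisation diagnostic (not prescribed by the source; it reuses the hypothetical EFE values and the `compute_policy_probabilities` function above): sweep γ and track the entropy of the resulting policy distribution to see the exploration–exploitation trade-off:

```python
import numpy as np

def policy_entropy(probabilities: np.ndarray) -> float:
    """Shannon entropy (nats) of a policy distribution."""
    return float(-np.sum(probabilities * np.log(probabilities)))

G = np.array([2.0, 1.5, 3.0, 1.6])           # hypothetical EFE values

for gamma in np.logspace(-1, 1, 5):          # gamma from 0.1 to 10
    p = compute_policy_probabilities(G, temperature=gamma)
    print(f"gamma = {gamma:5.2f}   entropy = {policy_entropy(p):.3f}")

# Entropy falls as gamma grows: near log(4) nats (uniform) at small gamma,
# near 0 nats (deterministic) at large gamma.
```
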
References

Parameter Relationships

```mermaid
graph TB
    T[Temperature parameter] --> E[Exploration]
    T --> X[Exploitation]
    
    E --> |High T, low γ| U[Uniform Distribution]
    E --> |High Entropy| D[Diverse Actions]
    
    X --> |Low T, high γ| G[Greedy Selection]
    X --> |Low Entropy| O[Optimal Actions]
    
    U --> |Leads to| EX[Exploration Phase]
    D --> |Enables| IS[Information Seeking]
    
    G --> |Results in| EP[Exploitation Phase]
    O --> |Maximizes| R[Reward/Utility]
    
    classDef param fill:#f9f,stroke:#333,stroke-width:2px
    classDef effect fill:#bbf,stroke:#333,stroke-width:2px
    classDef outcome fill:#bfb,stroke:#333,stroke-width:2px
    
    class T param
    class U,G,D,O effect
    class EX,EP,IS,R outcome
```

Control Flow

```mermaid
graph LR
    EFE[Expected Free Energy] --> |Scale| T[Temperature T = 1/γ]
    T --> |Transform| S[Softmax]
    S --> |Generate| P[Policy Distribution]
    P --> |Sample| A[Action]
    
    subgraph Temperature Control
        T --> |High T| H[High Exploration]
        T --> |Low T| L[Low Exploration]
        H --> |Decrease| L
    end
    
    classDef input fill:#f9f,stroke:#333,stroke-width:2px
    classDef process fill:#bbf,stroke:#333,stroke-width:2px
    classDef output fill:#bfb,stroke:#333,stroke-width:2px
    
    class EFE input
    class T,S process
    class P,A output
```

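A minimal sketch of this control flow (hypothetical EFE values and action labels; assumes `compute_policy_probabilities` from above): scale by γ, softmax, then sample an action from the policy distribution:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

G = np.array([2.0, 1.5, 3.0, 1.6])                 # expected free energy per policy
first_action = ["left", "right", "stay", "right"]  # hypothetical first action of each policy

p = compute_policy_probabilities(G, temperature=2.0)  # scale + softmax
policy_idx = rng.choice(len(G), p=p)                  # sample a policy
print(f"sampled policy {policy_idx}, action '{first_action[policy_idx]}'")
```
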
Annealing Schedule

```mermaid
stateDiagram-v2
    [*] --> InitialTemp
    
    state "Temperature Evolution" as TE {
        InitialTemp --> HighTemp: Start
        HighTemp --> MediumTemp: Early Phase
        MediumTemp --> LowTemp: Late Phase
        
        state HighTemp {
            [*] --> Exploration
            Exploration --> InformationGathering
        }
        
        state MediumTemp {
            [*] --> Balance
            Balance --> ExploitationBias
        }
        
        state LowTemp {
            [*] --> Exploitation
            Exploitation --> Convergence
        }
    }
    
    TE --> [*]: Converged
```

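A minimal sketch of such an annealing schedule (the exponential form and all constants are illustrative assumptions; it assumes `compute_policy_probabilities` from above): the precision γ is increased, i.e. the temperature lowered, as trials progress:

```python
import numpy as np

def annealed_precision(trial: int,
                       gamma_start: float = 0.5,
                       gamma_end: float = 8.0,
                       n_trials: int = 100) -> float:
    """Exponentially interpolate the precision from gamma_start to gamma_end."""
    frac = min(trial / max(n_trials - 1, 1), 1.0)
    return float(gamma_start * (gamma_end / gamma_start) ** frac)

G = np.array([2.0, 1.5, 3.0, 1.6])   # hypothetical EFE values

for trial in (0, 25, 50, 75, 99):
    gamma = annealed_precision(trial)
    p = compute_policy_probabilities(G, temperature=gamma)
    print(f"trial {trial:3d}   gamma = {gamma:5.2f}   p = {np.round(p, 3)}")

# Early trials (high temperature): broad, exploratory distributions
# Late trials (low temperature): nearly deterministic, exploitative selection
```
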
Optimization Landscape

```mermaid
graph TD
    subgraph Temperature Effects
        H[High Temperature] --> |Flattens| L1[Landscape]
        L[Low Temperature] --> |Sharpens| L2[Landscape]
    end
    
    subgraph Search Behavior
        L1 --> |Enables| W[Wide Exploration]
        L2 --> |Forces| N[Narrow Search]
        
        W --> |Finds| G1[Global Optima]
        N --> |Refines| G2[Local Optima]
    end
    
    subgraph Control Strategy
        A[Annealing] --> |Balances| B[Exploration/Exploitation]
        B --> |Achieves| C[Convergence]
    end
    
    classDef temp fill:#f9f,stroke:#333,stroke-width:2px
    classDef effect fill:#bbf,stroke:#333,stroke-width:2px
    classDef outcome fill:#bfb,stroke:#333,stroke-width:2px
    
    class H,L temp
    class L1,L2,W,N effect
    class G1,G2,C outcome
```

Convergence Analysis

```mermaid
graph TD
    subgraph Temperature Evolution
        T1[High Temperature] --> |Initial Phase| E1[Maximum Exploration]
        E1 --> |Gather Information| I1[High Information Gain]
        I1 --> |Reduce| U1[Initial Uncertainty]
        
        U1 --> |Guide| T2[Medium Temperature]
        T2 --> |Balance| E2[Mixed Strategy]
        E2 --> |Refine| I2[Targeted Information]
        
        I2 --> |Lead to| T3[Low Temperature]
        T3 --> |Focus| E3[Exploitation]
        E3 --> |Achieve| G[Goal State]
    end
    
    classDef temp fill:#f9f,stroke:#333,stroke-width:4px
    classDef state fill:#bbf,stroke:#333,stroke-width:2px
    classDef outcome fill:#bfb,stroke:#333,stroke-width:2px
    
    class T1,T2,T3 temp
    class E1,E2,E3,I1,I2 state
    class U1,G outcome
```

Value Integration

```mermaid
graph LR
    subgraph Temperature Control
        T[Temperature γ] --> |Scale| EFE[Expected Free Energy]
        T --> |Modulate| EP[Epistemic Value]
        T --> |Weight| PR[Pragmatic Value]
    end
    
    subgraph Decision Process
        EFE --> |Transform| S[Softmax]
        EP --> |Guide| EX[Exploration]
        PR --> |Drive| OP[Optimization]
        
        S --> |Generate| P[Policy]
        EX --> |Inform| P
        OP --> |Constrain| P
    end
    
    subgraph Outcome
        P --> |Execute| A[Action]
        A --> |Update| B[Beliefs]
        B --> |Feedback| T
    end
    
    classDef param fill:#f9f,stroke:#333,stroke-width:2px
    classDef process fill:#bbf,stroke:#333,stroke-width:2px
    classDef outcome fill:#bfb,stroke:#333,stroke-width:2px
    
    class T param
    class EFE,EP,PR,S,EX,OP process
    class P,A,B outcome
```

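A hedged sketch of how γ enters once the expected free energy is decomposed into pragmatic and epistemic terms (the decomposition is the standard active-inference reading; the numbers are illustrative, and the separate per-term modulation shown in the diagram would simply give each term its own weight):

```python
import numpy as np

# Hypothetical per-policy values (illustrative numbers only)
pragmatic_value = np.array([1.0, 2.0, 0.5, 1.5])   # expected log-preference satisfaction
epistemic_value = np.array([0.8, 0.2, 1.2, 0.4])   # expected information gain

# Expected free energy is low when a policy is both preferred and informative
G = -(pragmatic_value + epistemic_value)

# The precision gamma then sets how strongly these differences drive selection
p = compute_policy_probabilities(G, temperature=2.0)
print(np.round(p, 3))
```
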
Enhanced Relationships

Core Dependencies

Control Mechanisms

Analysis Tools

Implementation Aspects

Theoretical Foundations