allenact.algorithms.onpolicy_sync.losses.ppo#
Defining the PPO loss for actor-critic type models.
PPO#
class PPO(AbstractActorCriticLoss)
Implementation of the Proximal Policy Optimization loss.
Attributes
clip_param: The clipping parameter to use.
value_loss_coef: Weight of the value loss.
entropy_coef: Weight of the entropy (encouraging) loss.
use_clipped_value_loss: Whether or not to also clip the value loss.
clip_decay: Callable for clip param decay factor (function of the current number of steps).
entropy_method_name: Name of Distr's entropy method. Default is entropy, but we might use conditional_entropy for SequentialDistr.
show_ratios: If True, adds tracking for the PPO ratio (linear, clamped, and used) in each epoch to be logged by the engine.
normalize_advantage: Whether or not to use normalized advantage. Default is True.
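For orientation, these coefficients correspond to the terms of the standard clipped PPO objective (a sketch following Schulman et al., 2017; the symbols ε, c₁, and c₂ below stand in for clip_param, value_loss_coef, and entropy_coef, and the module's exact implementation may differ in details such as advantage normalization):

$$
L_t(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big] \;-\; c_1\, L_t^{\mathrm{VF}}(\theta) \;+\; c_2\, \mathcal{H}\big[\pi_\theta\big](s_t),
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
$$

where the objective is maximized (i.e., its negative is used as the training loss).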
PPO.__init__#
| __init__(clip_param: float, value_loss_coef: float, entropy_coef: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, entropy_method_name: str = "entropy", normalize_advantage: bool = True, show_ratios: bool = False, *args, **kwargs)
Initializer.
See the class documentation for parameter definitions.
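A minimal usage sketch of constructing this loss; the hyperparameter values and the linear decay schedule below are illustrative rather than defaults from this module:

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPO

TOTAL_STEPS = 1_000_000  # hypothetical training budget

# `clip_decay` receives the current number of steps and returns a
# multiplicative decay factor applied to `clip_param`.
ppo_loss = PPO(
    clip_param=0.1,
    value_loss_coef=0.5,
    entropy_coef=0.01,
    use_clipped_value_loss=True,
    clip_decay=lambda step: max(0.0, 1.0 - step / TOTAL_STEPS),
    entropy_method_name="entropy",
    normalize_advantage=True,
    show_ratios=False,
)
```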
PPOValue#
class PPOValue(AbstractActorCriticLoss)
Implementation of the value-loss component of the Proximal Policy Optimization loss.
Attributes
clip_param: The clipping parameter to use.
use_clipped_value_loss: Whether or not to also clip the value loss.
PPOValue.__init__#
| __init__(clip_param: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, *args, **kwargs)
Initializer.
See the class documentation for parameter definitions.
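A minimal sketch of constructing the value-only loss from the same module, e.g. when only the critic is being updated; the clip value is illustrative:

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPOValue

# Clipped value loss only; no policy or entropy terms.
value_loss = PPOValue(
    clip_param=0.1,
    use_clipped_value_loss=True,
)
```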