allenact.algorithms.onpolicy_sync.losses.ppo#
Defining the PPO loss for actor-critic type models.
PPO#
class PPO(AbstractActorCriticLoss)
Implementation of the Proximal Policy Optimization loss.
Attributes
clip_param: The clipping parameter to use.
value_loss_coef: Weight of the value loss.
entropy_coef: Weight of the entropy (encouraging) loss.
use_clipped_value_loss: Whether or not to also clip the value loss.
clip_decay: Callable for clip param decay factor (function of the current number of steps).
entropy_method_name: Name of Distr's entropy method. Default is entropy, but we might use conditional_entropy for SequentialDistr.
show_ratios: If True, adds tracking for the PPO ratio (linear, clamped, and used) in each epoch to be logged by the engine.
normalize_advantage: Whether or not to use normalized advantage. Default is True.
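For orientation, these coefficients correspond to the terms of the standard clipped PPO objective (a sketch following Schulman et al., 2017; the symbols ε, c₁, and c₂ below stand in for clip_param, value_loss_coef, and entropy_coef, and the module's exact implementation may differ in details such as advantage normalization):

$$
L_t(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big] \;-\; c_1\, L_t^{\mathrm{VF}}(\theta) \;+\; c_2\, \mathcal{H}\big[\pi_\theta\big](s_t),
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
$$

where the objective is maximized (i.e., its negative is used as the training loss).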
PPO.__init__#
| __init__(clip_param: float, value_loss_coef: float, entropy_coef: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, entropy_method_name: str = "entropy", normalize_advantage: bool = True, show_ratios: bool = False, *args, **kwargs)
Initializer.
See the class documentation for parameter definitions.
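A minimal usage sketch of constructing this loss; the hyperparameter values and the linear decay schedule below are illustrative rather than defaults from this module:

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPO

TOTAL_STEPS = 1_000_000  # hypothetical training budget

# `clip_decay` receives the current number of steps and returns a
# multiplicative decay factor applied to `clip_param`.
ppo_loss = PPO(
    clip_param=0.1,
    value_loss_coef=0.5,
    entropy_coef=0.01,
    use_clipped_value_loss=True,
    clip_decay=lambda step: max(0.0, 1.0 - step / TOTAL_STEPS),
    entropy_method_name="entropy",
    normalize_advantage=True,
    show_ratios=False,
)
```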
PPOValue#
class PPOValue(AbstractActorCriticLoss)
Implementation of the value-loss component of the Proximal Policy Optimization loss.
Attributes
clip_param: The clipping parameter to use.
use_clipped_value_loss: Whether or not to also clip the value loss.
PPOValue.__init__#
| __init__(clip_param: float, use_clipped_value_loss=True, clip_decay: Optional[Callable[[int], float]] = None, *args, **kwargs)
Initializer.
See the class documentation for parameter definitions.
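A minimal sketch of constructing the value-only loss from the same module, e.g. when only the critic is being updated; the clip value is illustrative:

```python
from allenact.algorithms.onpolicy_sync.losses.ppo import PPOValue

# Clipped value loss only; no policy or entropy terms.
value_loss = PPOValue(
    clip_param=0.1,
    use_clipped_value_loss=True,
)
```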