TRPO "underflow encountered in multiply" #59

While running a TRPO training run, after a random amount of time (anywhere from 15 seconds to 1 minute), it crashes with the following error:
```
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply
```

I'm using the most recent version, 2.9.0, with Python 3.7.5.
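
For context, NumPy ignores floating-point underflow by default, so a `FloatingPointError` like the one above only appears when error handling has been switched to `raise` somewhere (for example via `np.seterr(all='raise')`); where that happens in this setup is not known from the traceback. The sketch below reproduces the same kind of failure on the arithmetic from the `mpi_adam.py` line and shows that relaxing only the underflow setting lets the step flush to zero. The function names and the input values are hypothetical, chosen just to force an underflow in float32; this is not a patch to stable-baselines.

```python
import numpy as np


def adam_step(exp_avg, exp_avg_sq, step_size, epsilon=1e-8):
    """Same arithmetic as the update line shown in the traceback."""
    return (-step_size) * exp_avg / (np.sqrt(exp_avg_sq) + epsilon)


# Hypothetical values: the product 1e-20 * 1e-30 is too small for float32.
exp_avg = np.array([1e-30], dtype=np.float32)
exp_avg_sq = np.array([1e-4], dtype=np.float32)
step_size = np.float32(1e-20)


def raises_on_underflow():
    """Return True if the step raises when underflow is set to 'raise'."""
    with np.errstate(under="raise"):
        try:
            adam_step(exp_avg, exp_avg_sq, step_size)
        except FloatingPointError:
            return True
    return False


def tolerant_step():
    """Compute the step with underflow ignored; tiny results flush to zero."""
    with np.errstate(under="ignore"):
        return adam_step(exp_avg, exp_avg_sq, step_size)


if __name__ == "__main__":
    print("raises with under='raise':", raises_on_underflow())
    print("step with under='ignore':", tolerant_step())
```

Under these assumptions, wrapping the training call in `with np.errstate(under="ignore"):` (or keeping `np.seterr(all='raise')` but adding `under='ignore'`) would keep overflow, divide-by-zero, and invalid-value checks active while letting vanishingly small Adam steps round to zero.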
