Illustration of a PPO agent applied to a farming environment from farm-gym

### Imports:

[1]:
# classical libraries
import numpy as np
import pandas as pd
import seaborn as sns

# farm-gym pre-made environments
import farmgym_games

# RL library
from rlberry.agents.torch import PPOAgent
from rlberry.manager import AgentManager, evaluate_agents, plot_writer_data
from rlberry.agents.torch.utils.training import model_factory_from_env
from rlberry.envs import gym_make

### Settings:

We’ll use the Farm1 environment.

[2]:
# rlberry is gym v0.21 compatible, hence the "OldV21" id.
# Use {"id": "Farm1-v0"} for gym v0.26 compatibility.
env_ctor, env_kwargs = gym_make, {"id": "OldV21Farm1-v0"}

We use two hidden layers of 256 units each (\(256\times 256\)) for both the value network and the policy network of PPO.

[3]:

policy_configs = {
    "type": "MultiLayerPerceptron",  # A network architecture
    "layer_sizes": (256, 256),  # Network dimensions
    "reshape": False,
    "is_policy": True,
}

value_configs = {
    "type": "MultiLayerPerceptron",
    "layer_sizes": (256, 256),
    "reshape": False,
    "out_size": 1,
}
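To make the `"layer_sizes": (256, 256)` setting concrete, the sketch below lists the linear layers such a configuration would produce and counts their parameters. It assumes that the multilayer perceptron stacks one fully connected layer per entry of `layer_sizes` and ends with an output layer. The helpers `mlp_layer_shapes` and `count_parameters` are hypothetical names, and `obs_dim` and `n_actions` are placeholder values, not the actual dimensions of Farm1.

```python
# Sketch only: obs_dim and n_actions are hypothetical placeholders,
# not the real dimensions of the Farm1 environment.


def mlp_layer_shapes(in_size, layer_sizes, out_size):
    """Return the (fan_in, fan_out) pair of every linear layer."""
    sizes = [in_size, *layer_sizes, out_size]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    """Count weights plus biases of a stack of fully connected layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


obs_dim, n_actions = 10, 4  # hypothetical values, for illustration only

# Policy network: one output per action.
policy_shapes = mlp_layer_shapes(obs_dim, (256, 256), n_actions)
# Value network: a single scalar output, as in "out_size": 1.
value_shapes = mlp_layer_shapes(obs_dim, (256, 256), 1)

print(policy_shapes)
print(count_parameters(policy_shapes), count_parameters(value_shapes))
```

With these placeholder sizes, almost all parameters sit in the 256-to-256 hidden layer, so the two networks end up nearly the same size whatever the true observation and action dimensions are.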

### Agent code:

We use rlberry’s PPOAgent. Note that an episode lasts at most 365 days, which helps us fix some of the parameters.
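The arithmetic behind the settings in the next cell (`n_steps=5 * 365`, `batch_size=365`, `fit_budget=5e5`) can be sketched as follows. This assumes that `n_steps` is the number of transitions collected before each PPO update, that `batch_size` is the minibatch size, and that `fit_budget` is counted in environment steps. The constant `EPISODE_HORIZON` is a name introduced here for illustration.

```python
# Rough arithmetic only; the meaning given to each parameter is an assumption.
EPISODE_HORIZON = 365  # maximum episode length, in days

n_steps = 5 * EPISODE_HORIZON  # transitions collected before each update
batch_size = EPISODE_HORIZON   # transitions per minibatch
fit_budget = int(5e5)          # total training budget, in environment steps

# Each rollout spans the length of five full-horizon episodes.
minibatches_per_pass = n_steps // batch_size

# Number of complete rollouts that fit in the training budget.
full_rollouts = fit_budget // n_steps

print(n_steps, minibatches_per_pass, full_rollouts)
```

Under these assumptions, each update sees 1825 transitions split into 5 minibatches, and the budget allows 273 complete rollouts.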

[4]:
manager = AgentManager(
        PPOAgent,
        (env_ctor, env_kwargs),
        agent_name="PPOAgent",
        init_kwargs=dict(
            policy_net_fn=model_factory_from_env,
            policy_net_kwargs=policy_configs,
            value_net_fn=model_factory_from_env,
            value_net_kwargs=value_configs,
            learning_rate=9e-5,
            n_steps=5 * 365,
            batch_size=365,
            eps_clip=0.2,
        ),
        fit_budget=5e5,
        eval_kwargs=dict(eval_horizon=365),
        n_fit=1,
        seed=42,  # Important: farm-gym is very stochastic; for some seeds PPO does not train and the final reward is 0!
        output_dir="ppo1_results", # results/trained agents are kept in this directory
    )
manager.fit()
[INFO] 16:46: Running AgentManager fit() for PPOAgent with n_fit = 1 and max_workers = None.
/home/frost/.local/lib/python3.10/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/home/frost/.local/lib/python3.10/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:187: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:195: UserWarning: WARN: The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'numpy.ndarray'>`
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:219: DeprecationWarning: WARN: Core environment is written in old step API which returns one bool instead of two. It is recommended to rewrite the environment with new step API. 
  logger.deprecation(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:141: UserWarning: WARN: The obs returned by the `step()` method was expecting numpy array dtype to be float32, actual type: float64
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:165: UserWarning: WARN: The obs returned by the `step()` method is not within the observation space.
  logger.warn(f"{pre} is not within the observation space.")
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 2209 | episode_rewards = 134.52428322598146 | total_episodes = 22 | fit/surrogate_loss = -17.20841407775879 | fit/entropy_loss = 1.3652242422103882 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 4267 | episode_rewards = 180.22205261986448 | total_episodes = 45 | fit/surrogate_loss = 3.766744375228882 | fit/entropy_loss = 1.2915725708007812 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 6460 | episode_rewards = -174.0 | total_episodes = 68 | fit/surrogate_loss = 1.7429035902023315 | fit/entropy_loss = 1.4351849555969238 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 8671 | episode_rewards = 0.0 | total_episodes = 85 | fit/surrogate_loss = -4.726207256317139 | fit/entropy_loss = 1.5167763233184814 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 10697 | episode_rewards = -62.0 | total_episodes = 101 | fit/surrogate_loss = 7.040808200836182 | fit/entropy_loss = 0.9549528956413269 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 12417 | episode_rewards = -16.0 | total_episodes = 118 | fit/surrogate_loss = 2.217423915863037 | fit/entropy_loss = 1.255665898323059 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 14796 | episode_rewards = 333.7999433709374 | total_episodes = 137 | fit/surrogate_loss = -4.331498622894287 | fit/entropy_loss = 1.1084470748901367 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 16895 | episode_rewards = -122.0 | total_episodes = 154 | fit/surrogate_loss = -1.8878077268600464 | fit/entropy_loss = 0.8683847784996033 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 18919 | episode_rewards = 0.0 | total_episodes = 167 | fit/surrogate_loss = -6.60181188583374 | fit/entropy_loss = 0.9932242035865784 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 20920 | episode_rewards = 0.0 | total_episodes = 185 | fit/surrogate_loss = -1.0884056091308594 | fit/entropy_loss = 0.9581738710403442 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 22962 | episode_rewards = -160.0 | total_episodes = 201 | fit/surrogate_loss = -6.423931121826172 | fit/entropy_loss = 0.916784405708313 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 25143 | episode_rewards = 0.0 | total_episodes = 214 | fit/surrogate_loss = 2.195204019546509 | fit/entropy_loss = 0.8561437129974365 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 27160 | episode_rewards = -32.0 | total_episodes = 228 | fit/surrogate_loss = -3.6802823543548584 | fit/entropy_loss = 0.576086699962616 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 28953 | episode_rewards = 0.0 | total_episodes = 244 | fit/surrogate_loss = -2.6257214546203613 | fit/entropy_loss = 0.5261974334716797 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 31158 | episode_rewards = -36.0 | total_episodes = 263 | fit/surrogate_loss = 0.9804189205169678 | fit/entropy_loss = 0.8518242239952087 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 33170 | episode_rewards = 273.2909831213743 | total_episodes = 280 | fit/surrogate_loss = 0.08106783777475357 | fit/entropy_loss = 0.7627907395362854 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 35126 | episode_rewards = -14.0 | total_episodes = 301 | fit/surrogate_loss = 4.697272777557373 | fit/entropy_loss = 1.3077625036239624 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 37032 | episode_rewards = 0.0 | total_episodes = 316 | fit/surrogate_loss = -3.589635133743286 | fit/entropy_loss = 0.7756088376045227 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 39019 | episode_rewards = 123.06599629645945 | total_episodes = 337 | fit/surrogate_loss = 1.0958138704299927 | fit/entropy_loss = 0.9637163281440735 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 41039 | episode_rewards = 186.96929667917848 | total_episodes = 359 | fit/surrogate_loss = 2.217855930328369 | fit/entropy_loss = 1.3185549974441528 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 42864 | episode_rewards = 0.0 | total_episodes = 376 | fit/surrogate_loss = 3.6343650817871094 | fit/entropy_loss = 0.9253543615341187 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 44901 | episode_rewards = 0.0 | total_episodes = 395 | fit/surrogate_loss = 0.038416508585214615 | fit/entropy_loss = 0.9682783484458923 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 46947 | episode_rewards = 0.0 | total_episodes = 412 | fit/surrogate_loss = 0.11422909051179886 | fit/entropy_loss = 0.6158499121665955 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 48993 | episode_rewards = 0.0 | total_episodes = 431 | fit/surrogate_loss = -3.7807295322418213 | fit/entropy_loss = 0.724868893623352 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 50966 | episode_rewards = 0.0 | total_episodes = 450 | fit/surrogate_loss = -0.5481564998626709 | fit/entropy_loss = 0.5342087149620056 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 52844 | episode_rewards = 0.0 | total_episodes = 467 | fit/surrogate_loss = 4.418668270111084 | fit/entropy_loss = 0.8343883752822876 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 54957 | episode_rewards = 0.0 | total_episodes = 486 | fit/surrogate_loss = -2.375662088394165 | fit/entropy_loss = 0.8427497148513794 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 57007 | episode_rewards = 16.0 | total_episodes = 509 | fit/surrogate_loss = 3.9137234687805176 | fit/entropy_loss = 0.9979891180992126 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 59076 | episode_rewards = 0.0 | total_episodes = 530 | fit/surrogate_loss = -0.2222403734922409 | fit/entropy_loss = 1.0436657667160034 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 61089 | episode_rewards = 0.0 | total_episodes = 552 | fit/surrogate_loss = -3.2849655151367188 | fit/entropy_loss = 1.0299031734466553 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 63029 | episode_rewards = 0.0 | total_episodes = 575 | fit/surrogate_loss = 3.238835334777832 | fit/entropy_loss = 1.221884846687317 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 64886 | episode_rewards = 0.0 | total_episodes = 597 | fit/surrogate_loss = -0.7216285467147827 | fit/entropy_loss = 1.2553671598434448 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 66934 | episode_rewards = 112.86767683173309 | total_episodes = 619 | fit/surrogate_loss = 2.7049076557159424 | fit/entropy_loss = 1.272657871246338 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 68898 | episode_rewards = 18.0 | total_episodes = 641 | fit/surrogate_loss = 4.479696750640869 | fit/entropy_loss = 1.2921851873397827 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 70786 | episode_rewards = 139.0729117340091 | total_episodes = 661 | fit/surrogate_loss = -0.4576759934425354 | fit/entropy_loss = 1.2917993068695068 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 72621 | episode_rewards = -4.0 | total_episodes = 682 | fit/surrogate_loss = -0.2897544205188751 | fit/entropy_loss = 1.2893353700637817 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 74505 | episode_rewards = 0.0 | total_episodes = 704 | fit/surrogate_loss = 3.921590805053711 | fit/entropy_loss = 1.2924174070358276 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 76314 | episode_rewards = 0.0 | total_episodes = 724 | fit/surrogate_loss = 0.5616796016693115 | fit/entropy_loss = 1.3364989757537842 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 78255 | episode_rewards = 0.0 | total_episodes = 746 | fit/surrogate_loss = 1.1299254894256592 | fit/entropy_loss = 1.2764776945114136 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 80013 | episode_rewards = 266.94249821150834 | total_episodes = 763 | fit/surrogate_loss = 4.023199558258057 | fit/entropy_loss = 1.2256656885147095 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 81715 | episode_rewards = 191.68010878323798 | total_episodes = 780 | fit/surrogate_loss = 8.112669944763184 | fit/entropy_loss = 1.2419995069503784 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 83590 | episode_rewards = 276.3426506341879 | total_episodes = 800 | fit/surrogate_loss = -3.397995948791504 | fit/entropy_loss = 1.180037498474121 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 85632 | episode_rewards = 201.806702212911 | total_episodes = 822 | fit/surrogate_loss = 11.917405128479004 | fit/entropy_loss = 1.1466782093048096 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 87484 | episode_rewards = 0.0 | total_episodes = 842 | fit/surrogate_loss = -4.252918243408203 | fit/entropy_loss = 1.2645856142044067 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 89279 | episode_rewards = 190.76893492535956 | total_episodes = 862 | fit/surrogate_loss = -0.5222299695014954 | fit/entropy_loss = 1.1454248428344727 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 91222 | episode_rewards = 259.8129481494122 | total_episodes = 882 | fit/surrogate_loss = -3.2302627563476562 | fit/entropy_loss = 1.1970560550689697 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 93047 | episode_rewards = 275.51577814466606 | total_episodes = 902 | fit/surrogate_loss = 8.772000312805176 | fit/entropy_loss = 1.1550569534301758 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 95024 | episode_rewards = 275.19492424250683 | total_episodes = 922 | fit/surrogate_loss = -2.4215118885040283 | fit/entropy_loss = 1.0462547540664673 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 96894 | episode_rewards = 0.0 | total_episodes = 942 | fit/surrogate_loss = 0.17885783314704895 | fit/entropy_loss = 1.0427830219268799 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 98867 | episode_rewards = 256.3459135016439 | total_episodes = 963 | fit/surrogate_loss = -5.808422565460205 | fit/entropy_loss = 0.9922869801521301 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 100767 | episode_rewards = 261.8378148045977 | total_episodes = 982 | fit/surrogate_loss = -0.07664193212985992 | fit/entropy_loss = 0.9912879467010498 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 102705 | episode_rewards = 250.53306513006976 | total_episodes = 1004 | fit/surrogate_loss = -5.870321273803711 | fit/entropy_loss = 1.03829824924469 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 104646 | episode_rewards = 31.12396309603301 | total_episodes = 1023 | fit/surrogate_loss = 2.2727718353271484 | fit/entropy_loss = 0.9626011848449707 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 106558 | episode_rewards = 0.0 | total_episodes = 1043 | fit/surrogate_loss = 2.6146774291992188 | fit/entropy_loss = 0.8991783857345581 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 108486 | episode_rewards = 197.42062229739346 | total_episodes = 1063 | fit/surrogate_loss = 5.396429538726807 | fit/entropy_loss = 0.8701432943344116 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 110446 | episode_rewards = 293.10522661518553 | total_episodes = 1083 | fit/surrogate_loss = -4.111895561218262 | fit/entropy_loss = 0.925383448600769 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 112293 | episode_rewards = 0.0 | total_episodes = 1103 | fit/surrogate_loss = 2.3448612689971924 | fit/entropy_loss = 0.9377964735031128 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 114293 | episode_rewards = 270.8051013314283 | total_episodes = 1123 | fit/surrogate_loss = -9.176980972290039 | fit/entropy_loss = 0.9452497363090515 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 116258 | episode_rewards = 283.65709798467174 | total_episodes = 1142 | fit/surrogate_loss = 11.655402183532715 | fit/entropy_loss = 0.8301343321800232 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 118209 | episode_rewards = 0.0 | total_episodes = 1163 | fit/surrogate_loss = -5.4693922996521 | fit/entropy_loss = 0.913402795791626 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 120071 | episode_rewards = 302.01518974606034 | total_episodes = 1183 | fit/surrogate_loss = 7.949192047119141 | fit/entropy_loss = 0.9001861810684204 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 122083 | episode_rewards = 271.50668054340275 | total_episodes = 1204 | fit/surrogate_loss = -0.3852759897708893 | fit/entropy_loss = 0.8573261499404907 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 124048 | episode_rewards = 260.3091520178151 | total_episodes = 1223 | fit/surrogate_loss = -10.70649528503418 | fit/entropy_loss = 0.9148780107498169 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 125843 | episode_rewards = 274.7459547222016 | total_episodes = 1245 | fit/surrogate_loss = -2.958625555038452 | fit/entropy_loss = 0.8772678971290588 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 127866 | episode_rewards = 245.85348013466609 | total_episodes = 1267 | fit/surrogate_loss = -3.8383290767669678 | fit/entropy_loss = 1.009056806564331 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 129823 | episode_rewards = 240.46847748136793 | total_episodes = 1287 | fit/surrogate_loss = -5.578691482543945 | fit/entropy_loss = 1.0033317804336548 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 131699 | episode_rewards = 283.56211757181813 | total_episodes = 1306 | fit/surrogate_loss = 10.772649765014648 | fit/entropy_loss = 1.0038061141967773 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 133625 | episode_rewards = 0.0 | total_episodes = 1326 | fit/surrogate_loss = -2.2552425861358643 | fit/entropy_loss = 1.0329627990722656 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 135498 | episode_rewards = 238.22589699089986 | total_episodes = 1349 | fit/surrogate_loss = 1.1353161334991455 | fit/entropy_loss = 1.0362646579742432 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 137453 | episode_rewards = 273.6662891054696 | total_episodes = 1370 | fit/surrogate_loss = -7.482893943786621 | fit/entropy_loss = 1.1330825090408325 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 139374 | episode_rewards = -10.0 | total_episodes = 1389 | fit/surrogate_loss = 4.843024253845215 | fit/entropy_loss = 1.079302430152893 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 141359 | episode_rewards = 195.0093446034527 | total_episodes = 1409 | fit/surrogate_loss = -0.3425341248512268 | fit/entropy_loss = 1.1312304735183716 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 143229 | episode_rewards = 0.0 | total_episodes = 1428 | fit/surrogate_loss = -3.96565318107605 | fit/entropy_loss = 1.0328818559646606 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 145122 | episode_rewards = 109.29218472669729 | total_episodes = 1448 | fit/surrogate_loss = 9.060868263244629 | fit/entropy_loss = 1.0830814838409424 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 147111 | episode_rewards = -22.0 | total_episodes = 1471 | fit/surrogate_loss = -2.5419631004333496 | fit/entropy_loss = 1.1042369604110718 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 149006 | episode_rewards = 279.4907295573699 | total_episodes = 1491 | fit/surrogate_loss = 7.033303260803223 | fit/entropy_loss = 0.9747352004051208 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 150915 | episode_rewards = 267.20894531089453 | total_episodes = 1510 | fit/surrogate_loss = 3.960771322250366 | fit/entropy_loss = 0.9919326305389404 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 152908 | episode_rewards = 256.2250103305771 | total_episodes = 1532 | fit/surrogate_loss = -0.007613801397383213 | fit/entropy_loss = 1.0438289642333984 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 154833 | episode_rewards = 245.52128312357308 | total_episodes = 1551 | fit/surrogate_loss = -10.362034797668457 | fit/entropy_loss = 1.1016902923583984 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 156778 | episode_rewards = 278.79854712757714 | total_episodes = 1572 | fit/surrogate_loss = 6.5567522048950195 | fit/entropy_loss = 1.0572316646575928 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 158651 | episode_rewards = 215.68329328843254 | total_episodes = 1591 | fit/surrogate_loss = 2.0005228519439697 | fit/entropy_loss = 1.053226351737976 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 160549 | episode_rewards = 0.0 | total_episodes = 1612 | fit/surrogate_loss = 4.874629974365234 | fit/entropy_loss = 1.0698308944702148 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 162579 | episode_rewards = 273.16079349912553 | total_episodes = 1632 | fit/surrogate_loss = -4.574504375457764 | fit/entropy_loss = 1.1021976470947266 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 164542 | episode_rewards = 312.9760725954338 | total_episodes = 1654 | fit/surrogate_loss = -9.858379364013672 | fit/entropy_loss = 1.090998888015747 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 166422 | episode_rewards = 267.55608565988587 | total_episodes = 1673 | fit/surrogate_loss = 7.012547016143799 | fit/entropy_loss = 1.1158734560012817 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 168419 | episode_rewards = 274.3685780078847 | total_episodes = 1693 | fit/surrogate_loss = -2.038301944732666 | fit/entropy_loss = 1.1137681007385254 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 170362 | episode_rewards = 228.2306574275855 | total_episodes = 1713 | fit/surrogate_loss = 1.7389625310897827 | fit/entropy_loss = 1.0956854820251465 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 172316 | episode_rewards = 241.30185727204105 | total_episodes = 1732 | fit/surrogate_loss = 2.5269970893859863 | fit/entropy_loss = 1.1481139659881592 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 174196 | episode_rewards = -12.0 | total_episodes = 1750 | fit/surrogate_loss = 1.630097508430481 | fit/entropy_loss = 1.1351419687271118 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 176069 | episode_rewards = 189.3297826228422 | total_episodes = 1771 | fit/surrogate_loss = -3.8621702194213867 | fit/entropy_loss = 1.1028456687927246 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 178013 | episode_rewards = 268.1629180735788 | total_episodes = 1789 | fit/surrogate_loss = 5.526572227478027 | fit/entropy_loss = 1.1215077638626099 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 179927 | episode_rewards = 0.0 | total_episodes = 1808 | fit/surrogate_loss = 1.0089154243469238 | fit/entropy_loss = 1.1004434823989868 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 181719 | episode_rewards = 239.03986857129473 | total_episodes = 1826 | fit/surrogate_loss = 1.3062998056411743 | fit/entropy_loss = 1.0463488101959229 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 183621 | episode_rewards = 265.65874300593913 | total_episodes = 1845 | fit/surrogate_loss = -5.852258682250977 | fit/entropy_loss = 1.133949637413025 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 185565 | episode_rewards = 67.80445637614709 | total_episodes = 1865 | fit/surrogate_loss = 9.158913612365723 | fit/entropy_loss = 1.1922465562820435 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 187451 | episode_rewards = 282.8416307910926 | total_episodes = 1883 | fit/surrogate_loss = 2.0727455615997314 | fit/entropy_loss = 1.1585074663162231 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 189332 | episode_rewards = 205.79572679668 | total_episodes = 1902 | fit/surrogate_loss = 2.384558916091919 | fit/entropy_loss = 1.1495928764343262 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 191211 | episode_rewards = 0.0 | total_episodes = 1922 | fit/surrogate_loss = 0.5820152163505554 | fit/entropy_loss = 1.1726787090301514 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 193215 | episode_rewards = 271.4559898725673 | total_episodes = 1943 | fit/surrogate_loss = -2.57812237739563 | fit/entropy_loss = 1.0842591524124146 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 195103 | episode_rewards = 0.0 | total_episodes = 1962 | fit/surrogate_loss = 1.2391142845153809 | fit/entropy_loss = 1.194376826286316 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 196935 | episode_rewards = 171.46015329686514 | total_episodes = 1981 | fit/surrogate_loss = 1.2422070503234863 | fit/entropy_loss = 1.1019363403320312 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 198911 | episode_rewards = 170.06682650489532 | total_episodes = 2000 | fit/surrogate_loss = 0.9598288536071777 | fit/entropy_loss = 1.1226402521133423 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 200645 | episode_rewards = 16.0 | total_episodes = 2018 | fit/surrogate_loss = -3.168940305709839 | fit/entropy_loss = 1.0989958047866821 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 202747 | episode_rewards = 284.3230790190264 | total_episodes = 2037 | fit/surrogate_loss = 12.86429214477539 | fit/entropy_loss = 1.0760250091552734 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 204687 | episode_rewards = 110.60583113983654 | total_episodes = 2056 | fit/surrogate_loss = -10.055243492126465 | fit/entropy_loss = 1.0619055032730103 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 206614 | episode_rewards = 233.9282157160054 | total_episodes = 2075 | fit/surrogate_loss = -2.6406538486480713 | fit/entropy_loss = 1.0568320751190186 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 208579 | episode_rewards = 278.2330085934469 | total_episodes = 2096 | fit/surrogate_loss = 7.980196475982666 | fit/entropy_loss = 1.0911810398101807 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 210508 | episode_rewards = 260.32258285052814 | total_episodes = 2115 | fit/surrogate_loss = 5.786709308624268 | fit/entropy_loss = 1.0824633836746216 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 212373 | episode_rewards = 218.04632652768822 | total_episodes = 2134 | fit/surrogate_loss = -8.352408409118652 | fit/entropy_loss = 1.0934839248657227 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 214329 | episode_rewards = 271.3465157977871 | total_episodes = 2154 | fit/surrogate_loss = -4.062629222869873 | fit/entropy_loss = 1.0969864130020142 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 216267 | episode_rewards = 221.73785563376867 | total_episodes = 2172 | fit/surrogate_loss = 5.777106761932373 | fit/entropy_loss = 1.024115800857544 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 218162 | episode_rewards = 216.0446248254139 | total_episodes = 2192 | fit/surrogate_loss = -3.4470815658569336 | fit/entropy_loss = 1.118396282196045 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 219938 | episode_rewards = 132.59513162358 | total_episodes = 2210 | fit/surrogate_loss = 3.124840021133423 | fit/entropy_loss = 1.1320301294326782 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 221842 | episode_rewards = 184.9587281019406 | total_episodes = 2229 | fit/surrogate_loss = -1.0659230947494507 | fit/entropy_loss = 1.031003475189209 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 223804 | episode_rewards = 0.0 | total_episodes = 2249 | fit/surrogate_loss = -2.4230310916900635 | fit/entropy_loss = 1.0751807689666748 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 225768 | episode_rewards = 244.20561467772313 | total_episodes = 2269 | fit/surrogate_loss = -3.263524055480957 | fit/entropy_loss = 1.0651490688323975 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 227599 | episode_rewards = 22.675146305186402 | total_episodes = 2287 | fit/surrogate_loss = 8.311609268188477 | fit/entropy_loss = 1.0777026414871216 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 229551 | episode_rewards = 0.0 | total_episodes = 2306 | fit/surrogate_loss = 2.6351635456085205 | fit/entropy_loss = 1.0248957872390747 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 231495 | episode_rewards = 285.339815879762 | total_episodes = 2326 | fit/surrogate_loss = 1.2256782054901123 | fit/entropy_loss = 1.0415252447128296 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 233411 | episode_rewards = 171.14700593599062 | total_episodes = 2346 | fit/surrogate_loss = -4.267335891723633 | fit/entropy_loss = 1.0866059064865112 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 235322 | episode_rewards = 270.56492642385905 | total_episodes = 2366 | fit/surrogate_loss = -3.5790865421295166 | fit/entropy_loss = 1.0592182874679565 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 237167 | episode_rewards = 215.45273470694696 | total_episodes = 2384 | fit/surrogate_loss = -7.527778625488281 | fit/entropy_loss = 1.069936752319336 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 239003 | episode_rewards = -20.0 | total_episodes = 2403 | fit/surrogate_loss = 5.44546365737915 | fit/entropy_loss = 1.1499285697937012 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 240983 | episode_rewards = 269.9111065055036 | total_episodes = 2422 | fit/surrogate_loss = 2.35046124458313 | fit/entropy_loss = 1.0309354066848755 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 242875 | episode_rewards = 318.0889483706464 | total_episodes = 2442 | fit/surrogate_loss = -4.4732279777526855 | fit/entropy_loss = 0.9553772211074829 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 244771 | episode_rewards = 104.58798503468927 | total_episodes = 2461 | fit/surrogate_loss = -1.5414801836013794 | fit/entropy_loss = 1.0260636806488037 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 246707 | episode_rewards = 0.0 | total_episodes = 2480 | fit/surrogate_loss = 3.820281982421875 | fit/entropy_loss = 1.0317150354385376 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 248705 | episode_rewards = 162.6760645376208 | total_episodes = 2500 | fit/surrogate_loss = 0.21251605451107025 | fit/entropy_loss = 1.0387043952941895 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 250589 | episode_rewards = 90.02615020038215 | total_episodes = 2519 | fit/surrogate_loss = 6.289228439331055 | fit/entropy_loss = 1.0504704713821411 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 252467 | episode_rewards = 214.89130183644414 | total_episodes = 2537 | fit/surrogate_loss = -6.966522693634033 | fit/entropy_loss = 1.026532530784607 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 254358 | episode_rewards = 308.76341924834617 | total_episodes = 2557 | fit/surrogate_loss = 12.154333114624023 | fit/entropy_loss = 1.0158194303512573 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 256267 | episode_rewards = 283.83298564334217 | total_episodes = 2576 | fit/surrogate_loss = 2.852245807647705 | fit/entropy_loss = 1.0434391498565674 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 258172 | episode_rewards = 84.42354537626872 | total_episodes = 2597 | fit/surrogate_loss = -9.431618690490723 | fit/entropy_loss = 1.0137786865234375 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 260093 | episode_rewards = 276.9943537693276 | total_episodes = 2618 | fit/surrogate_loss = 0.1441255807876587 | fit/entropy_loss = 0.963430643081665 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 262045 | episode_rewards = 241.20177508175692 | total_episodes = 2641 | fit/surrogate_loss = 0.2748934328556061 | fit/entropy_loss = 0.990574836730957 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 264026 | episode_rewards = 268.78084189097274 | total_episodes = 2663 | fit/surrogate_loss = 3.5319406986236572 | fit/entropy_loss = 1.0552963018417358 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 265932 | episode_rewards = 242.44137571994833 | total_episodes = 2686 | fit/surrogate_loss = -9.579716682434082 | fit/entropy_loss = 0.9850586652755737 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 267875 | episode_rewards = 245.39531372725043 | total_episodes = 2708 | fit/surrogate_loss = 1.6980942487716675 | fit/entropy_loss = 1.0098484754562378 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 269777 | episode_rewards = 205.25879645323425 | total_episodes = 2726 | fit/surrogate_loss = 6.197882175445557 | fit/entropy_loss = 0.9618978500366211 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 271704 | episode_rewards = 283.7851303751837 | total_episodes = 2747 | fit/surrogate_loss = 0.685973048210144 | fit/entropy_loss = 0.9942080974578857 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 273541 | episode_rewards = 291.6241996708527 | total_episodes = 2766 | fit/surrogate_loss = -1.1453349590301514 | fit/entropy_loss = 1.101591944694519 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 275517 | episode_rewards = 282.02527238406714 | total_episodes = 2786 | fit/surrogate_loss = -6.049954414367676 | fit/entropy_loss = 1.0179861783981323 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 277360 | episode_rewards = 16.0 | total_episodes = 2806 | fit/surrogate_loss = 11.253918647766113 | fit/entropy_loss = 1.049414038658142 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 279367 | episode_rewards = 281.1290978810243 | total_episodes = 2825 | fit/surrogate_loss = -2.6344029903411865 | fit/entropy_loss = 1.0347164869308472 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 281167 | episode_rewards = 0.0 | total_episodes = 2844 | fit/surrogate_loss = 1.1064000129699707 | fit/entropy_loss = 0.9371924996376038 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 283179 | episode_rewards = 250.78104080966858 | total_episodes = 2863 | fit/surrogate_loss = 12.849495887756348 | fit/entropy_loss = 1.0081018209457397 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 285144 | episode_rewards = 287.55088389857764 | total_episodes = 2883 | fit/surrogate_loss = -3.789557695388794 | fit/entropy_loss = 0.9457067847251892 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 287026 | episode_rewards = 254.9618615560583 | total_episodes = 2901 | fit/surrogate_loss = 0.6840089559555054 | fit/entropy_loss = 0.9657329320907593 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 288904 | episode_rewards = 239.0461878378266 | total_episodes = 2919 | fit/surrogate_loss = -0.8061223030090332 | fit/entropy_loss = 1.0255134105682373 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 290812 | episode_rewards = 232.09967697204786 | total_episodes = 2938 | fit/surrogate_loss = -1.9753848314285278 | fit/entropy_loss = 0.9870120286941528 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 292779 | episode_rewards = 293.1587391022088 | total_episodes = 2958 | fit/surrogate_loss = -3.4968020915985107 | fit/entropy_loss = 1.0315067768096924 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 294745 | episode_rewards = 186.26363476045046 | total_episodes = 2977 | fit/surrogate_loss = 7.592077255249023 | fit/entropy_loss = 1.0064972639083862 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 296645 | episode_rewards = 320.222622397055 | total_episodes = 2995 | fit/surrogate_loss = -1.5015491247177124 | fit/entropy_loss = 0.944429337978363 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 298664 | episode_rewards = 187.139482215965 | total_episodes = 3015 | fit/surrogate_loss = 3.7631969451904297 | fit/entropy_loss = 0.9813820123672485 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 300548 | episode_rewards = 328.55860880531134 | total_episodes = 3033 | fit/surrogate_loss = 1.5286048650741577 | fit/entropy_loss = 0.8878442049026489 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 302499 | episode_rewards = 190.13890473281583 | total_episodes = 3052 | fit/surrogate_loss = -2.894129991531372 | fit/entropy_loss = 0.9722768068313599 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 304394 | episode_rewards = 263.70690042441134 | total_episodes = 3071 | fit/surrogate_loss = 2.445435047149658 | fit/entropy_loss = 0.9777105450630188 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 306318 | episode_rewards = -36.0 | total_episodes = 3089 | fit/surrogate_loss = 3.277486801147461 | fit/entropy_loss = 0.9544088244438171 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 308290 | episode_rewards = 140.11722203309503 | total_episodes = 3108 | fit/surrogate_loss = -0.8213484287261963 | fit/entropy_loss = 0.925036609172821 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 310244 | episode_rewards = 291.42358804221857 | total_episodes = 3128 | fit/surrogate_loss = -0.5326124429702759 | fit/entropy_loss = 0.9467383027076721 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 312007 | episode_rewards = -2.0 | total_episodes = 3147 | fit/surrogate_loss = 1.535997748374939 | fit/entropy_loss = 0.8873388171195984 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 313877 | episode_rewards = 296.1952368101264 | total_episodes = 3167 | fit/surrogate_loss = -0.6225532293319702 | fit/entropy_loss = 0.8994657397270203 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 315845 | episode_rewards = 213.33218784048097 | total_episodes = 3186 | fit/surrogate_loss = 3.5437350273132324 | fit/entropy_loss = 0.8863654732704163 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 317720 | episode_rewards = 281.0779023134249 | total_episodes = 3205 | fit/surrogate_loss = -0.6437132954597473 | fit/entropy_loss = 0.8195433020591736 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 319621 | episode_rewards = 80.51307731884518 | total_episodes = 3223 | fit/surrogate_loss = 7.503363132476807 | fit/entropy_loss = 0.8818414807319641 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 321499 | episode_rewards = 263.8596860259897 | total_episodes = 3243 | fit/surrogate_loss = -13.773136138916016 | fit/entropy_loss = 0.8752599358558655 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 323401 | episode_rewards = 322.76747419685483 | total_episodes = 3262 | fit/surrogate_loss = 10.615867614746094 | fit/entropy_loss = 0.883058488368988 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 325272 | episode_rewards = 299.6771221698585 | total_episodes = 3280 | fit/surrogate_loss = 8.855615615844727 | fit/entropy_loss = 0.8485798835754395 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 327263 | episode_rewards = 277.88398165026155 | total_episodes = 3300 | fit/surrogate_loss = -7.257081985473633 | fit/entropy_loss = 0.8347086310386658 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 329167 | episode_rewards = 286.43851646392847 | total_episodes = 3320 | fit/surrogate_loss = 3.605395793914795 | fit/entropy_loss = 0.8551650643348694 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 331089 | episode_rewards = 132.27738500552374 | total_episodes = 3339 | fit/surrogate_loss = 8.149479866027832 | fit/entropy_loss = 0.8589387536048889 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 333046 | episode_rewards = 294.103572316817 | total_episodes = 3358 | fit/surrogate_loss = -14.957820892333984 | fit/entropy_loss = 0.9307618737220764 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 334946 | episode_rewards = 359.3958632660398 | total_episodes = 3378 | fit/surrogate_loss = 12.715492248535156 | fit/entropy_loss = 0.9235902428627014 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 336786 | episode_rewards = 312.18158983836196 | total_episodes = 3397 | fit/surrogate_loss = 3.759201765060425 | fit/entropy_loss = 0.9244174361228943 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 338763 | episode_rewards = 0.0 | total_episodes = 3417 | fit/surrogate_loss = -3.8107359409332275 | fit/entropy_loss = 1.0092012882232666 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 340757 | episode_rewards = 297.4810194857149 | total_episodes = 3439 | fit/surrogate_loss = -3.7150909900665283 | fit/entropy_loss = 0.9435456395149231 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 342703 | episode_rewards = 325.8883162072191 | total_episodes = 3458 | fit/surrogate_loss = 8.093222618103027 | fit/entropy_loss = 0.9791096448898315 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 344588 | episode_rewards = 325.63559877100266 | total_episodes = 3477 | fit/surrogate_loss = 2.5148143768310547 | fit/entropy_loss = 0.9678980708122253 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 346553 | episode_rewards = 264.9606710241308 | total_episodes = 3500 | fit/surrogate_loss = -6.9905500411987305 | fit/entropy_loss = 0.9897512793540955 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 348515 | episode_rewards = 0.0 | total_episodes = 3521 | fit/surrogate_loss = 3.7423617839813232 | fit/entropy_loss = 1.0184897184371948 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 350317 | episode_rewards = 98.06277900045299 | total_episodes = 3541 | fit/surrogate_loss = 1.9921005964279175 | fit/entropy_loss = 1.0237897634506226 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 352120 | episode_rewards = 151.4683927095074 | total_episodes = 3559 | fit/surrogate_loss = 1.9690021276474 | fit/entropy_loss = 1.0069987773895264 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 354120 | episode_rewards = 202.49230749307205 | total_episodes = 3579 | fit/surrogate_loss = -0.41003695130348206 | fit/entropy_loss = 0.9336974024772644 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 356081 | episode_rewards = 301.713127133677 | total_episodes = 3599 | fit/surrogate_loss = 0.08953665941953659 | fit/entropy_loss = 0.9974294304847717 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 358059 | episode_rewards = 0.0 | total_episodes = 3619 | fit/surrogate_loss = 1.450573444366455 | fit/entropy_loss = 1.018861174583435 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 359913 | episode_rewards = 0.0 | total_episodes = 3638 | fit/surrogate_loss = -4.925795078277588 | fit/entropy_loss = 0.966366171836853 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 361833 | episode_rewards = 32.18952697529554 | total_episodes = 3660 | fit/surrogate_loss = -0.42690742015838623 | fit/entropy_loss = 0.9651669859886169 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 363767 | episode_rewards = 226.74993878617212 | total_episodes = 3679 | fit/surrogate_loss = -3.410787582397461 | fit/entropy_loss = 1.0196833610534668 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 365648 | episode_rewards = 0.0 | total_episodes = 3699 | fit/surrogate_loss = -1.4755240678787231 | fit/entropy_loss = 1.0036985874176025 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 367506 | episode_rewards = 241.05188779928602 | total_episodes = 3720 | fit/surrogate_loss = 3.930341958999634 | fit/entropy_loss = 1.0016478300094604 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 369452 | episode_rewards = 284.35806858077245 | total_episodes = 3740 | fit/surrogate_loss = 10.060277938842773 | fit/entropy_loss = 0.9490146636962891 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 371410 | episode_rewards = 246.99344154863303 | total_episodes = 3762 | fit/surrogate_loss = -1.8628233671188354 | fit/entropy_loss = 0.9763739705085754 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 373356 | episode_rewards = 242.28637067749025 | total_episodes = 3781 | fit/surrogate_loss = -1.496168613433838 | fit/entropy_loss = 1.0062780380249023 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 375157 | episode_rewards = 292.2258261309967 | total_episodes = 3800 | fit/surrogate_loss = 4.843698978424072 | fit/entropy_loss = 1.0262356996536255 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 377121 | episode_rewards = 14.0 | total_episodes = 3821 | fit/surrogate_loss = -5.91718864440918 | fit/entropy_loss = 1.0802301168441772 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 379066 | episode_rewards = 218.7146904838953 | total_episodes = 3840 | fit/surrogate_loss = -3.436647415161133 | fit/entropy_loss = 1.0132417678833008 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 380992 | episode_rewards = 234.36918883589595 | total_episodes = 3860 | fit/surrogate_loss = 4.775086402893066 | fit/entropy_loss = 0.9952309131622314 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 382892 | episode_rewards = 284.07871555046415 | total_episodes = 3879 | fit/surrogate_loss = -5.776726245880127 | fit/entropy_loss = 1.0385886430740356 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 384869 | episode_rewards = 324.0644475643733 | total_episodes = 3899 | fit/surrogate_loss = 8.264321327209473 | fit/entropy_loss = 1.0189669132232666 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 386712 | episode_rewards = 15.0 | total_episodes = 3919 | fit/surrogate_loss = 5.558108329772949 | fit/entropy_loss = 0.9657159447669983 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 388612 | episode_rewards = 148.0068600827308 | total_episodes = 3938 | fit/surrogate_loss = -2.7608399391174316 | fit/entropy_loss = 0.9457671046257019 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 390497 | episode_rewards = 292.24682707125083 | total_episodes = 3958 | fit/surrogate_loss = 4.495976448059082 | fit/entropy_loss = 0.9412776827812195 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 392297 | episode_rewards = -12.0 | total_episodes = 3976 | fit/surrogate_loss = -6.394156455993652 | fit/entropy_loss = 0.8748922944068909 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 394354 | episode_rewards = 244.19843768782306 | total_episodes = 3994 | fit/surrogate_loss = 1.8958444595336914 | fit/entropy_loss = 0.9547006487846375 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 396337 | episode_rewards = 252.92778983197093 | total_episodes = 4013 | fit/surrogate_loss = 6.616912364959717 | fit/entropy_loss = 0.9458012580871582 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 398203 | episode_rewards = 286.96804348546357 | total_episodes = 4032 | fit/surrogate_loss = -8.990714073181152 | fit/entropy_loss = 0.9437640309333801 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 400107 | episode_rewards = 0.0 | total_episodes = 4050 | fit/surrogate_loss = 1.1266694068908691 | fit/entropy_loss = 0.9508050084114075 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 402072 | episode_rewards = 0.0 | total_episodes = 4069 | fit/surrogate_loss = 4.978913307189941 | fit/entropy_loss = 0.941914439201355 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 403943 | episode_rewards = 150.6610338132491 | total_episodes = 4087 | fit/surrogate_loss = -7.1340250968933105 | fit/entropy_loss = 1.004734992980957 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 405719 | episode_rewards = 286.1085754713148 | total_episodes = 4105 | fit/surrogate_loss = -3.3456854820251465 | fit/entropy_loss = 0.9755523204803467 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 407602 | episode_rewards = 0.0 | total_episodes = 4124 | fit/surrogate_loss = 2.1803932189941406 | fit/entropy_loss = 0.949565589427948 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 409551 | episode_rewards = 156.19653705120353 | total_episodes = 4142 | fit/surrogate_loss = 1.3640680313110352 | fit/entropy_loss = 0.9335667490959167 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 411380 | episode_rewards = 350.7678222447295 | total_episodes = 4160 | fit/surrogate_loss = 4.900801181793213 | fit/entropy_loss = 0.9382438659667969 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 413195 | episode_rewards = 285.3994499195645 | total_episodes = 4177 | fit/surrogate_loss = -2.779069662094116 | fit/entropy_loss = 0.9222322702407837 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 415163 | episode_rewards = 288.03400255951937 | total_episodes = 4197 | fit/surrogate_loss = 2.873145818710327 | fit/entropy_loss = 0.9784928560256958 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 417081 | episode_rewards = 271.8003946442427 | total_episodes = 4216 | fit/surrogate_loss = 0.8303982019424438 | fit/entropy_loss = 0.9632890820503235 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 419070 | episode_rewards = 274.3610669586073 | total_episodes = 4235 | fit/surrogate_loss = -0.3378711938858032 | fit/entropy_loss = 0.9332031011581421 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 420935 | episode_rewards = 285.698652336808 | total_episodes = 4255 | fit/surrogate_loss = -1.6555614471435547 | fit/entropy_loss = 0.928404688835144 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 422869 | episode_rewards = 321.4248764564302 | total_episodes = 4276 | fit/surrogate_loss = 2.801839828491211 | fit/entropy_loss = 0.9713084697723389 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 424801 | episode_rewards = 347.5848249806207 | total_episodes = 4298 | fit/surrogate_loss = -5.931520462036133 | fit/entropy_loss = 0.9765328764915466 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 426753 | episode_rewards = 235.22405365927392 | total_episodes = 4319 | fit/surrogate_loss = 4.456506252288818 | fit/entropy_loss = 0.9539737105369568 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 428589 | episode_rewards = 293.8601487388092 | total_episodes = 4339 | fit/surrogate_loss = -1.527997374534607 | fit/entropy_loss = 1.0273936986923218 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 430491 | episode_rewards = 238.40763901062212 | total_episodes = 4360 | fit/surrogate_loss = 5.263881683349609 | fit/entropy_loss = 0.9276028275489807 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 432489 | episode_rewards = 298.29358894142774 | total_episodes = 4380 | fit/surrogate_loss = 1.954353928565979 | fit/entropy_loss = 1.0460476875305176 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 434335 | episode_rewards = 249.3733569086184 | total_episodes = 4400 | fit/surrogate_loss = 1.5861694812774658 | fit/entropy_loss = 0.9824934601783752 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 436158 | episode_rewards = 238.96283738341737 | total_episodes = 4417 | fit/surrogate_loss = -0.17691625654697418 | fit/entropy_loss = 0.9926157593727112 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 438063 | episode_rewards = 260.36137471838583 | total_episodes = 4435 | fit/surrogate_loss = 3.9432201385498047 | fit/entropy_loss = 0.9440750479698181 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 440029 | episode_rewards = 281.4619657222258 | total_episodes = 4455 | fit/surrogate_loss = -11.234806060791016 | fit/entropy_loss = 0.9728662967681885 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 441900 | episode_rewards = 303.9872611084485 | total_episodes = 4473 | fit/surrogate_loss = 10.437104225158691 | fit/entropy_loss = 0.9096306562423706 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 443753 | episode_rewards = 315.03343422340197 | total_episodes = 4491 | fit/surrogate_loss = -4.030556678771973 | fit/entropy_loss = 0.9464774131774902 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 445673 | episode_rewards = 0.0 | total_episodes = 4511 | fit/surrogate_loss = -6.509860515594482 | fit/entropy_loss = 0.9715564250946045 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 447532 | episode_rewards = 291.69420461798654 | total_episodes = 4530 | fit/surrogate_loss = 4.072037696838379 | fit/entropy_loss = 0.9845868945121765 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 449527 | episode_rewards = 218.99857107201902 | total_episodes = 4549 | fit/surrogate_loss = -5.0280632972717285 | fit/entropy_loss = 0.8980158567428589 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 451450 | episode_rewards = 323.45601542226564 | total_episodes = 4569 | fit/surrogate_loss = 12.949789047241211 | fit/entropy_loss = 0.9183349609375 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 453436 | episode_rewards = 331.10574235011643 | total_episodes = 4588 | fit/surrogate_loss = -4.701037406921387 | fit/entropy_loss = 0.8513623476028442 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 455331 | episode_rewards = 298.16808547838895 | total_episodes = 4607 | fit/surrogate_loss = 3.3393399715423584 | fit/entropy_loss = 0.9371728301048279 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 457258 | episode_rewards = 305.20445123022546 | total_episodes = 4625 | fit/surrogate_loss = -1.4828935861587524 | fit/entropy_loss = 0.9189344644546509 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 459204 | episode_rewards = 296.97104745527434 | total_episodes = 4646 | fit/surrogate_loss = -6.455825328826904 | fit/entropy_loss = 0.8192564249038696 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 461093 | episode_rewards = 326.2481563297774 | total_episodes = 4665 | fit/surrogate_loss = 8.854265213012695 | fit/entropy_loss = 0.8241569399833679 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 462970 | episode_rewards = 310.69696450029136 | total_episodes = 4683 | fit/surrogate_loss = -1.0951286554336548 | fit/entropy_loss = 0.9279043674468994 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 464888 | episode_rewards = 0.0 | total_episodes = 4702 | fit/surrogate_loss = 7.324929714202881 | fit/entropy_loss = 0.9040438532829285 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 466803 | episode_rewards = 284.0750483620409 | total_episodes = 4720 | fit/surrogate_loss = -10.986502647399902 | fit/entropy_loss = 0.8823921084403992 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 468636 | episode_rewards = -24.0 | total_episodes = 4737 | fit/surrogate_loss = -9.095108985900879 | fit/entropy_loss = 0.9000943303108215 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 470528 | episode_rewards = 0.0 | total_episodes = 4756 | fit/surrogate_loss = 0.49896976351737976 | fit/entropy_loss = 0.9501118063926697 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 472417 | episode_rewards = -36.0 | total_episodes = 4774 | fit/surrogate_loss = 11.732644081115723 | fit/entropy_loss = 0.9509857892990112 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 474413 | episode_rewards = 157.45800940040132 | total_episodes = 4795 | fit/surrogate_loss = -11.044668197631836 | fit/entropy_loss = 0.9405302405357361 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 476233 | episode_rewards = 304.95351220285875 | total_episodes = 4815 | fit/surrogate_loss = 3.230506181716919 | fit/entropy_loss = 0.8895562887191772 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 478088 | episode_rewards = 273.65990527820844 | total_episodes = 4836 | fit/surrogate_loss = -2.057007074356079 | fit/entropy_loss = 0.8450247049331665 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 480096 | episode_rewards = 225.393869417009 | total_episodes = 4854 | fit/surrogate_loss = -6.749125003814697 | fit/entropy_loss = 0.7920497059822083 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 482010 | episode_rewards = 284.2181490104804 | total_episodes = 4874 | fit/surrogate_loss = 11.958834648132324 | fit/entropy_loss = 0.7524480819702148 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 483903 | episode_rewards = 118.52480329809106 | total_episodes = 4892 | fit/surrogate_loss = 1.5171358585357666 | fit/entropy_loss = 0.8042746186256409 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 485814 | episode_rewards = 287.68503833467213 | total_episodes = 4910 | fit/surrogate_loss = 0.09560095518827438 | fit/entropy_loss = 0.7662050724029541 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 487756 | episode_rewards = 294.12818602515716 | total_episodes = 4928 | fit/surrogate_loss = -5.227457523345947 | fit/entropy_loss = 0.8396679759025574 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 489715 | episode_rewards = 212.56486168613156 | total_episodes = 4948 | fit/surrogate_loss = 1.5038007497787476 | fit/entropy_loss = 0.9010404348373413 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 491574 | episode_rewards = 248.34553919585076 | total_episodes = 4967 | fit/surrogate_loss = 3.363309383392334 | fit/entropy_loss = 0.8638811707496643 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 493465 | episode_rewards = 0.0 | total_episodes = 4986 | fit/surrogate_loss = 8.634424209594727 | fit/entropy_loss = 0.8884937167167664 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 495356 | episode_rewards = 151.93969389847663 | total_episodes = 5005 | fit/surrogate_loss = -1.5574429035186768 | fit/entropy_loss = 0.8366995453834534 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 497239 | episode_rewards = 302.49554099930833 | total_episodes = 5024 | fit/surrogate_loss = 1.633874535560608 | fit/entropy_loss = 0.8804383277893066 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 499103 | episode_rewards = 167.51614314661649 | total_episodes = 5044 | fit/surrogate_loss = 1.513819694519043 | fit/entropy_loss = 0.8617693185806274 |
[INFO] 16:59: ... trained!
/home/frost/.local/lib/python3.10/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
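The logged `episode_rewards` fluctuate strongly from one line to the next, with several lines reporting 0.0 or negative values even late in training. To inspect the log numerically rather than by eye, the `name = value` pairs can be extracted with a regular expression. The following is a minimal sketch, not part of rlberry: the helper `parse_log_line` and the pattern `FIELD` are hypothetical names, and the two sample lines are copied from the log above.

```python
import re

# Two lines copied verbatim from the training log above, used as sample input.
sample_log = [
    "[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 279367 | episode_rewards = 281.1290978810243 | total_episodes = 2825 | fit/surrogate_loss = -2.6344029903411865 | fit/entropy_loss = 1.0347164869308472 |",
    "[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 281167 | episode_rewards = 0.0 | total_episodes = 2844 | fit/surrogate_loss = 1.1064000129699707 | fit/entropy_loss = 0.9371924996376038 |",
]

# Matches "name = number", where the name may contain "/" (e.g. fit/entropy_loss).
FIELD = re.compile(r"([\w/]+) = (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_log_line(line):
    """Return {field name: float} for every `name = number` pair in the line."""
    return {name: float(value) for name, value in FIELD.findall(line)}


records = [parse_log_line(line) for line in sample_log]
# Lines whose logged episode reward is zero or negative.
zero_reward = [r for r in records if r["episode_rewards"] <= 0.0]
print(len(records), "lines parsed,", len(zero_reward), "with non-positive reward")
```

Applied to the full log, the same records could be loaded into a pandas DataFrame to count how often an episode ends with no reward.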

Training reward curve

[5]:
data = plot_writer_data(manager, tag="episode_rewards", smooth_weight=0.8) # smoothing tensorboard-style
[Figure: smoothed training curve of episode_rewards for PPOAgent]
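The `smooth_weight=0.8` argument is described as tensorboard-style smoothing. As a rough illustration, this kind of smoothing is usually an exponential moving average in which each plotted point mixes the previous smoothed value with the new observation. The sketch below assumes that classic form; the function `ema_smooth` is a hypothetical helper, and whether rlberry computes exactly this (for example with or without bias correction) is an assumption.

```python
def ema_smooth(values, weight=0.8):
    """Exponential moving average; a higher `weight` gives stronger smoothing."""
    smoothed = []
    last = None
    for v in values:
        # The first point is kept as-is; later points are blended with the running value.
        last = v if last is None else weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


# A few episode rewards (rounded) taken from the training log above.
rewards = [281.13, 0.0, 250.78, 287.55]
print([round(x, 2) for x in ema_smooth(rewards, weight=0.8)])
```

With a weight of 0.8, a single zero-reward episode pulls the curve down by only 20% of the gap, which is why the plotted curve looks much steadier than the raw log.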

Evaluation of the trained agent

[6]:
evaluation = evaluate_agents([manager], n_simulations=128, plot=False)
evaluation.describe()
[INFO] 17:00: Evaluating PPOAgent...
[INFO] Evaluation:/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:187: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:195: UserWarning: WARN: The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'numpy.ndarray'>`
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:219: DeprecationWarning: WARN: Core environment is written in old step API which returns one bool instead of two. It is recommended to rewrite the environment with new step API. 
  logger.deprecation(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:141: UserWarning: WARN: The obs returned by the `step()` method was expecting numpy array dtype to be float32, actual type: float64
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:165: UserWarning: WARN: The obs returned by the `step()` method is not within the observation space.
  logger.warn(f"{pre} is not within the observation space.")
................................................................................................................................  Evaluation finished
[6]:
         PPOAgent
count  128.000000
mean   185.779274
std     36.115669
min     82.613569
25%    158.785278
50%    184.637599
75%    218.280214
max    258.224123
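To judge how precisely the 128 simulations pin down the mean evaluation reward, the standard error can be derived from the `count`, `mean` and `std` rows of the table. This is a small worked computation using only the numbers reported above; the 1.96 factor assumes a normal approximation for the mean, which is an assumption and not something the notebook states.

```python
import math

# Summary statistics copied from the evaluation table above.
n = 128
mean = 185.779274
std = 36.115669  # pandas describe() reports the sample standard deviation

sem = std / math.sqrt(n)   # standard error of the mean
half_width = 1.96 * sem    # 95% interval under a normal approximation (assumption)
ci = (mean - half_width, mean + half_width)

print(f"mean = {mean:.1f}, standard error = {sem:.2f}, "
      f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

With these numbers the standard error is about 3.2, so the interval is roughly 179.5 to 192.0. This quantifies the uncertainty for this single trained agent only; it says nothing about the variability across training seeds.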

A small peek into the agent's policy

[7]:
agent = manager.agent_handlers[0] # select the agent from the manager
[9]:
env = env_ctor(**env_kwargs)
obs = env.reset()  # old gym (v0.21) API: reset() returns only the observation

# Human-readable names of the discrete actions, indexed by action id
actions_txt = [
    "doing nothing",
    "1L of water",
    "5L of water",
    "harvesting",
    "sow some seeds",
    "scatter fertilizer",
    "scatter herbicide",
    "scatter pesticide",
    "remove weeds by hand",
]
rows = []  # one record per day, turned into a DataFrame after the loop
for day in range(365):
    action = agent.policy(obs)
    # Print the observation the agent acted on (taken before the step)
    print("Day: {}, Mean temp: {}, stage: {}, weight of  fruit: {}".format(
        obs[0], np.round(obs[1], 3), int(obs[7]), obs[15]))
    obs, reward, is_done, _ = env.step(action)  # old step API: a single done flag
    print("Action is", actions_txt[action])
    rows.append({"action": actions_txt[action], "reward": reward})
    print("")
    if is_done:
        print("Plant is Dead")
        break
episode = pd.DataFrame(rows)  # columns: action, reward
/home/frost/.local/lib/python3.10/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:187: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:195: UserWarning: WARN: The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'numpy.ndarray'>`
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:219: DeprecationWarning: WARN: Core environment is written in old step API which returns one bool instead of two. It is recommended to rewrite the environment with new step API. 
  logger.deprecation(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:141: UserWarning: WARN: The obs returned by the `step()` method was expecting numpy array dtype to be float32, actual type: float64
  logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:165: UserWarning: WARN: The obs returned by the `step()` method is not within the observation space.
  logger.warn(f"{pre} is not within the observation space.")
Day: 1.0, Mean temp: 4.66, stage: 0, weight of  fruit: 0.0
Action is 1L of water

Day: 2.0, Mean temp: 8.107, stage: 0, weight of  fruit: 0.0
Action is sow some seeds

Day: 3.0, Mean temp: 4.794, stage: 1, weight of  fruit: 0.0
Action is sow some seeds

Day: 4.0, Mean temp: 6.229, stage: 1, weight of  fruit: 0.0
Action is sow some seeds

Day: 5.0, Mean temp: 5.328, stage: 1, weight of  fruit: 0.0
Action is sow some seeds

Day: 6.0, Mean temp: 6.831, stage: 1, weight of  fruit: 0.0
Action is sow some seeds

Day: 7.0, Mean temp: 11.42, stage: 2, weight of  fruit: 0.0
Action is sow some seeds

Day: 8.0, Mean temp: 11.808, stage: 3, weight of  fruit: 0.0
Action is sow some seeds

Day: 9.0, Mean temp: 7.065, stage: 3, weight of  fruit: 0.0
Action is sow some seeds

Day: 10.0, Mean temp: 5.824, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 11.0, Mean temp: 7.3, stage: 3, weight of  fruit: 0.0
Action is sow some seeds

Day: 12.0, Mean temp: 7.8, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 13.0, Mean temp: 10.459, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 14.0, Mean temp: 8.196, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 15.0, Mean temp: 6.916, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 16.0, Mean temp: 8.3, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 17.0, Mean temp: 3.395, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 18.0, Mean temp: 3.507, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 19.0, Mean temp: 1.572, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 20.0, Mean temp: -0.618, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 21.0, Mean temp: 1.371, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 22.0, Mean temp: 2.858, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 23.0, Mean temp: 0.966, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 24.0, Mean temp: 1.87, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 25.0, Mean temp: 5.66, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 26.0, Mean temp: 7.941, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 27.0, Mean temp: 5.144, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 28.0, Mean temp: 5.272, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 29.0, Mean temp: 7.686, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 30.0, Mean temp: 11.335, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 31.0, Mean temp: 10.712, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 32.0, Mean temp: 10.321, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 33.0, Mean temp: 9.554, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 34.0, Mean temp: 6.328, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 35.0, Mean temp: 6.586, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 36.0, Mean temp: 4.039, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 37.0, Mean temp: 4.962, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 38.0, Mean temp: 8.219, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 39.0, Mean temp: 9.835, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 40.0, Mean temp: 7.665, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 41.0, Mean temp: 6.337, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 42.0, Mean temp: 5.62, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 43.0, Mean temp: 7.099, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 44.0, Mean temp: 7.855, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 45.0, Mean temp: 10.274, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 46.0, Mean temp: 12.223, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 47.0, Mean temp: 7.699, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 48.0, Mean temp: 7.065, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 49.0, Mean temp: 6.367, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 50.0, Mean temp: 8.157, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 51.0, Mean temp: 6.388, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 52.0, Mean temp: 8.593, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 53.0, Mean temp: 10.491, stage: 3, weight of  fruit: 0.0
Action is scatter pesticide

Day: 54.0, Mean temp: 9.339, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 55.0, Mean temp: 6.897, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 56.0, Mean temp: 4.347, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 57.0, Mean temp: 2.649, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 58.0, Mean temp: 4.304, stage: 3, weight of  fruit: 0.0
Action is scatter pesticide

Day: 59.0, Mean temp: 8.054, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 60.0, Mean temp: 6.459, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 61.0, Mean temp: 5.813, stage: 3, weight of  fruit: 0.0
Action is doing nothing

Day: 62.0, Mean temp: 4.74, stage: 3, weight of  fruit: 0.0
Action is scatter pesticide

Day: 63.0, Mean temp: 5.463, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 64.0, Mean temp: 7.361, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 65.0, Mean temp: 6.06, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 66.0, Mean temp: 6.398, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 67.0, Mean temp: 8.256, stage: 3, weight of  fruit: 0.0
Action is scatter pesticide

Day: 68.0, Mean temp: 7.85, stage: 3, weight of  fruit: 0.0
Action is remove weeds by hand

Day: 69.0, Mean temp: 10.868, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 70.0, Mean temp: 12.469, stage: 3, weight of  fruit: 0.0
Action is scatter pesticide

Day: 71.0, Mean temp: 8.836, stage: 3, weight of  fruit: 0.0
Action is scatter pesticide

Day: 72.0, Mean temp: 7.698, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 73.0, Mean temp: 7.344, stage: 3, weight of  fruit: 0.0
Action is scatter pesticide

Day: 74.0, Mean temp: 9.662, stage: 3, weight of  fruit: 0.0
Action is harvesting

Day: 75.0, Mean temp: 7.742, stage: 3, weight of  fruit: 0.0
Action is scatter pesticide

Day: 76.0, Mean temp: 7.807, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 77.0, Mean temp: 9.912, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 78.0, Mean temp: 9.564, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 79.0, Mean temp: 7.413, stage: 3, weight of  fruit: 0.0
Action is harvesting

Day: 80.0, Mean temp: 6.346, stage: 3, weight of  fruit: 0.0
Action is doing nothing

Day: 81.0, Mean temp: 5.97, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 82.0, Mean temp: 5.594, stage: 3, weight of  fruit: 0.0
Action is doing nothing

Day: 83.0, Mean temp: 6.003, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 84.0, Mean temp: 6.43, stage: 3, weight of  fruit: 0.0
Action is scatter fertilizer

Day: 85.0, Mean temp: 6.677, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 86.0, Mean temp: 7.663, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 87.0, Mean temp: 8.407, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 88.0, Mean temp: 5.038, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 89.0, Mean temp: 4.676, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 90.0, Mean temp: 5.745, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 91.0, Mean temp: 5.488, stage: 3, weight of  fruit: 0.0
Action is 1L of water

Day: 92.0, Mean temp: 7.189, stage: 4, weight of  fruit: 0.0
Action is 1L of water

Day: 93.0, Mean temp: 8.946, stage: 5, weight of  fruit: 0.0
Action is 1L of water

Day: 94.0, Mean temp: 10.253, stage: 5, weight of  fruit: 0.0
Action is 1L of water

Day: 95.0, Mean temp: 13.488, stage: 5, weight of  fruit: 0.0
Action is 1L of water

Day: 96.0, Mean temp: 12.475, stage: 5, weight of  fruit: 0.0
Action is 5L of water

Day: 97.0, Mean temp: 13.701, stage: 5, weight of  fruit: 0.0
Action is 1L of water

Day: 98.0, Mean temp: 17.629, stage: 6, weight of  fruit: 0.0
Action is 1L of water

Day: 99.0, Mean temp: 16.655, stage: 6, weight of  fruit: 1.0
Action is 1L of water

Day: 100.0, Mean temp: 16.302, stage: 6, weight of  fruit: 1.8462644348974382
Action is 1L of water

Day: 101.0, Mean temp: 16.295, stage: 6, weight of  fruit: 2.9439341468885543
Action is 1L of water

Day: 102.0, Mean temp: 16.228, stage: 6, weight of  fruit: 4.277763800353128
Action is doing nothing

Day: 103.0, Mean temp: 9.44, stage: 6, weight of  fruit: 4.277763800353128
Action is 1L of water

Day: 104.0, Mean temp: 6.957, stage: 6, weight of  fruit: 4.277763800353128
Action is 5L of water

Day: 105.0, Mean temp: 10.949, stage: 6, weight of  fruit: 4.277763800353128
Action is doing nothing

Day: 106.0, Mean temp: 14.965, stage: 6, weight of  fruit: 5.864047649310217
Action is 1L of water

Day: 107.0, Mean temp: 15.352, stage: 6, weight of  fruit: 7.669518508245714
Action is harvesting

Plant is Dead
[15]:
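import seaborn as sns
import matplotlib.pyplot as plt

# Count how often each action was taken during the episode, most frequent first
fig, ax = plt.subplots(figsize=(12, 6))
action_order = episode["action"].value_counts().index
sns.countplot(data=episode, x="action", order=action_order, ax=ax)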
[15]:
<AxesSubplot:xlabel='action', ylabel='count'>
[Figure: count plot of the actions taken during the episode, ordered from most to least frequent (x-axis: action, y-axis: count)]

From this plot, we see that PPO learned that pesticide destroys the soil and should be used only in small quantities. Herbicide is not useful when weeds can be removed by hand.
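
The counts behind the plot can also be recovered from the printed log, which may be handy when the `episode` DataFrame is no longer in memory. The sketch below is only an illustration under stated assumptions: the regular expressions are inferred from the output format shown above, `parse_episode_log` is a hypothetical helper (it is not part of farm-gym or rlberry), and only a few log lines are included, so the counts do not match the full episode.

```python
import re
from collections import Counter

# A few lines copied from the episode log above (NOT the full episode).
log_excerpt = """\
Day: 1.0, Mean temp: 4.66, stage: 0, weight of  fruit: 0.0
Action is 1L of water

Day: 2.0, Mean temp: 8.107, stage: 0, weight of  fruit: 0.0
Action is sow some seeds

Day: 3.0, Mean temp: 4.794, stage: 1, weight of  fruit: 0.0
Action is sow some seeds

Day: 107.0, Mean temp: 15.352, stage: 6, weight of  fruit: 7.669518508245714
Action is harvesting
"""

# Patterns assumed from the printed format; "\s+" tolerates the double
# space in "weight of  fruit".
STATE_RE = re.compile(
    r"Day: (?P<day>[-\d.]+), Mean temp: (?P<temp>[-\d.]+), "
    r"stage: (?P<stage>\d+), weight of\s+fruit: (?P<weight>[-\d.]+)"
)
ACTION_RE = re.compile(r"Action is (?P<action>.+)")


def parse_episode_log(text):
    """Hypothetical helper: pair each state line with the action line after it."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        state = STATE_RE.fullmatch(line)
        if state:
            current = {
                "day": int(float(state["day"])),
                "mean_temp": float(state["temp"]),
                "stage": int(state["stage"]),
                "fruit_weight": float(state["weight"]),
            }
            continue
        action = ACTION_RE.fullmatch(line)
        if action and current is not None:
            current["action"] = action["action"]
            records.append(current)
            current = None
    return records


records = parse_episode_log(log_excerpt)

# Tally the actions, most frequent first (the same ordering as the count plot).
action_counts = Counter(r["action"] for r in records)
for action, count in action_counts.most_common():
    print(f"{action}: {count}")
```

Passing the full log text instead of `log_excerpt` should give the same tallies as the count plot, provided every line follows the format assumed by the two patterns.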
