Illustration of a PPO agent applied to a farming environment from farm-gym
### Imports:
[1]:
# classical libraries
import numpy as np
import pandas as pd
import seaborn as sns
# farm-gym pre-made environments
import farmgym_games
# RL library
from rlberry.agents.torch import PPOAgent
from rlberry.manager import AgentManager, evaluate_agents, plot_writer_data
from rlberry.agents.torch.utils.training import model_factory_from_env
from rlberry.envs import gym_make
### Settings:
We use the Farm1 environment.
[2]:
# rlberry is compatible with gym v0.21, hence the "OldV21" environment id.
# Use {"id": "Farm1-v0"} for gym v0.26 compatibility.
env_ctor, env_kwargs = gym_make, {"id": "OldV21Farm1-v0"}
We use two hidden layers of 256 units each (\(256\times 256\)) for both the value and policy networks of PPO.
[3]:
policy_configs = {
    "type": "MultiLayerPerceptron",  # A network architecture
    "layer_sizes": (256, 256),  # Network dimensions
    "reshape": False,
    "is_policy": True,
}
value_configs = {
    "type": "MultiLayerPerceptron",
    "layer_sizes": (256, 256),
    "reshape": False,
    "out_size": 1,
}
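To get a feel for the size of these networks, here is a minimal sketch that counts the trainable parameters of a fully connected network with hidden sizes (256, 256). It assumes that `layer_sizes` lists the hidden-layer widths and that every layer has a bias. `mlp_param_count` is a hypothetical helper written for this illustration, and `OBS_DIM` and `N_ACTIONS` are placeholder values, not the actual Farm1 dimensions.

```python
def mlp_param_count(in_size, layer_sizes, out_size):
    """Count weights and biases of a fully connected network."""
    sizes = [in_size, *layer_sizes, out_size]
    # Each layer contributes n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


OBS_DIM = 10    # placeholder observation size (assumption, not Farm1's real value)
N_ACTIONS = 5   # placeholder number of actions (assumption, not Farm1's real value)

policy_params = mlp_param_count(OBS_DIM, (256, 256), N_ACTIONS)
value_params = mlp_param_count(OBS_DIM, (256, 256), 1)  # out_size=1 as in value_configs
print(f"policy network: {policy_params} parameters")
print(f"value network:  {value_params} parameters")
```

With these sizes the 256-to-256 hidden layer dominates the count, so the exact observation and action dimensions change the total only slightly.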
### Agent code:
We use rlberry’s PPOAgent. Note that an episode lasts at most 365 days; this maximum length guides the choice of several parameters below (n_steps, batch_size and eval_horizon).
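To see how the 365-day horizon ties together the settings in the next cell, here is a small arithmetic sketch. It assumes, as in common PPO implementations, that `n_steps` is the number of transitions collected before each update and that `batch_size` is the minibatch size; check the rlberry documentation for the exact semantics.

```python
EPISODE_LEN = 365            # maximum episode length, in days

n_steps = 5 * EPISODE_LEN    # transitions collected per update (assumed meaning)
batch_size = EPISODE_LEN     # minibatch size (assumed meaning)
fit_budget = int(5e5)        # total number of environment steps for training

# Each rollout holds at least five episodes and splits into five minibatches.
minibatches_per_rollout = n_steps // batch_size
# Number of complete rollouts (and hence update phases) that fit in the budget.
full_rollouts = fit_budget // n_steps

print(f"n_steps = {n_steps}")
print(f"minibatches per rollout = {minibatches_per_rollout}")
print(f"complete rollouts within the budget = {full_rollouts}")
```

Under these assumptions, one minibatch covers about as many transitions as one full-length episode, and each update sees the equivalent of at least five episodes, since episodes may end before day 365.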
[4]:
manager = AgentManager(
    PPOAgent,
    (env_ctor, env_kwargs),
    agent_name="PPOAgent",
    init_kwargs=dict(
        policy_net_fn=model_factory_from_env,
        policy_net_kwargs=policy_configs,
        value_net_fn=model_factory_from_env,
        value_net_kwargs=value_configs,
        learning_rate=9e-5,
        n_steps=5 * 365,
        batch_size=365,
        eps_clip=0.2,
    ),
    fit_budget=5e5,
    eval_kwargs=dict(eval_horizon=365),
    n_fit=1,
    # Important: farm-gym is very stochastic; for some seeds PPO does not
    # train and the final reward is 0.
    seed=42,
    output_dir="ppo1_results",  # results and trained agents are kept in this directory
)
manager.fit()
[INFO] 16:46: Running AgentManager fit() for PPOAgent with n_fit = 1 and max_workers = None.
/home/frost/.local/lib/python3.10/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/home/frost/.local/lib/python3.10/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:187: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:195: UserWarning: WARN: The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'numpy.ndarray'>`
logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:219: DeprecationWarning: WARN: Core environment is written in old step API which returns one bool instead of two. It is recommended to rewrite the environment with new step API.
logger.deprecation(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:141: UserWarning: WARN: The obs returned by the `step()` method was expecting numpy array dtype to be float32, actual type: float64
logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:165: UserWarning: WARN: The obs returned by the `step()` method is not within the observation space.
logger.warn(f"{pre} is not within the observation space.")
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 2209 | episode_rewards = 134.52428322598146 | total_episodes = 22 | fit/surrogate_loss = -17.20841407775879 | fit/entropy_loss = 1.3652242422103882 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 4267 | episode_rewards = 180.22205261986448 | total_episodes = 45 | fit/surrogate_loss = 3.766744375228882 | fit/entropy_loss = 1.2915725708007812 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 6460 | episode_rewards = -174.0 | total_episodes = 68 | fit/surrogate_loss = 1.7429035902023315 | fit/entropy_loss = 1.4351849555969238 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 8671 | episode_rewards = 0.0 | total_episodes = 85 | fit/surrogate_loss = -4.726207256317139 | fit/entropy_loss = 1.5167763233184814 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 10697 | episode_rewards = -62.0 | total_episodes = 101 | fit/surrogate_loss = 7.040808200836182 | fit/entropy_loss = 0.9549528956413269 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 12417 | episode_rewards = -16.0 | total_episodes = 118 | fit/surrogate_loss = 2.217423915863037 | fit/entropy_loss = 1.255665898323059 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 14796 | episode_rewards = 333.7999433709374 | total_episodes = 137 | fit/surrogate_loss = -4.331498622894287 | fit/entropy_loss = 1.1084470748901367 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 16895 | episode_rewards = -122.0 | total_episodes = 154 | fit/surrogate_loss = -1.8878077268600464 | fit/entropy_loss = 0.8683847784996033 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 18919 | episode_rewards = 0.0 | total_episodes = 167 | fit/surrogate_loss = -6.60181188583374 | fit/entropy_loss = 0.9932242035865784 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 20920 | episode_rewards = 0.0 | total_episodes = 185 | fit/surrogate_loss = -1.0884056091308594 | fit/entropy_loss = 0.9581738710403442 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 22962 | episode_rewards = -160.0 | total_episodes = 201 | fit/surrogate_loss = -6.423931121826172 | fit/entropy_loss = 0.916784405708313 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 25143 | episode_rewards = 0.0 | total_episodes = 214 | fit/surrogate_loss = 2.195204019546509 | fit/entropy_loss = 0.8561437129974365 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 27160 | episode_rewards = -32.0 | total_episodes = 228 | fit/surrogate_loss = -3.6802823543548584 | fit/entropy_loss = 0.576086699962616 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 28953 | episode_rewards = 0.0 | total_episodes = 244 | fit/surrogate_loss = -2.6257214546203613 | fit/entropy_loss = 0.5261974334716797 |
[INFO] 16:46: [PPOAgent[worker: 0]] | max_global_step = 31158 | episode_rewards = -36.0 | total_episodes = 263 | fit/surrogate_loss = 0.9804189205169678 | fit/entropy_loss = 0.8518242239952087 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 33170 | episode_rewards = 273.2909831213743 | total_episodes = 280 | fit/surrogate_loss = 0.08106783777475357 | fit/entropy_loss = 0.7627907395362854 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 35126 | episode_rewards = -14.0 | total_episodes = 301 | fit/surrogate_loss = 4.697272777557373 | fit/entropy_loss = 1.3077625036239624 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 37032 | episode_rewards = 0.0 | total_episodes = 316 | fit/surrogate_loss = -3.589635133743286 | fit/entropy_loss = 0.7756088376045227 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 39019 | episode_rewards = 123.06599629645945 | total_episodes = 337 | fit/surrogate_loss = 1.0958138704299927 | fit/entropy_loss = 0.9637163281440735 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 41039 | episode_rewards = 186.96929667917848 | total_episodes = 359 | fit/surrogate_loss = 2.217855930328369 | fit/entropy_loss = 1.3185549974441528 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 42864 | episode_rewards = 0.0 | total_episodes = 376 | fit/surrogate_loss = 3.6343650817871094 | fit/entropy_loss = 0.9253543615341187 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 44901 | episode_rewards = 0.0 | total_episodes = 395 | fit/surrogate_loss = 0.038416508585214615 | fit/entropy_loss = 0.9682783484458923 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 46947 | episode_rewards = 0.0 | total_episodes = 412 | fit/surrogate_loss = 0.11422909051179886 | fit/entropy_loss = 0.6158499121665955 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 48993 | episode_rewards = 0.0 | total_episodes = 431 | fit/surrogate_loss = -3.7807295322418213 | fit/entropy_loss = 0.724868893623352 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 50966 | episode_rewards = 0.0 | total_episodes = 450 | fit/surrogate_loss = -0.5481564998626709 | fit/entropy_loss = 0.5342087149620056 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 52844 | episode_rewards = 0.0 | total_episodes = 467 | fit/surrogate_loss = 4.418668270111084 | fit/entropy_loss = 0.8343883752822876 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 54957 | episode_rewards = 0.0 | total_episodes = 486 | fit/surrogate_loss = -2.375662088394165 | fit/entropy_loss = 0.8427497148513794 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 57007 | episode_rewards = 16.0 | total_episodes = 509 | fit/surrogate_loss = 3.9137234687805176 | fit/entropy_loss = 0.9979891180992126 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 59076 | episode_rewards = 0.0 | total_episodes = 530 | fit/surrogate_loss = -0.2222403734922409 | fit/entropy_loss = 1.0436657667160034 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 61089 | episode_rewards = 0.0 | total_episodes = 552 | fit/surrogate_loss = -3.2849655151367188 | fit/entropy_loss = 1.0299031734466553 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 63029 | episode_rewards = 0.0 | total_episodes = 575 | fit/surrogate_loss = 3.238835334777832 | fit/entropy_loss = 1.221884846687317 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 64886 | episode_rewards = 0.0 | total_episodes = 597 | fit/surrogate_loss = -0.7216285467147827 | fit/entropy_loss = 1.2553671598434448 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 66934 | episode_rewards = 112.86767683173309 | total_episodes = 619 | fit/surrogate_loss = 2.7049076557159424 | fit/entropy_loss = 1.272657871246338 |
[INFO] 16:47: [PPOAgent[worker: 0]] | max_global_step = 68898 | episode_rewards = 18.0 | total_episodes = 641 | fit/surrogate_loss = 4.479696750640869 | fit/entropy_loss = 1.2921851873397827 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 70786 | episode_rewards = 139.0729117340091 | total_episodes = 661 | fit/surrogate_loss = -0.4576759934425354 | fit/entropy_loss = 1.2917993068695068 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 72621 | episode_rewards = -4.0 | total_episodes = 682 | fit/surrogate_loss = -0.2897544205188751 | fit/entropy_loss = 1.2893353700637817 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 74505 | episode_rewards = 0.0 | total_episodes = 704 | fit/surrogate_loss = 3.921590805053711 | fit/entropy_loss = 1.2924174070358276 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 76314 | episode_rewards = 0.0 | total_episodes = 724 | fit/surrogate_loss = 0.5616796016693115 | fit/entropy_loss = 1.3364989757537842 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 78255 | episode_rewards = 0.0 | total_episodes = 746 | fit/surrogate_loss = 1.1299254894256592 | fit/entropy_loss = 1.2764776945114136 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 80013 | episode_rewards = 266.94249821150834 | total_episodes = 763 | fit/surrogate_loss = 4.023199558258057 | fit/entropy_loss = 1.2256656885147095 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 81715 | episode_rewards = 191.68010878323798 | total_episodes = 780 | fit/surrogate_loss = 8.112669944763184 | fit/entropy_loss = 1.2419995069503784 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 83590 | episode_rewards = 276.3426506341879 | total_episodes = 800 | fit/surrogate_loss = -3.397995948791504 | fit/entropy_loss = 1.180037498474121 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 85632 | episode_rewards = 201.806702212911 | total_episodes = 822 | fit/surrogate_loss = 11.917405128479004 | fit/entropy_loss = 1.1466782093048096 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 87484 | episode_rewards = 0.0 | total_episodes = 842 | fit/surrogate_loss = -4.252918243408203 | fit/entropy_loss = 1.2645856142044067 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 89279 | episode_rewards = 190.76893492535956 | total_episodes = 862 | fit/surrogate_loss = -0.5222299695014954 | fit/entropy_loss = 1.1454248428344727 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 91222 | episode_rewards = 259.8129481494122 | total_episodes = 882 | fit/surrogate_loss = -3.2302627563476562 | fit/entropy_loss = 1.1970560550689697 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 93047 | episode_rewards = 275.51577814466606 | total_episodes = 902 | fit/surrogate_loss = 8.772000312805176 | fit/entropy_loss = 1.1550569534301758 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 95024 | episode_rewards = 275.19492424250683 | total_episodes = 922 | fit/surrogate_loss = -2.4215118885040283 | fit/entropy_loss = 1.0462547540664673 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 96894 | episode_rewards = 0.0 | total_episodes = 942 | fit/surrogate_loss = 0.17885783314704895 | fit/entropy_loss = 1.0427830219268799 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 98867 | episode_rewards = 256.3459135016439 | total_episodes = 963 | fit/surrogate_loss = -5.808422565460205 | fit/entropy_loss = 0.9922869801521301 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 100767 | episode_rewards = 261.8378148045977 | total_episodes = 982 | fit/surrogate_loss = -0.07664193212985992 | fit/entropy_loss = 0.9912879467010498 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 102705 | episode_rewards = 250.53306513006976 | total_episodes = 1004 | fit/surrogate_loss = -5.870321273803711 | fit/entropy_loss = 1.03829824924469 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 104646 | episode_rewards = 31.12396309603301 | total_episodes = 1023 | fit/surrogate_loss = 2.2727718353271484 | fit/entropy_loss = 0.9626011848449707 |
[INFO] 16:48: [PPOAgent[worker: 0]] | max_global_step = 106558 | episode_rewards = 0.0 | total_episodes = 1043 | fit/surrogate_loss = 2.6146774291992188 | fit/entropy_loss = 0.8991783857345581 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 108486 | episode_rewards = 197.42062229739346 | total_episodes = 1063 | fit/surrogate_loss = 5.396429538726807 | fit/entropy_loss = 0.8701432943344116 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 110446 | episode_rewards = 293.10522661518553 | total_episodes = 1083 | fit/surrogate_loss = -4.111895561218262 | fit/entropy_loss = 0.925383448600769 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 112293 | episode_rewards = 0.0 | total_episodes = 1103 | fit/surrogate_loss = 2.3448612689971924 | fit/entropy_loss = 0.9377964735031128 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 114293 | episode_rewards = 270.8051013314283 | total_episodes = 1123 | fit/surrogate_loss = -9.176980972290039 | fit/entropy_loss = 0.9452497363090515 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 116258 | episode_rewards = 283.65709798467174 | total_episodes = 1142 | fit/surrogate_loss = 11.655402183532715 | fit/entropy_loss = 0.8301343321800232 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 118209 | episode_rewards = 0.0 | total_episodes = 1163 | fit/surrogate_loss = -5.4693922996521 | fit/entropy_loss = 0.913402795791626 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 120071 | episode_rewards = 302.01518974606034 | total_episodes = 1183 | fit/surrogate_loss = 7.949192047119141 | fit/entropy_loss = 0.9001861810684204 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 122083 | episode_rewards = 271.50668054340275 | total_episodes = 1204 | fit/surrogate_loss = -0.3852759897708893 | fit/entropy_loss = 0.8573261499404907 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 124048 | episode_rewards = 260.3091520178151 | total_episodes = 1223 | fit/surrogate_loss = -10.70649528503418 | fit/entropy_loss = 0.9148780107498169 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 125843 | episode_rewards = 274.7459547222016 | total_episodes = 1245 | fit/surrogate_loss = -2.958625555038452 | fit/entropy_loss = 0.8772678971290588 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 127866 | episode_rewards = 245.85348013466609 | total_episodes = 1267 | fit/surrogate_loss = -3.8383290767669678 | fit/entropy_loss = 1.009056806564331 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 129823 | episode_rewards = 240.46847748136793 | total_episodes = 1287 | fit/surrogate_loss = -5.578691482543945 | fit/entropy_loss = 1.0033317804336548 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 131699 | episode_rewards = 283.56211757181813 | total_episodes = 1306 | fit/surrogate_loss = 10.772649765014648 | fit/entropy_loss = 1.0038061141967773 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 133625 | episode_rewards = 0.0 | total_episodes = 1326 | fit/surrogate_loss = -2.2552425861358643 | fit/entropy_loss = 1.0329627990722656 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 135498 | episode_rewards = 238.22589699089986 | total_episodes = 1349 | fit/surrogate_loss = 1.1353161334991455 | fit/entropy_loss = 1.0362646579742432 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 137453 | episode_rewards = 273.6662891054696 | total_episodes = 1370 | fit/surrogate_loss = -7.482893943786621 | fit/entropy_loss = 1.1330825090408325 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 139374 | episode_rewards = -10.0 | total_episodes = 1389 | fit/surrogate_loss = 4.843024253845215 | fit/entropy_loss = 1.079302430152893 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 141359 | episode_rewards = 195.0093446034527 | total_episodes = 1409 | fit/surrogate_loss = -0.3425341248512268 | fit/entropy_loss = 1.1312304735183716 |
[INFO] 16:49: [PPOAgent[worker: 0]] | max_global_step = 143229 | episode_rewards = 0.0 | total_episodes = 1428 | fit/surrogate_loss = -3.96565318107605 | fit/entropy_loss = 1.0328818559646606 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 145122 | episode_rewards = 109.29218472669729 | total_episodes = 1448 | fit/surrogate_loss = 9.060868263244629 | fit/entropy_loss = 1.0830814838409424 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 147111 | episode_rewards = -22.0 | total_episodes = 1471 | fit/surrogate_loss = -2.5419631004333496 | fit/entropy_loss = 1.1042369604110718 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 149006 | episode_rewards = 279.4907295573699 | total_episodes = 1491 | fit/surrogate_loss = 7.033303260803223 | fit/entropy_loss = 0.9747352004051208 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 150915 | episode_rewards = 267.20894531089453 | total_episodes = 1510 | fit/surrogate_loss = 3.960771322250366 | fit/entropy_loss = 0.9919326305389404 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 152908 | episode_rewards = 256.2250103305771 | total_episodes = 1532 | fit/surrogate_loss = -0.007613801397383213 | fit/entropy_loss = 1.0438289642333984 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 154833 | episode_rewards = 245.52128312357308 | total_episodes = 1551 | fit/surrogate_loss = -10.362034797668457 | fit/entropy_loss = 1.1016902923583984 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 156778 | episode_rewards = 278.79854712757714 | total_episodes = 1572 | fit/surrogate_loss = 6.5567522048950195 | fit/entropy_loss = 1.0572316646575928 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 158651 | episode_rewards = 215.68329328843254 | total_episodes = 1591 | fit/surrogate_loss = 2.0005228519439697 | fit/entropy_loss = 1.053226351737976 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 160549 | episode_rewards = 0.0 | total_episodes = 1612 | fit/surrogate_loss = 4.874629974365234 | fit/entropy_loss = 1.0698308944702148 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 162579 | episode_rewards = 273.16079349912553 | total_episodes = 1632 | fit/surrogate_loss = -4.574504375457764 | fit/entropy_loss = 1.1021976470947266 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 164542 | episode_rewards = 312.9760725954338 | total_episodes = 1654 | fit/surrogate_loss = -9.858379364013672 | fit/entropy_loss = 1.090998888015747 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 166422 | episode_rewards = 267.55608565988587 | total_episodes = 1673 | fit/surrogate_loss = 7.012547016143799 | fit/entropy_loss = 1.1158734560012817 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 168419 | episode_rewards = 274.3685780078847 | total_episodes = 1693 | fit/surrogate_loss = -2.038301944732666 | fit/entropy_loss = 1.1137681007385254 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 170362 | episode_rewards = 228.2306574275855 | total_episodes = 1713 | fit/surrogate_loss = 1.7389625310897827 | fit/entropy_loss = 1.0956854820251465 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 172316 | episode_rewards = 241.30185727204105 | total_episodes = 1732 | fit/surrogate_loss = 2.5269970893859863 | fit/entropy_loss = 1.1481139659881592 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 174196 | episode_rewards = -12.0 | total_episodes = 1750 | fit/surrogate_loss = 1.630097508430481 | fit/entropy_loss = 1.1351419687271118 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 176069 | episode_rewards = 189.3297826228422 | total_episodes = 1771 | fit/surrogate_loss = -3.8621702194213867 | fit/entropy_loss = 1.1028456687927246 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 178013 | episode_rewards = 268.1629180735788 | total_episodes = 1789 | fit/surrogate_loss = 5.526572227478027 | fit/entropy_loss = 1.1215077638626099 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 179927 | episode_rewards = 0.0 | total_episodes = 1808 | fit/surrogate_loss = 1.0089154243469238 | fit/entropy_loss = 1.1004434823989868 |
[INFO] 16:50: [PPOAgent[worker: 0]] | max_global_step = 181719 | episode_rewards = 239.03986857129473 | total_episodes = 1826 | fit/surrogate_loss = 1.3062998056411743 | fit/entropy_loss = 1.0463488101959229 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 183621 | episode_rewards = 265.65874300593913 | total_episodes = 1845 | fit/surrogate_loss = -5.852258682250977 | fit/entropy_loss = 1.133949637413025 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 185565 | episode_rewards = 67.80445637614709 | total_episodes = 1865 | fit/surrogate_loss = 9.158913612365723 | fit/entropy_loss = 1.1922465562820435 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 187451 | episode_rewards = 282.8416307910926 | total_episodes = 1883 | fit/surrogate_loss = 2.0727455615997314 | fit/entropy_loss = 1.1585074663162231 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 189332 | episode_rewards = 205.79572679668 | total_episodes = 1902 | fit/surrogate_loss = 2.384558916091919 | fit/entropy_loss = 1.1495928764343262 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 191211 | episode_rewards = 0.0 | total_episodes = 1922 | fit/surrogate_loss = 0.5820152163505554 | fit/entropy_loss = 1.1726787090301514 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 193215 | episode_rewards = 271.4559898725673 | total_episodes = 1943 | fit/surrogate_loss = -2.57812237739563 | fit/entropy_loss = 1.0842591524124146 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 195103 | episode_rewards = 0.0 | total_episodes = 1962 | fit/surrogate_loss = 1.2391142845153809 | fit/entropy_loss = 1.194376826286316 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 196935 | episode_rewards = 171.46015329686514 | total_episodes = 1981 | fit/surrogate_loss = 1.2422070503234863 | fit/entropy_loss = 1.1019363403320312 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 198911 | episode_rewards = 170.06682650489532 | total_episodes = 2000 | fit/surrogate_loss = 0.9598288536071777 | fit/entropy_loss = 1.1226402521133423 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 200645 | episode_rewards = 16.0 | total_episodes = 2018 | fit/surrogate_loss = -3.168940305709839 | fit/entropy_loss = 1.0989958047866821 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 202747 | episode_rewards = 284.3230790190264 | total_episodes = 2037 | fit/surrogate_loss = 12.86429214477539 | fit/entropy_loss = 1.0760250091552734 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 204687 | episode_rewards = 110.60583113983654 | total_episodes = 2056 | fit/surrogate_loss = -10.055243492126465 | fit/entropy_loss = 1.0619055032730103 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 206614 | episode_rewards = 233.9282157160054 | total_episodes = 2075 | fit/surrogate_loss = -2.6406538486480713 | fit/entropy_loss = 1.0568320751190186 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 208579 | episode_rewards = 278.2330085934469 | total_episodes = 2096 | fit/surrogate_loss = 7.980196475982666 | fit/entropy_loss = 1.0911810398101807 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 210508 | episode_rewards = 260.32258285052814 | total_episodes = 2115 | fit/surrogate_loss = 5.786709308624268 | fit/entropy_loss = 1.0824633836746216 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 212373 | episode_rewards = 218.04632652768822 | total_episodes = 2134 | fit/surrogate_loss = -8.352408409118652 | fit/entropy_loss = 1.0934839248657227 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 214329 | episode_rewards = 271.3465157977871 | total_episodes = 2154 | fit/surrogate_loss = -4.062629222869873 | fit/entropy_loss = 1.0969864130020142 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 216267 | episode_rewards = 221.73785563376867 | total_episodes = 2172 | fit/surrogate_loss = 5.777106761932373 | fit/entropy_loss = 1.024115800857544 |
[INFO] 16:51: [PPOAgent[worker: 0]] | max_global_step = 218162 | episode_rewards = 216.0446248254139 | total_episodes = 2192 | fit/surrogate_loss = -3.4470815658569336 | fit/entropy_loss = 1.118396282196045 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 219938 | episode_rewards = 132.59513162358 | total_episodes = 2210 | fit/surrogate_loss = 3.124840021133423 | fit/entropy_loss = 1.1320301294326782 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 221842 | episode_rewards = 184.9587281019406 | total_episodes = 2229 | fit/surrogate_loss = -1.0659230947494507 | fit/entropy_loss = 1.031003475189209 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 223804 | episode_rewards = 0.0 | total_episodes = 2249 | fit/surrogate_loss = -2.4230310916900635 | fit/entropy_loss = 1.0751807689666748 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 225768 | episode_rewards = 244.20561467772313 | total_episodes = 2269 | fit/surrogate_loss = -3.263524055480957 | fit/entropy_loss = 1.0651490688323975 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 227599 | episode_rewards = 22.675146305186402 | total_episodes = 2287 | fit/surrogate_loss = 8.311609268188477 | fit/entropy_loss = 1.0777026414871216 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 229551 | episode_rewards = 0.0 | total_episodes = 2306 | fit/surrogate_loss = 2.6351635456085205 | fit/entropy_loss = 1.0248957872390747 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 231495 | episode_rewards = 285.339815879762 | total_episodes = 2326 | fit/surrogate_loss = 1.2256782054901123 | fit/entropy_loss = 1.0415252447128296 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 233411 | episode_rewards = 171.14700593599062 | total_episodes = 2346 | fit/surrogate_loss = -4.267335891723633 | fit/entropy_loss = 1.0866059064865112 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 235322 | episode_rewards = 270.56492642385905 | total_episodes = 2366 | fit/surrogate_loss = -3.5790865421295166 | fit/entropy_loss = 1.0592182874679565 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 237167 | episode_rewards = 215.45273470694696 | total_episodes = 2384 | fit/surrogate_loss = -7.527778625488281 | fit/entropy_loss = 1.069936752319336 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 239003 | episode_rewards = -20.0 | total_episodes = 2403 | fit/surrogate_loss = 5.44546365737915 | fit/entropy_loss = 1.1499285697937012 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 240983 | episode_rewards = 269.9111065055036 | total_episodes = 2422 | fit/surrogate_loss = 2.35046124458313 | fit/entropy_loss = 1.0309354066848755 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 242875 | episode_rewards = 318.0889483706464 | total_episodes = 2442 | fit/surrogate_loss = -4.4732279777526855 | fit/entropy_loss = 0.9553772211074829 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 244771 | episode_rewards = 104.58798503468927 | total_episodes = 2461 | fit/surrogate_loss = -1.5414801836013794 | fit/entropy_loss = 1.0260636806488037 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 246707 | episode_rewards = 0.0 | total_episodes = 2480 | fit/surrogate_loss = 3.820281982421875 | fit/entropy_loss = 1.0317150354385376 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 248705 | episode_rewards = 162.6760645376208 | total_episodes = 2500 | fit/surrogate_loss = 0.21251605451107025 | fit/entropy_loss = 1.0387043952941895 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 250589 | episode_rewards = 90.02615020038215 | total_episodes = 2519 | fit/surrogate_loss = 6.289228439331055 | fit/entropy_loss = 1.0504704713821411 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 252467 | episode_rewards = 214.89130183644414 | total_episodes = 2537 | fit/surrogate_loss = -6.966522693634033 | fit/entropy_loss = 1.026532530784607 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 254358 | episode_rewards = 308.76341924834617 | total_episodes = 2557 | fit/surrogate_loss = 12.154333114624023 | fit/entropy_loss = 1.0158194303512573 |
[INFO] 16:52: [PPOAgent[worker: 0]] | max_global_step = 256267 | episode_rewards = 283.83298564334217 | total_episodes = 2576 | fit/surrogate_loss = 2.852245807647705 | fit/entropy_loss = 1.0434391498565674 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 258172 | episode_rewards = 84.42354537626872 | total_episodes = 2597 | fit/surrogate_loss = -9.431618690490723 | fit/entropy_loss = 1.0137786865234375 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 260093 | episode_rewards = 276.9943537693276 | total_episodes = 2618 | fit/surrogate_loss = 0.1441255807876587 | fit/entropy_loss = 0.963430643081665 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 262045 | episode_rewards = 241.20177508175692 | total_episodes = 2641 | fit/surrogate_loss = 0.2748934328556061 | fit/entropy_loss = 0.990574836730957 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 264026 | episode_rewards = 268.78084189097274 | total_episodes = 2663 | fit/surrogate_loss = 3.5319406986236572 | fit/entropy_loss = 1.0552963018417358 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 265932 | episode_rewards = 242.44137571994833 | total_episodes = 2686 | fit/surrogate_loss = -9.579716682434082 | fit/entropy_loss = 0.9850586652755737 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 267875 | episode_rewards = 245.39531372725043 | total_episodes = 2708 | fit/surrogate_loss = 1.6980942487716675 | fit/entropy_loss = 1.0098484754562378 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 269777 | episode_rewards = 205.25879645323425 | total_episodes = 2726 | fit/surrogate_loss = 6.197882175445557 | fit/entropy_loss = 0.9618978500366211 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 271704 | episode_rewards = 283.7851303751837 | total_episodes = 2747 | fit/surrogate_loss = 0.685973048210144 | fit/entropy_loss = 0.9942080974578857 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 273541 | episode_rewards = 291.6241996708527 | total_episodes = 2766 | fit/surrogate_loss = -1.1453349590301514 | fit/entropy_loss = 1.101591944694519 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 275517 | episode_rewards = 282.02527238406714 | total_episodes = 2786 | fit/surrogate_loss = -6.049954414367676 | fit/entropy_loss = 1.0179861783981323 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 277360 | episode_rewards = 16.0 | total_episodes = 2806 | fit/surrogate_loss = 11.253918647766113 | fit/entropy_loss = 1.049414038658142 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 279367 | episode_rewards = 281.1290978810243 | total_episodes = 2825 | fit/surrogate_loss = -2.6344029903411865 | fit/entropy_loss = 1.0347164869308472 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 281167 | episode_rewards = 0.0 | total_episodes = 2844 | fit/surrogate_loss = 1.1064000129699707 | fit/entropy_loss = 0.9371924996376038 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 283179 | episode_rewards = 250.78104080966858 | total_episodes = 2863 | fit/surrogate_loss = 12.849495887756348 | fit/entropy_loss = 1.0081018209457397 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 285144 | episode_rewards = 287.55088389857764 | total_episodes = 2883 | fit/surrogate_loss = -3.789557695388794 | fit/entropy_loss = 0.9457067847251892 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 287026 | episode_rewards = 254.9618615560583 | total_episodes = 2901 | fit/surrogate_loss = 0.6840089559555054 | fit/entropy_loss = 0.9657329320907593 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 288904 | episode_rewards = 239.0461878378266 | total_episodes = 2919 | fit/surrogate_loss = -0.8061223030090332 | fit/entropy_loss = 1.0255134105682373 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 290812 | episode_rewards = 232.09967697204786 | total_episodes = 2938 | fit/surrogate_loss = -1.9753848314285278 | fit/entropy_loss = 0.9870120286941528 |
[INFO] 16:53: [PPOAgent[worker: 0]] | max_global_step = 292779 | episode_rewards = 293.1587391022088 | total_episodes = 2958 | fit/surrogate_loss = -3.4968020915985107 | fit/entropy_loss = 1.0315067768096924 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 294745 | episode_rewards = 186.26363476045046 | total_episodes = 2977 | fit/surrogate_loss = 7.592077255249023 | fit/entropy_loss = 1.0064972639083862 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 296645 | episode_rewards = 320.222622397055 | total_episodes = 2995 | fit/surrogate_loss = -1.5015491247177124 | fit/entropy_loss = 0.944429337978363 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 298664 | episode_rewards = 187.139482215965 | total_episodes = 3015 | fit/surrogate_loss = 3.7631969451904297 | fit/entropy_loss = 0.9813820123672485 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 300548 | episode_rewards = 328.55860880531134 | total_episodes = 3033 | fit/surrogate_loss = 1.5286048650741577 | fit/entropy_loss = 0.8878442049026489 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 302499 | episode_rewards = 190.13890473281583 | total_episodes = 3052 | fit/surrogate_loss = -2.894129991531372 | fit/entropy_loss = 0.9722768068313599 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 304394 | episode_rewards = 263.70690042441134 | total_episodes = 3071 | fit/surrogate_loss = 2.445435047149658 | fit/entropy_loss = 0.9777105450630188 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 306318 | episode_rewards = -36.0 | total_episodes = 3089 | fit/surrogate_loss = 3.277486801147461 | fit/entropy_loss = 0.9544088244438171 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 308290 | episode_rewards = 140.11722203309503 | total_episodes = 3108 | fit/surrogate_loss = -0.8213484287261963 | fit/entropy_loss = 0.925036609172821 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 310244 | episode_rewards = 291.42358804221857 | total_episodes = 3128 | fit/surrogate_loss = -0.5326124429702759 | fit/entropy_loss = 0.9467383027076721 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 312007 | episode_rewards = -2.0 | total_episodes = 3147 | fit/surrogate_loss = 1.535997748374939 | fit/entropy_loss = 0.8873388171195984 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 313877 | episode_rewards = 296.1952368101264 | total_episodes = 3167 | fit/surrogate_loss = -0.6225532293319702 | fit/entropy_loss = 0.8994657397270203 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 315845 | episode_rewards = 213.33218784048097 | total_episodes = 3186 | fit/surrogate_loss = 3.5437350273132324 | fit/entropy_loss = 0.8863654732704163 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 317720 | episode_rewards = 281.0779023134249 | total_episodes = 3205 | fit/surrogate_loss = -0.6437132954597473 | fit/entropy_loss = 0.8195433020591736 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 319621 | episode_rewards = 80.51307731884518 | total_episodes = 3223 | fit/surrogate_loss = 7.503363132476807 | fit/entropy_loss = 0.8818414807319641 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 321499 | episode_rewards = 263.8596860259897 | total_episodes = 3243 | fit/surrogate_loss = -13.773136138916016 | fit/entropy_loss = 0.8752599358558655 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 323401 | episode_rewards = 322.76747419685483 | total_episodes = 3262 | fit/surrogate_loss = 10.615867614746094 | fit/entropy_loss = 0.883058488368988 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 325272 | episode_rewards = 299.6771221698585 | total_episodes = 3280 | fit/surrogate_loss = 8.855615615844727 | fit/entropy_loss = 0.8485798835754395 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 327263 | episode_rewards = 277.88398165026155 | total_episodes = 3300 | fit/surrogate_loss = -7.257081985473633 | fit/entropy_loss = 0.8347086310386658 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 329167 | episode_rewards = 286.43851646392847 | total_episodes = 3320 | fit/surrogate_loss = 3.605395793914795 | fit/entropy_loss = 0.8551650643348694 |
[INFO] 16:54: [PPOAgent[worker: 0]] | max_global_step = 331089 | episode_rewards = 132.27738500552374 | total_episodes = 3339 | fit/surrogate_loss = 8.149479866027832 | fit/entropy_loss = 0.8589387536048889 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 333046 | episode_rewards = 294.103572316817 | total_episodes = 3358 | fit/surrogate_loss = -14.957820892333984 | fit/entropy_loss = 0.9307618737220764 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 334946 | episode_rewards = 359.3958632660398 | total_episodes = 3378 | fit/surrogate_loss = 12.715492248535156 | fit/entropy_loss = 0.9235902428627014 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 336786 | episode_rewards = 312.18158983836196 | total_episodes = 3397 | fit/surrogate_loss = 3.759201765060425 | fit/entropy_loss = 0.9244174361228943 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 338763 | episode_rewards = 0.0 | total_episodes = 3417 | fit/surrogate_loss = -3.8107359409332275 | fit/entropy_loss = 1.0092012882232666 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 340757 | episode_rewards = 297.4810194857149 | total_episodes = 3439 | fit/surrogate_loss = -3.7150909900665283 | fit/entropy_loss = 0.9435456395149231 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 342703 | episode_rewards = 325.8883162072191 | total_episodes = 3458 | fit/surrogate_loss = 8.093222618103027 | fit/entropy_loss = 0.9791096448898315 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 344588 | episode_rewards = 325.63559877100266 | total_episodes = 3477 | fit/surrogate_loss = 2.5148143768310547 | fit/entropy_loss = 0.9678980708122253 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 346553 | episode_rewards = 264.9606710241308 | total_episodes = 3500 | fit/surrogate_loss = -6.9905500411987305 | fit/entropy_loss = 0.9897512793540955 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 348515 | episode_rewards = 0.0 | total_episodes = 3521 | fit/surrogate_loss = 3.7423617839813232 | fit/entropy_loss = 1.0184897184371948 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 350317 | episode_rewards = 98.06277900045299 | total_episodes = 3541 | fit/surrogate_loss = 1.9921005964279175 | fit/entropy_loss = 1.0237897634506226 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 352120 | episode_rewards = 151.4683927095074 | total_episodes = 3559 | fit/surrogate_loss = 1.9690021276474 | fit/entropy_loss = 1.0069987773895264 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 354120 | episode_rewards = 202.49230749307205 | total_episodes = 3579 | fit/surrogate_loss = -0.41003695130348206 | fit/entropy_loss = 0.9336974024772644 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 356081 | episode_rewards = 301.713127133677 | total_episodes = 3599 | fit/surrogate_loss = 0.08953665941953659 | fit/entropy_loss = 0.9974294304847717 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 358059 | episode_rewards = 0.0 | total_episodes = 3619 | fit/surrogate_loss = 1.450573444366455 | fit/entropy_loss = 1.018861174583435 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 359913 | episode_rewards = 0.0 | total_episodes = 3638 | fit/surrogate_loss = -4.925795078277588 | fit/entropy_loss = 0.966366171836853 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 361833 | episode_rewards = 32.18952697529554 | total_episodes = 3660 | fit/surrogate_loss = -0.42690742015838623 | fit/entropy_loss = 0.9651669859886169 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 363767 | episode_rewards = 226.74993878617212 | total_episodes = 3679 | fit/surrogate_loss = -3.410787582397461 | fit/entropy_loss = 1.0196833610534668 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 365648 | episode_rewards = 0.0 | total_episodes = 3699 | fit/surrogate_loss = -1.4755240678787231 | fit/entropy_loss = 1.0036985874176025 |
[INFO] 16:55: [PPOAgent[worker: 0]] | max_global_step = 367506 | episode_rewards = 241.05188779928602 | total_episodes = 3720 | fit/surrogate_loss = 3.930341958999634 | fit/entropy_loss = 1.0016478300094604 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 369452 | episode_rewards = 284.35806858077245 | total_episodes = 3740 | fit/surrogate_loss = 10.060277938842773 | fit/entropy_loss = 0.9490146636962891 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 371410 | episode_rewards = 246.99344154863303 | total_episodes = 3762 | fit/surrogate_loss = -1.8628233671188354 | fit/entropy_loss = 0.9763739705085754 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 373356 | episode_rewards = 242.28637067749025 | total_episodes = 3781 | fit/surrogate_loss = -1.496168613433838 | fit/entropy_loss = 1.0062780380249023 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 375157 | episode_rewards = 292.2258261309967 | total_episodes = 3800 | fit/surrogate_loss = 4.843698978424072 | fit/entropy_loss = 1.0262356996536255 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 377121 | episode_rewards = 14.0 | total_episodes = 3821 | fit/surrogate_loss = -5.91718864440918 | fit/entropy_loss = 1.0802301168441772 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 379066 | episode_rewards = 218.7146904838953 | total_episodes = 3840 | fit/surrogate_loss = -3.436647415161133 | fit/entropy_loss = 1.0132417678833008 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 380992 | episode_rewards = 234.36918883589595 | total_episodes = 3860 | fit/surrogate_loss = 4.775086402893066 | fit/entropy_loss = 0.9952309131622314 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 382892 | episode_rewards = 284.07871555046415 | total_episodes = 3879 | fit/surrogate_loss = -5.776726245880127 | fit/entropy_loss = 1.0385886430740356 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 384869 | episode_rewards = 324.0644475643733 | total_episodes = 3899 | fit/surrogate_loss = 8.264321327209473 | fit/entropy_loss = 1.0189669132232666 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 386712 | episode_rewards = 15.0 | total_episodes = 3919 | fit/surrogate_loss = 5.558108329772949 | fit/entropy_loss = 0.9657159447669983 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 388612 | episode_rewards = 148.0068600827308 | total_episodes = 3938 | fit/surrogate_loss = -2.7608399391174316 | fit/entropy_loss = 0.9457671046257019 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 390497 | episode_rewards = 292.24682707125083 | total_episodes = 3958 | fit/surrogate_loss = 4.495976448059082 | fit/entropy_loss = 0.9412776827812195 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 392297 | episode_rewards = -12.0 | total_episodes = 3976 | fit/surrogate_loss = -6.394156455993652 | fit/entropy_loss = 0.8748922944068909 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 394354 | episode_rewards = 244.19843768782306 | total_episodes = 3994 | fit/surrogate_loss = 1.8958444595336914 | fit/entropy_loss = 0.9547006487846375 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 396337 | episode_rewards = 252.92778983197093 | total_episodes = 4013 | fit/surrogate_loss = 6.616912364959717 | fit/entropy_loss = 0.9458012580871582 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 398203 | episode_rewards = 286.96804348546357 | total_episodes = 4032 | fit/surrogate_loss = -8.990714073181152 | fit/entropy_loss = 0.9437640309333801 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 400107 | episode_rewards = 0.0 | total_episodes = 4050 | fit/surrogate_loss = 1.1266694068908691 | fit/entropy_loss = 0.9508050084114075 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 402072 | episode_rewards = 0.0 | total_episodes = 4069 | fit/surrogate_loss = 4.978913307189941 | fit/entropy_loss = 0.941914439201355 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 403943 | episode_rewards = 150.6610338132491 | total_episodes = 4087 | fit/surrogate_loss = -7.1340250968933105 | fit/entropy_loss = 1.004734992980957 |
[INFO] 16:56: [PPOAgent[worker: 0]] | max_global_step = 405719 | episode_rewards = 286.1085754713148 | total_episodes = 4105 | fit/surrogate_loss = -3.3456854820251465 | fit/entropy_loss = 0.9755523204803467 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 407602 | episode_rewards = 0.0 | total_episodes = 4124 | fit/surrogate_loss = 2.1803932189941406 | fit/entropy_loss = 0.949565589427948 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 409551 | episode_rewards = 156.19653705120353 | total_episodes = 4142 | fit/surrogate_loss = 1.3640680313110352 | fit/entropy_loss = 0.9335667490959167 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 411380 | episode_rewards = 350.7678222447295 | total_episodes = 4160 | fit/surrogate_loss = 4.900801181793213 | fit/entropy_loss = 0.9382438659667969 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 413195 | episode_rewards = 285.3994499195645 | total_episodes = 4177 | fit/surrogate_loss = -2.779069662094116 | fit/entropy_loss = 0.9222322702407837 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 415163 | episode_rewards = 288.03400255951937 | total_episodes = 4197 | fit/surrogate_loss = 2.873145818710327 | fit/entropy_loss = 0.9784928560256958 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 417081 | episode_rewards = 271.8003946442427 | total_episodes = 4216 | fit/surrogate_loss = 0.8303982019424438 | fit/entropy_loss = 0.9632890820503235 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 419070 | episode_rewards = 274.3610669586073 | total_episodes = 4235 | fit/surrogate_loss = -0.3378711938858032 | fit/entropy_loss = 0.9332031011581421 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 420935 | episode_rewards = 285.698652336808 | total_episodes = 4255 | fit/surrogate_loss = -1.6555614471435547 | fit/entropy_loss = 0.928404688835144 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 422869 | episode_rewards = 321.4248764564302 | total_episodes = 4276 | fit/surrogate_loss = 2.801839828491211 | fit/entropy_loss = 0.9713084697723389 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 424801 | episode_rewards = 347.5848249806207 | total_episodes = 4298 | fit/surrogate_loss = -5.931520462036133 | fit/entropy_loss = 0.9765328764915466 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 426753 | episode_rewards = 235.22405365927392 | total_episodes = 4319 | fit/surrogate_loss = 4.456506252288818 | fit/entropy_loss = 0.9539737105369568 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 428589 | episode_rewards = 293.8601487388092 | total_episodes = 4339 | fit/surrogate_loss = -1.527997374534607 | fit/entropy_loss = 1.0273936986923218 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 430491 | episode_rewards = 238.40763901062212 | total_episodes = 4360 | fit/surrogate_loss = 5.263881683349609 | fit/entropy_loss = 0.9276028275489807 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 432489 | episode_rewards = 298.29358894142774 | total_episodes = 4380 | fit/surrogate_loss = 1.954353928565979 | fit/entropy_loss = 1.0460476875305176 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 434335 | episode_rewards = 249.3733569086184 | total_episodes = 4400 | fit/surrogate_loss = 1.5861694812774658 | fit/entropy_loss = 0.9824934601783752 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 436158 | episode_rewards = 238.96283738341737 | total_episodes = 4417 | fit/surrogate_loss = -0.17691625654697418 | fit/entropy_loss = 0.9926157593727112 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 438063 | episode_rewards = 260.36137471838583 | total_episodes = 4435 | fit/surrogate_loss = 3.9432201385498047 | fit/entropy_loss = 0.9440750479698181 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 440029 | episode_rewards = 281.4619657222258 | total_episodes = 4455 | fit/surrogate_loss = -11.234806060791016 | fit/entropy_loss = 0.9728662967681885 |
[INFO] 16:57: [PPOAgent[worker: 0]] | max_global_step = 441900 | episode_rewards = 303.9872611084485 | total_episodes = 4473 | fit/surrogate_loss = 10.437104225158691 | fit/entropy_loss = 0.9096306562423706 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 443753 | episode_rewards = 315.03343422340197 | total_episodes = 4491 | fit/surrogate_loss = -4.030556678771973 | fit/entropy_loss = 0.9464774131774902 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 445673 | episode_rewards = 0.0 | total_episodes = 4511 | fit/surrogate_loss = -6.509860515594482 | fit/entropy_loss = 0.9715564250946045 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 447532 | episode_rewards = 291.69420461798654 | total_episodes = 4530 | fit/surrogate_loss = 4.072037696838379 | fit/entropy_loss = 0.9845868945121765 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 449527 | episode_rewards = 218.99857107201902 | total_episodes = 4549 | fit/surrogate_loss = -5.0280632972717285 | fit/entropy_loss = 0.8980158567428589 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 451450 | episode_rewards = 323.45601542226564 | total_episodes = 4569 | fit/surrogate_loss = 12.949789047241211 | fit/entropy_loss = 0.9183349609375 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 453436 | episode_rewards = 331.10574235011643 | total_episodes = 4588 | fit/surrogate_loss = -4.701037406921387 | fit/entropy_loss = 0.8513623476028442 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 455331 | episode_rewards = 298.16808547838895 | total_episodes = 4607 | fit/surrogate_loss = 3.3393399715423584 | fit/entropy_loss = 0.9371728301048279 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 457258 | episode_rewards = 305.20445123022546 | total_episodes = 4625 | fit/surrogate_loss = -1.4828935861587524 | fit/entropy_loss = 0.9189344644546509 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 459204 | episode_rewards = 296.97104745527434 | total_episodes = 4646 | fit/surrogate_loss = -6.455825328826904 | fit/entropy_loss = 0.8192564249038696 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 461093 | episode_rewards = 326.2481563297774 | total_episodes = 4665 | fit/surrogate_loss = 8.854265213012695 | fit/entropy_loss = 0.8241569399833679 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 462970 | episode_rewards = 310.69696450029136 | total_episodes = 4683 | fit/surrogate_loss = -1.0951286554336548 | fit/entropy_loss = 0.9279043674468994 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 464888 | episode_rewards = 0.0 | total_episodes = 4702 | fit/surrogate_loss = 7.324929714202881 | fit/entropy_loss = 0.9040438532829285 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 466803 | episode_rewards = 284.0750483620409 | total_episodes = 4720 | fit/surrogate_loss = -10.986502647399902 | fit/entropy_loss = 0.8823921084403992 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 468636 | episode_rewards = -24.0 | total_episodes = 4737 | fit/surrogate_loss = -9.095108985900879 | fit/entropy_loss = 0.9000943303108215 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 470528 | episode_rewards = 0.0 | total_episodes = 4756 | fit/surrogate_loss = 0.49896976351737976 | fit/entropy_loss = 0.9501118063926697 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 472417 | episode_rewards = -36.0 | total_episodes = 4774 | fit/surrogate_loss = 11.732644081115723 | fit/entropy_loss = 0.9509857892990112 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 474413 | episode_rewards = 157.45800940040132 | total_episodes = 4795 | fit/surrogate_loss = -11.044668197631836 | fit/entropy_loss = 0.9405302405357361 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 476233 | episode_rewards = 304.95351220285875 | total_episodes = 4815 | fit/surrogate_loss = 3.230506181716919 | fit/entropy_loss = 0.8895562887191772 |
[INFO] 16:58: [PPOAgent[worker: 0]] | max_global_step = 478088 | episode_rewards = 273.65990527820844 | total_episodes = 4836 | fit/surrogate_loss = -2.057007074356079 | fit/entropy_loss = 0.8450247049331665 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 480096 | episode_rewards = 225.393869417009 | total_episodes = 4854 | fit/surrogate_loss = -6.749125003814697 | fit/entropy_loss = 0.7920497059822083 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 482010 | episode_rewards = 284.2181490104804 | total_episodes = 4874 | fit/surrogate_loss = 11.958834648132324 | fit/entropy_loss = 0.7524480819702148 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 483903 | episode_rewards = 118.52480329809106 | total_episodes = 4892 | fit/surrogate_loss = 1.5171358585357666 | fit/entropy_loss = 0.8042746186256409 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 485814 | episode_rewards = 287.68503833467213 | total_episodes = 4910 | fit/surrogate_loss = 0.09560095518827438 | fit/entropy_loss = 0.7662050724029541 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 487756 | episode_rewards = 294.12818602515716 | total_episodes = 4928 | fit/surrogate_loss = -5.227457523345947 | fit/entropy_loss = 0.8396679759025574 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 489715 | episode_rewards = 212.56486168613156 | total_episodes = 4948 | fit/surrogate_loss = 1.5038007497787476 | fit/entropy_loss = 0.9010404348373413 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 491574 | episode_rewards = 248.34553919585076 | total_episodes = 4967 | fit/surrogate_loss = 3.363309383392334 | fit/entropy_loss = 0.8638811707496643 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 493465 | episode_rewards = 0.0 | total_episodes = 4986 | fit/surrogate_loss = 8.634424209594727 | fit/entropy_loss = 0.8884937167167664 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 495356 | episode_rewards = 151.93969389847663 | total_episodes = 5005 | fit/surrogate_loss = -1.5574429035186768 | fit/entropy_loss = 0.8366995453834534 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 497239 | episode_rewards = 302.49554099930833 | total_episodes = 5024 | fit/surrogate_loss = 1.633874535560608 | fit/entropy_loss = 0.8804383277893066 |
[INFO] 16:59: [PPOAgent[worker: 0]] | max_global_step = 499103 | episode_rewards = 167.51614314661649 | total_episodes = 5044 | fit/surrogate_loss = 1.513819694519043 | fit/entropy_loss = 0.8617693185806274 |
[INFO] 16:59: ... trained!
Training reward curve
[5]:
data = plot_writer_data(manager, tag="episode_rewards", smooth_weight=0.8) # smoothing tensorboard-style
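The `smooth_weight=0.8` argument is described in the comment as TensorBoard-style smoothing. As a rough sketch of what that usually means, the snippet below applies an exponential moving average to a few of the `episode_rewards` values from the training log (rounded). The helper name `ema_smooth` is hypothetical, and whether rlberry uses exactly this recurrence (for instance with or without bias correction) is an assumption rather than something this notebook confirms.

```python
import numpy as np


def ema_smooth(values, weight=0.8):
    """Exponential moving average seeded with the first value (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return values
    smoothed = np.empty_like(values)
    last = values[0]  # seed with the first point, so smoothed[0] == values[0]
    for i, v in enumerate(values):
        # A larger weight keeps more of the history and gives a flatter curve
        last = weight * last + (1.0 - weight) * v
        smoothed[i] = last
    return smoothed


# Rounded episode rewards taken from consecutive log lines above
rewards = [214.9, 308.8, 283.8, 84.4, 277.0]
smoothed = ema_smooth(rewards, weight=0.8)
print(np.round(smoothed, 2))
```

With a weight of 0.8, a single poor episode (such as the 84.4 above) only pulls the curve down by a fifth of the gap, which is why the smoothed plot looks much steadier than the raw log.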
Evaluation of the trained agent
[6]:
evaluation = evaluate_agents([manager], n_simulations=128, plot=False)
evaluation.describe()
[INFO] 17:00: Evaluating PPOAgent...
[INFO] Evaluation:/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:187: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:195: UserWarning: WARN: The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'numpy.ndarray'>`
logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:219: DeprecationWarning: WARN: Core environment is written in old step API which returns one bool instead of two. It is recommended to rewrite the environment with new step API.
logger.deprecation(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:141: UserWarning: WARN: The obs returned by the `step()` method was expecting numpy array dtype to be float32, actual type: float64
logger.warn(
/home/frost/.local/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:165: UserWarning: WARN: The obs returned by the `step()` method is not within the observation space.
logger.warn(f"{pre} is not within the observation space.")
................................................................................................................................ Evaluation finished
[6]:
| | PPOAgent |
|---|---|
| count | 128.000000 |
| mean | 185.779274 |
| std | 36.115669 |
| min | 82.613569 |
| 25% | 158.785278 |
| 50% | 184.637599 |
| 75% | 218.280214 |
| max | 258.224123 |
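To get a feel for how precisely 128 simulations pin down the mean reward, one can turn the summary statistics above into an approximate confidence interval. This is only a sketch: the helper `mean_confidence_interval` is hypothetical, and it assumes the evaluation episodes are independent and that a normal approximation is reasonable for this sample size.

```python
import math


def mean_confidence_interval(mean, std, count, z=1.96):
    """Normal-approximation confidence interval for a mean (hypothetical helper)."""
    stderr = std / math.sqrt(count)  # standard error of the mean
    half_width = z * stderr
    return mean - half_width, mean + half_width


# Values copied from the describe() table above
low, high = mean_confidence_interval(mean=185.779274, std=36.115669, count=128)
print(f"Approximate 95% interval for the mean reward: [{low:.2f}, {high:.2f}]")
```

The interval is narrow compared with the standard deviation of a single episode, so the mean is estimated fairly tightly even though individual episodes vary a lot.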
A small peek into the agent's policy
[7]:
agent = manager.agent_handlers[0] # select the agent from the manager
[9]:
env = env_ctor(**env_kwargs)
obs = env.reset()
# Human-readable label for each discrete action index
actions_txt = [
    "doing nothing",
    "1L of water",
    "5L of water",
    "harvesting",
    "sow some seeds",
    "scatter fertilizer",
    "scatter herbicide",
    "scatter pesticide",
    "remove weeds by hand",
]
episode = pd.DataFrame()
for day in range(365):
    action = agent.policy(obs)
    # Print a few entries of the observation before acting
    print("Day: {}, Mean temp: {}, stage: {}, weight of fruit: {}".format(
        obs[0], np.round(obs[1], 3), int(obs[7]), obs[15]))
    obs, reward, is_done, _ = env.step(action)
    print("Action is", actions_txt[action])
    # Record the chosen action and the reward received for this day
    episode = pd.concat(
        [episode, pd.DataFrame({"action": [actions_txt[action]], "reward": [reward]})],
        ignore_index=True,
    )
    print('')
    # Stop as soon as the environment reports that the episode is over
    if is_done:
        print('Plant is Dead')
        break
Day: 1.0, Mean temp: 4.66, stage: 0, weight of fruit: 0.0
Action is 1L of water
Day: 2.0, Mean temp: 8.107, stage: 0, weight of fruit: 0.0
Action is sow some seeds
Day: 3.0, Mean temp: 4.794, stage: 1, weight of fruit: 0.0
Action is sow some seeds
Day: 4.0, Mean temp: 6.229, stage: 1, weight of fruit: 0.0
Action is sow some seeds
Day: 5.0, Mean temp: 5.328, stage: 1, weight of fruit: 0.0
Action is sow some seeds
Day: 6.0, Mean temp: 6.831, stage: 1, weight of fruit: 0.0
Action is sow some seeds
Day: 7.0, Mean temp: 11.42, stage: 2, weight of fruit: 0.0
Action is sow some seeds
Day: 8.0, Mean temp: 11.808, stage: 3, weight of fruit: 0.0
Action is sow some seeds
Day: 9.0, Mean temp: 7.065, stage: 3, weight of fruit: 0.0
Action is sow some seeds
Day: 10.0, Mean temp: 5.824, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 11.0, Mean temp: 7.3, stage: 3, weight of fruit: 0.0
Action is sow some seeds
Day: 12.0, Mean temp: 7.8, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 13.0, Mean temp: 10.459, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 14.0, Mean temp: 8.196, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 15.0, Mean temp: 6.916, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 16.0, Mean temp: 8.3, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 17.0, Mean temp: 3.395, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 18.0, Mean temp: 3.507, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 19.0, Mean temp: 1.572, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 20.0, Mean temp: -0.618, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 21.0, Mean temp: 1.371, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 22.0, Mean temp: 2.858, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 23.0, Mean temp: 0.966, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 24.0, Mean temp: 1.87, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 25.0, Mean temp: 5.66, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 26.0, Mean temp: 7.941, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 27.0, Mean temp: 5.144, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 28.0, Mean temp: 5.272, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 29.0, Mean temp: 7.686, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 30.0, Mean temp: 11.335, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 31.0, Mean temp: 10.712, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 32.0, Mean temp: 10.321, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 33.0, Mean temp: 9.554, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 34.0, Mean temp: 6.328, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 35.0, Mean temp: 6.586, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 36.0, Mean temp: 4.039, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 37.0, Mean temp: 4.962, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 38.0, Mean temp: 8.219, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 39.0, Mean temp: 9.835, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 40.0, Mean temp: 7.665, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 41.0, Mean temp: 6.337, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 42.0, Mean temp: 5.62, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 43.0, Mean temp: 7.099, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 44.0, Mean temp: 7.855, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 45.0, Mean temp: 10.274, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 46.0, Mean temp: 12.223, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 47.0, Mean temp: 7.699, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 48.0, Mean temp: 7.065, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 49.0, Mean temp: 6.367, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 50.0, Mean temp: 8.157, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 51.0, Mean temp: 6.388, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 52.0, Mean temp: 8.593, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 53.0, Mean temp: 10.491, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
Day: 54.0, Mean temp: 9.339, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 55.0, Mean temp: 6.897, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 56.0, Mean temp: 4.347, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 57.0, Mean temp: 2.649, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 58.0, Mean temp: 4.304, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
Day: 59.0, Mean temp: 8.054, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 60.0, Mean temp: 6.459, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 61.0, Mean temp: 5.813, stage: 3, weight of fruit: 0.0
Action is doing nothing
Day: 62.0, Mean temp: 4.74, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
Day: 63.0, Mean temp: 5.463, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 64.0, Mean temp: 7.361, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 65.0, Mean temp: 6.06, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 66.0, Mean temp: 6.398, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 67.0, Mean temp: 8.256, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
Day: 68.0, Mean temp: 7.85, stage: 3, weight of fruit: 0.0
Action is remove weeds by hand
Day: 69.0, Mean temp: 10.868, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 70.0, Mean temp: 12.469, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
Day: 71.0, Mean temp: 8.836, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
Day: 72.0, Mean temp: 7.698, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 73.0, Mean temp: 7.344, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
Day: 74.0, Mean temp: 9.662, stage: 3, weight of fruit: 0.0
Action is harvesting
Day: 75.0, Mean temp: 7.742, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
Day: 76.0, Mean temp: 7.807, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 77.0, Mean temp: 9.912, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 78.0, Mean temp: 9.564, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 79.0, Mean temp: 7.413, stage: 3, weight of fruit: 0.0
Action is harvesting
Day: 80.0, Mean temp: 6.346, stage: 3, weight of fruit: 0.0
Action is doing nothing
Day: 81.0, Mean temp: 5.97, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 82.0, Mean temp: 5.594, stage: 3, weight of fruit: 0.0
Action is doing nothing
Day: 83.0, Mean temp: 6.003, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 84.0, Mean temp: 6.43, stage: 3, weight of fruit: 0.0
Action is scatter fertilizer
Day: 85.0, Mean temp: 6.677, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 86.0, Mean temp: 7.663, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 87.0, Mean temp: 8.407, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 88.0, Mean temp: 5.038, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 89.0, Mean temp: 4.676, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 90.0, Mean temp: 5.745, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 91.0, Mean temp: 5.488, stage: 3, weight of fruit: 0.0
Action is 1L of water
Day: 92.0, Mean temp: 7.189, stage: 4, weight of fruit: 0.0
Action is 1L of water
Day: 93.0, Mean temp: 8.946, stage: 5, weight of fruit: 0.0
Action is 1L of water
Day: 94.0, Mean temp: 10.253, stage: 5, weight of fruit: 0.0
Action is 1L of water
Day: 95.0, Mean temp: 13.488, stage: 5, weight of fruit: 0.0
Action is 1L of water
Day: 96.0, Mean temp: 12.475, stage: 5, weight of fruit: 0.0
Action is 5L of water
Day: 97.0, Mean temp: 13.701, stage: 5, weight of fruit: 0.0
Action is 1L of water
Day: 98.0, Mean temp: 17.629, stage: 6, weight of fruit: 0.0
Action is 1L of water
Day: 99.0, Mean temp: 16.655, stage: 6, weight of fruit: 1.0
Action is 1L of water
Day: 100.0, Mean temp: 16.302, stage: 6, weight of fruit: 1.8462644348974382
Action is 1L of water
Day: 101.0, Mean temp: 16.295, stage: 6, weight of fruit: 2.9439341468885543
Action is 1L of water
Day: 102.0, Mean temp: 16.228, stage: 6, weight of fruit: 4.277763800353128
Action is doing nothing
Day: 103.0, Mean temp: 9.44, stage: 6, weight of fruit: 4.277763800353128
Action is 1L of water
Day: 104.0, Mean temp: 6.957, stage: 6, weight of fruit: 4.277763800353128
Action is 5L of water
Day: 105.0, Mean temp: 10.949, stage: 6, weight of fruit: 4.277763800353128
Action is doing nothing
Day: 106.0, Mean temp: 14.965, stage: 6, weight of fruit: 5.864047649310217
Action is 1L of water
Day: 107.0, Mean temp: 15.352, stage: 6, weight of fruit: 7.669518508245714
Action is harvesting
Plant is Dead
[15]:
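import seaborn as sns
import matplotlib.pyplot as plt

# Count how often each action was taken during the episode, most frequent first
fig, ax = plt.subplots(figsize=(12, 6))
action_order = episode["action"].value_counts().index
sns.countplot(data=episode, x="action", order=action_order, ax=ax)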
[15]:
<AxesSubplot:xlabel='action', ylabel='count'>
From this plot, it appears that PPO learned that pesticide damages the soil and should only be used in small quantities. It also suggests that herbicide is not useful when weeds can be removed by hand: the agent relies on hand weeding instead.
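The same kind of action tally can also be recovered directly from the printed episode log, without the `episode` DataFrame. The snippet below is a minimal sketch: it assumes the log keeps the `Action is ...` line format shown above. The helper `count_actions` is a hypothetical name introduced here for illustration, and the excerpt is only a few non-consecutive lines copied from the log, not the full episode.

```python
import re
from collections import Counter

# A few (non-consecutive) lines copied from the episode log above.
log_excerpt = """\
Day: 1.0, Mean temp: 4.66, stage: 0, weight of fruit: 0.0
Action is 1L of water
Day: 2.0, Mean temp: 8.107, stage: 0, weight of fruit: 0.0
Action is sow some seeds
Day: 3.0, Mean temp: 4.794, stage: 1, weight of fruit: 0.0
Action is sow some seeds
Day: 53.0, Mean temp: 10.491, stage: 3, weight of fruit: 0.0
Action is scatter pesticide
"""

# Assumption: every action is printed on its own line as "Action is <name>".
ACTION_RE = re.compile(r"^Action is (.+)$")


def count_actions(log_text):
    """Hypothetical helper: count each action name found in a printed log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ACTION_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


action_counts = count_actions(log_excerpt)
total = sum(action_counts.values())

# Share of the logged steps spent on each action.
action_share = {action: n / total for action, n in action_counts.items()}

for action, n in action_counts.most_common():
    print(f"{action}: {n} steps ({action_share[action]:.0%})")
```

Applied to the full log, this should give the same ordering as the count plot, and the shares make it easier to compare how often pesticide is used relative to hand weeding.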