Abstract
This post is a brief record of how to install and use OpenAI gym.
In reinforcement learning we need an agent to act inside an environment, but writing an environment by hand is time-consuming, so being able to reuse environments that others have already built saves a lot of work. OpenAI gym is exactly such a module: it provides many well-made simulation environments, and all of our RL algorithms can be run against them.
Note that OpenAI gym currently supports only macOS and Linux; Windows 10 users can refer to the earlier post [1] on installing the Windows Subsystem for Linux.
Installation
First, install some required dependencies (if brew or apt-get is missing or out of date, install or update it first):
# MacOS:
$ brew install cmake boost boost-python sdl2 swig wget
# Ubuntu 14.04:
$ apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools
Then gym can be installed with pip. To install all of gym's games, replace gym below with gym[all]:
# python 2.7
$ pip install gym
# python 3.5
$ pip3 install gym
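To confirm that the installation worked, a minimal sanity check is to import gym and build an environment; assuming the install succeeded, this should print the version and the two spaces without errors:

import gym

print(gym.__version__)           # installed gym version
env = gym.make('CartPole-v0')    # raises an error if the install is broken
print(env.action_space, env.observation_space)
env.close()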
Usage
Let's start with a short piece of code:
demo1.py
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()               # start a new episode
    for step in range(100):
        env.render()                        # draw the current frame
        print(observation)
        action = env.action_space.sample()  # pick a random action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(step + 1))
            break
- gym.make('CartPole-v0') asks gym to create the CartPole-v0 game environment.
- At the start of each episode, env.reset() resets the environment, i.e. restarts the game, and returns the initial observation.
- Within each step, env.render() redraws the frame.
- env.action_space.sample() returns a random sample of an action, i.e. it picks an action from the action space at random.
- env.step(action) executes the action and returns four values (a short usage sketch follows the list below):
  - observation (object): an environment-specific object representing your observation of the environment. For example, pixel data from a camera, joint angles and joint velocities of a robot, or the board state in a board game.
  - reward (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
  - done (boolean): whether it's time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)
  - info (dict): diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment's last state change). However, official evaluations of your agent are not allowed to use this for learning.
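As a small sketch of how these four return values are typically consumed, the loop below runs a random policy (no learning) and sums the per-step reward into an episode return; in CartPole-v0 every surviving step yields a reward of 1:

import gym

env = gym.make('CartPole-v0')
for i_episode in range(5):
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()             # random policy
        observation, reward, done, info = env.step(action)
        total_reward += reward                         # +1 per step in CartPole
    print("Episode {} return: {}".format(i_episode, total_reward))
env.close()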
The relationship between the agent and the environment can be pictured with the following diagram:
(figure: the agent/environment interaction loop)
A video of the result can be seen at http://s3-us-west-2.amazonaws.com/rl-gym-doc/cartpole-no-reset.mp4
Space
Next, let's look at the action_space used in the code above. Every game has its own action_space and observation_space, which describe the space of valid actions and the space of observations. We can print them to see, for instance, the maximum and minimum values of the observation space:
import gym
env = gym.make('CartPole-v0')
print(env.action_space)
#> Discrete(2)   (a discrete value: 0 or 1)
print(env.observation_space)
#> Box(4,)   (a box of four real numbers whose bounds are shown below)
print(env.observation_space.high)
#> array([ 2.4 , inf, 0.20943951, inf])
print(env.observation_space.low)
#> array([-2.4 , -inf, -0.20943951, -inf])
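Space objects also support a couple of helper methods that are handy for sanity checks. The sketch below uses sample() and contains(), plus the n attribute of a Discrete space (all part of gym's Space API):

import gym

env = gym.make('CartPole-v0')

# Discrete(2): the valid actions are the integers 0 and 1
print(env.action_space.n)                 # 2
print(env.action_space.contains(1))       # True
print(env.action_space.contains(2))       # False, only 0 and 1 are valid

# Box(4,): observations are arrays of four floats within the printed bounds
x = env.observation_space.sample()        # a random valid observation
print(env.observation_space.contains(x))  # True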