This article is based on the ml-agents document Unity ML-Agents Python Interface and Trainers, and describes the whole interaction API in more detail.
Initializing the Unity environment
```python
import numpy as np
from mlagents.envs import UnityEnvironment
```
Initialize the environment with `env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)`.
- `file_name` is the binary executable environment built by Unity.
- `worker_id` selects the port used to communicate with the environment, which makes it possible to run several environments in parallel, e.g. for A3C.
- `seed` is the random seed used during training; set it to a fixed value if you want the Unity environment's randomness to be the same on every run.
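Because each `worker_id` claims its own port, several environment instances can run side by side. A minimal sketch of this pattern (the `make_envs` helper is hypothetical, and actually constructing the environments requires the Unity build to be present):

```python
def make_envs(file_name, num_workers, seed=1):
    # Deferred import so the sketch can be read without ml-agents installed;
    # creating the environments requires the Unity build to exist on disk.
    from mlagents.envs import UnityEnvironment
    # Each worker_id claims its own port, so the instances can run in parallel.
    return [UnityEnvironment(file_name=file_name, worker_id=i, seed=seed)
            for i in range(num_workers)]

# e.g. four parallel A3C workers would use worker_id 0..3:
worker_ids = list(range(4))
print(worker_ids)  # [0, 1, 2, 3]
```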
If `file_name=None`, Python interacts directly with the Unity Editor and waits for the Play button to be pressed ("Start training by pressing the Play button in the Unity Editor"); in this case `worker_id` must be 0 to connect to the editor.
```python
env = UnityEnvironment()
```
```
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 2
        Number of Training Brains : 2
        Reset Parameters :
                copy -> 1.0
Unity brain name: Brain1
        Number of Visual Observations (per agent): 2
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): [2]
        Vector Action descriptions: 3, 3
Unity brain name: Brain2
        Number of Visual Observations (per agent): 0
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): [2]
        Vector Action descriptions: 3, 3
```
Getting basic brain information
```python
print(env.brains)        # dict: brain name -> BrainParameters
print(env.brain_names)   # list of brain names
for name in env.brain_names:
    params = env.brains[name]
    print(params)        # BrainParameters pretty-prints its configuration
    print('brain_name:', params.brain_name)
    print('num_stacked_vector_observations:', params.num_stacked_vector_observations)
    print('vector_observation_space_size:', params.vector_observation_space_size)
    print('number_visual_observations:', params.number_visual_observations)
    print('camera_resolutions:', params.camera_resolutions)
    print('vector_action_space_type:', params.vector_action_space_type)
    print('vector_action_space_size:', params.vector_action_space_size)
    print('vector_action_descriptions:', params.vector_action_descriptions)
    print('---------')
```
```
{'Brain1': <mlagents.envs.brain.BrainParameters object at 0x0000023EC4BFB0F0>, 'Brain2': <mlagents.envs.brain.BrainParameters object at 0x0000023EC5327470>}
['Brain1', 'Brain2']
Unity brain name: Brain1
        Number of Visual Observations (per agent): 2
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): [2]
        Vector Action descriptions: 3, 3
brain_name: Brain1
num_stacked_vector_observations: 1
vector_observation_space_size: 8
number_visual_observations: 2
camera_resolutions: [{'height': 450, 'width': 500, 'blackAndWhite': False}, {'height': 550, 'width': 600, 'blackAndWhite': False}]
vector_action_space_type: continuous
vector_action_space_size: [2]
vector_action_descriptions: ['3', '3']
---------
Unity brain name: Brain2
        Number of Visual Observations (per agent): 0
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): [2]
        Vector Action descriptions: 3, 3
brain_name: Brain2
num_stacked_vector_observations: 1
vector_observation_space_size: 8
number_visual_observations: 0
camera_resolutions: []
vector_action_space_type: continuous
vector_action_space_size: [2]
vector_action_descriptions: ['3', '3']
---------
```
Resetting the environment and starting interaction
`train_mode=True` means training mode, i.e. the Training Configuration defined in the Academy is used; `train_mode=False` means inference mode, i.e. the Academy's Inference Configuration is used, and Unity renders every frame.

`config={}` holds the parameters used when resetting the environment, as a `dict`. All of these parameters must be defined in the Unity Editor beforehand.

The return value is a `dict` mapping each brain name to that brain's information.
```python
brain_infos = env.reset(train_mode=True, config={'copy': 1})
```

```
INFO:mlagents.envs:Academy reset with parameters: copy -> 1
```
`vector_observations` is a `numpy` array of shape `(number of agents, vector observation size * observation stack size)`, where the observation stack size is how many consecutive observations are stored together to form the current state.

`visual_observations` is a `list` with one entry per camera configured on the Brain. Each entry is itself a `list` of length equal to the number of agents, whose elements are three-dimensional `numpy` arrays of shape `(height, width, channels)`. For example, if a Brain has a left and a right observation camera, `visual_observations` has length 2: the first element is the collection of left-camera images of all agents, and the second is the collection of right-camera images of all agents.
```python
for name in env.brain_names:
    info = brain_infos[name]
    print(info)
    print('vector_observations:', info.vector_observations.shape)
    print('visual_observations:')
    for camera_images in info.visual_observations:
        print('number of agents', len(camera_images))
        for image in camera_images:
            print(image.shape)
    print('text_observations:', info.text_observations)
    print('rewards:', info.rewards)
    print('local_done:', info.local_done)
    print('max_reached:', info.max_reached)
    print('previous_vector_actions:', info.previous_vector_actions.shape)
    print('agents:', info.agents)
    print('---------')
```
```
<mlagents.envs.brain.BrainInfo object at 0x0000020F88ED0FD0>
vector_observations: (2, 8)
visual_observations:
number of agents 2
(450, 500, 3)
(450, 500, 3)
number of agents 2
(550, 600, 3)
(550, 600, 3)
text_observations: ['', '']
rewards: [0.0, 0.0]
local_done: [False, False]
max_reached: [False, False]
previous_vector_actions: (2, 2)
agents: [14714, 14736]
---------
<mlagents.envs.brain.BrainInfo object at 0x0000020F88F69978>
vector_observations: (1, 8)
visual_observations:
text_observations: ['']
rewards: [0.0]
local_done: [False]
max_reached: [False]
previous_vector_actions: (1, 2)
agents: [14658]
---------
```
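The nesting described above can be illustrated offline with dummy arrays (the sizes mirror the two-camera, two-agent output of Brain1; no live environment is involved):

```python
import numpy as np

num_agents, obs_size, stack = 2, 8, 1
# Shape (number of agents, observation size * stack size), as described above.
vector_observations = np.zeros((num_agents, obs_size * stack))

# Outer list: one entry per camera; inner list: one image per agent,
# each image a (height, width, channels) array.
visual_observations = [
    [np.zeros((450, 500, 3), dtype=np.uint8) for _ in range(num_agents)],  # left camera
    [np.zeros((550, 600, 3), dtype=np.uint8) for _ in range(num_agents)],  # right camera
]

# Index the camera first, then the agent:
right_cam_agent1 = visual_observations[1][1]
print(vector_observations.shape)  # (2, 8)
print(right_cam_agent1.shape)     # (550, 600, 3)
```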
The simplest way to interact
```python
# Reset in inference mode, then repeatedly step with random actions.
brain_infos = env.reset(train_mode=False, config={'copy': 1})
for _ in range(1000):
    actions = {
        name: np.random.randn(len(brain_infos[name].agents),
                              env.brains[name].vector_action_space_size[0])
        for name in env.brain_names
    }
    brain_infos = env.step(actions)
```

```
INFO:mlagents.envs:Academy reset with parameters: copy -> 1
```
A more complex single-brain example (not directly runnable; for illustration only)
```python
def simulate(brain_info):
    """Roll out one episode for a single brain and return total rewards.

    `env`, `brain_name` and `policy` are assumed to exist globally; this only
    illustrates the interaction pattern and is not runnable as-is.
    """
    total_rewards = np.zeros(len(brain_info.agents))
    states = brain_info.vector_observations
    while not all(brain_info.local_done):
        actions = policy(states)                     # compute actions from states
        brain_info = env.step(actions)[brain_name]   # advance the simulation
        total_rewards += np.array(brain_info.rewards)
        states = brain_info.vector_observations
    return total_rewards
```
Pitfalls
If you pass a `dict` directly to `step`, then even though each brain's action starts out as a `numpy` array, after the call returns the values in the dict will have been turned into `list`s:
```python
brain_infos = env.reset(train_mode=True)
actions = {name: np.random.randn(len(brain_infos[name].agents), 2)
           for name in env.brain_names}
print(*[type(a) for a in actions.values()])
env.step(actions)
print(*[type(a) for a in actions.values()])
```

```
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
<class 'list'> <class 'list'>
```
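One way to guard against this (a defensive sketch, not part of the ml-agents API): keep your own dict of `numpy` arrays and hand `step` a shallow copy, so your references stay untouched no matter what happens to the dict you passed in:

```python
import numpy as np

# Hypothetical action dict for two brains, two action values per agent.
actions = {
    'Brain1': np.random.randn(2, 2),
    'Brain2': np.random.randn(1, 2),
}

# Pass a shallow copy to step() so the original dict keeps its ndarrays
# even if the copy's entries are rewritten as lists:
to_send = dict(actions)
# env.step(to_send)  # requires a live UnityEnvironment

print(all(isinstance(a, np.ndarray) for a in actions.values()))  # True
```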
Closing the connection
```python
env.close()
```