Getting Started with Pyquaticus for MCTF
This page gives an overview of how to train agents via deep RL within the Pyquaticus framework to play the MCTF game. For testing the performance of trained agents, check out the Submit Your Entry page.
Sample code for training three agents to play MCTF as a team is provided in Pyquaticus at rl_test/train_3v3.py. It uses RLlib, a multi-agent reinforcement learning (MARL) library. If you are unfamiliar with multi-agent training in RLlib, we recommend reading the RLlib documentation here.
To start training, run:

python train_3v3.py

Checkpoints are saved to a ray_tests/ directory in the same folder where the training script is run; each trained policy is stored under ray_tests/<checkpoint_num>/policies/<policy-name>. The frequency at which policies are saved is defined in competition_train_example.py (line: 112). More information about the policy-name is given in the "Policy Mapping to Agent Ids" section below.

In rl_test/train_3v3.py (starting at line: 93 in the code repository), we first define a dictionary mapping policy names to policies, and then we define the policy mapping function. This policy mapping function is used by the RLlib training algorithm to ensure that each agent is correctly mapped to a learned policy, or to the intended opponent policy, during the training phase.
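A minimal sketch of this pattern is shown below. The policy names ("attacker", "defender", "opponent") and agent ids of the form agent_0 through agent_5 are illustrative assumptions; the actual names used in train_3v3.py may differ.

```python
from ray.rllib.policy.policy import PolicySpec

# Dictionary mapping each policy name to a PolicySpec. An empty
# PolicySpec lets RLlib infer the policy class, observation space,
# and action space from the environment.
policies = {
    "attacker": PolicySpec(),  # learned policy
    "defender": PolicySpec(),  # learned policy
    "opponent": PolicySpec(),  # fixed opponent policy
}

def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    """Route each agent id to the policy that should control it."""
    if agent_id in ("agent_0", "agent_1"):
        return "attacker"
    elif agent_id == "agent_2":
        return "defender"
    else:
        # agent_3 through agent_5 make up the opposing team
        return "opponent"
```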
We give below an example of using the PPO algorithm to train the learning policies established in the 'Policy Mapping to Agent Ids' section above; the corresponding code is in the train_3v3.py file. You can modify the resource settings (line: 111 in the repository) to fit your available computing resources, and you can change the names of the policies you are training (line: 113 in the repository) to match the policies you set up in the 'Policy Mapping to Agent Ids' section above.
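The following is a minimal sketch of such a PPO setup, reusing the policies dictionary and policy_mapping_fn from the previous sketch. The environment name "pyquaticus_3v3", the worker counts, and the policy names are illustrative assumptions, not the exact contents of train_3v3.py.

```python
import ray
from ray.rllib.algorithms.ppo import PPOConfig

ray.init()

config = (
    PPOConfig()
    # "pyquaticus_3v3" stands in for however the environment is
    # registered in the training script.
    .environment(env="pyquaticus_3v3")
    # Adjust worker and GPU counts to your hardware (cf. line: 111).
    .rollouts(num_rollout_workers=4)
    .resources(num_gpus=0)
    .multi_agent(
        policies=policies,
        policy_mapping_fn=policy_mapping_fn,
        # Only these policies receive gradient updates; the opponent
        # policy stays fixed (cf. line: 113).
        policies_to_train=["attacker", "defender"],
    )
)

algo = config.build()
for i in range(100):
    algo.train()
    # Periodically checkpoint; policies end up under
    # ray_tests/<checkpoint_num>/policies/<policy-name>.
    if i % 5 == 0:
        algo.save("./ray_tests")
```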
A critical component of training successful agents is the design of the reward function. This is particularly important in multiagent scenarios, where we have to assign rewards to two agents that might be completing tasks cooperatively. To assist with this, we have provided a few examples in the rewards.py file found in the utils folder (pyquaticus/envs/utils/rewards.py). Below is the complete list of reward function parameters along with descriptions:

You can also take a look at the state and prev_state parameters that get passed into the reward function. These two variables are used to determine the game state and to assign rewards to your agents during training. Below we have included an example of a sparse reward function:
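The sketch below illustrates a sparse reward in this style. The exact signature and the state dictionary keys (e.g., a per-team "captures" counter) are assumptions made for illustration; consult rewards.py for the fields actually available.

```python
def sparse_reward(agent_id, team, state, prev_state):
    """Illustrative sparse reward: +1 when the agent's team scores a
    capture, -1 when the opposing team scores, and 0 otherwise."""
    other = 1 - team  # assuming the two teams are indexed 0 and 1
    reward = 0.0
    # A capture is detected by comparing the current game state
    # against the previous one.
    if state["captures"][team] > prev_state["captures"][team]:
        reward += 1.0
    if state["captures"][other] > prev_state["captures"][other]:
        reward -= 1.0
    return reward
```

Because the reward is zero on almost every step, sparse rewards are easy to specify but can make learning slow; shaping terms (for example, distance to the opponent's flag) are often added on top of them.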