

Getting Started with Pyquaticus for MCTF

This page gives an overview of how to train agents via deep RL within the Pyquaticus framework to play the MCTF game. For testing the performance of trained agents, check out the Submit Your Entry page.

Training Agents to Play MCTF

Sample code for training three agents to play MCTF as a team is provided inside Pyquaticus. It uses the multi-agent reinforcement learning (MARL) library RLlib. If you are unfamiliar with multi-agent training through RLlib, we recommend reading the RLlib documentation here.

Policy Mapping to Agent Ids

Below is a code snippet from the

file. Starting below (in the code repository), we first define a dictionary mapping agent IDs to policy names, and then we define the policy mapping function. This function is used by the RLlib training algorithm to ensure that each agent is correctly mapped to a learned policy, or to the intended opponent policy, during the training phase.
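As an illustration of this pattern (not the exact code from the repository), a policy mapping setup typically looks like the sketch below; the agent IDs and policy names here are assumptions for the example.

```python
# Hypothetical sketch of an RLlib-style policy mapping setup.
# The agent IDs and policy names are illustrative, not the actual
# identifiers used in the Pyquaticus training script.

# Map each agent ID to the policy that should control it.
policy_mapping = {
    "agent_0": "learned_policy_0",
    "agent_1": "learned_policy_1",
    "agent_2": "learned_policy_2",
    "agent_3": "opponent_policy",  # fixed opponent, not trained
    "agent_4": "opponent_policy",
    "agent_5": "opponent_policy",
}

def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Return the policy name for a given agent ID.

    RLlib calls a function with this shape during training so that
    each agent's experience is routed to the correct policy, whether
    that policy is being learned or is a fixed opponent.
    """
    return policy_mapping[agent_id]
```

For instance, `policy_mapping_fn("agent_0")` returns the learned policy name for the first teammate, while the three opponent agents all share a single frozen policy.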

Training Algorithm: Rollout Workers and GPUs

We give below an example of using the PPO algorithm to train the learning policies established in the 'Policy Mapping to Agent Ids' section above. The code snippet is from the

file.

You could modify

above (in the repository) to adapt to your available computing resources.
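As one hypothetical way to choose these values, you could derive the rollout-worker count from the CPUs available on your machine. The helper below is an assumption for illustration, not part of the repository code; verify the config key names against your RLlib version.

```python
import os

def resource_settings(reserve_cpus=1, use_gpu=False):
    """Suggest RLlib-style resource settings for this machine.

    Hypothetical helper: reserves `reserve_cpus` cores for the
    learner/driver process and assigns the remaining cores as
    rollout workers. The returned key names follow common RLlib
    configuration options, but check your RLlib version's API.
    """
    total_cpus = os.cpu_count() or 1
    workers = max(1, total_cpus - reserve_cpus)
    return {
        "num_rollout_workers": workers,      # parallel env workers (CPU)
        "num_gpus": 1 if use_gpu else 0,     # GPUs for the learner
    }
```

On a CPU-only machine you would call `resource_settings(use_gpu=False)` and pass the resulting values into your training configuration.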

On

above (in the repository), you could modify the names of the policies you are training, based on the policies you set up in the 'Policy Mapping to Agent Ids' section above.
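A common pattern here is to list only your own policies as trainable, so that opponent policies stay frozen while your team learns. The sketch below uses hypothetical policy names, matching the illustrative mapping example earlier on this page rather than the repository's exact identifiers.

```python
# Hypothetical sketch: restrict gradient updates to your own policies
# while keeping the opponent policy frozen. Policy names are
# illustrative, not the repository's actual identifiers.
all_policies = [
    "learned_policy_0",
    "learned_policy_1",
    "learned_policy_2",
    "opponent_policy",
]

# Only the policies listed here receive gradient updates during
# training; any policy omitted from this list stays fixed.
policies_to_train = [p for p in all_policies if p.startswith("learned")]
```

Renaming a policy in the mapping section without updating this list is an easy mistake to make: the renamed policy would silently stop receiving updates.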

Reward Function Design

A critical component of training successful agents is the design of the reward function. This is particularly important in multi-agent scenarios, where we may have to assign rewards to two agents that are cooperatively completing tasks. To assist with this, we have provided a few examples in the

file found in the folder . Below is the complete list of reward function parameters along with descriptions:

You can also take a look at parameters

and that get passed into the reward function. These two variables are used to determine the game state and to assign rewards to your agents during training. Below we have included an example of a sparse reward function:
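The actual example lives in the repository; as a rough illustration of the idea, a sparse reward of this kind might be sketched as follows. The parameter names and game-state fields below are assumptions for this example, not the fields Pyquaticus actually passes in.

```python
# Hypothetical sparse reward: the agent is rewarded only when its
# team scores a capture, and penalized only when it gets tagged.
# `params` and `prev_params` stand in for the current and previous
# game-state dictionaries passed to the reward function; the field
# names used here are illustrative.

def sparse_reward(params, prev_params):
    reward = 0.0
    # +1 when our team's capture count increased this step
    if params["team_captures"] > prev_params["team_captures"]:
        reward += 1.0
    # -1 when this agent was newly tagged this step
    if params["agent_tagged"] and not prev_params["agent_tagged"]:
        reward -= 1.0
    return reward
```

Because the reward is zero on almost every step, sparse rewards like this give a clean learning signal but can make exploration slow; the repository's examples also show denser shaping terms.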