

Getting Started with Pyquaticus for MCTF

This page gives an overview of how to train agents via deep RL within the Pyquaticus framework to play the MCTF game. For testing the performance of trained agents, check out the Submit Your Entry page.

Training Agents to Play MCTF

Sample code for training three agents to play MCTF as a team is provided inside Pyquaticus. It uses the multi-agent reinforcement learning (MARL) library RLlib. If you are unfamiliar with multi-agent training through RLlib, we recommend reading the RLlib documentation here.

Policy Mapping to Agent Ids

Below is a code snippet from the

file. Starting below (in the code repository), we first define a dictionary mapping agent IDs to policy names, and then we define the policy mapping function. This function is used by the RLlib training algorithm to ensure that each agent is correctly mapped to a learned policy, or to the intended opponent policy, during the training phase.
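As an illustration of this pattern (not the exact code from the repository), a policy mapping setup typically looks like the sketch below; the agent IDs and policy names here are assumptions for the example.

```python
# Hypothetical sketch of an RLlib-style policy mapping setup.
# The agent IDs and policy names are illustrative, not the actual
# identifiers used in the Pyquaticus training script.

# Map each agent ID to the policy that should control it.
policy_mapping = {
    "agent_0": "learned_policy_0",
    "agent_1": "learned_policy_1",
    "agent_2": "learned_policy_2",
    "agent_3": "opponent_policy",  # fixed opponent, not trained
    "agent_4": "opponent_policy",
    "agent_5": "opponent_policy",
}

def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Return the policy name for a given agent ID.

    RLlib calls a function with this shape during training so that
    each agent's experience is routed to the correct policy, whether
    that policy is being learned or is a fixed opponent.
    """
    return policy_mapping[agent_id]
```

For instance, `policy_mapping_fn("agent_0")` returns the learned policy name for the first teammate, while the three opponent agents all share a single frozen policy.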

Training Algorithm: Rollout Workers and GPUs

We give below an example of using the PPO algorithm to train the learning policies established in the 'Policy Mapping to Agent Ids' section above. The code snippet is from the

file.

You could modify

above (in the repository) to adapt to your available computing resources.
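As one hypothetical way to choose these values, you could derive the rollout-worker count from the CPUs available on your machine. The helper below is an assumption for illustration, not part of the repository code; verify the config key names against your RLlib version.

```python
import os

def resource_settings(reserve_cpus=1, use_gpu=False):
    """Suggest RLlib-style resource settings for this machine.

    Hypothetical helper: reserves `reserve_cpus` cores for the
    learner/driver process and assigns the remaining cores as
    rollout workers. The returned key names follow common RLlib
    configuration options, but check your RLlib version's API.
    """
    total_cpus = os.cpu_count() or 1
    workers = max(1, total_cpus - reserve_cpus)
    return {
        "num_rollout_workers": workers,      # parallel env workers (CPU)
        "num_gpus": 1 if use_gpu else 0,     # GPUs for the learner
    }
```

On a CPU-only machine you would call `resource_settings(use_gpu=False)` and pass the resulting values into your training configuration.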

On

above (in the repository), you could modify the names of the policies you are training, based on the policies you set up in the 'Policy Mapping to Agent Ids' section above.
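A common pattern here is to list only your own policies as trainable, so that opponent policies stay frozen while your team learns. The sketch below uses hypothetical policy names, matching the illustrative mapping example earlier on this page rather than the repository's exact identifiers.

```python
# Hypothetical sketch: restrict gradient updates to your own policies
# while keeping the opponent policy frozen. Policy names are
# illustrative, not the repository's actual identifiers.
all_policies = [
    "learned_policy_0",
    "learned_policy_1",
    "learned_policy_2",
    "opponent_policy",
]

# Only the policies listed here receive gradient updates during
# training; any policy omitted from this list stays fixed.
policies_to_train = [p for p in all_policies if p.startswith("learned")]
```

Renaming a policy in the mapping section without updating this list is an easy mistake to make: the renamed policy would silently stop receiving updates.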

Reward Function Design

A critical component of training successful agents is the design of the reward function. This is particularly important in multi-agent scenarios, where we may have to assign rewards to two agents that are cooperatively completing tasks. To assist with this, we have provided a few examples in the

file found in the folder . Below is the complete list of reward function parameters along with descriptions:

You can also take a look at parameters

and that get passed into the reward function. These two variables are used to determine the game state and to assign rewards to your agents during training. Below we have included an example of a sparse reward function:
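The actual example lives in the repository; as a rough illustration of the idea, a sparse reward of this kind might be sketched as follows. The parameter names and game-state fields below are assumptions for this example, not the fields Pyquaticus actually passes in.

```python
# Hypothetical sparse reward: the agent is rewarded only when its
# team scores a capture, and penalized only when it gets tagged.
# `params` and `prev_params` stand in for the current and previous
# game-state dictionaries passed to the reward function; the field
# names used here are illustrative.

def sparse_reward(params, prev_params):
    reward = 0.0
    # +1 when our team's capture count increased this step
    if params["team_captures"] > prev_params["team_captures"]:
        reward += 1.0
    # -1 when this agent was newly tagged this step
    if params["agent_tagged"] and not prev_params["agent_tagged"]:
        reward -= 1.0
    return reward
```

Because the reward is zero on almost every step, sparse rewards like this give a clean learning signal but can make exploration slow; the repository's examples also show denser shaping terms.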