SpyParty Prediction competition
Compete! Fun! Stats!
Hello and welcome to the SpyParty prediction page :) The goal is for
people to design a program that takes in a Triple Agent style JSON file - describing an in-progress game -
and predicts the spy's win probability. Programs score better the more closely their
stated certainty matches the actual outcome.
The aim is to provide a nice standardised way of comparing different approaches, and motivate people
to work together on this problem.
Training/calibration data
You will be given a training set of just over 26000 replays. You can use this to tune your
model parameters, or as training data if you are using an ML model.
It contains full Triple Agent files for all SCL events, Summer and
Winter cups, and Hidden Cup. To enable fair comparisons between different approaches, please
only use these replays for training/tuning your algorithm if you are submitting results.
Download Training Set
Getting started
There is a dummy test set, consisting of truncated replays corresponding to game states from Pax Invitational -
these can be downloaded below. Each replay will be in the Triple Agent data format,
with many of the fields anonymised. You should use the duration field to get the elapsed
time of the game state.
The folder test_data includes the game states in the same format as the real test set.
The folder data_with_ground_truth contains the game states with ground truth included, in case you want
to score your results locally.
The main purpose of this is to provide a way to check you have everything set up correctly and are returning
your answers in the correct format.
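As a starting point for that setup check, here is a minimal Python sketch for loading the game states and reading the elapsed time. It makes two assumptions that are not confirmed on this page: that test_data holds one .json file per game state, and that duration is a top-level key in each file. The function names are hypothetical; adjust them to match the actual download.

```python
import json
from pathlib import Path


def load_game_states(folder):
    """Load every *.json file in `folder` into a list of dicts.

    ASSUMPTION: one JSON file per game state; the real layout may differ.
    """
    states = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            states.append(json.load(f))
    return states


def elapsed_time(state):
    """Return the elapsed time of a game state.

    ASSUMPTION: "duration" is a top-level key of the game state dict.
    """
    return state["duration"]
```

For example, `[elapsed_time(s) for s in load_game_states("test_data")]` would list the elapsed time of every game state, provided the two assumptions hold.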
Included in the mini test set is solutions.json, which is a JSON dict
containing both the
result of each game and what the reference model used for the scoring predicted (see the Brier Skill Score
description under Prediction scoring below).
Also included is sample_answers.json - this is an example of the format required for submitting,
and submitting it to this website should result in a perfect score of 1.0.
Neither of those two files will be provided for the actual test set!
For debugging purposes the tiny test set of 14 gamestates from 2 matches is provided -
if you provide results for this you will get a detailed table back explaining your score.
Download Tiny Test Set
Download Dummy Test Set
Sample code
Sample code has been provided
here
to get people started with loading and interacting with the data - it includes two (very) noddy
models.
Initial feedback
Answers should be provided in a JSON file consisting of a single dictionary, where the keys of the
dictionary are the UUIDs of the game states in the test set, and the associated value is a number from 0
to 1 giving the probability of the spy winning that game. You can
submit answers for the dummy test
set here.
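To make the answer format concrete, here is a minimal Python sketch that builds and writes such a dictionary. The key name "uuid" used to look up each game state's identifier is an assumption (check the actual files for the real field name), and `build_answers`, `write_answers` and `predict` are hypothetical names; `predict` stands for whatever model you have.

```python
import json


def build_answers(states, predict):
    """Map each game state's UUID to a spy win probability in [0, 1].

    ASSUMPTION: the identifier lives under the key "uuid".
    `predict` is any callable taking a game state dict and returning a number.
    """
    answers = {}
    for state in states:
        p = float(predict(state))
        # Clamp so that a misbehaving model cannot emit an invalid probability.
        answers[state["uuid"]] = min(1.0, max(0.0, p))
    return answers


def write_answers(answers, path):
    """Write the answers as a single JSON dictionary."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(answers, f)
```

A call such as `write_answers(build_answers(states, lambda s: 0.5), "answers.json")` would produce a file that predicts an even chance for every game state.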
Prediction test set
The real test set contains replays from an anonymous shape-based tournament, which we will refer to from here on
simply as "Hexagon Cup". There are just over 1000 games in this, corresponding to 100000
truncated games. Answers to this should be sent to
Wobble (DM me on SpyParty Discord) - there is no endpoint for this since a) you aren't allowed to repeatedly score
on this set and overfit and b) I will need to manually add scores to the scoreboard. Within reason, you are allowed
to enter as many models as you like - and please don't feel that you should only enter if you will come top! It will
be nice just to see a variety of approaches.
Download Test Set
Prediction scoring
Answers will be scored using
Brier Skill Score, described at the bottom
here.
Basic idea - you score 1 overall if you always predict 0 or 1 and are always correct, otherwise you
score a penalty based on the square of how far away your answer was from the truth.
Higher scores are better, and a score of 0 means you did as well as the reference strategy, which is
simply predicting the spy's win chances to be the spy winrate for that venue. It is possible to get a
negative score, but your model has probably gone wrong if you do!
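For anyone wanting to score locally, here is a minimal Python sketch of the standard Brier Skill Score, which matches the properties described above (1 for perfect predictions, 0 for matching the reference, negative for doing worse). It is not the site's actual scoring code: equal weighting of every game state is an assumption, and the function names and the (venue, spy_won) input shape are hypothetical.

```python
from collections import defaultdict


def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probability and outcome.

    `outcomes` maps UUID -> 1 if the spy won, 0 otherwise.
    ASSUMPTION: every game state carries equal weight.
    """
    total = sum((predictions[k] - outcomes[k]) ** 2 for k in outcomes)
    return total / len(outcomes)


def brier_skill_score(predictions, reference, outcomes):
    """Standard definition: 1 - BS_model / BS_reference."""
    return 1.0 - brier_score(predictions, outcomes) / brier_score(reference, outcomes)


def venue_winrates(games):
    """Spy win rate per venue, in the spirit of the reference strategy.

    `games` is an iterable of (venue, spy_won) pairs.
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for venue, spy_won in games:
        counts[venue] += 1
        wins[venue] += 1 if spy_won else 0
    return {venue: wins[venue] / counts[venue] for venue in counts}
```

The reference predictions would then be built by assigning each game state the win rate of its venue, and passed as `reference` to `brier_skill_score`.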
I will probably put the scoring code somewhere public - definitely message me if you want more details
on the scoring code / want to know how I'm making the test sets.