SpyParty Prediction competition

Compete! Fun! Stats!

Hello and welcome to the SpyParty prediction page :) The goal is for people to design a program that takes in a Triple Agent-style JSON file describing an in-progress game and predicts the spy's win probability. Programs score better the more closely their stated certainty about the outcome matches reality. The aim is to provide a nice standardised way of comparing different approaches, and to motivate people to work together on this problem.

Training/calibration data

You will be given a training set of just over 26000 replays. You can use this to tune your model parameters, or as training data if you are using an ML model. It contains full Triple Agent files for all SCL events, Summer and Winter cups, and Hidden Cup. To enable fair comparisons between different approaches, please only use these replays for training/tuning your algorithm if you are submitting results.

Download Training Set

Getting started

There is a dummy test set, consisting of truncated replays corresponding to game states from Pax Invitational - these can be downloaded below. Each replay is in the Triple Agent data format, with many of the fields anonymised. You should use the duration field to get the elapsed time of the game state. The folder test_data contains the game states in the same format as the real test set. The folder data_with_ground_truth contains the game states with ground truth included, in case you want to score your results locally. The main purpose of this is to provide a way to check that you have everything set up correctly and are returning your answers in the correct format.
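As a rough starting point, loading the game states and reading the elapsed time might look like the sketch below. It assumes each game state is stored as its own .json file directly inside the folder and that duration is a top-level key; neither detail is confirmed here, so check against the downloaded data. The helper names load_game_states and elapsed_time are made up for illustration.

```python
import json
from pathlib import Path


def load_game_states(folder):
    """Load every *.json file in `folder` into a dict keyed by file name stem.

    Assumption: one JSON file per game state, directly inside `folder`.
    """
    states = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            states[path.stem] = json.load(f)
    return states


def elapsed_time(state):
    """Return the elapsed time of a game state.

    Assumption: `duration` is a top-level key of the game state dict.
    """
    return state["duration"]
```

For example, load_game_states("test_data") would return a dict of parsed game states, and elapsed_time can then be called on each value.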

Included in the mini test set is solutions.json, a JSON dict containing both the result of each game and what the reference model used for scoring predicted (see the Brier section below). Also included is sample_answers.json - this is an example of the format required for submitting, and submitting it to this website should result in a perfect score of 1.0. Neither of those two files will be provided for the actual test set!

For debugging purposes, a tiny test set of 14 game states from 2 matches is provided - if you submit results for this, you will get a detailed table back explaining your score.

Download Tiny Test Set Download Dummy Test Set

Sample code

Sample code has been provided here to get people started with loading and interacting with the data - it includes two (very) noddy models.

Initial feedback

Answers should be provided in a JSON file containing a single dictionary, where the keys are the UUIDs of the game states in the test set and each associated value is a number from 0 to 1 giving the probability of the spy winning that game. You can submit answers for the dummy test set here.
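A minimal sketch of producing such a file is shown below. The function name write_answers is hypothetical; the only things taken from the description above are the single-dictionary layout and the 0 to 1 range, which the function checks before writing.

```python
import json


def write_answers(predictions, path):
    """Write a {uuid: probability} dict to `path` as a single JSON dictionary.

    Raises ValueError if any probability falls outside [0, 1] (NaN included).
    """
    answers = {}
    for uuid, p in predictions.items():
        p = float(p)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {uuid} out of range: {p}")
        answers[str(uuid)] = p
    with open(path, "w", encoding="utf-8") as f:
        json.dump(answers, f, indent=2)
    return answers
```

Calling write_answers(my_predictions, "answers.json"), where my_predictions maps each game state UUID to your model's spy win probability, gives a file in the shape described above. Comparing it against sample_answers.json from the dummy test set is a quick way to confirm the format.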

Prediction test set

The real test set contains replays from an anonymous shape-based tournament, which we will refer to from here on simply as "Hexagon Cup". There are just over 1000 games in this, corresponding to 100000 truncated games. Answers to this should be sent to Wobble (DM me on SpyParty Discord) - there is no endpoint for this since a) you aren't allowed to repeatedly score on this set and overfit and b) I will need to manually add scores to the scoreboard. Within reason, you are allowed to enter as many models as you like - and please don't feel that you should only enter if you will come top! It will be nice just to see a variety of approaches.

Download Test Set

Prediction scoring

Answers will be scored using Brier Skill Score, described at the bottom here. Basic idea - you score 1 overall if you always predict 0 or 1 and are always correct; otherwise you incur a penalty based on the square of how far away your answer was from the truth. Higher scores are better, and a score of 0 means you did as well as the reference strategy, which is simply predicting the spy's win chances to be the spy winrate for that venue. It is possible to get a negative score, but your model has probably gone wrong if you do! I will probably put the scoring code somewhere public - definitely message me if you want more details on the scoring code / want to know how I'm making the test sets.
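For local experiments, the textbook definition of Brier Skill Score is 1 - BS_model / BS_reference, where each BS is the mean squared difference between the predicted probability and the 0/1 outcome. The sketch below implements that standard formula. It is not the competition's actual scoring code, which may differ in details such as how game states are weighted, and it assumes outcomes are encoded as 1 for a spy win and 0 otherwise.

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Both arguments are dicts keyed by game state UUID.
    """
    return sum((predictions[k] - outcomes[k]) ** 2 for k in outcomes) / len(outcomes)


def brier_skill_score(predictions, reference, outcomes):
    """Standard Brier Skill Score: 1 - BS_model / BS_reference.

    1.0 means perfect predictions, 0.0 means no better than the reference,
    and negative values mean worse than the reference.
    """
    bs_model = brier_score(predictions, outcomes)
    bs_reference = brier_score(reference, outcomes)
    return 1.0 - bs_model / bs_reference
```

With the dummy test set, the outcomes and reference predictions would come from solutions.json, whose exact layout you will need to inspect yourself.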

Get scored on the dummy test set See the current leaderboard