Using the COLE Benchmark

Training and Testing

The COLE benchmark can be used to train and to test models on multiple tasks. To train or fine-tune a model, fetch the train, validation, and test data splits from our public Hugging Face repository. We recommend using Hugging Face's libraries (such as datasets) to simplify the process.
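As a minimal sketch, the splits can be loaded with the datasets library. The repository path "COLE/qfrcola" below is a placeholder; substitute the actual Hugging Face dataset identifier of the task you want to work on.

from datasets import load_dataset

# Placeholder path: replace "COLE/qfrcola" with the actual dataset
# identifier listed on our Hugging Face page.
dataset = load_dataset("COLE/qfrcola")

train_split = dataset["train"]
validation_split = dataset["validation"]
test_split = dataset["test"]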

To test a model, you also need to fetch the data in the same way. Once the data is loaded, run your model to produce a prediction for each example in the test split. Our repository includes benchmark evaluation scripts for each dataset; you only need to plug in your model's inference method through the Hugging Face Model interface. Our inference scripts are available in our GitHub repository.
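The sketch below illustrates one way to run inference over a test split with a text-classification pipeline. The dataset path, the checkpoint name, the input column name "sentence", and the label-to-integer mapping are all assumptions to adapt to your own model and to the task's dataset card; the evaluation scripts in our repository define the exact interface they expect.

from datasets import load_dataset
from transformers import pipeline

# Placeholder identifiers: substitute the actual dataset path and your own
# fine-tuned checkpoint.
test_split = load_dataset("COLE/qfrcola", split="test")
classifier = pipeline("text-classification", model="your-org/your-fine-tuned-model")

predictions = []
for example in test_split:
    # The input column name "sentence" is an assumption; check the dataset
    # card for the actual field name of each task.
    result = classifier(example["sentence"])[0]
    # With the default label names ("LABEL_0", "LABEL_1", ...), keep the
    # integer class id so the output matches the submission format.
    predictions.append(int(result["label"].split("_")[-1]))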

If you prefer to run inference separately, please ensure that the predictions are formatted correctly before submitting them for evaluation (see our "Formatting the Dataset" section).

Formatting the Dataset

Before submitting your results, make sure your output is properly formatted so that our systems can process it. The expected format is a nested JSON dictionary as follows:

{
  "model_name": "a_model_name",
  "model_url": "a_model_url",
  "tasks": [
    {
      "qfrcola": { "predictions": [1,1,1,1,1] }
    },
    {
      "allocine": { "predictions": [1,1,1,1,1] }
    }
  ]
}
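For illustration, a short sketch of how to serialize predictions into this format. The prediction lists and the output file name "predictions.json" are placeholders; only the keys and the nesting shown above are required.

import json

# Placeholder prediction lists: one integer class id per test example,
# in the same order as the test split.
qfrcola_predictions = [1, 1, 1, 1, 1]
allocine_predictions = [1, 1, 1, 1, 1]

submission = {
    "model_name": "a_model_name",
    "model_url": "a_model_url",
    "tasks": [
        {"qfrcola": {"predictions": qfrcola_predictions}},
        {"allocine": {"predictions": allocine_predictions}},
    ],
}

# The file name "predictions.json" is only an example.
with open("predictions.json", "w", encoding="utf-8") as file:
    json.dump(submission, file)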