Dataset Splits:
BlackSwanSuite has three task variants: multiple-choice (MCQ), yes/no (Y/N), and Generative.
We provide validation and test splits; both are available on each Hugging Face Dataset card.
Validation Set: The validation split is intended for development work and includes ground truth labels.
Test Set: The test split is intended for evaluation. Ground truth labels are not provided, to prevent misuse of the dataset. To evaluate your model on the MCQ and Y/N variants, please submit to the public leaderboard. For the Generative variant, please email us to obtain an LLM Match score. We may take a few days to respond, so please be patient.
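As a rough sketch of how the two splits might be loaded with the Hugging Face `datasets` library: the repository ID is a placeholder you must replace with the ID shown on the relevant dataset card, and the split identifiers `"validation"` and `"test"` are assumed from the description above rather than confirmed. The `loader` parameter is a hypothetical hook added only so the function can be tried without network access.

```python
from typing import Any, Callable, Dict, Optional

# Assumed split identifiers; check the dataset card for the exact names.
SPLIT_NAMES = ("validation", "test")


def load_splits(repo_id: str,
                loader: Optional[Callable[..., Any]] = None) -> Dict[str, Any]:
    """Load every split in SPLIT_NAMES for one task variant.

    repo_id: the Hugging Face dataset ID of the variant (placeholder here).
    loader:  optional stand-in for datasets.load_dataset, useful offline.
    """
    if loader is None:
        # Imported lazily so the module works without `datasets` installed.
        from datasets import load_dataset  # pip install datasets
        loader = load_dataset
    return {name: loader(repo_id, split=name) for name in SPLIT_NAMES}
```

Calling `load_splits("<org>/<variant-dataset>")` with the real ID substituted should return a dictionary keyed by split name. Only the validation entry is expected to carry ground truth labels.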
🤗 Access Data | Leaderboard
Note: When using the leaderboard, once you are logged in, go to participate > select team > accept licence. The submit tab then appears, along with an example submission format.
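Because the validation split ships with ground truth labels, MCQ and Y/N predictions can be checked locally before anything is submitted. The sketch below assumes plain accuracy over items keyed by an ID. The leaderboard's exact metric and the dataset's field names are not stated here, so the function name, the ID keys, and the answer strings are all illustrative.

```python
from typing import Mapping


def accuracy(predictions: Mapping[str, str], labels: Mapping[str, str]) -> float:
    """Fraction of labelled items answered correctly.

    Items with no prediction are counted as wrong, so the denominator
    is always the number of labelled items.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(1 for item_id, gold in labels.items()
                  if predictions.get(item_id) == gold)
    return correct / len(labels)


# Hypothetical validation items: two MCQ answers and one Y/N answer.
gold = {"item-1": "A", "item-2": "C", "item-3": "yes"}
pred = {"item-1": "A", "item-2": "B"}  # item-3 was left unanswered
print(f"validation accuracy: {accuracy(pred, gold):.3f}")
```

Counting unanswered items as wrong keeps the score comparable across runs that skip different items.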