Detailed Leaderboard

Mean scores across all 19 tasks in two scenarios: 1000 training examples per task, and the full training sets. All cells report top-1 accuracy.
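As an illustration of how a leaderboard cell of this kind could be computed, the sketch below derives top-1 accuracy for one task and then averages per-task accuracies. It assumes an unweighted mean in which every task counts equally, which the page does not state explicitly. The function names and the toy inputs are hypothetical and are not taken from the leaderboard.

```python
from typing import Mapping, Sequence


def top1_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of examples whose single predicted class equals the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty task")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def mean_score(per_task_accuracy: Mapping[str, float]) -> float:
    """Unweighted mean over tasks (assumption: each task counts equally)."""
    if not per_task_accuracy:
        raise ValueError("need at least one task")
    return sum(per_task_accuracy.values()) / len(per_task_accuracy)


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not leaderboard results.
    toy_scores = {
        "task_a": top1_accuracy([1, 0, 2, 2], [1, 1, 2, 0]),
        "task_b": top1_accuracy([3, 3], [3, 3]),
    }
    print(f"mean top-1 accuracy: {mean_score(toy_scores):.3f}")
```

With an unweighted mean, a small task influences the overall score as much as a large one, so the result can differ from accuracy pooled over all examples.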

Breakdown Across Tasks


1000 Training Examples per task

Full Training Sets