Offline-to-online
Here we report the reproduced scores for every dataset and offline-to-online algorithm considered. In the score tables, each cell reads offline → online: the score after offline pretraining, followed by the score after online fine-tuning.
Tip
To re-collect our results in a more structured or nuanced way, see the [how to reproduce](repro.md) section.
Scores
Antmaze
Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
---|---|---|---|---|---|
antmaze-umaze-v2 | 52.75 ± 8.67 → 98.75 ± 1.09 | 94.00 ± 1.58 → 99.50 ± 0.87 | 77.00 ± 0.71 → 96.50 ± 1.12 | 91.00 ± 2.55 → 99.50 ± 0.50 | 76.75 ± 7.53 → 99.75 ± 0.43 |
antmaze-umaze-diverse-v2 | 56.00 ± 2.74 → 0.00 ± 0.00 | 9.50 ± 9.91 → 99.00 ± 1.22 | 59.50 ± 9.55 → 63.75 ± 25.02 | 36.25 ± 2.17 → 95.00 ± 3.67 | 32.00 ± 27.79 → 98.50 ± 1.12 |
antmaze-medium-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 59.00 ± 11.18 → 97.75 ± 1.30 | 71.75 ± 2.95 → 89.75 ± 1.09 | 67.25 ± 10.47 → 97.25 ± 1.30 | 71.75 ± 3.27 → 98.75 ± 1.64 |
antmaze-medium-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 63.50 ± 6.84 → 97.25 ± 1.92 | 64.25 ± 1.92 → 92.25 ± 2.86 | 73.75 ± 7.29 → 94.50 ± 1.66 | 62.00 ± 4.30 → 98.25 ± 1.48 |
antmaze-large-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 28.75 ± 7.76 → 88.25 ± 2.28 | 38.50 ± 8.73 → 64.50 ± 17.04 | 31.50 ± 12.58 → 87.00 ± 3.24 | 31.75 ± 8.87 → 97.25 ± 1.79 |
antmaze-large-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 35.50 ± 3.64 → 91.75 ± 3.96 | 26.75 ± 3.77 → 64.25 ± 4.15 | 17.50 ± 7.26 → 81.00 ± 14.14 | 44.00 ± 8.69 → 91.50 ± 3.91 |
average | 18.12 → 16.46 | 48.38 → 95.58 | 56.29 → 78.50 | 52.88 → 92.38 | 53.04 → 97.33 |
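
Each cell in the table above pairs an offline score with an online score, separated by `→`, and the `average` row matches the plain mean of the per-task values. The sketch below shows one way to parse such cells and recompute that row, using the Cal-QL column as input. The helper names (`parse_cell`, `column_average`) are hypothetical and not part of this repository. The page does not say what the `±` term measures, so the code calls it "spread" and does not use it.

```python
import re
from statistics import mean

# Cal-QL column of the Antmaze score table, copied verbatim.
CAL_QL_ANTMAZE = {
    "antmaze-umaze-v2": "76.75 ± 7.53 → 99.75 ± 0.43",
    "antmaze-umaze-diverse-v2": "32.00 ± 27.79 → 98.50 ± 1.12",
    "antmaze-medium-play-v2": "71.75 ± 3.27 → 98.75 ± 1.64",
    "antmaze-medium-diverse-v2": "62.00 ± 4.30 → 98.25 ± 1.48",
    "antmaze-large-play-v2": "31.75 ± 8.87 → 97.25 ± 1.79",
    "antmaze-large-diverse-v2": "44.00 ± 8.69 → 91.50 ± 3.91",
}

# "<offline> ± <spread> → <online> ± <spread>"; scores may be negative.
CELL = re.compile(r"(-?\d+\.\d+) ± (\d+\.\d+) → (-?\d+\.\d+) ± (\d+\.\d+)")


def parse_cell(cell):
    """Return (offline_score, offline_spread, online_score, online_spread)."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    return tuple(float(group) for group in match.groups())


def column_average(cells):
    """Average the offline and online scores of one algorithm column."""
    parsed = [parse_cell(cell) for cell in cells]
    offline = mean(p[0] for p in parsed)
    online = mean(p[2] for p in parsed)
    return round(offline, 2), round(online, 2)


print(column_average(CAL_QL_ANTMAZE.values()))  # (53.04, 97.33)
```

The result agrees with the `53.04 → 97.33` entry in the `average` row. Small last-digit differences can appear for other columns, because the table cells are already rounded to two decimals.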
Adroit
Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
---|---|---|---|---|---|
pen-cloned-v1 | 88.66 ± 15.10 → 86.82 ± 11.12 | -2.76 ± 0.08 → -1.28 ± 2.16 | 84.19 ± 3.96 → 102.02 ± 20.75 | 6.19 ± 5.21 → 43.63 ± 20.09 | -2.66 ± 0.04 → -2.68 ± 0.12 |
door-cloned-v1 | 0.93 ± 1.66 → 0.01 ± 0.00 | -0.33 ± 0.01 → -0.33 ± 0.01 | 1.19 ± 0.93 → 20.34 ± 9.32 | -0.21 ± 0.14 → 0.02 ± 0.31 | -0.33 ± 0.01 → -0.33 ± 0.01 |
hammer-cloned-v1 | 1.80 ± 3.01 → 0.24 ± 0.04 | 0.56 ± 0.55 → 2.85 ± 4.81 | 1.35 ± 0.32 → 57.27 ± 28.49 | 3.97 ± 6.39 → 3.73 ± 4.99 | 0.25 ± 0.04 → 0.17 ± 0.17 |
relocate-cloned-v1 | -0.04 ± 0.04 → -0.04 ± 0.01 | -0.33 ± 0.01 → -0.33 ± 0.01 | 0.04 ± 0.04 → 0.32 ± 0.38 | -0.24 ± 0.01 → -0.15 ± 0.05 | -0.31 ± 0.05 → -0.31 ± 0.04 |
average | 22.84 → 21.76 | -0.72 → 0.22 | 21.69 → 44.99 | 2.43 → 11.81 | -0.76 → -0.79 |
Regrets
Antmaze
Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
---|---|---|---|---|---|
antmaze-umaze-v2 | 0.04 ± 0.01 | 0.02 ± 0.00 | 0.07 ± 0.00 | 0.02 ± 0.00 | 0.01 ± 0.00 |
antmaze-umaze-diverse-v2 | 0.88 ± 0.01 | 0.09 ± 0.01 | 0.43 ± 0.11 | 0.22 ± 0.07 | 0.05 ± 0.01 |
antmaze-medium-play-v2 | 1.00 ± 0.00 | 0.08 ± 0.01 | 0.09 ± 0.01 | 0.06 ± 0.00 | 0.04 ± 0.01 |
antmaze-medium-diverse-v2 | 1.00 ± 0.00 | 0.08 ± 0.00 | 0.10 ± 0.01 | 0.05 ± 0.01 | 0.04 ± 0.01 |
antmaze-large-play-v2 | 1.00 ± 0.00 | 0.21 ± 0.02 | 0.34 ± 0.05 | 0.29 ± 0.07 | 0.13 ± 0.02 |
antmaze-large-diverse-v2 | 1.00 ± 0.00 | 0.21 ± 0.03 | 0.41 ± 0.03 | 0.23 ± 0.08 | 0.13 ± 0.02 |
average | 0.82 | 0.11 | 0.24 | 0.15 | 0.07 |
Adroit
Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
---|---|---|---|---|---|
pen-cloned-v1 | 0.46 ± 0.02 | 0.97 ± 0.00 | 0.37 ± 0.01 | 0.58 ± 0.02 | 0.98 ± 0.01 |
door-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.03 | 0.99 ± 0.01 | 1.00 ± 0.00 |
hammer-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.65 ± 0.10 | 0.98 ± 0.01 | 1.00 ± 0.00 |
relocate-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 |
average | 0.86 | 0.99 | 0.71 | 0.89 | 0.99 |
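
This page does not define how regret is computed. The values above are consistent with one common convention: one minus the mean normalized evaluation score over the online fine-tuning phase, so that 0 is best and 1 is worst. The sketch below illustrates that convention only. It is an assumption, not a formula confirmed by this page, and the function name `regret` and the example curve are hypothetical.

```python
def regret(normalized_scores):
    """Assumed convention: 1 - mean evaluation score during online fine-tuning.

    `normalized_scores` holds evaluation scores rescaled to [0, 1]
    (for example, a 0-100 normalized score divided by 100).
    """
    if not normalized_scores:
        raise ValueError("need at least one evaluation score")
    # Clip so that out-of-range scores cannot push regret outside [0, 1].
    clipped = [min(max(score, 0.0), 1.0) for score in normalized_scores]
    return 1.0 - sum(clipped) / len(clipped)


# A policy that never succeeds during fine-tuning has maximal regret.
print(regret([0.0, 0.0, 0.0]))  # 1.0

# Hypothetical evaluation curve that improves over fine-tuning.
print(regret([0.0, 0.5, 1.0, 1.0]))  # 0.375
```

Under this reading, a regret of 1.00 corresponds to a run whose evaluation score stays at or below zero for the whole online phase, which matches the rows where the online score is close to zero.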