Offline-to-online

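Here we report the scores we reproduced after offline pretraining and after online fine-tuning, for every dataset and offline-to-online algorithm considered. In the score tables, each cell lists the score after offline pretraining, then (after the arrow) the score after online fine-tuning.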

Tip

To re-collect our results in a more structured or fine-grained form, see the [how to reproduce](repro.md) section.

Scores

Antmaze

| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|---|---|---|---|---|---|
| antmaze-umaze-v2 | 52.75 ± 8.67 → 98.75 ± 1.09 | 94.00 ± 1.58 → 99.50 ± 0.87 | 77.00 ± 0.71 → 96.50 ± 1.12 | 91.00 ± 2.55 → 99.50 ± 0.50 | 76.75 ± 7.53 → 99.75 ± 0.43 |
| antmaze-umaze-diverse-v2 | 56.00 ± 2.74 → 0.00 ± 0.00 | 9.50 ± 9.91 → 99.00 ± 1.22 | 59.50 ± 9.55 → 63.75 ± 25.02 | 36.25 ± 2.17 → 95.00 ± 3.67 | 32.00 ± 27.79 → 98.50 ± 1.12 |
| antmaze-medium-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 59.00 ± 11.18 → 97.75 ± 1.30 | 71.75 ± 2.95 → 89.75 ± 1.09 | 67.25 ± 10.47 → 97.25 ± 1.30 | 71.75 ± 3.27 → 98.75 ± 1.64 |
| antmaze-medium-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 63.50 ± 6.84 → 97.25 ± 1.92 | 64.25 ± 1.92 → 92.25 ± 2.86 | 73.75 ± 7.29 → 94.50 ± 1.66 | 62.00 ± 4.30 → 98.25 ± 1.48 |
| antmaze-large-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 28.75 ± 7.76 → 88.25 ± 2.28 | 38.50 ± 8.73 → 64.50 ± 17.04 | 31.50 ± 12.58 → 87.00 ± 3.24 | 31.75 ± 8.87 → 97.25 ± 1.79 |
| antmaze-large-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 35.50 ± 3.64 → 91.75 ± 3.96 | 26.75 ± 3.77 → 64.25 ± 4.15 | 17.50 ± 7.26 → 81.00 ± 14.14 | 44.00 ± 8.69 → 91.50 ± 3.91 |
| average | 18.12 → 16.46 | 48.38 → 95.58 | 56.29 → 78.50 | 52.88 → 92.38 | 53.04 → 97.33 |
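
The `average` row appears to be the unweighted mean of the per-task means in each column. As a quick check, the following minimal sketch recomputes it for the CQL column of the Antmaze table. The dictionary and the `average_row` helper are hypothetical names introduced for illustration, not part of the benchmark code.

```python
from statistics import mean

# Per-task mean scores for CQL, copied from the Antmaze table:
# (after offline pretraining, after online fine-tuning).
cql_antmaze = {
    "antmaze-umaze-v2": (94.00, 99.50),
    "antmaze-umaze-diverse-v2": (9.50, 99.00),
    "antmaze-medium-play-v2": (59.00, 97.75),
    "antmaze-medium-diverse-v2": (63.50, 97.25),
    "antmaze-large-play-v2": (28.75, 88.25),
    "antmaze-large-diverse-v2": (35.50, 91.75),
}


def average_row(scores):
    """Return the unweighted mean over tasks of the offline and online means.

    Hypothetical helper: it assumes every task counts equally.
    """
    offline = mean(off for off, _ in scores.values())
    online = mean(on for _, on in scores.values())
    return offline, online


offline_avg, online_avg = average_row(cql_antmaze)
# Printed with two decimals so it can be compared with the table's average row.
print(f"{offline_avg:.2f} → {online_avg:.2f}")
```

The same check can be repeated for any other column by swapping in its per-task means.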

Adroit

| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|---|---|---|---|---|---|
| pen-cloned-v1 | 88.66 ± 15.10 → 86.82 ± 11.12 | -2.76 ± 0.08 → -1.28 ± 2.16 | 84.19 ± 3.96 → 102.02 ± 20.75 | 6.19 ± 5.21 → 43.63 ± 20.09 | -2.66 ± 0.04 → -2.68 ± 0.12 |
| door-cloned-v1 | 0.93 ± 1.66 → 0.01 ± 0.00 | -0.33 ± 0.01 → -0.33 ± 0.01 | 1.19 ± 0.93 → 20.34 ± 9.32 | -0.21 ± 0.14 → 0.02 ± 0.31 | -0.33 ± 0.01 → -0.33 ± 0.01 |
| hammer-cloned-v1 | 1.80 ± 3.01 → 0.24 ± 0.04 | 0.56 ± 0.55 → 2.85 ± 4.81 | 1.35 ± 0.32 → 57.27 ± 28.49 | 3.97 ± 6.39 → 3.73 ± 4.99 | 0.25 ± 0.04 → 0.17 ± 0.17 |
| relocate-cloned-v1 | -0.04 ± 0.04 → -0.04 ± 0.01 | -0.33 ± 0.01 → -0.33 ± 0.01 | 0.04 ± 0.04 → 0.32 ± 0.38 | -0.24 ± 0.01 → -0.15 ± 0.05 | -0.31 ± 0.05 → -0.31 ± 0.04 |
| average | 22.84 → 21.76 | -0.72 → 0.22 | 21.69 → 44.99 | 2.43 → 11.81 | -0.76 → -0.79 |
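
To compare how much online fine-tuning changes each score, a cell of the form `a ± b → c ± d` can be split into its four numbers. The sketch below is one possible way to do that. `parse_cell` and `online_gain` are hypothetical helpers, and the value after `±` is treated only as an unspecified spread, since this page does not state how it is computed.

```python
import re

# One score-table cell: "<offline mean> ± <spread> → <online mean> ± <spread>".
NUMBER = r"-?\d+(?:\.\d+)?"
CELL = re.compile(rf"({NUMBER}) ± ({NUMBER}) → ({NUMBER}) ± ({NUMBER})")


def parse_cell(cell):
    """Split one table cell into its four numeric parts (hypothetical helper)."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised cell format: {cell!r}")
    off_mean, off_spread, on_mean, on_spread = map(float, match.groups())
    return {
        "offline_mean": off_mean,
        "offline_spread": off_spread,
        "online_mean": on_mean,
        "online_spread": on_spread,
    }


def online_gain(cell):
    """Change in mean score from offline pretraining to online fine-tuning."""
    parsed = parse_cell(cell)
    return parsed["online_mean"] - parsed["offline_mean"]


# IQL on door-cloned-v1 improves, AWAC on pen-cloned-v1 slightly degrades.
iql_door_gain = online_gain("1.19 ± 0.93 → 20.34 ± 9.32")
awac_pen_gain = online_gain("88.66 ± 15.10 → 86.82 ± 11.12")
print(f"IQL door-cloned-v1: {iql_door_gain:+.2f}")
print(f"AWAC pen-cloned-v1: {awac_pen_gain:+.2f}")
```

A negative gain marks a task where fine-tuning lowered the mean score, although the reported spreads should be taken into account before reading much into small differences.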

Regrets

Antmaze

| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|---|---|---|---|---|---|
| antmaze-umaze-v2 | 0.04 ± 0.01 | 0.02 ± 0.00 | 0.07 ± 0.00 | 0.02 ± 0.00 | 0.01 ± 0.00 |
| antmaze-umaze-diverse-v2 | 0.88 ± 0.01 | 0.09 ± 0.01 | 0.43 ± 0.11 | 0.22 ± 0.07 | 0.05 ± 0.01 |
| antmaze-medium-play-v2 | 1.00 ± 0.00 | 0.08 ± 0.01 | 0.09 ± 0.01 | 0.06 ± 0.00 | 0.04 ± 0.01 |
| antmaze-medium-diverse-v2 | 1.00 ± 0.00 | 0.08 ± 0.00 | 0.10 ± 0.01 | 0.05 ± 0.01 | 0.04 ± 0.01 |
| antmaze-large-play-v2 | 1.00 ± 0.00 | 0.21 ± 0.02 | 0.34 ± 0.05 | 0.29 ± 0.07 | 0.13 ± 0.02 |
| antmaze-large-diverse-v2 | 1.00 ± 0.00 | 0.21 ± 0.03 | 0.41 ± 0.03 | 0.23 ± 0.08 | 0.13 ± 0.02 |
| average | 0.82 | 0.11 | 0.24 | 0.15 | 0.07 |

Adroit

| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|---|---|---|---|---|---|
| pen-cloned-v1 | 0.46 ± 0.02 | 0.97 ± 0.00 | 0.37 ± 0.01 | 0.58 ± 0.02 | 0.98 ± 0.01 |
| door-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.03 | 0.99 ± 0.01 | 1.00 ± 0.00 |
| hammer-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.65 ± 0.10 | 0.98 ± 0.01 | 1.00 ± 0.00 |
| relocate-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 |
| average | 0.86 | 0.99 | 0.71 | 0.89 | 0.99 |

Visual summary