Benchmarking Robot Learning Algorithms for Real-World Challenges

Keivalya Bhartendu Pandya
Prof. Zhi Tan

Khoury College of Computer Sciences, Northeastern University, Boston, MA

Introduction

Robots have the potential to transform the way we work and live, but to truly integrate into human environments, they must learn complex tasks efficiently. One of the most promising approaches is Imitation Learning, where robots acquire skills by observing and mimicking expert demonstrations—much like how humans learn by watching others.

This research explores and compares state-of-the-art imitation learning algorithms to determine which performs best in real-world, dynamically changing environments. For example, tasks like filling a water bottle or pressing an elevator button seem simple for humans but are challenging for robots. Unlike us, robots do not inherently understand their surroundings—unless we train them to.
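To make the learning setup concrete, here is a minimal sketch of behavior cloning, the simplest form of imitation learning: a policy network is fit to expert observation-action pairs by supervised regression. The dimensions and synthetic batch below are illustrative placeholders, not our actual pipeline.

```python
import torch
import torch.nn as nn

# Minimal behavior-cloning sketch: fit a policy to expert
# (observation, action) pairs by supervised regression.
# Dimensions and data below are illustrative placeholders.
OBS_DIM, ACT_DIM = 32, 7  # e.g. robot state features -> 7-DoF arm command

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_update(obs: torch.Tensor, expert_action: torch.Tensor) -> float:
    """One gradient step pulling the policy toward the expert's action."""
    loss = nn.functional.mse_loss(policy(obs), expert_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Synthetic batch standing in for a batch of recorded demonstrations:
obs_batch = torch.randn(64, OBS_DIM)
act_batch = torch.randn(64, ACT_DIM)
print(bc_update(obs_batch, act_batch))
```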

Research Questions

How do different imitation learning algorithms perform across varying tasks and environmental conditions?

What are the trade-offs between computational efficiency, training time, and task success rates?

Which algorithms demonstrate the most robust generalization capabilities in dynamic environments?

Methodology

Deep Perception

[Figure: Deep Perception flowchart]

Imitation Learning

[Figure: Imitation Learning flowchart]
1. Data Collection: Expert demonstrations are collected for the target tasks (button pressing, bottle reorientation).

2. Algorithm Training: Four policies are implemented: open-loop Deep Perception, closed-loop Deep Perception, Diffusion Policy, and VQ-BeT.

3. Real-World Testing: Each policy is evaluated in dynamic environments with varying conditions.

4. Performance Analysis: Execution time, success rates, and resource requirements are measured (a minimal sketch of such a harness follows this list).
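Below is a minimal sketch of the kind of evaluation harness used in step 4: it times each rollout and aggregates the success rate over repeated trials. The callable passed in is a hypothetical stand-in for executing one rollout on the robot; it is not part of our actual codebase.

```python
import time
import statistics

def benchmark(run_trial, n_trials: int = 20) -> dict:
    """Time each rollout of `run_trial` (a zero-argument callable that
    returns True on task success) and aggregate the results."""
    durations, successes = [], 0
    for _ in range(n_trials):
        start = time.perf_counter()
        if run_trial():
            successes += 1
        durations.append(time.perf_counter() - start)
    return {
        "success_rate": successes / n_trials,
        "mean_exec_time_s": statistics.mean(durations),
    }

# Usage: `press_button_rollout` would wrap one real robot rollout and
# report success; here it is a hypothetical stand-in.
# results = benchmark(press_button_rollout, n_trials=20)
```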

Results

Push Button Task

Policy                        Execution Time   Success Rate   Engineering Time   Training Time
Open-loop Deep Perception     1 m 52 s         40 %           1 d 6 h            0 h
Closed-loop Deep Perception   2 m 34 s         55 %           3 d 4 h            0 h
Diffusion Policy              1 m 20 s         85 %           2 h                1 d 9 h
VQ-BeT*                       inf              0 %            2 h                10 h

* incomplete training (work in progress)

Reorient Bottle Task

Policy                        Execution Time   Success Rate   Engineering Time   Training Time
Open-loop Deep Perception     ??               ??             ??                 0 h
Closed-loop Deep Perception   ??               ??             ??                 0 h
Diffusion Policy              ??               ??             0 h                1 d 18 h
VQ-BeT*                       ??               ??             0 h                13 h

* incomplete training (work in progress)

Conclusion

Open-loop Deep Perception

Works well but is slow and computationally expensive, limiting its use in real-time applications.

Closed-loop Deep Perception

Provides better adaptability to changing conditions with moderate computational requirements.
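The distinction between the two Deep Perception variants comes down to when the robot perceives: the open-loop variant plans once from an initial observation and replays the plan, while the closed-loop variant re-perceives and re-plans at every step. A schematic sketch, with hypothetical `perceive`, `plan`, and `execute` hooks standing in for the robot stack:

```python
# Schematic contrast between the two execution modes.
# `perceive`, `plan`, and `execute` are hypothetical robot-stack hooks.

def run_open_loop(perceive, plan, execute, horizon: int) -> None:
    """Perceive once, then replay the whole plan blindly.
    Fast per step, but cannot react if the scene changes mid-execution."""
    actions = plan(perceive())
    for t in range(horizon):
        execute(actions[t])

def run_closed_loop(perceive, plan, execute, horizon: int) -> None:
    """Re-perceive and re-plan every step, tracking a changing scene
    at the cost of running perception inside the control loop."""
    for _ in range(horizon):
        actions = plan(perceive())
        execute(actions[0])  # act, then replan from fresh observations
```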

Diffusion Policy (Small Dataset)

Learns suboptimal behavior because the small dataset under-represents the environment and introduces demonstration bias.

Diffusion Policy (Large Dataset)

Demonstrates improved performance but requires significantly more training data and resources.
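For context on how Diffusion Policy produces actions: it starts from Gaussian noise and iteratively denoises an action sequence conditioned on the current observation. The heavily simplified DDPM-style sampler below assumes a hypothetical `denoiser` network; the real method (Chi et al., 2023) conditions on visual features and predicts action chunks.

```python
import torch

@torch.no_grad()
def sample_actions(denoiser, obs_feat, act_dim=7, horizon=16, n_steps=100):
    """Simplified DDPM-style sampler: start from noise and let the
    learned `denoiser` (hypothetical) iteratively refine an action
    sequence conditioned on observation features."""
    betas = torch.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    actions = torch.randn(horizon, act_dim)  # pure noise to start
    for t in reversed(range(n_steps)):
        eps = denoiser(actions, obs_feat, t)  # predicted noise at step t
        actions = (actions - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
                  / torch.sqrt(alphas[t])
        if t > 0:  # re-inject noise on all but the final step
            actions += torch.sqrt(betas[t]) * torch.randn_like(actions)
    return actions
```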

VQ-BeT

Can learn more robust policies that generalize better to novel environments and perturbations.
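VQ-BeT's key idea is to discretize continuous actions with a vector-quantized codebook so that a transformer can predict them as tokens. A minimal sketch of the nearest-neighbor quantization step, with an illustrative random codebook; the full method (Lee et al., 2024) uses residual VQ and a behavior transformer on top.

```python
import torch

def quantize(actions: torch.Tensor, codebook: torch.Tensor):
    """Map each continuous action to its nearest codebook entry.
    Returns discrete code indices and the quantized actions."""
    # actions: (batch, act_dim); codebook: (n_codes, act_dim)
    dists = torch.cdist(actions, codebook)   # pairwise L2 distances
    codes = dists.argmin(dim=1)              # one discrete token per action
    return codes, codebook[codes]

# Illustrative usage: a 512-entry codebook over 7-DoF actions.
codebook = torch.randn(512, 7)
actions = torch.randn(4, 7)
codes, recon = quantize(actions, codebook)
```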

Future Work

Investigate multi-modal learning approaches combining vision, force feedback, and language instructions

Develop hybrid models that leverage the strengths of multiple algorithms

Explore collaborative learning scenarios where robots learn from both experts and non-experts

Extend benchmarking to more complex, multi-stage manipulation tasks

References

[1] Chi, C., Feng, S., Du, Y., Xu, Z., Cousineau, E., Burchfiel, B., and Song, S. (2023). Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. Robotics: Science and Systems (RSS).

[2] Lee, S., Wang, Y., Etukuru, H., Kim, H. J., Shafiullah, N. M. M., and Pinto, L. (2024). Behavior Generation with Latent Actions (VQ-BeT). International Conference on Machine Learning (ICML).

Acknowledgements

I'd like to extend my sincerest gratitude to Prof. Christopher Amato for nominating me for this Research Apprenticeship back when I was a novice in the field of RL. Thank you for believing in me when I could not believe in myself. I am also thankful to Prof. Zhi Tan for his guidance and support throughout this research project. It took me a while to figure things out, but you were patient with me. Thank you so very much for motivating me at every step! Special thanks to the peers and friends I made along the way in my journey of learning, who provided valuable feedback and encouragement.