EfficientGrasp: A Scalable and Modular Solution to Robotic Grasping Applications Using Vision
Keywords:
Robotic grasping, Vision-based grasping, Grasping in clutter, Deep learning.

Abstract
Advancements in deep learning and computational power have given rise to complex, highly accurate models for vision-based cognitive tasks. In real-time robotic applications, however, where multiple tasks are performed simultaneously, computational power constraints restrict the resources available to any one specific task. With that in mind, an EfficientNets-based scalable and modular model is presented for the robotic grasp detection task. The proposed EfficientGrasp model effectively handles single and multiple objects in both isolated and cluttered configurations. The scalability aspect addresses computational power constraints by varying the parameter count of the model, with some accuracy trade-off in the lighter variants. The modularity feature reduces the redundancy of extracting high-level features from images for different vision-based cognitive applications by using small subnets for each specific task. This work builds upon the EfficientPose model by proposing subnets for the robotic grasp detection task. The work focuses on the parallel-plate gripper and allows gripper configurations to be incorporated both pre-training and post-training. The model is shown to achieve a 5-fold cross-validation top grasp accuracy of 96.05% and a Top-5 grasp accuracy of 98.87% on the Cornell dataset, and a Top-5 grasp accuracy of 96.46% on the Visual Manipulation Relationship Dataset (VMRD).
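To illustrate the modularity idea described in the abstract, the following is a minimal sketch (not the authors' implementation) of a shared EfficientNet backbone paired with a small task-specific grasp subnet that regresses 5-D grasp rectangles (x, y, theta, w, h) for a parallel-plate gripper. The class name GraspSubnet, the layer widths, and the anchor count are illustrative assumptions; the actual EfficientGrasp subnets and the EfficientPose feature pipeline are not specified here.

    # Sketch only: shared backbone + lightweight grasp-specific subnet.
    import torch
    import torch.nn as nn
    from torchvision.models import efficientnet_b0

    class GraspSubnet(nn.Module):
        """Hypothetical small head attached to shared backbone features."""
        def __init__(self, in_channels: int, num_anchors: int = 9):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )
            # 5 values per anchor: x, y, theta, w, h of a grasp rectangle
            self.head = nn.Conv2d(64, num_anchors * 5, kernel_size=1)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return self.head(self.conv(features))

    # High-level features are extracted once by the shared backbone;
    # only the lightweight subnet is specific to the grasping task,
    # so other vision tasks can reuse the same features.
    backbone = efficientnet_b0(weights=None).features
    subnet = GraspSubnet(in_channels=1280)

    x = torch.randn(1, 3, 224, 224)   # RGB input image
    features = backbone(x)            # shared feature extraction
    grasps = subnet(features)         # per-location grasp predictions
    print(grasps.shape)               # torch.Size([1, 45, 7, 7])

Swapping efficientnet_b0 for a larger or smaller EfficientNet variant is one way to realize the scalability trade-off the abstract mentions: parameter count changes while the subnet interface stays fixed.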