Learning Efficient Robotic Garment Manipulation with Standardization

Abstract

Garment manipulation is a significant challenge for robots due to the complex dynamics and potential self-occlusion of garments. Most existing methods for efficient garment unfolding overlook the crucial role of standardizing flattened garments, which could significantly simplify downstream tasks like folding, ironing, and packing.

This paper presents APS-Net, a novel approach to garment manipulation that combines unfolding and standardization in a unified framework. APS-Net employs a dual-arm, multi-primitive policy that uses dynamic fling to quickly unfold crumpled garments and pick-and-place for precise alignment. Garment standardization during unfolding involves not only maximizing surface coverage but also aligning the garment's shape and orientation with predefined requirements. To guide effective robot learning, we introduce a novel factorized reward function for standardization, which incorporates garment coverage (Cov), keypoint distance (KD), and intersection-over-union (IoU) metrics. Additionally, we introduce a spatial action mask and an Action Optimized Module to improve unfolding efficiency by selecting actions and operation points effectively. In simulation, APS-Net outperforms state-of-the-art methods on long-sleeve garments, achieving 3.9% better coverage, 5.2% higher IoU, and 1.4% higher KD. Real-world folding tasks further demonstrate that standardization simplifies the folding process.
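To make the factorized reward concrete, the sketch below combines the three metrics named in the abstract (Cov, IoU, KD) into a single scalar. The function name, the linear weighting, and the normalization of keypoint distance by the image diagonal are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def standardization_reward(mask, goal_mask, keypoints, goal_keypoints,
                           w_cov=1.0, w_iou=1.0, w_kd=1.0):
    """Illustrative factorized standardization reward (assumed weighting).

    mask / goal_mask: boolean HxW garment masks (current vs. canonical pose).
    keypoints / goal_keypoints: (N, 2) pixel coordinates of garment landmarks.
    """
    # Coverage (Cov): visible garment area relative to the flattened goal area.
    cov = mask.sum() / max(goal_mask.sum(), 1)

    # IoU: overlap between the current mask and the canonical flattened pose,
    # which penalizes misaligned shape and orientation.
    inter = np.logical_and(mask, goal_mask).sum()
    union = np.logical_or(mask, goal_mask).sum()
    iou = inter / max(union, 1)

    # Keypoint distance (KD): mean landmark error, mapped to [0, 1] so a
    # smaller distance yields a larger reward term (diagonal used as scale).
    diag = np.hypot(*mask.shape)
    dist = np.linalg.norm(keypoints - goal_keypoints, axis=1).mean()
    kd = 1.0 - min(dist / diag, 1.0)

    return (w_cov * cov + w_iou * iou + w_kd * kd) / (w_cov + w_iou + w_kd)
```

A perfectly standardized garment (identical mask and landmarks) scores 1.0 under this sketch, and each factor degrades the reward independently.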


Approach Overview

Fig. 1. Approach overview. APS-Net takes a batch of rotated and scaled RGBD images as input and uses three encoders, each paired with two decoders — a fling decoder and a P\&P decoder — whose weighted outputs form the corresponding Spatial Action Map. A Spatial Action Mask is then applied to filter out invalid actions. The valid primitive batches are concatenated, and the action to be executed is parameterized by the maximum-value pixel. Once the garment is sufficiently flattened, a keypoint detection-based method is employed to perform the folding.
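The final step of the pipeline in Fig. 1 — masking each primitive's value maps and picking the maximum-value pixel across all primitives — can be sketched as follows. The dictionary interface, primitive names, and slice layout are assumptions for illustration, not the network's actual API.

```python
import numpy as np

def select_action(value_maps, valid_masks):
    """Pick the primitive and pixel with the highest predicted value.

    value_maps:  dict mapping primitive name ('fling', 'pick_place') to a
                 (K, H, W) stack of value maps, one slice per rotation/scale.
    valid_masks: same-shaped boolean stacks; False marks invalid actions
                 (e.g. grasps off the garment or colliding end-effectors).
    """
    best = None
    for primitive, values in value_maps.items():
        # Masked-out actions get -inf so argmax never selects them.
        masked = np.where(valid_masks[primitive], values, -np.inf)
        idx = np.unravel_index(np.argmax(masked), masked.shape)
        score = masked[idx]
        if best is None or score > best[0]:
            # idx = (slice index encoding rotation/scale, row, col)
            best = (score, primitive, idx)
    return best[1], best[2]
```

Because invalid pixels are set to negative infinity before the argmax, a high-value but unreachable action can never outrank a lower-value valid one.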


Spatial Action Mask

Fig. 2. Our action spaces and action sampling using Spatial Action Maps. The series of smaller maps in (a) represent different slices of the Spatial Action Maps for various primitives, each corresponding to a layer at a different scale and rotation. The mask applied to each layer filters out invalid actions — those that would cause the robot's end-effectors to collide or extend beyond the garment.
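One simple way to realize the per-layer validity mask described in Fig. 2 is a binary erosion of the garment segmentation: a pixel is a valid grasp point only if the gripper footprint around it lies entirely on the garment. The `margin` parameter standing in for the end-effector footprint is an assumption for illustration.

```python
import numpy as np

def build_action_mask(garment_mask, margin=2):
    """Sketch of a per-pixel grasp-validity mask (assumed erosion approach).

    A pixel is valid only if the (2*margin+1)^2 neighbourhood around it lies
    entirely on the garment, so the grasp neither misses the cloth nor
    extends beyond its boundary.
    """
    valid = np.ones_like(garment_mask, dtype=bool)
    for dy in range(-margin, margin + 1):
        for dx in range(-margin, margin + 1):
            # Intersect with every shifted copy of the garment mask:
            # this is binary erosion with a square structuring element.
            valid &= np.roll(np.roll(garment_mask, dy, axis=0), dx, axis=1)
    # np.roll wraps around the image edges; exclude border pixels outright.
    valid[:margin, :] = valid[-margin:, :] = False
    valid[:, :margin] = valid[:, -margin:] = False
    return valid
```

In practice a library routine such as `scipy.ndimage.binary_erosion` does the same job; collision constraints between the two arms would be intersected in as an additional mask.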


Simulation Experiments

1. Results of Our Method and Other Baselines

easy task

medium task

hard task

extra hard task

expert task


2. Performance of APS-Net on Downstream Folding Task

easy task

medium task

hard task

extra hard task

expert task

Real-World Experiments

1. Unfolding effect on Long Sleeve 1

P&P

Flingbot

APS-Net

2. Unfolding effect on Long Sleeve 2

P&P

Flingbot

APS-Net

3. Unfolding effect on Long Sleeve 3

P&P

Flingbot

APS-Net

4. Unfolding effect on Long Sleeve 4

P&P

Flingbot

APS-Net


5. Performance of APS-Net on Downstream Folding Task

Long Sleeve 1

Long Sleeve 2

Long Sleeve 3