
[Question] Adapting OpenVLA-OFT for 7-DOF Robot Arm (Joint Angle Action Space) #149

@lozoRUST

Hi OpenVLA team,

Has anyone adapted the openvla-oft project for a 7-DOF robot arm? I collected a set of real-world data for a 7-DOF robot arm where both the state and action are in joint angles (specifically, 7 joint angles + 1 gripper state, making it an 8D vector).

To accommodate this format, I made the settings and modifications below. I would greatly appreciate it if you could review them and let me know whether this is the correct approach.

1. prismatic/vla/constants.py

I added the real-world platform constants and adjusted the platform branching to default to REALWORLD.

```python
# Add REALWORLD platform constants
REALWORLD_CONSTANTS = {
    "NUM_ACTIONS_CHUNK": 8,
    "ACTION_DIM": 8,              # 7 joint angles + 1 gripper
    "PROPRIO_DIM": 8,
    "ACTION_PROPRIO_NORMALIZATION_TYPE": NormalizationType.BOUNDS_Q99,
}

# In the platform-branching logic, add:
elif ROBOT_PLATFORM == "REALWORLD":
    constants = REALWORLD_CONSTANTS
else:
    return "REALWORLD"  # default fallback changed from "LIBERO"
```
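
For context, here is my understanding of what BOUNDS_Q99 implies for the 8-D joint actions, as a standalone sketch (the `q99_normalize` helper is my own illustration of the usual "scale 1st/99th percentiles to [-1, 1]" convention, not the repo's code):

```python
import numpy as np

def q99_normalize(actions: np.ndarray) -> np.ndarray:
    """Map each action dimension to [-1, 1] using its q01/q99 bounds."""
    low = np.quantile(actions, 0.01, axis=0)   # per-dim 1st percentile
    high = np.quantile(actions, 0.99, axis=0)  # per-dim 99th percentile
    scaled = 2.0 * (actions - low) / (high - low + 1e-8) - 1.0
    return np.clip(scaled, -1.0, 1.0)          # outliers beyond q01/q99 saturate

# Dummy dataset of 1000 8-D joint-position actions (radians)
rng = np.random.default_rng(0)
acts = rng.uniform(-3.14, 3.14, size=(1000, 8))
norm = q99_normalize(acts)
assert norm.shape == (1000, 8)
assert norm.min() >= -1.0 and norm.max() <= 1.0
```

My reasoning for picking BOUNDS_Q99 over plain min/max bounds is that joint-angle recordings occasionally contain outlier spikes, and quantile bounds keep those from compressing the useful range.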

2. prismatic/vla/datasets/rlds/oxe/configs.py

I added two custom real-world dataset configurations. The key specifications are primary + wrist cameras, 8D proprioception, and JOINT_POS action encoding (absolute joint positions, not deltas).

```python
OXE_DATASET_CONFIGS = {
    # ... existing configs ...

    ### Tienkung real robot datasets
    "erase_small_whiteboard": {
        "image_obs_keys": {"primary": "image", "secondary": None, "wrist": "wrist_image"},
        "depth_obs_keys": {"primary": None, "secondary": None, "wrist": None},
        "state_obs_keys": ["state"],
        "state_encoding": StateEncoding.JOINT,           # 8D joint angles
        "action_encoding": ActionEncoding.JOINT_POS,     # 8D absolute joint positions
    },
    "pick_and_stack_the_plates": {
        # Same configuration as above
    },
}
```
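
To catch typos early, I also run a small sanity check over the new entries before training (`REQUIRED_KEYS` and `check_config` are my own helpers, not part of the repo; enum values are stood in by strings to keep the snippet self-contained):

```python
# Keys the OXE loader expects each dataset config to declare
REQUIRED_KEYS = {
    "image_obs_keys", "depth_obs_keys", "state_obs_keys",
    "state_encoding", "action_encoding",
}

def check_config(name: str, cfg: dict) -> None:
    """Raise if a dataset config is missing any expected key."""
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"{name} is missing keys: {sorted(missing)}")

cfg = {
    "image_obs_keys": {"primary": "image", "secondary": None, "wrist": "wrist_image"},
    "depth_obs_keys": {"primary": None, "secondary": None, "wrist": None},
    "state_obs_keys": ["state"],
    "state_encoding": "JOINT",        # stand-in for StateEncoding.JOINT
    "action_encoding": "JOINT_POS",   # stand-in for ActionEncoding.JOINT_POS
}
check_config("erase_small_whiteboard", cfg)  # passes silently
```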

3. prismatic/vla/datasets/rlds/oxe/materialize.py

I extended support for the JOINT_POS action encoding. Unlike EEF_POS (where only the gripper dimension is absolute), JOINT_POS treats all 8 dimensions as absolute.

```python
def make_oxe_dataset_kwargs_and_weights(...):
    # Extend validation to include JOINT_POS
    if dataset_kwargs["action_encoding"] not in [
        ActionEncoding.EEF_POS,
        ActionEncoding.EEF_R6,
        ActionEncoding.JOINT_POS_BIMANUAL,
        ActionEncoding.EEF_QUAT_POS,
        ActionEncoding.JOINT_POS,          # NEW
    ]:
        raise ValueError(...)

    # Add JOINT_POS processing branch
    elif dataset_kwargs["action_encoding"] is ActionEncoding.JOINT_POS:
        dataset_kwargs["absolute_action_mask"] = [True] * 8       # all 8 dims are absolute
        dataset_kwargs["action_normalization_mask"] = [True] * 8  # normalize all dims
```
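
My understanding of why the absolute-action mask matters (this is an assumption based on Octo-style chunking, so please correct me if this repo's RLDS pipeline uses it differently; `pad_chunk` is an illustrative helper, not repo code): when an action chunk runs past the end of a trajectory, relative dimensions are padded with 0.0 (a no-op delta), while absolute dimensions must repeat the last real action, since a zero joint angle is not a no-op.

```python
import numpy as np

def pad_chunk(actions, chunk_len, absolute_mask):
    """Pad a trajectory's actions out to chunk_len past its end."""
    actions = np.asarray(actions, dtype=float)
    n_pad = chunk_len - len(actions)
    mask = np.asarray(absolute_mask, dtype=bool)
    neutral = np.where(mask, actions[-1], 0.0)  # repeat absolute dims, zero relative dims
    pad = np.tile(neutral, (n_pad, 1))
    return np.concatenate([actions, pad], axis=0)

acts = np.ones((5, 8)) * 0.7                    # 5 steps of 8-D joint actions
chunk = pad_chunk(acts, 8, [True] * 8)          # JOINT_POS: all dims absolute
assert chunk.shape == (8, 8)
assert np.allclose(chunk[5:], 0.7)              # absolute dims repeat last action

rel = pad_chunk(acts, 8, [False] * 8)           # hypothetical all-relative encoding
assert np.allclose(rel[5:], 0.0)                # relative dims pad with no-op zeros
```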

4. prismatic/vla/datasets/rlds/oxe/transforms.py

I added the dataset transform function for the custom RLDS format and registered it to the standardization dictionary.

```python
def tienkung_machine_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    """
    Transform for Tienkung real robot datasets.

    Dataset RLDS format (from tienkung_dataset_builder):
    - Action: [8] - absolute joint positions (7 joints + 1 gripper)
    - State: [8] - joint positions (7 joints + 1 gripper)
    - Images: 'image' (top camera), 'wrist_image' (right/wrist camera)
    """
    # Map 'state' to 'proprio' for consistency with other datasets
    if "state" in trajectory["observation"]:
        trajectory["observation"]["proprio"] = trajectory["observation"]["state"]

    # Action normalization is handled automatically by normalize_action_and_proprio
    return trajectory
```


```python
OXE_STANDARDIZATION_TRANSFORMS = {
    # ... existing transforms ...

    ### Tienkung real robot datasets
    "erase_small_whiteboard": tienkung_machine_dataset_transform,
    "pick_and_stack_the_plates": tienkung_machine_dataset_transform,
}
```
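
As a quick smoke test of the transform itself (a plain-Python stand-in; in the real pipeline the trajectory values are TF tensors, not lists), I apply it to a dummy trajectory and confirm that 'proprio' gets populated:

```python
from typing import Any, Dict

def tienkung_machine_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    # Map 'state' to 'proprio' for consistency with other datasets
    if "state" in trajectory["observation"]:
        trajectory["observation"]["proprio"] = trajectory["observation"]["state"]
    return trajectory

# Dummy trajectory with 8-D state and action (7 joints + 1 gripper)
traj = {"observation": {"state": [0.0] * 8, "image": "dummy"}, "action": [0.0] * 8}
out = tienkung_machine_dataset_transform(traj)
assert out["observation"]["proprio"] == [0.0] * 8
```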

Could anyone confirm whether handling JOINT_POS with absolute_action_mask = [True] * 8 is sufficient for the OpenVLA-OFT training pipeline, or whether there are other modifications I need to make?

Thanks in advance!
