Hi OpenVLA team,
Has anyone adapted the openvla-oft project for a 7-DOF robot arm? I collected a set of real-world data for a 7-DOF robot arm where both the state and action are in joint angles (specifically, 7 joint angles + 1 gripper state, making it an 8D vector).
To accommodate this specific format, I devised the following settings and modifications. I would greatly appreciate it if you could review these modifications and let me know if this is the correct approach.
1. prismatic/vla/constants.py
I added the real-world platform constants and adjusted the platform branching to default to REALWORLD.
```python
# Add REALWORLD platform constants
REALWORLD_CONSTANTS = {
    "NUM_ACTIONS_CHUNK": 8,
    "ACTION_DIM": 8,  # 7 joint angles + 1 gripper
    "PROPRIO_DIM": 8,
    "ACTION_PROPRIO_NORMALIZATION_TYPE": NormalizationType.BOUNDS_Q99,
}

# In the platform-selection branching, add:
elif ROBOT_PLATFORM == "REALWORLD":
    constants = REALWORLD_CONSTANTS

# In the platform-detection function, change the default fallback:
else:
    return "REALWORLD"  # default fallback changed from "LIBERO"
```
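To sanity-check what BOUNDS_Q99 implies for joint-angle data, here is a minimal, self-contained sketch of quantile-bounds normalization (my own illustration, not the openvla-oft implementation): each dimension is rescaled to [-1, 1] using its 1st/99th-percentile bounds over the dataset, which keeps outlier joint readings from stretching the range.

```python
import numpy as np

def bounds_q99_normalize(actions: np.ndarray) -> np.ndarray:
    """Normalize each action dim to [-1, 1] using 1st/99th-percentile bounds."""
    low = np.quantile(actions, 0.01, axis=0)
    high = np.quantile(actions, 0.99, axis=0)
    return np.clip(2.0 * (actions - low) / (high - low + 1e-8) - 1.0, -1.0, 1.0)

# Fake dataset: 1000 timesteps of 8D absolute joint actions (7 joints + gripper)
rng = np.random.default_rng(0)
actions = rng.uniform(-3.14, 3.14, size=(1000, 8))
norm = bounds_q99_normalize(actions)

assert norm.shape == (1000, 8)
assert norm.min() >= -1.0 and norm.max() <= 1.0
```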
2. prismatic/vla/datasets/rlds/oxe/configs.py
I added two custom real-world dataset configurations. The key specifications are primary + wrist cameras, 8D proprioception, and JOINT_POS action encoding (absolute joint positions, not delta).
```python
OXE_DATASET_CONFIGS = {
    # ... existing configs ...

    ### Tienkung real robot datasets
    "erase_small_whiteboard": {
        "image_obs_keys": {"primary": "image", "secondary": None, "wrist": "wrist_image"},
        "depth_obs_keys": {"primary": None, "secondary": None, "wrist": None},
        "state_obs_keys": ["state"],
        "state_encoding": StateEncoding.JOINT,        # 8D joint angles
        "action_encoding": ActionEncoding.JOINT_POS,  # 8D absolute joint positions
    },
    "pick_and_stack_the_plates": {
        # Same configuration as above
    },
}
```
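For concreteness, this is the per-step RLDS structure my config assumes (the key names 'image', 'wrist_image', 'state' and the shapes are my dataset's conventions, not requirements imposed by openvla-oft):

```python
import numpy as np

# Hypothetical single RLDS step matching the config above
step = {
    "observation": {
        "image": np.zeros((224, 224, 3), dtype=np.uint8),        # primary (top) camera
        "wrist_image": np.zeros((224, 224, 3), dtype=np.uint8),  # wrist camera
        "state": np.zeros(8, dtype=np.float32),                  # 7 joint angles + 1 gripper
    },
    "action": np.zeros(8, dtype=np.float32),  # absolute joint positions
    "language_instruction": "erase the small whiteboard",
}

assert step["observation"]["state"].shape == (8,)
assert step["action"].shape == (8,)
```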
3. prismatic/vla/datasets/rlds/oxe/materialize.py
I extended support for the JOINT_POS action encoding. Unlike EEF_POS (where only the gripper dimension is absolute), JOINT_POS treats all 8 dimensions as absolute.
```python
def make_oxe_dataset_kwargs_and_weights(...):
    # Extend validation to include JOINT_POS
    if dataset_kwargs["action_encoding"] not in [
        ActionEncoding.EEF_POS,
        ActionEncoding.EEF_R6,
        ActionEncoding.JOINT_POS_BIMANUAL,
        ActionEncoding.EEF_QUAT_POS,
        ActionEncoding.JOINT_POS,  # NEW
    ]:
        raise ValueError(...)

    # Add JOINT_POS processing branch
    elif dataset_kwargs["action_encoding"] is ActionEncoding.JOINT_POS:
        dataset_kwargs["absolute_action_mask"] = [True] * 8       # all 8 dims are absolute
        # Note: unlike EEF_POS, this also normalizes the gripper dim
        dataset_kwargs["action_normalization_mask"] = [True] * 8  # normalize all dims
```
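To illustrate what the mask semantics buy you, here is a toy sketch (a hypothetical helper of my own, not the openvla-oft API) of how an absolute-action mask is typically consumed downstream: absolute dims replace the previous command outright, while delta dims are integrated.

```python
import numpy as np

def apply_action(prev_joint_pos, action, absolute_mask):
    """Absolute dims replace the previous value; delta dims accumulate."""
    mask = np.asarray(absolute_mask, dtype=bool)
    return np.where(mask, action, prev_joint_pos + action)

prev = np.full(8, 0.1, dtype=np.float32)
act = np.full(8, 0.5, dtype=np.float32)

# JOINT_POS (all-absolute): the commanded action is executed as-is
assert np.allclose(apply_action(prev, act, [True] * 8), 0.5)

# An EEF_POS-style mask (only the gripper absolute) would instead
# integrate the first 7 dims as deltas
mixed = apply_action(prev, act, [False] * 7 + [True])
assert np.allclose(mixed[:7], 0.6) and np.isclose(mixed[7], 0.5)
```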
4. prismatic/vla/datasets/rlds/oxe/transforms.py
I added the dataset transform function for the custom RLDS format and registered it to the standardization dictionary.
```python
def tienkung_machine_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    """
    Transform for Tienkung real robot datasets.

    Dataset RLDS format (from tienkung_dataset_builder):
      - Action: [8] - absolute joint positions (7 joints + 1 gripper)
      - State:  [8] - joint positions (7 joints + 1 gripper)
      - Images: 'image' (top camera), 'wrist_image' (right/wrist camera)
    """
    # Map 'state' to 'proprio' for consistency with other datasets
    if "state" in trajectory["observation"]:
        trajectory["observation"]["proprio"] = trajectory["observation"]["state"]
    # Action normalization is handled automatically by normalize_action_and_proprio
    return trajectory


OXE_STANDARDIZATION_TRANSFORMS = {
    # ... existing transforms ...

    ### Tienkung real robot datasets
    "erase_small_whiteboard": tienkung_machine_dataset_transform,
    "pick_and_stack_the_plates": tienkung_machine_dataset_transform,
}
```
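As a quick self-contained sanity check (the transform body is copied from my snippet above, with a plain dict standing in for a real RLDS trajectory), the transform should alias 'state' into 'proprio' and leave actions untouched:

```python
from typing import Any, Dict
import numpy as np

def tienkung_machine_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    # Map 'state' to 'proprio' for consistency with other datasets
    if "state" in trajectory["observation"]:
        trajectory["observation"]["proprio"] = trajectory["observation"]["state"]
    return trajectory

traj = {
    "observation": {
        "state": np.ones((10, 8), dtype=np.float32),
        "image": np.zeros((10, 64, 64, 3), dtype=np.uint8),
    },
    "action": np.zeros((10, 8), dtype=np.float32),
}
out = tienkung_machine_dataset_transform(traj)

assert np.array_equal(out["observation"]["proprio"], out["observation"]["state"])
assert out["action"].shape == (10, 8)  # actions unchanged
```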
Could anyone confirm whether handling JOINT_POS with absolute_action_mask = [True] * 8 is fully sufficient for the OpenVLA training pipeline, or whether there are other modifications I need to make?
Thanks in advance!