Hi OpenVLA team,
Has anyone adapted the openvla-oft project for a 7-DOF robot arm? I collected a set of real-world data for a 7-DOF robot arm where both the state and action are in joint angles (specifically, 7 joint angles + 1 gripper state, making it an 8D vector).
To accommodate this specific format, I devised the following settings and modifications. I would greatly appreciate it if you could review these modifications and let me know if this is the correct approach.
1. prismatic/vla/constants.py
I added the real-world platform constants and adjusted the platform branching to default to REALWORLD.
```python
# Add REALWORLD platform constants
REALWORLD_CONSTANTS = {
    "NUM_ACTIONS_CHUNK": 8,
    "ACTION_DIM": 8,  # 7 joint angles + 1 gripper
    "PROPRIO_DIM": 8,
    "ACTION_PROPRIO_NORMALIZATION_TYPE": NormalizationType.BOUNDS_Q99,
}

# In the platform-selection branching, add:
elif ROBOT_PLATFORM == "REALWORLD":
    constants = REALWORLD_CONSTANTS

# In the platform-detection function, change the default fallback:
else:
    return "REALWORLD"  # default fallback changed from "LIBERO"
```
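To sanity-check what BOUNDS_Q99 implies for joint-angle data, here is a minimal, self-contained sketch of quantile-bounds normalization (my own illustration, not the openvla-oft implementation): each dimension is rescaled to [-1, 1] using its 1st/99th-percentile bounds over the dataset, which keeps outlier joint readings from stretching the range.

```python
import numpy as np

def bounds_q99_normalize(actions: np.ndarray) -> np.ndarray:
    """Normalize each action dim to [-1, 1] using 1st/99th-percentile bounds."""
    low = np.quantile(actions, 0.01, axis=0)
    high = np.quantile(actions, 0.99, axis=0)
    return np.clip(2.0 * (actions - low) / (high - low + 1e-8) - 1.0, -1.0, 1.0)

# Fake dataset: 1000 timesteps of 8D absolute joint actions (7 joints + gripper)
rng = np.random.default_rng(0)
actions = rng.uniform(-3.14, 3.14, size=(1000, 8))
norm = bounds_q99_normalize(actions)

assert norm.shape == (1000, 8)
assert norm.min() >= -1.0 and norm.max() <= 1.0
```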
2. prismatic/vla/datasets/rlds/oxe/configs.py
I added two custom real-world dataset configurations. The key specifications are primary + wrist cameras, 8D proprioception, and JOINT_POS action encoding (absolute joint positions, not delta).
```python
OXE_DATASET_CONFIGS = {
    # ... existing configs ...

    ### Tienkung real robot datasets
    "erase_small_whiteboard": {
        "image_obs_keys": {"primary": "image", "secondary": None, "wrist": "wrist_image"},
        "depth_obs_keys": {"primary": None, "secondary": None, "wrist": None},
        "state_obs_keys": ["state"],
        "state_encoding": StateEncoding.JOINT,        # 8D joint angles
        "action_encoding": ActionEncoding.JOINT_POS,  # 8D absolute joint positions
    },
    "pick_and_stack_the_plates": {
        # Same configuration as above
    },
}
```
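For concreteness, this is the per-step RLDS structure my config assumes (the key names 'image', 'wrist_image', 'state' and the shapes are my dataset's conventions, not requirements imposed by openvla-oft):

```python
import numpy as np

# Hypothetical single RLDS step matching the config above
step = {
    "observation": {
        "image": np.zeros((224, 224, 3), dtype=np.uint8),        # primary (top) camera
        "wrist_image": np.zeros((224, 224, 3), dtype=np.uint8),  # wrist camera
        "state": np.zeros(8, dtype=np.float32),                  # 7 joint angles + 1 gripper
    },
    "action": np.zeros(8, dtype=np.float32),  # absolute joint positions
    "language_instruction": "erase the small whiteboard",
}

assert step["observation"]["state"].shape == (8,)
assert step["action"].shape == (8,)
```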
3. prismatic/vla/datasets/rlds/oxe/materialize.py
I extended support for the JOINT_POS action encoding. Unlike EEF_POS (where only the gripper dimension is absolute), JOINT_POS treats all 8 dimensions as absolute.
```python
def make_oxe_dataset_kwargs_and_weights(...):
    # Extend validation to include JOINT_POS
    if dataset_kwargs["action_encoding"] not in [
        ActionEncoding.EEF_POS,
        ActionEncoding.EEF_R6,
        ActionEncoding.JOINT_POS_BIMANUAL,
        ActionEncoding.EEF_QUAT_POS,
        ActionEncoding.JOINT_POS,  # NEW
    ]:
        raise ValueError(...)

    # Add JOINT_POS processing branch
    elif dataset_kwargs["action_encoding"] is ActionEncoding.JOINT_POS:
        dataset_kwargs["absolute_action_mask"] = [True] * 8       # all 8 dims are absolute
        # Note: unlike EEF_POS, this also normalizes the gripper dim
        dataset_kwargs["action_normalization_mask"] = [True] * 8  # normalize all dims
```
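To illustrate what the mask semantics buy you, here is a toy sketch (a hypothetical helper of my own, not the openvla-oft API) of how an absolute-action mask is typically consumed downstream: absolute dims replace the previous command outright, while delta dims are integrated.

```python
import numpy as np

def apply_action(prev_joint_pos, action, absolute_mask):
    """Absolute dims replace the previous value; delta dims accumulate."""
    mask = np.asarray(absolute_mask, dtype=bool)
    return np.where(mask, action, prev_joint_pos + action)

prev = np.full(8, 0.1, dtype=np.float32)
act = np.full(8, 0.5, dtype=np.float32)

# JOINT_POS (all-absolute): the commanded action is executed as-is
assert np.allclose(apply_action(prev, act, [True] * 8), 0.5)

# An EEF_POS-style mask (only the gripper absolute) would instead
# integrate the first 7 dims as deltas
mixed = apply_action(prev, act, [False] * 7 + [True])
assert np.allclose(mixed[:7], 0.6) and np.isclose(mixed[7], 0.5)
```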
4. prismatic/vla/datasets/rlds/oxe/transforms.py
I added the dataset transform function for the custom RLDS format and registered it to the standardization dictionary.
```python
def tienkung_machine_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    """
    Transform for Tienkung real robot datasets.

    Dataset RLDS format (from tienkung_dataset_builder):
      - Action: [8] - absolute joint positions (7 joints + 1 gripper)
      - State:  [8] - joint positions (7 joints + 1 gripper)
      - Images: 'image' (top camera), 'wrist_image' (right/wrist camera)
    """
    # Map 'state' to 'proprio' for consistency with other datasets
    if "state" in trajectory["observation"]:
        trajectory["observation"]["proprio"] = trajectory["observation"]["state"]
    # Action normalization is handled automatically by normalize_action_and_proprio
    return trajectory


OXE_STANDARDIZATION_TRANSFORMS = {
    # ... existing transforms ...

    ### Tienkung real robot datasets
    "erase_small_whiteboard": tienkung_machine_dataset_transform,
    "pick_and_stack_the_plates": tienkung_machine_dataset_transform,
}
```
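As a quick self-contained sanity check (the transform body is copied from my snippet above, with a plain dict standing in for a real RLDS trajectory), the transform should alias 'state' into 'proprio' and leave actions untouched:

```python
from typing import Any, Dict
import numpy as np

def tienkung_machine_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    # Map 'state' to 'proprio' for consistency with other datasets
    if "state" in trajectory["observation"]:
        trajectory["observation"]["proprio"] = trajectory["observation"]["state"]
    return trajectory

traj = {
    "observation": {
        "state": np.ones((10, 8), dtype=np.float32),
        "image": np.zeros((10, 64, 64, 3), dtype=np.uint8),
    },
    "action": np.zeros((10, 8), dtype=np.float32),
}
out = tienkung_machine_dataset_transform(traj)

assert np.array_equal(out["observation"]["proprio"], out["observation"]["state"])
assert out["action"].shape == (10, 8)  # actions unchanged
```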
Could anyone confirm whether handling JOINT_POS with absolute_action_mask = [True] * 8 is fully sufficient for the OpenVLA training pipeline, or whether there are other modifications I need to make?
Thanks in advance!