🚀 Feature
I would like to propose incorporating an essential evaluation metric for 3D talking heads into the TorchMetrics library: Lip Vertex Error (LVE). This metric is widely used in speech-driven facial animation research to assess lip-synchronization accuracy.
Motivation
TorchMetrics currently lacks dedicated metrics for evaluating 3D talking heads, particularly for assessing the quality of lip synchronization. I think this metric would also fit well in the multimodal folder of the library.
Pitch
The Lip Vertex Error (LVE) metric evaluates the quality of lip synchronization in 3D facial animation by computing, for each frame, the maximal Euclidean (L2) distance between corresponding lip vertices of the generated and ground-truth meshes, and averaging that per-frame maximum over all frames. This measures how accurately the animated lip movements align with the intended speech input.
References
Paper: MeshTalk (arXiv:2104.08223)
Additional context
I would like to open a PR implementing this metric.