In version v1.1 of TorchMetrics, five new metrics have been added in total, bringing the total number of metrics up to 128! In particular, we have two exciting new metrics for evaluating your favorite generative models for images.
Perceptual Path Length
Introduced in the famous StyleGAN paper back in 2018, the Perceptual Path Length metric quantifies how smoothly a generator manages to interpolate between points in its latent space.
Why does the smoothness of your generative model's latent space matter? Assume you find a point in your latent space that generates an image you like, and you would like to see whether you can find an even better one by slightly changing the latent point it was generated from. If your latent space is not smooth, this becomes very hard, because even small changes to the latent point can lead to large changes in the generated image.
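Conceptually, PPL samples pairs of nearby points along interpolation paths in latent space and measures the perceptual distance between the images they generate, scaled by the squared step size. Here is a minimal NumPy sketch of that idea; the `smooth_gen` toy generator and the squared-error distance are illustrative stand-ins for a real generator and an LPIPS-style perceptual distance:

```python
import numpy as np

def slerp(z1, z2, t):
    """Spherical interpolation between two latent vectors."""
    cos_omega = z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * z1 + np.sin(t * omega) * z2) / np.sin(omega)

def perceptual_path_length(generator, distance, dim=8, eps=1e-4, n=500, seed=0):
    """Average perceptual distance between images generated from latent
    points an epsilon apart along interpolation paths, scaled by 1/eps**2."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.normal(size=(2, dim))
        t = rng.uniform(0, 1 - eps)
        img_a = generator(slerp(z1, z2, t))
        img_b = generator(slerp(z1, z2, t + eps))
        total += distance(img_a, img_b) / eps ** 2
    return total / n

# Toy stand-ins: a smooth "generator" and a squared-error "perceptual" distance.
smooth_gen = lambda z: np.tanh(z)
l2 = lambda a, b: float(np.sum((a - b) ** 2))
print(perceptual_path_length(smooth_gen, l2))
```

A smoother generator yields smaller image changes per latent step, and therefore a lower PPL.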
CLIP image quality assessment
CLIP image quality assessment (CLIPIQA) is a recently proposed metric from this paper. The metric builds on the OpenAI CLIP model, a multi-modal model for connecting text and images. The core idea behind the metric is that different properties of an image can be assessed by measuring how similar the CLIP embedding of the image is to the respective CLIP embeddings of a positive and a negative prompt for that given property.
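The scoring step can be sketched in plain NumPy. The unit-norm vectors below are made-up stand-ins for the real CLIP embeddings of an image and a prompt pair such as "Good photo." / "Bad photo." (a softmax over the two similarities turns them into a score; the real metric also applies CLIP's temperature scaling, omitted here for clarity):

```python
import numpy as np

def clipiqa_score(image_emb, pos_emb, neg_emb):
    """Softmax over cosine similarities to a positive/negative prompt pair.

    All embeddings are assumed L2-normalized, as CLIP embeddings usually are.
    Returns a value in (0, 1); higher means the image matches the positive
    prompt better than the negative one.
    """
    sim_pos = float(image_emb @ pos_emb)
    sim_neg = float(image_emb @ neg_emb)
    e_pos, e_neg = np.exp(sim_pos), np.exp(sim_neg)
    return e_pos / (e_pos + e_neg)

# Toy, made-up embeddings just to show the mechanics.
rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
pos = unit(rng.normal(size=8))
neg = unit(rng.normal(size=8))
img = unit(pos + 0.1 * rng.normal(size=8))  # "image" close to the positive prompt
print(clipiqa_score(img, pos, neg))  # > 0.5, i.e. judged closer to the positive prompt
```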
VIF, Edit, and SA-SDR
VisualInformationFidelity has been added to the image package. First proposed in this paper, it can be used to automatically assess the quality of images in a perceptual manner.
EditDistance has been added to the text package. A very classical metric for text, it simply measures the number of characters that need to be substituted, inserted, or deleted to transform the predicted text into the reference text.
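This is the classic Levenshtein distance, which can be computed with a simple dynamic program; a minimal single-row sketch of that computation:

```python
def edit_distance(pred: str, ref: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    substitutions, insertions, or deletions turning pred into ref."""
    m, n = len(pred), len(ref)
    # dp[j] holds the distance between the current prefix of pred and ref[:j].
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev is the diagonal cell dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            dp[j] = min(dp[j] + 1,      # delete a character from pred
                        dp[j - 1] + 1,  # insert a character into pred
                        prev + cost)    # substitute (or match for free)
            prev = cur
    return dp[n]

print(edit_distance("kitten", "sitting"))  # 3
```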
SourceAggregatedSignalDistortionRatio has been added to the audio package. The metric was originally proposed in this paper and is an improvement over the classical Signal-to-Distortion Ratio (SDR) metric (also found in TorchMetrics) that provides more stable gradients when training models for source separation.
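The key difference from per-source SDR is aggregation: signal and distortion energies are summed over all sources before taking the ratio, instead of averaging per-source ratios. A minimal NumPy sketch of that computation, using synthetic two-source signals as stand-ins:

```python
import numpy as np

def sa_sdr(preds, targets):
    """Source-aggregated SDR in dB: energies are summed across all sources
    before the ratio, rather than averaging the per-source ratios."""
    signal = np.sum(targets ** 2)
    distortion = np.sum((targets - preds) ** 2)
    return 10 * np.log10(signal / distortion)

rng = np.random.default_rng(0)
targets = rng.normal(size=(2, 16000))                    # two sources, 1 s at 16 kHz
preds = targets + 0.1 * rng.normal(size=targets.shape)   # mildly noisy estimates
print(round(sa_sdr(preds, targets), 1))  # about 20 dB for 10% noise
```

Because the ratio is formed once over the aggregate energies, a silent or near-silent source cannot blow up the score the way it can with a per-source average, which is part of why the gradients are better behaved.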
[1.1.0] - 2023-08-22
Added
Added source aggregated signal-to-distortion ratio (SA-SDR) metric (#1882)
Added VisualInformationFidelity to image package (#1830)
Added EditDistance to text package (#1906)
Added top_k argument to RetrievalMRR in retrieval package (#1961)
Added support for evaluating "segm" and "bbox" detection in MeanAveragePrecision at the same time (#1928)
Added PerceptualPathLength to image package (#1939)
Added support for multioutput evaluation in MeanSquaredError (#1937)
Added extended_summary argument to MeanAveragePrecision such that precision, recall, iou can be easily returned (#1983)
Added warning to ClipScore if long captions are detected and truncate (#2001)
Added CLIPImageQualityAssessment to multimodal package (#1931)
Added metric_state property to all metrics for users to investigate currently stored tensors in memory (#2006)
Full Changelog: v1.0.0...v1.1.0
New Contributors since v1.0.0
Contributors
@bojobo, @lucadiliello, @quancs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
This discussion was created from the release Into Generative AI.