Thank you for your contribution!
I used your dataset for my Master's Thesis, and when describing how my experiments were designed, I was planning on showing an example of one of the func_before rows. However, I found inconsistencies in the data.
I split MSR_data_cleaned.csv into subsets, one of which consists of the rows where lang="CPP" and CWE ID="CWE-119". To find a suitable example, I used the following RBQL query:
SELECT a1, a['CVE ID'], a["CVE Page"], a.commit_id, a.vul, a.func_before, a.func_after ORDER BY len(a.func_before) WHERE a.vul == "1"
This query selects the CVE ID, CVE Page, commit id, etc. for rows that are labeled as vulnerable and orders them by the length of the function in func_before.
While inspecting the func_before at index 177908, I found it strange that some lines were repeated, as shown below:
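For readers without RBQL, the same selection can be approximated in plain Python. This is a minimal sketch, not the query I actually ran: the function name `select_vulnerable` and the in-memory sample are made up for illustration, and it folds the lang/CWE split and the vulnerability filter into one step. Only the column names (lang, CWE ID, vul, func_before) come from the dataset.

```python
import csv
import io


def select_vulnerable(rows, lang="CPP", cwe="CWE-119"):
    """Keep rows for one language/CWE that are labeled vulnerable,
    ordered by the length of func_before (shortest first)."""
    kept = [
        row for row in rows
        if row["lang"] == lang and row["CWE ID"] == cwe and row["vul"] == "1"
    ]
    return sorted(kept, key=lambda row: len(row["func_before"]))


# Tiny in-memory stand-in for MSR_data_cleaned.csv (hypothetical rows).
sample_csv = io.StringIO(
    "lang,CWE ID,vul,func_before\n"
    'CPP,CWE-119,1,"int f() { return 0; }"\n'
    'CPP,CWE-119,0,"int g() {}"\n'
    'CPP,CWE-119,1,"int h() {}"\n'
    'C,CWE-119,1,"int k() {}"\n'
)

selected = select_vulnerable(csv.DictReader(sample_csv))
for row in selected:
    print(len(row["func_before"]), row["func_before"])
```

For the real file, `csv.DictReader` would be given the opened CSV instead of the sample string.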
void Splash::vertFlipImage(SplashBitmap *img, int width, int height,
int nComps) {
Guchar *lineBuf;
Guchar *p0, *p1;
int w;
w = width * nComps;
Guchar *lineBuf;
Guchar *p0, *p1;
int w;
w = width * nComps;
lineBuf = (Guchar *)gmalloc(w);
p0 += width, p1 -= width) {
memcpy(lineBuf, p0, width);
memcpy(p0, p1, width);
memcpy(p1, lineBuf, width);
}
}
I found the same while inspecting the func_after at index 177908:
void Splash::vertFlipImage(SplashBitmap *img, int width, int height,
int nComps) {
Guchar *lineBuf;
Guchar *p0, *p1;
int w;
w = width * nComps;
Guchar *lineBuf;
Guchar *p0, *p1;
int w;
if (unlikely(img->data == NULL)) {
error(errInternal, -1, ""img->data is NULL in Splash::vertFlipImage"");
return;
}
w = width * nComps;
lineBuf = (Guchar *)gmalloc(w);
p0 += width, p1 -= width) {
memcpy(lineBuf, p0, width);
memcpy(p0, p1, width);
memcpy(p1, lineBuf, width);
}
}
As far as I can tell, the func_after at index 177908 was extracted from commit bbc2d8918fe234b7ef2c480eb148943922cc0959 in poppler's previous Git repository. That link is now broken. The poppler project has since been made accessible at https://gitlab.freedesktop.org/poppler/poppler.
When I looked up commit bbc2d8918fe234b7ef2c480eb148943922cc0959 in their current Git repository, I found this:
void Splash::vertFlipImage(SplashBitmap *img, int width, int height,
int nComps) {
Guchar *lineBuf;
Guchar *p0, *p1;
int w;
if (unlikely(img->data == NULL)) {
error(errInternal, -1, "img->data is NULL in Splash::vertFlipImage");
return;
}
w = width * nComps;
lineBuf = (Guchar *)gmalloc(w);
for (p0 = img->data, p1 = img->data + (height - 1) * w;
p0 < p1;
p0 += w, p1 -= w) {
memcpy(lineBuf, p0, w);
memcpy(p0, p1, w);
memcpy(p1, lineBuf, w);
}
if (img->alpha) {
for (p0 = img->alpha, p1 = img->alpha + (height - 1) * width;
p0 < p1;
p0 += width, p1 -= width) {
memcpy(lineBuf, p0, width);
memcpy(p0, p1, width);
memcpy(p1, lineBuf, width);
}
}
gfree(lineBuf);
}
When I compare the func_after with the commit in the poppler repository, I find a diff, as shown in the image below:
When I go to a commit from before this fix (https://gitlab.freedesktop.org/poppler/poppler/-/blob/d46b673c46a72132fb3918b64733be552e35952f/splash/Splash.cc#L4421-4446), I again find a mismatch with the value in func_before, as shown in the image below:
Since I found what seem to be inconsistencies in both func_before and func_after in the first example I looked at, I have reason to believe that there are additional inconsistencies in these two columns.
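Both comparisons can also be reproduced as text with a line diff. The following is a minimal sketch using Python's standard `difflib`, assuming the dataset value and the upstream function body are both available as strings. The helper names and the short sample strings are hypothetical, and indentation is stripped before comparing because the dataset rows do not preserve it.

```python
import difflib


def normalized_lines(source):
    """Split a function body into stripped, non-empty lines."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def diff_functions(dataset_version, upstream_version):
    """Return a unified diff (list of lines) between the two versions."""
    return list(difflib.unified_diff(
        normalized_lines(dataset_version),
        normalized_lines(upstream_version),
        fromfile="dataset",
        tofile="upstream",
        lineterm="",
    ))


# Shortened, hypothetical excerpts that mimic the mismatch described above.
dataset_excerpt = "int w;\nint w;\nw = width * nComps;\n"
upstream_excerpt = "int w;\n  w = width * nComps;\n  gfree(lineBuf);\n"

for line in diff_functions(dataset_excerpt, upstream_excerpt):
    print(line)
```

An empty result means the two versions agree line for line once whitespace is ignored.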
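One way to look for further affected rows is to flag functions in which a block of lines is immediately repeated, as in the example above. The sketch below is only a heuristic under my own assumptions: the function name `find_repeated_block` and the two-line minimum are arbitrary choices, and blocks made only of braces or punctuation are ignored to avoid obvious false positives. It will not catch the missing lines, only the duplicated ones.

```python
def find_repeated_block(source, min_lines=2):
    """Return the largest block of consecutive lines that is immediately
    repeated in `source`, or None if no such block exists."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    # Try larger blocks first so the longest repetition is reported.
    for size in range(len(lines) // 2, min_lines - 1, -1):
        for start in range(len(lines) - 2 * size + 1):
            block = lines[start:start + size]
            if block != lines[start + size:start + 2 * size]:
                continue
            # Skip blocks such as "}" "}" that carry no identifiers.
            if any(ch.isalnum() for ch in "".join(block)):
                return block
    return None


# The func_before value at index 177908, as quoted in this issue.
sample = """void Splash::vertFlipImage(SplashBitmap *img, int width, int height,
int nComps) {
Guchar *lineBuf;
Guchar *p0, *p1;
int w;
w = width * nComps;
Guchar *lineBuf;
Guchar *p0, *p1;
int w;
w = width * nComps;
lineBuf = (Guchar *)gmalloc(w);
p0 += width, p1 -= width) {
memcpy(lineBuf, p0, width);
memcpy(p0, p1, width);
memcpy(p1, lineBuf, width);
}
}"""

repeated = find_repeated_block(sample)
print(repeated)
```

Running this over the func_before and func_after columns would give a rough list of candidate rows to check by hand against the upstream repositories.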