Inconsistent data in func_before and func_after row (index 177908) in MSR_data_cleaned.csv compared to source #12

@MartinJrgsn

Thank you for your contribution!

I used your dataset for my Master's Thesis, and when describing how my experiments were designed, I was planning on showing an example of one of the func_before rows. However, I found inconsistencies in the data.

I split MSR_data_cleaned.csv into data subsets, one of which consists of the rows where lang="CPP" and CWE ID="CWE-119". To find a suitable example, I used the following RBQL query:

SELECT a1, a['CVE ID'], a["CVE Page"], a.commit_id, a.vul, a.func_before, a.func_after ORDER BY len(a.func_before) WHERE a.vul == "1"

This query selects the first column, CVE ID, CVE Page, commit ID, vul, func_before, and func_after for the rows that are labeled as vulnerable, and orders them by the length of the function in func_before.
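For readers without RBQL, roughly the same selection can be sketched in plain Python with the standard library only. The column names are taken from the query above; the helper name `select_vulnerable` and the three inline sample rows are made up for illustration and are not part of the dataset or its tooling.

```python
import csv
import io


def select_vulnerable(csv_file):
    """Return the rows labeled vulnerable, shortest func_before first."""
    reader = csv.DictReader(csv_file)
    rows = [row for row in reader if row["vul"] == "1"]
    rows.sort(key=lambda row: len(row["func_before"]))
    return rows


# Hypothetical sample data standing in for MSR_data_cleaned.csv.
sample = io.StringIO(
    'CVE ID,commit_id,vul,func_before,func_after\n'
    'CVE-0000-0001,aaa,1,"int f() { return 0; }","int f() { return 1; }"\n'
    'CVE-0000-0002,bbb,0,"void g() {}","void g() {}"\n'
    'CVE-0000-0003,ccc,1,"void h() {}","void h() { }"\n'
)

for row in select_vulnerable(sample):
    print(row["CVE ID"], row["commit_id"], len(row["func_before"]))
```

For the real file, the function columns can be large, so it may be necessary to raise the parser limit with `csv.field_size_limit` before reading.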

While inspecting the func_before at index 177908, I found it strange that some lines were repeated and that part of the function body was missing, as shown below:

void Splash::vertFlipImage(SplashBitmap *img, int width, int height,
			   int nComps) {
  Guchar *lineBuf;
  Guchar *p0, *p1;
  int w;

  w = width * nComps;
   Guchar *lineBuf;
   Guchar *p0, *p1;
   int w;
 
   w = width * nComps;
   lineBuf = (Guchar *)gmalloc(w);
	 p0 += width, p1 -= width) {
      memcpy(lineBuf, p0, width);
      memcpy(p0, p1, width);
      memcpy(p1, lineBuf, width);
    }
  }

I found the same problem in the func_after at index 177908:

void Splash::vertFlipImage(SplashBitmap *img, int width, int height,
			   int nComps) {
  Guchar *lineBuf;
  Guchar *p0, *p1;
  int w;

  w = width * nComps;
   Guchar *lineBuf;
   Guchar *p0, *p1;
   int w;
  
  if (unlikely(img->data == NULL)) {
    error(errInternal, -1, ""img->data is NULL in Splash::vertFlipImage"");
    return;
  }
 
   w = width * nComps;
   lineBuf = (Guchar *)gmalloc(w);
	 p0 += width, p1 -= width) {
      memcpy(lineBuf, p0, width);
      memcpy(p0, p1, width);
      memcpy(p1, lineBuf, width);
    }
  }

As far as I can tell, the func_after at index 177908 was extracted from commit bbc2d8918fe234b7ef2c480eb148943922cc0959 in poppler's previous Git repository. The link to that repository is now broken. The poppler project has since been made accessible at https://gitlab.freedesktop.org/poppler/poppler.

When I looked up commit bbc2d8918fe234b7ef2c480eb148943922cc0959 in their current Git repository, I found this:

void Splash::vertFlipImage(SplashBitmap *img, int width, int height,
			   int nComps) {
  Guchar *lineBuf;
  Guchar *p0, *p1;
  int w;
  
  if (unlikely(img->data == NULL)) {
    error(errInternal, -1, "img->data is NULL in Splash::vertFlipImage");
    return;
  }

  w = width * nComps;
  lineBuf = (Guchar *)gmalloc(w);
  for (p0 = img->data, p1 = img->data + (height - 1) * w;
       p0 < p1;
       p0 += w, p1 -= w) {
    memcpy(lineBuf, p0, w);
    memcpy(p0, p1, w);
    memcpy(p1, lineBuf, w);
  }
  if (img->alpha) {
    for (p0 = img->alpha, p1 = img->alpha + (height - 1) * width;
	 p0 < p1;
	 p0 += width, p1 -= width) {
      memcpy(lineBuf, p0, width);
      memcpy(p0, p1, width);
      memcpy(p1, lineBuf, width);
    }
  }
  gfree(lineBuf);
}

When I compare the func_after with the commit in the poppler repository, I find a diff, as shown in the image below:

Image
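A comparison like the one above can also be produced as text with Python's `difflib`. This is only a minimal sketch: the helper names `normalize` and `diff_functions` are hypothetical, stripping whitespace is an assumption made so that indentation-only differences are ignored, and the two short snippets are abbreviated from the listings in this issue and are not the full functions.

```python
import difflib


def normalize(source):
    """Strip surrounding whitespace from each line and drop empty lines."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def diff_functions(dataset_text, upstream_text):
    """Return unified-diff lines between the dataset and upstream versions."""
    return list(difflib.unified_diff(
        normalize(dataset_text),
        normalize(upstream_text),
        fromfile="func_after (dataset)",
        tofile="upstream commit",
        lineterm="",
    ))


# Abbreviated excerpts from the listings in this issue.
dataset_snippet = """
  int w;

  w = width * nComps;
   int w;
   w = width * nComps;
   lineBuf = (Guchar *)gmalloc(w);
"""
upstream_snippet = """
  int w;

  w = width * nComps;
  lineBuf = (Guchar *)gmalloc(w);
"""

for line in diff_functions(dataset_snippet, upstream_snippet):
    print(line)
```

An empty result means the two versions differ only in whitespace under this normalization; any `-` lines are present in the dataset but not upstream.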

When I go to a commit from before this fix (https://gitlab.freedesktop.org/poppler/poppler/-/blob/d46b673c46a72132fb3918b64733be552e35952f/splash/Splash.cc#L4421-4446), I again find that the upstream code does not match the value in func_before, as shown in the image below:

Image

Since I found what seem to be inconsistencies in both the func_before and the func_after of the first example I looked at, I have reason to believe that there are additional inconsistencies in the func_before and func_after columns.
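One rough way to look for further affected rows is a delimiter-balance check. The sketch below is a heuristic only, not a validated detector: `delimiter_imbalance` is a hypothetical helper, and it counts characters naively, so delimiters inside string literals or comments can cause false positives. The sample is shortened from the func_before listing above, with a simplified signature.

```python
def delimiter_imbalance(source):
    """Return the names of delimiter pairs whose counts do not match.

    Naive by design: characters are counted without parsing, so delimiters
    inside string literals or comments are counted too.
    """
    pairs = {
        "braces": ("{", "}"),
        "parens": ("(", ")"),
        "brackets": ("[", "]"),
    }
    return [
        name
        for name, (opener, closer) in pairs.items()
        if source.count(opener) != source.count(closer)
    ]


# Shortened from the func_before listing in this issue; the loop header is
# missing, which leaves a closing parenthesis without its opener.
truncated = """void f(int width) {
   lineBuf = (Guchar *)gmalloc(w);
     p0 += width, p1 -= width) {
      memcpy(lineBuf, p0, width);
    }
  }"""

print(delimiter_imbalance(truncated))  # prints ['parens']
```

Note that the braces in this row happen to balance, so a brace-only check would miss it. The repeated declaration lines would need a separate check.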
