Hi Sebastian,
I am currently working with massive MassPeaks lists of MALDI-FTICR data. By massive I mean:
> length(e$msDataPeaks)
[1] 41371 # number of spectra
> mean(lengths(e$msDataPeaks))
[1] 2027.565 # average number of peaks per spectrum
Everything works correctly, but I noticed (i) very high memory usage (up to 120 GB!) when calling mergeMassPeaks and (ii) very high memory usage plus occasional error messages when calling filterMassPeaks. Both were called after binPeaks. The error message is as follows:
Error in which(is.na(m)) :
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:138
I know that internally both functions construct intensity matrices, which blows up memory usage. Have you ever faced such issues? What would you recommend in this situation?
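To put rough numbers on this, here is a back-of-envelope sketch based only on the figures quoted above. It assumes that the "long vectors" error means the intermediate matrix has more than 2^31 - 1 cells, and it assumes roughly 12 bytes per stored entry for a compressed sparse layout (one double plus one integer index). I have not verified either assumption against the MALDIquant source, and the variable names are only illustrative.

```r
# Figures quoted above
n_spectra  <- 41371
mean_peaks <- 2027.565

# Smallest number of mass columns for which the dense matrix exceeds
# the 2^31 - 1 element limit (assumption: this is what triggers the error)
max_cells <- .Machine$integer.max
min_cols  <- floor(max_cells / n_spectra) + 1

# Lower bound on the size of ONE dense double matrix, in GiB
dense_gib <- n_spectra * min_cols * 8 / 2^30

# Cells that actually hold a peak, and a rough sparse-storage estimate
nnz        <- n_spectra * mean_peaks
sparse_gib <- nnz * 12 / 2^30   # assumed ~12 bytes per stored entry

# Upper bound on the fraction of cells that are non-empty
fill <- nnz / (n_spectra * min_cols)

print(c(min_cols = min_cols, dense_gib = dense_gib,
        sparse_gib = sparse_gib, fill = fill))
```

Under these assumptions each dense copy of the matrix needs at least about 16 GiB, and temporary copies such as the logical matrix from is.na(m) add to that, while the peaks themselves would fit in roughly 1 GiB of sparse storage.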
Suggestion
For the internal construction of the intensity matrices, do you think it would be better to build sparse matrices? For example, instead of the current implementation of .as.matrix.MassObjectList, one could use something like this:
# focusRegion: a list of MassPeaks objects (already binned)
.mass = unlist(lapply(focusRegion, MALDIquant::mass))
.intensity = unlist(lapply(focusRegion, MALDIquant::intensity))
.uniqueMass = sort.int(unique(.mass))
n = lengths(focusRegion)
# row index: the spectrum each peak belongs to
r = rep.int(seq_along(focusRegion), n)
# column index: position of each peak's mass among the sorted unique masses
i = findInterval(.mass, .uniqueMass)
sparmat = Matrix::sparseMatrix(i = r, j = i, x = .intensity,
                               dimnames = list(NULL, as.character(.uniqueMass)),
                               dims = c(length(focusRegion), length(.uniqueMass)))
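A self-contained toy version of the same idea, which runs without MALDIquant, might look as follows. The three "spectra" and their masses and intensities are made up for illustration, and the per-mass peak count at the end is only my guess at the kind of quantity a frequency-based filter would need, not a description of what filterMassPeaks does internally.

```r
library(Matrix)

# Toy stand-in for a binned peak list: three "spectra" (hypothetical values)
masses      <- list(c(100, 200, 300), c(100, 300), c(200, 400))
intensities <- list(c(5, 1, 2),       c(4, 3),     c(7, 9))

.mass       <- unlist(masses)
.intensity  <- unlist(intensities)
.uniqueMass <- sort.int(unique(.mass))

r <- rep.int(seq_along(masses), lengths(masses))  # row = spectrum index
j <- match(.mass, .uniqueMass)                    # column = position of the mass

sparmat <- sparseMatrix(i = r, j = j, x = .intensity,
                        dims = c(length(masses), length(.uniqueMass)),
                        dimnames = list(NULL, as.character(.uniqueMass)))

# Number of spectra containing each mass, read from the column pointers
# of the compressed representation, so no dense matrix is ever built.
# Assumes no stored intensity is exactly zero.
peakCount <- diff(sparmat@p)
names(peakCount) <- colnames(sparmat)
print(peakCount)
```

One caveat with this representation: missing peaks become 0 rather than NA, so any code that relies on is.na() to detect absent peaks would need to be adapted.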
What do you think?