Inefficient ProxyDatabase slicing #20

@DominikStiller

Description

Operations on a ProxyDatabase that loop over all records slow down as they progress, i.e. the iteration time (as measured by tqdm's it/s) is not constant. This is especially noticeable for large databases. Take slicing, for example:

cfr/cfr/proxy.py, lines 1730 to 1746 in a99c13b:

```python
def slice(self, timespan):
    ''' Slice the records in the proxy database.

    Args:
        timespan (tuple or list):
            The list of time points for slicing, whose length must be even.
            When there are n time points, the output Series includes n/2 segments.
            For example, if timespan = [a, b], then the sliced output includes one segment [a, b];
            if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].
    '''
    new = ProxyDatabase()
    for pid, pobj in tqdm(self.records.items(), total=self.nrec, desc='Slicing ProxyRecord'):
        spobj = pobj.slice(timespan=timespan)
        new += spobj
    new.refresh()
    return new
```

The operation new += spobj calls .refresh() on every iteration, and .refresh() itself iterates over all proxies already added. The loop is therefore O(n^2) in the number of records, while it could easily be O(n) if the sliced records were collected in a dictionary first and only converted to a ProxyDatabase at the end. Is there a reason this is not being done?
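A minimal sketch of the dict-first approach described above. The ProxyRecord and ProxyDatabase classes below are simplified stand-ins written for illustration (single-segment slicing, a refresh() that only recounts records), not cfr's actual implementation:

```python
class ProxyRecord:
    # Stand-in for cfr's ProxyRecord: just an ID and a list of time points.
    def __init__(self, pid, time):
        self.pid = pid
        self.time = time

    def slice(self, timespan):
        # Simplified: a single [a, b] segment rather than n/2 segments.
        a, b = timespan
        return ProxyRecord(self.pid, [t for t in self.time if a <= t <= b])


class ProxyDatabase:
    # Stand-in for cfr's ProxyDatabase, backed by a dict of pid -> record.
    def __init__(self, records=None):
        self.records = records if records is not None else {}
        self.nrec = len(self.records)

    def refresh(self):
        # Stand-in for the real refresh(), which iterates over all records.
        self.nrec = len(self.records)

    def slice(self, timespan):
        # Collect sliced records in a plain dict first: O(1) per record,
        # so the whole loop is O(n) instead of O(n^2).
        sliced = {}
        for pid, pobj in self.records.items():
            sliced[pid] = pobj.slice(timespan=timespan)
        new = ProxyDatabase(records=sliced)
        new.refresh()  # refresh once, at the end
        return new
```

The key change is that .refresh() runs once after the loop instead of once per record, which removes the quadratic term while producing the same final database.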
