Operations on a `ProxyDatabase` that loop over all records slow down over the course of the loop rather than keeping a constant iteration time (as measured by tqdm's it/s), which is especially noticeable for large databases. Take slicing, for example:
Lines 1730 to 1746 in a99c13b:

```python
def slice(self, timespan):
    '''Slice the records in the proxy database.

    Args:
        timespan (tuple or list):
            The list of time points for slicing, whose length must be even.
            When there are n time points, the output Series includes n/2 segments.
            For example, if timespan = [a, b], then the sliced output includes one segment [a, b];
            if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].
    '''
    new = ProxyDatabase()
    for pid, pobj in tqdm(self.records.items(), total=self.nrec, desc='Slicing ProxyRecord'):
        spobj = pobj.slice(timespan=timespan)
        new += spobj
    new.refresh()
    return new
```
The operation `new += spobj` calls `.refresh()` on every iteration, and `.refresh()` itself iterates over all records already added, so the loop is O(n^2) overall. It could easily be O(n) if the sliced records were collected in a dictionary first and only converted to a `ProxyDatabase` at the end. Is there a reason this is not being done?
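For concreteness, here is a minimal sketch of the O(n) variant I have in mind, as a drop-in replacement for the method above. It assumes `ProxyDatabase.records` is a plain dict keyed by `pid` and that a single `refresh()` at the end reproduces the state the per-iteration refreshes build up; I have not verified either assumption against the rest of the codebase:

```python
from tqdm import tqdm

def slice(self, timespan):
    '''Slice the records in the proxy database, collecting results
    in a plain dict so that refresh() runs only once at the end.'''
    sliced_records = {}
    for pid, pobj in tqdm(self.records.items(), total=self.nrec, desc='Slicing ProxyRecord'):
        # pobj.slice() does not depend on the new database's state,
        # so nothing needs to be refreshed inside the loop.
        sliced_records[pid] = pobj.slice(timespan=timespan)

    new = ProxyDatabase()
    new.records = sliced_records  # assumes .records is a plain dict keyed by pid
    new.refresh()                 # one O(n) refresh instead of n of them
    return new
```

With this change the tqdm it/s rate should stay roughly constant regardless of database size.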