
Commit 751e7a7

update: Rewrite update script
The new update script uses futures to dynamically schedule many smaller tasks across a constant number of workers, instead of statically assigning a single long-running task to each thread. This results in better CPU saturation. Database handles are no longer shared between threads; instead, the main thread commits the results produced by the other workers into the database. This trades locking on database access for serialization costs: since multiprocessing is used, values returned from futures are pickled (although in practice that depends on the ProcessPool configuration).
1 parent b58cc27 commit 751e7a7
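A minimal sketch of the scheduling pattern the message describes, using the standard `concurrent.futures` module. The update script itself is not part of this diff, so the task function `index_blob`, its fake results, and the plain dict standing in for the database are all hypothetical. It shows a fixed-size pool draining a queue of many small tasks, with only the main process writing results.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def index_blob(blob_id):
    # Hypothetical unit of work: runs in a worker process and returns plain
    # data (pickled back to the parent), never a database handle.
    return blob_id, [f"symbol_{blob_id}_{i}" for i in range(3)]


def run_update(blob_ids, db, max_workers=4):
    # A constant number of workers; many small tasks are queued and handed
    # out as workers become free, instead of one long task per worker.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(index_blob, b) for b in blob_ids]
        # Only the main process touches the database, so no locking is
        # needed; results are committed in completion order.
        for fut in as_completed(futures):
            blob_id, symbols = fut.result()
            db[blob_id] = symbols
    return db


if __name__ == "__main__":
    db = run_update(range(10), {})
    print(len(db))  # 10
```

The tuple returned by `index_blob` crosses the process boundary via pickling, which is the serialization cost the message trades against database locking. A `ThreadPoolExecutor` would avoid that pickling, but CPU-bound pure-Python tasks would then contend for the GIL.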


2 files changed, +444 -0 lines changed


Diff for: elixir/data.py (+13)
```diff
@@ -72,6 +72,14 @@ def iter(self, dummy=False):
         if dummy:
             yield maxId, None, None, None
 
+    def exists(self, idx, line_num):
+        entries = deflist_regex.findall(self.data)
+        for id, _, line, _ in entries:
+            if id == idx and int(line) == line_num:
+                return True
+
+        return False
+
     def append(self, id, type, line, family):
         if type not in defTypeD:
             return
```
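The new `exists()` method re-parses the stored definition list and reports whether an entry with the given ID and line number is already present. Below is a standalone sketch of the same check. The real `deflist_regex` and entry encoding are not shown in this diff, so the `<id>:<type>:<line>:<family>;` format, `ENTRY_RE`, and the `DefListSketch` class are hypothetical stand-ins.

```python
import re

# Hypothetical entry format "<id>:<type>:<line>:<family>;", standing in for
# the real deflist_regex, which is not shown in this diff.
ENTRY_RE = re.compile(r"(\d+):(\w+):(\d+):(\w+);")


class DefListSketch:
    def __init__(self, data=""):
        self.data = data

    def append(self, id, type, line, family):
        self.data += f"{id}:{type}:{line}:{family};"

    def exists(self, idx, line_num):
        # findall returns one 4-tuple of strings per entry; compare the ID
        # and the line number (converted to int), ignoring type and family.
        return any(
            id == idx and int(line) == line_num
            for id, _, line, _ in ENTRY_RE.findall(self.data)
        )


d = DefListSketch()
d.append("17", "function", 42, "C")
print(d.exists("17", 42))  # True
print(d.exists("17", 43))  # False
```

As in the committed version, each call scans the whole list with the regex, so its cost grows linearly with the number of stored entries.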
```diff
@@ -165,6 +173,8 @@ def exists(self, key):
     def get(self, key):
         key = autoBytes(key)
         p = self.db.get(key)
+        if p is None:
+            return None
         p = self.ctype(p)
         return p
 
```
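Before this change, a missing key made `get()` pass `None` straight into `self.ctype`. A minimal sketch of the guarded lookup follows; the dict backend, the `auto_bytes` helper (playing the role of `autoBytes`), and the `Record` value type (playing the role of `self.ctype`) are hypothetical stand-ins for code not shown here.

```python
def auto_bytes(key):
    # Hypothetical stand-in for autoBytes(): accept str or bytes keys.
    return key.encode() if isinstance(key, str) else key


class Record:
    # Hypothetical value type, playing the role of self.ctype.
    def __init__(self, raw):
        self.data = raw.decode()


class DBSketch:
    def __init__(self, ctype):
        self.db = {}  # stands in for the real database handle
        self.ctype = ctype

    def put(self, key, val):
        self.db[auto_bytes(key)] = val.data.encode()

    def get(self, key):
        p = self.db.get(auto_bytes(key))
        if p is None:
            # Missing key: return None instead of calling ctype(None),
            # which would raise here (None has no .decode()).
            return None
        return self.ctype(p)


db = DBSketch(Record)
print(db.get("missing"))  # None
```

Callers can then test the result with `is None` rather than handling an exception from the value constructor.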
```diff
@@ -180,6 +190,9 @@ def put(self, key, val, sync=False):
         if sync:
             self.db.sync()
 
+    def sync(self):
+        self.db.sync()
+
     def close(self):
         self.db.close()
 
```
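The new public `sync()` lets a caller write many values with `put(..., sync=False)` and flush once afterwards, which fits a design where a single main thread commits all results. A sketch under that assumption: `FakeBackend` is a hypothetical in-memory backend that only counts flushes, and the batching loop illustrates the idea rather than reproducing the update script.

```python
class FakeBackend:
    # Hypothetical backend that records how often it was flushed.
    def __init__(self):
        self.store = {}
        self.sync_calls = 0

    def __setitem__(self, key, val):
        self.store[key] = val

    def sync(self):
        self.sync_calls += 1


class DBSketch:
    def __init__(self, backend):
        self.db = backend

    def put(self, key, val, sync=False):
        self.db[key] = val
        if sync:
            self.db.sync()

    def sync(self):
        # Flush pending writes without having to write a value.
        self.db.sync()


db = DBSketch(FakeBackend())
for i in range(1000):
    db.put(f"key{i}", i)  # no per-write flush
db.sync()  # one flush for the whole batch
print(db.db.sync_calls)  # 1
```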