
gemini comp_hets memory issue #921

Open
8nb24 opened this issue Feb 28, 2019 · 3 comments

Comments

8nb24 commented Feb 28, 2019

Output of gemini --version: gemini 0.20.1

...

When running gemini comp_hets on a large database (~1800 WGS individuals) I get the following error:

```
Traceback (most recent call last):
  File "/usr/local/apps/gemini/0.20.1/bin/gemini", line 7, in <module>
    gemini_main.main()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_main.py", line 1248, in main
    args.func(parser, args)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_main.py", line 710, in comp_hets_fn
    CompoundHet(args).run()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gim.py", line 307, in run
    for i, s in enumerate(self.report_candidates()):
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gim.py", line 213, in report_candidates
    for gene, li in self.candidates():
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gim.py", line 459, in candidates
    for grp, li in self.gen_candidates('gene'):
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gim.py", line 115, in gen_candidates
    self.gq.run(q, needs_genotypes=True)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 653, in run
    self.result_proxy = res = iter(self._apply_query())
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 907, in _apply_query
    res = self._execute_query()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 879, in _execute_query
    res = self.conn.execute(sql.text(self.query))
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1176, in execute
    bind, close_with_result=True).execute(clause, params or {})
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
MemoryError
```

I attempted to run this command on a large-memory node allocated specifically to this task, without success. Is there an alternative way to store the database that would alleviate this issue, or how would you otherwise advise?


brentp commented Feb 28, 2019

hmm. I'll have a look and see if I can reduce the memory use a bit or see why this might be happening. Even with 1800 samples, it shouldn't use much memory.


8nb24 commented Feb 28, 2019

Thanks for looking. I got the same error both with and without the bcolz index created. I checked the node statistics, and it looks like memory use never exceeded 3.5G.


brentp commented Mar 6, 2019

Could you add --filter "gene != ''" to your comp_hets call? Or, if you already have a --filter, add AND gene != '' to it.
And let me know if that reduces the memory use?
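For anyone landing here, the suggested workaround looks roughly like the sketch below. The database name my_cohort.db is a placeholder, and echo is used to print the command rather than run it, since exercising it for real requires a loaded GEMINI database:

```shell
# Sketch of the suggested comp_hets workaround (my_cohort.db is a
# placeholder). The --filter expression is passed through to the SQL
# query, so excluding variants with an empty gene annotation shrinks
# the candidate set the tool has to hold.
# Note: the shell strips the double quotes before echo prints its
# arguments; when invoking gemini itself, keep the filter quoted
# exactly as --filter "gene != ''" so SQL sees an empty-string literal.
echo gemini comp_hets --filter "gene != ''" my_cohort.db
```

If a --filter is already present, the condition is appended instead, e.g. --filter "impact_severity != 'LOW' AND gene != ''".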
