DASPK error after attempted model resurrection #1258

Closed
cgrambow opened this issue Jan 15, 2018 · 14 comments
Labels: abandoned, stale, Type: Error

@cgrambow

I am trying to run two separate hydrocarbon oxidation simulations, which run fine by themselves. Upon adding a phenolic species to each job, I get DASPK errors for both jobs after ~160 species. Both errors look like this:

Error: Trying to step from time 0.0 to 100000.0 resulted in a solver (DASPK) error
Resurrecting Model...
Error: Model Resurrection has failed

[...]

Traceback (most recent call last):
  File "/data1/cgrambow/Code/RMG-Py/rmg.py", line 173, in <module>
    main()
  File "/data1/cgrambow/Code/RMG-Py/rmg.py", line 167, in main
    rmg.execute(**kwargs)
  File "/data1/cgrambow/Code/RMG-Py/rmgpy/rmg/main.py", line 702, in execute
    simulatorSettings = simulatorSettings,
  File "rmgpy/solver/base.pyx", line 498, in rmgpy.solver.base.ReactionSystem.simulate (build/pyrex/rmgpy/solver/base.c:24537)
  File "rmgpy/solver/base.pyx", line 678, in rmgpy.solver.base.ReactionSystem.simulate (build/pyrex/rmgpy/solver/base.c:15263)
ValueError: invalidObjects could not be filled during resurrection process
 DASPK--  AT T (=R1) AND STEPSIZE H (=R2) THE
      In above,  R1 =  0.7945984920517D+05   R2 =  0.1598236102956D-04
 DASPK--  NONLINEAR SOLVER FAILED TO CONVERGE
 DASPK--  REPEATEDLY OR WITH ABS(H)=HMIN

Evidently, model resurrection was not able to fix the simulation. Any ideas on how to avoid the error?
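
For readers decoding the DASPK diagnostic above: the message reports the time `T` as `R1` and the step size `H` as `R2`, both printed as Fortran double-precision literals (`D` exponent). The following is a minimal sketch, not part of RMG, that converts such a line to Python floats; the function name `parse_daspk_values` is hypothetical.

```python
import re

# Matches Fortran double-precision literals such as "R1 =  0.7945984920517D+05"
_FORTRAN_FLOAT = re.compile(r"(R[12])\s*=\s*([-+]?\d*\.\d+)D([-+]\d+)")


def parse_daspk_values(line):
    """Return a dict like {'R1': t, 'R2': h} parsed from a DASPK 'In above' line."""
    return {
        name: float(f"{mantissa}e{exponent}")  # swap the Fortran 'D' for 'e'
        for name, mantissa, exponent in _FORTRAN_FLOAT.findall(line)
    }


# Line copied from the traceback above
line = "In above,  R1 =  0.7945984920517D+05   R2 =  0.1598236102956D-04"
values = parse_daspk_values(line)
print(values)
```

Read this way, the solver in this run had advanced to roughly 7.9e4 s with a step size of about 1.6e-5 s when the nonlinear solver stopped converging.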

@mjohnson541
Contributor

If you have the latest model-resurrection-related commit, this should only happen if your edge is entirely empty. Is that the case?

@cgrambow
Author

Maybe? Two lines above the traceback, it prints

Error: Edge species net rates: array([], dtype=float64)
Error: Network leak rates: array([], dtype=float64)

but the latest log about the edge after model enlargement was

The model edge has 157570 species and 409040 reactions

@mjohnson541
Contributor

Ok, this may be related to how I fill variables before the first time step, although I'm pretty sure I tested failures on the first time step pretty well. Perhaps a more useful question is: why is it trying to go from 0.0 to 100000 s in one step? What time did the last simulation end at? What's the context of the last species it added? To my understanding, it should try to step 1e-12 s for the first step and then shrink the step if necessary.

@cgrambow
Author

This is what I have before the error for one of the simulations:

Conducting simulation of reaction system 1...
initializing surface ...
surface initialization complete
At time 5.9914e-01 s, species CCCCCCCOOC1C(O)=CC=CC1O[O](143898) at 0.00113370727027 exceeded the minimum rate for simulation interruption of 0.001
At time 5.9914e-01 s, species CCCCCCCOOC1C(O)=CC=CC1O[O](143898) at rate ratio 0.00113370727027 exceeded the minimum rate for moving to model core of 0.001
terminating simulation due to interrupt...


Adding species CCCCCCCOOC1C(O)=CC=CC1O[O](143898) to model core
Generating kinetics for new reactions...

Summary of Model Enlargement
---------------------------------
Added 1 new core species
    CCCCCCCOOC1C(O)=CC=CC1O[O](143898)
Created 0 new edge species
Moved 1 reactions from edge to core
    CCCCCCCOOC1[CH]C=CC=C1O(143560) + oxygen(3) <=> CCCCCCCOOC1C(O)=CC=CC1O[O](143898)
Added 0 new core reactions
Created 0 new edge reactions

After model enlargement:
    The model core has 153 species and 360 reactions
    The model edge has 157570 species and 409040 reactions

and this is what I have for the other simulation:

Conducting simulation of reaction system 1...
initializing surface ...
surface initialization complete
At time 4.4913e-05 s, species CCCCCCCCCCC1(CCCCC1)O[O](184640) at 0.00125938573324 exceeded the minimum rate for simulation interruption of 0.001
At time 4.4913e-05 s, species CCCCCCCCCCC1(CCCCC1)O[O](184640) at rate ratio 0.00125938573324 exceeded the minimum rate for moving to model core of 0.001
terminating simulation due to interrupt...


Adding species CCCCCCCCCCC1(CCCCC1)O[O](184640) to model core
Generating kinetics for new reactions...

Summary of Model Enlargement
---------------------------------
Added 1 new core species
    CCCCCCCCCCC1(CCCCC1)O[O](184640)
Created 0 new edge species
Moved 1 reactions from edge to core
    CCCCCCCCCC[C]1CCCCC1(11) + oxygen(3) <=> CCCCCCCCCCC1(CCCCC1)O[O](184640)
Added 0 new core reactions
Created 0 new edge reactions

After model enlargement:
    The model core has 162 species and 506 reactions
    The model edge has 185155 species and 511073 reactions

No idea why it's trying to step that far.

@mjohnson541
Contributor

Do you have filterReactions=True? Is this the first time it tried model resurrection in the run? Model resurrection seems to be behaving oddly here. Can you run the MR_test example (it shouldn't take long, ~2 min) and tell me if it ends with model resurrection failing?

The DASPK error is probably a result of bad thermo. I should also note that model resurrection, even when it succeeds, does not always fix your run. If the thermo is bad enough, the model will continue to get DASPK errors on each run, causing it to resurrect again, which means you're really just adding the highest-flux edge species at t=0 each iteration. Sometimes it will go through 5 or so of these cycles and then recover; other times it never recovers.
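
The selection step described here (add the highest-flux edge species) can be illustrated with a small sketch. This is an assumption-laden illustration, not RMG's actual implementation: the function name `pick_resurrection_species` and the error text are hypothetical. It also shows why there is nothing to select when the edge rate arrays are empty, as in the `array([], dtype=float64)` output reported earlier in this thread.

```python
def pick_resurrection_species(edge_rate_ratios):
    """Return the index of the edge species with the largest rate ratio.

    Hypothetical helper: raises ValueError when no edge species is
    available, since there is then nothing to move into the core.
    """
    if len(edge_rate_ratios) == 0:
        raise ValueError("no edge species available to add during resurrection")
    return max(range(len(edge_rate_ratios)), key=lambda i: edge_rate_ratios[i])


# Illustrative rate ratios only; the values are tiny because the solver
# failed almost immediately after t=0.
ratios = [1e-35, 4e-28, 2e-23]
best = pick_resurrection_species(ratios)
print(best)

try:
    pick_resurrection_species([])
except ValueError as err:
    print("resurrection failed:", err)
```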

@cgrambow
Author

Yes, filterReactions=True. Yes, it's the first occurrence of model resurrection. MR_test finishes successfully.

Most thermo is group additivity so it's totally possible that there is bad thermo in there; however, I don't really understand why I should suddenly be getting significantly worse thermo after adding phenol when the simulations run fine without it.

@mjohnson541
Contributor

Hmm, so MR_test was designed to test an issue that looks very similar to this. If it isn't that issue, I need to be able to reproduce it in order to debug it. Do you have the core and edge seed mechanisms for one of these runs?

I've only seen DASPK errors avoided in two ways: get rid of the offending reactor or improve your thermo estimates. Why DASPK errors occur as a function of an RMG job is not particularly well understood, but I think adding the phenol could do it if it's concentrated enough.

@cgrambow
Author

I don't have the seed mechanisms right now. I just started a job in which I generate those, so I'll get those to you once the error occurs again.

Phenol is present at a concentration of 1%. How would I tell which thermo estimates are causing the issue?

@mjohnson541
Contributor

Ok, send it to me when you get it and I'll take a look at the issue. In my experience, it's usually the thermo you would expect to be sensitive. In this case, since you've added phenol, I think it's likely the problem is the thermo involved in reactions with phenol, but it could still be something else. In my experience it was just a matter of not using thermo libraries I should have been using; your case is significantly trickier. To be honest, you might want to consider changing surrogates rather than trying to fix this problem.

@cgrambow
Author

cgrambow commented Feb 7, 2018

It seems the model does not crash when run without filterReactions, or at least not as soon as it did with filtering turned on. It's now at 190 species. However, turning filtering off is not really a viable option for me, because it took over a week to get to this point.

@jimchu10

I also got a similar error in a few runs these days. The unexpected thing was that after the error showed up, RMG seemed to keep going for a few reaction systems until system 16 (the last one in the log). Even though the last line said "surface initialization complete", the RMG job basically froze. The log file was no longer updated, but the RMG server showed that the job was still active.

The last lines of the log file are shown below:

Conducting simulation of reaction system 9...
initializing surface ...
surface initialization complete
In above, R1 = 0.1000000000000D-14 R2 = 0.1907348632813D-20
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.7000000000000D-14 R2 = 0.7629394531250D-20
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.5110000000000D-12 R2 = 0.4882812500000D-18
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.3000000000000D-14 R2 = 0.3814697265625D-20
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.9000000000000D-14 R2 = 0.3814697265625D-20
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.3000000000000D-14 R2 = 0.3814697265625D-20
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.1000000000000D-14 R2 = 0.1907348632813D-20
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.1000000000000D-14 R2 = 0.4768371582031D-21
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.1075233395875D-14 R2 = 0.4122301402165D-22
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.3000000000000D-14 R2 = 0.3814697265625D-20
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.3000000000000D-14 R2 = 0.3814697265625D-20
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.1001464843750D-14 R2 = 0.1862645149231D-23
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.1000000000000D-14 R2 = 0.1907348632813D-20
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.3000000000000D-14 R2 = 0.3814697265625D-20
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.3000000000000D-14 R2 = 0.3814697265625D-20
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.6300000000000D-13 R2 = 0.6103515625000D-19
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.1000010174469D-14 R2 = 0.2273736754432D-27
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.6325225340204D-13 R2 = 0.5750222231950D-27
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.7000000000000D-14 R2 = 0.7629394531250D-20
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.5110000000000D-12 R2 = 0.4882812500000D-18
DASPK-- NONLINEAR SOLVER FAILED TO CONVERGE
DASPK-- REPEATEDLY OR WITH ABS(H)=HMIN
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.3000000000000D-14 R2 = 0.3814697265625D-20
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.9023500442505D-14 R2 = 0.3637978807092D-26
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
In above, R1 = 0.6325225340204D-13 R2 = 0.5750222231950D-27
DASPK-- ITERATION MATRIX IS SINGULAR.
DASPK-- AT T (=R1) AND STEPSIZE H (=R2) THE
Error: Trying to step from time 0.0 to 1e-12 resulted in a solver (DASPK) error
Resurrecting Model...
At time 7.0000e-15 s, species ketenylperoxy(117) at rate ratio 1.72773686948e-35 was added to model core in model resurrection process
At time 7.0000e-15 s, PDepNetwork #40 at network leak rate 5.806216375e-43 was sent for exploring during model resurrection process

Conducting simulation of reaction system 10...
initializing surface ...
surface initialization complete
Error: Trying to step from time 0.0 to 1e-12 resulted in a solver (DASPK) error
Resurrecting Model...
At time 3.0000e-15 s, species [O]C(=O)OC=O(7051) at rate ratio 4.25473705982e-28 was added to model core in model resurrection process
At time 3.0000e-15 s, PDepNetwork #491 at network leak rate 1.19493350374e-61 was sent for exploring during model resurrection process

Conducting simulation of reaction system 11...
initializing surface ...
surface initialization complete
Error: Trying to step from time 0.0 to 1e-12 resulted in a solver (DASPK) error
Resurrecting Model...
At time 3.0938e-15 s, species O=C[CH]OO(7088) at rate ratio 2.49581611041e-23 was added to model core in model resurrection process
At time 3.0938e-15 s, PDepNetwork #41 at network leak rate 1.66741121137e-48 was sent for exploring during model resurrection process

Conducting simulation of reaction system 12...
initializing surface ...
surface initialization complete
Error: Trying to step from time 0.0 to 1e-12 resulted in a solver (DASPK) error
Resurrecting Model...
At time 8.2275e-15 s, species CC1=CC=C[C]C1(6861) at rate ratio 1.3767994187e-25 was added to model core in model resurrection process
At time 8.2275e-15 s, PDepNetwork #40 at network leak rate 3.13207723904e-61 was sent for exploring during model resurrection process

Conducting simulation of reaction system 13...
initializing surface ...
surface initialization complete
Error: Trying to step from time 0.0 to 1e-12 resulted in a solver (DASPK) error
Resurrecting Model...
At time 1.0313e-15 s, species [C]1=CCC1(2790) at rate ratio 1.26288445604e-26 was added to model core in model resurrection process
At time 1.0313e-15 s, PDepNetwork #23 at network leak rate 2.93872356408e-26 was sent for exploring during model resurrection process

Conducting simulation of reaction system 14...
initializing surface ...
surface initialization complete
Error: Trying to step from time 0.0 to 1e-12 resulted in a solver (DASPK) error
Resurrecting Model...
At time 3.0000e-15 s, species [C]1=CCC1(2790) at rate ratio 4.60969204845e-23 was added to model core in model resurrection process
At time 3.0000e-15 s, PDepNetwork #210 at network leak rate 6.04795214033e-22 was sent for exploring during model resurrection process

Conducting simulation of reaction system 15...
initializing surface ...
surface initialization complete
At time 2.0000e+00 s, reached target termination time.

Conducting simulation of reaction system 16...
initializing surface ...
surface initialization complete
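
A log like the one above is easier to review once the resurrection events are pulled out. The following is a minimal sketch, not an RMG utility: the function name `resurrection_events` is hypothetical, and the regular expression assumes the message wording shown in this log.

```python
import re
from collections import Counter

# Pattern based on the "added to model core in model resurrection process"
# lines in the log above; species labels contain no spaces.
_EVENT = re.compile(
    r"At time (?P<time>\S+) s, species (?P<species>\S+) at rate ratio "
    r"(?P<ratio>\S+) was added to model core in model resurrection process"
)


def resurrection_events(log_text):
    """Return (species, time, rate_ratio) tuples for each resurrection event."""
    return [
        (m["species"], float(m["time"]), float(m["ratio"]))
        for m in _EVENT.finditer(log_text)
    ]


# Three lines copied from the log above
sample = """\
At time 7.0000e-15 s, species ketenylperoxy(117) at rate ratio 1.72773686948e-35 was added to model core in model resurrection process
At time 1.0313e-15 s, species [C]1=CCC1(2790) at rate ratio 1.26288445604e-26 was added to model core in model resurrection process
At time 3.0000e-15 s, species [C]1=CCC1(2790) at rate ratio 4.60969204845e-23 was added to model core in model resurrection process
"""

events = resurrection_events(sample)
counts = Counter(species for species, _, _ in events)
for species, n in counts.items():
    print(species, n)
```

Counting by species makes repeats visible, such as `[C]1=CCC1(2790)` being selected in both reaction systems 13 and 14.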

@mjohnson541
Contributor

@jimchu10 it looks like resurrection functioned properly for your system. I'm not sure what's causing it to hang on system 16, but you aren't getting a model resurrection failure.

@jimchu10

@mjohnson541 Thank you for the comment. I will look into the problem in detail to see if the source can be identified. This problem happens when I set the tolerance very low, but I'm not sure whether that is related.

@github-actions

This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days.

@github-actions bot added the stale label Jun 22, 2023
@github-actions bot added the abandoned label Jul 24, 2023
@github-actions bot closed this as not planned Jul 24, 2023
3 participants