Seeing the pool randomly being destroyed after timeout #12
Comments
When you say "the pool is being destroyed", do you mean that knex reports it as destroyed or pool2 reports it as destroyed? One thing I've observed in general, even with the previous pooler, is that "general problems/misconfigurations with connecting" in knex tend to wind up being reported as pool errors; this makes it hard to distinguish things sometimes.

The only call in the entire file that could initiate destruction of the pool itself is here: https://github.com/myndzi/pool2/blob/master/lib/pool.js#L410

I believe we talked a little bit about this one before: the purpose is to throw an error if the pool receives an error trying to acquire its first resource(s), so that invalid instantiation/configuration causes a crash instead of silently getting swallowed in an infinite invalid reconnect loop. Once the pool has successfully acquired at least one resource, it is assumed that the configuration is valid and this behavior is disabled (controlled by an internal flag).

So, if pool2 is reporting a 'destroyed' error, it would have to be 'failed to ever acquire any resource', or something external is calling the _destroyPool method (.end should be preferred, though perhaps it should fall back to destroy or export this method directly if we don't care about or can't clean up nicely). If knex is reporting a 'destroyed' error, well, you'd know more about that one than I :)
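To make the "bail on first failure" behavior concrete, here is an illustrative sketch; it is not pool2's actual implementation, and `acquire`/`destroyPool` are placeholder names:

```js
// Illustrative only -- not pool2's actual implementation. This sketches the
// "bail if the very first acquisition fails" behavior described above;
// `acquire` and `destroyPool` are placeholder names for whatever the real
// pool uses.
function makeGuardedAcquire(acquire, destroyPool) {
  var everAcquired = false;
  return function guardedAcquire() {
    return acquire().then(function (resource) {
      everAcquired = true; // from now on the configuration is assumed valid
      return resource;
    }, function (err) {
      if (!everAcquired) {
        // Never got a single resource: tear the pool down loudly instead of
        // retrying an invalid configuration forever.
        destroyPool(err);
      }
      throw err;
    });
  };
}
```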
I just got something like this while doing some work; is this what you were referring to?

Knex:Error Pool2 - Error: connect ECONNREFUSED
Yep
Yeah, that's basically "the settings were wrong and I couldn't connect, ever"
Oh wait, sorry, it was ETIMEDOUT, not ECONNREFUSED. I'm traveling right now but will take a look when I get back tomorrow.
Same thing, different error. ETIMEDOUT would be "settings were wrong" or possibly "network connectivity bombed while this thing tried to start up".
Any news on this? If it's not an initial timeout it might also be related to #14.
I really hope @tgriesser or @myndzi can help me out with this.

PROBLEM

CONSOLE

hapi server started

QUESTIONS
Can you clarify: does this happen when you start your application if the DB is inaccessible, or does this happen after your application has been running and working, but loses connection to the DB? If the former, I have a patch to allow you to configure this behavior; I will probably just publish it shortly. I wanted to make sure I had a chance to catch tgriesser before pushing it out. You can try installing the version of pool2 from master on this repository instead of the one installed with knex, and setting the 'bailAfter' option to something large enough to encompass the expected downtime.
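As a rough illustration of that suggestion, assuming this knex version forwards unrecognized `pool` options (such as `bailAfter`) straight through to pool2, and with placeholder connection values:

```js
// A configuration sketch, assuming extra `pool` options are passed through
// to pool2 unchanged; the host and database names are placeholders, and one
// minute is an arbitrary value.
var knex = require('knex')({
  client: 'pg',
  connection: { host: 'db.example.com', database: 'app' },
  pool: {
    min: 2,
    max: 10,
    bailAfter: 60 * 1000 // keep retrying the first connection for up to 60s
  }
});
```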
Just chiming in that I'm having this issue against mysql using bookshelf. The application runs for a while, then a stack like this occurs:
What version of pool2 is currently installed?
Ah, I think I see the problem. acquireRawConnection is returning a promise; however, the error handler is attempting to destroy the connection directly. Alternately, it might be trying to release even after an error causes the resource to be destroyed. I can't see how that error would result from the connection error handler in knex, given what it calls there.
@myndzi I have knex 0.8.6 and, with that, pool2 1.1.0. Sorry for not replying earlier; the notification email was lost among a thousand others from work GitHub.
@myndzi
Refused the first time after a long rest = a timeout; pool2 was designed to throw an error if the first attempted connection fails. You could update to 1.2.0 to control some of this behavior if you desire, but there shouldn't be anything that changes functionally. This error is probably safe to ignore, unless knex is destroying the pool because of it, which it should not do. I'll try to reproduce a timed-out query using knex and see if I can produce this problem and run it down.
My error was with pool2 1.1.6 and knex 0.8.6. Any update on this issue? Thanks for all your work here, both @myndzi and @tgriesser.
Apologies, I've gotten quite busy at work and didn't do what I said I would to look into this. I would appreciate it if you could tell me whether it appears to be a cosmetic problem (reporting an error but not affecting your application) or an actual functional problem, because I think there is a strong chance it's the former. If it's the latter, I'll make it a priority to address it. The unfortunate nature of this kind of error is that, since pool2 is involved at such a core level, any connection "stuff" that happens is going to have pool2's "name" attached to it -- but in a number of cases, these are situations pool2 is built to deal with; the message may simply be getting passed along when it would be better ignored.
Thanks for the reply - in my case it is serious because it crashes the Node instance and I cannot figure out how to catch the exception in my code. If what you are saying is correct, the quickest fix may be for knex to report the exception without propagating it. @tgriesser, what do you think?
If it's crashing, then it's not being caught at all... but the "Knex:Error" part gets added by knex, so it must obviously be captured...? I recommend, in the short term, trying to find out what the error is being emitted on (the knex object?) and binding it. I'm sorry for letting this linger.
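A possible starting point for "find the emitter and bind it", with the caveat that the `knex.client` property path is an assumption about knex internals that needs verifying:

```js
// Diagnostic sketch only, assuming a `knex` instance is in scope: bind an
// 'error' listener on the most likely emitter so the error is logged (and no
// longer unhandled) while its true source is tracked down. Whether
// `knex.client` emits 'error' depends on the knex version, so verify first.
if (knex.client && typeof knex.client.on === 'function') {
  knex.client.on('error', function (err) {
    console.error('caught knex client error:', err.stack || err);
  });
}
```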
Okay! I managed to reproduce the error scenario by hooking into net.Socket and "eating" a selected write while emitting an error on the socket object. However, the application does not crash for me. I am using the same versions of knex and pool2 that you are. Can you please give the following code a try?

```js
'use strict';
var Socket = require('net').Socket;
var writeCount = 0, readCount = 0;
var connect = Socket.prototype.connect;
var write = Socket.prototype.write;
var target;
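// Intercept writes: swallow the third non-stdout write and emit a socket
// error instead, simulating a connection that dies mid-query.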
Socket.prototype.write = function (chunk, encoding, cb) {
if (this !== process.stdout) {
console.log('write', writeCount++);
if (writeCount === 3) {
this.emit('error', new Error('foo'));
return;
}
}
return write.call(this, chunk, encoding, cb);
};
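// Wrap connect so we can observe the socket: log when the connection is
// established and count incoming 'data' events.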
Socket.prototype.connect = function (options, cb) {
return connect.call(this, options, function (err, socket) {
console.log('got socket');
target = socket;
this.on('data', function () {
console.log('read', readCount++);
});
if (typeof cb === 'function') { cb(); }
});
};
var knex = require('knex')({
client: 'mysql',
connection: {
database: 'test'
},
pool: {
max: 1,
min: 1
}
});
function ping() {
return knex.select(knex.raw(1));
}
ping()
.then(function () {
console.log('ping 1 returned');
return ping();
})
.then(function () {
console.log('ping 2 returned');
})
.catch(function (e) {
console.error(e);
return ping();
})
.then(function () {
console.log('ping 3 returned');
});
```

My output looks like this:
Be sure to adjust the connection parameters to a valid database. You will note that the third ping is executed after receiving the error, and that it succeeds, showing that the socket error case is handled correctly -- except for the fact that knex is likely double-releasing the resource or something, causing an error to be displayed that shouldn't be.
We can see the double release here:
With a stack dump, it is revealed that the first release comes from here:
-> https://github.com/tgriesser/knex/blob/0.8.6/lib/dialects/mysql/index.js#L133

My original suspicion appears to be correct: the error handler destroys the resource, but the promise cleanup is attempting to release it even though there was an error:

-> https://github.com/tgriesser/knex/blob/0.8.6/lib/runner.js#L140

I'm not entirely sure how to cleanly address this; ideally the disposer would be aware of the error case and destroy the resource there instead of in the other place. Alternately, it could skip releasing the connection if it had an error, or trap the error case and ignore it.
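A sketch of the "don't release what was already destroyed" idea, not knex's actual code; the `__disposed` flag is a hypothetical marker:

```js
// Not knex's actual code -- just a sketch of the "skip the release if the
// resource was already destroyed" option discussed above. The `__disposed`
// flag is a hypothetical marker that the socket error handler would set when
// it destroys a connection.
function releaseConnection(pool, connection) {
  if (connection.__disposed) {
    // The error handler already removed this resource from the pool;
    // releasing it again would trigger the double-release warning.
    return;
  }
  pool.release(connection);
}
```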
The fix turned out to be somewhat easy (though still a little messy), since there was already a property I could check to know whether the connection had been disposed of. You can use the patched branch in the meantime. I'm still uncertain why your application was exiting due to this error; perhaps this is programmed behavior on your part (bound an error event and died when you didn't need to?) -- so if the problem continues for some reason, or your program still exits, do get back to me and let's dig in further.
I just wanted to post the log as I finally got the error to occur with `DEBUG=pool2` on. I'm going to try your fix right now in parallel.
To clarify my crash issue - it was what Express decides to do when it has an unhandled exception and no error handler is defined. By defining an error handler I was able to stop the app from crashing (although the requests still failed as a result of this bug).
Express doesn't do that, node does -- but when I reproduced the error, it was not an unhandled error event. I didn't bind a handler, but it did not appear to be emitted on any object without an error handler. So, if it's being re-emitted somewhere or not bound somewhere, that may be in the purview of your app and not knex. I'm uncertain.
The requests themselves will fail anyway; this is not inherently a bug that causes requests to fail -- it's the opposite. A network problem (timeout on reading from socket) causes the request to fail, and this occurs as a result of that. There is no way to recover the request other than to re-issue it, which would be either knex's job, or your job in response to a failed query. If the error isn't getting passed back to your query, that's something that should hopefully be fixed in knex, so that you have the ability to see that "this query failed because the underlying network socket died, so I can probably retry it" and do so if you so choose.
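For applications that want to retry at their own level, a wrapper along these lines is one option; the retryable error codes shown are illustrative only:

```js
// One possible caller-side response to the failure mode described above:
// re-issue a query once if it failed with a transient network error. The
// list of retryable codes is illustrative, not exhaustive.
function queryWithRetry(runQuery) {
  return runQuery().catch(function (err) {
    var retryable = ['ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED'];
    if (retryable.indexOf(err.code) !== -1) {
      return runQuery(); // second (and final) attempt
    }
    throw err;
  });
}

// usage: queryWithRetry(function () { return knex.select(knex.raw(1)); });
```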
The stack dump after "Sat, 12 Sep 2015 15:36:55 GMT pool2 Successfully allocated new resource (cur=5, ac=0, id=39)" -- did you add that?
Thank you for the stack dump, too. Strangely, it seems to maybe not be the same problem, although sourced through the same files and for the same reason. I see the attempted release, but not the destroy as triggered by the connection error handler in the mysql client file. I'll have to dig a little deeper, but this supports the unhandled error scenario. While you're testing with the new code, tack `knex:*` onto your DEBUG line (e.g. `DEBUG=pool2,knex:*`) if you want a little more info on what knex itself is doing.
You know what, do you have Skype? I have time this weekend and could get on a call so we could cut out all the back and forth... (Add me: ...)
It does seem like your patch does not fix this issue :( Here's a new log using the double-release tag.
To summarize from our Skype conversation: the crash was because the knex query was being returned to some promise-handling code in the middleware, which passed it along as an error, and there was no error handler middleware. The error message was logged simply because it was the result of the query. I gave you some options to deal with the underlying network problem that's causing you grief, which included reducing the pool minimum to 0 and/or specifying a resource ping handler.
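Roughly, those two mitigations could look like the sketch below, assuming the installed knex version passes `min` and a pool2-style `ping(resource, callback)` handler through to the pool; the connection settings are placeholders:

```js
// Sketch of "min: 0 plus a ping handler"; verify that your knex version
// forwards both options to the pool before relying on this.
var knex = require('knex')({
  client: 'mysql',
  connection: { host: 'db.example.com', user: 'app', database: 'app' },
  pool: {
    min: 0,  // don't hold idle connections open across long quiet periods
    max: 10,
    ping: function (connection, callback) {
      // Cheap liveness check so a stale connection is dropped instead of
      // being handed to a query.
      connection.query('SELECT 1', callback);
    }
  }
});
```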
For anyone curious about my issue, and looking for the fix to knex, I made a gist with the full post mortem of the bug: https://gist.github.com/acgourley/9a11ffedd44c414fb4b8
I appreciate the work to follow this through. I have a great deal of confidence in this community. @acgourley, thank you for your gist.
I tried setting the pool min to 0, but I would really appreciate a solution to this issue. Here's some of my output with DEBUG on:
Been trying everything and I'm running out of ideas. Any thoughts?
To be honest this error is really generic, and since the pool is central to connection management in Knex, any connection-related errors wind up with "pool2" tacked on to them, though they aren't usually because of pool2. In your error, "Knex:Error Pool2 - Error: connect ETIMEDOUT {MYIP}:5432" seems to indicate you're trying to open a connection to yourself, not Amazon, so that would be a problem. I recommend trying to connect to the correct host :)

P.S. min 0 means that it won't open a connection until you try to use it, not that it will open new connections every time.

(Either way, it's almost certainly a network error... either your settings are incorrect or you can't connect due to firewall restrictions or something like that)
Thanks, @myndzi! It is totally generic. I was having AWS security group issues. I tested
It would be great if we could mark this issue as resolved. Do you know if the latest knex still uses pool2 incorrectly? I checked the source to see if they merged in myndzi/knex@92f6333; however, the source seems to have changed so much that those lines are no longer relevant, so I can't tell. In addition, I will move to the 0.10 version of knex and report back if I see bad behavior with my database.
Also curious: has the merge taken place?
I need help with this issue too. If I close the DB connection while the server is executing a query or a migration, the process hangs and never ends. The output of executing with DEBUG=pool2 is:

pool2 reap (cur=0, av=0) +9s

I'm using Bookshelf 0.9.4, Knex 0.10.0, and pool2 1.3.4.
No answer yet? I'm having the same issue. Once my pool was destroyed, it never tries to reconnect.
Here goes the solution that worked for me: if one of my DB operations fails because of a lost connection, I capture the error in the catch() function at the end of the promise. Then, I check the error code received to know the reason for the error. I've seen that the error code value is 'PROTOCOL_ENQUEUE_AFTER_FATAL_ERROR' when it happens because of a lost connection, so if that's the value of the error caught, I destroy the Bookshelf pool connection.
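A minimal sketch of that pattern, assuming a Bookshelf instance named `bookshelf`; re-creating the connection afterwards is left to the application:

```js
// Catch fatal-connection errors and tear down the pool so a fresh one can be
// created before the next attempt.
function runQuery(fn) {
  return fn().catch(function (err) {
    if (err.code === 'PROTOCOL_ENQUEUE_AFTER_FATAL_ERROR') {
      // The underlying mysql connection is dead; destroy the knex pool.
      return bookshelf.knex.destroy().then(function () {
        throw err; // still surface the original failure to the caller
      });
    }
    throw err;
  });
}
```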
I'm seeing similar errors in my production logs but am not sure how to reproduce:

```
Error: Pool is destroyed
at Pool.acquire (/app/node_modules/pool2/lib/pool.js:166:12)
at /app/node_modules/knex/lib/client.js:220:19
at tryCatcher (/app/node_modules/knex/node_modules/bluebird/js/main/util.js:26:23)
at Promise._resolveFromResolver (/app/node_modules/knex/node_modules/bluebird/js/main/promise.js:480:31)
at new Promise (/app/node_modules/knex/node_modules/bluebird/js/main/promise.js:70:37)
at Client.acquireConnection (/app/node_modules/knex/lib/client.js:216:12)
at /app/node_modules/knex/lib/runner.js:137:23
at tryCatcher (/app/node_modules/knex/node_modules/bluebird/js/main/util.js:26:23)
at Promise._resolveFromResolver (/app/node_modules/knex/node_modules/bluebird/js/main/promise.js:480:31)
at new Promise (/app/node_modules/knex/node_modules/bluebird/js/main/promise.js:70:37)
at /app/node_modules/knex/lib/runner.js:136:35
at tryCatcher (/app/node_modules/knex/node_modules/bluebird/js/main/util.js:26:23)
at Function.Promise.attempt.Promise.try (/app/node_modules/knex/node_modules/bluebird/js/main/method.js:31:24)
at Runner.ensureConnection (/app/node_modules/knex/lib/runner.js:135:26)
at Runner.run (/app/node_modules/knex/lib/runner.js:29:31)
at QueryBuilder.Target.then (/app/node_modules/knex/lib/interface.js:26:43)
at QueryBuilder.tryCatcher (/app/node_modules/bookshelf/node_modules/bluebird/js/main/util.js:26:23)
at doThenable (/app/node_modules/bookshelf/node_modules/bluebird/js/main/thenables.js:52:38)
at tryConvertToPromise (/app/node_modules/bookshelf/node_modules/bluebird/js/main/thenables.js:30:20)
at Promise._resolveCallback (/app/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:442:24)
at Promise._settlePromiseFromHandler (/app/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:515:17)
at Promise._settlePromiseAt (/app/node_modules/bookshelf/node_modules/bluebird/js/main/promise.js:581:18)
at Async._drainQueue (/app/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:128:12)
at Async._drainQueues (/app/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:133:10)
at Immediate.Async.drainQueues (/app/node_modules/bookshelf/node_modules/bluebird/js/main/async.js:15:14)
at runCallback (timers.js:570:20)
at tryOnImmediate (timers.js:550:5)
    at processImmediate [as _immediateCallback] (timers.js:529:5)
```
I'm sorry for not being responsive; I don't seem to be receiving notifications for this project/thread in my e-mail? Bizarre.

This error is generally "can't connect to database"; I leave this issue open because closing it would probably just wind up with new ones getting opened for the same thing. It's not generally an actual problem with the module itself.

@olalonde: Your error indicates that knex at some point destroyed the connection, perhaps in response to an error event emitted by the underlying database driver. Once the pool has been destroyed, it cannot be reused; a new one would need to be instantiated. The correct handling of the problem, if it's not a fatal error, would be to not destroy the pool in the first place (a temporary or one-off connection error should simply drop the specific resource from the pool and re-queue the request for serving from a new connection).

@AleRed: What do you mean specifically by "I close the connection"? An option was added somewhat recently, "requestTimeout", which should prevent your infinite retry scenario; you should be able to configure this via your knex setup, but the underlying problem may be elsewhere (in the means you are using to close the connection? some bug?). The section of log you pasted shows that the pool is unable to reconnect, and your comments seem to indicate that it was once connected, so I'm assuming by 'close the connection' you mean actually stop the database (so that it becomes unconnectable) -- not sure why you would do this while trying to execute a query.

@herlon214: Your problem does not seem to be the same if you say "once it is destroyed it never tries to reconnect", assuming you were responding to AleRed -- his problem appears to be "it tries to reconnect forever". As mentioned in my response to olalonde, the 'destroy' case should normally only occur when you never succeeded in acquiring a resource. This is to surface configuration problems and the like. The only reference in the source is here: https://github.com/myndzi/pool2/blob/master/lib/pool.js#L465 and that is the only function that sets 'this.destroyed': https://github.com/myndzi/pool2/blob/master/lib/pool.js#L538. It's possible that something is calling _destroyPool, though a quick GitHub search through Knex's source does not indicate this is the case. Note that 'pool.destroy()' is not the same thing; that is for destroying a resource (uncleanly removing it from the pool).

I'm sorry about the lack of responsiveness to some of the previous issues; if any of you catch this and respond, I'll be happy to look into it further with you -- but I suspect that it will come down to a connectivity problem at some point, so I'd start looking there.
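For reference, the requestTimeout suggestion might be configured like the sketch below, assuming the option is forwarded unchanged from knex's `pool` config to pool2; the 30-second value is an arbitrary example:

```js
// Sketch of the requestTimeout suggestion above; adjust the value to suit.
var knex = require('knex')({
  client: 'mysql',
  connection: { /* your connection settings */ },
  pool: {
    min: 0,
    max: 10,
    requestTimeout: 30 * 1000 // fail pending acquisitions instead of waiting forever
  }
});
```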
Thanks Kris!
...seems like the change in GitHub subject lines borked my e-mail filters and these didn't go straight to my inbox anymore :( Hopefully fixed now.
I ran into this issue myself, and oddly enough it was on a shared environment where someone had upgraded node to the latest v6.3.1; downgrading it back to v4.4.7 resolved it.
Does that mean you're able to reproduce it readily?
Getting the same issue here; it's hard to repro as I'm not sure what triggers it yet, but I will update if I find out.
If you are having this issue with knex: knex v0.12.0 replaced pool2 with node-pool; see knex/knex#1591 (comment).
Knex:warning - Pool2 - Error: Pool was destroyed

What can I do?
This seems to be key: you are either connecting to the wrong host/port, or your database is not running.
The database is PostgreSQL running on Heroku.
Connection refused just means that the database driver tried to connect to the host and port it was given and nothing there accepted the connection.
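For the Heroku case specifically, connecting through the DATABASE_URL config var that Heroku Postgres provides (rather than a hand-typed host and port) rules out most wrong-host/port mistakes; a hedged sketch, noting that whether SSL must be forced depends on the setup:

```js
// Heroku Postgres publishes its credentials in the DATABASE_URL environment
// variable, and knex's pg client accepts a connection string directly.
var knex = require('knex')({
  client: 'pg',
  connection: process.env.DATABASE_URL
});
```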
Not sure what's causing it and haven't had time to completely reproduce it, but the pool seems to be randomly being destroyed after a connect ETIMEDOUT - only seeing this recently, so I suspect it's something with the latest patch version. I'll see about being able to reproduce it, but any ideas there?