Stale Haproxy processes #200
Comments
I can report exactly the same issue. When I run my bamboo docker container from an old build, everything works fine, but since I updated my container 2 days ago this has been happening to me too. A little help would be awesome :) |
I've seen this before - HAProxy reloads will often cause a logjam if they happen too frequently, although Bamboo attempts to debounce reloads. Are you constantly redeploying apps? I fixed #177 - which caused unnecessary restarts - for the 0.2.14 release. Are you using an older version? |
We typically deploy multiple times in the course of a day. This is part of a CI/CD env, and it happens for me with 0.2.14. Sorry about the long post.
Here is the snippet in bamboo.json:
"HAProxy": {
Before starting bamboo, here is what the ps output looks like:
#> ps aux | grep haproxy |grep -v grep
#> cat /var/run/haproxy.pid
At the next refresh or app deploy, we notice the following.
Bamboo logs:
#> ps aux | grep haproxy |grep -v grep
After the next refresh:
#> ps aux | grep haproxy |grep -v grep
Each further refresh just keeps adding to the list of haproxy processes. |
HAProxy processes are designed to live as long as there are still connections being served. Could it be that you have some long-running connections still pending when the reload is initiated? We're operating in a high-frequency deployment environment as well. For us, it's not uncommon to see 15-20 HAProxy processes alive at the same time due to long-running WebSocket connections. They do rotate out after a few hours and get replaced by newer processes, however, which is an indication of progress. You might want to check on that behavior as well. |
There shouldn't be any long-running connections, to be honest (5s max). Our problem is that the stale haproxy processes still accept connections and return 503s, because they point at instances that are obviously defunct by now. |
It seems strange that HAProxy takes over so many PIDs. For us, it's only ever one PID that's passed to I'd try to figure out whether the PID file is populated/cleaned up properly. Are you using HAProxy natively or inside Docker? |
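A minimal sketch of that PID file check, in Python. The function names are hypothetical and nothing here is part of Bamboo or HAProxy. It assumes you feed it the text of the PID file (the thread uses /var/run/haproxy.pid) and the output of `ps -eo pid=,comm=`, and it only reports which running haproxy PIDs the PID file no longer tracks.

```python
from typing import List, Set


def parse_pid_file(text: str) -> Set[int]:
    """Return the PIDs listed in the contents of a PID file.

    With nbproc > 1 the file holds several PIDs, so every numeric token counts.
    """
    return {int(token) for token in text.split() if token.isdigit()}


def parse_ps_output(text: str, name: str = "haproxy") -> Set[int]:
    """Return the PIDs of processes named `name`.

    Assumed input format: one "PID COMMAND" pair per line, as printed by
    `ps -eo pid=,comm=`.
    """
    pids = set()
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit() and fields[1].strip() == name:
            pids.add(int(fields[0]))
    return pids


def untracked_pids(running: Set[int], tracked: Set[int]) -> List[int]:
    """Return running PIDs that the PID file does not list, in sorted order."""
    return sorted(running - tracked)
```

A PID that shows up here is an older process that the last reload left behind. This alone does not tell you whether it is still draining legitimate connections or is stuck, so it is only a first filter.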
I always get 4 because of nbproc. The issue remains even when I use nbproc 1 and hence only one PID. |
I found a similar problem that others have reported with consul-template / haproxy: hashicorp/consul-template#442. Could this be a similar [go]-related issue? Today we updated to 0.2.15 and changed the reload command as follows: "ReloadCommand": "/bin/systemctl reload haproxy". So far, it seems to be working a world better. |
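For comparison with the systemctl approach, here is a rough Python sketch of how a classic HAProxy soft-reload command line is put together. The `-f`, `-p`, `-D` and `-sf` flags are standard HAProxy options; the function name and the config path in the usage note are assumptions for illustration, and this is not necessarily the command Bamboo or anyone in this thread runs.

```python
from typing import List


def build_soft_reload_argv(config_path: str, pid_file_path: str,
                           old_pids: List[int]) -> List[str]:
    """Build the argv for starting a new haproxy that retires the old ones.

    -f   config file to load
    -p   PID file the new process writes its PID(s) to
    -D   run as a daemon
    -sf  after startup, ask the listed old PIDs to stop listening and
         exit once their remaining connections have finished
    """
    argv = ["haproxy", "-f", config_path, "-p", pid_file_path, "-D"]
    if old_pids:
        # Omit -sf on the first start, when there is nothing to retire.
        argv.append("-sf")
        argv.extend(str(pid) for pid in old_pids)
    return argv
```

Called as `build_soft_reload_argv("/etc/haproxy/haproxy.cfg", "/var/run/haproxy.pid", [4, 5])` (the config path is hypothetical), this yields the familiar `haproxy -f ... -p ... -D -sf 4 5` form. The handover relies on the old processes actually receiving and acting on the finish request, which is why the PID list and signal delivery are worth checking when old processes keep accepting connections.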
EDIT: Even with grace 0s I'm having stale haproxy processes.
Documentation: this works for short-lived connections. If you have long-running connections, those will get killed, so perhaps you can increase the grace period. But the weird thing is: why does haproxy keep accepting new connections? |
Has anyone tried marathon-lb's reload command? https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run
|
Found hashicorp/consul-template#442 and golang/go#13164. This could actually be related to Go. I just compiled bamboo with Go 1.6 and will update this accordingly. BTW: another reload script that I might try if that doesn't work: https://github.com/eBayClassifiedsGroup/PanteraS/blob/master/infrastructure/haproxy_reload.sh |
@elmalto: were you able to resolve the issue with the latest Go 1.6 or did you employ a reload script? -Imran |
I have not seen this issue since upgrading to 1.6 Malte
|
Hey, as stated in #206, we had this issue before on our Mesos cluster. We migrated our docker images to Go 1.6 about 2 days ago, and it looks like that fixed it. I can't confirm this for sure since we also have a lot of long-running connections, but the number of haproxy processes after 2 days seems much more reasonable than before. I'll have another look during the next week and post again if something changes. Thanks for finding that out anyway :). |
Upgrade might have helped. Do you have any information/data about how often your deployment triggers reload?
|
Hey, sure! Usually on this cluster we get around 0 to 5 updates a day. The day it failed, we had many more (probably around 10), which resulted in something like 15+ haproxy processes on some mesos slaves. We have one bamboo running (as a docker container) on each mesos slave. Right now, they have been up for 1 week. We had some updates last Friday, but the number of haproxy processes only went up to 6. More importantly, that number sometimes goes down, which wasn't the case before the upgrade.
My guess is you are right. We used marathon-lb before bamboo, and we also had this issue with it. |
I suggest moving to Nginx to replace HAProxy. There's a branch that @bluepeppers has been working on that would support multiple reload destinations, but it's still WIP. |
Does nginx support TCP balancing (as HAProxy does) outside of its "Plus" version? It looks unclear to me. I know it may work after building nginx with some extra modules. I'm just unsure about what those "extra modules" may or may not support compared to haproxy. |
Yup, it does support it. If you are using the open source version, try this Nginx-compatible fork: https://github.com/alibaba/tengine
|
@activars Just to let you know (we're still on haproxy), it happened again today on one of our mesos slaves. This issue now seems to happen only in very rare / specific situations (this is the only time it has happened since), and is probably more related to haproxy or go, so it's probably not necessary to reopen. According to the refs above, upgrading haproxy to the latest 1.5.x might be the way to fix this for good. Since minor version upgrades shouldn't do any harm, I prepared a docker image including haproxy 1.5.19 (vs 1.5.8) and based on golang 1.8 (vs 1.6; well, that's not minor, but let's trust the promise of compatibility, and let me know if that sounds like a terrible mistake :). I'm going to test this during the next few days. |
I still have the problem. Running haproxy:1.7.5 |
I'm very sorry for the lack of comms on this thread. We no longer run bamboo (we are no longer on mesos), and are not going to be able to provide ongoing maintenance. If anyone is interested in maintaining it going forward, please raise another issue and we'll look at redirecting people to a fork. |
Hi,
I keep running into stale haproxy processes. I have tried "everything", but can't get it to work. This is my bamboo.log from one occasion where it happened:
as you can see from my processes, there are two sets of haproxies:
and I use the following config for bamboo:
and the config parts of my haproxy.cfg:
Sorry for the long post. Would've gone to SO or SF, but thought this might be an issue with bamboo.
Can anyone point me in the right direction?