Segfault happening #2265

joto · 2024-10-13T07:26:33Z

See https://gist.github.com/tomhughes/0ebdc537b6a9a390b904d394d796b5e0

@pnorman Please add all the details about what you were doing here.

What version of osm2pgsql are you using?

2.0.0+ds-1~bpo12+1

What operating system and PostgreSQL/PostGIS version are you using?

Debian 12

Tell us something about your system

OpenStack Virtual Machine with 64 x 1 core AMD EPYC Processor, 444GB

What did you do exactly?

export LUA_PATH='/srv/vector.openstreetmap.org/osm2pgsql-themepark/lua/?.lua;/srv/vector.openstreetmap.org/spirit/?.lua;;'

# Import the osm2pgsql file specified as an argument, using the locations for spirit
osm2pgsql \
  --output flex \
  --style '/srv/vector.openstreetmap.org/spirit/shortbread.lua' \
  --slim \
  --flat-nodes '/srv/vector.openstreetmap.org/data/nodes.bin' \
  -d spirit \
  --cache 75000 data.pbf

What did you expect to happen?

The planet to import.

What did happen instead?

A core dump occurred in the post-processing stage

What did you do to try analyzing the problem?

@tomhughes provided the backtrace at https://gist.github.com/tomhughes/0ebdc537b6a9a390b904d394d796b5e0

The machine had an unusual PostgreSQL setup at the time. When the import started there were enough connection slots for osm2pgsql to start, but by the time post-processing began it was no longer possible to acquire new connections. When I fixed this error in the machine configuration it worked fine.


joto · 2024-10-19T12:49:20Z

Strange. The backtrace does not fit with the idea that the database connection failed. I am not sure that's what the problem was. What was this "unusual postgresql setup"? I tried limiting the number of connections and osm2pgsql properly reported an error and exited.

pnorman · 2024-10-21T00:02:45Z

> What was this "unusual postgresql setup"?

Too few max_connections. I fixed it by properly stopping chef and the replication process.

joto · 2024-10-21T08:15:55Z

I am pretty sure it is not max_connections that caused this. At least when I try that, it seems to work correctly. I'd rather expect some race condition that went one way on the first try and the other way on the second. Is there any chance you can re-create the situation with the too-small max_connections and try again? At the moment I don't know how to debug this until we have a reproducible way to trigger the bug.

pnorman · 2024-10-23T21:09:08Z

I can see if I have a suitable system

joto · 2024-12-12T16:41:05Z

I believe I have figured out what is happening here: There are several threads in the thread pool. Thread A is started and opens a database connection and all is fine so far. Then thread B is started and doesn't get a database connection any more, it throws an exception. The exception is propagated and as part of that propagation lots of things are destructed. At some point the data structure containing the information about tables is destructed. Now thread A gets a chance to run again and wants to build the CREATE INDEX command. It needs the information about the tables, but that was destructed already. And then it segfaults.

To solve this we would need to make sure all threads are destructed before anything else. But there is no way to kill a running std::thread. We would have to wait until it is done with its work, which doesn't make much sense, because that situation isn't recoverable anyway. In C++20 there is a new std::jthread which has a mechanism for stopping it from the outside, but I believe that also only works if the thread cooperates. And if that thread has just called CREATE INDEX, it might be a long time before it even gets to run and can notice that it should shut down.

At the moment, I don't have a good idea how to solve this. :-(
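For illustration only, the "destruct the threads before anything else" ordering can be sketched as below. This is not osm2pgsql code: `table_info`, `joining_thread`, `run_import` and the table name are hypothetical stand-ins. The sketch relies on C++ destroying locals in reverse declaration order during stack unwinding, so a worker declared after the table data is joined before that data goes away. It also shows the cost described above: the unwinding thread has to wait for the worker to finish.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the table metadata a worker needs.
struct table_info {
    std::string name;
};

// Hypothetical RAII wrapper: joins the thread in its destructor.
class joining_thread {
public:
    explicit joining_thread(std::function<void()> fn)
    {
        assert(fn); // an empty task would be a programming error
        m_thread = std::thread{std::move(fn)};
    }
    joining_thread(joining_thread const &) = delete;
    joining_thread &operator=(joining_thread const &) = delete;
    ~joining_thread()
    {
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }

private:
    std::thread m_thread;
};

// Returns the SQL built by the worker ("thread A").
std::string run_import(bool second_connection_fails)
{
    std::string sql; // outlives everything inside the try block
    try {
        table_info tables{"example_table"}; // declared first, destroyed last
        joining_thread worker_a{[&] {
            // Simulate thread A being scheduled late.
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            sql = "CREATE INDEX ON " + tables.name;
        }};
        if (second_connection_fails) {
            // Simulates "thread B" failing to get a connection.
            throw std::runtime_error{"no connection slots left"};
        }
    } catch (std::runtime_error const &) {
        // Unwinding joined worker_a before destroying tables, so the
        // worker never saw destructed table data.
    }
    return sql;
}
```

Calling `run_import(true)` still yields the complete statement, because the exception cannot destroy `tables` until the worker has been joined. With a real CREATE INDEX that join could block for a very long time, which is the objection raised above.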

tomhughes · 2024-12-12T16:57:39Z

Yes, jthread stop tokens are essentially co-operative, so you'd probably have to combine that with use of PQcancelCreate to hold a cancellation object for each connection, which you could call PQcancelBlocking on at the same time as you request the thread to stop.
