Skip to content

Commit 240c1b0

Browse files
committed
gdb: fix parsing of DIEs with both low/high pc AND ranges attributes
After the commit: commit b9de07a Date: Thu Oct 10 11:37:34 2024 +0100 gdb: fix handling of DW_AT_entry_pc of inlined subroutines GDB's buildbot CI testing highlighted this assertion failure: (gdb) c Continuing. ../../binutils-gdb/gdb/block.h:203: internal-error: set_entry_pc: Assertion `start >= this->start () && start < this->end ()' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. ----- Backtrace ----- FAIL: gdb.base/break-probes.exp: run til our library loads (GDB internal error) This assertion was in the new function set_entry_pc and is asserting that the default_entry_pc() value is within the blocks start/end range. The default_entry_pc() is the value GDB will use as the entry-pc if the DWARF doesn't specifically override the entry-pc. This value is calculated as: 1. The start address of the first sub-range within the block, if the block has more than 1 range, or 2. The low address (from DW_AT_low_pc) for the block. If the block only has a single range then this means the block was defined with low/high pc attributes (case #2 above). These low/high pc values are what block::start() and block::end() return. This means that by definition, if the block is continuous, the above assert cannot trigger as 'start', the default_entry_pc() would be equivalent to block::start(). This means that, for the assert to trigger, the block must have multiple ranges, and the first address of the first range is not within the blocks low/high address range. This seems wrong. I inspected the state at the time the assert triggered and discovered the block's start() address. Then I removed the assert and restarted GDB. I was now able to inspect the blocks at the offending address: (gdb) maintenance info blocks 0x7ffff7dddaa4 Blocks at 0x7ffff7dddaa4: from objfile: [(objfile *) 0x44a37f0] /lib64/ld-linux-x86-64.so.2 [(block *) 0x46b30c0] 0x7ffff7ddd5a0..0x7ffff7dde8a6 entry pc: 0x7ffff7ddd5a0 is global block symbol count: 4 is contiguous [(block *) 0x46b3020] 0x7ffff7ddd5a0..0x7ffff7dde8a6 entry pc: 0x7ffff7ddd5a0 is static block symbol count: 9 is contiguous [(block *) 0x46b2f70] 0x7ffff7ddda00..0x7ffff7dddac3 entry pc: 0x7ffff7ddda00 function: __GI__dl_find_dso_for_object symbol count: 4 is contiguous [(block *) 0x46b2e10] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 inline function: __GI__dl_find_dso_for_object symbol count: 5 is contiguous [(block *) 0x46b2a40] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 symbol count: 1 is contiguous [(block *) 0x46b2970] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 symbol count: 2 address ranges: 0x7ffff7ddda0e..0x7ffff7ddda77 0x7ffff7ddda90..0x7ffff7ddda96 I've left everything in for context, but the only really interesting bit is the very last block, it's low/high range is: 0x7ffff7dddaa4..0x7ffff7dddac3 but it has separate ranges: 0x7ffff7ddda0e..0x7ffff7ddda77 0x7ffff7ddda90..0x7ffff7ddda96 which are all outside the low/high range. This is what triggers the assert. But why does that block exist at all? What I believe is happening is that we're running into a bug in older versions of GCC. The buildbot failure was with an 8.5 gcc, and Tom de Vries also reported seeing failures when using version 7 and 8 gcc, but not with gcc 9 and onward. Looking at the DWARF I can see that the problematic block is created from this DIE: <4><15efb>: Abbrev Number: 83 (DW_TAG_lexical_block) <15efc> DW_AT_abstract_origin: <0x15e9f> <15efe> DW_AT_low_pc : 0x7ffff7dddaa4 <15f06> DW_AT_high_pc : 31 which links via DW_AT_abstract_origin to: <2><15e9f>: Abbrev Number: 80 (DW_TAG_lexical_block) <15ea0> DW_AT_ranges : 0x38e0 <15ea4> DW_AT_sibling : <0x15eca> And so we can see that <15efb> has got both low/high pc attributes and a ranges attribute. If I widen my checking to parents of DIE <15efb> then I see that they also have DW_AT_abstract_origin, however, there is something interesting going on, the parent DIEs are linking to a different DIE tree than <15efb>. What I believe is happening is this, we have an abstract instance tree, this is rooted at a DW_AT_subprogram, and contains all the blocks, variables, parameters, etc, that you would expect. As this is an abstract instance, then there are no low/high pc attributes, and no ranges attributes in this tree. This makes sense. Now elsewhere we have a DW_TAG_subprogram (not DW_TAG_inlined_subroutine) which links via DW_AT_abstract_origin to the abstract DW_AT_subprogram. This case is documented in the DWARF 5 spec in section 3.3.8.3, and describes an Out-of-Line Instance of an Inlined Subroutine. Within this out of line instance many of the DIE correctly link back, using DW_AT_abstract_origin to the abstract instance tree. This tree also includes the DIE <15e9f>, which is where our problem DIE references. Now, to really confuse things, within this out-of-line instance we have a DW_TAG_inlined_subroutine, which is another instance of the same abstract instance tree! This would seem to indicate a recursive call to the inline function, and the compiler, for some reason, needed to instantiate an out of line instance of this function. And it is within this nested, inlined subroutine, that the problem DIE exists. The problem DIE is referencing the corresponding DIE within the out of line instance tree, but I am convinced this must be a (long fixed) GCC bug, and that the problem DIE should be referencing the DIE within the abstract instance tree. I'm aware that the above is pretty confusing. The actual DWARF would be a around 200 lines long, so I'd like to avoid dumping it in here. But here's my attempt at representing what's going on in a minimal example. The numbers down the side represent the section offset, not the nesting level, and I've removed any attributes that are not relevant: <1> DW_TAG_subprogram <2> DW_TAG_lexical_block <3> DW_TAG_subprogram DW_AT_abstract_origin <1> <4> DW_TAG_lexical_block DW_AT_ranges ... <5> DW_TAG_inlined_subroutine DW_AT_abstract_origin <1> <6> DW_TAG_lexical_block DW_AT_abstract_origin <4> DW_AT_low_pc ... DW_AT_high_pc ... The lexical block at <6> is linking to <4> when it should be linking to <2>. There is one additional thing that we might wonder about, which is, when calculating the low/high pc range for a block, why does GDB not make use of the range information and expand the range beyond the defined low/high values? The answer to this is in dwarf_get_pc_bounds_ranges_or_highlow_pc in dwarf/read.c. This is where the low/high bounds are calculated. What we see is that GDB first checks for a low/high attribute pair, and if that is present, this defines the address range for the block. Only if there is no DW_AT_low_pc do we check for the DW_AT_ranges, and use that to define the extent of the block. And this makes sense, section 3.5 of the DWARF-5 spec says: The lexical block entry may have either a DW_AT_low_pc and DW_AT_high_pc pair of attributes or a DW_AT_ranges attribute whose values encode the contiguous or non-contiguous address ranges, respectively, of the machine instructions generated for the lexical block... Section 3.5 is specifically about lexical blocks, but the same wording, about it being either low/high OR ranges is repeated for other DW_TAG_ types. So this explains why GDB doesn't use the ranges to expand the problem blocks ranges; as the first DIE has low/high addresses, these are used, and the ranges is not consulted. It is only later in dwarf2_record_block_ranges that we create a range based off the low/high pc, and then also process the ranges data, this allows the problem block to exist with ranges that are outside the low/high range. To solve this I considered a number of options: 1. Prevent loading certain attributes from an abstract instance. Section 3.3.8.1 of the DWARF-5 spec talks about which attributes are appropriate to place in an abstract instance. Any attribute that might vary between instances should not appear in an abstract instance. DW_AT_ranges is included as an example in the non-exhaustive list of attributes that should not appear in an abstract instance. Currently in dwarf2_attr (dwarf2/read.c), when we see a DW_AT_abstract_origin attribute, we always follow this to try and find the attribute we are looking for. But we could change this function so that we prevent this following for attributes that we know should not be looked up in an abstract instance. This would solve the problem in this case by preventing us finding the DW_AT_ranges in the incorrect abstract instance. 2. Filter the ranges. Having established a blocks low/high address range in dwarf_get_pc_bounds_ranges_or_highlow_pc, we could allow dwarf2_record_block_ranges to parse the ranges, but we could reject any range that extends outside the blocks defined start and end addresses. For well behaved DWARF where we have either low/high or ranges, then the blocks start/end are defined from the range data, and so, by definition, every range would be acceptable. But in our problem case we would reject all of the invalid ranges. This is my least favourite solution as it feels like rejecting the ranges is tackling the problem too late on. 3. Don't try to parse ranges when we have low/high attributes. This option involves updating dwarf2_record_block_ranges to match the behaviour of dwarf_get_pc_bounds_ranges_or_highlow_pc, and, I believe, to match the DWARF spec: don't try to read range data from DW_AT_ranges if we have low/high pc attributes. In our case this solves the issue because the problematic DIE has the low/high attributes, and it then links to the wrong DIE which happens to have DW_AT_ranges. With this change in place we don't even look for the DW_AT_ranges. If the problem were reversed, and the initial DIE had DW_AT_ranges, but the incorrectly referenced DIE had the low/high pc attributes, we would pick up the wrong addresses, but this wouldn't trigger any asserts. The reason is that dwarf_get_pc_bounds_ranges_or_highlow_pc would also find the low/high addresses from the incorrectly referenced DIE, and so we would just end up with a block which had the wrong address ranges, but the block would be self consistent, which is different to the problem we hit here. In the end, in this commit I went with solution #3, having dwarf_get_pc_bounds_ranges_or_highlow_pc and dwarf2_record_block_ranges be consistent seems sensible. However, I do wonder if in the future we might want to explore solution #1 as an additional safety feature. With this patch in place I'm able to run the gdb.base/break-probes.exp without seeing the assert that CI testing highlighted. I see no regressions when testing on x86-64 GNU/Linux with gcc 9.3.1. Note: the diff in this commit looks big, but it's really just me indenting the code. Approved-By: Tom Tromey <[email protected]>
1 parent cf8d35f commit 240c1b0

File tree

3 files changed

+403
-22
lines changed

3 files changed

+403
-22
lines changed

gdb/dwarf2/read.c

Lines changed: 46 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -11450,6 +11450,21 @@ dwarf2_record_block_ranges (struct die_info *die, struct block *block,
1145011450
struct attribute *attr;
1145111451
struct attribute *attr_high;
1145211452

11453+
/* Like dwarf_get_pc_bounds_ranges_or_highlow_pc, we read either the
11454+
low/high pc attributes, OR the ranges attribute, but not both. If we
11455+
parse both here then we open up the possibility that, due to invalid
11456+
DWARF, a block's start() and end() might not contain all of the ranges.
11457+
11458+
We have seen this in the wild with older (pre v9) versions of GCC. In
11459+
this case a GCC bug meant that a DIE was linked via DW_AT_abstract_origin
11460+
to the wrong DIE. Instead of pointing at the abstract DIE, GCC was
11461+
linking one instance DIE to an earlier instance DIE. The first instance
11462+
DIE had low/high pc attributes, while the second instance DIE had a
11463+
ranges attribute. When processing the incorrectly linked instance GDB
11464+
would see a DIE with both a low/high pc and some ranges data. However,
11465+
the ranges data was all outside the low/high range, which would trigger
11466+
asserts when setting the entry-pc. */
11467+
1145311468
attr_high = dwarf2_attr (die, DW_AT_high_pc, cu);
1145411469
if (attr_high)
1145511470
{
@@ -11467,32 +11482,41 @@ dwarf2_record_block_ranges (struct die_info *die, struct block *block,
1146711482
CORE_ADDR high = per_objfile->relocate (unrel_high);
1146811483
cu->get_builder ()->record_block_range (block, low, high - 1);
1146911484
}
11470-
}
1147111485

11472-
attr = dwarf2_attr (die, DW_AT_ranges, cu);
11473-
if (attr != nullptr && attr->form_is_unsigned ())
11486+
attr = dwarf2_attr (die, DW_AT_ranges, cu);
11487+
if (attr != nullptr && attr->form_is_unsigned ())
11488+
complaint (_("in %s, DIE %s, DW_AT_ranges ignored due to DW_AT_low_pc"),
11489+
objfile_name (per_objfile->objfile),
11490+
sect_offset_str (die->sect_off));
11491+
}
11492+
else
1147411493
{
11475-
/* Offset in the .debug_ranges or .debug_rnglist section (depending
11476-
on DWARF version). */
11477-
ULONGEST ranges_offset = attr->as_unsigned ();
11478-
11479-
/* See dwarf2_cu::gnu_ranges_base's doc for why we might want to add
11480-
this value. */
11481-
if (die->tag != DW_TAG_compile_unit)
11482-
ranges_offset += cu->gnu_ranges_base;
11483-
11484-
std::vector<blockrange> blockvec;
11485-
dwarf2_ranges_process (ranges_offset, cu, die->tag,
11486-
[&] (unrelocated_addr start, unrelocated_addr end)
11494+
attr = dwarf2_attr (die, DW_AT_ranges, cu);
11495+
if (attr != nullptr && attr->form_is_unsigned ())
1148711496
{
11488-
CORE_ADDR abs_start = per_objfile->relocate (start);
11489-
CORE_ADDR abs_end = per_objfile->relocate (end);
11490-
cu->get_builder ()->record_block_range (block, abs_start,
11491-
abs_end - 1);
11492-
blockvec.emplace_back (abs_start, abs_end);
11493-
});
11497+
/* Offset in the .debug_ranges or .debug_rnglist section (depending
11498+
on DWARF version). */
11499+
ULONGEST ranges_offset = attr->as_unsigned ();
1149411500

11495-
block->set_ranges (make_blockranges (objfile, blockvec));
11501+
/* See dwarf2_cu::gnu_ranges_base's doc for why we might want to add
11502+
this value. */
11503+
if (die->tag != DW_TAG_compile_unit)
11504+
ranges_offset += cu->gnu_ranges_base;
11505+
11506+
std::vector<blockrange> blockvec;
11507+
dwarf2_ranges_process (ranges_offset, cu, die->tag,
11508+
[&] (unrelocated_addr start,
11509+
unrelocated_addr end)
11510+
{
11511+
CORE_ADDR abs_start = per_objfile->relocate (start);
11512+
CORE_ADDR abs_end = per_objfile->relocate (end);
11513+
cu->get_builder ()->record_block_range (block, abs_start,
11514+
abs_end - 1);
11515+
blockvec.emplace_back (abs_start, abs_end);
11516+
});
11517+
11518+
block->set_ranges (make_blockranges (objfile, blockvec));
11519+
}
1149611520
}
1149711521

1149811522
dwarf2_record_block_entry_pc (die, block, cu);
Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
1+
/* This testcase is part of GDB, the GNU debugger.
2+
3+
Copyright 2024 Free Software Foundation, Inc.
4+
5+
This program is free software; you can redistribute it and/or modify
6+
it under the terms of the GNU General Public License as published by
7+
the Free Software Foundation; either version 3 of the License, or
8+
(at your option) any later version.
9+
10+
This program is distributed in the hope that it will be useful,
11+
but WITHOUT ANY WARRANTY; without even the implied warranty of
12+
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13+
GNU General Public License for more details.
14+
15+
You should have received a copy of the GNU General Public License
16+
along with this program. If not, see <http://www.gnu.org/licenses/>. */
17+
18+
volatile int global_var = 0;
19+
20+
void
21+
func_a (void) /* func_a decl line */
22+
{
23+
/* This label is used to find the start of 'func_a' when generating the
24+
debug information. Place nothing before it. */
25+
asm ("func_a_label: .globl func_a_label");
26+
++global_var;
27+
28+
asm ("func_a_0: .globl func_a_0");
29+
++global_var;
30+
31+
asm ("func_a_1: .globl func_a_1");
32+
++global_var;
33+
34+
asm ("func_a_2: .globl func_a_2");
35+
++global_var;
36+
37+
asm ("func_a_3: .globl func_a_3");
38+
++global_var;
39+
40+
asm ("func_a_4: .globl func_a_4");
41+
++global_var;
42+
43+
asm ("func_a_5: .globl func_a_5");
44+
++global_var;
45+
46+
asm ("func_a_6: .globl func_a_6");
47+
++global_var;
48+
}
49+
50+
void
51+
func_b (void) /* func_b decl line */
52+
{
53+
/* This label is used to find the start of 'func_b' when generating the
54+
debug information. Place nothing before it. */
55+
asm ("func_b_label: .globl func_b_label");
56+
++global_var;
57+
58+
asm ("func_b_0: .globl func_b_0");
59+
++global_var; /* inline func_a call line */
60+
61+
asm ("func_b_1: .globl func_b_1");
62+
++global_var;
63+
64+
asm ("func_b_2: .globl func_b_2");
65+
++global_var;
66+
67+
asm ("func_b_3: .globl func_b_3");
68+
++global_var;
69+
70+
asm ("func_b_4: .globl func_b_4");
71+
++global_var;
72+
73+
asm ("func_b_5: .globl func_b_5");
74+
++global_var;
75+
}
76+
77+
int
78+
main (void)
79+
{
80+
asm ("main_label: .globl main_label");
81+
func_b ();
82+
func_a ();
83+
}

0 commit comments

Comments
 (0)