Branch data Line data Source code
1 : : /* Basic block reordering routines for the GNU compiler.
2 : : Copyright (C) 2000-2025 Free Software Foundation, Inc.
3 : :
4 : : This file is part of GCC.
5 : :
6 : : GCC is free software; you can redistribute it and/or modify it
7 : : under the terms of the GNU General Public License as published by
8 : : the Free Software Foundation; either version 3, or (at your option)
9 : : any later version.
10 : :
11 : : GCC is distributed in the hope that it will be useful, but WITHOUT
12 : : ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
13 : : or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
14 : : License for more details.
15 : :
16 : : You should have received a copy of the GNU General Public License
17 : : along with GCC; see the file COPYING3. If not see
18 : : <http://www.gnu.org/licenses/>. */
19 : :
20 : : /* This file contains the "reorder blocks" pass, which changes the control
21 : : flow of a function to encounter fewer branches; the "partition blocks"
22 : : pass, which divides the basic blocks into "hot" and "cold" partitions,
23 : : which are kept separate; and the "duplicate computed gotos" pass, which
24 : : duplicates blocks ending in an indirect jump.
25 : :
26 : : There are two algorithms for "reorder blocks": the "simple" algorithm,
27 : : which just rearranges blocks, trying to minimize the number of executed
28 : : unconditional branches; and the "software trace cache" algorithm, which
29 : : also copies code, and in general tries a lot harder to have long linear
30 : : pieces of machine code executed. This algorithm is described next. */
31 : :
32 : : /* This (greedy) algorithm constructs traces in several rounds.
33 : : The construction starts from "seeds". The seed for the first round
34 : : is the entry point of the function.  When there is more than one seed,
35 : : the one with the lowest key in the heap is selected first (see bb_to_key).
36 : : Then the algorithm repeatedly adds the most probable successor to the end
37 : : of a trace. Finally it connects the traces.
38 : :
39 : : There are two parameters: Branch Threshold and Exec Threshold.
40 : : If the probability of an edge to a successor of the current basic block is
41 : : lower than Branch Threshold or its count is lower than Exec Threshold,
42 : : then the successor will be a seed in one of the next rounds.
43 : : Each round uses lower values of these parameters than the previous one.
44 : : The last round has to have these parameters set to zero so that the
45 : : remaining blocks are picked up.
46 : :
47 : : The algorithm selects the most probable successor from all unvisited
48 : : successors and successors that have been added to this trace.
49 : : The other successors (that have not been "sent" to the next round) will be
50 : : the other seeds for this round, and the secondary traces will start from them.
51 : : If the successor has not been visited in this trace, it is added to the
52 : : trace (however, there is some heuristic for simple branches).
53 : : If the successor has been visited in this trace, a loop has been found.
54 : : If the loop has many iterations, the loop is rotated so that the source
55 : : block of the most probable edge going out of the loop is the last block
56 : : of the trace.
57 : : If the loop has few iterations and there is no edge from the last block of
58 : : the loop going out of the loop, the loop header is duplicated.
59 : :
60 : : When connecting traces, the algorithm first checks whether there is an edge
61 : : from the last block of a trace to the first block of another trace.
62 : : When there are still some unconnected traces it checks whether there exists
63 : : a basic block BB such that BB is a successor of the last block of a trace
64 : : and BB is a predecessor of the first block of another trace. In this case,
65 : : BB is duplicated, added at the end of the first trace and the traces are
66 : : connected through it.
67 : : The remaining traces are simply concatenated, so there will be a jump to
68 : : the beginning of each of them.
69 : :
70 : : The above description is for the full algorithm, which is used when the
71 : : function is optimized for speed. When the function is optimized for size,
72 : : in order to reduce long jumps and connect more fallthru edges, the
73 : : algorithm is modified as follows:
74 : : (1) Break long traces into short ones.  A trace is broken at a block that
75 : : has multiple predecessors/successors during trace discovery.  When
76 : : connecting traces, only connect Trace n with Trace n + 1.  This change
77 : : avoids most of the long jumps produced by the above algorithm.
78 : : (2) Ignore the edge probability and count for fallthru edges.
79 : : (3) Keep the original order of blocks when there is no chance to fall
80 : : through. We rely on the results of cfg_cleanup.
81 : :
82 : : To implement the changes for code size optimization, the block's index is
83 : : selected as the key and all traces are found in one round.
84 : :
85 : : References:
86 : :
87 : : "Software Trace Cache"
88 : : A. Ramirez, J. Larriba-Pey, C. Navarro, J. Torrellas and M. Valero; 1999
89 : : http://citeseer.nj.nec.com/15361.html
90 : :
91 : : */
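: : /* A minimal sketch of the driver loop described above; "too_cold",
: : "best_successor" and "append_to_trace" are hypothetical helpers standing
: : in for the threshold tests and edge selection implemented below.
: : Illustrative only, not part of the pass:
: :
: :   for (int round = 0; round < N_ROUNDS; round++)
: :     while (!heap->empty ())
: :       {
: :         basic_block bb = heap->extract_min ();
: :         if (too_cold (bb, round))
: :           next_heap->insert (bb_to_key (bb), bb);
: :         else
: :           for (basic_block b = bb; b; b = best_successor (b, round))
: :             append_to_trace (b);
: :       }
: :
: : Blocks with the lowest key are extracted first; blocks below the round's
: : thresholds are reseeded into the next round's heap.  The resulting traces
: : are then connected (see connect_traces).  */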
92 : :
93 : : #include "config.h"
94 : : #include "system.h"
95 : : #include "coretypes.h"
96 : : #include "backend.h"
97 : : #include "target.h"
98 : : #include "rtl.h"
99 : : #include "tree.h"
100 : : #include "cfghooks.h"
101 : : #include "df.h"
102 : : #include "memmodel.h"
103 : : #include "optabs.h"
104 : : #include "regs.h"
105 : : #include "emit-rtl.h"
106 : : #include "output.h"
107 : : #include "expr.h"
108 : : #include "tree-pass.h"
109 : : #include "cfgrtl.h"
110 : : #include "cfganal.h"
111 : : #include "cfgbuild.h"
112 : : #include "cfgcleanup.h"
113 : : #include "bb-reorder.h"
114 : : #include "except.h"
115 : : #include "alloc-pool.h"
116 : : #include "fibonacci_heap.h"
117 : : #include "stringpool.h"
118 : : #include "attribs.h"
119 : : #include "common/common-target.h"
120 : :
121 : : /* The number of rounds. In most cases there will only be 4 rounds, but
122 : : when partitioning hot and cold basic blocks into separate sections of
123 : : the object file there will be an extra round. */
124 : : #define N_ROUNDS 5
125 : :
126 : : struct target_bb_reorder default_target_bb_reorder;
127 : : #if SWITCHABLE_TARGET
128 : : struct target_bb_reorder *this_target_bb_reorder = &default_target_bb_reorder;
129 : : #endif
130 : :
131 : : #define uncond_jump_length \
132 : : (this_target_bb_reorder->x_uncond_jump_length)
133 : :
134 : : /* Branch thresholds in thousandths (per mille) of REG_BR_PROB_BASE.  */
135 : : static const int branch_threshold[N_ROUNDS] = {400, 200, 100, 0, 0};
136 : :
137 : : /* Exec thresholds in thousandths (per mille) of the count of bb 0. */
138 : : static const int exec_threshold[N_ROUNDS] = {500, 200, 50, 0, 0};
139 : :
140 : : /* If edge count is lower than DUPLICATION_THRESHOLD per mille of entry
141 : : block the edge destination is not duplicated while connecting traces. */
142 : : #define DUPLICATION_THRESHOLD 100
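: : /* Worked example (illustrative; REG_BR_PROB_BASE is 10000): with
: : max_entry_count == 1000, round 0 extends a trace over an edge only if its
: : probability is at least 4000/10000 == 40% and the successor count is at
: : least 1000 * 500 / 1000 == 500; round 2 relaxes this to 10% and 50;
: : rounds 3 and 4 accept everything.  Likewise, connecting traces duplicates
: : a destination only when the edge count reaches 1000 * 100 / 1000 == 100.  */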
143 : :
144 : : typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
145 : : typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;
146 : :
147 : : /* Structure to hold needed information for each basic block. */
148 : : struct bbro_basic_block_data
149 : : {
150 : : /* Which trace is the bb start of (-1 means it is not a start of any). */
151 : : int start_of_trace;
152 : :
153 : : /* Which trace is the bb end of (-1 means it is not an end of any). */
154 : : int end_of_trace;
155 : :
156 : : /* Which trace is the bb in? */
157 : : int in_trace;
158 : :
159 : : /* Which trace was this bb visited in? */
160 : : int visited;
161 : :
162 : : /* Cached maximum frequency of interesting incoming edges.
163 : : Minus one means not yet computed. */
164 : : int priority;
165 : :
166 : : /* Which heap is BB in (if any)? */
167 : : bb_heap_t *heap;
168 : :
169 : : /* Which heap node is BB in (if any)? */
170 : : bb_heap_node_t *node;
171 : : };
172 : :
173 : : /* The current size of the following dynamic array. */
174 : : static int array_size;
175 : :
176 : : /* The array which holds needed information for basic blocks. */
177 : : static bbro_basic_block_data *bbd;
178 : :
179 : : /* To avoid frequent reallocation, the size of the arrays is greater than
180 : : needed; the number of elements is (not less than) 1.25 * size_wanted.  */
181 : : #define GET_ARRAY_SIZE(X) ((((X) / 4) + 1) * 5)
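: : /* For example, GET_ARRAY_SIZE (100) == ((100 / 4) + 1) * 5 == 130, which
: : is at least 1.25 * 100 == 125; the "+ 1" keeps the result above 1.25 * X
: : even when X is not divisible by 4.  */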
182 : :
183 : : /* Free the memory and set the pointer to NULL. */
184 : : #define FREE(P) (gcc_assert (P), free (P), P = 0)
185 : :
186 : : /* Structure for holding information about a trace. */
187 : : struct trace
188 : : {
189 : : /* First and last basic block of the trace. */
190 : : basic_block first, last;
191 : :
192 : : /* The round of the STC creation which this trace was found in. */
193 : : int round;
194 : :
195 : : /* The length (i.e. the number of basic blocks) of the trace. */
196 : : int length;
197 : : };
198 : :
199 : : /* Maximum count of one of the entry blocks. */
200 : : static profile_count max_entry_count;
201 : :
202 : : /* Local function prototypes. */
203 : : static void find_traces_1_round (int, profile_count, struct trace *, int *,
204 : : int, bb_heap_t **, int);
205 : : static basic_block copy_bb (basic_block, edge, basic_block, int);
206 : : static long bb_to_key (basic_block);
207 : : static bool better_edge_p (const_basic_block, const_edge, profile_probability,
208 : : profile_count, profile_probability, profile_count,
209 : : const_edge);
210 : : static bool copy_bb_p (const_basic_block, int);
211 : :
212 : : /* Return the trace number in which BB was visited. */
213 : :
214 : : static int
215 : 34852157 : bb_visited_trace (const_basic_block bb)
216 : : {
217 : 34852157 : gcc_assert (bb->index < array_size);
218 : 34852157 : return bbd[bb->index].visited;
219 : : }
220 : :
221 : : /* This function marks BB that it was visited in trace number TRACE. */
222 : :
223 : : static void
224 : 9307194 : mark_bb_visited (basic_block bb, int trace)
225 : : {
226 : 9307194 : bbd[bb->index].visited = trace;
227 : 9307194 : if (bbd[bb->index].heap)
228 : : {
229 : 365857 : bbd[bb->index].heap->delete_node (bbd[bb->index].node);
230 : 365857 : bbd[bb->index].heap = NULL;
231 : 365857 : bbd[bb->index].node = NULL;
232 : : }
233 : 9307194 : }
234 : :
235 : : /* Check whether bb should be pushed into the next round of trace
236 : : collection.  Reasons for pushing the block forward are: 1) the block
237 : : is cold, we are doing partitioning, and there will be another round
238 : : (cold partition blocks are not supposed to be collected into traces
239 : : until the very last round); or 2) there will be another round, and
240 : : the basic block is not "hot enough" for the current round of trace
241 : : collection.  */
242 : :
243 : : static bool
244 : 8117868 : push_to_next_round_p (const_basic_block bb, int round, int number_of_rounds,
245 : : profile_count count_th)
246 : : {
247 : 8117868 : bool there_exists_another_round;
248 : 8117868 : bool block_not_hot_enough;
249 : :
250 : 8117868 : there_exists_another_round = round < number_of_rounds - 1;
251 : :
252 : 8117868 : block_not_hot_enough = (bb->count < count_th
253 : 8117868 : || probably_never_executed_bb_p (cfun, bb));
254 : :
255 : 3976009 : if (there_exists_another_round
256 : : && block_not_hot_enough)
257 : : return true;
258 : : else
259 : 4831385 : return false;
260 : : }
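: : /* Example: in round 1 of 5 with COUNT_TH equal to 20% of the maximal
: : entry count, a block executed only 5% as often as the entry block is not
: : hot enough and is pushed forward; in the last round there is no further
: : round, so every remaining block is kept.  */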
261 : :
262 : : /* Find the traces for Software Trace Cache. Chain each trace through
263 : : RBI()->next. Store the number of traces to N_TRACES and description of
264 : : traces to TRACES. */
265 : :
266 : : static void
267 : 579469 : find_traces (int *n_traces, struct trace *traces)
268 : : {
269 : 579469 : int i;
270 : 579469 : int number_of_rounds;
271 : 579469 : edge e;
272 : 579469 : edge_iterator ei;
273 : 579469 : bb_heap_t *heap = new bb_heap_t (LONG_MIN);
274 : :
275 : : /* Add one extra round of trace collection when partitioning hot/cold
276 : : basic blocks into separate sections. The last round is for all the
277 : : cold blocks (and ONLY the cold blocks). */
278 : :
279 : 579469 : number_of_rounds = N_ROUNDS - 1;
280 : :
281 : : /* Insert entry points of function into heap. */
282 : 579469 : max_entry_count = profile_count::zero ();
283 : 1158938 : FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
284 : : {
285 : 579469 : bbd[e->dest->index].heap = heap;
286 : 579469 : bbd[e->dest->index].node = heap->insert (bb_to_key (e->dest), e->dest);
287 : 579469 : if (e->dest->count > max_entry_count)
288 : 578619 : max_entry_count = e->dest->count;
289 : : }
290 : :
291 : : /* Find the traces. */
292 : 2897345 : for (i = 0; i < number_of_rounds; i++)
293 : : {
294 : 2317876 : profile_count count_threshold;
295 : :
296 : 2317876 : if (dump_file)
297 : 92 : fprintf (dump_file, "STC - round %d\n", i + 1);
298 : :
299 : 2317876 : count_threshold = max_entry_count.apply_scale (exec_threshold[i], 1000);
300 : :
301 : 2317876 : find_traces_1_round (REG_BR_PROB_BASE * branch_threshold[i] / 1000,
302 : : count_threshold, traces, n_traces, i, &heap,
303 : : number_of_rounds);
304 : : }
305 : 579469 : delete heap;
306 : :
307 : 579469 : if (dump_file)
308 : : {
309 : 115 : for (i = 0; i < *n_traces; i++)
310 : : {
311 : 92 : basic_block bb;
312 : 92 : fprintf (dump_file, "Trace %d (round %d): ", i + 1,
313 : 92 : traces[i].round + 1);
314 : 92 : for (bb = traces[i].first;
315 : 184 : bb != traces[i].last;
316 : 92 : bb = (basic_block) bb->aux)
317 : : {
318 : 92 : fprintf (dump_file, "%d [", bb->index);
319 : 92 : bb->count.dump (dump_file);
320 : 92 : fprintf (dump_file, "] ");
321 : : }
322 : 92 : fprintf (dump_file, "%d [", bb->index);
323 : 92 : bb->count.dump (dump_file);
324 : 92 : fprintf (dump_file, "]\n");
325 : : }
326 : 23 : fflush (dump_file);
327 : : }
328 : 579469 : }
329 : :
330 : : /* Rotate loop whose back edge is BACK_EDGE in the tail of trace TRACE
331 : : (with sequential number TRACE_N). */
332 : :
333 : : static basic_block
334 : 150473 : rotate_loop (edge back_edge, struct trace *trace, int trace_n)
335 : : {
336 : 150473 : basic_block bb;
337 : :
338 : : /* Information about the best end (end after rotation) of the loop. */
339 : 150473 : basic_block best_bb = NULL;
340 : 150473 : edge best_edge = NULL;
341 : 150473 : profile_count best_count = profile_count::uninitialized ();
342 : : /* The best edge is preferred when its destination is not visited yet
343 : : or is a start block of some trace. */
344 : 150473 : bool is_preferred = false;
345 : :
346 : : /* Find the most frequent edge that goes out from current trace. */
347 : 150473 : bb = back_edge->dest;
348 : 589047 : do
349 : : {
350 : 589047 : edge e;
351 : 589047 : edge_iterator ei;
352 : :
353 : 1655116 : FOR_EACH_EDGE (e, ei, bb->succs)
354 : 1066069 : if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
355 : 1066069 : && bb_visited_trace (e->dest) != trace_n
356 : : && (e->flags & EDGE_CAN_FALLTHRU)
357 : 1492645 : && !(e->flags & EDGE_COMPLEX))
358 : : {
359 : 387075 : if (is_preferred)
360 : : {
361 : : /* The best edge is preferred. */
362 : 234214 : if (!bb_visited_trace (e->dest)
363 : 234214 : || bbd[e->dest->index].start_of_trace >= 0)
364 : : {
365 : : /* The current edge E is also preferred. */
366 : 230642 : if (e->count () > best_count)
367 : : {
368 : 56631 : best_count = e->count ();
369 : 56631 : best_edge = e;
370 : 56631 : best_bb = bb;
371 : : }
372 : : }
373 : : }
374 : : else
375 : : {
376 : 152861 : if (!bb_visited_trace (e->dest)
377 : 152861 : || bbd[e->dest->index].start_of_trace >= 0)
378 : : {
379 : : /* The current edge E is preferred. */
380 : 149270 : is_preferred = true;
381 : 149270 : best_count = e->count ();
382 : 149270 : best_edge = e;
383 : 149270 : best_bb = bb;
384 : : }
385 : : else
386 : : {
387 : 3591 : if (!best_edge || e->count () > best_count)
388 : : {
389 : 3055 : best_count = e->count ();
390 : 3055 : best_edge = e;
391 : 3055 : best_bb = bb;
392 : : }
393 : : }
394 : : }
395 : : }
396 : 589047 : bb = (basic_block) bb->aux;
397 : : }
398 : 589047 : while (bb != back_edge->dest);
399 : :
400 : 150473 : if (best_bb)
401 : : {
402 : : /* Rotate the loop so that the BEST_EDGE goes out from the last block of
403 : : the trace. */
404 : 150411 : if (back_edge->dest == trace->first)
405 : : {
406 : 5490 : trace->first = (basic_block) best_bb->aux;
407 : : }
408 : : else
409 : : {
410 : : basic_block prev_bb;
411 : :
412 : : for (prev_bb = trace->first;
413 : 449290 : prev_bb->aux != back_edge->dest;
414 : : prev_bb = (basic_block) prev_bb->aux)
415 : : ;
416 : 144921 : prev_bb->aux = best_bb->aux;
417 : :
418 : : /* Try to get rid of uncond jump to cond jump. */
419 : 144921 : if (single_succ_p (prev_bb))
420 : : {
421 : 120808 : basic_block header = single_succ (prev_bb);
422 : :
423 : : /* Duplicate HEADER if it is a small block containing cond jump
424 : : in the end. */
425 : 234092 : if (any_condjump_p (BB_END (header)) && copy_bb_p (header, 0)
426 : 120808 : && !CROSSING_JUMP_P (BB_END (header)))
427 : 0 : copy_bb (header, single_succ_edge (prev_bb), prev_bb, trace_n);
428 : : }
429 : : }
430 : : }
431 : : else
432 : : {
433 : : /* We have not found suitable loop tail so do no rotation. */
434 : 62 : best_bb = back_edge->src;
435 : : }
436 : 150473 : best_bb->aux = NULL;
437 : 150473 : return best_bb;
438 : : }
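: : /* Illustrative example: for a trace H -> A -> B whose back edge is
: : B -> H, the list is temporarily circular (B->aux == H).  If the most
: : frequent edge leaving the loop originates in A, the list is cut after A,
: : yielding the order B -> H -> A, so the hot exit is taken from the last
: : block of the trace.  */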
439 : :
440 : : /* One round of finding traces.  Find traces for BRANCH_TH and COUNT_TH,
441 : : i.e. do not include into traces basic blocks whose probability is lower
442 : : than BRANCH_TH or whose count is lower than COUNT_TH.  Store the new
443 : : traces into TRACES and modify the number of traces *N_TRACES.  Set the
444 : : round (which the trace belongs to) to ROUND.
445 : : The function expects starting basic blocks to be in *HEAP and will delete
446 : : *HEAP and store starting points for the next round into new *HEAP. */
447 : :
448 : : static void
449 : 2317876 : find_traces_1_round (int branch_th, profile_count count_th,
450 : : struct trace *traces, int *n_traces, int round,
451 : : bb_heap_t **heap, int number_of_rounds)
452 : : {
453 : : /* Heap for discarded basic blocks which are possible starting points for
454 : : the next round. */
455 : 2317876 : bb_heap_t *new_heap = new bb_heap_t (LONG_MIN);
456 : 2317876 : bool for_size = optimize_function_for_size_p (cfun);
457 : :
458 : 7874994 : while (!(*heap)->empty ())
459 : : {
460 : 5557118 : basic_block bb;
461 : 5557118 : struct trace *trace;
462 : 5557118 : edge best_edge, e;
463 : 5557118 : long key;
464 : 5557118 : edge_iterator ei;
465 : :
466 : 5557118 : bb = (*heap)->extract_min ();
467 : 5557118 : bbd[bb->index].heap = NULL;
468 : 5557118 : bbd[bb->index].node = NULL;
469 : :
470 : 5557118 : if (dump_file)
471 : 106 : fprintf (dump_file, "Getting bb %d\n", bb->index);
472 : :
473 : : /* If the BB's count is too low, send BB to the next round. When
474 : : partitioning hot/cold blocks into separate sections, make sure all
475 : : the cold blocks (and ONLY the cold blocks) go into the (extra) final
476 : : round. When optimizing for size, do not push to next round. */
477 : :
478 : 5557118 : if (!for_size
479 : 5557118 : && push_to_next_round_p (bb, round, number_of_rounds,
480 : : count_th))
481 : : {
482 : 1472664 : int key = bb_to_key (bb);
483 : 1472664 : bbd[bb->index].heap = new_heap;
484 : 1472664 : bbd[bb->index].node = new_heap->insert (key, bb);
485 : :
486 : 1472664 : if (dump_file)
487 : 14 : fprintf (dump_file,
488 : : " Possible start point of next round: %d (key: %d)\n",
489 : : bb->index, key);
490 : 1472664 : continue;
491 : 1472664 : }
492 : :
493 : 4084454 : trace = traces + *n_traces;
494 : 4084454 : trace->first = bb;
495 : 4084454 : trace->round = round;
496 : 4084454 : trace->length = 0;
497 : 4084454 : bbd[bb->index].in_trace = *n_traces;
498 : 4084454 : (*n_traces)++;
499 : :
500 : 9049160 : do
501 : : {
502 : 9049160 : bool ends_in_call;
503 : :
504 : : /* The probability and count of the best edge. */
505 : 9049160 : profile_probability best_prob = profile_probability::uninitialized ();
506 : 9049160 : profile_count best_count = profile_count::uninitialized ();
507 : :
508 : 9049160 : best_edge = NULL;
509 : 9049160 : mark_bb_visited (bb, *n_traces);
510 : 9049160 : trace->length++;
511 : :
512 : 9049160 : if (dump_file)
513 : 184 : fprintf (dump_file, "Basic block %d was visited in trace %d\n",
514 : : bb->index, *n_traces);
515 : :
516 : 9049160 : ends_in_call = block_ends_with_call_p (bb);
517 : :
518 : : /* Select the successor that will be placed after BB. */
519 : 22694927 : FOR_EACH_EDGE (e, ei, bb->succs)
520 : : {
521 : 13645767 : gcc_assert (!(e->flags & EDGE_FAKE));
522 : :
523 : 13645767 : if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
524 : 8213461 : continue;
525 : :
526 : 12976156 : if (bb_visited_trace (e->dest)
527 : 12976156 : && bb_visited_trace (e->dest) != *n_traces)
528 : 2400120 : continue;
529 : :
530 : : /* If partitioning hot/cold basic blocks, don't consider edges
531 : : that cross section boundaries. */
532 : 10576036 : if (BB_PARTITION (e->dest) != BB_PARTITION (bb))
533 : 386558 : continue;
534 : :
535 : 10189478 : profile_probability prob = e->probability;
536 : 10189478 : profile_count count = e->dest->count;
537 : :
538 : : /* The only sensible preference for a call instruction is the
539 : : fallthru edge. Don't bother selecting anything else. */
540 : 10189478 : if (ends_in_call)
541 : : {
542 : 967114 : if (e->flags & EDGE_CAN_FALLTHRU)
543 : : {
544 : 505294 : best_edge = e;
545 : 505294 : best_prob = prob;
546 : 505294 : best_count = count;
547 : : }
548 : 967114 : continue;
549 : : }
550 : :
551 : : /* Edge that cannot be fallthru or improbable or infrequent
552 : : successor (i.e. it is unsuitable successor). When optimizing
553 : : for size, ignore the probability and count. */
554 : 12342811 : if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
555 : 8996806 : || !prob.initialized_p ()
556 : 18217817 : || ((prob.to_reg_br_prob_base () < branch_th
557 : 8995453 : || e->count () < count_th) && (!for_size)))
558 : 3120447 : continue;
559 : :
560 : 6101917 : if (better_edge_p (bb, e, prob, count, best_prob, best_count,
561 : : best_edge))
562 : : {
563 : 5583955 : best_edge = e;
564 : 5583955 : best_prob = prob;
565 : 5583955 : best_count = count;
566 : : }
567 : : }
568 : :
569 : : /* If the best destination has multiple predecessors and can be
570 : : duplicated more cheaply than a jump, don't allow it to be added to
571 : : a trace; we'll duplicate it when connecting the traces later.
572 : : However, we need to check that this duplication wouldn't leave
573 : : the best destination with only crossing predecessors, because
574 : : this would change its effective partition from hot to cold. */
575 : 9049160 : if (best_edge
576 : 5615867 : && EDGE_COUNT (best_edge->dest->preds) >= 2
577 : 11451602 : && copy_bb_p (best_edge->dest, 0))
578 : : {
579 : 64846 : bool only_crossing_preds = true;
580 : 64846 : edge e;
581 : 64846 : edge_iterator ei;
582 : 91039 : FOR_EACH_EDGE (e, ei, best_edge->dest->preds)
583 : 90993 : if (e != best_edge && !(e->flags & EDGE_CROSSING))
584 : : {
585 : : only_crossing_preds = false;
586 : : break;
587 : : }
588 : 64846 : if (!only_crossing_preds)
589 : 64800 : best_edge = NULL;
590 : : }
591 : :
592 : : /* If the best destination has multiple successors or predecessors,
593 : : don't allow it to be added when optimizing for size. This makes
594 : : sure predecessors with smaller index are handled before the best
595 : : destination.  It breaks long traces and reduces long jumps.
596 : :
597 : : Take if-then-else as an example.
598 : : A
599 : : / \
600 : : B C
601 : : \ /
602 : : D
603 : : If we do not remove the best edge B->D/C->D, the final order might
604 : : be A B D ... C, with C at the end of the program.  If D and D's
605 : : successors are complicated, we might need long jumps for A->C and
606 : : C->D.  A similar issue exists for the order A C D ... B.
607 : :
608 : : After removing the best edge, the final result will be A B C D or
609 : : A C B D.  This does not add a jump compared with the previous order,
610 : : but it reduces the possibility of long jumps. */
611 : 9049160 : if (best_edge && for_size
612 : 9049160 : && (EDGE_COUNT (best_edge->dest->succs) > 1
613 : 127931 : || EDGE_COUNT (best_edge->dest->preds) > 1))
614 : : best_edge = NULL;
615 : :
616 : : /* Add all non-selected successors to the heaps. */
617 : 22694927 : FOR_EACH_EDGE (e, ei, bb->succs)
618 : : {
619 : 22111376 : if (e == best_edge
620 : 8284675 : || e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
621 : 21260831 : || bb_visited_trace (e->dest))
622 : 8465609 : continue;
623 : :
624 : 5180158 : key = bb_to_key (e->dest);
625 : :
626 : 5180158 : if (bbd[e->dest->index].heap)
627 : : {
628 : : /* E->DEST is already in some heap. */
629 : 1309316 : if (key != bbd[e->dest->index].node->get_key ())
630 : : {
631 : 0 : if (dump_file)
632 : : {
633 : 0 : fprintf (dump_file,
634 : : "Changing key for bb %d from %ld to %ld.\n",
635 : : e->dest->index,
636 : 0 : (long) bbd[e->dest->index].node->get_key (),
637 : : key);
638 : : }
639 : 0 : bbd[e->dest->index].heap->replace_key
640 : 0 : (bbd[e->dest->index].node, key);
641 : : }
642 : : }
643 : : else
644 : : {
645 : 3870842 : bb_heap_t *which_heap = *heap;
646 : :
647 : 3870842 : profile_probability prob = e->probability;
648 : :
649 : 3870842 : if (!(e->flags & EDGE_CAN_FALLTHRU)
650 : 3870842 : || (e->flags & EDGE_COMPLEX)
651 : 3525859 : || !prob.initialized_p ()
652 : 3524652 : || prob.to_reg_br_prob_base () < branch_th
653 : 5714358 : || e->count () < count_th)
654 : : {
655 : : /* When partitioning hot/cold basic blocks, make sure
656 : : the cold blocks (and only the cold blocks) all get
657 : : pushed to the last round of trace collection. When
658 : : optimizing for size, do not push to next round. */
659 : :
660 : 2895824 : if (!for_size && push_to_next_round_p (e->dest, round,
661 : : number_of_rounds,
662 : : count_th))
663 : : which_heap = new_heap;
664 : : }
665 : :
666 : 3870842 : bbd[e->dest->index].heap = which_heap;
667 : 3870842 : bbd[e->dest->index].node = which_heap->insert (key, e->dest);
668 : :
669 : 3870842 : if (dump_file)
670 : : {
671 : 87 : fprintf (dump_file,
672 : : " Possible start of %s round: %d (key: %ld)\n",
673 : : (which_heap == new_heap) ? "next" : "this",
674 : 87 : e->dest->index, (long) key);
675 : : }
676 : :
677 : : }
678 : : }
679 : :
680 : 9049160 : if (best_edge) /* Suitable successor was found. */
681 : : {
682 : 5361092 : if (bb_visited_trace (best_edge->dest) == *n_traces)
683 : : {
684 : : /* We do nothing with one basic block loops. */
685 : 396386 : if (best_edge->dest != bb)
686 : : {
687 : 557919 : if (best_edge->count ()
688 : 185973 : > best_edge->dest->count.apply_scale (4, 5))
689 : : {
690 : : /* The loop has at least 4 iterations. If the loop
691 : : header is not the first block of the function
692 : : we can rotate the loop. */
693 : :
694 : 150536 : if (best_edge->dest
695 : 150536 : != ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
696 : : {
697 : 150473 : if (dump_file)
698 : : {
699 : 0 : fprintf (dump_file,
700 : : "Rotating loop %d - %d\n",
701 : : best_edge->dest->index, bb->index);
702 : : }
703 : 150473 : bb->aux = best_edge->dest;
704 : 150473 : bbd[best_edge->dest->index].in_trace =
705 : 150473 : (*n_traces) - 1;
706 : 150473 : bb = rotate_loop (best_edge, trace, *n_traces);
707 : : }
708 : : }
709 : : else
710 : : {
711 : : /* The loop has less than 4 iterations. */
712 : :
713 : 35437 : if (single_succ_p (bb)
714 : 47681 : && copy_bb_p (best_edge->dest,
715 : : optimize_edge_for_speed_p
716 : 12244 : (best_edge)))
717 : : {
718 : 7218 : bb = copy_bb (best_edge->dest, best_edge, bb,
719 : : *n_traces);
720 : 7218 : trace->length++;
721 : : }
722 : : }
723 : : }
724 : :
725 : : /* Terminate the trace. */
726 : 396386 : break;
727 : : }
728 : : else
729 : : {
730 : : /* Check for a situation
731 : :
732 : : A
733 : : /|
734 : : B |
735 : : \|
736 : : C
737 : :
738 : : where
739 : : AB->count () + BC->count () >= AC->count ().
740 : : (i.e. 2 * B->count >= AC->count )
741 : : Best ordering is then A B C.
742 : :
743 : : When optimizing for size, A B C is always the best order.
744 : :
745 : : This situation is created for example by:
746 : :
747 : : if (A) B;
748 : : C;
749 : :
750 : : */
751 : :
752 : 13679483 : FOR_EACH_EDGE (e, ei, bb->succs)
753 : 8728102 : if (e != best_edge
754 : : && (e->flags & EDGE_CAN_FALLTHRU)
755 : 3774365 : && !(e->flags & EDGE_COMPLEX)
756 : 3297058 : && !bb_visited_trace (e->dest)
757 : 2940091 : && single_pred_p (e->dest)
758 : 1529716 : && !(e->flags & EDGE_CROSSING)
759 : 9638775 : && single_succ_p (e->dest)
760 : 923998 : && (single_succ_edge (e->dest)->flags
761 : 923998 : & EDGE_CAN_FALLTHRU)
762 : 819602 : && !(single_succ_edge (e->dest)->flags & EDGE_COMPLEX)
763 : 819602 : && single_succ (e->dest) == best_edge->dest
764 : 9019551 : && (e->dest->count * 2
765 : 9019551 : >= best_edge->count () || for_size))
766 : : {
767 : 13325 : best_edge = e;
768 : 13325 : if (dump_file)
769 : 0 : fprintf (dump_file, "Selecting BB %d\n",
770 : 0 : best_edge->dest->index);
771 : : break;
772 : : }
773 : :
774 : 4964706 : bb->aux = best_edge->dest;
775 : 4964706 : bbd[best_edge->dest->index].in_trace = (*n_traces) - 1;
776 : 4964706 : bb = best_edge->dest;
777 : : }
778 : : }
779 : : }
780 : 8652774 : while (best_edge);
781 : 4084454 : trace->last = bb;
782 : 4084454 : bbd[trace->first->index].start_of_trace = *n_traces - 1;
783 : 4084454 : if (bbd[trace->last->index].end_of_trace != *n_traces - 1)
784 : : {
785 : 4084454 : bbd[trace->last->index].end_of_trace = *n_traces - 1;
786 : : /* Update the cached maximum frequency for interesting predecessor
787 : : edges for successors of the new trace end. */
788 : 9032749 : FOR_EACH_EDGE (e, ei, trace->last->succs)
789 : 4948295 : if (EDGE_FREQUENCY (e) > bbd[e->dest->index].priority)
790 : 3781725 : bbd[e->dest->index].priority = EDGE_FREQUENCY (e);
791 : : }
792 : :
793 : : /* The trace is terminated so we have to recount the keys in the heap
794 : : (some block can have a lower key because now one of its predecessors
795 : : is an end of the trace). */
796 : 9032749 : FOR_EACH_EDGE (e, ei, bb->succs)
797 : : {
798 : 8093792 : if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
799 : 4948295 : || bb_visited_trace (e->dest))
800 : 3145497 : continue;
801 : :
802 : 1802798 : if (bbd[e->dest->index].heap)
803 : : {
804 : 1802798 : key = bb_to_key (e->dest);
805 : 1802798 : if (key != bbd[e->dest->index].node->get_key ())
806 : : {
807 : 1162303 : if (dump_file)
808 : : {
809 : 49 : fprintf (dump_file,
810 : : "Changing key for bb %d from %ld to %ld.\n",
811 : : e->dest->index,
812 : 49 : (long) bbd[e->dest->index].node->get_key (), key);
813 : : }
814 : 1162303 : bbd[e->dest->index].heap->replace_key
815 : 1162303 : (bbd[e->dest->index].node, key);
816 : : }
817 : : }
818 : : }
819 : : }
820 : :
821 : 2317876 : delete (*heap);
822 : :
823 : : /* "Return" the new heap. */
824 : 2317876 : *heap = new_heap;
825 : 2317876 : }
826 : :
827 : : /* Create a duplicate of the basic block OLD_BB and redirect edge E to it,
828 : : add it to the trace after BB, mark OLD_BB visited and update the pass's
829 : : data structures (TRACE is the number of the trace OLD_BB is duplicated to). */
830 : :
831 : : static basic_block
832 : 258034 : copy_bb (basic_block old_bb, edge e, basic_block bb, int trace)
833 : : {
834 : 258034 : basic_block new_bb;
835 : :
836 : 258034 : new_bb = duplicate_block (old_bb, e, bb);
837 : 258034 : BB_COPY_PARTITION (new_bb, old_bb);
838 : :
839 : 258034 : gcc_assert (e->dest == new_bb);
840 : :
841 : 258034 : if (dump_file)
842 : 0 : fprintf (dump_file,
843 : : "Duplicated bb %d (created bb %d)\n",
844 : : old_bb->index, new_bb->index);
845 : :
846 : 258034 : if (new_bb->index >= array_size
847 : 257713 : || last_basic_block_for_fn (cfun) > array_size)
848 : : {
849 : 321 : int i;
850 : 321 : int new_size;
851 : :
852 : 321 : new_size = MAX (last_basic_block_for_fn (cfun), new_bb->index + 1);
853 : 321 : new_size = GET_ARRAY_SIZE (new_size);
854 : 321 : bbd = XRESIZEVEC (bbro_basic_block_data, bbd, new_size);
855 : 29336 : for (i = array_size; i < new_size; i++)
856 : : {
857 : 29015 : bbd[i].start_of_trace = -1;
858 : 29015 : bbd[i].end_of_trace = -1;
859 : 29015 : bbd[i].in_trace = -1;
860 : 29015 : bbd[i].visited = 0;
861 : 29015 : bbd[i].priority = -1;
862 : 29015 : bbd[i].heap = NULL;
863 : 29015 : bbd[i].node = NULL;
864 : : }
865 : 321 : array_size = new_size;
866 : :
867 : 321 : if (dump_file)
868 : : {
869 : 0 : fprintf (dump_file,
870 : : "Growing the dynamic array to %d elements.\n",
871 : : array_size);
872 : : }
873 : : }
874 : :
875 : 258034 : gcc_assert (!bb_visited_trace (e->dest));
876 : 258034 : mark_bb_visited (new_bb, trace);
877 : 258034 : new_bb->aux = bb->aux;
878 : 258034 : bb->aux = new_bb;
879 : :
880 : 258034 : bbd[new_bb->index].in_trace = trace;
881 : :
882 : 258034 : return new_bb;
883 : : }
884 : :
885 : : /* Compute and return the key (for the heap) of the basic block BB. */
886 : :
887 : : static long
888 : 9035089 : bb_to_key (basic_block bb)
889 : : {
890 : 9035089 : edge e;
891 : 9035089 : edge_iterator ei;
892 : :
893 : : /* Use index as key to align with its original order. */
894 : 9035089 : if (optimize_function_for_size_p (cfun))
895 : 580740 : return bb->index;
896 : :
897 : : /* Do not start in probably never executed blocks. */
898 : :
899 : 8454349 : if (BB_PARTITION (bb) == BB_COLD_PARTITION
900 : 8454349 : || probably_never_executed_bb_p (cfun, bb))
901 : 1858175 : return BB_FREQ_MAX;
902 : :
903 : : /* Prefer blocks whose predecessor is an end of some trace
904 : : or whose predecessor edge is EDGE_DFS_BACK. */
905 : 6596174 : int priority = bbd[bb->index].priority;
906 : 6596174 : if (priority == -1)
907 : : {
908 : 3721236 : priority = 0;
909 : 9076326 : FOR_EACH_EDGE (e, ei, bb->preds)
910 : : {
911 : 5355090 : if ((e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
912 : 4809680 : && bbd[e->src->index].end_of_trace >= 0)
913 : 5355090 : || (e->flags & EDGE_DFS_BACK))
914 : : {
915 : 41968 : int edge_freq = EDGE_FREQUENCY (e);
916 : :
917 : 41968 : if (edge_freq > priority)
918 : 5355090 : priority = edge_freq;
919 : : }
920 : : }
921 : 3721236 : bbd[bb->index].priority = priority;
922 : : }
923 : :
924 : 6596174 : if (priority)
925 : : /* The block with priority should have significantly lower key. */
926 : 1411249 : return -(100 * BB_FREQ_MAX + 100 * priority + bb->count.to_frequency (cfun));
927 : :
928 : 5184925 : return -bb->count.to_frequency (cfun);
929 : : }
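: : /* Worked example (BB_FREQ_MAX is 10000): a block with frequency 2000 and
: : no interesting predecessor gets key -2000, while the same block with a
: : cached priority of 300 gets key
: : -(100 * 10000 + 100 * 300 + 2000) == -1032000.  The heap extracts the
: : minimum key first, so blocks that can extend an existing trace are
: : strongly preferred as seeds.  */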
930 : :
931 : : /* Return true when the edge E from basic block BB is better than the temporary
932 : : best edge (details are in the function).  The probability of edge E is PROB.  The
933 : : count of the successor is COUNT. The current best probability is
934 : : BEST_PROB, the best count is BEST_COUNT.
935 : : The edge is considered to be equivalent when PROB does not differ much from
936 : : BEST_PROB; similarly for count. */
937 : :
938 : : static bool
939 : 6101917 : better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
940 : : profile_count count, profile_probability best_prob,
941 : : profile_count best_count, const_edge cur_best_edge)
942 : : {
943 : 6101917 : bool is_better_edge;
944 : :
945 : : /* The BEST_* values do not have to be best, but can be a bit smaller than
946 : : maximum values. */
947 : 6101917 : profile_probability diff_prob = best_prob / 10;
948 : :
949 : : /* The smaller one is better to keep the original order. */
950 : 6101917 : if (optimize_function_for_size_p (cfun))
951 : 379206 : return !cur_best_edge
952 : 710826 : || cur_best_edge->dest->index > e->dest->index;
953 : :
954 : : /* Those edges are so expensive that continuing a trace is not useful
955 : : performance wise. */
956 : 5722711 : if (e->flags & (EDGE_ABNORMAL | EDGE_EH))
957 : : return false;
958 : :
959 : 11140123 : if (prob > best_prob + diff_prob
960 : 5417412 : || (!best_prob.initialized_p ()
961 : 10005983 : && prob > profile_probability::guessed_never ()))
962 : : /* The edge has higher probability than the temporary best edge. */
963 : : is_better_edge = true;
964 : 872506 : else if (prob < best_prob - diff_prob)
965 : : /* The edge has lower probability than the temporary best edge. */
966 : : is_better_edge = false;
967 : : else
968 : : {
969 : 267030 : profile_count diff_count = best_count / 10;
970 : 267030 : if (count < best_count - diff_count
971 : 267030 : || (!best_count.initialized_p ()
972 : 64501 : && count.nonzero_p ()))
973 : : /* The edge and the temporary best edge have almost equivalent
974 : : probabilities.  A higher count of a successor now means
975 : : that there is another edge going into that successor.
976 : : This successor has a lower count, so it is better.  */
977 : : is_better_edge = true;
978 : 202529 : else if (count > best_count + diff_count)
979 : : /* This successor has a higher count, so it is worse.  */
980 : : is_better_edge = false;
981 : 109954 : else if (e->dest->prev_bb == bb)
982 : : /* The edges have equivalent probabilities and the successors
983 : : have equivalent frequencies. Select the previous successor. */
984 : : is_better_edge = true;
985 : : else
986 : 267030 : is_better_edge = false;
987 : : }
988 : :
989 : : return is_better_edge;
990 : : }
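: : /* Example of the 10% slack above: with BEST_PROB == 60%, DIFF_PROB is 6%,
: : so a candidate wins outright only above 66% and loses outright below 54%.
: : In between, the successor counts are compared with the same slack (a
: : lower destination count is better, since a high count suggests another
: : edge also enters that successor), and the final tie-break keeps the
: : existing layout (e->dest->prev_bb == bb).  */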
991 : :
992 : : /* Return true when the edge E is better than the temporary best edge
993 : : CUR_BEST_EDGE. If SRC_INDEX_P is true, the function compares the src bb of
994 : : E and CUR_BEST_EDGE; otherwise it will compare the dest bb.
995 : : BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE.
996 : : TRACES record the information about traces.
997 : : When optimizing for size, the edge with smaller index is better.
998 : : When optimizing for speed, the edge with bigger probability or longer trace
999 : : is better. */
1000 : :
1001 : : static bool
1002 : 1690409 : connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
1003 : : const_edge cur_best_edge, struct trace *traces)
1004 : : {
1005 : 1690409 : int e_index;
1006 : 1690409 : int b_index;
1007 : 1690409 : bool is_better_edge;
1008 : :
1009 : 1690409 : if (!cur_best_edge)
1010 : : return true;
1011 : :
1012 : 421370 : if (optimize_function_for_size_p (cfun))
1013 : : {
1014 : 45754 : e_index = src_index_p ? e->src->index : e->dest->index;
1015 : 45754 : b_index = src_index_p ? cur_best_edge->src->index
1016 : 43603 : : cur_best_edge->dest->index;
1017 : : /* The smaller one is better to keep the original order. */
1018 : 45754 : return b_index > e_index;
1019 : : }
1020 : :
1021 : 375616 : if (src_index_p)
1022 : : {
1023 : 35515 : e_index = e->src->index;
1024 : :
1025 : : /* We are looking for a predecessor, so probabilities are not that
1026 : : informative. We do not want to connect A to B because A has
1027 : : only one successor (probability is 100%) while there is edge
1028 : : A' to B where probability is 90% but which is much more frequent. */
1029 : 35515 : if (e->count () > cur_best_edge->count ())
1030 : : /* The edge has a higher count than the temporary best edge. */
1031 : : is_better_edge = true;
1032 : 25206 : else if (e->count () < cur_best_edge->count ())
1033 : : /* The edge has a lower count than the temporary best edge. */
1034 : : is_better_edge = false;
1035 : 10045 : else if (e->probability > cur_best_edge->probability)
1036 : : /* The edge has higher probability than the temporary best edge. */
1037 : : is_better_edge = true;
1038 : 9561 : else if (e->probability < cur_best_edge->probability)
1039 : : /* The edge has lower probability than the temporary best edge. */
1040 : : is_better_edge = false;
1041 : 8580 : else if (traces[bbd[e_index].end_of_trace].length > best_len)
1042 : : /* The edge and the temporary best edge have equivalent probabilities.
1043 : : The edge with longer trace is better. */
1044 : : is_better_edge = true;
1045 : : else
1046 : : is_better_edge = false;
1047 : : }
1048 : : else
1049 : : {
1050 : 340101 : e_index = e->dest->index;
1051 : :
1052 : 340101 : if (e->probability > cur_best_edge->probability)
1053 : : /* The edge has higher probability than the temporary best edge. */
1054 : : is_better_edge = true;
1055 : 215705 : else if (e->probability < cur_best_edge->probability)
1056 : : /* The edge has lower probability than the temporary best edge. */
1057 : : is_better_edge = false;
1058 : 93180 : else if (traces[bbd[e_index].start_of_trace].length > best_len)
1059 : : /* The edge and the temporary best edge have equivalent probabilities.
1060 : : The edge with longer trace is better. */
1061 : : is_better_edge = true;
1062 : : else
1063 : : is_better_edge = false;
1064 : : }
1065 : :
1066 : : return is_better_edge;
1067 : : }
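: : /* In short: when searching for a predecessor (SRC_INDEX_P), ties are
: : broken by edge count, then probability, then trace length; when searching
: : for a successor, by probability, then trace length.  E.g. two successor
: : edges of equal probability are decided in favor of the one whose trace is
: : longer, favoring longer fallthrough chains.  */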
1068 : :
1069 : : /* Connect traces in array TRACES; N_TRACES is the count of traces.  */
1070 : :
1071 : : static void
1072 : 579469 : connect_traces (int n_traces, struct trace *traces)
1073 : : {
1074 : 579469 : int i;
1075 : 579469 : bool *connected;
1076 : 579469 : bool two_passes;
1077 : 579469 : int last_trace;
1078 : 579469 : int current_pass;
1079 : 579469 : int current_partition;
1080 : 579469 : profile_count count_threshold;
1081 : 579469 : bool for_size = optimize_function_for_size_p (cfun);
1082 : :
1083 : 579469 : count_threshold = max_entry_count.apply_scale (DUPLICATION_THRESHOLD, 1000);
1084 : :
1085 : 579469 : connected = XCNEWVEC (bool, n_traces);
1086 : 579469 : last_trace = -1;
1087 : 579469 : current_pass = 1;
1088 : 579469 : current_partition = BB_PARTITION (traces[0].first);
1089 : 579469 : two_passes = false;
1090 : :
1091 : 579469 : if (crtl->has_bb_partition)
1092 : 469407 : for (i = 0; i < n_traces && !two_passes; i++)
1093 : 409323 : if (BB_PARTITION (traces[0].first)
1094 : 409323 : != BB_PARTITION (traces[i].first))
1095 : 60080 : two_passes = true;
1096 : :
1097 : 5215761 : for (i = 0; i < n_traces || (two_passes && current_pass == 1) ; i++)
1098 : : {
1099 : 4636292 : int t = i;
1100 : 4636292 : int t2;
1101 : 4636292 : edge e, best;
1102 : 4636292 : int best_len;
1103 : :
1104 : 4636292 : if (i >= n_traces)
1105 : : {
1106 : 60080 : gcc_assert (two_passes && current_pass == 1);
1107 : 60080 : i = 0;
1108 : 60080 : t = i;
1109 : 60080 : current_pass = 2;
1110 : 60080 : if (current_partition == BB_HOT_PARTITION)
1111 : : current_partition = BB_COLD_PARTITION;
1112 : : else
1113 : 0 : current_partition = BB_HOT_PARTITION;
1114 : : }
1115 : :
1116 : 4636292 : if (connected[t])
1117 : 1847812 : continue;
1118 : :
1119 : 2945854 : if (two_passes
1120 : 555340 : && BB_PARTITION (traces[t].first) != current_partition)
1121 : 157374 : continue;
1122 : :
1123 : 2788480 : connected[t] = true;
1124 : :
1125 : : /* Find the predecessor traces. */
1126 : 2881674 : for (t2 = t; t2 > 0;)
1127 : : {
1128 : 2302205 : edge_iterator ei;
1129 : 2302205 : best = NULL;
1130 : 2302205 : best_len = 0;
1131 : 5791147 : FOR_EACH_EDGE (e, ei, traces[t2].first->preds)
1132 : : {
1133 : 3488942 : int si = e->src->index;
1134 : :
1135 : 3488942 : if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
1136 : : && (e->flags & EDGE_CAN_FALLTHRU)
1137 : 3488942 : && !(e->flags & EDGE_COMPLEX)
1138 : 2739845 : && bbd[si].end_of_trace >= 0
1139 : 458703 : && !connected[bbd[si].end_of_trace]
1140 : 130865 : && (BB_PARTITION (e->src) == current_partition)
1141 : 3619802 : && connect_better_edge_p (e, true, best_len, best, traces))
1142 : : {
1143 : 105054 : best = e;
1144 : 105054 : best_len = traces[bbd[si].end_of_trace].length;
1145 : : }
1146 : : }
1147 : 2302205 : if (best)
1148 : : {
1149 : 93194 : best->src->aux = best->dest;
1150 : 93194 : t2 = bbd[best->src->index].end_of_trace;
1151 : 93194 : connected[t2] = true;
1152 : :
1153 : 93194 : if (dump_file)
1154 : : {
1155 : 0 : fprintf (dump_file, "Connection: %d %d\n",
1156 : : best->src->index, best->dest->index);
1157 : : }
1158 : : }
1159 : : else
1160 : : break;
1161 : : }
1162 : :
1163 : 2788480 : if (last_trace >= 0)
1164 : 2209011 : traces[last_trace].last->aux = traces[t2].first;
1165 : : last_trace = t;
1166 : :
1167 : : /* Find the successor traces. */
1168 : 5194040 : while (1)
1169 : : {
1170 : : /* Find the continuation of the chain. */
1171 : 3991260 : edge_iterator ei;
1172 : 3991260 : best = NULL;
1173 : 3991260 : best_len = 0;
1174 : 8808818 : FOR_EACH_EDGE (e, ei, traces[t].last->succs)
1175 : : {
1176 : 4817558 : int di = e->dest->index;
1177 : :
1178 : 4817558 : if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1179 : : && (e->flags & EDGE_CAN_FALLTHRU)
1180 : 4147947 : && !(e->flags & EDGE_COMPLEX)
1181 : 3843936 : && bbd[di].start_of_trace >= 0
1182 : 2085833 : && !connected[bbd[di].start_of_trace]
1183 : 1575487 : && (BB_PARTITION (e->dest) == current_partition)
1184 : 6377107 : && connect_better_edge_p (e, false, best_len, best, traces))
1185 : : {
1186 : 1361860 : best = e;
1187 : 1361860 : best_len = traces[bbd[di].start_of_trace].length;
1188 : : }
1189 : : }
1190 : :
1191 : 3991260 : if (for_size)
1192 : : {
1193 : 259918 : if (!best)
1194 : : /* Stop finding the successor traces. */
1195 : : break;
1196 : :
1197 : : /* It is OK to connect block n with block n + 1 or a block
1198 : : before n. For others, only connect to the loop header. */
1199 : 186462 : if (best->dest->index > (traces[t].last->index + 1))
1200 : : {
1201 : 20423 : int count = EDGE_COUNT (best->dest->preds);
1202 : :
1203 : 325495 : FOR_EACH_EDGE (e, ei, best->dest->preds)
1204 : 305072 : if (e->flags & EDGE_DFS_BACK)
1205 : 4768 : count--;
1206 : :
1207 : : /* If dest has multiple predecessors, skip it. We expect
1208 : : that one predecessor with smaller index connects with it
1209 : : later. */
1210 : 20423 : if (count != 1)
1211 : : break;
1212 : : }
1213 : :
1214 : : /* Only connect Trace n with Trace n + 1. It is conservative
1215 : : to keep the order as close as possible to the original order.
1216 : : It also helps to reduce long jumps. */
1217 : 172325 : if (last_trace != bbd[best->dest->index].start_of_trace - 1)
1218 : : break;
1219 : :
1220 : 170010 : if (dump_file)
1221 : 5 : fprintf (dump_file, "Connection: %d %d\n",
1222 : 5 : best->src->index, best->dest->index);
1223 : :
1224 : 170010 : t = bbd[best->dest->index].start_of_trace;
1225 : 170010 : traces[last_trace].last->aux = traces[t].first;
1226 : 170010 : connected[t] = true;
1227 : 170010 : last_trace = t;
1228 : : }
1229 : 3731342 : else if (best)
1230 : : {
1231 : 989383 : if (dump_file)
1232 : : {
1233 : 49 : fprintf (dump_file, "Connection: %d %d\n",
1234 : 49 : best->src->index, best->dest->index);
1235 : : }
1236 : 989383 : t = bbd[best->dest->index].start_of_trace;
1237 : 989383 : traces[last_trace].last->aux = traces[t].first;
1238 : 989383 : connected[t] = true;
1239 : 989383 : last_trace = t;
1240 : : }
1241 : : else
1242 : : {
1243 : : /* Try to connect the traces by duplication of 1 block. */
1244 : 2741959 : edge e2;
1245 : 2741959 : basic_block next_bb = NULL;
1246 : 2741959 : bool try_copy = false;
1247 : :
1248 : 5362257 : FOR_EACH_EDGE (e, ei, traces[t].last->succs)
1249 : 2620298 : if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1250 : : && (e->flags & EDGE_CAN_FALLTHRU)
1251 : 1985953 : && !(e->flags & EDGE_COMPLEX)
1252 : 6294571 : && (!best || e->probability > best->probability))
1253 : : {
1254 : 1673952 : edge_iterator ei;
1255 : 1673952 : edge best2 = NULL;
1256 : 1673952 : int best2_len = 0;
1257 : :
1258 : : /* If the destination is a start of a trace which is only
1259 : : one block long, then there is no need to search the successor
1260 : : blocks of the trace. Accept it. */
1261 : 1673952 : if (bbd[e->dest->index].start_of_trace >= 0
1262 : 373946 : && traces[bbd[e->dest->index].start_of_trace].length
1263 : : == 1)
1264 : : {
1265 : 274804 : best = e;
1266 : 274804 : try_copy = true;
1267 : 274804 : continue;
1268 : : }
1269 : :
1270 : 3588502 : FOR_EACH_EDGE (e2, ei, e->dest->succs)
1271 : : {
1272 : 2189354 : int di = e2->dest->index;
1273 : :
1274 : 2189354 : if (e2->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
1275 : 2189354 : || ((e2->flags & EDGE_CAN_FALLTHRU)
1276 : 1934261 : && !(e2->flags & EDGE_COMPLEX)
1277 : 1807799 : && bbd[di].start_of_trace >= 0
1278 : 644666 : && !connected[bbd[di].start_of_trace]
1279 : 226028 : && BB_PARTITION (e2->dest) == current_partition
1280 : 213455 : && e2->count () >= count_threshold
1281 : 101582 : && (!best2
1282 : 824 : || e2->probability > best2->probability
1283 : 1833470 : || (e2->probability == best2->probability
1284 : 145 : && traces[bbd[di].start_of_trace].length
1285 : : > best2_len))))
1286 : : {
1287 : 356029 : best = e;
1288 : 356029 : best2 = e2;
1289 : 356029 : if (e2->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
1290 : 100936 : best2_len = traces[bbd[di].start_of_trace].length;
1291 : : else
1292 : : best2_len = INT_MAX;
1293 : : next_bb = e2->dest;
1294 : : try_copy = true;
1295 : : }
1296 : : }
1297 : : }
1298 : :
1299 : : /* Copy tiny blocks always; copy larger blocks only when the
1300 : : edge is traversed frequently enough. */
1301 : 2741959 : if (try_copy
1302 : 624219 : && BB_PARTITION (best->src) == BB_PARTITION (best->dest)
1303 : 3364181 : && copy_bb_p (best->dest,
1304 : 622222 : optimize_edge_for_speed_p (best)
1305 : 989983 : && (!best->count ().initialized_p ()
1306 : 2858830 : || best->count () >= count_threshold)))
1307 : : {
1308 : 250816 : basic_block new_bb;
1309 : :
1310 : 250816 : if (dump_file)
1311 : : {
1312 : 0 : fprintf (dump_file, "Connection: %d %d ",
1313 : 0 : traces[t].last->index, best->dest->index);
1314 : 0 : if (!next_bb)
1315 : 0 : fputc ('\n', dump_file);
1316 : 0 : else if (next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
1317 : 0 : fprintf (dump_file, "exit\n");
1318 : : else
1319 : 0 : fprintf (dump_file, "%d\n", next_bb->index);
1320 : : }
1321 : :
1322 : 250816 : new_bb = copy_bb (best->dest, best, traces[t].last, t);
1323 : 250816 : traces[t].last = new_bb;
1324 : 250816 : if (next_bb && next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
1325 : : {
1326 : 43387 : t = bbd[next_bb->index].start_of_trace;
1327 : 43387 : traces[last_trace].last->aux = traces[t].first;
1328 : 43387 : connected[t] = true;
1329 : 43387 : last_trace = t;
1330 : : }
1331 : : else
1332 : : break; /* Stop finding the successor traces. */
1333 : : }
1334 : : else
1335 : : break; /* Stop finding the successor traces. */
1336 : : }
1337 : 1202780 : }
1338 : : }
1339 : :
1340 : 579469 : if (dump_file)
1341 : : {
1342 : 23 : basic_block bb;
1343 : :
1344 : 23 : fprintf (dump_file, "Final order:\n");
1345 : 207 : for (bb = traces[0].first; bb; bb = (basic_block) bb->aux)
1346 : 184 : fprintf (dump_file, "%d ", bb->index);
1347 : 23 : fprintf (dump_file, "\n");
1348 : 23 : fflush (dump_file);
1349 : : }
1350 : :
1351 : 579469 : FREE (connected);
1352 : 579469 : }
1353 : :
1354 : : /* Return true when BB can and should be copied. CODE_MAY_GROW is true
1355 : : when code size is allowed to grow by duplication. */
1356 : :
1357 : : static bool
1358 : 3150192 : copy_bb_p (const_basic_block bb, int code_may_grow)
1359 : : {
1360 : 3150192 : unsigned int size = 0;
1361 : 3150192 : unsigned int max_size = uncond_jump_length;
1362 : 3150192 : rtx_insn *insn;
1363 : :
1364 : 5974217 : if (EDGE_COUNT (bb->preds) < 2)
1365 : : return false;
1366 : 3146905 : if (!can_duplicate_block_p (bb))
1367 : : return false;
1368 : :
1369 : : /* Avoid duplicating blocks which have many successors (PR/13430). */
1370 : 3145561 : if (EDGE_COUNT (bb->succs) > 8)
1371 : : return false;
1372 : :
1373 : 3145557 : if (code_may_grow && optimize_bb_for_speed_p (bb))
1374 : 263823 : max_size *= param_max_grow_copy_bb_insns;
1375 : :
1376 : 21293249 : FOR_BB_INSNS (bb, insn)
1377 : : {
1378 : 20970369 : if (INSN_P (insn))
1379 : : {
1380 : 13759719 : size += get_attr_min_length (insn);
1381 : 13759719 : if (size > max_size)
1382 : : break;
1383 : : }
1384 : : }
1385 : :
1386 : 3145557 : if (size <= max_size)
1387 : : return true;
1388 : :
1389 : 2822677 : if (dump_file)
1390 : : {
1391 : 57 : fprintf (dump_file,
1392 : : "Block %d can't be copied because its size = %u.\n",
1393 : 57 : bb->index, size);
1394 : : }
1395 : :
1396 : : return false;
1397 : : }
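: : /* Example (assuming the default of 8 for param_max_grow_copy_bb_insns):
: : if an unconditional jump is 5 bytes long on the target, a block with
: : several predecessors may be copied when its minimum length is at most
: : 5 bytes, or at most 40 bytes when CODE_MAY_GROW holds and the block is
: : optimized for speed.  */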
1398 : :
1399 : : /* Return the length of unconditional jump instruction. */
1400 : :
1401 : : int
1402 : 744560 : get_uncond_jump_length (void)
1403 : : {
1404 : 744560 : unsigned int length;
1405 : :
1406 : 744560 : start_sequence ();
1407 : 744560 : rtx_code_label *label = emit_label (gen_label_rtx ());
1408 : 744560 : rtx_insn *jump = emit_jump_insn (targetm.gen_jump (label));
1409 : 744560 : length = get_attr_min_length (jump);
1410 : 744560 : end_sequence ();
1411 : :
1412 : 744560 : gcc_assert (length < INT_MAX);
1413 : 744560 : return length;
1414 : : }
1415 : :
1416 : : /* Create a forwarder block to OLD_BB starting with NEW_LABEL, placed in
1417 : : the other partition with respect to OLD_BB.  */
1418 : :
1419 : : static basic_block
1420 : 4610 : create_eh_forwarder_block (rtx_code_label *new_label, basic_block old_bb)
1421 : : {
1422 : : /* Split OLD_BB, so that EH pads have always only incoming EH edges,
1423 : : bb_has_eh_pred bbs are treated specially by DF infrastructure. */
1424 : 4610 : old_bb = split_block_after_labels (old_bb)->dest;
1425 : :
1426 : : /* Put the new label and a jump in the new basic block. */
1427 : 4610 : rtx_insn *label = emit_label (new_label);
1428 : 4610 : rtx_code_label *old_label = block_label (old_bb);
1429 : 4610 : rtx_insn *jump = emit_jump_insn (targetm.gen_jump (old_label));
1430 : 4610 : JUMP_LABEL (jump) = old_label;
1431 : :
1432 : : /* Create the new basic block and put it in last position. */
1433 : 4610 : basic_block last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
1434 : 4610 : basic_block new_bb = create_basic_block (label, jump, last_bb);
1435 : 4610 : new_bb->aux = last_bb->aux;
1436 : 4610 : new_bb->count = old_bb->count;
1437 : 4610 : last_bb->aux = new_bb;
1438 : :
1439 : 4610 : emit_barrier_after_bb (new_bb);
1440 : :
1441 : 4610 : make_single_succ_edge (new_bb, old_bb, 0);
1442 : :
1443 : : /* Make sure the new basic block is in the other partition. */
1444 : 4610 : unsigned new_partition = BB_PARTITION (old_bb);
1445 : 4610 : new_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
1446 : 4610 : BB_SET_PARTITION (new_bb, new_partition);
1447 : :
1448 : 4610 : return new_bb;
1449 : : }
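: : /* BB_HOT_PARTITION and BB_COLD_PARTITION are disjoint flag bits, so
: : XOR-ing with their union flips one into the other:
: : BB_HOT_PARTITION ^ (BB_HOT_PARTITION | BB_COLD_PARTITION)
: : == BB_COLD_PARTITION, and vice versa.  */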
1450 : :
1451 : : /* The common landing pad in block OLD_BB has edges from both partitions.
1452 : : Add a new landing pad that will just jump to the old one and split the
1453 : : edges so that no EH edge crosses partitions. */
1454 : :
1455 : : static void
1456 : 0 : sjlj_fix_up_crossing_landing_pad (basic_block old_bb)
1457 : : {
1458 : 0 : const unsigned lp_len = cfun->eh->lp_array->length ();
1459 : 0 : edge_iterator ei;
1460 : 0 : edge e;
1461 : :
1462 : : /* Generate the new common landing-pad label. */
1463 : 0 : rtx_code_label *new_label = gen_label_rtx ();
1464 : 0 : LABEL_PRESERVE_P (new_label) = 1;
1465 : :
1466 : : /* Create the forwarder block. */
1467 : 0 : basic_block new_bb = create_eh_forwarder_block (new_label, old_bb);
1468 : :
1469 : : /* Create the map from old to new lp index and initialize it. */
1470 : 0 : unsigned *index_map = (unsigned *) alloca (lp_len * sizeof (unsigned));
1471 : 0 : memset (index_map, 0, lp_len * sizeof (unsigned));
1472 : :
1473 : : /* Fix up the edges. */
1474 : 0 : for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
1475 : 0 : if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
1476 : : {
1477 : 0 : rtx_insn *insn = BB_END (e->src);
1478 : 0 : rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
1479 : :
1480 : 0 : gcc_assert (note != NULL);
1481 : 0 : const unsigned old_index = INTVAL (XEXP (note, 0));
1482 : :
1483 : : /* Generate the new landing-pad structure. */
1484 : 0 : if (index_map[old_index] == 0)
1485 : : {
1486 : 0 : eh_landing_pad old_lp = (*cfun->eh->lp_array)[old_index];
1487 : 0 : eh_landing_pad new_lp = gen_eh_landing_pad (old_lp->region);
1488 : 0 : new_lp->post_landing_pad = old_lp->post_landing_pad;
1489 : 0 : new_lp->landing_pad = new_label;
1490 : 0 : index_map[old_index] = new_lp->index;
1491 : : }
1492 : 0 : XEXP (note, 0) = GEN_INT (index_map[old_index]);
1493 : :
1494 : : /* Adjust the edge to the new destination. */
1495 : 0 : redirect_edge_succ (e, new_bb);
1496 : 0 : }
1497 : : else
1498 : 0 : ei_next (&ei);
1499 : 0 : }
1500 : :
1501 : : /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
1502 : : Add a new landing pad that will just jump to the old one and split the
1503 : : edges so that no EH edge crosses partitions. */
1504 : :
1505 : : static void
1506 : 4610 : dw2_fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb)
1507 : : {
1508 : 4610 : eh_landing_pad new_lp;
1509 : 4610 : edge_iterator ei;
1510 : 4610 : edge e;
1511 : :
1512 : : /* Generate the new landing-pad structure. */
1513 : 4610 : new_lp = gen_eh_landing_pad (old_lp->region);
1514 : 4610 : new_lp->post_landing_pad = old_lp->post_landing_pad;
1515 : 4610 : new_lp->landing_pad = gen_label_rtx ();
1516 : 4610 : LABEL_PRESERVE_P (new_lp->landing_pad) = 1;
1517 : :
1518 : : /* Create the forwarder block. */
1519 : 4610 : basic_block new_bb = create_eh_forwarder_block (new_lp->landing_pad, old_bb);
1520 : :
1521 : : /* Fix up the edges. */
1522 : 37731 : for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
1523 : 33121 : if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
1524 : : {
1525 : 25309 : rtx_insn *insn = BB_END (e->src);
1526 : 25309 : rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
1527 : :
1528 : 25309 : gcc_assert (note != NULL);
1529 : 25309 : gcc_checking_assert (INTVAL (XEXP (note, 0)) == old_lp->index);
1530 : 25309 : XEXP (note, 0) = GEN_INT (new_lp->index);
1531 : :
1532 : : /* Adjust the edge to the new destination. */
1533 : 25309 : redirect_edge_succ (e, new_bb);
1534 : 25309 : }
1535 : : else
1536 : 7812 : ei_next (&ei);
1537 : 4610 : }
1538 : :
1539 : :
1540 : : /* Ensure that all hot bbs are included in a hot path through the
1541 : : procedure. This is done by calling this function twice, once
1542 : : with WALK_UP true (to look for paths from the entry to hot bbs) and
1543 : : once with WALK_UP false (to look for paths from hot bbs to the exit).
1544 : : Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
1545 : : to BBS_IN_HOT_PARTITION. */
1546 : :
1547 : : static unsigned int
1548 : 120544 : sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
1549 : : vec<basic_block> *bbs_in_hot_partition)
1550 : : {
1551 : : /* Callers check this. */
1552 : 120544 : gcc_checking_assert (cold_bb_count);
1553 : :
1554 : : /* Keep examining hot bbs while we still have some left to check
1555 : : and there are remaining cold bbs. */
1556 : 120544 : vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
1557 : 120544 : while (! hot_bbs_to_check.is_empty ()
1558 : 4799023 : && cold_bb_count)
1559 : : {
1560 : 2339153 : basic_block bb = hot_bbs_to_check.pop ();
1561 : 2339153 : vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
1562 : 2339153 : edge e;
1563 : 2339153 : edge_iterator ei;
1564 : 2339153 : profile_probability highest_probability
1565 : 2339153 : = profile_probability::uninitialized ();
1566 : 2339153 : profile_count highest_count = profile_count::uninitialized ();
1567 : 2339153 : bool found = false;
1568 : :
1569 : : /* Walk the preds/succs and check if there is at least one already
1570 : : marked hot. Keep track of the most frequent pred/succ so that we
1571 : : can mark it hot if we don't find one. */
1572 : 2809741 : FOR_EACH_EDGE (e, ei, edges)
1573 : : {
1574 : 2772724 : basic_block reach_bb = walk_up ? e->src : e->dest;
1575 : :
1576 : 2772724 : if (e->flags & EDGE_DFS_BACK)
1577 : 91641 : continue;
1578 : :
1579 : : /* Do not expect profile insanities when profile was not adjusted. */
1580 : 2692961 : if (e->probability == profile_probability::never ()
1581 : 2321449 : || e->count () == profile_count::zero ())
1582 : 361234 : continue;
1583 : :
1584 : 2319849 : if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
1585 : : {
1586 : : found = true;
1587 : : break;
1588 : : }
1589 : : /* The following loop tracks the hottest edge via the edge count
1590 : : if it is non-zero, falling back to the edge probability
1591 : : otherwise. */
1592 : 17713 : if (!(e->count () < highest_count))
1593 : 17703 : highest_count = e->count ();
1594 : 17713 : if (!highest_probability.initialized_p ()
1595 : 488301 : || e->probability > highest_probability)
1596 : 17701 : highest_probability = e->probability;
1597 : : }
1598 : :
1599 : : /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
1600 : : block (or unpartitioned, e.g. the entry block) then it is ok. If not,
1601 : : then the most frequent pred (or succ) needs to be adjusted. In the
1602 : : case where multiple preds/succs have the same frequency (e.g. a
1603 : : 50-50 branch), both will be adjusted. */
1604 : 2339153 : if (found)
1605 : 2302136 : continue;
1606 : :
1607 : 72192 : FOR_EACH_EDGE (e, ei, edges)
1608 : : {
1609 : 35175 : if (e->flags & EDGE_DFS_BACK)
1610 : 30782 : continue;
1611 : : /* Do not expect profile insanities when the profile was not adjusted. */
1612 : 21312 : if (e->probability == profile_probability::never ()
1613 : 4482 : || e->count () == profile_count::zero ())
1614 : 16876 : continue;
1615 : : /* Select the hottest edge using the edge count if it is non-zero;
1616 : : otherwise fall back to the edge probability. */
1617 : 4393 : if (highest_count.initialized_p ())
1618 : : {
1619 : 4393 : if (!(e->count () >= highest_count))
1620 : 0 : continue;
1621 : : }
1622 : 0 : else if (!(e->probability >= highest_probability))
1623 : 0 : continue;
1624 : :
1625 : 4393 : basic_block reach_bb = walk_up ? e->src : e->dest;
1626 : :
1627 : : /* We have a hot bb whose preds (or succs) are all cold; its
1628 : : hottest pred (or succ) needs to be re-marked hot. */
1629 : 4393 : BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
1630 : 4393 : if (dump_file)
1631 : 0 : fprintf (dump_file, "Promoting bb %i to hot partition to sanitize "
1632 : : "profile of bb %i in %s walk\n", reach_bb->index,
1633 : : bb->index, walk_up ? "backward" : "forward");
1634 : 4393 : cold_bb_count--;
1635 : :
1636 : : /* Now we need to examine the newly-hot reach_bb to see if it too
1637 : : is reachable only via cold bbs. */
1638 : 4393 : bbs_in_hot_partition->safe_push (reach_bb);
1639 : 4393 : hot_bbs_to_check.safe_push (reach_bb);
1640 : : }
1641 : : }
1642 : 120544 : hot_bbs_to_check.release ();
1643 : :
1644 : 120544 : return cold_bb_count;
1645 : : }
1646 : :
1647 : :
1648 : : /* Find the basic blocks that are rarely executed and need to be moved to
1649 : : a separate section of the .o file (to cut down on paging and improve
1650 : : cache locality). Return a vector of all edges that cross. */
1651 : :
1652 : : static vec<edge>
1653 : 255685 : find_rarely_executed_basic_blocks_and_crossing_edges (void)
1654 : : {
1655 : 255685 : vec<edge> crossing_edges = vNULL;
1656 : 255685 : basic_block bb;
1657 : 255685 : edge e;
1658 : 255685 : edge_iterator ei;
1659 : 255685 : unsigned int cold_bb_count = 0;
1660 : 255685 : auto_vec<basic_block> bbs_in_hot_partition;
1661 : :
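: : /* First let "never executed" information flow forward through the
: : CFG, so that the per-block hotness test below sees a consistent
: : profile. */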
1662 : 255685 : propagate_unlikely_bbs_forward ();
1663 : :
1664 : : /* Mark which partition (hot/cold) each basic block belongs in. */
1665 : 4483471 : FOR_EACH_BB_FN (bb, cfun)
1666 : : {
1667 : 4227786 : bool cold_bb = false;
1668 : :
1669 : 4227786 : if (probably_never_executed_bb_p (cfun, bb))
1670 : : {
1671 : 254473 : cold_bb = true;
1672 : :
1673 : : /* Handle profile insanities created by upstream optimizations
1674 : : by also checking the incoming edge weights. If there is a non-cold
1675 : : incoming edge, conservatively prevent this block from being split
1676 : : into the cold section. */
1677 : 254473 : if (!bb->count.precise_p ())
1678 : 136 : FOR_EACH_EDGE (e, ei, bb->preds)
1679 : 76 : if (!probably_never_executed_edge_p (cfun, e))
1680 : : {
1681 : : cold_bb = false;
1682 : : break;
1683 : : }
1684 : : }
1685 : 60 : if (cold_bb)
1686 : : {
1687 : 254473 : BB_SET_PARTITION (bb, BB_COLD_PARTITION);
1688 : 254473 : cold_bb_count++;
1689 : : }
1690 : : else
1691 : : {
1692 : 3973313 : BB_SET_PARTITION (bb, BB_HOT_PARTITION);
1693 : 3973313 : bbs_in_hot_partition.safe_push (bb);
1694 : : }
1695 : : }
1696 : :
1697 : : /* Ensure that hot bbs are included along a hot path from the entry to exit.
1698 : : Several different scenarios can leave only cold bbs along all paths
1699 : : to/from a hot bb. One is that there are edge weight insanities
1700 : : due to optimization phases that do not properly update basic block profile
1701 : : counts. The second is that the entry of the function may not be hot, because
1702 : : it is entered fewer times than the number of profile training runs, but there
1703 : : is a loop inside the function that causes blocks within the function to be
1704 : : above the threshold for hotness. This is fixed by walking up from hot bbs
1705 : : to the entry block, and then down from hot bbs to the exit, performing
1706 : : partitioning fixups as necessary. */
1707 : 255685 : if (cold_bb_count)
1708 : : {
1709 : 60272 : mark_dfs_back_edges ();
1710 : 60272 : cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
1711 : : &bbs_in_hot_partition);
1712 : 60272 : if (cold_bb_count)
1713 : 60272 : sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);
1714 : :
1715 : 60272 : hash_set <basic_block> set;
1716 : 60272 : find_bbs_reachable_by_hot_paths (&set);
1717 : 1484938 : FOR_EACH_BB_FN (bb, cfun)
1718 : 1424666 : if (!set.contains (bb))
1719 : 250080 : BB_SET_PARTITION (bb, BB_COLD_PARTITION);
1720 : 60272 : }
1721 : :
1722 : : /* The format of .gcc_except_table does not allow landing pads to
1723 : : be in a different partition than the throw. Fix this by either
1724 : : moving the landing pads or inserting forwarder landing pads. */
1725 : 255685 : if (cfun->eh->lp_array)
1726 : : {
1727 : 255685 : const bool sjlj
1728 : 255685 : = (targetm_common.except_unwind_info (&global_options) == UI_SJLJ);
1729 : 255685 : unsigned i;
1730 : 255685 : eh_landing_pad lp;
1731 : :
1732 : 819453 : FOR_EACH_VEC_ELT (*cfun->eh->lp_array, i, lp)
1733 : : {
1734 : 563768 : bool all_same, all_diff;
1735 : :
1736 : 563768 : if (lp == NULL
1737 : 76651 : || lp->landing_pad == NULL_RTX
1738 : 76651 : || !LABEL_P (lp->landing_pad))
1739 : 487157 : continue;
1740 : :
1741 : 76611 : all_same = all_diff = true;
1742 : 76611 : bb = BLOCK_FOR_INSN (lp->landing_pad);
1743 : 254061 : FOR_EACH_EDGE (e, ei, bb->preds)
1744 : : {
1745 : 177450 : gcc_assert (e->flags & EDGE_EH);
1746 : 177450 : if (BB_PARTITION (bb) == BB_PARTITION (e->src))
1747 : : all_diff = false;
1748 : : else
1749 : 131189 : all_same = false;
1750 : : }
1751 : :
1752 : 76611 : if (all_same)
1753 : : ;
1754 : 60885 : else if (all_diff)
1755 : : {
1756 : 56275 : int which = BB_PARTITION (bb);
1757 : 56275 : which ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
1758 : 56275 : BB_SET_PARTITION (bb, which);
1759 : : }
1760 : 4610 : else if (sjlj)
1761 : 0 : sjlj_fix_up_crossing_landing_pad (bb);
1762 : : else
1763 : 4610 : dw2_fix_up_crossing_landing_pad (lp, bb);
1764 : :
1765 : : /* There is a single, common landing pad in SJLJ mode. */
1766 : 76611 : if (sjlj)
1767 : : break;
1768 : : }
1769 : : }
1770 : :
1771 : : /* Mark every edge that crosses between sections. */
1772 : 4492691 : FOR_EACH_BB_FN (bb, cfun)
1773 : 10684450 : FOR_EACH_EDGE (e, ei, bb->succs)
1774 : : {
1775 : 6447444 : unsigned int flags = e->flags;
1776 : :
1777 : : /* We should never have EDGE_CROSSING set yet. */
1778 : 6447444 : gcc_checking_assert ((flags & EDGE_CROSSING) == 0);
1779 : :
1780 : 6447444 : if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
1781 : 6447444 : && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1782 : 6157710 : && BB_PARTITION (e->src) != BB_PARTITION (e->dest))
1783 : : {
1784 : 407914 : crossing_edges.safe_push (e);
1785 : 407914 : flags |= EDGE_CROSSING;
1786 : : }
1787 : :
1788 : : /* Now that we've split eh edges as appropriate, allow landing pads
1789 : : to be merged with the post-landing pads. */
1790 : 6447444 : flags &= ~EDGE_PRESERVE;
1791 : :
1792 : 6447444 : e->flags = flags;
1793 : : }
1794 : :
1795 : 255685 : return crossing_edges;
1796 : 255685 : }
1797 : :
1798 : : /* Set the flag EDGE_CAN_FALLTHRU for edges that can be fallthru. */
1799 : :
1800 : : static void
1801 : 612723 : set_edge_can_fallthru_flag (void)
1802 : : {
1803 : 612723 : basic_block bb;
1804 : :
1805 : 10160952 : FOR_EACH_BB_FN (bb, cfun)
1806 : : {
1807 : 9548229 : edge e;
1808 : 9548229 : edge_iterator ei;
1809 : :
1810 : 23866280 : FOR_EACH_EDGE (e, ei, bb->succs)
1811 : : {
1812 : 14318051 : e->flags &= ~EDGE_CAN_FALLTHRU;
1813 : :
1814 : : /* The FALLTHRU edge is also a CAN_FALLTHRU edge. */
1815 : 14318051 : if (e->flags & EDGE_FALLTHRU)
1816 : 8232391 : e->flags |= EDGE_CAN_FALLTHRU;
1817 : : }
1818 : :
1819 : : /* If the BB ends with an invertible condjump, all (2) edges are
1820 : : CAN_FALLTHRU edges. */
1821 : 9548229 : if (EDGE_COUNT (bb->succs) != 2)
1822 : 4908925 : continue;
1823 : 5177021 : if (!any_condjump_p (BB_END (bb)))
1824 : 537717 : continue;
1825 : :
1826 : 4639304 : rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb));
1827 : 4639304 : if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0))
1828 : 0 : continue;
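: : /* The jump proved invertible; invert it back to restore the
: : original condition. */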
1829 : 4639304 : invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0);
1830 : 4639304 : EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU;
1831 : 4639304 : EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU;
1832 : : }
1833 : 612723 : }
1834 : :
1835 : : /* If any destination of a crossing edge does not have a label, add one;
1836 : : convert any easy fall-through crossing edges to unconditional jumps. */
1837 : :
1838 : : static void
1839 : 60093 : add_labels_and_missing_jumps (vec<edge> crossing_edges)
1840 : : {
1841 : 60093 : size_t i;
1842 : 60093 : edge e;
1843 : :
1844 : 468007 : FOR_EACH_VEC_ELT (crossing_edges, i, e)
1845 : : {
1846 : 407914 : basic_block src = e->src;
1847 : 407914 : basic_block dest = e->dest;
1848 : 407914 : rtx_jump_insn *new_jump;
1849 : :
1850 : 407914 : if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
1851 : 0 : continue;
1852 : :
1853 : : /* Make sure dest has a label. */
1854 : 407914 : rtx_code_label *label = block_label (dest);
1855 : :
1856 : : /* Nothing to do for non-fallthru edges. */
1857 : 407914 : if (src == ENTRY_BLOCK_PTR_FOR_FN (cfun))
1858 : 0 : continue;
1859 : 407914 : if ((e->flags & EDGE_FALLTHRU) == 0)
1860 : 242163 : continue;
1861 : :
1862 : : /* If the block does not end with a control flow insn, then we
1863 : : can trivially add a jump to the end to fix up the crossing.
1864 : : Otherwise the jump will have to go in a new bb, which will
1865 : : be handled by the fix_up_fall_thru_edges function. */
1866 : 165751 : if (control_flow_insn_p (BB_END (src)))
1867 : 105248 : continue;
1868 : :
1869 : : /* Make sure there's only one successor. */
1870 : 60503 : gcc_assert (single_succ_p (src));
1871 : :
1872 : 60503 : new_jump = emit_jump_insn_after (targetm.gen_jump (label), BB_END (src));
1873 : 60503 : BB_END (src) = new_jump;
1874 : 60503 : JUMP_LABEL (new_jump) = label;
1875 : 60503 : LABEL_NUSES (label) += 1;
1876 : :
1877 : 60503 : emit_barrier_after_bb (src);
1878 : :
1879 : : /* Mark edge as non-fallthru. */
1880 : 60503 : e->flags &= ~EDGE_FALLTHRU;
1881 : : }
1882 : 60093 : }
1883 : :
1884 : : /* Find any bb's where the fall-through edge is a crossing edge (note that
1885 : : these bb's must also contain a conditional jump or end with a call
1886 : : instruction; we've already dealt with fall-through edges for blocks
1887 : : that didn't have a conditional jump or didn't end with a call instruction
1888 : : in the call to add_labels_and_missing_jumps). Convert the fall-through
1889 : : edge to non-crossing edge by inserting a new bb to fall-through into.
1890 : : The new bb will contain an unconditional jump (crossing edge) to the
1891 : : original fall through destination. */
1892 : :
1893 : : static void
1894 : 60093 : fix_up_fall_thru_edges (void)
1895 : : {
1896 : 60093 : basic_block cur_bb;
1897 : :
1898 : 1489747 : FOR_EACH_BB_FN (cur_bb, cfun)
1899 : : {
1900 : 1429654 : edge succ1;
1901 : 1429654 : edge succ2;
1902 : 1429654 : edge fall_thru = NULL;
1903 : 1429654 : edge cond_jump = NULL;
1904 : :
1905 : 1429654 : fall_thru = NULL;
1906 : 1429654 : if (EDGE_COUNT (cur_bb->succs) > 0)
1907 : 1338711 : succ1 = EDGE_SUCC (cur_bb, 0);
1908 : : else
1909 : : succ1 = NULL;
1910 : :
1911 : 1429654 : if (EDGE_COUNT (cur_bb->succs) > 1)
1912 : 853672 : succ2 = EDGE_SUCC (cur_bb, 1);
1913 : : else
1914 : : succ2 = NULL;
1915 : :
1916 : : /* Find the fall-through edge. */
1917 : :
1918 : 1429654 : if (succ1
1919 : 1338711 : && (succ1->flags & EDGE_FALLTHRU))
1920 : : {
1921 : : fall_thru = succ1;
1922 : : cond_jump = succ2;
1923 : : }
1924 : 743050 : else if (succ2
1925 : 567203 : && (succ2->flags & EDGE_FALLTHRU))
1926 : : {
1927 : : fall_thru = succ2;
1928 : : cond_jump = succ1;
1929 : : }
1930 : 1430965 : else if (succ2 && EDGE_COUNT (cur_bb->succs) > 2)
1931 : 1311 : fall_thru = find_fallthru_edge (cur_bb->succs);
1932 : :
1933 : 1253773 : if (fall_thru && (fall_thru->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)))
1934 : : {
1935 : : /* Check to see if the fall-thru edge is a crossing edge. */
1936 : :
1937 : 1197201 : if (fall_thru->flags & EDGE_CROSSING)
1938 : : {
1939 : : /* The fall_thru edge crosses; now check the cond jump edge, if
1940 : : it exists. */
1941 : :
1942 : 105248 : bool cond_jump_crosses = true;
1943 : 105248 : int invert_worked = 0;
1944 : 105248 : rtx_insn *old_jump = BB_END (cur_bb);
1945 : :
1946 : : /* Find the jump instruction, if there is one. */
1947 : :
1948 : 105248 : if (cond_jump)
1949 : : {
1950 : 105098 : if (!(cond_jump->flags & EDGE_CROSSING))
1951 : 104706 : cond_jump_crosses = false;
1952 : :
1953 : : /* We know the fall-thru edge crosses; if the cond
1954 : : jump edge does NOT cross, and its destination is the
1955 : : next block in the bb order, invert the jump
1956 : : (i.e. fix it so the fall through does not cross and
1957 : : the cond jump does). */
1958 : :
1959 : 104706 : if (!cond_jump_crosses)
1960 : : {
1961 : : /* Find the label in the fall_thru block. We've already added
1962 : : any missing labels, so there must be one. */
1963 : :
1964 : 104706 : rtx_code_label *fall_thru_label
1965 : 104706 : = block_label (fall_thru->dest);
1966 : :
1967 : 104706 : if (old_jump && fall_thru_label)
1968 : : {
1969 : 104706 : rtx_jump_insn *old_jump_insn
1970 : 207890 : = dyn_cast <rtx_jump_insn *> (old_jump);
1971 : 102642 : if (old_jump_insn)
1972 : 102642 : invert_worked = invert_jump (old_jump_insn,
1973 : : fall_thru_label, 0);
1974 : : }
1975 : :
1976 : 102642 : if (invert_worked)
1977 : : {
1978 : 102632 : fall_thru->flags &= ~EDGE_FALLTHRU;
1979 : 102632 : cond_jump->flags |= EDGE_FALLTHRU;
1980 : 102632 : update_br_prob_note (cur_bb);
1981 : 102632 : std::swap (fall_thru, cond_jump);
1982 : 102632 : cond_jump->flags |= EDGE_CROSSING;
1983 : 102632 : fall_thru->flags &= ~EDGE_CROSSING;
1984 : : }
1985 : : }
1986 : : }
1987 : :
1988 : 105248 : if (cond_jump_crosses || !invert_worked)
1989 : : {
1990 : : /* This is the case where both edges out of the basic
1991 : : block are crossing edges. Here we will fix up the
1992 : : fall through edge. The jump edge will be taken care
1993 : : of later. The EDGE_CROSSING flag of the fall_thru edge
1994 : : is unset before the call to force_nonfallthru
1995 : : because if a new basic block is created, this edge
1996 : : stays within the current section while the edge
1997 : : between new_bb and fall_thru->dest becomes the
1998 : : EDGE_CROSSING one. */
1999 : :
2000 : 2616 : fall_thru->flags &= ~EDGE_CROSSING;
2001 : 2616 : unsigned old_count = EDGE_COUNT (cur_bb->succs);
2002 : 2616 : basic_block new_bb = force_nonfallthru (fall_thru);
2003 : :
2004 : 2616 : if (new_bb)
2005 : : {
2006 : 2616 : new_bb->aux = cur_bb->aux;
2007 : 2616 : cur_bb->aux = new_bb;
2008 : :
2009 : : /* This is done by force_nonfallthru_and_redirect. */
2010 : 2616 : gcc_assert (BB_PARTITION (new_bb)
2011 : : == BB_PARTITION (cur_bb));
2012 : :
2013 : 2616 : edge e = single_succ_edge (new_bb);
2014 : 2616 : e->flags |= EDGE_CROSSING;
2015 : 2616 : if (EDGE_COUNT (cur_bb->succs) > old_count)
2016 : : {
2017 : : /* If asm goto has a crossing fallthrough edge
2018 : : and at least one of its labels pointing to the same bb,
2019 : : force_nonfallthru can result in the fallthrough
2020 : : edge being redirected and a new edge added for the
2021 : : label or more labels to e->dest. As we've
2022 : : temporarily cleared EDGE_CROSSING flag on the
2023 : : fallthrough edge, we need to restore it again.
2024 : : See PR108596. */
2025 : 3 : rtx_insn *j = BB_END (cur_bb);
2026 : 3 : gcc_checking_assert (JUMP_P (j)
2027 : : && (asm_noperands (PATTERN (j))
2028 : : > 0));
2029 : 3 : edge e2 = find_edge (cur_bb, e->dest);
2030 : 3 : if (e2)
2031 : 3 : e2->flags |= EDGE_CROSSING;
2032 : : }
2033 : : }
2034 : : else
2035 : : {
2036 : : /* If a new basic block was not created, restore
2037 : : the EDGE_CROSSING flag. */
2038 : 0 : fall_thru->flags |= EDGE_CROSSING;
2039 : : }
2040 : :
2041 : : /* Add a barrier after the new jump. */
2042 : 2616 : emit_barrier_after_bb (new_bb ? new_bb : cur_bb);
2043 : : }
2044 : : }
2045 : : }
2046 : : }
2047 : 60093 : }
2048 : :
2049 : : /* This function checks the destination block of a "crossing jump" to
2050 : : see if it has any crossing predecessors that begin with a code label
2051 : : and end with an unconditional jump. If so, it returns that predecessor
2052 : : block. (This is to avoid creating lots of new basic blocks that all
2053 : : contain unconditional jumps to the same destination). */
2054 : :
2055 : : static basic_block
2056 : 0 : find_jump_block (basic_block jump_dest)
2057 : : {
2058 : 0 : basic_block source_bb = NULL;
2059 : 0 : edge e;
2060 : 0 : rtx_insn *insn;
2061 : 0 : edge_iterator ei;
2062 : :
2063 : 0 : FOR_EACH_EDGE (e, ei, jump_dest->preds)
2064 : 0 : if (e->flags & EDGE_CROSSING)
2065 : : {
2066 : 0 : basic_block src = e->src;
2067 : :
2068 : : /* Check each predecessor to see if it has a label, and contains
2069 : : only one executable instruction, which is an unconditional jump.
2070 : : If so, we can use it. */
2071 : :
2072 : 0 : if (LABEL_P (BB_HEAD (src)))
: : {
: : /* Skip over the label and any notes to find the
: : first executable insn, if any. */
2073 : 0 : for (insn = BB_HEAD (src);
2074 : 0 : !INSN_P (insn) && insn != NEXT_INSN (BB_END (src));
2075 : 0 : insn = NEXT_INSN (insn))
: : ;
: :
: : /* SRC is usable only if that insn is the final,
: : unconditional jump of the block. */
2077 : 0 : if (INSN_P (insn)
2078 : : && insn == BB_END (src)
2079 : : && JUMP_P (insn)
2080 : : && !any_condjump_p (insn))
2082 : : source_bb = src;
: : }
2086 : :
2087 : : if (source_bb)
2088 : : break;
2089 : : }
2090 : :
2091 : 0 : return source_bb;
2092 : : }
2093 : :
2094 : : /* Find all BB's with conditional jumps that are crossing edges;
2095 : : insert a new bb and make the conditional jump branch to the new
2096 : : bb instead (make the new bb the same color so the conditional branch won't
2097 : : be a 'crossing' edge). Insert an unconditional jump from the
2098 : : new bb to the original destination of the conditional jump. */
2099 : :
2100 : : static void
2101 : 0 : fix_crossing_conditional_branches (void)
2102 : : {
2103 : 0 : basic_block cur_bb;
2104 : 0 : basic_block new_bb;
2105 : 0 : basic_block dest;
2106 : 0 : edge succ1;
2107 : 0 : edge succ2;
2108 : 0 : edge crossing_edge;
2109 : 0 : edge new_edge;
2110 : 0 : rtx set_src;
2111 : 0 : rtx old_label = NULL_RTX;
2112 : 0 : rtx_code_label *new_label;
2113 : :
2114 : 0 : FOR_EACH_BB_FN (cur_bb, cfun)
2115 : : {
2116 : 0 : crossing_edge = NULL;
2117 : 0 : if (EDGE_COUNT (cur_bb->succs) > 0)
2118 : 0 : succ1 = EDGE_SUCC (cur_bb, 0);
2119 : : else
2120 : : succ1 = NULL;
2121 : :
2122 : 0 : if (EDGE_COUNT (cur_bb->succs) > 1)
2123 : 0 : succ2 = EDGE_SUCC (cur_bb, 1);
2124 : : else
2125 : : succ2 = NULL;
2126 : :
2127 : : /* We already took care of fall-through edges, so only one successor
2128 : : can be a crossing edge. */
2129 : :
2130 : 0 : if (succ1 && (succ1->flags & EDGE_CROSSING))
2131 : : crossing_edge = succ1;
2132 : 0 : else if (succ2 && (succ2->flags & EDGE_CROSSING))
2133 : : crossing_edge = succ2;
2134 : :
2135 : : if (crossing_edge)
2136 : : {
2137 : 0 : rtx_insn *old_jump = BB_END (cur_bb);
2138 : :
2139 : : /* Check to make sure the jump instruction is a
2140 : : conditional jump. */
2141 : :
2142 : 0 : set_src = NULL_RTX;
2143 : :
2144 : 0 : if (any_condjump_p (old_jump))
2145 : : {
2146 : 0 : if (GET_CODE (PATTERN (old_jump)) == SET)
2147 : 0 : set_src = SET_SRC (PATTERN (old_jump));
2148 : 0 : else if (GET_CODE (PATTERN (old_jump)) == PARALLEL)
2149 : : {
2150 : 0 : set_src = XVECEXP (PATTERN (old_jump), 0,0);
2151 : 0 : if (GET_CODE (set_src) == SET)
2152 : 0 : set_src = SET_SRC (set_src);
2153 : : else
2154 : : set_src = NULL_RTX;
2155 : : }
2156 : : }
2157 : :
2158 : 0 : if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE))
2159 : : {
2160 : 0 : rtx_jump_insn *old_jump_insn =
2161 : 0 : as_a <rtx_jump_insn *> (old_jump);
2162 : :
2163 : 0 : if (GET_CODE (XEXP (set_src, 1)) == PC)
2164 : 0 : old_label = XEXP (set_src, 2);
2165 : 0 : else if (GET_CODE (XEXP (set_src, 2)) == PC)
2166 : 0 : old_label = XEXP (set_src, 1);
2167 : :
2168 : : /* Check to see if a new bb for jumping to that dest has
2169 : : already been created; if so, use it; if not, create
2170 : : a new one. */
2171 : :
2172 : 0 : new_bb = find_jump_block (crossing_edge->dest);
2173 : :
2174 : 0 : if (new_bb)
2175 : : new_label = block_label (new_bb);
2176 : : else
2177 : : {
2178 : 0 : basic_block last_bb;
2179 : 0 : rtx_code_label *old_jump_target;
2180 : 0 : rtx_jump_insn *new_jump;
2181 : :
2182 : : /* Create new basic block to be dest for
2183 : : conditional jump. */
2184 : :
2185 : : /* Put appropriate instructions in new bb. */
2186 : :
2187 : 0 : new_label = gen_label_rtx ();
2188 : 0 : emit_label (new_label);
2189 : :
2190 : 0 : gcc_assert (GET_CODE (old_label) == LABEL_REF);
2191 : 0 : old_jump_target = old_jump_insn->jump_target ();
2192 : 0 : new_jump = as_a <rtx_jump_insn *>
2193 : 0 : (emit_jump_insn (targetm.gen_jump (old_jump_target)));
2194 : 0 : new_jump->set_jump_target (old_jump_target);
2195 : :
2196 : 0 : last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
2197 : 0 : new_bb = create_basic_block (new_label, new_jump, last_bb);
2198 : 0 : new_bb->aux = last_bb->aux;
2199 : 0 : last_bb->aux = new_bb;
2200 : :
2201 : 0 : emit_barrier_after_bb (new_bb);
2202 : :
2203 : : /* Make sure new bb is in same partition as source
2204 : : of conditional branch. */
2205 : 0 : BB_COPY_PARTITION (new_bb, cur_bb);
2206 : : }
2207 : :
2208 : : /* Make old jump branch to new bb. */
2209 : :
2210 : 0 : redirect_jump (old_jump_insn, new_label, 0);
2211 : :
2212 : : /* Remove crossing_edge as predecessor of 'dest'. */
2213 : :
2214 : 0 : dest = crossing_edge->dest;
2215 : :
2216 : 0 : redirect_edge_succ (crossing_edge, new_bb);
2217 : :
2218 : : /* Make a new edge from new_bb to old dest; new edge
2219 : : will be a successor for new_bb and a predecessor
2220 : : for 'dest'. */
2221 : :
2222 : 0 : if (EDGE_COUNT (new_bb->succs) == 0)
2223 : 0 : new_edge = make_single_succ_edge (new_bb, dest, 0);
2224 : : else
2225 : 0 : new_edge = EDGE_SUCC (new_bb, 0);
2226 : :
2227 : 0 : crossing_edge->flags &= ~EDGE_CROSSING;
2228 : 0 : new_edge->flags |= EDGE_CROSSING;
2229 : : }
2230 : : }
2231 : : }
2232 : 0 : }
2233 : :
2234 : : /* Find any unconditional branches that cross between hot and cold
2235 : : sections. Convert them into indirect jumps instead. */
2236 : :
2237 : : static void
2238 : 0 : fix_crossing_unconditional_branches (void)
2239 : : {
2240 : 0 : basic_block cur_bb;
2241 : 0 : rtx_insn *last_insn;
2242 : 0 : rtx label;
2243 : 0 : rtx label_addr;
2244 : 0 : rtx_insn *indirect_jump_sequence;
2245 : 0 : rtx_insn *jump_insn = NULL;
2246 : 0 : rtx new_reg;
2247 : 0 : rtx_insn *cur_insn;
2248 : 0 : edge succ;
2249 : :
2250 : 0 : FOR_EACH_BB_FN (cur_bb, cfun)
2251 : : {
2252 : 0 : last_insn = BB_END (cur_bb);
2253 : :
2254 : 0 : if (EDGE_COUNT (cur_bb->succs) < 1)
2255 : 0 : continue;
2256 : :
2257 : 0 : succ = EDGE_SUCC (cur_bb, 0);
2258 : :
2259 : : /* Check to see if bb ends in a crossing (unconditional) jump. At
2260 : : this point, no crossing jumps should be conditional. */
2261 : :
2262 : 0 : if (JUMP_P (last_insn)
2263 : 0 : && (succ->flags & EDGE_CROSSING))
2264 : : {
2265 : 0 : gcc_assert (!any_condjump_p (last_insn));
2266 : :
2267 : : /* Make sure the jump is not already an indirect or table jump. */
2268 : :
2269 : 0 : if (!computed_jump_p (last_insn)
2270 : 0 : && !tablejump_p (last_insn, NULL, NULL)
2271 : 0 : && asm_noperands (PATTERN (last_insn)) < 0)
2272 : : {
2273 : : /* We have found a "crossing" unconditional branch. Now
2274 : : we must convert it to an indirect jump. First create a
2275 : : reference to the label, as the target for the jump. */
2276 : :
2277 : 0 : label = JUMP_LABEL (last_insn);
2278 : 0 : label_addr = gen_rtx_LABEL_REF (Pmode, label);
2279 : 0 : LABEL_NUSES (label) += 1;
2280 : :
2281 : : /* Get a register to use for the indirect jump. */
2282 : :
2283 : 0 : new_reg = gen_reg_rtx (Pmode);
2284 : :
2285 : : /* Generate the indirect jump sequence. */
2286 : :
2287 : 0 : start_sequence ();
2288 : 0 : emit_move_insn (new_reg, label_addr);
2289 : 0 : emit_indirect_jump (new_reg);
2290 : 0 : indirect_jump_sequence = get_insns ();
2291 : 0 : end_sequence ();
2292 : :
2293 : : /* Make sure every instruction in the new jump sequence has
2294 : : its basic block set to be cur_bb. */
2295 : :
2296 : 0 : for (cur_insn = indirect_jump_sequence; cur_insn;
2297 : 0 : cur_insn = NEXT_INSN (cur_insn))
2298 : : {
2299 : 0 : if (!BARRIER_P (cur_insn))
2300 : 0 : BLOCK_FOR_INSN (cur_insn) = cur_bb;
2301 : 0 : if (JUMP_P (cur_insn))
2302 : 0 : jump_insn = cur_insn;
2303 : : }
2304 : :
2305 : : /* Insert the new (indirect) jump sequence immediately before
2306 : : the unconditional jump, then delete the unconditional jump. */
2307 : :
2308 : 0 : emit_insn_before (indirect_jump_sequence, last_insn);
2309 : 0 : delete_insn (last_insn);
2310 : :
2311 : 0 : JUMP_LABEL (jump_insn) = label;
2312 : 0 : LABEL_NUSES (label)++;
2313 : :
2314 : : /* Make BB_END for cur_bb be the jump instruction (NOT the
2315 : : barrier instruction at the end of the sequence...). */
2316 : :
2317 : 0 : BB_END (cur_bb) = jump_insn;
2318 : : }
2319 : : }
2320 : : }
2321 : 0 : }
2322 : :
2323 : : /* Update CROSSING_JUMP_P flags on all jump insns. */
2324 : :
2325 : : static void
2326 : 60093 : update_crossing_jump_flags (void)
2327 : : {
2328 : 60093 : basic_block bb;
2329 : 60093 : edge e;
2330 : 60093 : edge_iterator ei;
2331 : :
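: : /* One crossing successor is enough to know the jump ending BB
: : crosses sections; stop scanning that block's edges then. */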
2332 : 1489747 : FOR_EACH_BB_FN (bb, cfun)
2333 : 2943824 : FOR_EACH_EDGE (e, ei, bb->succs)
2334 : 1921570 : if (e->flags & EDGE_CROSSING)
2335 : : {
2336 : 407400 : if (JUMP_P (BB_END (bb)))
2337 : 407265 : CROSSING_JUMP_P (BB_END (bb)) = 1;
2338 : : break;
2339 : : }
2340 : 60093 : }
2341 : :
2342 : : /* Reorder basic blocks using the software trace cache (STC) algorithm. */
2343 : :
2344 : : static void
2345 : 579469 : reorder_basic_blocks_software_trace_cache (void)
2346 : : {
2347 : 579469 : if (dump_file)
2348 : 23 : fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");
2349 : :
2350 : 579469 : int n_traces;
2351 : 579469 : int i;
2352 : 579469 : struct trace *traces;
2353 : :
2354 : : /* We are estimating the length of the uncond jump insn only once, since
2355 : : the code for getting the insn length always returns the minimal length now. */
2356 : 579469 : if (uncond_jump_length == 0)
2357 : 8856 : uncond_jump_length = get_uncond_jump_length ();
2358 : :
2359 : : /* We need to know some information for each basic block. */
2360 : 579469 : array_size = GET_ARRAY_SIZE (last_basic_block_for_fn (cfun));
2361 : 579469 : bbd = XNEWVEC (bbro_basic_block_data, array_size);
2362 : 15978589 : for (i = 0; i < array_size; i++)
2363 : : {
2364 : 15399120 : bbd[i].start_of_trace = -1;
2365 : 15399120 : bbd[i].end_of_trace = -1;
2366 : 15399120 : bbd[i].in_trace = -1;
2367 : 15399120 : bbd[i].visited = 0;
2368 : 15399120 : bbd[i].priority = -1;
2369 : 15399120 : bbd[i].heap = NULL;
2370 : 15399120 : bbd[i].node = NULL;
2371 : : }
2372 : :
2373 : 579469 : traces = XNEWVEC (struct trace, n_basic_blocks_for_fn (cfun));
2374 : 579469 : n_traces = 0;
2375 : 579469 : find_traces (&n_traces, traces);
2376 : 579469 : connect_traces (n_traces, traces);
2377 : 579469 : FREE (traces);
2378 : 579469 : FREE (bbd);
2379 : 579469 : }
2380 : :
2381 : : /* Order edges by execution frequency, higher first. */
2382 : :
2383 : : static int
2384 : 18860950 : edge_order (const void *ve1, const void *ve2)
2385 : : {
2386 : 18860950 : edge e1 = *(const edge *) ve1;
2387 : 18860950 : edge e2 = *(const edge *) ve2;
2388 : 18860950 : profile_count c1 = e1->count ();
2389 : 18860950 : profile_count c2 = e2->count ();
2390 : : /* Since profile_count::operator< does not establish a strict weak order
2391 : : in the presence of uninitialized counts, use 'max': this makes them appear
2392 : : as if they had an execution frequency less than any initialized count. */
2393 : 18860950 : profile_count m = c1.max (c2);
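: : /* Result is -1 when c1 is strictly hotter (so e1 sorts first),
: : 1 when c2 is, and 0 on ties, which gcc_stablesort keeps in input
: : order. E.g. c1=10, c2=3 gives m == c1 only, hence 0 - 1 = -1. */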
2394 : 37721900 : return (m == c2) - (m == c1);
2395 : : }
2396 : :
2397 : : /* Reorder basic blocks using the "simple" algorithm. This tries to
2398 : : maximize the dynamic number of branches that are fallthrough, without
2399 : : copying instructions. The algorithm is greedy, looking at the most
2400 : : frequently executed branch first. */
2401 : :
2402 : : static void
2403 : 33254 : reorder_basic_blocks_simple (void)
2404 : : {
2405 : 33254 : if (dump_file)
2406 : 8 : fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");
2407 : :
2408 : 33254 : edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];
2409 : :
2410 : : /* First, collect all edges that can be optimized by reordering blocks:
2411 : : simple jumps and conditional jumps, as well as the function entry edge. */
2412 : :
2413 : 33254 : int n = 0;
2414 : 33254 : edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
2415 : :
2416 : 33254 : basic_block bb;
2417 : 532323 : FOR_EACH_BB_FN (bb, cfun)
2418 : : {
2419 : 499069 : rtx_insn *end = BB_END (bb);
2420 : :
2421 : 499069 : if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
2422 : 346 : continue;
2423 : :
2424 : : /* We cannot optimize asm goto. */
2425 : 498723 : if (JUMP_P (end) && extract_asm_operands (end))
2426 : 0 : continue;
2427 : :
2428 : 498723 : if (single_succ_p (bb))
2429 : 147839 : edges[n++] = EDGE_SUCC (bb, 0);
2430 : 350884 : else if (any_condjump_p (end))
2431 : : {
2432 : 255677 : edge e0 = EDGE_SUCC (bb, 0);
2433 : 255677 : edge e1 = EDGE_SUCC (bb, 1);
2434 : : /* When optimizing for size it is best to keep the original
2435 : : fallthrough edges. */
2436 : 255677 : if (e1->flags & EDGE_FALLTHRU)
2437 : 137610 : std::swap (e0, e1);
2438 : 255677 : edges[n++] = e0;
2439 : 255677 : edges[n++] = e1;
2440 : : }
2441 : : }
2442 : :
2443 : : /* Sort the edges, the most desirable first. When optimizing for size
2444 : : all edges are equally desirable. */
2445 : :
2446 : 33254 : if (optimize_function_for_speed_p (cfun))
2447 : 33096 : gcc_stablesort (edges, n, sizeof *edges, edge_order);
2448 : :
2449 : : /* Now decide which of those edges to make fallthrough edges. We set
2450 : : BB_VISITED if a block already has a fallthrough successor assigned
2451 : : to it. We make ->AUX of an endpoint point to the opposite endpoint
2452 : : of a sequence of blocks that fall through, and ->AUX will be NULL
2453 : : for a block that is in such a sequence but not an endpoint anymore.
2454 : :
2455 : : To start with, everything points to itself, nothing is assigned yet. */
2456 : :
2457 : 598831 : FOR_ALL_BB_FN (bb, cfun)
2458 : : {
2459 : 565577 : bb->aux = bb;
2460 : 565577 : bb->flags &= ~BB_VISITED;
2461 : : }
2462 : :
2463 : 33254 : EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
2464 : :
2465 : : /* Now for all edges, the most desirable first, see if that edge can
2466 : : connect two sequences. If it can, update AUX and BB_VISITED; if it
2467 : : cannot, zero out the edge in the table. */
2468 : :
2469 : 725701 : for (int j = 0; j < n; j++)
2470 : : {
2471 : 692447 : edge e = edges[j];
2472 : :
2473 : 692447 : basic_block tail_a = e->src;
2474 : 692447 : basic_block head_b = e->dest;
2475 : 692447 : basic_block head_a = (basic_block) tail_a->aux;
2476 : 692447 : basic_block tail_b = (basic_block) head_b->aux;
2477 : :
2478 : : /* An edge cannot connect two sequences if:
2479 : : - it crosses partitions;
2480 : : - its src is not a current endpoint;
2481 : : - its dest is not a current endpoint;
2482 : : - or, it would create a loop. */
2483 : :
2484 : 692447 : if (e->flags & EDGE_CROSSING
2485 : 692435 : || tail_a->flags & BB_VISITED
2486 : 472808 : || !tail_b
2487 : 406379 : || (!(head_b->flags & BB_VISITED) && head_b != tail_b)
2488 : 378656 : || tail_a == tail_b)
2489 : : {
2490 : 349535 : edges[j] = 0;
2491 : 349535 : continue;
2492 : : }
2493 : :
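: : /* Join the two sequences: the endpoints being connected become
: : interior blocks (aux = 0), and the surviving endpoints are made
: : to point at each other. */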
2494 : 342912 : tail_a->aux = 0;
2495 : 342912 : head_b->aux = 0;
2496 : 342912 : head_a->aux = tail_b;
2497 : 342912 : tail_b->aux = head_a;
2498 : 342912 : tail_a->flags |= BB_VISITED;
2499 : : }
2500 : :
2501 : : /* Put the pieces together, in the same order that the start blocks of
2502 : : the sequences already had. The hot/cold partitioning gives a little
2503 : : complication: as a first pass only do this for blocks in the same
2504 : : partition as the start block, and (if there is anything left to do)
2505 : : in a second pass handle the other partition. */
2506 : :
2507 : 33254 : basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
2508 : :
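: : /* If nothing was chained after the entry block, its aux still
: : points to itself; start from the partition of its successor in
: : that case. */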
2509 : 33254 : int current_partition
2510 : 33254 : = BB_PARTITION (last_tail == ENTRY_BLOCK_PTR_FOR_FN (cfun)
2511 : : ? EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0)->dest
2512 : : : last_tail);
2513 : 33254 : bool need_another_pass = true;
2514 : :
2515 : 66515 : for (int pass = 0; pass < 2 && need_another_pass; pass++)
2516 : : {
2517 : 33261 : need_another_pass = false;
2518 : :
2519 : 532376 : FOR_EACH_BB_FN (bb, cfun)
2520 : 499115 : if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb)
2521 : : {
2522 : 156172 : if (BB_PARTITION (bb) != current_partition)
2523 : : {
2524 : 15 : need_another_pass = true;
2525 : 15 : continue;
2526 : : }
2527 : :
2528 : 156157 : last_tail->aux = bb;
2529 : 156157 : last_tail = (basic_block) bb->aux;
2530 : : }
2531 : :
2532 : 33261 : current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
2533 : : }
2534 : :
2535 : 33254 : last_tail->aux = 0;
2536 : :
2537 : : /* Finally, link all the chosen fallthrough edges. */
2538 : :
2539 : 725701 : for (int j = 0; j < n; j++)
2540 : 692447 : if (edges[j])
2541 : 342912 : edges[j]->src->aux = edges[j]->dest;
2542 : :
2543 : 33254 : delete[] edges;
2544 : :
2545 : : /* If the entry edge no longer falls through we have to make a new
2546 : : block so it can do so again. */
2547 : :
2548 : 33254 : edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
2549 : 33254 : if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
2550 : : {
2551 : 6 : force_nonfallthru (e);
2552 : 6 : e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
2553 : : }
2554 : 33254 : }
2555 : :
2556 : : /* Reorder basic blocks. The main entry point to this file. */
2557 : :
2558 : : static void
2559 : 1003193 : reorder_basic_blocks (void)
2560 : : {
2561 : 1003193 : gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
2562 : :
2563 : 1003193 : if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
2564 : : return;
2565 : :
2566 : 612723 : set_edge_can_fallthru_flag ();
2567 : 612723 : mark_dfs_back_edges ();
2568 : :
2569 : 612723 : switch (flag_reorder_blocks_algorithm)
2570 : : {
2571 : 33254 : case REORDER_BLOCKS_ALGORITHM_SIMPLE:
2572 : 33254 : reorder_basic_blocks_simple ();
2573 : 33254 : break;
2574 : :
2575 : 579469 : case REORDER_BLOCKS_ALGORITHM_STC:
2576 : 579469 : reorder_basic_blocks_software_trace_cache ();
2577 : 579469 : break;
2578 : :
2579 : 0 : default:
2580 : 0 : gcc_unreachable ();
2581 : : }
2582 : :
2583 : 612723 : relink_block_chain (/*stay_in_cfglayout_mode=*/true);
2584 : :
2585 : 612723 : if (dump_file)
2586 : : {
2587 : 31 : if (dump_flags & TDF_DETAILS)
2588 : 0 : dump_reg_info (dump_file);
2589 : 31 : dump_flow_info (dump_file, dump_flags);
2590 : : }
2591 : :
2592 : : /* Signal that rtl_verify_flow_info_1 can now verify that there
2593 : : is at most one switch between hot/cold sections. */
2594 : 612723 : crtl->bb_reorder_complete = true;
2595 : : }
2596 : :
2597 : : /* Determine which partition the first basic block in the function
2598 : : belongs to, then find the first basic block in the current function
2599 : : that belongs to a different section, and insert a
2600 : : NOTE_INSN_SWITCH_TEXT_SECTIONS note immediately before it in the
2601 : : instruction stream. When writing out the assembly code,
2602 : : encountering this note will make the compiler switch between the
2603 : : hot and cold text sections. */
2604 : :
2605 : : void
2606 : 60093 : insert_section_boundary_note (void)
2607 : : {
2608 : 60093 : basic_block bb;
2609 : 60093 : bool switched_sections = false;
2610 : 60093 : int current_partition = 0;
2611 : :
2612 : 60093 : if (!crtl->has_bb_partition)
2613 : : return;
2614 : :
2615 : 1522566 : FOR_EACH_BB_FN (bb, cfun)
2616 : : {
2617 : 1462473 : if (!current_partition)
2618 : 60093 : current_partition = BB_PARTITION (bb);
2619 : 1462473 : if (BB_PARTITION (bb) != current_partition)
2620 : : {
2621 : 60087 : gcc_assert (!switched_sections);
2622 : 60087 : switched_sections = true;
2623 : 60087 : emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
2624 : 60087 : current_partition = BB_PARTITION (bb);
2625 : : }
2626 : : }
2627 : :
2628 : : /* Make sure crtl->has_bb_partition matches reality even if bbpart finds
2629 : : some hot and some cold basic blocks, but later one of those kinds is
2630 : : optimized away. */
2631 : 60093 : crtl->has_bb_partition = switched_sections;
2632 : : }
2633 : :
2634 : : namespace {
2635 : :
2636 : : const pass_data pass_data_reorder_blocks =
2637 : : {
2638 : : RTL_PASS, /* type */
2639 : : "bbro", /* name */
2640 : : OPTGROUP_NONE, /* optinfo_flags */
2641 : : TV_REORDER_BLOCKS, /* tv_id */
2642 : : 0, /* properties_required */
2643 : : 0, /* properties_provided */
2644 : : 0, /* properties_destroyed */
2645 : : 0, /* todo_flags_start */
2646 : : 0, /* todo_flags_finish */
2647 : : };
2648 : :
2649 : : class pass_reorder_blocks : public rtl_opt_pass
2650 : : {
2651 : : public:
2652 : 282866 : pass_reorder_blocks (gcc::context *ctxt)
2653 : 565732 : : rtl_opt_pass (pass_data_reorder_blocks, ctxt)
2654 : : {}
2655 : :
2656 : : /* opt_pass methods: */
2657 : 1431630 : bool gate (function *) final override
2658 : : {
2659 : 1431630 : if (targetm.cannot_modify_jumps_p ())
2660 : : return false;
2661 : 1431630 : return (optimize > 0
2662 : 1431630 : && (flag_reorder_blocks || flag_reorder_blocks_and_partition));
2663 : : }
2664 : :
2665 : : unsigned int execute (function *) final override;
2666 : :
2667 : : }; // class pass_reorder_blocks
2668 : :
2669 : : unsigned int
2670 : 1003193 : pass_reorder_blocks::execute (function *fun)
2671 : : {
2672 : 1003193 : basic_block bb;
2673 : :
2674 : : /* Last attempt to optimize CFG, as scheduling, peepholing and insn
2675 : : splitting possibly introduced more crossjumping opportunities. */
2676 : 1003193 : cfg_layout_initialize (CLEANUP_EXPENSIVE);
2677 : :
2678 : 1003193 : reorder_basic_blocks ();
2679 : 1003193 : cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_NO_PARTITIONING);
2680 : :
2681 : 10934350 : FOR_EACH_BB_FN (bb, fun)
2682 : 9931157 : if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
2683 : 8927964 : bb->aux = bb->next_bb;
2684 : 1003193 : cfg_layout_finalize ();
2685 : :
2686 : 11100119 : FOR_EACH_BB_FN (bb, fun)
2687 : 10096926 : df_recompute_luids (bb);
2688 : 1003193 : return 0;
2689 : : }
2690 : :
2691 : : } // anon namespace
2692 : :
2693 : : rtl_opt_pass *
2694 : 282866 : make_pass_reorder_blocks (gcc::context *ctxt)
2695 : : {
2696 : 282866 : return new pass_reorder_blocks (ctxt);
2697 : : }
2698 : :
2699 : : /* Duplicate a block (that we already know ends in a computed jump) into its
2700 : : predecessors, where possible. Return whether anything is changed. */
2701 : : static bool
2702 : 872 : maybe_duplicate_computed_goto (basic_block bb, int max_size)
2703 : : {
2704 : : /* Make sure that the block is small enough. */
2705 : 872 : rtx_insn *insn;
2706 : 8521 : FOR_BB_INSNS (bb, insn)
2707 : 8298 : if (INSN_P (insn))
2708 : : {
2709 : 5819 : max_size -= get_attr_min_length (insn);
2710 : 5819 : if (max_size < 0)
2711 : : return false;
2712 : : }
2713 : :
2714 : 223 : bool changed = false;
2715 : 223 : edge e;
2716 : 223 : edge_iterator ei;
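: : /* Iterate manually: duplicate_block redirects E to the copy,
: : which removes it from BB's predecessor list and thereby
: : advances EI. */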
2717 : 738 : for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
2718 : : {
2719 : 515 : basic_block pred = e->src;
2720 : :
2721 : : /* Do not duplicate BB into PRED if we cannot merge a copy of BB
2722 : : with PRED. */
2723 : 515 : if (!single_succ_p (pred)
2724 : 199 : || e->flags & EDGE_COMPLEX
2725 : 199 : || pred->index < NUM_FIXED_BLOCKS
2726 : 177 : || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
2727 : 692 : || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
2728 : : {
2729 : 340 : ei_next (&ei);
2730 : 340 : continue;
2731 : : }
2732 : :
2733 : 175 : if (dump_file)
2734 : 0 : fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
2735 : 0 : bb->index, e->src->index);
2736 : :
2737 : : /* Remember if PRED can be duplicated; if so, the copy of BB merged
2738 : : with PRED can be duplicated as well. */
2739 : 175 : bool can_dup_more = can_duplicate_block_p (pred);
2740 : :
2741 : : /* Make a copy of BB, merge it into PRED. */
2742 : 175 : basic_block copy = duplicate_block (bb, e, NULL);
2743 : 175 : emit_barrier_after_bb (copy);
2744 : 175 : reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
2745 : 175 : merge_blocks (pred, copy);
2746 : :
2747 : 175 : changed = true;
2748 : :
2749 : : /* Try to merge the resulting merged PRED into further predecessors. */
2750 : 175 : if (can_dup_more)
2751 : 175 : maybe_duplicate_computed_goto (pred, max_size);
2752 : : }
2753 : :
2754 : : return changed;
2755 : : }
2756 : :
2757 : : /* Duplicate the blocks containing computed gotos. This basically unfactors
2758 : : computed gotos that were factored early on in the compilation process to
2759 : : speed up edge-based data flow. We used to not unfactor them again, which
2760 : : can seriously pessimize code with many computed jumps in the source code,
2761 : : such as interpreters. See e.g. PR15242. */
2762 : : static void
2763 : 866644 : duplicate_computed_gotos (function *fun)
2764 : : {
2765 : : /* We are estimating the length of the uncond jump insn only once,
2766 : : since the code for getting the insn length always returns
2767 : : the minimal length now. */
2768 : 866644 : if (uncond_jump_length == 0)
2769 : 105383 : uncond_jump_length = get_uncond_jump_length ();
2770 : :
2771 : : /* Never copy a block larger than this. */
2772 : 866644 : int max_size
2773 : 866644 : = uncond_jump_length * param_max_goto_duplication_insns;
2774 : :
2775 : 866644 : bool changed = false;
2776 : :
2777 : : /* Try to duplicate all blocks that end in a computed jump and that
2778 : : can be duplicated at all. */
2779 : 866644 : basic_block bb;
2780 : 10544547 : FOR_EACH_BB_FN (bb, fun)
2781 : 9677903 : if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
2782 : 697 : changed |= maybe_duplicate_computed_goto (bb, max_size);
2783 : :
2784 : : /* Some blocks may have become unreachable. */
2785 : 866644 : if (changed)
2786 : 78 : cleanup_cfg (0);
2787 : :
2788 : : /* Duplicating blocks will redirect edges and may cause hot blocks
2789 : : previously reached by both hot and cold blocks to become dominated
2790 : : only by cold blocks. */
2791 : 78 : if (changed)
2792 : 78 : fixup_partitions ();
2793 : 866644 : }
2794 : :
2795 : : namespace {
2796 : :
2797 : : const pass_data pass_data_duplicate_computed_gotos =
2798 : : {
2799 : : RTL_PASS, /* type */
2800 : : "compgotos", /* name */
2801 : : OPTGROUP_NONE, /* optinfo_flags */
2802 : : TV_DUP_COMPGOTO, /* tv_id */
2803 : : 0, /* properties_required */
2804 : : 0, /* properties_provided */
2805 : : 0, /* properties_destroyed */
2806 : : 0, /* todo_flags_start */
2807 : : 0, /* todo_flags_finish */
2808 : : };
2809 : :
2810 : : class pass_duplicate_computed_gotos : public rtl_opt_pass
2811 : : {
2812 : : public:
2813 : 282866 : pass_duplicate_computed_gotos (gcc::context *ctxt)
2814 : 565732 : : rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt)
2815 : : {}
2816 : :
2817 : : /* opt_pass methods: */
2818 : : bool gate (function *) final override;
2819 : : unsigned int execute (function *) final override;
2820 : :
2821 : : }; // class pass_duplicate_computed_gotos
2822 : :
2823 : : bool
2824 : 1431630 : pass_duplicate_computed_gotos::gate (function *fun)
2825 : : {
2826 : 1431630 : if (targetm.cannot_modify_jumps_p ())
2827 : : return false;
2828 : 1431630 : return (optimize > 0
2829 : 1003195 : && flag_expensive_optimizations
2830 : 2359324 : && ! optimize_function_for_size_p (fun));
2831 : : }
2832 : :
2833 : : unsigned int
2834 : 866644 : pass_duplicate_computed_gotos::execute (function *fun)
2835 : : {
2836 : 866644 : duplicate_computed_gotos (fun);
2837 : :
2838 : 866644 : return 0;
2839 : : }
2840 : :
2841 : : } // anon namespace
2842 : :
2843 : : rtl_opt_pass *
2844 : 282866 : make_pass_duplicate_computed_gotos (gcc::context *ctxt)
2845 : : {
2846 : 282866 : return new pass_duplicate_computed_gotos (ctxt);
2847 : : }
2848 : :
2849 : : /* This function is the main 'entrance' for the optimization that
2850 : : partitions hot and cold basic blocks into separate sections of the
2851 : : .o file (to improve performance and cache locality). Ideally it
2852 : : would be called after all optimizations that rearrange the CFG have
2853 : : been called. However part of this optimization may introduce new
2854 : : register usage, so it must be called before register allocation has
2855 : : occurred. This means that this optimization is actually called
2856 : : well before the optimization that reorders basic blocks (see
2857 : : function above).
2858 : :
2859 : : This optimization checks the feedback information to determine
2860 : : which basic blocks are hot/cold, and updates flags on the basic blocks
2861 : : to indicate which section they belong in. This information is
2862 : : later used for writing out sections in the .o file. Because hot
2863 : : and cold sections can be arbitrarily large (within the bounds of
2864 : : memory), far beyond the size of a single function, it is necessary
2865 : : to fix up all edges that cross section boundaries, to make sure the
2866 : : instructions used can actually span the required distance. The
2867 : : fixes are described below.
2868 : :
2869 : : Fall-through edges must be changed into jumps; it is not safe or
2870 : : legal to fall through across a section boundary. Whenever a
2871 : : fall-through edge crossing a section boundary is encountered, a new
2872 : : basic block is inserted (in the same section as the fall-through
2873 : : source), and the fall through edge is redirected to the new basic
2874 : : block. The new basic block contains an unconditional jump to the
2875 : : original fall-through target. (If the unconditional jump is
2876 : : insufficient to cross section boundaries, that is dealt with a
2877 : : little later, see below).
2878 : :
2879 : : In order to deal with architectures that have short conditional
2880 : : branches (which cannot span all of memory) we take any conditional
2881 : : jump that attempts to cross a section boundary and add a level of
2882 : : indirection: it becomes a conditional jump to a new basic block, in
2883 : : the same section. The new basic block contains an unconditional
2884 : : jump to the original target, in the other section.
2885 : :
2886 : : For those architectures whose unconditional branch is also
2887 : : incapable of reaching all of memory, those unconditional jumps are
2888 : : converted into indirect jumps, through a register.
2889 : :
2890 : : IMPORTANT NOTE: This optimization causes some messy interactions
2891 : : with the cfg cleanup optimizations; those optimizations want to
2892 : : merge blocks wherever possible, and to collapse indirect jump
2893 : : sequences (change "A jumps to B jumps to C" directly into "A jumps
2894 : : to C"). Those optimizations can undo the jump fixes that
2895 : : partitioning is required to make (see above), in order to ensure
2896 : : that jumps attempting to cross section boundaries are really able
2897 : : to cover whatever distance the jump requires (on many architectures
2898 : : conditional or unconditional jumps are not able to reach all of
2899 : : memory). Therefore tests have to be inserted into each such
2900 : : optimization to make sure that it does not undo stuff necessary to
2901 : : cross partition boundaries. This would be much less of a problem
2902 : : if we could perform this optimization later in the compilation, but
2903 : : unfortunately the fact that we may need to create indirect jumps
2904 : : (through registers) requires that this optimization be performed
2905 : : before register allocation.
2906 : :
2907 : : Hot and cold basic blocks are partitioned and put in separate
2908 : : sections of the .o file, to reduce paging and improve cache
2909 : : performance (hopefully). This can result in bits of code from the
2910 : : same function being widely separated in the .o file. However this
2911 : : is not obvious to the current bb structure. Therefore we must take
2912 : : care to ensure that: 1). There are no fall_thru edges that cross
2913 : : between sections; 2). For those architectures which have "short"
2914 : : conditional branches, all conditional branches that attempt to
2915 : : cross between sections are converted to unconditional branches;
2916 : : and, 3). For those architectures which have "short" unconditional
2917 : : branches, all unconditional branches that attempt to cross between
2918 : : sections are converted to indirect jumps.
2919 : :
2920 : : The code for fixing up fall_thru edges that cross between hot and
2921 : : cold basic blocks does so by creating new basic blocks containing
2922 : : unconditional branches to the appropriate label in the "other"
2923 : : section. The new basic block is then put in the same (hot or cold)
2924 : : section as the original conditional branch, and the fall_thru edge
2925 : : is modified to fall into the new basic block instead. By adding
2926 : : this level of indirection we end up with only unconditional branches
2927 : : crossing between hot and cold sections.
2928 : :
2929 : : Conditional branches are dealt with by adding a level of indirection.
2930 : : A new basic block is added in the same (hot/cold) section as the
2931 : : conditional branch, and the conditional branch is retargeted to the
2932 : : new basic block. The new basic block contains an unconditional branch
2933 : : to the original target of the conditional branch (in the other section).
2934 : :
2935 : : Unconditional branches are dealt with by converting them into
2936 : : indirect jumps. */
2937 : :
2938 : : namespace {
2939 : :
2940 : : const pass_data pass_data_partition_blocks =
2941 : : {
2942 : : RTL_PASS, /* type */
2943 : : "bbpart", /* name */
2944 : : OPTGROUP_NONE, /* optinfo_flags */
2945 : : TV_REORDER_BLOCKS, /* tv_id */
2946 : : PROP_cfglayout, /* properties_required */
2947 : : 0, /* properties_provided */
2948 : : 0, /* properties_destroyed */
2949 : : 0, /* todo_flags_start */
2950 : : 0, /* todo_flags_finish */
2951 : : };
2952 : :
2953 : : class pass_partition_blocks : public rtl_opt_pass
2954 : : {
2955 : : public:
2956 : 282866 : pass_partition_blocks (gcc::context *ctxt)
2957 : 565732 : : rtl_opt_pass (pass_data_partition_blocks, ctxt)
2958 : : {}
2959 : :
2960 : : /* opt_pass methods: */
2961 : : bool gate (function *) final override;
2962 : : unsigned int execute (function *) final override;
2963 : :
2964 : : }; // class pass_partition_blocks
2965 : :
2966 : : bool
2967 : 1431630 : pass_partition_blocks::gate (function *fun)
2968 : : {
2969 : : /* The optimization to partition hot/cold basic blocks into separate
2970 : : sections of the .o file does not work well with linkonce or with
2971 : : user-defined section attributes or with the naked attribute. Don't
2972 : : call it if any of these cases arises. */
2973 : 1431630 : return (flag_reorder_blocks_and_partition
2974 : 687357 : && optimize
2975 : : /* See pass_reorder_blocks::gate. We should not partition if
2976 : : we are going to omit the reordering. */
2977 : 687319 : && optimize_function_for_speed_p (fun)
2978 : 641817 : && !DECL_COMDAT_GROUP (current_function_decl)
2979 : 546875 : && !lookup_attribute ("section", DECL_ATTRIBUTES (fun->decl))
2980 : 546829 : && !lookup_attribute ("naked", DECL_ATTRIBUTES (fun->decl))
2981 : : /* Workaround a bug in GDB where read_partial_die doesn't cope
2982 : : with DIEs with DW_AT_ranges, see PR81115. */
2983 : 1978410 : && !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl))));
2984 : : }
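
/* For instance (illustration only), a function placed in a user-defined
   section fails the lookup_attribute ("section", ...) check above and is
   therefore never partitioned:

       __attribute__ ((section (".init.text")))
       void early_setup (void);  */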
2985 : :
2986 : : unsigned
2987 : 537639 : pass_partition_blocks::execute (function *fun)
2988 : : {
2989 : 537639 : vec<edge> crossing_edges;
2990 : :
2991 : 537639 : if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
2992 : : return 0;
2993 : :
2994 : 255685 : df_set_flags (DF_DEFER_INSN_RESCAN);
2995 : :
2996 : 255685 : crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges ();
2997 : 255685 : if (!crossing_edges.exists ())
2998 : : /* Make sure to process deferred rescans and clear changeable df flags. */
2999 : : return TODO_df_finish;
3000 : :
3001 : 60093 : crtl->has_bb_partition = true;
3002 : :
3003 : : /* Make sure the source of any crossing edge ends in a jump and the
3004 : : destination of any crossing edge has a label. */
3005 : 60093 : add_labels_and_missing_jumps (crossing_edges);
3006 : :
3007 : : /* Convert all crossing fall_thru edges into non-crossing fall
3008 : : thrus into new basic blocks that end in unconditional jumps
3009 : : to the original fall-through dest. */
3010 : 60093 : fix_up_fall_thru_edges ();
3011 : :
3012 : : /* If the architecture does not have conditional branches that can
3013 : : span all of memory, convert crossing conditional branches into
3014 : : crossing unconditional branches. */
3015 : 60093 : if (!HAS_LONG_COND_BRANCH)
3016 : : fix_crossing_conditional_branches ();
3017 : :
3018 : : /* If the architecture does not have unconditional branches that
3019 : : can span all of memory, convert crossing unconditional branches
3020 : : into indirect jumps. Since adding an indirect jump also adds
3021 : : a new register usage, update the register usage information as
3022 : : well. */
3023 : 60093 : if (!HAS_LONG_UNCOND_BRANCH)
3024 : : fix_crossing_unconditional_branches ();
3025 : :
3026 : 60093 : update_crossing_jump_flags ();
3027 : :
3028 : : /* Clear bb->aux fields that the above routines were using. */
3029 : 60093 : clear_aux_for_blocks ();
3030 : :
3031 : 60093 : crossing_edges.release ();
3032 : :
3033 : : /* ??? FIXME: DF generates the bb info for a block immediately.
3034 : : And by immediately, I mean *during* creation of the block.
3035 : :
3036 : : #0 df_bb_refs_collect
3037 : : #1 in df_bb_refs_record
3038 : : #2 in create_basic_block_structure
3039 : :
3040 : : Which means that the bb_has_eh_pred test in df_bb_refs_collect
3041 : : will *always* fail, because no edges can have been added to the
3042 : : block yet. Which of course means we don't add the right
3043 : : artificial refs, which means we fail df_verify (much) later.
3044 : :
3045 : : The cleanest solution would seem to be making DF_DEFER_INSN_RESCAN
3046 : : imply that we also shouldn't grab data from the new blocks that the
3047 : : new insns are in. In this way one can create the block, link
3048 : : it up properly, and have everything Just Work later, when deferred
3049 : : insns are processed.
3050 : :
3051 : : In the meantime, we have no other option but to throw away all
3052 : : of the DF data and recompute it all. */
3053 : 60093 : if (fun->eh->lp_array)
3054 : : {
3055 : 60093 : df_finish_pass (true);
3056 : 60093 : df_scan_alloc (NULL);
3057 : 60093 : df_scan_blocks ();
3058 : : /* Not all post-landing pads use all of the EH_RETURN_DATA_REGNO
3059 : : data. We blindly generated all of them when creating the new
3060 : : landing pad. Delete those assignments we don't use. */
3061 : 60093 : df_set_flags (DF_LR_RUN_DCE);
3062 : 60093 : df_analyze ();
3063 : : }
3064 : :
3065 : : /* Make sure to process deferred rescans and clear changeable df flags. */
3066 : : return TODO_df_finish;
3067 : : }
3068 : :
3069 : : } // anon namespace
3070 : :
3071 : : rtl_opt_pass *
3072 : 282866 : make_pass_partition_blocks (gcc::context *ctxt)
3073 : : {
3074 : 282866 : return new pass_partition_blocks (ctxt);
3075 : : }
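
/* The pass manager creates the pass through this factory; the pass is
   placed in the RTL pipeline by its entry in passes.def:

       NEXT_PASS (pass_partition_blocks);  */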
|