This graph shows which files directly or indirectly include this file:

Data Structures
struct	target_gcse

Macros
#define	this_target_gcse (&default_target_gcse)

Functions
void	gcse_cc_finalize (void)
bool	gcse_or_cprop_is_too_expensive (const char *)
rtx_insn *	insert_insn_end_basic_block (rtx_insn *, basic_block)

Variables
struct target_gcse	default_target_gcse

Macro Definition Documentation

◆ this_target_gcse

#define this_target_gcse (&default_target_gcse)

Function Documentation

◆ gcse_cc_finalize()

void gcse_cc_finalize ( void )

Reset all state within gcse.cc so that we can rerun the compiler
within the same process.  For use by toplev::finalize.

References NULL, and test_insn.

Referenced by toplev::finalize().

◆ gcse_or_cprop_is_too_expensive()

bool gcse_or_cprop_is_too_expensive ( const char * pass )

extern

Return true if the graph is too expensive to optimize. PASS is the
optimization about to be performed.

References cfun, max_reg_num(), n_basic_blocks_for_fn, n_edges_for_fn, SBITMAP_ELT_TYPE, SBITMAP_SET_SIZE, and warning().

Referenced by gcse_after_reload_main(), one_code_hoisting_pass(), one_cprop_pass(), and one_pre_gcse_pass().

◆ insert_insn_end_basic_block()

rtx_insn * insert_insn_end_basic_block	(	rtx_insn *	pat,
		basic_block	bb )

extern

Return the INSN which is added at the end of the block BB with
same instruction pattern with PAT.

References add_label_notes(), BB_END, BB_HEAD, CALL_P, emit_insn_after_noloc(), emit_insn_before_noloc(), find_first_parameter_load(), gcc_assert, INSN_P, JUMP_P, LABEL_P, NEXT_INSN(), NONJUMP_INSN_P, NOTE_INSN_BASIC_BLOCK_P, NULL_RTX, PATTERN(), single_succ_edge(), and single_succ_p().

Variable Documentation

◆ default_target_gcse

struct target_gcse default_target_gcse

extern

Partial redundancy elimination / Hoisting for RTL.
   Copyright (C) 1997-2025 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.

TODO
  - reordering of memory allocation and freeing to be more space efficient
  - calc rough register pressure information and use the info to drive all
    kinds of code motion (including code hoisting) in a unified way.

References searched while implementing this.

  Compilers Principles, Techniques and Tools
  Aho, Sethi, Ullman
  Addison-Wesley, 1988

  Global Optimization by Suppression of Partial Redundancies
  E. Morel, C. Renvoise
  communications of the acm, Vol. 22, Num. 2, Feb. 1979

  A Portable Machine-Independent Global Optimizer - Design and Measurements
  Frederick Chow
  Stanford Ph.D. thesis, Dec. 1983

  A Fast Algorithm for Code Movement Optimization
  D.M. Dhamdhere
  SIGPLAN Notices, Vol. 23, Num. 10, Oct. 1988

  A Solution to a Problem with Morel and Renvoise's
  Global Optimization by Suppression of Partial Redundancies
  K-H Drechsler, M.P. Stadel
  ACM TOPLAS, Vol. 10, Num. 4, Oct. 1988

  Practical Adaptation of the Global Optimization
  Algorithm of Morel and Renvoise
  D.M. Dhamdhere
  ACM TOPLAS, Vol. 13, Num. 2. Apr. 1991

  Efficiently Computing Static Single Assignment Form and the Control
  Dependence Graph
  R. Cytron, J. Ferrante, B.K. Rosen, M.N. Wegman, and F.K. Zadeck
  ACM TOPLAS, Vol. 13, Num. 4, Oct. 1991

  Lazy Code Motion
  J. Knoop, O. Ruthing, B. Steffen
  ACM SIGPLAN Notices Vol. 27, Num. 7, Jul. 1992, '92 Conference on PLDI

  What's In a Region?  Or Computing Control Dependence Regions in Near-Linear
  Time for Reducible Flow Control
  Thomas Ball
  ACM Letters on Programming Languages and Systems,
  Vol. 2, Num. 1-4, Mar-Dec 1993

  An Efficient Representation for Sparse Sets
  Preston Briggs, Linda Torczon
  ACM Letters on Programming Languages and Systems,
  Vol. 2, Num. 1-4, Mar-Dec 1993

  A Variation of Knoop, Ruthing, and Steffen's Lazy Code Motion
  K-H Drechsler, M.P. Stadel
  ACM SIGPLAN Notices, Vol. 28, Num. 5, May 1993

  Partial Dead Code Elimination
  J. Knoop, O. Ruthing, B. Steffen
  ACM SIGPLAN Notices, Vol. 29, Num. 6, Jun. 1994

  Effective Partial Redundancy Elimination
  P. Briggs, K.D. Cooper
  ACM SIGPLAN Notices, Vol. 29, Num. 6, Jun. 1994

  The Program Structure Tree: Computing Control Regions in Linear Time
  R. Johnson, D. Pearson, K. Pingali
  ACM SIGPLAN Notices, Vol. 29, Num. 6, Jun. 1994

  Optimal Code Motion: Theory and Practice
  J. Knoop, O. Ruthing, B. Steffen
  ACM TOPLAS, Vol. 16, Num. 4, Jul. 1994

  The power of assignment motion
  J. Knoop, O. Ruthing, B. Steffen
  ACM SIGPLAN Notices Vol. 30, Num. 6, Jun. 1995, '95 Conference on PLDI

  Global code motion / global value numbering
  C. Click
  ACM SIGPLAN Notices Vol. 30, Num. 6, Jun. 1995, '95 Conference on PLDI

  Value Driven Redundancy Elimination
  L.T. Simpson
  Rice University Ph.D. thesis, Apr. 1996

  Value Numbering
  L.T. Simpson
  Massively Scalar Compiler Project, Rice University, Sep. 1996

  High Performance Compilers for Parallel Computing
  Michael Wolfe
  Addison-Wesley, 1996

  Advanced Compiler Design and Implementation
  Steven Muchnick
  Morgan Kaufmann, 1997

  Building an Optimizing Compiler
  Robert Morgan
  Digital Press, 1998

  People wishing to speed up the code here should read:
    Elimination Algorithms for Data Flow Analysis
    B.G. Ryder, M.C. Paull
    ACM Computing Surveys, Vol. 18, Num. 3, Sep. 1986

    How to Analyze Large Programs Efficiently and Informatively
    D.M. Dhamdhere, B.K. Rosen, F.K. Zadeck
    ACM SIGPLAN Notices Vol. 27, Num. 7, Jul. 1992, '92 Conference on PLDI

  People wishing to do something different can find various possibilities
  in the above papers and elsewhere.

We support GCSE via Partial Redundancy Elimination.  PRE optimizations
are a superset of those done by classic GCSE.

Two passes of copy/constant propagation are done around PRE or hoisting
because the first one enables more GCSE and the second one helps to clean
up the copies that PRE and HOIST create.  This is needed more for PRE than
for HOIST because code hoisting will try to use an existing register
containing the common subexpression rather than create a new one.  This is
harder to do for PRE because of the code motion (which HOIST doesn't do).

Expressions we are interested in GCSE-ing are of the form
(set (pseudo-reg) (expression)).
Function want_to_gcse_p says what these are.

In addition, expressions in REG_EQUAL notes are candidates for GCSE-ing.
This allows PRE to hoist expressions that are expressed in multiple insns,
such as complex address calculations (e.g. for PIC code, or loads with a
high part and a low part).

PRE handles moving invariant expressions out of loops (by treating them as
partially redundant).

**********************

We used to support multiple passes but there are diminishing returns in
doing so.  The first pass usually makes 90% of the changes that are doable.
A second pass can make a few more changes made possible by the first pass.
Experiments show any further passes don't make enough changes to justify
the expense.

A study of spec92 using an unlimited number of passes:
[1 pass] = 1208 substitutions, [2] = 577, [3] = 202, [4] = 192, [5] = 83,
[6] = 34, [7] = 17, [8] = 9, [9] = 4, [10] = 4, [11] = 2,
[12] = 2, [13] = 1, [15] = 1, [16] = 2, [41] = 1

It was found doing copy propagation between each pass enables further
substitutions.

This study was done before expressions in REG_EQUAL notes were added as
candidate expressions for optimization, and before the GIMPLE optimizers
were added.  Probably, multiple passes is even less efficient now than
at the time when the study was conducted.

PRE is quite expensive in complicated functions because the DFA can take
a while to converge.  Hence we only perform one pass.

**********************

The steps for PRE are:

1) Build the hash table of expressions we wish to GCSE (expr_hash_table).

2) Perform the data flow analysis for PRE.

3) Delete the redundant instructions

4) Insert the required copies [if any] that make the partially
   redundant instructions fully redundant.

5) For other reaching expressions, insert an instruction to copy the value
   to a newly created pseudo that will reach the redundant instruction.

The deletion is done first so that when we do insertions we
know which pseudo reg to use.

Various papers have argued that PRE DFA is expensive (O(n^2)) and others
argue it is not.  The number of iterations for the algorithm to converge
is typically 2-4 so I don't view it as that expensive (relatively speaking).

PRE GCSE depends heavily on the second CPROP pass to clean up the copies
we create.  To make an expression reach the place where it's redundant,
the result of the expression is copied to a new register, and the redundant
expression is deleted by replacing it with this new register.  Classic GCSE
doesn't have this problem as much as it computes the reaching defs of
each register in each block and thus can try to use an existing
register.


GCSE global vars.

Data Structures

Macros

Functions

Variables

Macro Definition Documentation

◆ this_target_gcse

Function Documentation

◆ gcse_cc_finalize()

◆ gcse_or_cprop_is_too_expensive()

◆ insert_insn_end_basic_block()

Variable Documentation

◆ default_target_gcse