Global, SSA-based optimizations using mathematical identities.
Copyright (C) 2005-2024 Free Software Foundation, Inc.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.
GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>.
Currently, the only mini-pass in this file tries to CSE reciprocal
operations. These are common in sequences such as this one:
    modulus = sqrt(x*x + y*y + z*z);
    x = x / modulus;
    y = y / modulus;
    z = z / modulus;
that can be optimized to
    modulus = sqrt(x*x + y*y + z*z);
    rmodulus = 1.0 / modulus;
    x = x * rmodulus;
    y = y * rmodulus;
    z = z * rmodulus;
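As a minimal C sketch of the rewrite above, the two forms can be
compared directly. The function names scale_div and scale_recip are
hypothetical, and the modulus is passed in rather than computed with
sqrt to keep the example small. The results may differ in the last bit
because 1.0 / modulus is rounded before the multiplications, which is
why the rewrite is only valid when reciprocal math is permitted (in
GCC, -freciprocal-math).

```c
/* Hypothetical helper: three divisions by the same divisor,
   as the user wrote them.  */
static void
scale_div (double v[3], double modulus)
{
  v[0] = v[0] / modulus;
  v[1] = v[1] / modulus;
  v[2] = v[2] / modulus;
}

/* Hypothetical helper: the same computation after the rewrite,
   with one division and three multiplications.  */
static void
scale_recip (double v[3], double modulus)
{
  double rmodulus = 1.0 / modulus;
  v[0] = v[0] * rmodulus;
  v[1] = v[1] * rmodulus;
  v[2] = v[2] * rmodulus;
}
```

With a power-of-two modulus such as 4.0 the reciprocal is exact and
both functions return identical values; for other divisors they should
agree only to within rounding.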
We do this for loop-invariant divisors, and in this pass whenever
we notice that the same divisor is used by several divisions.
Of course, like in PRE, we don't insert a division if a dominator
already has one. However, this cannot be done as an extension of
PRE for several reasons.
First of all, experiments showed that the transformation is not
always useful if there are only two divisions by the same divisor.
This is probably because modern processors can pipeline the
divisions; on older, in-order processors it should still be
effective to optimize two divisions by the same number. We make
the minimum number of divisions a param, and call it N in the
remainder of this comment.
Second, if trapping math is active, we have less freedom on where
to insert divisions: we can only do so in basic blocks that already
contain one. (If divisions do not trap, we can instead insert
divisions elsewhere, namely in blocks that are common dominators
of those that contain the divisions.)
We really don't want to compute the reciprocal unless a division will
be found. To do this, we won't insert the division in a basic block
that has fewer than N divisions *post-dominating* it.
The algorithm constructs a subset of the dominator tree, holding the
blocks containing the divisions and their common dominators, and
walks it twice. The first walk is in post-order, and it annotates
each block with the number of divisions that post-dominate it: this
gives information on where divisions can be inserted profitably.
The second walk is in pre-order, and it inserts divisions as explained
above, and replaces divisions by multiplications.
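The two walks can be sketched on a toy tree as follows. This is only
an illustration under stated assumptions, not the code in this file:
struct occ, its fields, and the functions count_divisions and
insert_recips are hypothetical names, and the postdominates_parent
flag stands in for a real post-dominator query.

```c
#include <stddef.h>

/* Hypothetical node of the dominator tree subset.  */
struct occ
{
  int own_divisions;         /* Divisions by the divisor in this block.  */
  int postdominates_parent;  /* Assumed precomputed for this sketch.  */
  int num_divisions;         /* Filled in by the post-order walk.  */
  int inserts_recip;         /* Set if the reciprocal is inserted here.  */
  int replaced;              /* Divisions turned into multiplications.  */
  struct occ *children;      /* First dominated node in the subset.  */
  struct occ *next;          /* Next sibling.  */
};

/* First walk, post-order: count the divisions in this block plus
   those of the children that post-dominate it.  */
static void
count_divisions (struct occ *o)
{
  o->num_divisions = o->own_divisions;
  for (struct occ *c = o->children; c; c = c->next)
    {
      count_divisions (c);
      if (c->postdominates_parent)
        o->num_divisions += c->num_divisions;
    }
}

/* Second walk, pre-order: insert the reciprocal in the first block
   on each path that has at least N counted divisions, unless a
   dominator already provides one.  With trapping math the block
   must already contain a division.  */
static void
insert_recips (struct occ *o, int have_recip, int n, int trapping_math)
{
  if (!have_recip
      && o->num_divisions >= n
      && (o->own_divisions > 0 || !trapping_math))
    {
      o->inserts_recip = 1;
      have_recip = 1;
    }

  /* Every division dominated by the reciprocal becomes a
     multiplication.  */
  if (have_recip)
    o->replaced = o->own_divisions;

  for (struct occ *c = o->children; c; c = c->next)
    insert_recips (c, have_recip, n, trapping_math);
}
```

For a root with no divisions, one child with two divisions that
post-dominates it, and another child with one division that does not,
N = 2 places the reciprocal in the root when divisions do not trap,
and in the first child when they do.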
In the best case, the cost of the pass is O(n_statements). In the
worst case, the cost is due to creating the dominator tree subset,
with a cost of O(n_basic_blocks ^ 2); however this can only happen
for n_statements / n_basic_blocks statements. So, the amortized cost
of creating the dominator tree subset is O(n_basic_blocks) and the
worst-case cost of the pass is O(n_statements * n_basic_blocks).
More practically, the cost will be small because there are few
divisions, and they tend to be in the same basic block, so insert_bb
is called very few times.
If we did this using domwalk.cc, an efficient implementation would have
to work on all the variables in a single pass, because we could not
work on just a subset of the dominator tree, as we do now, and the
cost would also be something like O(n_statements * n_basic_blocks).
The data structures would be more complex in order to work on all the
variables in a single pass.
This structure represents one basic block that either computes a
division, or is a common dominator for basic blocks that compute a
division.