GCC Middle and Back End API Reference
ext_cand Struct Reference
Collaboration diagram for ext_cand:

Data Fields

const_rtx expr
 
enum rtx_code code
 
machine_mode mode
 
rtx_insninsn
 

Detailed Description

Redundant Extension Elimination pass for the GNU compiler.
   Copyright (C) 2010-2024 Free Software Foundation, Inc.
   Contributed by Ilya Enkovich (ilya.enkovich@intel.com)

   Based on the Redundant Zero-extension elimination pass contributed by
   Sriraman Tallam (tmsriram@google.com) and Silvius Rus (rus@google.com).

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.   
Problem Description :
 --------------------
 This pass is intended to remove redundant extension instructions.
 Such instructions appear for different reasons.  We expect some of
 them due to implicit zero-extension in 64-bit registers after writing
 to their lower 32-bit half (e.g. for the x86-64 architecture).
 Another possible reason is a type cast which follows a load (for
 instance a register restore) and which can be combined into a single
 instruction, and for which earlier local passes, e.g. the combiner,
 weren't able to optimize.

 How does this pass work  ?
 --------------------------

 This pass is run after register allocation.  Hence, all registers that
 this pass deals with are hard registers.  This pass first looks for an
 extension instruction that could possibly be redundant.  Such extension
 instructions show up in RTL with the pattern  :
 (set (reg:<SWI248> x) (any_extend:<SWI248> (reg:<SWI124> x))),
 where x can be any hard register.
 Now, this pass tries to eliminate this instruction by merging the
 extension with the definitions of register x.  For instance, if
 one of the definitions of register x was  :
 (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2))),
 followed by extension  :
 (set (reg:DI x) (zero_extend:DI (reg:SI x)))
 then the combination converts this into :
 (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2)))).
 If all the merged definitions are recognizable assembly instructions,
 the extension is effectively eliminated.

 For example, for the x86-64 architecture, implicit zero-extensions
 are captured with appropriate patterns in the i386.md file.  Hence,
 these merged definition can be matched to a single assembly instruction.
 The original extension instruction is then deleted if all the
 definitions can be merged.

 However, there are cases where the definition instruction cannot be
 merged with an extension.  Examples are CALL instructions.  In such
 cases, the original extension is not redundant and this pass does
 not delete it.

 Handling conditional moves :
 ----------------------------

 Architectures like x86-64 support conditional moves whose semantics for
 extension differ from the other instructions.  For instance, the
 instruction *cmov ebx, eax*
 zero-extends eax onto rax only when the move from ebx to eax happens.
 Otherwise, eax may not be zero-extended.  Consider conditional moves as
 RTL instructions of the form
 (set (reg:SI x) (if_then_else (cond) (reg:SI y) (reg:SI z))).
 This pass tries to merge an extension with a conditional move by
 actually merging the definitions of y and z with an extension and then
 converting the conditional move into :
 (set (reg:DI x) (if_then_else (cond) (reg:DI y) (reg:DI z))).
 Since registers y and z are extended, register x will also be extended
 after the conditional move.  Note that this step has to be done
 transitively since the definition of a conditional copy can be
 another conditional copy.

 Motivating Example I :
 ---------------------
 For this program :
 **********************************************
 bad_code.c

 int mask[1000];

 int foo(unsigned x)
 {
   if (x < 10)
     x = x * 45;
   else
     x = x * 78;
   return mask[x];
 }
 **********************************************

 $ gcc -O2 bad_code.c
   ........
   400315:       b8 4e 00 00 00          mov    $0x4e,%eax
   40031a:       0f af f8                imul   %eax,%edi
   40031d:       89 ff                   mov    %edi,%edi - useless extension
   40031f:       8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
   400326:       c3                      retq
   ......
   400330:       ba 2d 00 00 00          mov    $0x2d,%edx
   400335:       0f af fa                imul   %edx,%edi
   400338:       89 ff                   mov    %edi,%edi - useless extension
   40033a:       8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
   400341:       c3                      retq

 $ gcc -O2 -free bad_code.c
   ......
   400315:       6b ff 4e                imul   $0x4e,%edi,%edi
   400318:       8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
   40031f:       c3                      retq
   400320:       6b ff 2d                imul   $0x2d,%edi,%edi
   400323:       8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
   40032a:       c3                      retq

 Motivating Example II :
 ---------------------

 Here is an example with a conditional move.

 For this program :
 **********************************************

 unsigned long long foo(unsigned x , unsigned y)
 {
   unsigned z;
   if (x > 100)
     z = x + y;
   else
     z = x - y;
   return (unsigned long long)(z);
 }

 $ gcc -O2 bad_code.c
   ............
   400360:       8d 14 3e                lea    (%rsi,%rdi,1),%edx
   400363:       89 f8                   mov    %edi,%eax
   400365:       29 f0                   sub    %esi,%eax
   400367:       83 ff 65                cmp    $0x65,%edi
   40036a:       0f 43 c2                cmovae %edx,%eax
   40036d:       89 c0                   mov    %eax,%eax - useless extension
   40036f:       c3                      retq

 $ gcc -O2 -free bad_code.c
   .............
   400360:       89 fa                   mov    %edi,%edx
   400362:       8d 04 3e                lea    (%rsi,%rdi,1),%eax
   400365:       29 f2                   sub    %esi,%edx
   400367:       83 ff 65                cmp    $0x65,%edi
   40036a:       89 d6                   mov    %edx,%esi
   40036c:       48 0f 42 c6             cmovb  %rsi,%rax
   400370:       c3                      retq

Motivating Example III :
---------------------

Here is an example with a type cast.

For this program :
**********************************************

void test(int size, unsigned char *in, unsigned char *out)
{
  int i;
  unsigned char xr, xg, xy=0;

  for (i = 0; i < size; i++) {
    xr = *in++;
    xg = *in++;
    xy = (unsigned char) ((19595*xr + 38470*xg) >> 16);
    *out++ = xy;
  }
}

$ gcc -O2 bad_code.c
  ............
  10:   0f b6 0e                movzbl (%rsi),%ecx
  13:   0f b6 46 01             movzbl 0x1(%rsi),%eax
  17:   48 83 c6 02             add    $0x2,%rsi
  1b:   0f b6 c9                movzbl %cl,%ecx - useless extension
  1e:   0f b6 c0                movzbl %al,%eax - useless extension
  21:   69 c9 8b 4c 00 00       imul   $0x4c8b,%ecx,%ecx
  27:   69 c0 46 96 00 00       imul   $0x9646,%eax,%eax

 $ gcc -O2 -free bad_code.c
   .............
  10:   0f b6 0e                movzbl (%rsi),%ecx
  13:   0f b6 46 01             movzbl 0x1(%rsi),%eax
  17:   48 83 c6 02             add    $0x2,%rsi
  1b:   69 c9 8b 4c 00 00       imul   $0x4c8b,%ecx,%ecx
  21:   69 c0 46 96 00 00       imul   $0x9646,%eax,%eax

 Usefulness :
 ----------

 The original redundant zero-extension elimination pass reported reduction
 of the dynamic instruction count of a compression benchmark by 2.8% and
 improvement of its run time by about 1%.

 The additional performance gain with the enhanced pass is mostly expected
 on in-order architectures where redundancy cannot be compensated by out of
 order execution.  Measurements showed up to 10% performance gain (reduced
 run time) on EEMBC 2.0 benchmarks on Atom processor with geomean performance
 gain 1%.   
This structure represents a candidate for elimination.   

Field Documentation

◆ code

enum rtx_code ext_cand::code

◆ expr

const_rtx ext_cand::expr

◆ insn

rtx_insn* ext_cand::insn

◆ mode

machine_mode ext_cand::mode

The documentation for this struct was generated from the following file: