GCC Middle and Back End API Reference
rgroup_controls Struct Reference

#include <tree-vectorizer.h>

Collaboration diagram for rgroup_controls:

Data Fields

unsigned int max_nscalars_per_iter
 
unsigned int factor
 
tree type
 
tree compare_type
 
vec< treecontrols
 
tree bias_adjusted_ctrl
 

Detailed Description

In general, we can divide the vector statements in a vectorized loop
into related groups ("rgroups") and say that for each rgroup there is
some nS such that the rgroup operates on nS values from one scalar
iteration followed by nS values from the next.  That is, if VF is the
vectorization factor of the loop, the rgroup operates on a sequence:

  (1,1) (1,2) ... (1,nS) (2,1) ... (2,nS) ... (VF,1) ... (VF,nS)

where (i,j) represents a scalar value with index j in a scalar
iteration with index i.

[ We use the term "rgroup" to emphasise that this grouping isn't
  necessarily the same as the grouping of statements used elsewhere.
  For example, if we implement a group of scalar loads using gather
  loads, we'll use a separate gather load for each scalar load, and
  thus each gather load will belong to its own rgroup. ]

In general this sequence will occupy nV vectors concatenated
together.  If these vectors have nL lanes each, the total number
of scalar values N is given by:

    N = nS * VF = nV * nL

None of nS, VF, nV and nL are required to be a power of 2.  nS and nV
are compile-time constants but VF and nL can be variable (if the target
supports variable-length vectors).

In classical vectorization, each iteration of the vector loop would
handle exactly VF iterations of the original scalar loop.  However,
in vector loops that are able to operate on partial vectors, a
particular iteration of the vector loop might handle fewer than VF
iterations of the scalar loop.  The vector lanes that correspond to
iterations of the scalar loop are said to be "active" and the other
lanes are said to be "inactive".

In such vector loops, many rgroups need to be controlled to ensure
that they have no effect for the inactive lanes.  Conceptually, each
such rgroup needs a sequence of booleans in the same order as above,
but with each (i,j) replaced by a boolean that indicates whether
iteration i is active.  This sequence occupies nV vector controls
that again have nL lanes each.  Thus the control sequence as a whole
consists of VF independent booleans that are each repeated nS times.

Taking mask-based approach as a partially-populated vectors example.
We make the simplifying assumption that if a sequence of nV masks is
suitable for one (nS,nL) pair, we can reuse it for (nS/2,nL/2) by
VIEW_CONVERTing it.  This holds for all current targets that support
fully-masked loops.  For example, suppose the scalar loop is:

  float *f;
  double *d;
  for (int i = 0; i < n; ++i)
    {
      f[i * 2 + 0] += 1.0f;
      f[i * 2 + 1] += 2.0f;
      d[i] += 3.0;
    }

and suppose that vectors have 256 bits.  The vectorized f accesses
will belong to one rgroup and the vectorized d access to another:

  f rgroup: nS = 2, nV = 1, nL = 8
  d rgroup: nS = 1, nV = 1, nL = 4
            VF = 4

  [ In this simple example the rgroups do correspond to the normal
    SLP grouping scheme. ]

If only the first three lanes are active, the masks we need are:

  f rgroup: 1 1 | 1 1 | 1 1 | 0 0
  d rgroup:  1  |  1  |  1  |  0

Here we can use a mask calculated for f's rgroup for d's, but not
vice versa.

Thus for each value of nV, it is enough to provide nV masks, with the
mask being calculated based on the highest nL (or, equivalently, based
on the highest nS) required by any rgroup with that nV.  We therefore
represent the entire collection of masks as a two-level table, with the
first level being indexed by nV - 1 (since nV == 0 doesn't exist) and
the second being indexed by the mask index 0 <= i < nV.   
The controls (like masks or lengths) needed by rgroups with nV vectors,
according to the description above.   

Field Documentation

◆ bias_adjusted_ctrl

tree rgroup_controls::bias_adjusted_ctrl

◆ compare_type

tree rgroup_controls::compare_type

◆ controls

◆ factor

◆ max_nscalars_per_iter

◆ type


The documentation for this struct was generated from the following file: