GCC Middle and Back End API Reference
|
#include <tree-vectorizer.h>
Data Fields | |
unsigned int | max_nscalars_per_iter |
unsigned int | factor |
tree | type |
tree | compare_type |
vec< tree > | controls |
tree | bias_adjusted_ctrl |
In general, we can divide the vector statements in a vectorized loop into related groups ("rgroups") and say that for each rgroup there is some nS such that the rgroup operates on nS values from one scalar iteration followed by nS values from the next. That is, if VF is the vectorization factor of the loop, the rgroup operates on a sequence: (1,1) (1,2) ... (1,nS) (2,1) ... (2,nS) ... (VF,1) ... (VF,nS) where (i,j) represents a scalar value with index j in a scalar iteration with index i. [ We use the term "rgroup" to emphasise that this grouping isn't necessarily the same as the grouping of statements used elsewhere. For example, if we implement a group of scalar loads using gather loads, we'll use a separate gather load for each scalar load, and thus each gather load will belong to its own rgroup. ] In general this sequence will occupy nV vectors concatenated together. If these vectors have nL lanes each, the total number of scalar values N is given by: N = nS * VF = nV * nL None of nS, VF, nV and nL are required to be a power of 2. nS and nV are compile-time constants but VF and nL can be variable (if the target supports variable-length vectors). In classical vectorization, each iteration of the vector loop would handle exactly VF iterations of the original scalar loop. However, in vector loops that are able to operate on partial vectors, a particular iteration of the vector loop might handle fewer than VF iterations of the scalar loop. The vector lanes that correspond to iterations of the scalar loop are said to be "active" and the other lanes are said to be "inactive". In such vector loops, many rgroups need to be controlled to ensure that they have no effect for the inactive lanes. Conceptually, each such rgroup needs a sequence of booleans in the same order as above, but with each (i,j) replaced by a boolean that indicates whether iteration i is active. This sequence occupies nV vector controls that again have nL lanes each. Thus the control sequence as a whole consists of VF independent booleans that are each repeated nS times. Taking mask-based approach as a partially-populated vectors example. We make the simplifying assumption that if a sequence of nV masks is suitable for one (nS,nL) pair, we can reuse it for (nS/2,nL/2) by VIEW_CONVERTing it. This holds for all current targets that support fully-masked loops. For example, suppose the scalar loop is: float *f; double *d; for (int i = 0; i < n; ++i) { f[i * 2 + 0] += 1.0f; f[i * 2 + 1] += 2.0f; d[i] += 3.0; } and suppose that vectors have 256 bits. The vectorized f accesses will belong to one rgroup and the vectorized d access to another: f rgroup: nS = 2, nV = 1, nL = 8 d rgroup: nS = 1, nV = 1, nL = 4 VF = 4 [ In this simple example the rgroups do correspond to the normal SLP grouping scheme. ] If only the first three lanes are active, the masks we need are: f rgroup: 1 1 | 1 1 | 1 1 | 0 0 d rgroup: 1 | 1 | 1 | 0 Here we can use a mask calculated for f's rgroup for d's, but not vice versa. Thus for each value of nV, it is enough to provide nV masks, with the mask being calculated based on the highest nL (or, equivalently, based on the highest nS) required by any rgroup with that nV. We therefore represent the entire collection of masks as a two-level table, with the first level being indexed by nV - 1 (since nV == 0 doesn't exist) and the second being indexed by the mask index 0 <= i < nV.
The controls (like masks or lengths) needed by rgroups with nV vectors, according to the description above.
tree rgroup_controls::bias_adjusted_ctrl |
Referenced by vect_get_loop_len(), vect_set_loop_controls_directly(), and vect_verify_full_masking_avx512().
tree rgroup_controls::compare_type |
Referenced by vect_verify_full_masking_avx512().
unsigned int rgroup_controls::factor |
Referenced by vect_adjust_loop_lens_control(), vect_estimate_min_profitable_iters(), vect_get_loop_len(), vect_get_loop_mask(), vect_record_loop_len(), vect_rgroup_iv_might_wrap_p(), vect_set_loop_condition_partial_vectors(), vect_set_loop_controls_directly(), vect_verify_full_masking(), vect_verify_full_masking_avx512(), and vect_verify_loop_lens().
unsigned int rgroup_controls::max_nscalars_per_iter |
Referenced by vect_estimate_min_profitable_iters(), vect_get_max_nscalars_per_iter(), vect_maybe_permute_loop_masks(), vect_record_loop_len(), vect_rgroup_iv_might_wrap_p(), vect_set_loop_condition_partial_vectors(), vect_set_loop_controls_directly(), vect_verify_full_masking(), vect_verify_full_masking_avx512(), and vect_verify_loop_lens().
tree rgroup_controls::type |
Referenced by can_produce_all_loop_masks_p(), vect_adjust_loop_lens_control(), vect_estimate_min_profitable_iters(), vect_get_loop_len(), vect_get_loop_mask(), vect_maybe_permute_loop_masks(), vect_record_loop_len(), vect_set_loop_controls_directly(), vect_verify_full_masking(), and vect_verify_full_masking_avx512().