GCC Middle and Back End API Reference
vect_optimize_slp_pass Class Reference
Collaboration diagram for vect_optimize_slp_pass:

Public Member Functions

 vect_optimize_slp_pass (vec_info *vinfo)
 
void run ()
 

Private Member Functions

struct loopcontaining_loop (slp_tree)
 
bool is_cfg_latch_edge (graph_edge *)
 
void build_vertices (hash_set< slp_tree > &, slp_tree)
 
void build_vertices ()
 
void build_graph ()
 
void create_partitions ()
 
template<typename T>
void for_each_partition_edge (unsigned int, T)
 
bool is_compatible_layout (slp_tree, unsigned int)
 
int change_layout_cost (slp_tree, unsigned int, unsigned int)
 
slpg_partition_layout_costspartition_layout_costs (unsigned int, unsigned int)
 
void change_vec_perm_layout (slp_tree, lane_permutation_t &, int, unsigned int)
 
int internal_node_cost (slp_tree, int, unsigned int)
 
void start_choosing_layouts ()
 
slpg_layout_cost edge_layout_cost (graph_edge *, unsigned int, unsigned int, unsigned int)
 
slpg_layout_cost total_in_cost (unsigned int)
 
slpg_layout_cost forward_cost (graph_edge *, unsigned int, unsigned int)
 
slpg_layout_cost backward_cost (graph_edge *, unsigned int, unsigned int)
 
void forward_pass ()
 
void backward_pass ()
 
slp_tree get_result_with_layout (slp_tree, unsigned int)
 
void materialize ()
 
void remove_redundant_permutations ()
 
void decide_masked_load_lanes ()
 
void dump ()
 

Private Attributes

vec_infom_vinfo
 
bool m_optimize_size
 
graphm_slpg = nullptr
 
auto_vec< slpg_vertexm_vertices
 
auto_vec< int > m_leafs
 
auto_vec< vec< unsigned > > m_perms
 
auto_vec< slpg_partition_infom_partitions
 
auto_vec< unsigned int > m_partitioned_nodes
 
auto_vec< slpg_partition_layout_costsm_partition_layout_costs
 
auto_vec< slp_treem_node_layouts
 

Detailed Description

This class tries to optimize the layout of vectors in order to avoid unnecessary shuffling. At the moment, the set of possible layouts are restricted to bijective permutations. The goal of the pass depends on whether we're optimizing for size or for speed. When optimizing for size, the goal is to reduce the overall number of layout changes (including layout changes implied by things like load permutations). When optimizing for speed, the goal is to reduce the maximum latency attributable to layout changes on any non-cyclical path through the data flow graph. For example, when optimizing a loop nest for speed, we will prefer to make layout changes outside of a loop rather than inside of a loop, and will prefer to make layout changes in parallel rather than serially, even if that increases the overall number of layout changes. The high-level procedure is: (1) Build a graph in which edges go from uses (parents) to definitions (children). (2) Divide the graph into a dag of strongly-connected components (SCCs). (3) When optimizing for speed, partition the nodes in each SCC based on their containing cfg loop. When optimizing for size, treat each SCC as a single partition. This gives us a dag of partitions. The goal is now to assign a layout to each partition. (4) Construct a set of vector layouts that are worth considering. Record which nodes must keep their current layout. (5) Perform a forward walk over the partition dag (from loads to stores) accumulating the "forward" cost of using each layout. When visiting each partition, assign a tentative choice of layout to the partition and use that choice when calculating the cost of using a different layout in successor partitions. (6) Perform a backward walk over the partition dag (from stores to loads), accumulating the "backward" cost of using each layout. When visiting each partition, make a final choice of layout for that partition based on the accumulated forward costs (from (5)) and backward costs (from (6)). (7) Apply the chosen layouts to the SLP graph. For example, consider the SLP statements: S1: a_1 = load loop: S2: a_2 = PHI<a_1, a_3> S3: b_1 = load S4: a_3 = a_2 + b_1 exit: S5: a_4 = PHI<a_3> S6: store a_4 S2 and S4 form an SCC and are part of the same loop. Every other statement is in a singleton SCC. In this example there is a one-to-one mapping between SCCs and partitions and the partition dag looks like this; S1 S3 \ / S2+S4 | S5 | S6 S2, S3 and S4 will have a higher execution frequency than the other statements, so when optimizing for speed, the goal is to avoid any layout changes: - within S3 - within S2+S4 - on the S3->S2+S4 edge For example, if S3 was originally a reversing load, the goal of the pass is to make it an unreversed load and change the layout on the S1->S2+S4 and S2+S4->S5 edges to compensate. (Changing the layout on S1->S2+S4 and S5->S6 would also be acceptable.) The difference between SCCs and partitions becomes important if we add an outer loop: S1: a_1 = ... loop1: S2: a_2 = PHI<a_1, a_6> S3: b_1 = load S4: a_3 = a_2 + b_1 loop2: S5: a_4 = PHI<a_3, a_5> S6: c_1 = load S7: a_5 = a_4 + c_1 exit2: S8: a_6 = PHI<a_5> S9: store a_6 exit1: Here, S2, S4, S5, S7 and S8 form a single SCC. However, when optimizing for speed, we usually do not want restrictions in the outer loop to "infect" the decision for the inner loop. For example, if an outer-loop node in the SCC contains a statement with a fixed layout, that should not prevent the inner loop from using a different layout. Conversely, the inner loop should not dictate a layout to the outer loop: if the outer loop does a lot of computation, then it may not be efficient to do all of that computation in the inner loop's preferred layout. So when optimizing for speed, we partition the SCC into S2+S4+S8 (outer) and S5+S7 (inner). We also try to arrange partitions so that: - the partition for an outer loop comes before the partition for an inner loop - if a sibling loop A dominates a sibling loop B, A's partition comes before B's This gives the following partition dag for the example above: S1 S3 \ / S2+S4+S8 S6 | \\ / | S5+S7 | S9 There are two edges from S2+S4+S8 to S5+S7: one for the edge S4->S5 and one for a reversal of the edge S7->S8. The backward walk picks a layout for S5+S7 before S2+S4+S8. The choice for S2+S4+S8 therefore has to balance the cost of using the outer loop's preferred layout against the cost of changing the layout on entry to the inner loop (S4->S5) and on exit from the inner loop (S7->S8 reversed). Although this works well when optimizing for speed, it has the downside when optimizing for size that the choice of layout for S5+S7 is completely independent of S9, which lessens the chance of reducing the overall number of permutations. We therefore do not partition SCCs when optimizing for size. To give a concrete example of the difference between optimizing for size and speed, consider: a[0] = (b[1] << c[3]) - d[1]; a[1] = (b[0] << c[2]) - d[0]; a[2] = (b[3] << c[1]) - d[3]; a[3] = (b[2] << c[0]) - d[2]; There are three different layouts here: one for a, one for b and d, and one for c. When optimizing for speed it is better to permute each of b, c and d into the order required by a, since those permutations happen in parallel. But when optimizing for size, it is better to: - permute c into the same order as b - do the arithmetic - permute the result into the order required by a This gives 2 permutations rather than 3.

Constructor & Destructor Documentation

◆ vect_optimize_slp_pass()

vect_optimize_slp_pass::vect_optimize_slp_pass ( vec_info * vinfo)
inline

References m_vinfo.

Member Function Documentation

◆ backward_cost()

slpg_layout_cost vect_optimize_slp_pass::backward_cost ( graph_edge * ud,
unsigned int to_node_i,
unsigned int from_layout_i )
private
UD represents a use-def link between TO_NODE_I and a node in an earlier partition; TO_NODE_I could be the definition node or the use node. The node at the other end of the link wants to use layout FROM_LAYOUT_I; return the cost of any necessary fix-ups on edge UD, or slpg_layout_cost::impossible () if the choice cannot be made. At this point, TO_NODE_I's partition has a fixed choice of layout.

References slpg_layout_cost::add_serial_cost(), graph_edge::dest, edge_layout_cost(), gcc_assert, slpg_layout_cost::impossible(), slpg_partition_info::in_degree, internal_node_cost(), slpg_partition_info::layout, m_optimize_size, m_partitions, m_vertices, partition_layout_costs(), SLP_TREE_CODE, slpg_layout_cost::split(), and graph_edge::src.

Referenced by backward_pass().

◆ backward_pass()

void vect_optimize_slp_pass::backward_pass ( )
private

◆ build_graph()

void vect_optimize_slp_pass::build_graph ( )
private
Build the graph. Mark edges that correspond to cfg loop latch edges with a nonnull data field.

References add_edge(), build_vertices(), graph_edge::data, is_cfg_latch_edge(), m_optimize_size, m_slpg, m_vertices, new_graph(), and SLP_TREE_CHILDREN.

Referenced by run().

◆ build_vertices() [1/2]

void vect_optimize_slp_pass::build_vertices ( )
private
Fill the vertices and leafs vector with all nodes in the SLP graph.

References build_vertices(), FOR_EACH_VEC_ELT, i, m_leafs, m_vertices, m_vinfo, SLP_INSTANCE_TREE, and visited.

Referenced by build_graph(), build_vertices(), and build_vertices().

◆ build_vertices() [2/2]

void vect_optimize_slp_pass::build_vertices ( hash_set< slp_tree > & visited,
slp_tree node )
private
Fill the vertices and leafs vector with all nodes in the SLP graph. Also record whether we should optimize anything for speed rather than size.

References build_vertices(), FOR_EACH_VEC_ELT, gimple_bb(), i, m_leafs, m_optimize_size, m_vertices, optimize_bb_for_speed_p(), SLP_TREE_CHILDREN, SLP_TREE_REPRESENTATIVE, vect_orig_stmt(), _slp_tree::vertex, and visited.

◆ change_layout_cost()

int vect_optimize_slp_pass::change_layout_cost ( slp_tree node,
unsigned int from_layout_i,
unsigned int to_layout_i )
private
Return the cost (in arbtirary units) of going from layout FROM_LAYOUT_I to layout TO_LAYOUT_I for a node like NODE. Return -1 if either of the layouts is incompatible with NODE or if the change is not possible for some other reason. The properties taken from NODE include the number of lanes and the vector type. The actual operation doesn't matter.

References count, i, is_compatible_layout(), m_perms, m_vinfo, MAX, SLP_TREE_LANES, vect_slp_permute(), and vectorizable_slp_permutation_1().

Referenced by edge_layout_cost().

◆ change_vec_perm_layout()

void vect_optimize_slp_pass::change_vec_perm_layout ( slp_tree node,
lane_permutation_t & perm,
int in_layout_i,
unsigned int out_layout_i )
private
Change PERM in one of two ways: - if IN_LAYOUT_I < 0, accept input operand I in the layout that has been chosen for child I of NODE. - if IN_LAYOUT >= 0, accept all inputs operands with that layout. In both cases, arrange for the output to have layout OUT_LAYOUT_I

References m_partitions, m_perms, m_vertices, SLP_TREE_CHILDREN, vect_slp_permute(), and _slp_tree::vertex.

Referenced by internal_node_cost(), and materialize().

◆ containing_loop()

struct loop * vect_optimize_slp_pass::containing_loop ( slp_tree node)
private

◆ create_partitions()

◆ decide_masked_load_lanes()

◆ dump()

◆ edge_layout_cost()

slpg_layout_cost vect_optimize_slp_pass::edge_layout_cost ( graph_edge * ud,
unsigned int node1_i,
unsigned int layout1_i,
unsigned int layout2_i )
private
Return the cost of switching between layout LAYOUT1_I (at node NODE1_I) and layout LAYOUT2_I on cross-partition use-to-def edge UD. Return slpg_layout_cost::impossible () if the change isn't possible.

References change_layout_cost(), graph_edge::dest, slpg_layout_cost::impossible(), m_optimize_size, m_vertices, slpg_layout_cost::split(), and graph_edge::src.

Referenced by backward_cost(), backward_pass(), forward_cost(), and forward_pass().

◆ for_each_partition_edge()

template<typename T>
void vect_optimize_slp_pass::for_each_partition_edge ( unsigned int node_i,
T fn )
private
Look for edges from earlier partitions into node NODE_I and edges from node NODE_I into later partitions. Call: FN (ud, other_node_i) for each such use-to-def edge ud, where other_node_i is the node at the other end of the edge.

References m_slpg, m_vertices, and T.

Referenced by backward_pass(), dump(), forward_pass(), start_choosing_layouts(), and total_in_cost().

◆ forward_cost()

slpg_layout_cost vect_optimize_slp_pass::forward_cost ( graph_edge * ud,
unsigned int from_node_i,
unsigned int to_layout_i )
private
UD represents a use-def link between FROM_NODE_I and a node in a later partition; FROM_NODE_I could be the definition node or the use node. The node at the other end of the link wants to use layout TO_LAYOUT_I. Return the cost of any necessary fix-ups on edge UD, or return slpg_layout_cost::impossible () if the change isn't possible. At this point, FROM_NODE_I's partition has chosen the cheapest layout based on the information available so far, but this choice is only provisional.

References slpg_layout_cost::add_serial_cost(), edge_layout_cost(), gcc_assert, slpg_layout_cost::impossible(), slpg_layout_cost::is_better_than(), slpg_layout_cost::is_possible(), slpg_partition_info::layout, m_optimize_size, m_partitions, m_vertices, slpg_partition_info::out_degree, partition_layout_costs(), and slpg_layout_cost::split().

Referenced by forward_pass().

◆ forward_pass()

void vect_optimize_slp_pass::forward_pass ( )
private
Make a forward pass through the partitions, accumulating input costs. Make a tentative (provisional) choice of layout for each partition, ensuring that this choice still allows later partitions to keep their original layout.

References add_cost(), slpg_layout_cost::add_serial_cost(), edge_layout_cost(), for_each_partition_edge(), forward_cost(), gcc_assert, slpg_layout_cost::impossible(), internal_node_cost(), slpg_layout_cost::is_better_than(), is_compatible_layout(), slpg_layout_cost::is_possible(), m_optimize_size, m_partitioned_nodes, m_partitions, m_perms, m_vertices, partition_layout_costs(), SLP_TREE_CODE, total_in_cost(), and _slp_tree::vertex.

Referenced by run().

◆ get_result_with_layout()

slp_tree vect_optimize_slp_pass::get_result_with_layout ( slp_tree node,
unsigned int to_layout_i )
private

◆ internal_node_cost()

int vect_optimize_slp_pass::internal_node_cost ( slp_tree node,
int in_layout_i,
unsigned int out_layout_i )
private
Check whether the target allows NODE to be rearranged so that the node's output has layout OUT_LAYOUT_I. Return the cost of the change if so, in the same arbitrary units as for change_layout_cost. Return -1 otherwise. If NODE is a VEC_PERM_EXPR and IN_LAYOUT_I < 0, also check whether NODE can adapt to the layout changes that have (perhaps provisionally) been chosen for NODE's children, so that no extra permutations are needed on either the input or the output of NODE. If NODE is a VEC_PERM_EXPR and IN_LAYOUT_I >= 0, instead assume that all inputs will be forced into layout IN_LAYOUT_I beforehand. IN_LAYOUT_I has no meaning for other types of node. Keeping the node as-is is always valid. If the target doesn't appear to support the node as-is, but might realistically support other layouts, then layout 0 instead has the cost of a worst-case permutation. On the one hand, this ensures that every node has at least one valid layout, avoiding what would otherwise be an awkward special case. On the other, it still encourages the pass to change an invalid pre-existing layout choice into a valid one.

References change_vec_perm_layout(), count, DR_GROUP_FIRST_ELEMENT, DR_GROUP_SIZE, DR_IS_READ, dyn_cast(), is_compatible_layout(), LOOP_VINFO_VECT_FACTOR, m_perms, m_vinfo, SLP_TREE_CHILDREN, SLP_TREE_CODE, SLP_TREE_LANE_PERMUTATION, SLP_TREE_LANES, SLP_TREE_LOAD_PERMUTATION, SLP_TREE_REPRESENTATIVE, STMT_VINFO_DATA_REF, STMT_VINFO_GROUPED_ACCESS, vect_slp_permute(), vect_transform_slp_perm_load_1(), vectorizable_slp_permutation_1(), and vNULL.

Referenced by backward_cost(), and forward_pass().

◆ is_cfg_latch_edge()

bool vect_optimize_slp_pass::is_cfg_latch_edge ( graph_edge * ud)
private
Return true if UD (an edge from a use to a definition) is associated with a loop latch edge in the cfg.

References bb_loop_header_p(), containing_loop(), graph_edge::dest, gimple_bb(), is_a(), m_vertices, SLP_TREE_CODE, SLP_TREE_DEF_TYPE, SLP_TREE_REPRESENTATIVE, graph_edge::src, vect_internal_def, and vect_orig_stmt().

Referenced by build_graph().

◆ is_compatible_layout()

bool vect_optimize_slp_pass::is_compatible_layout ( slp_tree node,
unsigned int layout_i )
private
Return true if layout LAYOUT_I is compatible with the number of SLP lanes that NODE would operate on. This test is independent of NODE's actual operation.

References m_perms, and SLP_TREE_LANES.

Referenced by change_layout_cost(), forward_pass(), and internal_node_cost().

◆ materialize()

◆ partition_layout_costs()

slpg_partition_layout_costs & vect_optimize_slp_pass::partition_layout_costs ( unsigned int partition_i,
unsigned int layout_i )
inlineprivate
Return the costs of assigning layout LAYOUT_I to partition PARTITION_I.

References m_partition_layout_costs, and m_perms.

Referenced by backward_cost(), backward_pass(), dump(), forward_cost(), forward_pass(), and total_in_cost().

◆ remove_redundant_permutations()

void vect_optimize_slp_pass::remove_redundant_permutations ( )
private
Elide load permutations that are not necessary. Such permutations might be pre-existing, rather than created by the layout optimizations.

References as_a(), DR_GROUP_FIRST_ELEMENT, DR_GROUP_GAP, DR_GROUP_NEXT_ELEMENT, DR_GROUP_SIZE, FOR_EACH_VEC_ELT, is_a(), known_eq, LOOP_VINFO_VECT_FACTOR, m_leafs, m_vertices, m_vinfo, NULL, SLP_TREE_LANES, SLP_TREE_LOAD_PERMUTATION, SLP_TREE_SCALAR_STMTS, and STMT_VINFO_GROUPED_ACCESS.

Referenced by materialize(), and run().

◆ run()

void vect_optimize_slp_pass::run ( )

◆ start_choosing_layouts()

◆ total_in_cost()

slpg_layout_cost vect_optimize_slp_pass::total_in_cost ( unsigned int node_i)
private
Return the incoming costs for node NODE_I, assuming that each input keeps its current (provisional) choice of layout. The inputs do not necessarily have the same layout as each other.

References add_cost(), slpg_layout_cost::add_parallel_cost(), slpg_layout_cost::add_serial_cost(), for_each_partition_edge(), m_partitions, m_vertices, partition_layout_costs(), and slpg_layout_cost::split().

Referenced by forward_pass().

Field Documentation

◆ m_leafs

auto_vec<int> vect_optimize_slp_pass::m_leafs
private

◆ m_node_layouts

auto_vec<slp_tree> vect_optimize_slp_pass::m_node_layouts
private

◆ m_optimize_size

◆ m_partition_layout_costs

auto_vec<slpg_partition_layout_costs> vect_optimize_slp_pass::m_partition_layout_costs
private

◆ m_partitioned_nodes

auto_vec<unsigned int> vect_optimize_slp_pass::m_partitioned_nodes
private

◆ m_partitions

◆ m_perms

◆ m_slpg

graph* vect_optimize_slp_pass::m_slpg = nullptr
private

◆ m_vertices

◆ m_vinfo


The documentation for this class was generated from the following file: