LCOV - code coverage report
Current view: top level - gcc - tree-loop-distribution.cc (source / functions)
Test: gcc.info                    Lines:     92.3 % (1559 of 1689 hit)
Test Date: 2026-03-28 14:25:54    Functions: 90.8 % (79 of 87 hit)
Legend: Lines:     hit / not hit

            Line data    Source code
       1              : /* Loop distribution.
       2              :    Copyright (C) 2006-2026 Free Software Foundation, Inc.
       3              :    Contributed by Georges-Andre Silber <Georges-Andre.Silber@ensmp.fr>
       4              :    and Sebastian Pop <sebastian.pop@amd.com>.
       5              : 
       6              : This file is part of GCC.
       7              : 
       8              : GCC is free software; you can redistribute it and/or modify it
       9              : under the terms of the GNU General Public License as published by the
      10              : Free Software Foundation; either version 3, or (at your option) any
      11              : later version.
      12              : 
      13              : GCC is distributed in the hope that it will be useful, but WITHOUT
      14              : ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
      15              : FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
      16              : for more details.
      17              : 
      18              : You should have received a copy of the GNU General Public License
      19              : along with GCC; see the file COPYING3.  If not see
      20              : <http://www.gnu.org/licenses/>.  */
      21              : 
      22              : /* This pass performs loop distribution: for example, the loop
      23              : 
      24              :    |DO I = 2, N
      25              :    |    A(I) = B(I) + C
      26              :    |    D(I) = A(I-1)*E
      27              :    |ENDDO
      28              : 
      29              :    is transformed to
      30              : 
      31              :    |DOALL I = 2, N
      32              :    |   A(I) = B(I) + C
      33              :    |ENDDO
      34              :    |
      35              :    |DOALL I = 2, N
      36              :    |   D(I) = A(I-1)*E
      37              :    |ENDDO
      38              : 
      39              :    Loop distribution is the dual of loop fusion.  It separates statements
      40              :    of a loop (or loop nest) into multiple loops (or loop nests) with the
      41              :    same loop header.  The major goal is to separate statements which may
      42              :    be vectorized from those that can't.  This pass implements distribution
      43              :    in the following steps:
      44              : 
       45              :      1) Seed partitions with specific types of statements.  For now we
       46              :         support two types of seed statements: statements defining a variable
       47              :         used outside of the loop, and statements storing to memory.
      48              :      2) Build reduced dependence graph (RDG) for loop to be distributed.
      49              :         The vertices (RDG:V) model all statements in the loop and the edges
      50              :         (RDG:E) model flow and control dependencies between statements.
      51              :      3) Apart from RDG, compute data dependencies between memory references.
       52              :      4) Starting from a seed statement, build up a partition by adding
       53              :         dependent statements according to the RDG's dependence information.
       54              :         A partition is classified as parallel if it can be executed in
       55              :         parallel, or as sequential if it can't.  A parallel partition is
       56              :         further classified into different builtin kinds if it can be
       57              :         implemented as builtin function calls.
      58              :      5) Build partition dependence graph (PG) based on data dependencies.
      59              :         The vertices (PG:V) model all partitions and the edges (PG:E) model
       60              :         all data dependencies between every pair of partitions.  In general,
       61              :         a data dependence is either known or unknown at compilation time.
       62              :         In C family languages, there exist quite a few compile-time-unknown
       63              :         dependencies because of possible aliasing among data references.
       64              :         We categorize PG's edges into two types: "true" edges representing
       65              :         compile-time-known data dependencies, and "alias" edges for all
       66              :         other data dependencies.
       67              :      6) Traverse the subgraph of PG as if all "alias" edges didn't exist.
       68              :         Merge partitions in each strongly connected component (SCC)
       69              :         accordingly.  Build a new PG for the merged partitions.
       70              :      7) Traverse PG again, this time with both "true" and "alias" edges
       71              :         included.  We try to break SCCs by removing some edges.  Because
       72              :         SCCs of "true" edges were all fused in step 6), we can break SCCs
       73              :         by removing some "alias" edges.  It's NP-hard to choose an optimal
       74              :         edge set; fortunately, a simple approximation is good enough for us
       75              :         given the small problem scale.
      76              :      8) Collect all data dependencies of the removed "alias" edges.  Create
      77              :         runtime alias checks for collected data dependencies.
       78              :      9) Version the loop under the condition of the runtime alias checks.
       79              :         Given that loop distribution generally introduces additional
       80              :         overhead, it is only useful if vectorization is achieved in a
       81              :         distributed loop.  We version the loop with the internal function
       82              :         call IFN_LOOP_DIST_ALIAS.  If no distributed loop can be vectorized,
       83              :         we simply remove the distributed loops and recover the original one.
      84              : 
      85              :    TODO:
       86              :      1) We only distribute innermost two-level loop nests now.  We should
       87              :         extend this to arbitrary loop nests in the future.
       88              :      2) We only fuse partitions within an SCC now.  A better fusion
       89              :         algorithm is desired to minimize loop overhead, maximize parallelism
       90              :         and maximize data reuse.  */
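The transformation in the header comment above can be rendered in C++ as a self-contained illustration (not GCC code; the names `run_fused` and `run_distributed` are invented).  The fused loop carries a dependence from `A(I-1)` to `D(I)`, while each distributed loop is independent across iterations (DOALL):

```cpp
#include <array>
#include <cassert>

constexpr int N = 8;

// Original fused loop: D(I) reads A(I-1) written on the previous iteration.
std::array<int, N> run_fused (const std::array<int, N> &b, int c, int e)
{
  std::array<int, N> a{}, d{};
  for (int i = 1; i < N; i++)
    {
      a[i] = b[i] + c;      // A(I) = B(I) + C
      d[i] = a[i - 1] * e;  // D(I) = A(I-1)*E
    }
  return d;
}

// After distribution: two loops, each parallelizable on its own.
std::array<int, N> run_distributed (const std::array<int, N> &b, int c, int e)
{
  std::array<int, N> a{}, d{};
  for (int i = 1; i < N; i++)  // DOALL
    a[i] = b[i] + c;
  for (int i = 1; i < N; i++)  // DOALL
    d[i] = a[i - 1] * e;
  return d;
}
```

Both versions compute the same D, because every `a[i - 1]` read in the fused loop sees a value already written on an earlier iteration.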
      91              : 
      92              : #include "config.h"
      93              : #include "system.h"
      94              : #include "coretypes.h"
      95              : #include "backend.h"
      96              : #include "tree.h"
      97              : #include "gimple.h"
      98              : #include "cfghooks.h"
      99              : #include "tree-pass.h"
     100              : #include "ssa.h"
     101              : #include "gimple-pretty-print.h"
     102              : #include "fold-const.h"
     103              : #include "cfganal.h"
     104              : #include "gimple-iterator.h"
     105              : #include "gimplify-me.h"
     106              : #include "stor-layout.h"
     107              : #include "tree-cfg.h"
     108              : #include "tree-ssa-loop-manip.h"
     109              : #include "tree-ssa-loop-ivopts.h"
     110              : #include "tree-ssa-loop.h"
     111              : #include "tree-into-ssa.h"
     112              : #include "tree-ssa.h"
     113              : #include "cfgloop.h"
     114              : #include "tree-scalar-evolution.h"
     115              : #include "tree-vectorizer.h"
     116              : #include "tree-eh.h"
     117              : #include "gimple-fold.h"
     118              : #include "tree-affine.h"
     119              : #include "intl.h"
     120              : #include "rtl.h"
     121              : #include "memmodel.h"
     122              : #include "optabs.h"
     123              : #include "tree-ssa-loop-niter.h"
     124              : 
     125              : 
     126              : #define MAX_DATAREFS_NUM \
     127              :         ((unsigned) param_loop_max_datarefs_for_datadeps)
     128              : 
      129              : /* Threshold controlling the number of distributed partitions.  Since it
      130              :    may become unnecessary if a memory stream cost model is invented in the
      131              :    future, we define it as a temporary macro rather than a parameter.  */
     132              : #define NUM_PARTITION_THRESHOLD (4)
     133              : 
     134              : /* Hashtable helpers.  */
     135              : 
     136              : struct ddr_hasher : nofree_ptr_hash <struct data_dependence_relation>
     137              : {
     138              :   static inline hashval_t hash (const data_dependence_relation *);
     139              :   static inline bool equal (const data_dependence_relation *,
     140              :                             const data_dependence_relation *);
     141              : };
     142              : 
     143              : /* Hash function for data dependence.  */
     144              : 
     145              : inline hashval_t
     146      9250433 : ddr_hasher::hash (const data_dependence_relation *ddr)
     147              : {
     148      9250433 :   inchash::hash h;
     149      9250433 :   h.add_ptr (DDR_A (ddr));
     150      9250433 :   h.add_ptr (DDR_B (ddr));
     151      9250433 :   return h.end ();
     152              : }
     153              : 
     154              : /* Hash table equality function for data dependence.  */
     155              : 
     156              : inline bool
     157      8266834 : ddr_hasher::equal (const data_dependence_relation *ddr1,
     158              :                    const data_dependence_relation *ddr2)
     159              : {
     160      8266834 :   return (DDR_A (ddr1) == DDR_A (ddr2) && DDR_B (ddr1) == DDR_B (ddr2));
     161              : }
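The hasher above keys the ddr cache on the ordered pointer pair (DDR_A, DDR_B).  The same idea can be sketched in portable C++ (invented names, not GCC's `inchash` API): hash both pointers into one value and compare element-wise, so (A, B) and (B, A) are distinct keys.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Key: an ordered pair of pointers, like (DDR_A, DDR_B).
struct pair_key
{
  const void *a;
  const void *b;
  bool operator== (const pair_key &o) const { return a == o.a && b == o.b; }
};

// Hash: combine the hashes of the two pointers; order matters.
struct pair_hash
{
  std::size_t operator() (const pair_key &k) const
  {
    std::size_t h = std::hash<const void *> () (k.a);
    return h * 31 + std::hash<const void *> () (k.b);
  }
};
```

A cache such as `std::unordered_set<pair_key, pair_hash>` then finds an entry only when both pointers match in order, mirroring `ddr_hasher::equal`.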
     162              : 
     163              : 
     164              : 
     165              : #define DR_INDEX(dr)      ((uintptr_t) (dr)->aux)
     166              : 
     167              : /* A Reduced Dependence Graph (RDG) vertex representing a statement.  */
     168              : struct rdg_vertex
     169              : {
     170              :   /* The statement represented by this vertex.  */
     171              :   gimple *stmt;
     172              : 
     173              :   /* Vector of data-references in this statement.  */
     174              :   vec<data_reference_p> datarefs;
     175              : 
     176              :   /* True when the statement contains a write to memory.  */
     177              :   bool has_mem_write;
     178              : 
     179              :   /* True when the statement contains a read from memory.  */
     180              :   bool has_mem_reads;
     181              : };
     182              : 
     183              : #define RDGV_STMT(V)     ((struct rdg_vertex *) ((V)->data))->stmt
     184              : #define RDGV_DATAREFS(V) ((struct rdg_vertex *) ((V)->data))->datarefs
     185              : #define RDGV_HAS_MEM_WRITE(V) ((struct rdg_vertex *) ((V)->data))->has_mem_write
     186              : #define RDGV_HAS_MEM_READS(V) ((struct rdg_vertex *) ((V)->data))->has_mem_reads
     187              : #define RDG_STMT(RDG, I) RDGV_STMT (&(RDG->vertices[I]))
     188              : #define RDG_DATAREFS(RDG, I) RDGV_DATAREFS (&(RDG->vertices[I]))
     189              : #define RDG_MEM_WRITE_STMT(RDG, I) RDGV_HAS_MEM_WRITE (&(RDG->vertices[I]))
     190              : #define RDG_MEM_READS_STMT(RDG, I) RDGV_HAS_MEM_READS (&(RDG->vertices[I]))
     191              : 
     192              : /* Data dependence type.  */
     193              : 
     194              : enum rdg_dep_type
     195              : {
     196              :   /* Read After Write (RAW).  */
     197              :   flow_dd = 'f',
     198              : 
     199              :   /* Control dependence (execute conditional on).  */
     200              :   control_dd = 'c'
     201              : };
     202              : 
     203              : /* Dependence information attached to an edge of the RDG.  */
     204              : 
     205              : struct rdg_edge
     206              : {
     207              :   /* Type of the dependence.  */
     208              :   enum rdg_dep_type type;
     209              : };
     210              : 
     211              : #define RDGE_TYPE(E)        ((struct rdg_edge *) ((E)->data))->type
     212              : 
     213              : /* Kind of distributed loop.  */
     214              : enum partition_kind {
     215              :     PKIND_NORMAL,
      216              :     /* Partial memset means a partition can be distributed into a loop of
      217              :        memset calls, rather than a single memset call.  It's handled just
      218              :        like a normal partition, i.e., distributed as a separate loop; no
      219              :        memset call is generated.
      220              : 
      221              :        Note: This is a hackish fix trying to distribute a ZERO-ing stmt in
      222              :        a loop nest as deeply as possible.  As a result, parloop achieves
      223              :        better parallelization by parallelizing a deeper loop nest.  This
      224              :        hack should be unnecessary and removed once distributed memset can
      225              :        be understood and analyzed in data reference analysis.  See PR82604
      226              :        for more.  */
     226              :     PKIND_PARTIAL_MEMSET,
     227              :     PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
     228              : };
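For illustration, loops of the following shapes are the kind of single-store partitions that can be classified as PKIND_MEMSET and PKIND_MEMCPY (a hypothetical sketch; the function names are invented, and the actual classification works on data references, not source syntax):

```cpp
#include <cassert>
#include <cstring>

// Candidate for a single memset (a, 0, n): constant store, unit stride.
void zero_loop (unsigned char *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = 0;
}

// Candidate for memcpy (dst, src, n): load/store pair with equal strides.
void copy_loop (unsigned char *dst, const unsigned char *src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i];
}
```

If `dst` and `src` may overlap with `dst` ahead of `src`, the copy must become memmove instead, which is what PKIND_MEMMOVE captures.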
     229              : 
     230              : /* Type of distributed loop.  */
     231              : enum partition_type {
      232              :     /* The distributed loop can be executed in parallel.  */
     233              :     PTYPE_PARALLEL = 0,
     234              :     /* The distributed loop has to be executed sequentially.  */
     235              :     PTYPE_SEQUENTIAL
     236              : };
     237              : 
     238              : /* Builtin info for loop distribution.  */
     239              : struct builtin_info
     240              : {
      241              :   /* Data references that a kind != PKIND_NORMAL partition operates on.  */
     242              :   data_reference_p dst_dr;
     243              :   data_reference_p src_dr;
     244              :   /* Base address and size of memory objects operated by the builtin.  Note
     245              :      both dest and source memory objects must have the same size.  */
     246              :   tree dst_base;
     247              :   tree src_base;
     248              :   tree size;
     249              :   /* Base and offset part of dst_base after stripping constant offset.  This
     250              :      is only used in memset builtin distribution for now.  */
     251              :   tree dst_base_base;
     252              :   unsigned HOST_WIDE_INT dst_base_offset;
     253              : };
     254              : 
     255              : /* Partition for loop distribution.  */
     256              : struct partition
     257              : {
     258              :   /* Statements of the partition.  */
     259              :   bitmap stmts;
      260              :   /* True if the partition defines a variable used outside of the loop.  */
     261              :   bool reduction_p;
     262              :   location_t loc;
     263              :   enum partition_kind kind;
     264              :   enum partition_type type;
     265              :   /* Data references in the partition.  */
     266              :   bitmap datarefs;
      267              :   /* Information on a builtin partition.  */
     268              :   struct builtin_info *builtin;
     269              : };
     270              : 
     271              : /* Partitions are fused because of different reasons.  */
     272              : enum fuse_type
     273              : {
     274              :   FUSE_NON_BUILTIN = 0,
     275              :   FUSE_REDUCTION = 1,
     276              :   FUSE_SHARE_REF = 2,
     277              :   FUSE_SAME_SCC = 3,
     278              :   FUSE_FINALIZE = 4
     279              : };
     280              : 
      281              : /* Descriptions of the different fusing reasons.  */
     282              : static const char *fuse_message[] = {
     283              :   "they are non-builtins",
     284              :   "they have reductions",
     285              :   "they have shared memory refs",
     286              :   "they are in the same dependence scc",
     287              :   "there is no point to distribute loop"};
     288              : 
     289              : 
     290              : /* Dump vertex I in RDG to FILE.  */
     291              : 
     292              : static void
     293          871 : dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
     294              : {
     295          871 :   struct vertex *v = &(rdg->vertices[i]);
     296          871 :   struct graph_edge *e;
     297              : 
     298          871 :   fprintf (file, "(vertex %d: (%s%s) (in:", i,
     299          871 :            RDG_MEM_WRITE_STMT (rdg, i) ? "w" : "",
     300          871 :            RDG_MEM_READS_STMT (rdg, i) ? "r" : "");
     301              : 
     302          871 :   if (v->pred)
     303         2903 :     for (e = v->pred; e; e = e->pred_next)
     304         2032 :       fprintf (file, " %d", e->src);
     305              : 
     306          871 :   fprintf (file, ") (out:");
     307              : 
     308          871 :   if (v->succ)
     309         2758 :     for (e = v->succ; e; e = e->succ_next)
     310         2032 :       fprintf (file, " %d", e->dest);
     311              : 
     312          871 :   fprintf (file, ")\n");
     313          871 :   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
     314          871 :   fprintf (file, ")\n");
     315          871 : }
     316              : 
     317              : /* Call dump_rdg_vertex on stderr.  */
     318              : 
     319              : DEBUG_FUNCTION void
     320            0 : debug_rdg_vertex (struct graph *rdg, int i)
     321              : {
     322            0 :   dump_rdg_vertex (stderr, rdg, i);
     323            0 : }
     324              : 
     325              : /* Dump the reduced dependence graph RDG to FILE.  */
     326              : 
     327              : static void
     328           70 : dump_rdg (FILE *file, struct graph *rdg)
     329              : {
     330           70 :   fprintf (file, "(rdg\n");
     331          941 :   for (int i = 0; i < rdg->n_vertices; i++)
     332          871 :     dump_rdg_vertex (file, rdg, i);
     333           70 :   fprintf (file, ")\n");
     334           70 : }
     335              : 
     336              : /* Call dump_rdg on stderr.  */
     337              : 
     338              : DEBUG_FUNCTION void
     339            0 : debug_rdg (struct graph *rdg)
     340              : {
     341            0 :   dump_rdg (stderr, rdg);
     342            0 : }
     343              : 
     344              : static void
     345            0 : dot_rdg_1 (FILE *file, struct graph *rdg)
     346              : {
     347            0 :   int i;
     348            0 :   pretty_printer pp;
     349            0 :   pp_needs_newline (&pp) = false;
     350            0 :   pp.set_output_stream (file);
     351              : 
     352            0 :   fprintf (file, "digraph RDG {\n");
     353              : 
     354            0 :   for (i = 0; i < rdg->n_vertices; i++)
     355              :     {
     356            0 :       struct vertex *v = &(rdg->vertices[i]);
     357            0 :       struct graph_edge *e;
     358              : 
     359            0 :       fprintf (file, "%d [label=\"[%d] ", i, i);
     360            0 :       pp_gimple_stmt_1 (&pp, RDGV_STMT (v), 0, TDF_SLIM);
     361            0 :       pp_flush (&pp);
     362            0 :       fprintf (file, "\"]\n");
     363              : 
     364              :       /* Highlight reads from memory.  */
     365            0 :       if (RDG_MEM_READS_STMT (rdg, i))
     366            0 :        fprintf (file, "%d [style=filled, fillcolor=green]\n", i);
     367              : 
     368              :       /* Highlight stores to memory.  */
     369            0 :       if (RDG_MEM_WRITE_STMT (rdg, i))
     370            0 :        fprintf (file, "%d [style=filled, fillcolor=red]\n", i);
     371              : 
     372            0 :       if (v->succ)
     373            0 :        for (e = v->succ; e; e = e->succ_next)
     374            0 :          switch (RDGE_TYPE (e))
     375              :            {
     376            0 :            case flow_dd:
     377              :              /* These are the most common dependences: don't print these. */
     378            0 :              fprintf (file, "%d -> %d \n", i, e->dest);
     379            0 :              break;
     380              : 
     381            0 :            case control_dd:
     382            0 :              fprintf (file, "%d -> %d [label=control] \n", i, e->dest);
     383            0 :              break;
     384              : 
     385            0 :            default:
     386            0 :              gcc_unreachable ();
     387              :            }
     388              :     }
     389              : 
     390            0 :   fprintf (file, "}\n\n");
     391            0 : }
     392              : 
     393              : /* Display the Reduced Dependence Graph using dotty.  */
     394              : 
     395              : DEBUG_FUNCTION void
     396            0 : dot_rdg (struct graph *rdg)
     397              : {
     398              :   /* When debugging, you may want to enable the following code.  */
     399              : #ifdef HAVE_POPEN
     400            0 :   FILE *file = popen ("dot -Tx11", "w");
     401            0 :   if (!file)
     402              :     return;
     403            0 :   dot_rdg_1 (file, rdg);
     404            0 :   fflush (file);
     405            0 :   close (fileno (file));
     406            0 :   pclose (file);
     407              : #else
     408              :   dot_rdg_1 (stderr, rdg);
     409              : #endif
     410              : }
     411              : 
     412              : /* Returns the index of STMT in RDG.  */
     413              : 
     414              : static int
     415     13271180 : rdg_vertex_for_stmt (struct graph *rdg ATTRIBUTE_UNUSED, gimple *stmt)
     416              : {
     417     13271180 :   int index = gimple_uid (stmt);
     418     13271180 :   gcc_checking_assert (index == -1 || RDG_STMT (rdg, index) == stmt);
     419     13271180 :   return index;
     420              : }
     421              : 
     422              : /* Creates dependence edges in RDG for all the uses of DEF.  IDEF is
     423              :    the index of DEF in RDG.  */
     424              : 
     425              : static void
     426      1786758 : create_rdg_edges_for_scalar (struct graph *rdg, tree def, int idef)
     427              : {
     428      1786758 :   use_operand_p imm_use_p;
     429      1786758 :   imm_use_iterator iterator;
     430              : 
     431      6643883 :   FOR_EACH_IMM_USE_FAST (imm_use_p, iterator, def)
     432              :     {
     433      3070367 :       struct graph_edge *e;
     434      3070367 :       int use = rdg_vertex_for_stmt (rdg, USE_STMT (imm_use_p));
     435              : 
     436      3070367 :       if (use < 0)
     437       444496 :         continue;
     438              : 
     439      2625871 :       e = add_edge (rdg, idef, use);
     440      2625871 :       e->data = XNEW (struct rdg_edge);
     441      2625871 :       RDGE_TYPE (e) = flow_dd;
     442      1786758 :     }
     443      1786758 : }
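The loop above can be modeled abstractly: for one scalar definition, add a flow (RAW) edge from the defining vertex to each using vertex, skipping uses that have no vertex in the RDG (index -1, e.g. uses outside the loop).  A toy sketch with invented names, using ints in place of gimple statements:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Edge set of a toy dependence graph: def vertex -> use vertex.
using toy_rdg = std::multimap<int, int>;

// Mirror of create_rdg_edges_for_scalar's shape: one edge per valid use.
void add_flow_edges (toy_rdg &g, int idef, const std::vector<int> &uses)
{
  for (int use : uses)
    if (use >= 0)      // skip uses with no RDG vertex (rdg_vertex_for_stmt < 0)
      g.emplace (idef, use);
}
```

In the real pass the iteration is over `FOR_EACH_IMM_USE_FAST` on the SSA definition, and each edge carries `flow_dd` in its `rdg_edge` data.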
     444              : 
     445              : /* Creates an edge for the control dependences of BB to the vertex V.  */
     446              : 
     447              : static void
     448      2278223 : create_edge_for_control_dependence (struct graph *rdg, basic_block bb,
     449              :                                     int v, control_dependences *cd)
     450              : {
     451      2278223 :   bitmap_iterator bi;
     452      2278223 :   unsigned edge_n;
     453      6482850 :   EXECUTE_IF_SET_IN_BITMAP (cd->get_edges_dependent_on (bb->index),
     454              :                             0, edge_n, bi)
     455              :     {
     456      4204627 :       basic_block cond_bb = cd->get_edge_src (edge_n);
     457      4204627 :       gimple *stmt = *gsi_last_bb (cond_bb);
     458      4204627 :       if (stmt && is_ctrl_stmt (stmt))
     459              :         {
     460      3771610 :           struct graph_edge *e;
     461      3771610 :           int c = rdg_vertex_for_stmt (rdg, stmt);
     462      3771610 :           if (c < 0)
     463      1238288 :             continue;
     464              : 
     465      2533322 :           e = add_edge (rdg, c, v);
     466      2533322 :           e->data = XNEW (struct rdg_edge);
     467      2533322 :           RDGE_TYPE (e) = control_dd;
     468              :         }
     469              :     }
     470      2278223 : }
     471              : 
     472              : /* Creates the edges of the reduced dependence graph RDG.  */
     473              : 
     474              : static void
     475       144954 : create_rdg_flow_edges (struct graph *rdg)
     476              : {
     477       144954 :   int i;
     478       144954 :   def_operand_p def_p;
     479       144954 :   ssa_op_iter iter;
     480              : 
     481      2354741 :   for (i = 0; i < rdg->n_vertices; i++)
     482      6206332 :     FOR_EACH_PHI_OR_STMT_DEF (def_p, RDG_STMT (rdg, i),
     483              :                               iter, SSA_OP_DEF)
     484      1786758 :       create_rdg_edges_for_scalar (rdg, DEF_FROM_PTR (def_p), i);
     485       144954 : }
     486              : 
      487              : /* Creates the control dependence edges of the reduced dependence graph RDG.  */
     488              : 
     489              : static void
     490       137515 : create_rdg_cd_edges (struct graph *rdg, control_dependences *cd, loop_p loop)
     491              : {
     492       137515 :   int i;
     493              : 
     494      2297718 :   for (i = 0; i < rdg->n_vertices; i++)
     495              :     {
     496      2160203 :       gimple *stmt = RDG_STMT (rdg, i);
     497      2160203 :       if (gimple_code (stmt) == GIMPLE_PHI)
     498              :         {
     499       421826 :           edge_iterator ei;
     500       421826 :           edge e;
     501      1263793 :           FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->preds)
     502       841967 :             if (flow_bb_inside_loop_p (loop, e->src))
     503       539846 :               create_edge_for_control_dependence (rdg, e->src, i, cd);
     504              :         }
     505              :       else
     506      1738377 :         create_edge_for_control_dependence (rdg, gimple_bb (stmt), i, cd);
     507              :     }
     508       137515 : }
     509              : 
     510              : 
     511              : class loop_distribution
     512              : {
     513              :   private:
     514              :   /* The loop (nest) to be distributed.  */
     515              :   vec<loop_p> loop_nest;
     516              : 
     517              :   /* Vector of data references in the loop to be distributed.  */
     518              :   vec<data_reference_p> datarefs_vec;
     519              : 
      520              :   /* True if there is a nonaddressable data reference in the above vector.  */
     521              :   bool has_nonaddressable_dataref_p;
     522              : 
     523              :   /* Store index of data reference in aux field.  */
     524              : 
     525              :   /* Hash table for data dependence relation in the loop to be distributed.  */
     526              :   hash_table<ddr_hasher> *ddrs_table;
     527              : 
     528              :   /* Array mapping basic block's index to its topological order.  */
     529              :   int *bb_top_order_index;
     530              :   /* And size of the array.  */
     531              :   int bb_top_order_index_size;
     532              : 
     533              :   /* Build the vertices of the reduced dependence graph RDG.  Return false
     534              :      if that failed.  */
     535              :   bool create_rdg_vertices (struct graph *rdg, const vec<gimple *> &stmts,
     536              :                             loop_p loop);
     537              : 
     538              :   /* Initialize STMTS with all the statements of LOOP.  We use topological
     539              :      order to discover all statements.  The order is important because
     540              :      generate_loops_for_partition is using the same traversal for identifying
     541              :      statements in loop copies.  */
     542              :   void stmts_from_loop (class loop *loop, vec<gimple *> *stmts);
     543              : 
     544              : 
     545              :   /* Build the Reduced Dependence Graph (RDG) with one vertex per statement of
     546              :      LOOP, and one edge per flow dependence or control dependence from control
     547              :      dependence CD.  During visiting each statement, data references are also
     548              :      collected and recorded in global data DATAREFS_VEC.  */
     549              :   struct graph * build_rdg (class loop *loop, control_dependences *cd);
     550              : 
      551              : /* Merge PARTITION into the partition DEST.  RDG is the reduced dependence
      552              :    graph, and we update the type of the resulting partition if RDG is non-NULL.  */
     553              :   void partition_merge_into (struct graph *rdg,
     554              :                              partition *dest, partition *partition,
     555              :                              enum fuse_type ft);
     556              : 
     557              : 
      558              :   /* Return the data dependence relation for data references A and B.  The
      559              :      two data references must be in lexicographic order with respect to the
      560              :      reduced dependence graph RDG.  We first try to find the ddr in the
      561              :      global ddr hash table; if it doesn't exist, compute the ddr and cache it.  */
     562              :   data_dependence_relation * get_data_dependence (struct graph *rdg,
     563              :                                                   data_reference_p a,
     564              :                                                   data_reference_p b);
     565              : 
     566              : 
      567              :   /* In the reduced dependence graph RDG for loop distribution, return true
      568              :      if the dependence between references DR1 and DR2 leads to a dependence
      569              :      cycle that can't be resolved by a runtime alias check.  */
     570              :   bool data_dep_in_cycle_p (struct graph *rdg, data_reference_p dr1,
     571              :                             data_reference_p dr2);
     572              : 
     573              : 
     574              :   /* Given reduced dependence graph RDG, PARTITION1 and PARTITION2, update
     575              :      PARTITION1's type after merging PARTITION2 into PARTITION1.  */
     576              :   void update_type_for_merge (struct graph *rdg,
     577              :                               partition *partition1, partition *partition2);
     578              : 
     579              : 
     580              :   /* Returns a partition with all the statements needed for computing
     581              :      the vertex V of the RDG, also including the loop exit conditions.  */
     582              :   partition *build_rdg_partition_for_vertex (struct graph *rdg, int v);
     583              : 
     584              :   /* Given data references DST_DR and SRC_DR in loop nest LOOP and RDG, classify
      585              :      whether PARTITION forms a builtin memcpy or memmove call.  */
     586              :   void classify_builtin_ldst (loop_p loop, struct graph *rdg, partition *partition,
     587              :                               data_reference_p dst_dr, data_reference_p src_dr);
     588              : 
     589              :   /* Classifies the builtin kind we can generate for PARTITION of RDG and LOOP.
     590              :      For the moment we detect memset, memcpy and memmove patterns.  Bitmap
     591              :      STMT_IN_ALL_PARTITIONS contains statements belonging to all partitions.
     592              :      Returns true if there is a reduction in all partitions and we
     593              :      possibly did not mark PARTITION as having one for this reason.  */
     594              : 
     595              :   bool
     596              :   classify_partition (loop_p loop,
     597              :                       struct graph *rdg, partition *partition,
     598              :                       bitmap stmt_in_all_partitions);
     599              : 
     600              : 
     601              :   /* Returns true when PARTITION1 and PARTITION2 access the same memory
     602              :      object in RDG.  */
     603              :   bool share_memory_accesses (struct graph *rdg,
     604              :                               partition *partition1, partition *partition2);
     605              : 
     606              :   /* For each seed statement in STARTING_STMTS, this function builds
      607              :      a partition for it by adding dependent statements according to RDG.
     608              :      All partitions are recorded in PARTITIONS.  */
     609              :   void rdg_build_partitions (struct graph *rdg,
     610              :                              vec<gimple *> starting_stmts,
     611              :                              vec<partition *> *partitions);
     612              : 
     613              :   /* Compute partition dependence created by the data references in DRS1
      614              :    and DRS2, modify and return DIR accordingly.  If ALIAS_DDRS is
     615              :      not NULL, we record dependence introduced by possible alias between
     616              :      two data references in ALIAS_DDRS; otherwise, we simply ignore such
     617              :      dependence as if it doesn't exist at all.  */
     618              :   int pg_add_dependence_edges (struct graph *rdg, int dir, bitmap drs1,
     619              :                                bitmap drs2, vec<ddr_p> *alias_ddrs);
     620              : 
     621              : 
     622              :   /* Build and return partition dependence graph for PARTITIONS.  RDG is
     623              :      reduced dependence graph for the loop to be distributed.  If IGNORE_ALIAS_P
     624              :      is true, data dependence caused by possible alias between references
      625              :      is ignored, as if it doesn't exist at all; otherwise all dependences
     626              :      are considered.  */
     627              :   struct graph *build_partition_graph (struct graph *rdg,
     628              :                                        vec<struct partition *> *partitions,
     629              :                                        bool ignore_alias_p);
     630              : 
      631              :   /* Given reduced dependence graph RDG, merge strongly connected components
     632              :      of PARTITIONS.  If IGNORE_ALIAS_P is true, data dependence caused by
     633              :      possible alias between references is ignored, as if it doesn't exist
      634              :      at all; otherwise all dependences are considered.  */
     635              :   void merge_dep_scc_partitions (struct graph *rdg, vec<struct partition *>
     636              :                                  *partitions, bool ignore_alias_p);
     637              : 
      638              : /* This is the main function breaking strongly connected components in
      639              :    PARTITIONS given the reduced dependence graph RDG.  Store data dependence
     640              :    relations for runtime alias check in ALIAS_DDRS.  */
     641              :   void break_alias_scc_partitions (struct graph *rdg, vec<struct partition *>
     642              :                                    *partitions, vec<ddr_p> *alias_ddrs);
     643              : 
     644              : 
     645              :   /* Fuse PARTITIONS of LOOP if necessary before finalizing distribution.
     646              :      ALIAS_DDRS contains ddrs which need runtime alias check.  */
     647              :   void finalize_partitions (class loop *loop, vec<struct partition *>
     648              :                             *partitions, vec<ddr_p> *alias_ddrs);
     649              : 
     650              :   /* Distributes the code from LOOP in such a way that producer statements
     651              :      are placed before consumer statements.  Tries to separate only the
     652              :      statements from STMTS into separate loops.  Returns the number of
     653              :      distributed loops.  Set NB_CALLS to number of generated builtin calls.
     654              :      Set *DESTROY_P to whether LOOP needs to be destroyed.  */
     655              :   int distribute_loop (class loop *loop, const vec<gimple *> &stmts,
     656              :                        control_dependences *cd, int *nb_calls, bool *destroy_p,
     657              :                        bool only_patterns_p);
     658              : 
     659              :   /* Transform loops which mimic the effects of builtins rawmemchr or strlen and
     660              :      replace them accordingly.  */
     661              :   bool transform_reduction_loop (loop_p loop);
     662              : 
     663              :   /* Compute topological order for basic blocks.  Topological order is
     664              :      needed because data dependence is computed for data references in
     665              :      lexicographical order.  */
     666              :   void bb_top_order_init (void);
     667              : 
     668              :   void bb_top_order_destroy (void);
     669              : 
     670              :   public:
     671              : 
     672              :   /* Getter for bb_top_order.  */
     673              : 
     674      2978154 :   inline int get_bb_top_order_index_size (void)
     675              :     {
     676      2978154 :       return bb_top_order_index_size;
     677              :     }
     678              : 
     679      5956308 :   inline int get_bb_top_order_index (int i)
     680              :     {
     681      5956308 :       return bb_top_order_index[i];
     682              :     }
     683              : 
     684              :   unsigned int execute (function *fun);
     685              : };
     686              : 
     687              : 
     688              : /* If X has a smaller topological sort number than Y, returns -1;
     689              :    if greater, returns 1.  */
     690              : static int
     691      2978154 : bb_top_order_cmp_r (const void *x, const void *y, void *loop)
     692              : {
     693      2978154 :   loop_distribution *_loop =
     694              :     (loop_distribution *) loop;
     695              : 
     696      2978154 :   basic_block bb1 = *(const basic_block *) x;
     697      2978154 :   basic_block bb2 = *(const basic_block *) y;
     698              : 
     699      2978154 :   int bb_top_order_index_size = _loop->get_bb_top_order_index_size ();
     700              : 
     701      2978154 :   gcc_assert (bb1->index < bb_top_order_index_size
     702              :               && bb2->index < bb_top_order_index_size);
     703      2978154 :   gcc_assert (bb1 == bb2
     704              :               || _loop->get_bb_top_order_index(bb1->index)
     705              :                  != _loop->get_bb_top_order_index(bb2->index));
     706              : 
     707      2978154 :   return (_loop->get_bb_top_order_index(bb1->index) -
     708      2978154 :           _loop->get_bb_top_order_index(bb2->index));
     709              : }
     710              : 
     711              : bool
     712       148171 : loop_distribution::create_rdg_vertices (struct graph *rdg,
     713              :                                         const vec<gimple *> &stmts,
     714              :                                         loop_p loop)
     715              : {
     716       148171 :   int i;
     717       148171 :   gimple *stmt;
     718              : 
     719      2380027 :   FOR_EACH_VEC_ELT (stmts, i, stmt)
     720              :     {
     721      2235073 :       struct vertex *v = &(rdg->vertices[i]);
     722              : 
     723              :       /* Record statement to vertex mapping.  */
     724      2235073 :       gimple_set_uid (stmt, i);
     725              : 
     726      2235073 :       v->data = XNEW (struct rdg_vertex);
     727      2235073 :       RDGV_STMT (v) = stmt;
     728      2235073 :       RDGV_DATAREFS (v).create (0);
     729      2235073 :       RDGV_HAS_MEM_WRITE (v) = false;
     730      2235073 :       RDGV_HAS_MEM_READS (v) = false;
     731      2235073 :       if (gimple_code (stmt) == GIMPLE_PHI)
     732       445234 :         continue;
     733              : 
     734      1789839 :       unsigned drp = datarefs_vec.length ();
     735      1789839 :       if (!find_data_references_in_stmt (loop, stmt, &datarefs_vec))
     736              :         return false;
     737      2678139 :       for (unsigned j = drp; j < datarefs_vec.length (); ++j)
     738              :         {
     739       446283 :           data_reference_p dr = datarefs_vec[j];
     740       446283 :           if (DR_IS_READ (dr))
     741       239743 :             RDGV_HAS_MEM_READS (v) = true;
     742              :           else
     743       206540 :             RDGV_HAS_MEM_WRITE (v) = true;
     744       446283 :           RDGV_DATAREFS (v).safe_push (dr);
     745       446283 :           has_nonaddressable_dataref_p |= may_be_nonaddressable_p (dr->ref);
     746              :         }
     747              :     }
     748              :   return true;
     749              : }
     750              : 
     751              : void
     752       148171 : loop_distribution::stmts_from_loop (class loop *loop, vec<gimple *> *stmts)
     753              : {
     754       148171 :   unsigned int i;
     755       148171 :   basic_block *bbs = get_loop_body_in_custom_order (loop, this, bb_top_order_cmp_r);
     756              : 
     757       631089 :   for (i = 0; i < loop->num_nodes; i++)
     758              :     {
     759       482918 :       basic_block bb = bbs[i];
     760              : 
     761      1074595 :       for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
     762       591677 :            gsi_next (&bsi))
     763      1183354 :         if (!virtual_operand_p (gimple_phi_result (bsi.phi ())))
     764       446472 :           stmts->safe_push (bsi.phi ());
     765              : 
     766      3809715 :       for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
     767      2843879 :            gsi_next (&bsi))
     768              :         {
     769      2843879 :           gimple *stmt = gsi_stmt (bsi);
     770      2843879 :           if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
     771      1816417 :             stmts->safe_push (stmt);
     772              :         }
     773              :     }
     774              : 
     775       148171 :   free (bbs);
     776       148171 : }
     777              : 
     778              : /* Free the reduced dependence graph RDG.  */
     779              : 
     780              : static void
     781       148171 : free_rdg (struct graph *rdg, loop_p loop)
     782              : {
     783       148171 :   int i;
     784              : 
     785      2411060 :   for (i = 0; i < rdg->n_vertices; i++)
     786              :     {
     787      2262889 :       struct vertex *v = &(rdg->vertices[i]);
     788      2262889 :       struct graph_edge *e;
     789              : 
     790      7422082 :       for (e = v->succ; e; e = e->succ_next)
     791      5159193 :         free (e->data);
     792              : 
     793      2262889 :       if (v->data)
     794              :         {
     795      2235073 :           (RDGV_DATAREFS (v)).release ();
     796      2235073 :           free (v->data);
     797              :         }
     798              :     }
     799              : 
     800       148171 :   free_graph (rdg);
     801              : 
     802              :   /* Reset UIDs of stmts still in the loop.  */
     803       148171 :   basic_block *bbs = get_loop_body (loop);
     804       631089 :   for (unsigned i = 0; i < loop->num_nodes; ++i)
     805              :     {
     806       482918 :       basic_block bb = bbs[i];
     807       482918 :       gimple_stmt_iterator gsi;
     808      1074535 :       for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
     809       591617 :         gimple_set_uid (gsi_stmt (gsi), -1);
     810      3806623 :       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
     811      2840787 :         gimple_set_uid (gsi_stmt (gsi), -1);
     812              :     }
     813       148171 :   free (bbs);
     814       148171 : }
     815              : 
     816              : struct graph *
     817       148171 : loop_distribution::build_rdg (class loop *loop, control_dependences *cd)
     818              : {
     819       148171 :   struct graph *rdg;
     820              : 
     821              :   /* Create the RDG vertices from the stmts of the loop nest.  */
     822       148171 :   auto_vec<gimple *, 10> stmts;
     823       148171 :   stmts_from_loop (loop, &stmts);
     824       296342 :   rdg = new_graph (stmts.length ());
     825       148171 :   if (!create_rdg_vertices (rdg, stmts, loop))
     826              :     {
     827         3217 :       free_rdg (rdg, loop);
     828         3217 :       return NULL;
     829              :     }
     830       144954 :   stmts.release ();
     831              : 
     832       144954 :   create_rdg_flow_edges (rdg);
     833       144954 :   if (cd)
     834       137515 :     create_rdg_cd_edges (rdg, cd, loop);
     835              : 
     836              :   return rdg;
     837       148171 : }
     838              : 
     839              : 
     840              : /* Allocate and initialize a partition from BITMAP.  */
     841              : 
     842              : static partition *
     843       246865 : partition_alloc (void)
     844              : {
     845       246865 :   partition *partition = XCNEW (struct partition);
     846       246865 :   partition->stmts = BITMAP_ALLOC (NULL);
     847       246865 :   partition->reduction_p = false;
     848       246865 :   partition->loc = UNKNOWN_LOCATION;
     849       246865 :   partition->kind = PKIND_NORMAL;
     850       246865 :   partition->type = PTYPE_PARALLEL;
     851       246865 :   partition->datarefs = BITMAP_ALLOC (NULL);
     852       246865 :   return partition;
     853              : }
     854              : 
     855              : /* Free PARTITION.  */
     856              : 
     857              : static void
     858       246865 : partition_free (partition *partition)
     859              : {
     860       246865 :   BITMAP_FREE (partition->stmts);
     861       246865 :   BITMAP_FREE (partition->datarefs);
     862       246865 :   if (partition->builtin)
     863        12855 :     free (partition->builtin);
     864              : 
     865       246865 :   free (partition);
     866       246865 : }
     867              : 
     868              : /* Returns true if the partition can be generated as a builtin.  */
     869              : 
     870              : static bool
     871       351988 : partition_builtin_p (partition *partition)
     872              : {
     873       351988 :   return partition->kind > PKIND_PARTIAL_MEMSET;
     874              : }
     875              : 
     876              : /* Returns true if the partition contains a reduction.  */
     877              : 
     878              : static bool
     879      2060554 : partition_reduction_p (partition *partition)
     880              : {
     881      2060554 :   return partition->reduction_p;
     882              : }
     883              : 
     884              : void
     885        34423 : loop_distribution::partition_merge_into (struct graph *rdg,
     886              :                       partition *dest, partition *partition, enum fuse_type ft)
     887              : {
     888        34423 :   if (dump_file && (dump_flags & TDF_DETAILS))
     889              :     {
     890           42 :       fprintf (dump_file, "Fuse partitions because %s:\n", fuse_message[ft]);
     891           42 :       fprintf (dump_file, "  Part 1: ");
     892           42 :       dump_bitmap (dump_file, dest->stmts);
     893           42 :       fprintf (dump_file, "  Part 2: ");
     894           42 :       dump_bitmap (dump_file, partition->stmts);
     895              :     }
     896              : 
     897        34423 :   dest->kind = PKIND_NORMAL;
     898        34423 :   if (dest->type == PTYPE_PARALLEL)
     899        24321 :     dest->type = partition->type;
     900              : 
     901        34423 :   bitmap_ior_into (dest->stmts, partition->stmts);
     902        34423 :   if (partition_reduction_p (partition))
     903         3569 :     dest->reduction_p = true;
     904              : 
     905              :   /* Further check if any data dependence prevents us from executing the
      906              :      new partition in parallel.  */
     907        34423 :   if (dest->type == PTYPE_PARALLEL && rdg != NULL)
     908         6870 :     update_type_for_merge (rdg, dest, partition);
     909              : 
     910        34423 :   bitmap_ior_into (dest->datarefs, partition->datarefs);
     911        34423 : }
     912              : 
     913              : 
     914              : /* Returns true when DEF is an SSA_NAME defined in LOOP and used after
     915              :    the LOOP.  */
     916              : 
     917              : static bool
     918      4859472 : ssa_name_has_uses_outside_loop_p (tree def, loop_p loop)
     919              : {
     920      4859472 :   imm_use_iterator imm_iter;
     921      4859472 :   use_operand_p use_p;
     922              : 
     923     19340695 :   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
     924              :     {
     925      9755899 :       if (is_gimple_debug (USE_STMT (use_p)))
     926       975897 :         continue;
     927              : 
     928      8780002 :       basic_block use_bb = gimple_bb (USE_STMT (use_p));
     929      8780002 :       if (!flow_bb_inside_loop_p (loop, use_bb))
     930       134148 :         return true;
     931       134148 :     }
     932              : 
     933      4725324 :   return false;
     934              : }
     935              : 
     936              : /* Returns true when STMT defines a scalar variable used after the
     937              :    loop LOOP.  */
     938              : 
     939              : static bool
     940      6980424 : stmt_has_scalar_dependences_outside_loop (loop_p loop, gimple *stmt)
     941              : {
     942      6980424 :   def_operand_p def_p;
     943      6980424 :   ssa_op_iter op_iter;
     944              : 
     945      6980424 :   if (gimple_code (stmt) == GIMPLE_PHI)
     946      1202493 :     return ssa_name_has_uses_outside_loop_p (gimple_phi_result (stmt), loop);
     947              : 
     948      9361436 :   FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_DEF)
     949      3656979 :     if (ssa_name_has_uses_outside_loop_p (DEF_FROM_PTR (def_p), loop))
     950              :       return true;
     951              : 
     952              :   return false;
     953              : }
     954              : 
     955              : /* Return a copy of LOOP placed before LOOP.  */
     956              : 
     957              : static class loop *
     958         1143 : copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
     959              : {
     960         1143 :   class loop *res;
     961         1143 :   edge preheader = loop_preheader_edge (loop);
     962              : 
     963         1143 :   initialize_original_copy_tables ();
     964         1143 :   res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
     965              :                                                 NULL, preheader, NULL, false);
     966         1143 :   gcc_assert (res != NULL);
     967              : 
      968              :   /* When a partition other than the last is supposed to keep the LC PHIs,
      969              :      adjust their computed definitions.  */
     970         1143 :   if (redirect_lc_phi_defs)
     971              :     {
     972            3 :       edge exit = single_exit (loop);
     973           14 :       for (gphi_iterator si = gsi_start_phis (exit->dest); !gsi_end_p (si);
     974           11 :            gsi_next (&si))
     975              :         {
     976           11 :           gphi *phi = si.phi ();
     977           22 :           if (virtual_operand_p (gimple_phi_result (phi)))
     978            3 :             continue;
     979            8 :           use_operand_p use_p = PHI_ARG_DEF_PTR_FROM_EDGE (phi, exit);
     980            8 :           if (TREE_CODE (USE_FROM_PTR (use_p)) == SSA_NAME)
     981              :             {
     982            5 :               tree new_def = get_current_def (USE_FROM_PTR (use_p));
     983            5 :               if (!new_def)
     984              :                 /* Something defined outside of the loop.  */
     985            2 :                 continue;
     986            3 :               SET_USE (use_p, new_def);
     987              :             }
     988              :         }
     989              :     }
     990              : 
     991         1143 :   free_original_copy_tables ();
     992         1143 :   delete_update_ssa ();
     993              : 
     994         1143 :   return res;
     995              : }
     996              : 
     997              : /* Creates an empty basic block after LOOP.  */
     998              : 
     999              : static void
    1000         1143 : create_bb_after_loop (class loop *loop)
    1001              : {
    1002         1143 :   edge exit = single_exit (loop);
    1003              : 
    1004         1143 :   if (!exit)
    1005              :     return;
    1006              : 
    1007         1143 :   split_edge (exit);
    1008              : }
    1009              : 
    1010              : /* Generate code for PARTITION from the code in LOOP.  The loop is
    1011              :    copied when COPY_P is true.  All the statements not flagged in the
    1012              :    PARTITION bitmap are removed from the loop or from its copy.  The
    1013              :    statements are indexed in sequence inside a basic block, and the
    1014              :    basic blocks of a loop are taken in dom order.  */
    1015              : 
    1016              : static void
    1017         3041 : generate_loops_for_partition (class loop *loop, partition *partition,
    1018              :                               bool copy_p, bool keep_lc_phis_p)
    1019              : {
    1020         3041 :   unsigned i;
    1021         3041 :   basic_block *bbs;
    1022              : 
    1023         3041 :   if (copy_p)
    1024              :     {
    1025         1143 :       int orig_loop_num = loop->orig_loop_num;
    1026         1143 :       loop = copy_loop_before (loop, keep_lc_phis_p);
    1027         1143 :       gcc_assert (loop != NULL);
    1028         1143 :       loop->orig_loop_num = orig_loop_num;
    1029         1143 :       create_preheader (loop, CP_SIMPLE_PREHEADERS);
    1030         1143 :       create_bb_after_loop (loop);
    1031              :     }
    1032              :   else
    1033              :     {
    1034              :       /* Origin number is set to the new versioned loop's num.  */
    1035         1898 :       gcc_assert (loop->orig_loop_num != loop->num);
    1036              :     }
    1037              : 
    1038              :   /* Remove stmts not in the PARTITION bitmap.  */
    1039         3041 :   bbs = get_loop_body_in_dom_order (loop);
    1040              : 
    1041         3041 :   if (MAY_HAVE_DEBUG_BIND_STMTS)
    1042         7667 :     for (i = 0; i < loop->num_nodes; i++)
    1043              :       {
    1044         5902 :         basic_block bb = bbs[i];
    1045              : 
    1046        11114 :         for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
    1047         5212 :              gsi_next (&bsi))
    1048              :           {
    1049         5212 :             gphi *phi = bsi.phi ();
    1050         8650 :             if (!virtual_operand_p (gimple_phi_result (phi))
    1051         5212 :                 && !bitmap_bit_p (partition->stmts, gimple_uid (phi)))
    1052          379 :               reset_debug_uses (phi);
    1053              :           }
    1054              : 
    1055        48542 :         for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
    1056              :           {
    1057        36738 :             gimple *stmt = gsi_stmt (bsi);
    1058        36738 :             if (gimple_code (stmt) != GIMPLE_LABEL
    1059        36738 :                 && !is_gimple_debug (stmt)
    1060        60321 :                 && !bitmap_bit_p (partition->stmts, gimple_uid (stmt)))
    1061         6507 :               reset_debug_uses (stmt);
    1062              :           }
    1063              :       }
    1064              : 
    1065        13564 :   for (i = 0; i < loop->num_nodes; i++)
    1066              :     {
    1067        10523 :       basic_block bb = bbs[i];
    1068        10523 :       edge inner_exit = NULL;
    1069              : 
    1070        10523 :       if (loop != bb->loop_father)
    1071          116 :         inner_exit = single_exit (bb->loop_father);
    1072              : 
    1073        19875 :       for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);)
    1074              :         {
    1075         9352 :           gphi *phi = bsi.phi ();
    1076        15542 :           if (!virtual_operand_p (gimple_phi_result (phi))
    1077         9352 :               && !bitmap_bit_p (partition->stmts, gimple_uid (phi)))
    1078          869 :             remove_phi_node (&bsi, true);
    1079              :           else
    1080         8483 :             gsi_next (&bsi);
    1081              :         }
    1082              : 
    1083        76971 :       for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);)
    1084              :         {
    1085        55925 :           gimple *stmt = gsi_stmt (bsi);
    1086        55925 :           if (gimple_code (stmt) != GIMPLE_LABEL
    1087        55917 :               && !is_gimple_debug (stmt)
    1088        98687 :               && !bitmap_bit_p (partition->stmts, gimple_uid (stmt)))
    1089              :             {
    1090              :               /* In distribution of loop nest, if bb is inner loop's exit_bb,
    1091              :                  we choose its exit edge/path in order to avoid generating
    1092              :                  infinite loop.  For all other cases, we choose an arbitrary
    1093              :                  path through the empty CFG part that this unnecessary
    1094              :                  control stmt controls.  */
    1095        13676 :               if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
    1096              :                 {
    1097          672 :                   if (inner_exit && inner_exit->flags & EDGE_TRUE_VALUE)
    1098            5 :                     gimple_cond_make_true (cond_stmt);
    1099              :                   else
    1100          667 :                     gimple_cond_make_false (cond_stmt);
    1101          672 :                   update_stmt (stmt);
    1102              :                 }
    1103        13004 :               else if (gimple_code (stmt) == GIMPLE_SWITCH)
    1104              :                 {
    1105            0 :                   gswitch *switch_stmt = as_a <gswitch *> (stmt);
    1106            0 :                   gimple_switch_set_index
    1107            0 :                       (switch_stmt, CASE_LOW (gimple_switch_label (switch_stmt, 1)));
    1108            0 :                   update_stmt (stmt);
    1109              :                 }
    1110              :               else
    1111              :                 {
    1112        13004 :                   unlink_stmt_vdef (stmt);
    1113        13004 :                   gsi_remove (&bsi, true);
    1114        13004 :                   release_defs (stmt);
    1115        13004 :                   continue;
    1116              :                 }
    1117              :             }
    1118        42921 :           gsi_next (&bsi);
    1119              :         }
    1120              :     }
    1121              : 
    1122         3041 :   free (bbs);
    1123         3041 : }
    1124              : 
    1125              : /* If VAL memory representation contains the same value in all bytes,
    1126              :    return that value, otherwise return -1.
    1127              :    E.g. for 0x24242424 return 0x24, for IEEE double
    1128              :    747708026454360457216.0 return 0x44, etc.  */
    1129              : 
    1130              : static int
    1131        79241 : const_with_all_bytes_same (tree val)
    1132              : {
    1133        79241 :   unsigned char buf[64];
    1134        79241 :   int i, len;
    1135              : 
    1136        79241 :   if (integer_zerop (val)
    1137        79241 :       || (TREE_CODE (val) == CONSTRUCTOR
    1138          521 :           && !TREE_CLOBBER_P (val)
    1139        33162 :           && CONSTRUCTOR_NELTS (val) == 0))
    1140              :     return 0;
    1141              : 
    1142        48353 :   if (real_zerop (val))
    1143              :     {
    1144              :       /* Only return 0 for +0.0, not for -0.0, which doesn't have
    1145              :          an all bytes same memory representation.  Don't transform
    1146              :          -0.0 stores into +0.0 even for !HONOR_SIGNED_ZEROS.  */
    1147         2302 :       switch (TREE_CODE (val))
    1148              :         {
    1149         2289 :         case REAL_CST:
    1150         2289 :           if (!real_isneg (TREE_REAL_CST_PTR (val)))
    1151              :             return 0;
    1152              :           break;
    1153            6 :         case COMPLEX_CST:
    1154            6 :           if (!const_with_all_bytes_same (TREE_REALPART (val))
    1155           12 :               && !const_with_all_bytes_same (TREE_IMAGPART (val)))
    1156              :             return 0;
    1157              :           break;
    1158            7 :         case VECTOR_CST:
    1159            7 :           {
    1160            7 :             unsigned int count = vector_cst_encoded_nelts (val);
    1161            7 :             unsigned int j;
    1162           21 :             for (j = 0; j < count; ++j)
    1163           14 :               if (const_with_all_bytes_same (VECTOR_CST_ENCODED_ELT (val, j)))
    1164              :                 break;
    1165            7 :             if (j == count)
    1166              :               return 0;
    1167              :             break;
    1168              :           }
    1169              :         default:
    1170              :           break;
    1171              :         }
    1172              :     }
    1173              : 
    1174        46079 :   if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
    1175              :     return -1;
    1176              : 
    1177        46079 :   len = native_encode_expr (val, buf, sizeof (buf));
    1178        46079 :   if (len == 0)
    1179              :     return -1;
    1180        29503 :   for (i = 1; i < len; i++)
    1181        25952 :     if (buf[i] != buf[0])
    1182              :       return -1;
    1183         3551 :   return buf[0];
    1184              : }
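As an aside, the byte-comparison loop above can be mimicked outside the compiler. A hedged sketch in plain C++ on a host integer: `all_bytes_same` is an illustrative name, and `std::memcpy` stands in for `native_encode_expr`, which operates on trees rather than host values.

```cpp
#include <cstdint>
#include <cstring>

// Illustrative stand-in for const_with_all_bytes_same: encode a host
// integer into its byte representation (the role native_encode_expr
// plays for trees) and check whether every byte matches the first.
static int all_bytes_same (uint32_t val)
{
  unsigned char buf[sizeof val];
  std::memcpy (buf, &val, sizeof buf);
  for (size_t i = 1; i < sizeof buf; i++)
    if (buf[i] != buf[0])
      return -1;
  return buf[0];
}
```

As in the real function, a value like 0x24242424 yields its repeated byte, while a mixed-byte value yields -1.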
    1185              : 
    1186              : /* Generate a call to memset for PARTITION in LOOP.  */
    1187              : 
    1188              : static void
    1189         7733 : generate_memset_builtin (class loop *loop, partition *partition)
    1190              : {
    1191         7733 :   gimple_stmt_iterator gsi;
    1192         7733 :   tree mem, fn, nb_bytes;
    1193         7733 :   tree val;
    1194         7733 :   struct builtin_info *builtin = partition->builtin;
    1195         7733 :   gimple *fn_call;
    1196              : 
    1197              :   /* The new statements will be placed before LOOP.  */
    1198         7733 :   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
    1199              : 
    1200         7733 :   nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
    1201         7733 :   nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
    1202              :                                        false, GSI_CONTINUE_LINKING);
    1203         7733 :   mem = rewrite_to_non_trapping_overflow (builtin->dst_base);
    1204         7733 :   mem = force_gimple_operand_gsi (&gsi, mem, true, NULL_TREE,
    1205              :                                   false, GSI_CONTINUE_LINKING);
    1206              : 
    1207              :   /* This exactly matches the pattern recognition in classify_partition.  */
    1208         7733 :   val = gimple_assign_rhs1 (DR_STMT (builtin->dst_dr));
    1209              :   /* Handle constants like 0x15151515 and similarly
    1210              :      floating point constants etc. where all bytes are the same.  */
    1211         7733 :   int bytev = const_with_all_bytes_same (val);
    1212         7733 :   if (bytev != -1)
    1213         7634 :     val = build_int_cst (integer_type_node, bytev);
    1214           99 :   else if (TREE_CODE (val) == INTEGER_CST)
    1215            0 :     val = fold_convert (integer_type_node, val);
    1216           99 :   else if (!useless_type_conversion_p (integer_type_node, TREE_TYPE (val)))
    1217              :     {
    1218           99 :       tree tem = make_ssa_name (integer_type_node);
    1219           99 :       gimple *cstmt = gimple_build_assign (tem, NOP_EXPR, val);
    1220           99 :       gsi_insert_after (&gsi, cstmt, GSI_CONTINUE_LINKING);
    1221           99 :       val = tem;
    1222              :     }
    1223              : 
    1224        15466 :   fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
    1225         7733 :   fn_call = gimple_build_call (fn, 3, mem, val, nb_bytes);
    1226         7733 :   gimple_set_location (fn_call, partition->loc);
    1227         7733 :   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
    1228         7733 :   fold_stmt (&gsi);
    1229              : 
    1230         7733 :   if (dump_file && (dump_flags & TDF_DETAILS))
    1231              :     {
    1232           39 :       fprintf (dump_file, "generated memset");
    1233           39 :       if (bytev == 0)
    1234           33 :         fprintf (dump_file, " zero\n");
    1235              :       else
    1236            6 :         fprintf (dump_file, "\n");
    1237              :     }
    1238         7733 : }
    1239              : 
    1240              : /* Generate a call to memcpy for PARTITION in LOOP.  */
    1241              : 
    1242              : static void
    1243         4391 : generate_memcpy_builtin (class loop *loop, partition *partition)
    1244              : {
    1245         4391 :   gimple_stmt_iterator gsi;
    1246         4391 :   gimple *fn_call;
    1247         4391 :   tree dest, src, fn, nb_bytes;
    1248         4391 :   enum built_in_function kind;
    1249         4391 :   struct builtin_info *builtin = partition->builtin;
    1250              : 
    1251              :   /* The new statements will be placed before LOOP.  */
    1252         4391 :   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
    1253              : 
    1254         4391 :   nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
    1255         4391 :   nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
    1256              :                                        false, GSI_CONTINUE_LINKING);
    1257         4391 :   dest = rewrite_to_non_trapping_overflow (builtin->dst_base);
    1258         4391 :   src = rewrite_to_non_trapping_overflow (builtin->src_base);
    1259         4391 :   if (partition->kind == PKIND_MEMCPY
    1260         4391 :       || ! ptr_derefs_may_alias_p (dest, src))
    1261              :     kind = BUILT_IN_MEMCPY;
    1262              :   else
    1263          177 :     kind = BUILT_IN_MEMMOVE;
    1264              :   /* Try harder if we're copying a constant size.  */
    1265          177 :   if (kind == BUILT_IN_MEMMOVE && poly_int_tree_p (nb_bytes))
    1266              :     {
    1267          184 :       aff_tree asrc, adest;
    1268           92 :       tree_to_aff_combination (src, ptr_type_node, &asrc);
    1269           92 :       tree_to_aff_combination (dest, ptr_type_node, &adest);
    1270           92 :       aff_combination_scale (&adest, -1);
    1271           92 :       aff_combination_add (&asrc, &adest);
    1272          184 :       if (aff_comb_cannot_overlap_p (&asrc, wi::to_poly_widest (nb_bytes),
    1273          184 :                                      wi::to_poly_widest (nb_bytes)))
    1274            0 :         kind = BUILT_IN_MEMCPY;
    1275           92 :     }
    1276              : 
    1277         4391 :   dest = force_gimple_operand_gsi (&gsi, dest, true, NULL_TREE,
    1278              :                                    false, GSI_CONTINUE_LINKING);
    1279         4391 :   src = force_gimple_operand_gsi (&gsi, src, true, NULL_TREE,
    1280              :                                   false, GSI_CONTINUE_LINKING);
    1281         4391 :   fn = build_fold_addr_expr (builtin_decl_implicit (kind));
    1282         4391 :   fn_call = gimple_build_call (fn, 3, dest, src, nb_bytes);
    1283         4391 :   gimple_set_location (fn_call, partition->loc);
    1284         4391 :   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
    1285         4391 :   fold_stmt (&gsi);
    1286              : 
    1287         4391 :   if (dump_file && (dump_flags & TDF_DETAILS))
    1288              :     {
    1289           13 :       if (kind == BUILT_IN_MEMCPY)
    1290           10 :         fprintf (dump_file, "generated memcpy\n");
    1291              :       else
    1292            3 :         fprintf (dump_file, "generated memmove\n");
    1293              :     }
    1294         4391 : }
    1295              : 
    1296              : /* Remove and destroy the loop LOOP.  */
    1297              : 
    1298              : static void
    1299        10132 : destroy_loop (class loop *loop)
    1300              : {
    1301        10132 :   unsigned nbbs = loop->num_nodes;
    1302        10132 :   edge exit = single_exit (loop);
    1303        10132 :   basic_block src = loop_preheader_edge (loop)->src, dest = exit->dest;
    1304        10132 :   basic_block *bbs;
    1305        10132 :   unsigned i;
    1306              : 
    1307        10132 :   bbs = get_loop_body_in_dom_order (loop);
    1308              : 
    1309        10132 :   gimple_stmt_iterator dst_gsi = gsi_after_labels (exit->dest);
    1310        10132 :   bool safe_p = single_pred_p (exit->dest);
    1311        31907 :   for (unsigned i = 0; i < nbbs; ++i)
    1312              :     {
    1313              :       /* We have made sure to not leave any dangling uses of SSA
    1314              :          names defined in the loop, with the exception of virtuals.
    1315              :          Make sure we replace all uses of virtual defs that will remain
    1316              :          outside of the loop with the bare symbol as delete_basic_block
    1317              :          will release them.  */
    1318        51576 :       for (gphi_iterator gsi = gsi_start_phis (bbs[i]); !gsi_end_p (gsi);
    1319        29801 :            gsi_next (&gsi))
    1320              :         {
    1321        29801 :           gphi *phi = gsi.phi ();
    1322        70684 :           if (virtual_operand_p (gimple_phi_result (phi)))
    1323        11082 :             mark_virtual_phi_result_for_renaming (phi);
    1324              :         }
    1325       132484 :       for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);)
    1326              :         {
    1327        88934 :           gimple *stmt = gsi_stmt (gsi);
    1328        88934 :           tree vdef = gimple_vdef (stmt);
    1329        49237 :           if (vdef && TREE_CODE (vdef) == SSA_NAME)
    1330        11840 :             mark_virtual_operand_for_renaming (vdef);
    1331              :           /* Also move and eventually reset debug stmts.  We can leave
    1332              :              constant values in place in case the stmt dominates the exit.
    1333              :              ???  Non-constant values from the last iteration can be
    1334              :              replaced with final values if we can compute them.  */
    1335        88934 :           if (gimple_debug_bind_p (stmt))
    1336              :             {
    1337        20084 :               tree val = gimple_debug_bind_get_value (stmt);
    1338        20084 :               gsi_move_before (&gsi, &dst_gsi);
    1339        20084 :               if (val
    1340        20084 :                   && (!safe_p
    1341        10499 :                       || !is_gimple_min_invariant (val)
    1342         1118 :                       || !dominated_by_p (CDI_DOMINATORS, exit->src, bbs[i])))
    1343              :                 {
    1344        11171 :                   gimple_debug_bind_reset_value (stmt);
    1345        11171 :                   update_stmt (stmt);
    1346              :                 }
    1347              :             }
    1348              :           else
    1349        68850 :             gsi_next (&gsi);
    1350              :         }
    1351              :     }
    1352              : 
    1353        10132 :   redirect_edge_pred (exit, src);
    1354        10132 :   exit->flags &= ~(EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
    1355        10132 :   exit->flags |= EDGE_FALLTHRU;
    1356        10132 :   exit->probability = profile_probability::always ();
    1357        10132 :   cancel_loop_tree (loop);
    1358        10132 :   rescan_loop_exit (exit, false, true);
    1359              : 
    1360        10132 :   i = nbbs;
    1361        21775 :   do
    1362              :     {
    1363        21775 :       --i;
    1364        21775 :       delete_basic_block (bbs[i]);
    1365              :     }
    1366        21775 :   while (i != 0);
    1367              : 
    1368        10132 :   free (bbs);
    1369              : 
    1370        10132 :   set_immediate_dominator (CDI_DOMINATORS, dest,
    1371              :                            recompute_dominator (CDI_DOMINATORS, dest));
    1372        10132 : }
    1373              : 
    1374              : /* Generates code for PARTITION.  Return whether LOOP needs to be destroyed.  */
    1375              : 
    1376              : static bool
    1377        15165 : generate_code_for_partition (class loop *loop,
    1378              :                              partition *partition, bool copy_p,
    1379              :                              bool keep_lc_phis_p)
    1380              : {
    1381        15165 :   switch (partition->kind)
    1382              :     {
    1383         3041 :     case PKIND_NORMAL:
    1384         3041 :     case PKIND_PARTIAL_MEMSET:
    1385              :       /* Reductions all have to be in the last partition.  */
    1386         3041 :       gcc_assert (!partition_reduction_p (partition)
    1387              :                   || !copy_p);
    1388         3041 :       generate_loops_for_partition (loop, partition, copy_p,
    1389              :                                     keep_lc_phis_p);
    1390         3041 :       return false;
    1391              : 
    1392         7733 :     case PKIND_MEMSET:
    1393         7733 :       generate_memset_builtin (loop, partition);
    1394         7733 :       break;
    1395              : 
    1396         4391 :     case PKIND_MEMCPY:
    1397         4391 :     case PKIND_MEMMOVE:
    1398         4391 :       generate_memcpy_builtin (loop, partition);
    1399         4391 :       break;
    1400              : 
    1401            0 :     default:
    1402            0 :       gcc_unreachable ();
    1403              :     }
    1404              : 
    1405              :   /* Common tail for partitions we turn into a call.  If this was the last
    1406              :      partition for which we generate code, we have to destroy the loop.  */
    1407        12124 :   if (!copy_p)
    1408              :     return true;
    1409              :   return false;
    1410              : }
    1411              : 
    1412              : data_dependence_relation *
    1413      1550240 : loop_distribution::get_data_dependence (struct graph *rdg, data_reference_p a,
    1414              :                                         data_reference_p b)
    1415              : {
    1416      1550240 :   struct data_dependence_relation ent, **slot;
    1417      1550240 :   struct data_dependence_relation *ddr;
    1418              : 
    1419      1550240 :   gcc_assert (DR_IS_WRITE (a) || DR_IS_WRITE (b));
    1420      1550240 :   gcc_assert (rdg_vertex_for_stmt (rdg, DR_STMT (a))
    1421              :               <= rdg_vertex_for_stmt (rdg, DR_STMT (b)));
    1422      1550240 :   ent.a = a;
    1423      1550240 :   ent.b = b;
    1424      1550240 :   slot = ddrs_table->find_slot (&ent, INSERT);
    1425      1550240 :   if (*slot == NULL)
    1426              :     {
    1427      1174896 :       ddr = initialize_data_dependence_relation (a, b, loop_nest);
    1428      1174896 :       compute_affine_dependence (ddr, loop_nest[0]);
    1429      1174896 :       *slot = ddr;
    1430              :     }
    1431              : 
    1432      1550240 :   return *slot;
    1433              : }
    1434              : 
    1435              : bool
    1436       257008 : loop_distribution::data_dep_in_cycle_p (struct graph *rdg,
    1437              :                                         data_reference_p dr1,
    1438              :                                         data_reference_p dr2)
    1439              : {
    1440       257008 :   struct data_dependence_relation *ddr;
    1441              : 
    1442              :   /* Re-shuffle data-refs to be in topological order.  */
    1443       514016 :   if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
    1444       257008 :       > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
    1445        49661 :     std::swap (dr1, dr2);
    1446              : 
    1447       257008 :   ddr = get_data_dependence (rdg, dr1, dr2);
    1448              : 
    1449              :   /* In case of no data dependence.  */
    1450       257008 :   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
    1451              :     return false;
    1452              :   /* For unknown data dependence or known data dependence which can't be
    1453              :      expressed in classic distance vector, we check if it can be resolved
    1454              :      by runtime alias check.  If yes, we still consider data dependence
    1455              :      as won't introduce data dependence cycle.  */
    1456        92606 :   else if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
    1457        92606 :            || DDR_NUM_DIST_VECTS (ddr) == 0)
    1458        50433 :     return !runtime_alias_check_p (ddr, NULL, true);
    1459        42173 :   else if (DDR_NUM_DIST_VECTS (ddr) > 1)
    1460              :     return true;
    1461        37732 :   else if (DDR_REVERSED_P (ddr)
    1462       111040 :            || lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), DDR_NB_LOOPS (ddr)))
    1463        33064 :     return false;
    1464              : 
    1465              :   return true;
    1466              : }
    1467              : 
    1468              : void
    1469       237819 : loop_distribution::update_type_for_merge (struct graph *rdg,
    1470              :                                            partition *partition1,
    1471              :                                            partition *partition2)
    1472              : {
    1473       237819 :   unsigned i, j;
    1474       237819 :   bitmap_iterator bi, bj;
    1475       237819 :   data_reference_p dr1, dr2;
    1476              : 
    1477       718093 :   EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
    1478              :     {
    1479       489384 :       unsigned start = (partition1 == partition2) ? i + 1 : 0;
    1480              : 
    1481       489384 :       dr1 = datarefs_vec[i];
    1482      1162112 :       EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, start, j, bj)
    1483              :         {
    1484       681838 :           dr2 = datarefs_vec[j];
    1485       681838 :           if (DR_IS_READ (dr1) && DR_IS_READ (dr2))
    1486       424830 :             continue;
    1487              : 
    1488              :           /* Partition can only be executed sequentially if there is any
    1489              :              data dependence cycle.  */
    1490       257008 :           if (data_dep_in_cycle_p (rdg, dr1, dr2))
    1491              :             {
    1492         9110 :               partition1->type = PTYPE_SEQUENTIAL;
    1493         9110 :               return;
    1494              :             }
    1495              :         }
    1496              :     }
    1497              : }
    1498              : 
    1499              : partition *
    1500       246865 : loop_distribution::build_rdg_partition_for_vertex (struct graph *rdg, int v)
    1501              : {
    1502       246865 :   partition *partition = partition_alloc ();
    1503       246865 :   auto_vec<int, 3> nodes;
    1504       246865 :   unsigned i, j;
    1505       246865 :   int x;
    1506       246865 :   data_reference_p dr;
    1507              : 
    1508       246865 :   graphds_dfs (rdg, &v, 1, &nodes, false, NULL);
    1509              : 
    1510      3468474 :   FOR_EACH_VEC_ELT (nodes, i, x)
    1511              :     {
    1512      3221609 :       bitmap_set_bit (partition->stmts, x);
    1513              : 
    1514      4231225 :       for (j = 0; RDG_DATAREFS (rdg, x).iterate (j, &dr); ++j)
    1515              :         {
    1516       507923 :           unsigned idx = (unsigned) DR_INDEX (dr);
    1517       507923 :           gcc_assert (idx < datarefs_vec.length ());
    1518              : 
    1519              :           /* The partition must be executed sequentially if there is
    1520              :              any unknown data reference.  */
    1521       507923 :           if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr)
    1522       479257 :               || !DR_INIT (dr) || !DR_STEP (dr))
    1523        28666 :             partition->type = PTYPE_SEQUENTIAL;
    1524              : 
    1525       507923 :           bitmap_set_bit (partition->datarefs, idx);
    1526              :         }
    1527              :     }
    1528              : 
    1529       246865 :   if (partition->type == PTYPE_SEQUENTIAL)
    1530              :     return partition;
    1531              : 
    1532              :   /* Further check if any data dependence prevents us from executing the
    1533              :      partition parallelly.  */
    1534       230949 :   update_type_for_merge (rdg, partition, partition);
    1535              : 
    1536       230949 :   return partition;
    1537       246865 : }
    1538              : 
    1539              : /* Given PARTITION of LOOP and RDG, record single load/store data references
    1540              :    for builtin partition in SRC_DR/DST_DR, return false if there is no such
    1541              :    data references.  */
    1542              : 
    1543              : static bool
    1544       227771 : find_single_drs (class loop *loop, struct graph *rdg, const bitmap &partition_stmts,
    1545              :                  data_reference_p *dst_dr, data_reference_p *src_dr)
    1546              : {
    1547       227771 :   unsigned i;
    1548       227771 :   data_reference_p single_ld = NULL, single_st = NULL;
    1549       227771 :   bitmap_iterator bi;
    1550              : 
    1551      2339183 :   EXECUTE_IF_SET_IN_BITMAP (partition_stmts, 0, i, bi)
    1552              :     {
    1553      2174607 :       gimple *stmt = RDG_STMT (rdg, i);
    1554      2174607 :       data_reference_p dr;
    1555              : 
    1556      2174607 :       if (gimple_code (stmt) == GIMPLE_PHI)
    1557       493812 :         continue;
    1558              : 
    1559              :       /* Any scalar stmts are ok.  */
    1560      3131217 :       if (!gimple_vuse (stmt))
    1561      1329098 :         continue;
    1562              : 
    1563              :       /* Otherwise just regular loads/stores.  */
    1564       351697 :       if (!gimple_assign_single_p (stmt))
    1565       227771 :         return false;
    1566              : 
    1567              :       /* But exactly one store and/or load.  */
    1568      2756980 :       for (unsigned j = 0; RDG_DATAREFS (rdg, i).iterate (j, &dr); ++j)
    1569              :         {
    1570       357087 :           tree type = TREE_TYPE (DR_REF (dr));
    1571              : 
    1572              :           /* The memset, memcpy and memmove library calls are only
    1573              :              able to deal with generic address space.  */
    1574       357087 :           if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (type)))
    1575              :             return false;
    1576              : 
    1577       357069 :           if (DR_IS_READ (dr))
    1578              :             {
    1579       210044 :               if (single_ld != NULL)
    1580              :                 return false;
    1581              :               single_ld = dr;
    1582              :             }
    1583              :           else
    1584              :             {
    1585       147025 :               if (single_st != NULL)
    1586              :                 return false;
    1587              :               single_st = dr;
    1588              :             }
    1589              :         }
    1590              :     }
    1591              : 
    1592       164576 :   if (!single_ld && !single_st)
    1593              :     return false;
    1594              : 
    1595       158690 :   basic_block bb_ld = NULL;
    1596       158690 :   basic_block bb_st = NULL;
    1597       158690 :   edge exit = single_exit (loop);
    1598              : 
    1599       158690 :   if (single_ld)
    1600              :     {
    1601              :       /* Bail out if this is a bitfield memory reference.  */
    1602        83754 :       if (TREE_CODE (DR_REF (single_ld)) == COMPONENT_REF
    1603        83754 :           && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_ld), 1)))
    1604              :         return false;
    1605              : 
    1606              :       /* Data reference must be executed exactly once per iteration of each
    1607              :          loop in the loop nest.  We only need to check dominance information
    1608              :          against the outermost one in a perfect loop nest because a bb can't
    1609              :          dominate outermost loop's latch without dominating inner loop's.  */
    1610        83682 :       bb_ld = gimple_bb (DR_STMT (single_ld));
    1611        83682 :       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
    1612              :         return false;
    1613              : 
    1614              :       /* The data reference must also be executed before possibly exiting
    1615              :          the loop as otherwise we'd for example unconditionally execute
    1616              :          memset (ptr, 0, n) which even with n == 0 implies ptr is non-NULL.  */
    1617        77172 :       if (bb_ld != loop->header
    1618        77172 :           && (!exit
    1619         9953 :               || !dominated_by_p (CDI_DOMINATORS, exit->src, bb_ld)))
    1620         2165 :         return false;
    1621              :     }
    1622              : 
    1623       149943 :   if (single_st)
    1624              :     {
    1625              :       /* Bail out if this is a bitfield memory reference.  */
    1626       136847 :       if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
    1627       136847 :           && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
    1628              :         return false;
    1629              : 
    1630              :       /* Data reference must be executed exactly once per iteration.
    1631              :          Same as single_ld, we only need to check against the outermost
    1632              :          loop.  */
    1633       136760 :       bb_st = gimple_bb (DR_STMT (single_st));
    1634       136760 :       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
    1635              :         return false;
    1636              : 
    1637              :       /* And before exiting the loop.  */
    1638       132208 :       if (bb_st != loop->header
    1639       132208 :           && (!exit
    1640        17248 :               || !dominated_by_p (CDI_DOMINATORS, exit->src, bb_st)))
    1641         1178 :         return false;
    1642              :     }
    1643              : 
    1644       144126 :   if (single_ld && single_st)
    1645              :     {
    1646              :       /* Load and store must be in the same loop nest.  */
    1647        59766 :       if (bb_st->loop_father != bb_ld->loop_father)
    1648              :         return false;
    1649              : 
    1650        59165 :       edge e = single_exit (bb_st->loop_father);
    1651        59165 :       bool dom_ld = dominated_by_p (CDI_DOMINATORS, e->src, bb_ld);
    1652        59165 :       bool dom_st = dominated_by_p (CDI_DOMINATORS, e->src, bb_st);
    1653        59165 :       if (dom_ld != dom_st)
    1654              :         return false;
    1655              :     }
    1656              : 
    1657       143525 :   *src_dr = single_ld;
    1658       143525 :   *dst_dr = single_st;
    1659       143525 :   return true;
    1660              : }
    1661              : 
    1662              : /* Given data reference DR in LOOP_NEST, this function checks the enclosing
    1663              :    loops from inner to outer to see if the loop's step equals the access
    1664              :    size at each loop level.  Return 2 if we can prove this at all loop levels;
    1665              :    record access base and size in BASE and SIZE; save loop's step at each
    1666              :    level of loop in STEPS if it is not null.  For example:
    1667              : 
    1668              :      int arr[100][100][100];
    1669              :      for (i = 0; i < 100; i++)       ;steps[2] = 40000
    1670              :        for (j = 100; j > 0; j--)     ;steps[1] = -400
    1671              :          for (k = 0; k < 100; k++)   ;steps[0] = 4
    1672              :            arr[i][j - 1][k] = 0;     ;base = &arr, size = 4000000
    1673              : 
    1674              :    Return 1 if we can prove the equality at the innermost loop, but not
    1675              :    at all loop levels.  In this case, no information is recorded.
    1676              : 
    1677              :    Return 0 if no equality can be proven at any level loops.  */
    1678              : 
    1679              : static int
    1680        83392 : compute_access_range (loop_p loop_nest, data_reference_p dr, tree *base,
    1681              :                       tree *size, vec<tree> *steps = NULL)
    1682              : {
    1683        83392 :   location_t loc = gimple_location (DR_STMT (dr));
    1684        83392 :   basic_block bb = gimple_bb (DR_STMT (dr));
    1685        83392 :   class loop *loop = bb->loop_father;
    1686        83392 :   tree ref = DR_REF (dr);
    1687        83392 :   tree access_base = build_fold_addr_expr (ref);
    1688        83392 :   tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (ref));
    1689        83392 :   int res = 0;
    1690              : 
    1691        84853 :   do {
    1692        84853 :       tree scev_fn = analyze_scalar_evolution (loop, access_base);
    1693        84853 :       if (TREE_CODE (scev_fn) != POLYNOMIAL_CHREC)
    1694        43791 :         return res;
    1695              : 
    1696        83096 :       access_base = CHREC_LEFT (scev_fn);
    1697        83096 :       if (tree_contains_chrecs (access_base, NULL))
    1698              :         return res;
    1699              : 
    1700        83096 :       tree scev_step = CHREC_RIGHT (scev_fn);
    1701              :       /* Only support constant steps.  */
    1702        83096 :       if (TREE_CODE (scev_step) != INTEGER_CST)
    1703              :         return res;
    1704              : 
    1705        78414 :       enum ev_direction access_dir = scev_direction (scev_fn);
    1706        78414 :       if (access_dir == EV_DIR_UNKNOWN)
    1707              :         return res;
    1708              : 
    1709        78414 :       if (steps != NULL)
    1710        50970 :         steps->safe_push (scev_step);
    1711              : 
    1712        78414 :       scev_step = fold_convert_loc (loc, sizetype, scev_step);
    1713              :       /* Compute absolute value of scev step.  */
    1714        78414 :       if (access_dir == EV_DIR_DECREASES)
    1715         2200 :         scev_step = fold_build1_loc (loc, NEGATE_EXPR, sizetype, scev_step);
    1716              : 
    1717              :       /* At each loop level, the scev step must equal the access size.  In
    1718              :          other words, DR must access consecutive memory between loop iterations.  */
    1719        78414 :       if (!operand_equal_p (scev_step, access_size, 0))
    1720              :         return res;
    1721              : 
    1722              :       /* The access stride can be computed for the data reference at least
    1723              :          for the innermost loop.  */
    1724        41062 :       res = 1;
    1725              : 
    1726              :       /* Compute DR's execution times in loop.  */
    1727        41062 :       tree niters = number_of_latch_executions (loop);
    1728        41062 :       niters = fold_convert_loc (loc, sizetype, niters);
    1729        41062 :       if (dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src, bb))
    1730        41062 :         niters = size_binop_loc (loc, PLUS_EXPR, niters, size_one_node);
    1731              : 
    1732              :       /* Compute DR's overall access size in loop.  */
    1733        41062 :       access_size = fold_build2_loc (loc, MULT_EXPR, sizetype,
    1734              :                                      niters, scev_step);
    1735              :       /* Adjust base address in case of negative step.  */
    1736        41062 :       if (access_dir == EV_DIR_DECREASES)
    1737              :         {
    1738         1436 :           tree adj = fold_build2_loc (loc, MINUS_EXPR, sizetype,
    1739              :                                       scev_step, access_size);
    1740         1436 :           access_base = fold_build_pointer_plus_loc (loc, access_base, adj);
    1741              :         }
    1742        41062 :   } while (loop != loop_nest && (loop = loop_outer (loop)) != NULL);
    1743              : 
    1744        39601 :   *base = access_base;
    1745        39601 :   *size = access_size;
    1746              :   /* The access stride can be computed for the data reference at every loop level.  */
    1747        39601 :   return 2;
    1748              : }
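The nested-loop example in compute_access_range's comment can be checked with a small standalone C++ sketch. The helper names (`step_k`, `total_access_size`, etc.) are illustrative only, not GCC APIs, and the expected values assume a 4-byte `int` as in the comment:

```cpp
#include <cstddef>

// Mirrors the example above: a perfectly nested loop writing
// arr[i][j-1][k] touches the whole array exactly once, so at each level
// the absolute step equals the access size accumulated so far, and the
// final size equals sizeof (arr).  Assumes sizeof (int) == 4.
static int arr[100][100][100];

// Byte stride between consecutive iterations of each loop level.
long step_k () { return (long) sizeof (arr[0][0][0]); } // innermost: 4
long step_j () { return -(long) sizeof (arr[0][0]); }   // j counts down: -400
long step_i () { return (long) sizeof (arr[0]); }       // outermost: 40000

// Overall access size, widened level by level as the function above
// walks outwards through the loop nest.
long total_access_size ()
{
  long size = step_k () * 100;   // after the k loop: 400
  size *= 100;                   // after the j loop: 40000
  size *= 100;                   // after the i loop: 4000000
  return size;
}
```

At each widening step, the next level's absolute step must equal the size computed so far; that is the `operand_equal_p (scev_step, access_size, 0)` check in the loop above.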
    1749              : 
    1750              : /* Allocate and return builtin struct.  Record information like DST_DR,
    1751              :    SRC_DR, DST_BASE, SRC_BASE and SIZE in the allocated struct.  */
    1752              : 
    1753              : static struct builtin_info *
    1754        12855 : alloc_builtin (data_reference_p dst_dr, data_reference_p src_dr,
    1755              :                tree dst_base, tree src_base, tree size)
    1756              : {
    1757            0 :   struct builtin_info *builtin = XNEW (struct builtin_info);
    1758        12855 :   builtin->dst_dr = dst_dr;
    1759        12855 :   builtin->src_dr = src_dr;
    1760        12855 :   builtin->dst_base = dst_base;
    1761        12855 :   builtin->src_base = src_base;
    1762        12855 :   builtin->size = size;
    1763        12855 :   return builtin;
    1764              : }
    1765              : 
    1766              : /* Given data reference DR in loop nest LOOP, classify whether it forms
    1767              :    a builtin memset call.  */
    1768              : 
    1769              : static void
    1770        71254 : classify_builtin_st (loop_p loop, partition *partition, data_reference_p dr)
    1771              : {
    1772        71254 :   gimple *stmt = DR_STMT (dr);
    1773        71254 :   tree base, size, rhs = gimple_assign_rhs1 (stmt);
    1774              : 
    1775        71254 :   if (const_with_all_bytes_same (rhs) == -1
    1776        71254 :       && (!INTEGRAL_TYPE_P (TREE_TYPE (rhs))
    1777        55616 :           || (TYPE_MODE (TREE_TYPE (rhs))
    1778        27808 :               != TYPE_MODE (unsigned_char_type_node))))
    1779        63293 :     return;
    1780              : 
    1781        33993 :   if (TREE_CODE (rhs) == SSA_NAME
    1782         5161 :       && !SSA_NAME_IS_DEFAULT_DEF (rhs)
    1783        38510 :       && flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (rhs))))
    1784              :     return;
    1785              : 
    1786        29619 :   int res = compute_access_range (loop, dr, &base, &size);
    1787        29619 :   if (res == 0)
    1788              :     return;
    1789         8022 :   if (res == 1)
    1790              :     {
    1791           61 :       partition->kind = PKIND_PARTIAL_MEMSET;
    1792           61 :       return;
    1793              :     }
    1794              : 
    1795         7961 :   tree base_offset;
    1796         7961 :   tree base_base;
    1797         7961 :   split_constant_offset (base, &base_base, &base_offset);
    1798         7961 :   if (!cst_and_fits_in_hwi (base_offset))
    1799              :     return;
    1800         7961 :   unsigned HOST_WIDE_INT const_base_offset = int_cst_value (base_offset);
    1801              : 
    1802         7961 :   struct builtin_info *builtin;
    1803         7961 :   builtin = alloc_builtin (dr, NULL, base, NULL_TREE, size);
    1804         7961 :   builtin->dst_base_base = base_base;
    1805         7961 :   builtin->dst_base_offset = const_base_offset;
    1806         7961 :   partition->builtin = builtin;
    1807         7961 :   partition->kind = PKIND_MEMSET;
    1808              : }
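The kind of store loop classify_builtin_st recognizes can be sketched as a standalone check (`memset_equivalent` is an illustrative name, not part of GCC): a constant whose bytes are all equal, stored with a step equal to the access size, is equivalent to one memset call.

```cpp
#include <cstring>

// A store loop that loop distribution can replace with memset: the
// stored value (0) has all bytes equal, and consecutive iterations
// write consecutive memory.  Hypothetical standalone sketch.
bool memset_equivalent ()
{
  int a[100], b[100];
  for (int i = 0; i < 100; i++)      // the candidate loop
    a[i] = 0;
  std::memset (b, 0, sizeof (b));    // the generated builtin
  return std::memcmp (a, b, sizeof (a)) == 0;
}
```

A stored value whose bytes differ (e.g. `0x01020304`) fails the const_with_all_bytes_same test above, so such a loop is left alone.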
    1809              : 
    1810              : /* Given data references DST_DR and SRC_DR in loop nest LOOP and RDG,
    1811              :    classify whether they form a builtin memcpy or memmove call.  */
    1812              : 
    1813              : void
    1814        35006 : loop_distribution::classify_builtin_ldst (loop_p loop, struct graph *rdg,
    1815              :                                           partition *partition,
    1816              :                                           data_reference_p dst_dr,
    1817              :                                           data_reference_p src_dr)
    1818              : {
    1819        35006 :   tree base, size, src_base, src_size;
    1820        35006 :   auto_vec<tree> dst_steps, src_steps;
    1821              : 
    1822              :   /* Compute access range of both load and store.  */
    1823        35006 :   int res = compute_access_range (loop, dst_dr, &base, &size, &dst_steps);
    1824        35006 :   if (res != 2)
    1825              :     return;
    1826        18767 :   res = compute_access_range (loop, src_dr, &src_base, &src_size, &src_steps);
    1827        18767 :   if (res != 2)
    1828              :     return;
    1829              : 
    1830              :   /* They must have the same access size.  */
    1831        12873 :   if (!operand_equal_p (size, src_size, 0))
    1832              :     return;
    1833              : 
    1834              :   /* They must have the same storage order.  */
    1835        25746 :   if (reverse_storage_order_for_component_p (DR_REF (dst_dr))
    1836        12873 :       != reverse_storage_order_for_component_p (DR_REF (src_dr)))
    1837              :     return;
    1838              : 
    1839              :   /* Load and store in the loop nest must access memory in the same way,
    1840              :      i.e., they must have the same step in each loop of the nest.  */
    1841        38619 :   if (dst_steps.length () != src_steps.length ())
    1842              :     return;
    1843        25846 :   for (unsigned i = 0; i < dst_steps.length (); ++i)
    1844        13272 :     if (!operand_equal_p (dst_steps[i], src_steps[i], 0))
    1845              :       return;
    1846              : 
    1847              :   /* Now check whether there is a dependence.  */
    1848        12574 :   ddr_p ddr = get_data_dependence (rdg, src_dr, dst_dr);
    1849              : 
    1850              :   /* Classify as memcpy if no dependence between load and store.  */
    1851        12574 :   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
    1852              :     {
    1853         4596 :       partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, size);
    1854         4596 :       partition->kind = PKIND_MEMCPY;
    1855         4596 :       return;
    1856              :     }
    1857              : 
    1858              :   /* Can't do memmove in case of an unknown dependence or a dependence
    1859              :      without a classical distance vector.  */
    1860         7978 :   if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
    1861        35582 :       || DDR_NUM_DIST_VECTS (ddr) == 0)
    1862              :     return;
    1863              : 
    1864          576 :   unsigned i;
    1865          576 :   lambda_vector dist_v;
    1866          576 :   int num_lev = (DDR_LOOP_NEST (ddr)).length ();
    1867         1094 :   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
    1868              :     {
    1869          518 :       unsigned dep_lev = dependence_level (dist_v, num_lev);
    1870              :       /* Can't do memmove if load depends on store.  */
    1871          559 :       if (dep_lev > 0 && dist_v[dep_lev - 1] > 0 && !DDR_REVERSED_P (ddr))
    1872              :         return;
    1873              :     }
    1874              : 
    1875          298 :   partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, size);
    1876          298 :   partition->kind = PKIND_MEMMOVE;
    1877          298 :   return;
    1878        35006 : }
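The memmove case handled above can be illustrated with a hedged sketch (`memmove_equivalent` is an illustrative name, not GCC code): a copy loop whose load reads ahead of the store over overlapping memory has a dependence distance that permits memmove, which is what the distance-vector check verifies.

```cpp
#include <cstring>

// buf[i] = buf[i + 1]: the load reads ahead of the store, so replacing
// the loop with memmove preserves the result.  The reverse loop
// (buf[i + 1] = buf[i]) would make each load depend on an earlier
// store and is rejected by the distance-vector check above.
bool memmove_equivalent ()
{
  int buf[10]    = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  int expect[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

  // The builtin form: shift elements 1..9 down by one.
  std::memmove (&expect[0], &expect[1], 9 * sizeof (int));

  for (int i = 0; i < 9; i++)        // the candidate loop
    buf[i] = buf[i + 1];
  return std::memcmp (buf, expect, sizeof (buf)) == 0;
}
```

When the two references provably do not overlap at all (chrec_known above), the stronger memcpy classification applies instead.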
    1879              : 
    1880              : bool
    1881       246865 : loop_distribution::classify_partition (loop_p loop,
    1882              :                                        struct graph *rdg, partition *partition,
    1883              :                                        bitmap stmt_in_all_partitions)
    1884              : {
    1885       246865 :   bitmap_iterator bi;
    1886       246865 :   unsigned i;
    1887       246865 :   data_reference_p single_ld = NULL, single_st = NULL;
    1888       246865 :   bool volatiles_p = false, has_reduction = false;
    1889              : 
    1890      3468474 :   EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, bi)
    1891              :     {
    1892      3221609 :       gimple *stmt = RDG_STMT (rdg, i);
    1893              : 
    1894      5441306 :       if (gimple_has_volatile_ops (stmt))
    1895      3221609 :         volatiles_p = true;
    1896              : 
    1897              :       /* If the stmt is not included in all partitions and has uses
    1898              :          outside of the loop, then mark the partition as a reduction.  */
    1899      3221609 :       if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
    1900              :         {
    1901              :           /* Due to a limitation in the transform phase we have to fuse all
    1902              :              reduction partitions.  As a result, this could cancel valid
    1903              :              loop distributions, especially for loops whose induction
    1904              :              variable is used outside of the loop.  To work around this, we
    1905              :              skip marking the partition as a reduction if the reduction stmt
    1906              :              belongs to all partitions.  In that case, the reduction is
    1907              :              computed correctly no matter how partitions are fused/distributed.  */
    1908        63447 :           if (!bitmap_bit_p (stmt_in_all_partitions, i))
    1909        25726 :             partition->reduction_p = true;
    1910              :           else
    1911              :             has_reduction = true;
    1912              :         }
    1913              :     }
    1914              : 
    1915              :   /* Simple workaround to prevent classifying the partition as a builtin
    1916              :      if it contains any use outside of the loop.  In the case where all
    1917              :      partitions contain the reduction, this workaround is delayed so it
    1918              :      only affects the last partition.  */
    1919       246865 :   if (partition->reduction_p)
    1920              :      return has_reduction;
    1921              : 
    1922              :   /* Perform general partition disqualification for builtins.  */
    1923       222329 :   if (volatiles_p
    1924       222329 :       || !flag_tree_loop_distribute_patterns)
    1925              :     return has_reduction;
    1926              : 
    1927              :   /* Find single load/store data references for builtin partition.  */
    1928       220332 :   if (!find_single_drs (loop, rdg, partition->stmts, &single_st, &single_ld)
    1929       220332 :       || !single_st)
    1930              :     return has_reduction;
    1931              : 
    1932       130309 :   if (single_ld && single_st)
    1933              :     {
    1934        59055 :       gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
    1935              :       /* Direct aggregate copy or via an SSA name temporary.  */
    1936        59055 :       if (load != store
    1937        59055 :           && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
    1938              :         return has_reduction;
    1939              :     }
    1940              : 
    1941       106260 :   partition->loc = gimple_location (DR_STMT (single_st));
    1942              : 
    1943              :   /* Classify the builtin kind.  */
    1944              :   if (single_ld == NULL)
    1945        71254 :     classify_builtin_st (loop, partition, single_st);
    1946              :   else
    1947        35006 :     classify_builtin_ldst (loop, rdg, partition, single_st, single_ld);
    1948              :   return has_reduction;
    1949              : }
    1950              : 
    1951              : bool
    1952       866868 : loop_distribution::share_memory_accesses (struct graph *rdg,
    1953              :                        partition *partition1, partition *partition2)
    1954              : {
    1955       866868 :   unsigned i, j;
    1956       866868 :   bitmap_iterator bi, bj;
    1957       866868 :   data_reference_p dr1, dr2;
    1958              : 
    1959              :   /* First check whether the intersection of the two partitions
    1960              :      contains any loads or stores.  Common loads are the situation
    1961              :      that happens most often.  */
    1962      6375916 :   EXECUTE_IF_AND_IN_BITMAP (partition1->stmts, partition2->stmts, 0, i, bi)
    1963      5513694 :     if (RDG_MEM_WRITE_STMT (rdg, i)
    1964      5513694 :         || RDG_MEM_READS_STMT (rdg, i))
    1965              :       return true;
    1966              : 
    1967              :   /* Then check whether the two partitions access the same memory object.  */
    1968      1920927 :   EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
    1969              :     {
    1970      1061222 :       dr1 = datarefs_vec[i];
    1971              : 
    1972      1061222 :       if (!DR_BASE_ADDRESS (dr1)
    1973      1059652 :           || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1))
    1974         1570 :         continue;
    1975              : 
    1976      2270105 :       EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj)
    1977              :         {
    1978      1212970 :           dr2 = datarefs_vec[j];
    1979              : 
    1980      1212970 :           if (!DR_BASE_ADDRESS (dr2)
    1981      1212453 :               || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2))
    1982          517 :             continue;
    1983              : 
    1984      1212453 :           if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0)
    1985      1063265 :               && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0)
    1986      1061872 :               && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0)
    1987      1215001 :               && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0))
    1988              :             return true;
    1989              :         }
    1990              :     }
    1991              : 
    1992              :   return false;
    1993              : }
    1994              : 
    1995              : /* For each seed statement in STARTING_STMTS, this function builds a
    1996              :    partition for it by adding dependent statements according to the RDG.
    1997              :    All partitions are recorded in PARTITIONS.  */
    1998              : 
    1999              : void
    2000       137515 : loop_distribution::rdg_build_partitions (struct graph *rdg,
    2001              :                                          vec<gimple *> starting_stmts,
    2002              :                                          vec<partition *> *partitions)
    2003              : {
    2004       137515 :   auto_bitmap processed;
    2005       137515 :   int i;
    2006       137515 :   gimple *stmt;
    2007              : 
    2008       528421 :   FOR_EACH_VEC_ELT (starting_stmts, i, stmt)
    2009              :     {
    2010       253391 :       int v = rdg_vertex_for_stmt (rdg, stmt);
    2011              : 
    2012       253391 :       if (dump_file && (dump_flags & TDF_DETAILS))
    2013          141 :         fprintf (dump_file,
    2014              :                  "ldist asked to generate code for vertex %d\n", v);
    2015              : 
    2016              :       /* If the vertex is already contained in another partition, so
    2017              :          is the partition rooted at it.  */
    2018       253391 :       if (bitmap_bit_p (processed, v))
    2019         6526 :         continue;
    2020              : 
    2021       246865 :       partition *partition = build_rdg_partition_for_vertex (rdg, v);
    2022       246865 :       bitmap_ior_into (processed, partition->stmts);
    2023              : 
    2024       246865 :       if (dump_file && (dump_flags & TDF_DETAILS))
    2025              :         {
    2026          141 :           fprintf (dump_file, "ldist creates useful %s partition:\n",
    2027          141 :                    partition->type == PTYPE_PARALLEL ? "parallel" : "sequent");
    2028          141 :           bitmap_print (dump_file, partition->stmts, "  ", "\n");
    2029              :         }
    2030              : 
    2031       246865 :       partitions->safe_push (partition);
    2032              :     }
    2033              : 
    2034              :   /* All vertices should have been assigned to at least one partition now,
    2035              :      other than vertices belonging to dead code.  */
    2036       137515 : }
    2037              : 
    2038              : /* Dump to FILE the PARTITIONS.  */
    2039              : 
    2040              : static void
    2041           42 : dump_rdg_partitions (FILE *file, const vec<partition *> &partitions)
    2042              : {
    2043           42 :   int i;
    2044           42 :   partition *partition;
    2045              : 
    2046          108 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    2047           66 :     debug_bitmap_file (file, partition->stmts);
    2048           42 : }
    2049              : 
    2050              : /* Debug PARTITIONS.  */
    2051              : extern void debug_rdg_partitions (const vec<partition *> &);
    2052              : 
    2053              : DEBUG_FUNCTION void
    2054            0 : debug_rdg_partitions (const vec<partition *> &partitions)
    2055              : {
    2056            0 :   dump_rdg_partitions (stderr, partitions);
    2057            0 : }
    2058              : 
    2059              : /* Returns the number of read and write operations in the RDG.  */
    2060              : 
    2061              : static int
    2062         2709 : number_of_rw_in_rdg (struct graph *rdg)
    2063              : {
    2064         2709 :   int i, res = 0;
    2065              : 
    2066        40767 :   for (i = 0; i < rdg->n_vertices; i++)
    2067              :     {
    2068        38058 :       if (RDG_MEM_WRITE_STMT (rdg, i))
    2069         7707 :         ++res;
    2070              : 
    2071        38058 :       if (RDG_MEM_READS_STMT (rdg, i))
    2072         5524 :         ++res;
    2073              :     }
    2074              : 
    2075         2709 :   return res;
    2076              : }
    2077              : 
    2078              : /* Returns the number of read and write operations in a PARTITION of
    2079              :    the RDG.  */
    2080              : 
    2081              : static int
    2082         5905 : number_of_rw_in_partition (struct graph *rdg, partition *partition)
    2083              : {
    2084         5905 :   int res = 0;
    2085         5905 :   unsigned i;
    2086         5905 :   bitmap_iterator ii;
    2087              : 
    2088        56680 :   EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, ii)
    2089              :     {
    2090        50775 :       if (RDG_MEM_WRITE_STMT (rdg, i))
    2091         7699 :         ++res;
    2092              : 
    2093        50775 :       if (RDG_MEM_READS_STMT (rdg, i))
    2094         5524 :         ++res;
    2095              :     }
    2096              : 
    2097         5905 :   return res;
    2098              : }
    2099              : 
    2100              : /* Returns true when one of the PARTITIONS contains all the read or
    2101              :    write operations of RDG.  */
    2102              : 
    2103              : static bool
    2104         2709 : partition_contains_all_rw (struct graph *rdg,
    2105              :                            const vec<partition *> &partitions)
    2106              : {
    2107         2709 :   int i;
    2108         2709 :   partition *partition;
    2109         2709 :   int nrw = number_of_rw_in_rdg (rdg);
    2110              : 
    2111         8491 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    2112         5905 :     if (nrw == number_of_rw_in_partition (rdg, partition))
    2113              :       return true;
    2114              : 
    2115              :   return false;
    2116              : }
    2117              : 
    2118              : int
    2119       973986 : loop_distribution::pg_add_dependence_edges (struct graph *rdg, int dir,
    2120              :                          bitmap drs1, bitmap drs2, vec<ddr_p> *alias_ddrs)
    2121              : {
    2122       973986 :   unsigned i, j;
    2123       973986 :   bitmap_iterator bi, bj;
    2124       973986 :   data_reference_p dr1, dr2, saved_dr1;
    2125              : 
    2126              :   /* dependence direction - 0 is no dependence, -1 is back,
    2127              :      1 is forth, 2 is both (we can stop then, merging will occur).  */
    2128      1871096 :   EXECUTE_IF_SET_IN_BITMAP (drs1, 0, i, bi)
    2129              :     {
    2130      1097174 :       dr1 = datarefs_vec[i];
    2131              : 
    2132      2284666 :       EXECUTE_IF_SET_IN_BITMAP (drs2, 0, j, bj)
    2133              :         {
    2134      1387556 :           int res, this_dir = 1;
    2135      1387556 :           ddr_p ddr;
    2136              : 
    2137      1387556 :           dr2 = datarefs_vec[j];
    2138              : 
    2139              :           /* Skip all <read, read> data dependence.  */
    2140      1387556 :           if (DR_IS_READ (dr1) && DR_IS_READ (dr2))
    2141       106898 :             continue;
    2142              : 
    2143      1280658 :           saved_dr1 = dr1;
    2144              :           /* Re-shuffle data-refs to be in topological order.  */
    2145      2561316 :           if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
    2146      1280658 :               > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
    2147              :             {
    2148        92764 :               std::swap (dr1, dr2);
    2149        92764 :               this_dir = -this_dir;
    2150              :             }
    2151      1280658 :           ddr = get_data_dependence (rdg, dr1, dr2);
    2152      1280658 :           if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
    2153              :             {
    2154       211873 :               this_dir = 0;
    2155       211873 :               res = data_ref_compare_tree (DR_BASE_ADDRESS (dr1),
    2156              :                                            DR_BASE_ADDRESS (dr2));
    2157              :               /* Be conservative.  If the data references are not well
    2158              :                  analyzed, or the two data references have the same base
    2159              :                  address and offset, add a dependence and consider them
    2160              :                  to alias each other.  In other words, the dependence
    2161              :                  cannot be resolved by a runtime alias check.  */
    2162       211873 :               if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
    2163       211471 :                   || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
    2164       211471 :                   || !DR_INIT (dr1) || !DR_INIT (dr2)
    2165       211471 :                   || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1))
    2166       203441 :                   || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2))
    2167       203295 :                   || res == 0)
    2168              :                 this_dir = 2;
    2169              :               /* Data dependence could be resolved by runtime alias check,
    2170              :                  record it in ALIAS_DDRS.  */
    2171        13034 :               else if (alias_ddrs != NULL)
    2172         6497 :                 alias_ddrs->safe_push (ddr);
    2173              :               /* Or simply ignore it.  */
    2174              :             }
    2175      1068785 :           else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE)
    2176              :             {
    2177              :               /* Known dependences can still be unordered throughout the
    2178              :                  iteration space, see gcc.dg/tree-ssa/ldist-16.c and
    2179              :                  gcc.dg/tree-ssa/pr94969.c.  */
    2180         2103 :               if (DDR_NUM_DIST_VECTS (ddr) != 1)
    2181              :                 this_dir = 2;
    2182              :               else
    2183              :                 {
    2184              :                   /* If the overlap is exact preserve stmt order.  */
    2185         1838 :                   if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0),
    2186         1838 :                                            DDR_NB_LOOPS (ddr)))
    2187              :                     ;
    2188              :                   /* Else, as the distance vector is lexicographically
    2189              :                      positive, swap the dependence direction.  */
    2190              :                   else
    2191              :                     {
    2192          842 :                       if (DDR_REVERSED_P (ddr))
    2193           42 :                         this_dir = -this_dir;
    2194          842 :                       this_dir = -this_dir;
    2195              :                     }
    2196              :                   /* When the dependence distance of the innermost common
    2197              :                      loop of the DRs is zero we have a conflict.  This is
    2198              :                      due to wonky dependence analysis which sometimes
    2199              :                      ends up using a zero distance in place of unknown.  */
    2200          919 :                   auto l1 = gimple_bb (DR_STMT (dr1))->loop_father;
    2201          919 :                   auto l2 = gimple_bb (DR_STMT (dr2))->loop_father;
    2202          919 :                   int idx = index_in_loop_nest (find_common_loop (l1, l2)->num,
    2203          919 :                                                 DDR_LOOP_NEST (ddr));
    2204          919 :                   if (DDR_DIST_VECT (ddr, 0)[idx] == 0
    2205              :                       /* Unless it is the outermost loop which is the one
    2206              :                          we eventually distribute.  */
    2207          919 :                       && idx != 0)
    2208              :                     this_dir = 2;
    2209              :                 }
    2210              :             }
    2211              :           else
    2212              :             this_dir = 0;
    2213         7433 :           if (this_dir == 2)
    2214       200064 :             return 2;
    2215      1080607 :           else if (dir == 0)
    2216              :             dir = this_dir;
    2217         7368 :           else if (this_dir != 0 && dir != this_dir)
    2218              :             return 2;
    2219              :           /* Shuffle "back" dr1.  */
    2220      1080594 :           dr1 = saved_dr1;
    2221              :         }
    2222              :     }
    2223              :   return dir;
    2224              : }
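The "forth" direction computed above is exactly what licenses the distribution shown in the file header (D(I) reads A(I-1), a forward dependence), since the source partition can run to completion before the sink partition. A hedged standalone sketch of that example, checking the fused and distributed forms agree (`distribution_preserves_semantics` is an illustrative name):

```cpp
// Fused loop: A[i] = B[i] + C; D[i] = A[i-1] * E.  The dependence from
// the A store to the D load is strictly forward, so running the A loop
// entirely before the D loop computes the same values.
bool distribution_preserves_semantics ()
{
  const int C = 10, E = 2;
  int B[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  int A1[8] = {0}, D1[8] = {0}, A2[8] = {0}, D2[8] = {0};

  for (int i = 1; i < 8; i++)        // original fused loop
    {
      A1[i] = B[i] + C;
      D1[i] = A1[i - 1] * E;
    }

  for (int i = 1; i < 8; i++)        // distributed: partition 1
    A2[i] = B[i] + C;
  for (int i = 1; i < 8; i++)        // distributed: partition 2
    D2[i] = A2[i - 1] * E;

  for (int i = 0; i < 8; i++)
    if (A1[i] != A2[i] || D1[i] != D2[i])
      return false;
  return true;
}
```

A backward or bidirectional dependence (return value 2 above) would instead force the two partitions to be merged.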
    2225              : 
    2226              : /* Compare postorder number of the partition graph vertices V1 and V2.  */
    2227              : 
    2228              : static int
    2229       789526 : pgcmp (const void *v1_, const void *v2_)
    2230              : {
    2231       789526 :   const vertex *v1 = (const vertex *)v1_;
    2232       789526 :   const vertex *v2 = (const vertex *)v2_;
    2233       789526 :   return v2->post - v1->post;
    2234              : }
    2235              : 
    2236              : /* Data attached to vertices of partition dependence graph.  */
    2237              : struct pg_vdata
    2238              : {
    2239              :   /* ID of the corresponding partition.  */
    2240              :   int id;
    2241              :   /* The partition.  */
    2242              :   struct partition *partition;
    2243              : };
    2244              : 
    2245              : /* Data attached to edges of partition dependence graph.  */
    2246              : struct pg_edata
    2247              : {
    2248              :   /* If the dependence edge can be resolved by runtime alias check,
    2249              :      this vector contains data dependence relations for runtime alias
    2250              :      check.  On the other hand, if the dependence edge is introduced
    2251              :      because of compilation time known data dependence, this vector
    2252              :      contains nothing.  */
    2253              :   vec<ddr_p> alias_ddrs;
    2254              : };
    2255              : 
    2256              : /* Callback data for traversing edges in graph.  */
    2257              : struct pg_edge_callback_data
    2258              : {
    2259              :   /* Bitmap of strongly connected components that should be merged.  */
    2260              :   bitmap sccs_to_merge;
    2261              :   /* Array contains component information for all vertices.  */
    2262              :   int *vertices_component;
    2263              :   /* Vector to record all data dependence relations which are needed
    2264              :      to break strong connected components by runtime alias checks.  */
    2265              :   vec<ddr_p> *alias_ddrs;
    2266              : };
    2267              : 
    2268              : /* Initialize vertex data for partition dependence graph PG with
    2269              :    PARTITIONS.  */
    2270              : 
    2271              : static void
    2272        12398 : init_partition_graph_vertices (struct graph *pg,
    2273              :                                vec<struct partition *> *partitions)
    2274              : {
    2275        12398 :   int i;
    2276        12398 :   partition *partition;
    2277        12398 :   struct pg_vdata *data;
    2278              : 
    2279        69430 :   for (i = 0; partitions->iterate (i, &partition); ++i)
    2280              :     {
    2281        57032 :       data = new pg_vdata;
    2282        57032 :       pg->vertices[i].data = data;
    2283        57032 :       data->id = i;
    2284        57032 :       data->partition = partition;
    2285              :     }
    2286        12398 : }
    2287              : 
    2288              : /* Add edge <I, J> to partition dependence graph PG.  Attach vector of data
    2289              :    dependence relations to the EDGE if DDRS isn't NULL.  */
    2290              : 
    2291              : static void
    2292       412241 : add_partition_graph_edge (struct graph *pg, int i, int j, vec<ddr_p> *ddrs)
    2293              : {
    2294       412241 :   struct graph_edge *e = add_edge (pg, i, j);
    2295              : 
    2296              :   /* If the edge is attached with data dependence relations, it means this
    2297              :      dependence edge can be resolved by runtime alias checks.  */
    2298       412241 :   if (ddrs != NULL)
    2299              :     {
    2300        10070 :       struct pg_edata *data = new pg_edata;
    2301              : 
    2302        10070 :       gcc_assert (ddrs->length () > 0);
    2303        10070 :       e->data = data;
    2304        10070 :       data->alias_ddrs = vNULL;
    2305        10070 :       data->alias_ddrs.safe_splice (*ddrs);
    2306              :     }
    2307       412241 : }
    2308              : 
    2309              : /* Callback function for the graph traversal algorithm.  It returns true
    2310              :    if edge E should be skipped when traversing the graph.  */
    2311              : 
    2312              : static bool
    2313         1951 : pg_skip_alias_edge (struct graph_edge *e)
    2314              : {
    2315         1951 :   struct pg_edata *data = (struct pg_edata *)e->data;
    2316         1951 :   return (data != NULL && data->alias_ddrs.length () > 0);
    2317              : }
    2318              : 
    2319              : /* Callback function freeing data attached to edge E of graph.  */
    2320              : 
    2321              : static void
    2322       412241 : free_partition_graph_edata_cb (struct graph *, struct graph_edge *e, void *)
    2323              : {
    2324       412241 :   if (e->data != NULL)
    2325              :     {
    2326        10069 :       struct pg_edata *data = (struct pg_edata *)e->data;
    2327        10069 :       data->alias_ddrs.release ();
    2328        10069 :       delete data;
    2329              :     }
    2330       412241 : }
    2331              : 
    2332              : /* Free data attached to vertices of partition dependence graph PG.  */
    2333              : 
    2334              : static void
    2335        12398 : free_partition_graph_vdata (struct graph *pg)
    2336              : {
    2337        12398 :   int i;
    2338        12398 :   struct pg_vdata *data;
    2339              : 
    2340        69430 :   for (i = 0; i < pg->n_vertices; ++i)
    2341              :     {
    2342        57032 :       data = (struct pg_vdata *)pg->vertices[i].data;
    2343        57032 :       delete data;
    2344              :     }
    2345        12398 : }
    2346              : 
    2347              : /* Build and return partition dependence graph for PARTITIONS.  RDG is
    2348              :    reduced dependence graph for the loop to be distributed.  If IGNORE_ALIAS_P
    2349              :    is true, data dependences caused by possible aliasing between
    2350              :    references are ignored, as if they didn't exist at all; otherwise all
    2351              :    dependences are considered.  */
    2352              : 
    2353              : struct graph *
    2354        12398 : loop_distribution::build_partition_graph (struct graph *rdg,
    2355              :                                           vec<struct partition *> *partitions,
    2356              :                                           bool ignore_alias_p)
    2357              : {
    2358        12398 :   int i, j;
    2359        12398 :   struct partition *partition1, *partition2;
    2360        24796 :   graph *pg = new_graph (partitions->length ());
    2361        12398 :   auto_vec<ddr_p> alias_ddrs, *alias_ddrs_p;
    2362              : 
    2363        12398 :   alias_ddrs_p = ignore_alias_p ? NULL : &alias_ddrs;
    2364              : 
    2365        12398 :   init_partition_graph_vertices (pg, partitions);
    2366              : 
    2367        12398 :   for (i = 0; partitions->iterate (i, &partition1); ++i)
    2368              :     {
    2369      1100448 :       for (j = i + 1; partitions->iterate (j, &partition2); ++j)
    2370              :         {
    2371              :           /* Dependence direction: 0 is no dependence, -1 is backward,
    2372              :              1 is forward, 2 is both (we can stop then, merging will occur).  */
    2373       973986 :           int dir = 0;
    2374              : 
    2375              :           /* If the first partition has a reduction, add a back edge; if the
    2376              :              second partition has a reduction, add a forward edge.  This makes
    2377              :              sure that the reduction partition will be sorted last.  */
    2378       973986 :           if (partition_reduction_p (partition1))
    2379              :             dir = -1;
    2380       973780 :           else if (partition_reduction_p (partition2))
    2381         1531 :             dir = 1;
    2382              : 
    2383              :           /* Cleanup the temporary vector.  */
    2384       973986 :           alias_ddrs.truncate (0);
    2385              : 
    2386       973986 :           dir = pg_add_dependence_edges (rdg, dir, partition1->datarefs,
    2387              :                                          partition2->datarefs, alias_ddrs_p);
    2388              : 
    2389              :           /* Add an edge to the partition graph if there exists a
    2390              :              dependence.  There are two types of edges: one caused by a
    2391              :              compile-time known dependence, which cannot be resolved by a
    2392              :              runtime alias check, and one that can be resolved by such a
    2393              :              check.  */
    2394       973986 :           if (dir == 1 || dir == 2
    2395       980711 :               || alias_ddrs.length () > 0)
    2396              :             {
    2397              :               /* Attach data dependence relations to the edge if it can be
    2398              :                  resolved by a runtime alias check.  */
    2399       206857 :               bool alias_edge_p = (dir != 1 && dir != 2);
    2400       408710 :               add_partition_graph_edge (pg, i, j,
    2401              :                                         (alias_edge_p) ? &alias_ddrs : NULL);
    2402              :             }
    2403       973986 :           if (dir == -1 || dir == 2
    2404      1953038 :               || alias_ddrs.length () > 0)
    2405              :             {
    2406              :               /* Attach data dependence relations to the edge if it can be
    2407              :                  resolved by a runtime alias check.  */
    2408       205384 :               bool alias_edge_p = (dir != -1 && dir != 2);
    2409       405702 :               add_partition_graph_edge (pg, j, i,
    2410              :                                         (alias_edge_p) ? &alias_ddrs : NULL);
    2411              :             }
    2412              :         }
    2413              :     }
    2414        12398 :   return pg;
    2415        12398 : }
    2416              : 
    2417              : /* Sort partitions in PG in descending post order and store them in
    2418              :    PARTITIONS.  */
    2419              : 
    2420              : static void
    2421        12398 : sort_partitions_by_post_order (struct graph *pg,
    2422              :                                vec<struct partition *> *partitions)
    2423              : {
    2424        12398 :   int i;
    2425        12398 :   struct pg_vdata *data;
    2426              : 
    2427              :   /* Now order the remaining nodes in descending postorder.  */
    2428        12398 :   qsort (pg->vertices, pg->n_vertices, sizeof (vertex), pgcmp);
    2429        12398 :   partitions->truncate (0);
    2430        81828 :   for (i = 0; i < pg->n_vertices; ++i)
    2431              :     {
    2432        57032 :       data = (struct pg_vdata *)pg->vertices[i].data;
    2433        57032 :       if (data->partition)
    2434        49411 :         partitions->safe_push (data->partition);
    2435              :     }
    2436        12398 : }
    2437              : 
    2438              : void
    2439         6756 : loop_distribution::merge_dep_scc_partitions (struct graph *rdg,
    2440              :                                              vec<struct partition *> *partitions,
    2441              :                                              bool ignore_alias_p)
    2442              : {
    2443         6756 :   struct partition *partition1, *partition2;
    2444         6756 :   struct pg_vdata *data;
    2445         6756 :   graph *pg = build_partition_graph (rdg, partitions, ignore_alias_p);
    2446         6756 :   int i, j, num_sccs = graphds_scc (pg, NULL);
    2447              : 
    2448              :   /* A strongly connected component means a dependence cycle, which we cannot
    2449              :      distribute.  So fuse its partitions together.  */
    2450         6756 :   if ((unsigned) num_sccs < partitions->length ())
    2451              :     {
    2452         1938 :       for (i = 0; i < num_sccs; ++i)
    2453              :         {
    2454         2128 :           for (j = 0; partitions->iterate (j, &partition1); ++j)
    2455         2128 :             if (pg->vertices[j].component == i)
    2456              :               break;
    2457         9760 :           for (j = j + 1; partitions->iterate (j, &partition2); ++j)
    2458         7744 :             if (pg->vertices[j].component == i)
    2459              :               {
    2460         6673 :                 partition_merge_into (NULL, partition1,
    2461              :                                       partition2, FUSE_SAME_SCC);
    2462         6673 :                 partition1->type = PTYPE_SEQUENTIAL;
    2463         6673 :                 (*partitions)[j] = NULL;
    2464         6673 :                 partition_free (partition2);
    2465         6673 :                 data = (struct pg_vdata *)pg->vertices[j].data;
    2466         6673 :                 data->partition = NULL;
    2467              :               }
    2468              :         }
    2469              :     }
    2470              : 
    2471         6756 :   sort_partitions_by_post_order (pg, partitions);
    2472        13512 :   gcc_assert (partitions->length () == (unsigned)num_sccs);
    2473         6756 :   free_partition_graph_vdata (pg);
    2474         6756 :   for_each_edge (pg, free_partition_graph_edata_cb, NULL);
    2475         6756 :   free_graph (pg);
    2476         6756 : }
    2477              : 
    2478              : /* Callback function for traversing edge E in graph G.  DATA is private
    2479              :    callback data.  */
    2480              : 
    2481              : static void
    2482          741 : pg_collect_alias_ddrs (struct graph *g, struct graph_edge *e, void *data)
    2483              : {
    2484          741 :   int i, j, component;
    2485          741 :   struct pg_edge_callback_data *cbdata;
    2486          741 :   struct pg_edata *edata = (struct pg_edata *) e->data;
    2487              : 
    2488              :   /* If the edge doesn't have attached data dependences, it represents
    2489              :      compile-time known dependences.  This type of dependence cannot
    2490              :      be resolved by a runtime alias check.  */
    2491          741 :   if (edata == NULL || edata->alias_ddrs.length () == 0)
    2492              :     return;
    2493              : 
    2494          723 :   cbdata = (struct pg_edge_callback_data *) data;
    2495          723 :   i = e->src;
    2496          723 :   j = e->dest;
    2497          723 :   component = cbdata->vertices_component[i];
    2498              :   /* Vertices are topologically sorted according to compile-time known
    2499              :      dependences, so we can break strongly connected components
    2500              :      by removing edges of the opposite direction, i.e., edges pointing
    2501              :      from a vertex with a smaller post number to one with a bigger post
    2502              :      number.  */
    2503          723 :   if (g->vertices[i].post < g->vertices[j].post
    2504              :       /* We only need to remove edges connecting vertices in the same
    2505              :          strong connected component to break it.  */
    2506          368 :       && component == cbdata->vertices_component[j]
    2507              :       /* Check if we want to break the strong connected component or not.  */
    2508         1091 :       && !bitmap_bit_p (cbdata->sccs_to_merge, component))
    2509          368 :     cbdata->alias_ddrs->safe_splice (edata->alias_ddrs);
    2510              : }
    2511              : 
    2512              : /* Callback function for traversing edge E.  DATA is private
    2513              :    callback data.  */
    2514              : 
    2515              : static void
    2516          741 : pg_unmark_merged_alias_ddrs (struct graph *, struct graph_edge *e, void *data)
    2517              : {
    2518          741 :   int i, j, component;
    2519          741 :   struct pg_edge_callback_data *cbdata;
    2520          741 :   struct pg_edata *edata = (struct pg_edata *) e->data;
    2521              : 
    2522          741 :   if (edata == NULL || edata->alias_ddrs.length () == 0)
    2523              :     return;
    2524              : 
    2525          724 :   cbdata = (struct pg_edge_callback_data *) data;
    2526          724 :   i = e->src;
    2527          724 :   j = e->dest;
    2528          724 :   component = cbdata->vertices_component[i];
    2529              :   /* Make sure to not skip vertices inside SCCs we are going to merge.  */
    2530          724 :   if (component == cbdata->vertices_component[j]
    2531          724 :       && bitmap_bit_p (cbdata->sccs_to_merge, component))
    2532              :     {
    2533            1 :       edata->alias_ddrs.release ();
    2534            1 :       delete edata;
    2535            1 :       e->data = NULL;
    2536              :     }
    2537              : }
    2538              : 
    2539              : /* This is the main function breaking strongly connected components in
    2540              :    PARTITIONS, given reduced dependence graph RDG.  Store data dependence
    2541              :    relations for runtime alias checks in ALIAS_DDRS.  */
    2542              : void
    2543         5642 : loop_distribution::break_alias_scc_partitions (struct graph *rdg,
    2544              :                                                vec<struct partition *> *partitions,
    2545              :                                                vec<ddr_p> *alias_ddrs)
    2546              : {
    2547         5642 :   int i, j, k, num_sccs, num_sccs_no_alias = 0;
    2548              :   /* Build partition dependence graph.  */
    2549         5642 :   graph *pg = build_partition_graph (rdg, partitions, false);
    2550              : 
    2551         5642 :   alias_ddrs->truncate (0);
    2552              :   /* Find strongly connected components in the graph, with all dependence
    2553              :      edges considered.  */
    2554         5642 :   num_sccs = graphds_scc (pg, NULL);
    2555              :   /* All SCCs can now be broken by runtime alias checks because SCCs caused
    2556              :      by compile-time known dependences were merged before this function.  */
    2557         5642 :   if ((unsigned) num_sccs < partitions->length ())
    2558              :     {
    2559          293 :       struct pg_edge_callback_data cbdata;
    2560          293 :       auto_bitmap sccs_to_merge;
    2561          293 :       auto_vec<enum partition_type> scc_types;
    2562          293 :       struct partition *partition, *first;
    2563              : 
    2564              :       /* If all partitions in an SCC have the same type, we can simply merge
    2565              :          the SCC.  This loop finds such SCCs and records them in the bitmap.  */
    2566          293 :       bitmap_set_range (sccs_to_merge, 0, (unsigned) num_sccs);
    2567          603 :       for (i = 0; i < num_sccs; ++i)
    2568              :         {
    2569          333 :           for (j = 0; partitions->iterate (j, &first); ++j)
    2570          333 :             if (pg->vertices[j].component == i)
    2571              :               break;
    2572              : 
    2573          310 :           bool same_type = true, all_builtins = partition_builtin_p (first);
    2574         1541 :           for (++j; partitions->iterate (j, &partition); ++j)
    2575              :             {
    2576         1268 :               if (pg->vertices[j].component != i)
    2577          154 :                 continue;
    2578              : 
    2579         1114 :               if (first->type != partition->type)
    2580              :                 {
    2581              :                   same_type = false;
    2582              :                   break;
    2583              :                 }
    2584         1077 :               all_builtins &= partition_builtin_p (partition);
    2585              :             }
    2586              :           /* Merge the SCC if all partitions in it have the same type, even
    2587              :              though the result partition is sequential, because the vectorizer
    2588              :              can do a better runtime alias check.  One exception is when all
    2589              :              partitions in the SCC are builtins.  */
    2590          310 :           if (!same_type || all_builtins)
    2591           79 :             bitmap_clear_bit (sccs_to_merge, i);
    2592              :         }
    2593              : 
    2594              :       /* Initialize callback data for traversing.  */
    2595          293 :       cbdata.sccs_to_merge = sccs_to_merge;
    2596          293 :       cbdata.alias_ddrs = alias_ddrs;
    2597          293 :       cbdata.vertices_component = XNEWVEC (int, pg->n_vertices);
    2598              :       /* Record the component information, which will be corrupted by the
    2599              :          next graph SCC finding call.  */
    2600         1721 :       for (i = 0; i < pg->n_vertices; ++i)
    2601         1428 :         cbdata.vertices_component[i] = pg->vertices[i].component;
    2602              : 
    2603              :       /* Collect data dependences for runtime alias checks to break SCCs.  */
    2604          293 :       if (bitmap_count_bits (sccs_to_merge) != (unsigned) num_sccs)
    2605              :         {
    2606              :           /* For SCCs we want to merge clear all alias_ddrs for edges
    2607              :              inside the component.  */
    2608           75 :           for_each_edge (pg, pg_unmark_merged_alias_ddrs, &cbdata);
    2609              : 
    2610              :           /* Run SCC finding algorithm again, with alias dependence edges
    2611              :              skipped.  This is to topologically sort partitions according to
    2612              :              compilation time known dependence.  Note the topological order
    2613              :              is stored in the form of pg's post order number.  */
    2614           75 :           num_sccs_no_alias = graphds_scc (pg, NULL, pg_skip_alias_edge);
    2615              :           /* We cannot assert partitions->length () == num_sccs_no_alias
    2616              :              since we are not ignoring alias edges in cycles we are
    2617              :              going to merge.  That's required to compute correct postorder.  */
    2618              :           /* With the topological order, we can construct two subgraphs L and
    2619              :              R.  L contains edges <x, y> where x < y in terms of post order,
    2620              :              while R contains edges <x, y> where x > y.  Edges for compile-time
    2621              :              known dependences all fall in R, so we break SCCs by removing all
    2622              :              (alias) edges in subgraph L.  */
    2623           75 :           for_each_edge (pg, pg_collect_alias_ddrs, &cbdata);
    2624              :         }
    2625              : 
    2626              :       /* For each SCC that doesn't need to be broken, merge it.  */
    2627          603 :       for (i = 0; i < num_sccs; ++i)
    2628              :         {
    2629          310 :           if (!bitmap_bit_p (sccs_to_merge, i))
    2630           79 :             continue;
    2631              : 
    2632          248 :           for (j = 0; partitions->iterate (j, &first); ++j)
    2633          248 :             if (cbdata.vertices_component[j] == i)
    2634              :               break;
    2635         1635 :           for (k = j + 1; partitions->iterate (k, &partition); ++k)
    2636              :             {
    2637         1094 :               struct pg_vdata *data;
    2638              : 
    2639         1094 :               if (cbdata.vertices_component[k] != i)
    2640          146 :                 continue;
    2641              : 
    2642          948 :               partition_merge_into (NULL, first, partition, FUSE_SAME_SCC);
    2643          948 :               (*partitions)[k] = NULL;
    2644          948 :               partition_free (partition);
    2645          948 :               data = (struct pg_vdata *)pg->vertices[k].data;
    2646          948 :               gcc_assert (data->id == k);
    2647          948 :               data->partition = NULL;
    2648              :               /* The result partition of merged SCC must be sequential.  */
    2649          948 :               first->type = PTYPE_SEQUENTIAL;
    2650              :             }
    2651              :         }
    2652              :       /* If the reduction partition's SCC is broken by runtime alias checks,
    2653              :          we force a negative post order on it, making sure it will be
    2654              :          scheduled last.  */
    2655          293 :       if (num_sccs_no_alias > 0)
    2656              :         {
    2657              :           j = -1;
    2658          327 :           for (i = 0; i < pg->n_vertices; ++i)
    2659              :             {
    2660          252 :               struct pg_vdata *data = (struct pg_vdata *)pg->vertices[i].data;
    2661          252 :               if (data->partition && partition_reduction_p (data->partition))
    2662              :                 {
    2663            5 :                   gcc_assert (j == -1);
    2664              :                   j = i;
    2665              :                 }
    2666              :             }
    2667           75 :           if (j >= 0)
    2668            5 :             pg->vertices[j].post = -1;
    2669              :         }
    2670              : 
    2671          293 :       free (cbdata.vertices_component);
    2672          293 :     }
    2673              : 
    2674         5642 :   sort_partitions_by_post_order (pg, partitions);
    2675         5642 :   free_partition_graph_vdata (pg);
    2676         5642 :   for_each_edge (pg, free_partition_graph_edata_cb, NULL);
    2677         5642 :   free_graph (pg);
    2678              : 
    2679         5642 :   if (dump_file && (dump_flags & TDF_DETAILS))
    2680              :     {
    2681           15 :       fprintf (dump_file, "Possible alias data dependence to break:\n");
    2682           15 :       dump_data_dependence_relations (dump_file, *alias_ddrs);
    2683              :     }
    2684         5642 : }
    2685              : 
    2686              : /* Compute and return an expression whose value is the segment length which
    2687              :    will be accessed by DR in NITERS iterations.  */
    2688              : 
    2689              : static tree
    2690         1286 : data_ref_segment_size (struct data_reference *dr, tree niters)
    2691              : {
    2692         1286 :   niters = size_binop (MINUS_EXPR,
    2693              :                        fold_convert (sizetype, niters),
    2694              :                        size_one_node);
    2695         1286 :   return size_binop (MULT_EXPR,
    2696              :                      fold_convert (sizetype, DR_STEP (dr)),
    2697              :                      fold_convert (sizetype, niters));
    2698              : }
    2699              : 
    2700              : /* Return true if LOOP's latch is dominated by the statement for data
    2701              :    reference DR.  */
    2702              : 
    2703              : static inline bool
    2704         1286 : latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
    2705              : {
    2706         1286 :   return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
    2707         1286 :                          gimple_bb (DR_STMT (dr)));
    2708              : }
    2709              : 
    2710              : /* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
    2711              :    data dependence relations ALIAS_DDRS.  */
    2712              : 
    2713              : static void
    2714           75 : compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
    2715              :                            vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
    2716              : {
    2717           75 :   unsigned int i;
    2718           75 :   unsigned HOST_WIDE_INT factor = 1;
    2719           75 :   tree niters_plus_one, niters = number_of_latch_executions (loop);
    2720              : 
    2721           75 :   gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
    2722           75 :   niters = fold_convert (sizetype, niters);
    2723           75 :   niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
    2724              : 
    2725           75 :   if (dump_file && (dump_flags & TDF_DETAILS))
    2726            0 :     fprintf (dump_file, "Creating alias check pairs:\n");
    2727              : 
    2728              :   /* Iterate all data dependence relations and compute alias check pairs.  */
    2729          718 :   for (i = 0; i < alias_ddrs->length (); i++)
    2730              :     {
    2731          643 :       ddr_p ddr = (*alias_ddrs)[i];
    2732          643 :       struct data_reference *dr_a = DDR_A (ddr);
    2733          643 :       struct data_reference *dr_b = DDR_B (ddr);
    2734          643 :       tree seg_length_a, seg_length_b;
    2735              : 
    2736          643 :       if (latch_dominated_by_data_ref (loop, dr_a))
    2737          639 :         seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
    2738              :       else
    2739            4 :         seg_length_a = data_ref_segment_size (dr_a, niters);
    2740              : 
    2741          643 :       if (latch_dominated_by_data_ref (loop, dr_b))
    2742          639 :         seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
    2743              :       else
    2744            4 :         seg_length_b = data_ref_segment_size (dr_b, niters);
    2745              : 
    2746          643 :       unsigned HOST_WIDE_INT access_size_a
    2747          643 :         = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
    2748          643 :       unsigned HOST_WIDE_INT access_size_b
    2749          643 :         = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
    2750          643 :       unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
    2751          643 :       unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
    2752              : 
    2753          643 :       dr_with_seg_len_pair_t dr_with_seg_len_pair
    2754         1286 :         (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
    2755         1286 :          dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
    2756              :          /* ??? Would WELL_ORDERED be safe?  */
    2757          643 :          dr_with_seg_len_pair_t::REORDERED);
    2758              : 
    2759          643 :       comp_alias_pairs->safe_push (dr_with_seg_len_pair);
    2760              :     }
    2761              : 
    2762           75 :   if (tree_fits_uhwi_p (niters))
    2763           46 :     factor = tree_to_uhwi (niters);
    2764              : 
    2765              :   /* Prune alias check pairs.  */
    2766           75 :   prune_runtime_alias_test_list (comp_alias_pairs, factor);
    2767           75 :   if (dump_file && (dump_flags & TDF_DETAILS))
    2768            0 :     fprintf (dump_file,
    2769              :              "Improved number of alias checks from %d to %d\n",
    2770              :              alias_ddrs->length (), comp_alias_pairs->length ());
    2771           75 : }
    2772              : 
    2773              : /* Given data dependence relations in ALIAS_DDRS, generate runtime alias
    2774              :    checks and version LOOP under condition of these runtime alias checks.  */
    2775              : 
    2776              : static void
    2777           75 : version_loop_by_alias_check (vec<struct partition *> *partitions,
    2778              :                              class loop *loop, vec<ddr_p> *alias_ddrs)
    2779              : {
    2780           75 :   profile_probability prob;
    2781           75 :   basic_block cond_bb;
    2782           75 :   class loop *nloop;
    2783           75 :   tree lhs, arg0, cond_expr = NULL_TREE;
    2784           75 :   gimple_seq cond_stmts = NULL;
    2785           75 :   gimple *call_stmt = NULL;
    2786           75 :   auto_vec<dr_with_seg_len_pair_t> comp_alias_pairs;
    2787              : 
    2788              :   /* Generate code for runtime alias checks if necessary.  */
    2789           75 :   gcc_assert (alias_ddrs->length () > 0);
    2790              : 
    2791           75 :   if (dump_file && (dump_flags & TDF_DETAILS))
    2792            0 :     fprintf (dump_file,
    2793              :              "Version loop <%d> with runtime alias check\n", loop->num);
    2794              : 
    2795           75 :   compute_alias_check_pairs (loop, alias_ddrs, &comp_alias_pairs);
    2796           75 :   create_runtime_alias_checks (loop, &comp_alias_pairs, &cond_expr);
    2797           75 :   cond_expr = force_gimple_operand_1 (cond_expr, &cond_stmts,
    2798              :                                       is_gimple_val, NULL_TREE);
    2799              : 
    2800              :   /* Depend on vectorizer to fold IFN_LOOP_DIST_ALIAS.  */
    2801           75 :   bool cancelable_p = flag_tree_loop_vectorize;
    2802           75 :   if (cancelable_p)
    2803              :     {
    2804              :       unsigned i = 0;
    2805              :       struct partition *partition;
    2806          216 :       for (; partitions->iterate (i, &partition); ++i)
    2807          182 :         if (!partition_builtin_p (partition))
    2808              :           break;
    2809              : 
     2810              :       /* If all partitions are builtins, distributing the loop would be
     2811              :          profitable and we don't want to cancel the runtime alias checks.  */
    2812          142 :       if (i == partitions->length ())
    2813              :         cancelable_p = false;
    2814              :     }
    2815              : 
    2816              :   /* Generate internal function call for loop distribution alias check if the
    2817              :      runtime alias check should be cancelable.  */
    2818           41 :   if (cancelable_p)
    2819              :     {
    2820           37 :       call_stmt = gimple_build_call_internal (IFN_LOOP_DIST_ALIAS,
    2821              :                                               2, NULL_TREE, cond_expr);
    2822           37 :       lhs = make_ssa_name (boolean_type_node);
    2823           37 :       gimple_call_set_lhs (call_stmt, lhs);
    2824              :     }
    2825              :   else
    2826              :     lhs = cond_expr;
    2827              : 
    2828           75 :   prob = profile_probability::guessed_always ().apply_scale (9, 10);
    2829           75 :   initialize_original_copy_tables ();
    2830           75 :   nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
    2831              :                         prob, prob.invert (), true);
    2832           75 :   free_original_copy_tables ();
    2833              :   /* Record the original loop number in newly generated loops.  In case of
    2834              :      distribution, the original loop will be distributed and the new loop
    2835              :      is kept.  */
    2836           75 :   loop->orig_loop_num = nloop->num;
    2837           75 :   nloop->orig_loop_num = nloop->num;
    2838           75 :   nloop->dont_vectorize = true;
    2839           75 :   nloop->force_vectorize = false;
    2840              : 
    2841           75 :   if (call_stmt)
    2842              :     {
    2843              :       /* Record new loop's num in IFN_LOOP_DIST_ALIAS because the original
    2844              :          loop could be destroyed.  */
    2845           37 :       arg0 = build_int_cst (integer_type_node, loop->orig_loop_num);
    2846           37 :       gimple_call_set_arg (call_stmt, 0, arg0);
    2847           37 :       gimple_seq_add_stmt_without_update (&cond_stmts, call_stmt);
    2848              :     }
    2849              : 
    2850           75 :   if (cond_stmts)
    2851              :     {
    2852           75 :       gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
    2853           75 :       gsi_insert_seq_before (&cond_gsi, cond_stmts, GSI_SAME_STMT);
    2854              :     }
    2855           75 :   update_ssa (TODO_update_ssa_no_phi);
    2856           75 : }
    2857              : 
     2858              : /* Return true if loop versioning is needed to distribute PARTITIONS.
    2859              :    ALIAS_DDRS are data dependence relations for runtime alias check.  */
    2860              : 
    2861              : static inline bool
    2862        12001 : version_for_distribution_p (vec<struct partition *> *partitions,
    2863              :                             vec<ddr_p> *alias_ddrs)
    2864              : {
    2865              :   /* No need to version loop if we have only one partition.  */
    2866        14587 :   if (partitions->length () == 1)
    2867              :     return false;
    2868              : 
    2869              :   /* Need to version loop if runtime alias check is necessary.  */
    2870         2586 :   return (alias_ddrs->length () > 0);
    2871              : }
    2872              : 
    2873              : /* Compare base offset of builtin mem* partitions P1 and P2.  */
    2874              : 
    2875              : static int
    2876          375 : offset_cmp (const void *vp1, const void *vp2)
    2877              : {
    2878          375 :   struct partition *p1 = *(struct partition *const *) vp1;
    2879          375 :   struct partition *p2 = *(struct partition *const *) vp2;
    2880          375 :   unsigned HOST_WIDE_INT o1 = p1->builtin->dst_base_offset;
    2881          375 :   unsigned HOST_WIDE_INT o2 = p2->builtin->dst_base_offset;
    2882          375 :   return (o2 < o1) - (o1 < o2);
    2883              : }
    2884              : 
    2885              : /* Fuse adjacent memset builtin PARTITIONS if possible.  This is a special
    2886              :    case optimization transforming below code:
    2887              : 
    2888              :      __builtin_memset (&obj, 0, 100);
    2889              :      _1 = &obj + 100;
    2890              :      __builtin_memset (_1, 0, 200);
    2891              :      _2 = &obj + 300;
    2892              :      __builtin_memset (_2, 0, 100);
    2893              : 
    2894              :    into:
    2895              : 
    2896              :      __builtin_memset (&obj, 0, 400);
    2897              : 
     2898              :    Note we don't have dependence information between different partitions
     2899              :    at this point; as a result, we can't handle nonadjacent memset builtin
     2900              :    partitions, since fusing them might break a dependence.  */
    2901              : 
    2902              : static void
    2903         2642 : fuse_memset_builtins (vec<struct partition *> *partitions)
    2904              : {
    2905         2642 :   unsigned i, j;
    2906         2642 :   struct partition *part1, *part2;
    2907         2642 :   tree rhs1, rhs2;
    2908              : 
    2909         8349 :   for (i = 0; partitions->iterate (i, &part1);)
    2910              :     {
    2911         5707 :       if (part1->kind != PKIND_MEMSET)
    2912              :         {
    2913         3493 :           i++;
    2914         3493 :           continue;
    2915              :         }
    2916              : 
    2917              :       /* Find sub-array of memset builtins of the same base.  Index range
    2918              :          of the sub-array is [i, j) with "j > i".  */
    2919         2288 :       for (j = i + 1; partitions->iterate (j, &part2); ++j)
    2920              :         {
    2921         1761 :           if (part2->kind != PKIND_MEMSET
    2922         2321 :               || !operand_equal_p (part1->builtin->dst_base_base,
    2923          560 :                                    part2->builtin->dst_base_base, 0))
    2924              :             break;
    2925              : 
    2926              :           /* Memset calls setting different values can't be merged.  */
    2927          122 :           rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
    2928          122 :           rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
    2929          122 :           if (!operand_equal_p (rhs1, rhs2, 0))
    2930              :             break;
    2931              :         }
    2932              : 
    2933              :       /* Stable sort is required in order to avoid breaking dependence.  */
    2934         2214 :       gcc_stablesort (&(*partitions)[i], j - i, sizeof (*partitions)[i],
    2935              :                       offset_cmp);
    2936              :       /* Continue with next partition.  */
    2937         2214 :       i = j;
    2938              :     }
    2939              : 
    2940              :   /* Merge all consecutive memset builtin partitions.  */
    2941        11562 :   for (i = 0; i < partitions->length () - 1;)
    2942              :     {
    2943         3139 :       part1 = (*partitions)[i];
    2944         3139 :       if (part1->kind != PKIND_MEMSET)
    2945              :         {
    2946         1378 :           i++;
    2947         3111 :           continue;
    2948              :         }
    2949              : 
    2950         1761 :       part2 = (*partitions)[i + 1];
    2951              :       /* Only merge memset partitions of the same base and with constant
    2952              :          access sizes.  */
    2953         3408 :       if (part2->kind != PKIND_MEMSET
    2954          560 :           || TREE_CODE (part1->builtin->size) != INTEGER_CST
    2955          295 :           || TREE_CODE (part2->builtin->size) != INTEGER_CST
    2956         2056 :           || !operand_equal_p (part1->builtin->dst_base_base,
    2957          295 :                                part2->builtin->dst_base_base, 0))
    2958              :         {
    2959         1647 :           i++;
    2960         1647 :           continue;
    2961              :         }
    2962          114 :       rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
    2963          114 :       rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
    2964          114 :       int bytev1 = const_with_all_bytes_same (rhs1);
    2965          114 :       int bytev2 = const_with_all_bytes_same (rhs2);
    2966              :       /* Only merge memset partitions of the same value.  */
    2967          114 :       if (bytev1 != bytev2 || bytev1 == -1)
    2968              :         {
    2969           44 :           i++;
    2970           44 :           continue;
    2971              :         }
    2972          140 :       wide_int end1 = wi::add (part1->builtin->dst_base_offset,
    2973           70 :                                wi::to_wide (part1->builtin->size));
    2974              :       /* Only merge adjacent memset partitions.  */
    2975           70 :       if (wi::ne_p (end1, part2->builtin->dst_base_offset))
    2976              :         {
    2977           42 :           i++;
    2978           42 :           continue;
    2979              :         }
    2980              :       /* Merge partitions[i] and partitions[i+1].  */
    2981           28 :       part1->builtin->size = fold_build2 (PLUS_EXPR, sizetype,
    2982              :                                           part1->builtin->size,
    2983              :                                           part2->builtin->size);
    2984           28 :       partition_free (part2);
    2985           28 :       partitions->ordered_remove (i + 1);
    2986           70 :     }
    2987         2642 : }
    2988              : 
    2989              : void
    2990        39330 : loop_distribution::finalize_partitions (class loop *loop,
    2991              :                                         vec<struct partition *> *partitions,
    2992              :                                         vec<ddr_p> *alias_ddrs)
    2993              : {
    2994        39330 :   unsigned i;
    2995        39330 :   struct partition *partition, *a;
    2996              : 
    2997        45001 :   if (partitions->length () == 1
    2998        39405 :       || alias_ddrs->length () > 0)
    2999        39330 :     return;
    3000              : 
    3001         5596 :   unsigned num_builtin = 0, num_normal = 0, num_partial_memset = 0;
    3002         5596 :   bool same_type_p = true;
    3003         5596 :   enum partition_type type = ((*partitions)[0])->type;
    3004        29679 :   for (i = 0; partitions->iterate (i, &partition); ++i)
    3005              :     {
    3006        24083 :       same_type_p &= (type == partition->type);
    3007        24083 :       if (partition_builtin_p (partition))
    3008              :         {
    3009         2656 :           num_builtin++;
    3010         2656 :           continue;
    3011              :         }
    3012        21427 :       num_normal++;
    3013        21427 :       if (partition->kind == PKIND_PARTIAL_MEMSET)
    3014            4 :         num_partial_memset++;
    3015              :     }
    3016              : 
     3017              :   /* Don't distribute the current loop into too many loops, given that we
     3018              :      don't have a memory stream cost model.  Be even more conservative in
     3019              :      case of loop nest distribution.  */
    3020         5596 :   if ((same_type_p && num_builtin == 0
    3021         2931 :        && (loop->inner == NULL || num_normal != 2 || num_partial_memset != 1))
    3022         2669 :       || (loop->inner != NULL
    3023           69 :           && i >= NUM_PARTITION_THRESHOLD && num_normal > 1)
    3024         2660 :       || (loop->inner == NULL
    3025         2600 :           && i >= NUM_PARTITION_THRESHOLD && num_normal > num_builtin))
    3026              :     {
    3027              :       a = (*partitions)[0];
    3028        18302 :       for (i = 1; partitions->iterate (i, &partition); ++i)
    3029              :         {
    3030        15348 :           partition_merge_into (NULL, a, partition, FUSE_FINALIZE);
    3031        15348 :           partition_free (partition);
    3032              :         }
    3033         2954 :       partitions->truncate (1);
    3034              :     }
    3035              : 
    3036              :   /* Fuse memset builtins if possible.  */
    3037         5596 :   if (partitions->length () > 1)
    3038         2642 :     fuse_memset_builtins (partitions);
    3039              : }
    3040              : 
    3041              : /* Distributes the code from LOOP in such a way that producer statements
    3042              :    are placed before consumer statements.  Tries to separate only the
    3043              :    statements from STMTS into separate loops.  Returns the number of
    3044              :    distributed loops.  Set NB_CALLS to number of generated builtin calls.
    3045              :    Set *DESTROY_P to whether LOOP needs to be destroyed.  */
    3046              : 
    3047              : int
    3048       140011 : loop_distribution::distribute_loop (class loop *loop,
    3049              :                  const vec<gimple *> &stmts,
    3050              :                  control_dependences *cd, int *nb_calls, bool *destroy_p,
    3051              :                  bool only_patterns_p)
    3052              : {
    3053       140011 :   ddrs_table = new hash_table<ddr_hasher> (389);
    3054       140011 :   struct graph *rdg;
    3055       140011 :   partition *partition;
    3056       140011 :   int i, nbp;
    3057              : 
    3058       140011 :   *destroy_p = false;
    3059       140011 :   *nb_calls = 0;
    3060       140011 :   loop_nest.create (0);
    3061       140011 :   if (!find_loop_nest (loop, &loop_nest))
    3062              :     {
    3063            0 :       loop_nest.release ();
    3064            0 :       delete ddrs_table;
    3065            0 :       return 0;
    3066              :     }
    3067              : 
    3068       140011 :   datarefs_vec.create (20);
    3069       140011 :   has_nonaddressable_dataref_p = false;
    3070       140011 :   rdg = build_rdg (loop, cd);
    3071       140011 :   if (!rdg)
    3072              :     {
    3073         2496 :       if (dump_file && (dump_flags & TDF_DETAILS))
    3074            0 :         fprintf (dump_file,
    3075              :                  "Loop %d not distributed: failed to build the RDG.\n",
    3076              :                  loop->num);
    3077              : 
    3078         2496 :       loop_nest.release ();
    3079         2496 :       free_data_refs (datarefs_vec);
    3080         2496 :       delete ddrs_table;
    3081         2496 :       return 0;
    3082              :     }
    3083              : 
    3084       275030 :   if (datarefs_vec.length () > MAX_DATAREFS_NUM)
    3085              :     {
    3086            0 :       if (dump_file && (dump_flags & TDF_DETAILS))
    3087            0 :         fprintf (dump_file,
    3088              :                  "Loop %d not distributed: too many memory references.\n",
    3089              :                  loop->num);
    3090              : 
    3091            0 :       free_rdg (rdg, loop);
    3092            0 :       loop_nest.release ();
    3093            0 :       free_data_refs (datarefs_vec);
    3094            0 :       delete ddrs_table;
    3095            0 :       return 0;
    3096              :     }
    3097              : 
    3098              :   data_reference_p dref;
    3099       567014 :   for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
    3100       429499 :     dref->aux = (void *) (uintptr_t) i;
    3101              : 
    3102       137515 :   if (dump_file && (dump_flags & TDF_DETAILS))
    3103           70 :     dump_rdg (dump_file, rdg);
    3104              : 
    3105       137515 :   auto_vec<struct partition *, 3> partitions;
    3106       137515 :   rdg_build_partitions (rdg, stmts, &partitions);
    3107              : 
    3108       137515 :   auto_vec<ddr_p> alias_ddrs;
    3109              : 
    3110       137515 :   auto_bitmap stmt_in_all_partitions;
    3111       137515 :   bitmap_copy (stmt_in_all_partitions, partitions[0]->stmts);
    3112       384380 :   for (i = 1; partitions.iterate (i, &partition); ++i)
    3113       109350 :     bitmap_and_into (stmt_in_all_partitions, partitions[i]->stmts);
    3114              : 
    3115              :   bool any_builtin = false;
    3116              :   bool reduction_in_all = false;
    3117       384380 :   int reduction_partition_num = -1;
    3118       384380 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    3119              :     {
    3120       246865 :       reduction_in_all
    3121       246865 :         |= classify_partition (loop, rdg, partition, stmt_in_all_partitions);
    3122       246865 :       any_builtin |= partition_builtin_p (partition);
    3123              :     }
    3124              : 
    3125              :   /* If we are only distributing patterns but did not detect any,
    3126              :      simply bail out.  */
    3127       137515 :   if (only_patterns_p
    3128       137515 :       && !any_builtin)
    3129              :     {
    3130        98185 :       nbp = 0;
    3131        98185 :       goto ldist_done;
    3132              :     }
    3133              : 
    3134              :   /* If we are only distributing patterns fuse all partitions that
    3135              :      were not classified as builtins.  This also avoids chopping
    3136              :      a loop into pieces, separated by builtin calls.  That is, we
    3137              :      only want no or a single loop body remaining.  */
    3138        39330 :   struct partition *into;
    3139        39330 :   if (only_patterns_p)
    3140              :     {
    3141        17881 :       for (i = 0; partitions.iterate (i, &into); ++i)
    3142        10659 :         if (!partition_builtin_p (into))
    3143              :           break;
    3144        11458 :       for (++i; partitions.iterate (i, &partition); ++i)
    3145         2666 :         if (!partition_builtin_p (partition))
    3146              :           {
    3147         1995 :             partition_merge_into (NULL, into, partition, FUSE_NON_BUILTIN);
    3148         1995 :             partitions.unordered_remove (i);
    3149         1995 :             partition_free (partition);
    3150         1995 :             i--;
    3151              :           }
    3152              :     }
    3153              : 
    3154              :   /* Due to limitations in the transform phase we have to fuse all
    3155              :      reduction partitions into the last partition so the existing
    3156              :      loop will contain all loop-closed PHI nodes.  */
    3157       109723 :   for (i = 0; partitions.iterate (i, &into); ++i)
    3158        72489 :     if (partition_reduction_p (into))
    3159              :       break;
    3160        41914 :   for (i = i + 1; partitions.iterate (i, &partition); ++i)
    3161         2584 :     if (partition_reduction_p (partition))
    3162              :       {
    3163         2296 :         partition_merge_into (rdg, into, partition, FUSE_REDUCTION);
    3164         2296 :         partitions.unordered_remove (i);
    3165         2296 :         partition_free (partition);
    3166         2296 :         i--;
    3167              :       }
    3168              : 
    3169              :   /* Apply our simple cost model - fuse partitions with similar
    3170              :      memory accesses.  */
    3171       108569 :   for (i = 0; partitions.iterate (i, &into); ++i)
    3172              :     {
    3173        69239 :       bool changed = false;
    3174       936107 :       for (int j = i + 1; partitions.iterate (j, &partition); ++j)
    3175              :         {
    3176       866868 :           if (share_memory_accesses (rdg, into, partition))
    3177              :             {
    3178         7163 :               partition_merge_into (rdg, into, partition, FUSE_SHARE_REF);
    3179         7163 :               partitions.unordered_remove (j);
    3180         7163 :               partition_free (partition);
    3181         7163 :               j--;
    3182         7163 :               changed = true;
    3183              :             }
    3184              :         }
    3185              :       /* If we fused 0 1 2 in step 1 to 0,2 1 as 0 and 2 have similar
    3186              :          accesses when 1 and 2 have similar accesses but not 0 and 1
    3187              :          then in the next iteration we will fail to consider merging
    3188              :          1 into 0,2.  So try again if we did any merging into 0.  */
    3189        69239 :       if (changed)
    3190         3625 :         i--;
    3191              :     }
    3192              : 
    3193              :   /* Put a non-builtin partition last if we need to preserve a reduction.
    3194              :      In most cases this helps to keep a normal partition last avoiding to
    3195              :      spill a reduction result across builtin calls.
    3196              :      ???  The proper way would be to use dependences to see whether we
    3197              :      can move builtin partitions earlier during merge_dep_scc_partitions
    3198              :      and its sort_partitions_by_post_order.  Especially when the
    3199              :      dependence graph is composed of multiple independent subgraphs the
    3200              :      heuristic does not work reliably.  */
    3201        39330 :   if (reduction_in_all
    3202        46446 :       && partition_builtin_p (partitions.last()))
    3203          136 :     FOR_EACH_VEC_ELT (partitions, i, partition)
    3204           76 :       if (!partition_builtin_p (partition))
    3205              :         {
    3206           15 :           partitions.unordered_remove (i);
    3207           15 :           partitions.quick_push (partition);
    3208           15 :           break;
    3209              :         }
    3210              : 
    3211              :   /* Build the partition dependency graph and fuse partitions in strong
    3212              :      connected component.  */
    3213        39330 :   if (partitions.length () > 1)
    3214              :     {
    3215              :       /* Don't support loop nest distribution under runtime alias check
    3216              :          since it's not likely to enable many vectorization opportunities.
     3217              :          Likewise if the loop has any data reference that may not be
     3218              :          addressable, since the alias check must take and compare addresses.  */
    3219         6756 :       if (loop->inner || has_nonaddressable_dataref_p)
    3220          362 :         merge_dep_scc_partitions (rdg, &partitions, false);
    3221              :       else
    3222              :         {
    3223         6394 :           merge_dep_scc_partitions (rdg, &partitions, true);
    3224         6394 :           if (partitions.length () > 1)
    3225         5642 :             break_alias_scc_partitions (rdg, &partitions, &alias_ddrs);
    3226              :         }
    3227              :     }
    3228              : 
    3229        39330 :   finalize_partitions (loop, &partitions, &alias_ddrs);
    3230              : 
    3231              :   /* If there is a reduction in all partitions make sure the last
    3232              :      non-builtin partition provides the LC PHI defs.  */
    3233        39330 :   if (reduction_in_all)
    3234              :     {
    3235        14284 :       FOR_EACH_VEC_ELT (partitions, i, partition)
    3236         7168 :         if (!partition_builtin_p (partition))
    3237         7092 :           reduction_partition_num = i;
    3238         7116 :       if (reduction_partition_num == -1)
    3239              :         {
    3240              :           /* If all partitions are builtin, force the last one to
    3241              :              be code generated as normal partition.  */
    3242           60 :           partition = partitions.last ();
    3243           60 :           partition->kind = PKIND_NORMAL;
    3244              :         }
    3245              :     }
    3246              : 
    3247        39330 :   nbp = partitions.length ();
    3248        39330 :   if (nbp == 0
    3249        75951 :       || (nbp == 1 && !partition_builtin_p (partitions[0]))
    3250        51454 :       || (nbp > 1 && partition_contains_all_rw (rdg, partitions)))
    3251              :     {
    3252        27329 :       nbp = 0;
    3253        27329 :       goto ldist_done;
    3254              :     }
    3255              : 
    3256        12076 :   if (version_for_distribution_p (&partitions, &alias_ddrs))
    3257           75 :     version_loop_by_alias_check (&partitions, loop, &alias_ddrs);
    3258              : 
    3259        12001 :   if (dump_file && (dump_flags & TDF_DETAILS))
    3260              :     {
    3261           42 :       fprintf (dump_file,
    3262              :                "distribute loop <%d> into partitions:\n", loop->num);
    3263           42 :       dump_rdg_partitions (dump_file, partitions);
    3264              :     }
    3265              : 
    3266        27166 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    3267              :     {
    3268        15165 :       if (partition_builtin_p (partition))
    3269        12124 :         (*nb_calls)++;
    3270        15165 :       *destroy_p |= generate_code_for_partition (loop, partition, i < nbp - 1,
    3271              :                                                  i == reduction_partition_num);
    3272              :     }
    3273              : 
    3274       137515 :  ldist_done:
    3275       137515 :   loop_nest.release ();
    3276       137515 :   free_data_refs (datarefs_vec);
    3277       137515 :   for (hash_table<ddr_hasher>::iterator iter = ddrs_table->begin ();
    3278      2487307 :        iter != ddrs_table->end (); ++iter)
    3279              :     {
    3280      1174896 :       free_dependence_relation (*iter);
    3281      1174896 :       *iter = NULL;
    3282              :     }
    3283       137515 :   delete ddrs_table;
    3284              : 
    3285       349929 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    3286       212414 :     partition_free (partition);
    3287              : 
    3288       137515 :   free_rdg (rdg, loop);
    3289       137515 :   return nbp - *nb_calls;
    3290       137515 : }
    3291              : 
    3292              : 
    3293       200790 : void loop_distribution::bb_top_order_init (void)
    3294              : {
    3295       200790 :   int rpo_num;
    3296       200790 :   int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
    3297       200790 :   edge entry = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
    3298       200790 :   bitmap exit_bbs = BITMAP_ALLOC (NULL);
    3299              : 
    3300       200790 :   bb_top_order_index = XNEWVEC (int, last_basic_block_for_fn (cfun));
    3301       200790 :   bb_top_order_index_size = last_basic_block_for_fn (cfun);
    3302              : 
    3303       200790 :   entry->flags &= ~EDGE_DFS_BACK;
    3304       200790 :   bitmap_set_bit (exit_bbs, EXIT_BLOCK);
    3305       200790 :   rpo_num = rev_post_order_and_mark_dfs_back_seme (cfun, entry, exit_bbs, true,
    3306              :                                                    rpo, NULL);
    3307       200790 :   BITMAP_FREE (exit_bbs);
    3308              : 
    3309      6868116 :   for (int i = 0; i < rpo_num; i++)
    3310      6667326 :     bb_top_order_index[rpo[i]] = i;
    3311              : 
    3312       200790 :   free (rpo);
    3313       200790 : }
    3314              : 
    3315       200790 : void loop_distribution::bb_top_order_destroy ()
    3316              : {
    3317       200790 :   free (bb_top_order_index);
    3318       200790 :   bb_top_order_index = NULL;
    3319       200790 :   bb_top_order_index_size = 0;
    3320       200790 : }
    3321              : 
    3322              : 
    3323              : /* Given LOOP, this function records seed statements for distribution in
    3324              :    WORK_LIST.  Return false if there is nothing for distribution.  */
    3325              : 
    3326              : static bool
    3327       184512 : find_seed_stmts_for_distribution (class loop *loop, vec<gimple *> *work_list)
    3328              : {
    3329       184512 :   basic_block *bbs = get_loop_body_in_dom_order (loop);
    3330              : 
    3331              :   /* Initialize the worklist with stmts we seed the partitions with.  */
    3332       667341 :   for (unsigned i = 0; i < loop->num_nodes; ++i)
    3333              :     {
    3334              :       /* In irreducible sub-regions we don't know how to redirect
    3335              :          conditions, so fail.  See PR100492.  */
    3336       527179 :       if (bbs[i]->flags & BB_IRREDUCIBLE_LOOP)
    3337              :         {
    3338            4 :           if (dump_file && (dump_flags & TDF_DETAILS))
    3339            0 :             fprintf (dump_file, "loop %d contains an irreducible region.\n",
    3340              :                      loop->num);
    3341            4 :           work_list->truncate (0);
    3342            4 :           break;
    3343              :         }
    3344       527175 :       for (gphi_iterator gsi = gsi_start_phis (bbs[i]);
    3345      1244580 :            !gsi_end_p (gsi); gsi_next (&gsi))
    3346              :         {
    3347       717405 :           gphi *phi = gsi.phi ();
    3348      1434810 :           if (virtual_operand_p (gimple_phi_result (phi)))
    3349       191575 :             continue;
    3350              :           /* Distribute stmts which have defs that are used outside of
    3351              :              the loop.  */
    3352       525830 :           if (!stmt_has_scalar_dependences_outside_loop (loop, phi))
    3353       500768 :             continue;
    3354        25062 :           work_list->safe_push (phi);
    3355              :         }
    3356      1054350 :       for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]);
    3357      3583143 :            !gsi_end_p (gsi); gsi_next (&gsi))
    3358              :         {
    3359      3100314 :           gimple *stmt = gsi_stmt (gsi);
    3360              : 
    3361              :           /* Ignore clobbers, they do not have true side effects.  */
    3362      3100314 :           if (gimple_clobber_p (stmt))
    3363      2785241 :             continue;
    3364              : 
    3365              :           /* If there is a stmt with side-effects bail out - we
    3366              :              cannot and should not distribute this loop.  */
    3367      3091718 :           if (gimple_has_side_effects (stmt))
    3368              :             {
    3369        44346 :               free (bbs);
    3370        44346 :               return false;
    3371              :             }
    3372              : 
    3373              :           /* Distribute stmts which have defs that are used outside of
    3374              :              the loop.  */
    3375      3047372 :           if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
    3376              :             ;
    3377              :           /* Otherwise only distribute stores for now.  */
    3378      4776426 :           else if (!gimple_vdef (stmt))
    3379      2776645 :             continue;
    3380              : 
    3381       270727 :           work_list->safe_push (stmt);
    3382              :         }
    3383              :     }
    3384       140166 :   bool res = work_list->length () > 0;
    3385       140020 :   if (res && !can_copy_bbs_p (bbs, loop->num_nodes))
    3386              :     {
    3387            8 :       if (dump_file && (dump_flags & TDF_DETAILS))
    3388            0 :         fprintf (dump_file, "cannot copy loop %d.\n", loop->num);
    3389              :       res = false;
    3390              :     }
    3391       140166 :   free (bbs);
    3392       140166 :   return res;
    3393              : }
    3394              : 
    3395              : /* A helper function for generate_{rawmemchr,strlen}_builtin functions in order
    3396              :    to place new statements SEQ before LOOP and replace the old reduction
    3397              :    variable with the new one.  */
    3398              : 
    3399              : static void
    3400           29 : generate_reduction_builtin_1 (loop_p loop, gimple_seq &seq,
    3401              :                               tree reduction_var_old, tree reduction_var_new,
    3402              :                               const char *info, machine_mode load_mode)
    3403              : {
    3404           29 :   gcc_assert (flag_tree_loop_distribute_patterns);
    3405              : 
    3406              :   /* Place new statements before LOOP.  */
    3407           29 :   gimple_stmt_iterator gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
    3408           29 :   gsi_insert_seq_after (&gsi, seq, GSI_CONTINUE_LINKING);
    3409              : 
    3410              :   /* Replace old reduction variable with new one.  */
    3411           29 :   imm_use_iterator iter;
    3412           29 :   gimple *stmt;
    3413           29 :   use_operand_p use_p;
    3414          144 :   FOR_EACH_IMM_USE_STMT (stmt, iter, reduction_var_old)
    3415              :     {
    3416          344 :       FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
    3417           86 :         SET_USE (use_p, reduction_var_new);
    3418              : 
    3419           86 :       update_stmt (stmt);
    3420           29 :     }
    3421              : 
    3422           29 :   if (dump_file && (dump_flags & TDF_DETAILS))
    3423            7 :     fprintf (dump_file, info, GET_MODE_NAME (load_mode));
    3424           29 : }
    3425              : 
    3426              : /* Generate a call to rawmemchr and place it before LOOP.  REDUCTION_VAR is
    3427              :    replaced with a fresh SSA name representing the result of the call.  */
    3428              : 
    3429              : static void
    3430            0 : generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
    3431              :                             data_reference_p store_dr, tree base, tree pattern,
    3432              :                             location_t loc)
    3433              : {
    3434            0 :   gimple_seq seq = NULL;
    3435              : 
    3436            0 :   tree mem = force_gimple_operand (base, &seq, true, NULL_TREE);
    3437            0 :   gimple *fn_call = gimple_build_call_internal (IFN_RAWMEMCHR, 2, mem, pattern);
    3438            0 :   tree reduction_var_new = copy_ssa_name (reduction_var);
    3439            0 :   gimple_call_set_lhs (fn_call, reduction_var_new);
    3440            0 :   gimple_set_location (fn_call, loc);
    3441            0 :   gimple_seq_add_stmt (&seq, fn_call);
    3442              : 
    3443            0 :   if (store_dr)
    3444              :     {
    3445            0 :       gassign *g = gimple_build_assign (DR_REF (store_dr), reduction_var_new);
    3446            0 :       gimple_seq_add_stmt (&seq, g);
    3447              :     }
    3448              : 
    3449            0 :   generate_reduction_builtin_1 (loop, seq, reduction_var, reduction_var_new,
    3450              :                                 "generated rawmemchr%s\n",
    3451            0 :                                 TYPE_MODE (TREE_TYPE (TREE_TYPE (base))));
    3452            0 : }
    3453              : 
    3454              : /* Helper function for generate_strlen_builtin(,_using_rawmemchr)  */
    3455              : 
    3456              : static void
    3457           29 : generate_strlen_builtin_1 (loop_p loop, gimple_seq &seq,
    3458              :                            tree reduction_var_old, tree reduction_var_new,
    3459              :                            machine_mode mode, tree start_len)
    3460              : {
    3461              :   /* REDUCTION_VAR_NEW has either size type or ptrdiff type and must be
     3462              :      converted if the types of the old and new reduction variables are not compatible.  */
    3463           29 :   reduction_var_new = gimple_convert (&seq, TREE_TYPE (reduction_var_old),
    3464              :                                       reduction_var_new);
    3465              : 
    3466              :   /* Loops of the form `for (i=42; s[i]; ++i);` have an additional start
    3467              :      length.  */
    3468           29 :   if (!integer_zerop (start_len))
    3469              :     {
    3470           25 :       tree lhs = make_ssa_name (TREE_TYPE (reduction_var_new));
    3471           25 :       gimple *g = gimple_build_assign (lhs, PLUS_EXPR, reduction_var_new,
    3472              :                                        start_len);
    3473           25 :       gimple_seq_add_stmt (&seq, g);
    3474           25 :       reduction_var_new = lhs;
    3475              :     }
    3476              : 
    3477           29 :   generate_reduction_builtin_1 (loop, seq, reduction_var_old, reduction_var_new,
    3478              :                                 "generated strlen%s\n", mode);
    3479           29 : }
    3480              : 
    3481              : /* Generate a call to strlen and place it before LOOP.  REDUCTION_VAR is
    3482              :    replaced with a fresh SSA name representing the result of the call.  */
    3483              : 
    3484              : static void
    3485           29 : generate_strlen_builtin (loop_p loop, tree reduction_var, tree base,
    3486              :                          tree start_len, location_t loc)
    3487              : {
    3488           29 :   gimple_seq seq = NULL;
    3489              : 
    3490           29 :   tree reduction_var_new = make_ssa_name (size_type_node);
    3491              : 
    3492           29 :   tree mem = force_gimple_operand (base, &seq, true, NULL_TREE);
    3493           58 :   tree fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_STRLEN));
    3494           29 :   gimple *fn_call = gimple_build_call (fn, 1, mem);
    3495           29 :   gimple_call_set_lhs (fn_call, reduction_var_new);
    3496           29 :   gimple_set_location (fn_call, loc);
    3497           29 :   gimple_seq_add_stmt (&seq, fn_call);
    3498              : 
    3499           29 :   generate_strlen_builtin_1 (loop, seq, reduction_var, reduction_var_new,
    3500              :                              QImode, start_len);
    3501           29 : }
    3502              : 
     3503              : /* Generate code that mimics the behaviour of strlen, but this time over
     3504              :    an array of elements whose mode differs from QI.  REDUCTION_VAR is replaced
    3505              :    with a fresh SSA name representing the result, i.e., the length.  */
    3506              : 
    3507              : static void
    3508            0 : generate_strlen_builtin_using_rawmemchr (loop_p loop, tree reduction_var,
    3509              :                                          tree base, tree load_type,
    3510              :                                          tree start_len, location_t loc)
    3511              : {
    3512            0 :   gimple_seq seq = NULL;
    3513              : 
    3514            0 :   tree start = force_gimple_operand (base, &seq, true, NULL_TREE);
    3515            0 :   tree zero = build_zero_cst (load_type);
    3516            0 :   gimple *fn_call = gimple_build_call_internal (IFN_RAWMEMCHR, 2, start, zero);
    3517            0 :   tree end = make_ssa_name (TREE_TYPE (base));
    3518            0 :   gimple_call_set_lhs (fn_call, end);
    3519            0 :   gimple_set_location (fn_call, loc);
    3520            0 :   gimple_seq_add_stmt (&seq, fn_call);
    3521              : 
    3522              :   /* Determine the number of elements between START and END by
    3523              :      evaluating (END - START) / sizeof (*START).  */
    3524            0 :   tree diff = make_ssa_name (ptrdiff_type_node);
    3525            0 :   gimple *diff_stmt = gimple_build_assign (diff, POINTER_DIFF_EXPR, end, start);
    3526            0 :   gimple_seq_add_stmt (&seq, diff_stmt);
    3527              :   /* Let SIZE be the size of each character.  */
    3528            0 :   tree size = gimple_convert (&seq, ptrdiff_type_node,
    3529            0 :                               TYPE_SIZE_UNIT (load_type));
    3530            0 :   tree count = make_ssa_name (ptrdiff_type_node);
    3531            0 :   gimple *count_stmt = gimple_build_assign (count, TRUNC_DIV_EXPR, diff, size);
    3532            0 :   gimple_seq_add_stmt (&seq, count_stmt);
    3533              : 
    3534            0 :   generate_strlen_builtin_1 (loop, seq, reduction_var, count,
    3535            0 :                              TYPE_MODE (load_type),
    3536              :                              start_len);
    3537            0 : }
    3538              : 
    3539              : /* Return true if we can count at least as many characters by taking pointer
    3540              :    difference as we can count via reduction_var without an overflow.  Thus
    3541              :    compute 2^n < (2^(m-1) / s) where n = TYPE_PRECISION (reduction_var_type),
    3542              :    m = TYPE_PRECISION (ptrdiff_type_node), and s = size of each character.  */
    3543              : static bool
    3544            0 : reduction_var_overflows_first (tree reduction_var_type, tree load_type)
    3545              : {
    3546            0 :   widest_int n2 = wi::lshift (1, TYPE_PRECISION (reduction_var_type));
    3547            0 :   widest_int m2 = wi::lshift (1, TYPE_PRECISION (ptrdiff_type_node) - 1);
    3548            0 :   widest_int s = wi::to_widest (TYPE_SIZE_UNIT (load_type));
    3549            0 :   return wi::ltu_p (n2, wi::udiv_trunc (m2, s));
    3550            0 : }
    3551              : 
    3552              : static gimple *
    3553        24026 : determine_reduction_stmt_1 (const loop_p loop, const basic_block *bbs)
    3554              : {
    3555        24026 :   gimple *reduction_stmt = NULL;
    3556              : 
    3557        88087 :   for (unsigned i = 0, ninsns = 0; i < loop->num_nodes; ++i)
    3558              :     {
    3559        69485 :       basic_block bb = bbs[i];
    3560              : 
    3561       133586 :       for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
    3562        64101 :            gsi_next (&bsi))
    3563              :         {
    3564        64826 :           gphi *phi = bsi.phi ();
    3565       129652 :           if (virtual_operand_p (gimple_phi_result (phi)))
    3566        20524 :             continue;
    3567        44302 :           if (stmt_has_scalar_dependences_outside_loop (loop, phi))
    3568              :             {
    3569         7640 :               if (reduction_stmt)
    3570          725 :                 return NULL;
    3571              :               reduction_stmt = phi;
    3572              :             }
    3573              :         }
    3574              : 
    3575        68760 :       for (gimple_stmt_iterator bsi = gsi_start_nondebug_bb (bb);
    3576       214799 :            !gsi_end_p (bsi); gsi_next_nondebug (&bsi), ++ninsns)
    3577              :         {
    3578              :           /* Bail out early for loops which are unlikely to match.  */
    3579       150738 :           if (ninsns > 16)
    3580         4699 :             return NULL;
    3581       147898 :           gimple *stmt = gsi_stmt (bsi);
    3582       147898 :           if (gimple_clobber_p (stmt))
    3583         5562 :             continue;
    3584       142336 :           if (gimple_code (stmt) == GIMPLE_LABEL)
    3585          975 :             continue;
    3586       252172 :           if (gimple_has_volatile_ops (stmt))
    3587              :             return NULL;
    3588       141311 :           if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
    3589              :             {
    3590         5824 :               if (reduction_stmt)
    3591              :                 return NULL;
    3592              :               reduction_stmt = stmt;
    3593              :             }
    3594              :         }
    3595              :     }
    3596              : 
    3597              :   return reduction_stmt;
    3598              : }
    3599              : 
    3600              : /* If LOOP has a single non-volatile reduction statement, then return a pointer
    3601              :    to it.  Otherwise return NULL.  */
    3602              : static gimple *
    3603        24026 : determine_reduction_stmt (const loop_p loop)
    3604              : {
    3605        24026 :   basic_block *bbs = get_loop_body (loop);
    3606        24026 :   gimple *reduction_stmt = determine_reduction_stmt_1 (loop, bbs);
    3607        24026 :   XDELETEVEC (bbs);
    3608        24026 :   return reduction_stmt;
    3609              : }
    3610              : 
    3611              : /* Transform loops which mimic the effects of builtins rawmemchr or strlen and
    3612              :    replace them accordingly.  For example, a loop of the form
    3613              : 
    3614              :      for (; *p != 42; ++p);
    3615              : 
    3616              :    is replaced by
    3617              : 
    3618              :      p = rawmemchr<MODE> (p, 42);
    3619              : 
    3620              :    under the assumption that rawmemchr is available for a particular MODE.
    3621              :    Another example is
    3622              : 
    3623              :      int i;
    3624              :      for (i = 42; s[i]; ++i);
    3625              : 
    3626              :    which is replaced by
    3627              : 
    3628              :      i = (int)strlen (&s[42]) + 42;
    3629              : 
    3630              :    for some character array S.  In case array S is not of type character array
    3631              :    we end up with
    3632              : 
    3633              :      i = (int)(rawmemchr<MODE> (&s[42], 0) - &s[42]) + 42;
    3634              : 
    3635              :    assuming that rawmemchr is available for a particular MODE.  */
    3636              : 
    3637              : bool
    3638        83890 : loop_distribution::transform_reduction_loop (loop_p loop)
    3639              : {
    3640        83890 :   gimple *reduction_stmt;
    3641        83890 :   data_reference_p load_dr = NULL, store_dr = NULL;
    3642              : 
    3643        83890 :   edge e = single_exit (loop);
    3644       244231 :   gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (e->src));
    3645        66390 :   if (!cond)
    3646              :     return false;
    3647              :   /* Ensure loop condition is an (in)equality test and loop is exited either if
    3648              :      the inequality test fails or the equality test succeeds.  */
    3649        51189 :   if (!(e->flags & EDGE_FALSE_VALUE && gimple_cond_code (cond) == NE_EXPR)
    3650        88649 :       && !(e->flags & EDGE_TRUE_VALUE && gimple_cond_code (cond) == EQ_EXPR))
    3651              :     return false;
    3652              :   /* A limitation of the current implementation is that we only support
    3653              :      constant patterns in (in)equality tests.  */
    3654        31616 :   tree pattern = gimple_cond_rhs (cond);
    3655        31616 :   if (TREE_CODE (pattern) != INTEGER_CST)
    3656              :     return false;
    3657              : 
    3658        24026 :   reduction_stmt = determine_reduction_stmt (loop);
    3659              : 
    3660              :   /* A limitation of the current implementation is that we require a reduction
    3661              :      statement.  Therefore, loops without a reduction statement as in the
    3662              :      following are not recognized:
    3663              :      int *p;
    3664              :      void foo (void) { for (; *p; ++p); } */
    3665        24026 :   if (reduction_stmt == NULL)
    3666              :     return false;
    3667              : 
    3668              :   /* Reduction variables are guaranteed to be SSA names.  */
    3669         8257 :   tree reduction_var;
    3670         8257 :   switch (gimple_code (reduction_stmt))
    3671              :     {
    3672         8160 :     case GIMPLE_ASSIGN:
    3673         8160 :     case GIMPLE_PHI:
    3674         8160 :       reduction_var = gimple_get_lhs (reduction_stmt);
    3675         8160 :       break;
    3676              :     default:
    3677              :       /* Bail out e.g. for GIMPLE_CALL.  */
    3678              :       return false;
    3679              :     }
    3680              : 
    3681         8160 :   struct graph *rdg = build_rdg (loop, NULL);
    3682         8160 :   if (rdg == NULL)
    3683              :     {
    3684          721 :       if (dump_file && (dump_flags & TDF_DETAILS))
    3685            0 :         fprintf (dump_file,
    3686              :                  "Loop %d not transformed: failed to build the RDG.\n",
    3687              :                  loop->num);
    3688              : 
    3689          721 :       return false;
    3690              :     }
    3691        14878 :   auto_bitmap partition_stmts;
    3692         7439 :   bitmap_set_range (partition_stmts, 0, rdg->n_vertices);
    3693         7439 :   find_single_drs (loop, rdg, partition_stmts, &store_dr, &load_dr);
    3694         7439 :   free_rdg (rdg, loop);
    3695              : 
    3696              :   /* Bail out if there is no single load.  */
    3697         7439 :   if (load_dr == NULL)
    3698              :     return false;
    3699              : 
    3700              :   /* Reaching this point we have a loop with a single reduction variable,
    3701              :      a single load, and an optional single store.  */
    3702              : 
    3703         3729 :   tree load_ref = DR_REF (load_dr);
    3704         3729 :   tree load_type = TREE_TYPE (load_ref);
    3705         3729 :   tree load_access_base = build_fold_addr_expr (load_ref);
    3706         3729 :   tree load_access_size = TYPE_SIZE_UNIT (load_type);
    3707         3729 :   affine_iv load_iv, reduction_iv;
    3708              : 
    3709         3729 :   if (!INTEGRAL_TYPE_P (load_type)
    3710         3729 :       || !type_has_mode_precision_p (load_type))
    3711         2365 :     return false;
    3712              : 
    3713              :   /* We already ensured that the loop condition tests for (in)equality where the
    3714              :      rhs is a constant pattern. Now ensure that the lhs is the result of the
    3715              :      load.  */
    3716         1364 :   if (gimple_cond_lhs (cond) != gimple_assign_lhs (DR_STMT (load_dr)))
    3717              :     return false;
    3718              : 
    3719              :   /* Bail out if no affine induction variable with constant step can be
    3720              :      determined.  */
    3721         1219 :   if (!simple_iv (loop, loop, load_access_base, &load_iv, false))
    3722              :     return false;
    3723              : 
    3724              :   /* Bail out if memory accesses are not consecutive or not growing.  */
    3725         1210 :   if (!operand_equal_p (load_iv.step, load_access_size, 0))
    3726              :     return false;
    3727              : 
    3728         1189 :   if (!simple_iv (loop, loop, reduction_var, &reduction_iv, false))
    3729              :     return false;
    3730              : 
    3731              :   /* Handle rawmemchr like loops.  */
    3732         1075 :   if (operand_equal_p (load_iv.base, reduction_iv.base)
    3733         1075 :       && operand_equal_p (load_iv.step, reduction_iv.step))
    3734              :     {
    3735          259 :       if (store_dr)
    3736              :         {
    3737              :           /* Ensure that we store to X and load from X+I where I>0.  */
    3738            0 :           if (TREE_CODE (load_iv.base) != POINTER_PLUS_EXPR
    3739            0 :               || !integer_onep (TREE_OPERAND (load_iv.base, 1)))
    3740            0 :             return false;
    3741            0 :           tree ptr_base = TREE_OPERAND (load_iv.base, 0);
    3742            0 :           if (TREE_CODE (ptr_base) != SSA_NAME)
    3743              :             return false;
    3744            0 :           gimple *def = SSA_NAME_DEF_STMT (ptr_base);
    3745            0 :           if (!gimple_assign_single_p (def)
    3746            0 :               || gimple_assign_rhs1 (def) != DR_REF (store_dr))
    3747              :             return false;
    3748              :           /* Ensure that the reduction value is stored.  */
    3749            0 :           if (gimple_assign_rhs1 (DR_STMT (store_dr)) != reduction_var)
    3750              :             return false;
    3751              :         }
    3752              :       /* Bail out if target does not provide rawmemchr for a certain mode.  */
    3753          259 :       machine_mode mode = TYPE_MODE (load_type);
    3754          259 :       if (direct_optab_handler (rawmemchr_optab, mode) == CODE_FOR_nothing)
    3755              :         return false;
    3756            0 :       location_t loc = gimple_location (DR_STMT (load_dr));
    3757            0 :       generate_rawmemchr_builtin (loop, reduction_var, store_dr, load_iv.base,
    3758              :                                   pattern, loc);
    3759            0 :       return true;
    3760              :     }
    3761              : 
    3762              :   /* Handle strlen like loops.  */
    3763          816 :   if (store_dr == NULL
    3764          806 :       && integer_zerop (pattern)
    3765          806 :       && INTEGRAL_TYPE_P (TREE_TYPE (reduction_var))
    3766          796 :       && TREE_CODE (reduction_iv.base) == INTEGER_CST
    3767          792 :       && TREE_CODE (reduction_iv.step) == INTEGER_CST
    3768         1608 :       && integer_onep (reduction_iv.step))
    3769              :     {
    3770          736 :       location_t loc = gimple_location (DR_STMT (load_dr));
    3771          736 :       tree reduction_var_type = TREE_TYPE (reduction_var);
    3772              :       /* While determining the length of a string an overflow might occur.
    3773              :          If an overflow only occurs in the loop implementation and not in the
    3774              :          strlen implementation, then either the overflow is undefined or the
    3775              :          truncated result of strlen equals the one of the loop.  Otherwise if
    3776              :          an overflow may also occur in the strlen implementation, then
    3777              :          replacing a loop by a call to strlen is sound whenever we ensure that
    3778              :          if an overflow occurs in the strlen implementation, then also an
    3779              :          overflow occurs in the loop implementation which is undefined.  It
    3780              :          seems reasonable to relax this and assume that the strlen
    3781              :          implementation cannot overflow in case sizetype is big enough in the
    3782              :          sense that an overflow can only happen for string objects which are
    3783              :          bigger than half of the address space; at least for 32-bit targets and
    3784              :          up.
    3785              : 
    3786              :          For strlen which makes use of rawmemchr the maximal length of a string
    3787              :          which can be determined without an overflow is PTRDIFF_MAX / S where
    3788              :          each character has size S.  Since an overflow for ptrdiff type is
    3789              :          undefined we have to make sure that if an overflow occurs, then an
    3790              :          overflow occurs in the loop implementation, too, and this is
    3791              :          undefined, too.  Similar as before we relax this and assume that no
    3792              :          string object is larger than half of the address space; at least for
    3793              :          32-bit targets and up.  */
    3794          736 :       if (TYPE_MODE (load_type) == TYPE_MODE (char_type_node)
    3795          456 :           && TYPE_PRECISION (load_type) == TYPE_PRECISION (char_type_node)
    3796          456 :           && ((TYPE_PRECISION (sizetype) >= TYPE_PRECISION (ptr_type_node) - 1
    3797          456 :                && TYPE_PRECISION (ptr_type_node) >= 32)
    3798            0 :               || (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
    3799            0 :                   && TYPE_PRECISION (reduction_var_type) <= TYPE_PRECISION (sizetype)))
    3800          765 :           && builtin_decl_implicit (BUILT_IN_STRLEN))
    3801           29 :         generate_strlen_builtin (loop, reduction_var, load_iv.base,
    3802              :                                  reduction_iv.base, loc);
    3803          707 :       else if (direct_optab_handler (rawmemchr_optab, TYPE_MODE (load_type))
    3804              :                != CODE_FOR_nothing
    3805          707 :                && ((TYPE_PRECISION (ptrdiff_type_node) == TYPE_PRECISION (ptr_type_node)
    3806            0 :                     && TYPE_PRECISION (ptrdiff_type_node) >= 32)
    3807            0 :                    || (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
    3808            0 :                        && reduction_var_overflows_first (reduction_var_type, load_type))))
    3809            0 :         generate_strlen_builtin_using_rawmemchr (loop, reduction_var,
    3810              :                                                  load_iv.base,
    3811              :                                                  load_type,
    3812              :                                                  reduction_iv.base, loc);
    3813              :       else
    3814          707 :         return false;
    3815           29 :       return true;
    3816              :     }
    3817              : 
    3818              :   return false;
    3819              : }
    3820              : 
    3821              : /* Given innermost LOOP, return the outermost enclosing loop that forms a
    3822              :    perfect loop nest.  */
    3823              : 
    3824              : static class loop *
    3825       167557 : prepare_perfect_loop_nest (class loop *loop)
    3826              : {
    3827       167557 :   class loop *outer = loop_outer (loop);
    3828       167557 :   tree niters = number_of_latch_executions (loop);
    3829              : 
    3830              :   /* TODO: We only support the innermost 3-level loop nest distribution
    3831              :      because of compilation time issue for now.  This should be relaxed
    3832              :      in the future.  Note we only allow 3-level loop nest distribution
    3833              :      when parallelizing loops.  */
    3834       167557 :   while ((loop->inner == NULL
    3835        17464 :           || (loop->inner->inner == NULL && flag_tree_parallelize_loops > 1))
    3836       167617 :          && loop_outer (outer)
    3837        41203 :          && outer->inner == loop && loop->next == NULL
    3838        26800 :          && single_exit (outer)
    3839        25118 :          && !chrec_contains_symbols_defined_in_loop (niters, outer->num)
    3840        18392 :          && (niters = number_of_latch_executions (outer)) != NULL_TREE
    3841       203413 :          && niters != chrec_dont_know)
    3842              :     {
    3843        17464 :       loop = outer;
    3844        17464 :       outer = loop_outer (loop);
    3845              :     }
    3846              : 
    3847       167557 :   return loop;
    3848              : }
    3849              : 
    3850              : 
    3851              : unsigned int
    3852       200790 : loop_distribution::execute (function *fun)
    3853              : {
    3854       200790 :   bool changed = false;
    3855       200790 :   basic_block bb;
    3856       200790 :   control_dependences *cd = NULL;
    3857       200790 :   auto_vec<loop_p> loops_to_be_destroyed;
    3858              : 
    3859       594616 :   if (number_of_loops (fun) <= 1)
    3860              :     return 0;
    3861              : 
    3862       200790 :   bb_top_order_init ();
    3863              : 
    3864      7269696 :   FOR_ALL_BB_FN (bb, fun)
    3865              :     {
    3866      7068906 :       gimple_stmt_iterator gsi;
    3867     10994891 :       for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    3868      3925985 :         gimple_set_uid (gsi_stmt (gsi), -1);
    3869     63260960 :       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    3870     49123148 :         gimple_set_uid (gsi_stmt (gsi), -1);
    3871              :     }
    3872              : 
    3873              :   /* At the moment we can only distribute non-nested loops, thus
    3874              :      restrict the walk to innermost loops.  */
    3875      1019482 :   for (auto loop : loops_list (cfun, LI_ONLY_INNERMOST))
    3876              :     {
    3877              :       /* Don't distribute loops with multiple exit edges, or cold loops
    3878              :          when not doing pattern detection.  */
    3879       417112 :       if (!single_exit (loop)
    3880       417112 :           || (!flag_tree_loop_distribute_patterns
    3881         1206 :               && !optimize_loop_for_speed_p (loop)))
    3882       165441 :         continue;
    3883              : 
    3884              :       /* If niters is unknown, don't distribute the loop but rather try
    3885              :          to transform it into a call to a builtin.  */
    3886       251671 :       tree niters = number_of_latch_executions (loop);
    3887       251671 :       if (niters == NULL_TREE || niters == chrec_dont_know)
    3888              :         {
    3889        84114 :           datarefs_vec.create (20);
    3890        84114 :           if (flag_tree_loop_distribute_patterns
    3891        84114 :               && transform_reduction_loop (loop))
    3892              :             {
    3893           29 :               changed = true;
    3894           29 :               loops_to_be_destroyed.safe_push (loop);
    3895           29 :               if (dump_enabled_p ())
    3896              :                 {
    3897            7 :                   dump_user_location_t loc = find_loop_location (loop);
    3898            7 :                   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
    3899              :                                    loc, "Loop %d transformed into a builtin.\n",
    3900              :                                    loop->num);
    3901              :                 }
    3902              :             }
    3903        84114 :           free_data_refs (datarefs_vec);
    3904        84114 :           continue;
    3905        84114 :         }
    3906              : 
    3907              :       /* Get the perfect loop nest for distribution.  */
    3908       167557 :       loop = prepare_perfect_loop_nest (loop);
    3909       340068 :       for (; loop; loop = loop->inner)
    3910              :         {
    3911       184512 :           auto_vec<gimple *> work_list;
    3912       184512 :           if (!find_seed_stmts_for_distribution (loop, &work_list))
    3913        44501 :             continue;
    3914              : 
    3915       140011 :           const char *str = loop->inner ? " nest" : "";
    3916       140011 :           dump_user_location_t loc = find_loop_location (loop);
    3917       140011 :           if (!cd)
    3918              :             {
    3919        61773 :               calculate_dominance_info (CDI_DOMINATORS);
    3920        61773 :               calculate_dominance_info (CDI_POST_DOMINATORS);
    3921        61773 :               cd = new control_dependences ();
    3922        61773 :               free_dominance_info (CDI_POST_DOMINATORS);
    3923              :             }
    3924              : 
    3925       140011 :           bool destroy_p;
    3926       140011 :           int nb_generated_loops, nb_generated_calls;
    3927       140011 :           bool only_patterns = !optimize_loop_for_speed_p (loop)
    3928       140011 :                                || !flag_tree_loop_distribution;
    3929              :           /* Do not try to distribute loops that are not expected to iterate.  */
    3930        30837 :           if (!only_patterns)
    3931              :             {
    3932        30837 :               HOST_WIDE_INT iterations = estimated_loop_iterations_int (loop);
    3933        30837 :               if (iterations < 0)
    3934        14673 :                 iterations = likely_max_loop_iterations_int (loop);
    3935        30837 :               if (!iterations)
    3936       109184 :                 only_patterns = true;
    3937              :             }
    3938       140011 :           nb_generated_loops
    3939       140011 :             = distribute_loop (loop, work_list, cd, &nb_generated_calls,
    3940              :                                &destroy_p, only_patterns);
    3941       140011 :           if (destroy_p)
    3942        10103 :             loops_to_be_destroyed.safe_push (loop);
    3943              : 
    3944       140011 :           if (nb_generated_loops + nb_generated_calls > 0)
    3945              :             {
    3946        12001 :               changed = true;
    3947        12001 :               if (dump_enabled_p ())
    3948           71 :                 dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
    3949              :                                  loc, "Loop%s %d distributed: split to %d loops "
    3950              :                                  "and %d library calls.\n", str, loop->num,
    3951              :                                  nb_generated_loops, nb_generated_calls);
    3952              : 
    3953        12001 :               break;
    3954              :             }
    3955              : 
    3956       128010 :           if (dump_file && (dump_flags & TDF_DETAILS))
    3957           28 :             fprintf (dump_file, "Loop%s %d not distributed.\n", str, loop->num);
    3958       184512 :         }
    3959       200790 :     }
    3960              : 
    3961       200790 :   if (cd)
    3962        61773 :     delete cd;
    3963              : 
    3964       200790 :   if (bb_top_order_index != NULL)
    3965       200790 :     bb_top_order_destroy ();
    3966              : 
    3967       200790 :   if (changed)
    3968              :     {
    3969              :       /* Destroy loop bodies that could not be reused.  Do this late as we
    3970              :          otherwise can end up referring to stale data in control
    3971              :          dependences.  */
    3972              :       unsigned i;
    3973              :       class loop *loop;
    3974        17886 :       FOR_EACH_VEC_ELT (loops_to_be_destroyed, i, loop)
    3975        10132 :         destroy_loop (loop);
    3976              : 
    3977              :       /* Cached scalar evolutions now may refer to wrong or non-existing
    3978              :          loops.  */
    3979         7754 :       scev_reset ();
    3980         7754 :       mark_virtual_operands_for_renaming (fun);
    3981         7754 :       rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
    3982              :     }
    3983              : 
    3984       200790 :   checking_verify_loop_structure ();
    3985              : 
    3986       200790 :   return changed ? TODO_cleanup_cfg : 0;
    3987       200790 : }
    3988              : 
    3989              : 
    3990              : /* Distribute all loops in the current function.  */
    3991              : 
    3992              : namespace {
    3993              : 
    3994              : const pass_data pass_data_loop_distribution =
    3995              : {
    3996              :   GIMPLE_PASS, /* type */
    3997              :   "ldist", /* name */
    3998              :   OPTGROUP_LOOP, /* optinfo_flags */
    3999              :   TV_TREE_LOOP_DISTRIBUTION, /* tv_id */
    4000              :   ( PROP_cfg | PROP_ssa ), /* properties_required */
    4001              :   0, /* properties_provided */
    4002              :   0, /* properties_destroyed */
    4003              :   0, /* todo_flags_start */
    4004              :   0, /* todo_flags_finish */
    4005              : };
    4006              : 
    4007              : class pass_loop_distribution : public gimple_opt_pass
    4008              : {
    4009              : public:
    4010       287872 :   pass_loop_distribution (gcc::context *ctxt)
    4011       575744 :     : gimple_opt_pass (pass_data_loop_distribution, ctxt)
    4012              :   {}
    4013              : 
    4014              :   /* opt_pass methods: */
    4015       240838 :   bool gate (function *) final override
    4016              :     {
    4017       240838 :       return flag_tree_loop_distribution
    4018       240838 :         || flag_tree_loop_distribute_patterns;
    4019              :     }
    4020              : 
    4021              :   unsigned int execute (function *) final override;
    4022              : 
    4023              : }; // class pass_loop_distribution
    4024              : 
    4025              : unsigned int
    4026       200790 : pass_loop_distribution::execute (function *fun)
    4027              : {
    4028       200790 :   return loop_distribution ().execute (fun);
    4029              : }
    4030              : 
    4031              : } // anon namespace
    4032              : 
    4033              : gimple_opt_pass *
    4034       287872 : make_pass_loop_distribution (gcc::context *ctxt)
    4035              : {
    4036       287872 :   return new pass_loop_distribution (ctxt);
    4037              : }
        

Generated by: LCOV version 2.4-beta

LCOV profile is generated on x86_64 machine using following configure options: configure --disable-bootstrap --enable-coverage=opt --enable-languages=c,c++,fortran,go,jit,lto,rust,m2 --enable-host-shared. GCC test suite is run with the built compiler.