LCOV - code coverage report
Current view: top level - gcc - tree-loop-distribution.cc (source / functions)
Test:         gcc.info
Test Date:    2024-04-13 14:00:49
Legend:       Lines: hit not hit | Branches: + taken - not taken # not executed

                 Coverage    Total      Hit
   Lines:          91.5 %     1677     1534
   Functions:      87.4 %       87       76
   Branches:            -        0        0

             Branch data     Line data    Source code
       1                 :             : /* Loop distribution.
       2                 :             :    Copyright (C) 2006-2024 Free Software Foundation, Inc.
       3                 :             :    Contributed by Georges-Andre Silber <Georges-Andre.Silber@ensmp.fr>
       4                 :             :    and Sebastian Pop <sebastian.pop@amd.com>.
       5                 :             : 
       6                 :             : This file is part of GCC.
       7                 :             : 
       8                 :             : GCC is free software; you can redistribute it and/or modify it
       9                 :             : under the terms of the GNU General Public License as published by the
      10                 :             : Free Software Foundation; either version 3, or (at your option) any
      11                 :             : later version.
      12                 :             : 
      13                 :             : GCC is distributed in the hope that it will be useful, but WITHOUT
      14                 :             : ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
      15                 :             : FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
      16                 :             : for more details.
      17                 :             : 
      18                 :             : You should have received a copy of the GNU General Public License
      19                 :             : along with GCC; see the file COPYING3.  If not see
      20                 :             : <http://www.gnu.org/licenses/>.  */
      21                 :             : 
      22                 :             : /* This pass performs loop distribution: for example, the loop
      23                 :             : 
      24                 :             :    |DO I = 2, N
      25                 :             :    |    A(I) = B(I) + C
      26                 :             :    |    D(I) = A(I-1)*E
      27                 :             :    |ENDDO
      28                 :             : 
      29                 :             :    is transformed to
      30                 :             : 
      31                 :             :    |DOALL I = 2, N
      32                 :             :    |   A(I) = B(I) + C
      33                 :             :    |ENDDO
      34                 :             :    |
      35                 :             :    |DOALL I = 2, N
      36                 :             :    |   D(I) = A(I-1)*E
      37                 :             :    |ENDDO
      38                 :             : 
      39                 :             :    Loop distribution is the dual of loop fusion.  It separates statements
      40                 :             :    of a loop (or loop nest) into multiple loops (or loop nests) with the
      41                 :             :    same loop header.  The major goal is to separate statements which may
      42                 :             :    be vectorized from those that can't.  This pass implements distribution
      43                 :             :    in the following steps:
      44                 :             : 
       45                 :             :      1) Seed partitions with statements of specific types.  For now we
       46                 :             :         support two types of seed statements: statements defining a variable
       47                 :             :         used outside of the loop, and statements storing to memory.
      48                 :             :      2) Build reduced dependence graph (RDG) for loop to be distributed.
      49                 :             :         The vertices (RDG:V) model all statements in the loop and the edges
      50                 :             :         (RDG:E) model flow and control dependencies between statements.
      51                 :             :      3) Apart from RDG, compute data dependencies between memory references.
       52                 :             :      4) Starting from a seed statement, build up a partition by adding
       53                 :             :         dependent statements according to the RDG's dependence information.
       54                 :             :         A partition is classified as parallel if it can be executed in
       55                 :             :         parallel, or as sequential if it can't.  A parallel partition is
       56                 :             :         further classified into different builtin kinds if it can be
       57                 :             :         implemented as builtin function calls.
      58                 :             :      5) Build partition dependence graph (PG) based on data dependencies.
      59                 :             :         The vertices (PG:V) model all partitions and the edges (PG:E) model
       60                 :             :         all data dependencies between every pair of partitions.  In general,
       61                 :             :         a data dependence is either known or unknown at compilation time.
       62                 :             :         In C family languages there are quite a few dependencies that are
       63                 :             :         unknown at compilation time because data references may alias.
       64                 :             :         We categorize PG's edges into two types: "true" edges representing
       65                 :             :         data dependencies known at compilation time, and "alias" edges for
       66                 :             :         all other data dependencies.
       67                 :             :      6) Traverse the subgraph of PG as if all "alias" edges didn't exist.
       68                 :             :         Merge the partitions in each strongly connected component (SCC)
       69                 :             :         accordingly, then build a new PG for the merged partitions.
       70                 :             :      7) Traverse PG again, this time with both "true" and "alias" edges
       71                 :             :         included.  We try to break SCCs by removing some edges.  Because
       72                 :             :         SCCs formed by "true" edges were all fused in step 6), we can break
       73                 :             :         the remaining SCCs by removing some "alias" edges.  Choosing an
       74                 :             :         optimal edge set is NP-hard; fortunately, a simple approximation is
       75                 :             :         good enough for us given the small problem scale.
      76                 :             :      8) Collect all data dependencies of the removed "alias" edges.  Create
      77                 :             :         runtime alias checks for collected data dependencies.
       78                 :             :      9) Version the loop under the condition of the runtime alias checks.
       79                 :             :         Since loop distribution generally introduces additional overhead, it
       80                 :             :         is only useful if vectorization is achieved in a distributed loop.
       81                 :             :         We version the loop with the internal function call
       82                 :             :         IFN_LOOP_DIST_ALIAS.  If no distributed loop can be vectorized, we
       83                 :             :         simply remove the distributed loops and revert to the original one.
      84                 :             : 
      85                 :             :    TODO:
       86                 :             :      1) We only distribute innermost two-level loop nests now.  We should
       87                 :             :         extend this to arbitrary loop nests in the future.
      88                 :             :      2) We only fuse partitions in SCC now.  A better fusion algorithm is
      89                 :             :         desired to minimize loop overhead, maximize parallelism and maximize
      90                 :             :         data reuse.  */
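To make the steps above concrete, here is a minimal, invented C example of the kind of loop the pass targets (roughly the behavior enabled by -ftree-loop-distribution and -ftree-loop-distribute-patterns).  The function names and the exact output shape are illustrative only; the real transformation also versions the loop with IFN_LOOP_DIST_ALIAS as described in step 9, because a, b, c and d may alias.

    #include <string.h>

    /* Before distribution: one loop mixing a zeroing store and an
       unrelated, vectorizable computation.  */
    void
    dist_before (int *a, double *b, const double *c, const double *d, int n)
    {
      for (int i = 0; i < n; i++)
        {
          a[i] = 0;            /* seed of a builtin (memset) partition  */
          b[i] = c[i] * d[i];  /* seed of a vectorizable partition      */
        }
    }

    /* Conceptually, after distribution and pattern recognition:  */
    void
    dist_after (int *a, double *b, const double *c, const double *d, int n)
    {
      if (n > 0)
        memset (a, 0, (size_t) n * sizeof (int));
      for (int i = 0; i < n; i++)
        b[i] = c[i] * d[i];
    }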
      91                 :             : 
      92                 :             : #include "config.h"
      93                 :             : #include "system.h"
      94                 :             : #include "coretypes.h"
      95                 :             : #include "backend.h"
      96                 :             : #include "tree.h"
      97                 :             : #include "gimple.h"
      98                 :             : #include "cfghooks.h"
      99                 :             : #include "tree-pass.h"
     100                 :             : #include "ssa.h"
     101                 :             : #include "gimple-pretty-print.h"
     102                 :             : #include "fold-const.h"
     103                 :             : #include "cfganal.h"
     104                 :             : #include "gimple-iterator.h"
     105                 :             : #include "gimplify-me.h"
     106                 :             : #include "stor-layout.h"
     107                 :             : #include "tree-cfg.h"
     108                 :             : #include "tree-ssa-loop-manip.h"
     109                 :             : #include "tree-ssa-loop-ivopts.h"
     110                 :             : #include "tree-ssa-loop.h"
     111                 :             : #include "tree-into-ssa.h"
     112                 :             : #include "tree-ssa.h"
     113                 :             : #include "cfgloop.h"
     114                 :             : #include "tree-scalar-evolution.h"
     115                 :             : #include "tree-vectorizer.h"
     116                 :             : #include "tree-eh.h"
     117                 :             : #include "gimple-fold.h"
     118                 :             : #include "tree-affine.h"
     119                 :             : #include "intl.h"
     120                 :             : #include "rtl.h"
     121                 :             : #include "memmodel.h"
     122                 :             : #include "optabs.h"
     123                 :             : #include "tree-ssa-loop-niter.h"
     124                 :             : 
     125                 :             : 
     126                 :             : #define MAX_DATAREFS_NUM \
     127                 :             :         ((unsigned) param_loop_max_datarefs_for_datadeps)
     128                 :             : 
      129                 :             : /* Threshold controlling the number of distributed partitions.  Since it may
      130                 :             :    become unnecessary if a memory stream cost model is invented in the
      131                 :             :    future, we define it as a temporary macro rather than a parameter.  */
     132                 :             : #define NUM_PARTITION_THRESHOLD (4)
     133                 :             : 
     134                 :             : /* Hashtable helpers.  */
     135                 :             : 
     136                 :             : struct ddr_hasher : nofree_ptr_hash <struct data_dependence_relation>
     137                 :             : {
     138                 :             :   static inline hashval_t hash (const data_dependence_relation *);
     139                 :             :   static inline bool equal (const data_dependence_relation *,
     140                 :             :                             const data_dependence_relation *);
     141                 :             : };
     142                 :             : 
     143                 :             : /* Hash function for data dependence.  */
     144                 :             : 
     145                 :             : inline hashval_t
     146                 :     9470454 : ddr_hasher::hash (const data_dependence_relation *ddr)
     147                 :             : {
     148                 :     9470454 :   inchash::hash h;
     149                 :     9470454 :   h.add_ptr (DDR_A (ddr));
     150                 :     9470454 :   h.add_ptr (DDR_B (ddr));
     151                 :     9470454 :   return h.end ();
     152                 :             : }
     153                 :             : 
     154                 :             : /* Hash table equality function for data dependence.  */
     155                 :             : 
     156                 :             : inline bool
     157                 :     8703602 : ddr_hasher::equal (const data_dependence_relation *ddr1,
     158                 :             :                    const data_dependence_relation *ddr2)
     159                 :             : {
     160                 :     8703602 :   return (DDR_A (ddr1) == DDR_A (ddr2) && DDR_B (ddr1) == DDR_B (ddr2));
     161                 :             : }
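The hasher above keys a dependence relation by the addresses of its two data references; get_data_dependence (declared further below) uses it to compute each (A, B) relation only once and cache it.  A rough, GCC-independent sketch of that caching pattern, with std::unordered_map standing in for hash_table<ddr_hasher> and compute_ddr as a hypothetical stand-in for the real dependence analysis:

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <utility>

    struct data_ref;   /* stand-in for GCC's data_reference       */
    struct ddr { };    /* stand-in for data_dependence_relation   */

    /* Hypothetical analysis routine; returns a fresh (empty) relation.  */
    static ddr *
    compute_ddr (data_ref *, data_ref *)
    {
      return new ddr ();
    }

    /* Key a cached relation by the identity of its two references,
       mirroring ddr_hasher::hash and ddr_hasher::equal above.  */
    struct dr_pair_hash
    {
      size_t operator() (const std::pair<data_ref *, data_ref *> &p) const
      {
        uintptr_t a = (uintptr_t) p.first, b = (uintptr_t) p.second;
        return std::hash<uintptr_t> () (a)
               ^ (std::hash<uintptr_t> () (b) * 0x9e3779b9u);
      }
    };

    static std::unordered_map<std::pair<data_ref *, data_ref *>, ddr *,
                              dr_pair_hash> ddr_cache;

    /* Compute the dependence relation for (A, B) only once, then reuse it.  */
    ddr *
    get_cached_ddr (data_ref *a, data_ref *b)
    {
      auto key = std::make_pair (a, b);
      auto it = ddr_cache.find (key);
      if (it != ddr_cache.end ())
        return it->second;
      ddr *d = compute_ddr (a, b);
      ddr_cache.emplace (key, d);
      return d;
    }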
     162                 :             : 
     163                 :             : 
     164                 :             : 
     165                 :             : #define DR_INDEX(dr)      ((uintptr_t) (dr)->aux)
     166                 :             : 
     167                 :             : /* A Reduced Dependence Graph (RDG) vertex representing a statement.  */
     168                 :             : struct rdg_vertex
     169                 :             : {
     170                 :             :   /* The statement represented by this vertex.  */
     171                 :             :   gimple *stmt;
     172                 :             : 
     173                 :             :   /* Vector of data-references in this statement.  */
     174                 :             :   vec<data_reference_p> datarefs;
     175                 :             : 
     176                 :             :   /* True when the statement contains a write to memory.  */
     177                 :             :   bool has_mem_write;
     178                 :             : 
     179                 :             :   /* True when the statement contains a read from memory.  */
     180                 :             :   bool has_mem_reads;
     181                 :             : };
     182                 :             : 
     183                 :             : #define RDGV_STMT(V)     ((struct rdg_vertex *) ((V)->data))->stmt
     184                 :             : #define RDGV_DATAREFS(V) ((struct rdg_vertex *) ((V)->data))->datarefs
     185                 :             : #define RDGV_HAS_MEM_WRITE(V) ((struct rdg_vertex *) ((V)->data))->has_mem_write
     186                 :             : #define RDGV_HAS_MEM_READS(V) ((struct rdg_vertex *) ((V)->data))->has_mem_reads
     187                 :             : #define RDG_STMT(RDG, I) RDGV_STMT (&(RDG->vertices[I]))
     188                 :             : #define RDG_DATAREFS(RDG, I) RDGV_DATAREFS (&(RDG->vertices[I]))
     189                 :             : #define RDG_MEM_WRITE_STMT(RDG, I) RDGV_HAS_MEM_WRITE (&(RDG->vertices[I]))
     190                 :             : #define RDG_MEM_READS_STMT(RDG, I) RDGV_HAS_MEM_READS (&(RDG->vertices[I]))
     191                 :             : 
     192                 :             : /* Data dependence type.  */
     193                 :             : 
     194                 :             : enum rdg_dep_type
     195                 :             : {
     196                 :             :   /* Read After Write (RAW).  */
     197                 :             :   flow_dd = 'f',
     198                 :             : 
      199                 :             :   /* Control dependence (executed conditionally on).  */
     200                 :             :   control_dd = 'c'
     201                 :             : };
     202                 :             : 
     203                 :             : /* Dependence information attached to an edge of the RDG.  */
     204                 :             : 
     205                 :             : struct rdg_edge
     206                 :             : {
     207                 :             :   /* Type of the dependence.  */
     208                 :             :   enum rdg_dep_type type;
     209                 :             : };
     210                 :             : 
     211                 :             : #define RDGE_TYPE(E)        ((struct rdg_edge *) ((E)->data))->type
     212                 :             : 
     213                 :             : /* Kind of distributed loop.  */
     214                 :             : enum partition_kind {
     215                 :             :     PKIND_NORMAL,
      216                 :             :     /* Partial memset means a partition that can be distributed into a loop
      217                 :             :        of memset calls, rather than into a single memset call.  It's handled
      218                 :             :        just like a normal partition, i.e., distributed as a separate loop;
      219                 :             :        no memset call is generated.
      220                 :             : 
      221                 :             :        Note: This is a hackish fix trying to distribute the ZERO-ing stmt
      222                 :             :        into a loop nest as deep as possible.  As a result, parloop achieves
      223                 :             :        better parallelization by parallelizing a deeper loop nest.  This hack
      224                 :             :        should be unnecessary and removed once distributed memset can be
      225                 :             :        understood and analyzed in data reference analysis.  See PR82604.  */
     226                 :             :     PKIND_PARTIAL_MEMSET,
     227                 :             :     PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
     228                 :             : };
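As a minimal, invented illustration of the copy kinds (not taken from the GCC testsuite; whether PKIND_MEMCPY or PKIND_MEMMOVE is chosen depends on what dependence analysis can prove about overlap of dst and src):

    void
    copy_bytes (char *dst, const char *src, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = src[i];   /* may become a memcpy call if dst and src cannot
                              overlap harmfully, or memmove otherwise  */
    }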
     229                 :             : 
     230                 :             : /* Type of distributed loop.  */
     231                 :             : enum partition_type {
      232                 :             :     /* The distributed loop can be executed in parallel.  */
     233                 :             :     PTYPE_PARALLEL = 0,
     234                 :             :     /* The distributed loop has to be executed sequentially.  */
     235                 :             :     PTYPE_SEQUENTIAL
     236                 :             : };
     237                 :             : 
     238                 :             : /* Builtin info for loop distribution.  */
     239                 :             : struct builtin_info
     240                 :             : {
      241                 :             :   /* Data references that a kind != PKIND_NORMAL partition operates on.  */
     242                 :             :   data_reference_p dst_dr;
     243                 :             :   data_reference_p src_dr;
     244                 :             :   /* Base address and size of memory objects operated by the builtin.  Note
     245                 :             :      both dest and source memory objects must have the same size.  */
     246                 :             :   tree dst_base;
     247                 :             :   tree src_base;
     248                 :             :   tree size;
     249                 :             :   /* Base and offset part of dst_base after stripping constant offset.  This
     250                 :             :      is only used in memset builtin distribution for now.  */
     251                 :             :   tree dst_base_base;
     252                 :             :   unsigned HOST_WIDE_INT dst_base_offset;
     253                 :             : };
     254                 :             : 
     255                 :             : /* Partition for loop distribution.  */
     256                 :             : struct partition
     257                 :             : {
     258                 :             :   /* Statements of the partition.  */
     259                 :             :   bitmap stmts;
      260                 :             :   /* True if the partition defines a variable used outside of the loop.  */
     261                 :             :   bool reduction_p;
     262                 :             :   location_t loc;
     263                 :             :   enum partition_kind kind;
     264                 :             :   enum partition_type type;
     265                 :             :   /* Data references in the partition.  */
     266                 :             :   bitmap datarefs;
      267                 :             :   /* Information about a builtin partition.  */
     268                 :             :   struct builtin_info *builtin;
     269                 :             : };
     270                 :             : 
      271                 :             : /* Partitions are fused for different reasons.  */
     272                 :             : enum fuse_type
     273                 :             : {
     274                 :             :   FUSE_NON_BUILTIN = 0,
     275                 :             :   FUSE_REDUCTION = 1,
     276                 :             :   FUSE_SHARE_REF = 2,
     277                 :             :   FUSE_SAME_SCC = 3,
     278                 :             :   FUSE_FINALIZE = 4
     279                 :             : };
     280                 :             : 
      281                 :             : /* Descriptions of the different fusing reasons.  */
     282                 :             : static const char *fuse_message[] = {
     283                 :             :   "they are non-builtins",
     284                 :             :   "they have reductions",
     285                 :             :   "they have shared memory refs",
     286                 :             :   "they are in the same dependence scc",
     287                 :             :   "there is no point to distribute loop"};
     288                 :             : 
     289                 :             : 
     290                 :             : /* Dump vertex I in RDG to FILE.  */
     291                 :             : 
     292                 :             : static void
     293                 :         856 : dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
     294                 :             : {
     295                 :         856 :   struct vertex *v = &(rdg->vertices[i]);
     296                 :         856 :   struct graph_edge *e;
     297                 :             : 
     298                 :         856 :   fprintf (file, "(vertex %d: (%s%s) (in:", i,
     299                 :         856 :            RDG_MEM_WRITE_STMT (rdg, i) ? "w" : "",
     300                 :         856 :            RDG_MEM_READS_STMT (rdg, i) ? "r" : "");
     301                 :             : 
     302                 :         856 :   if (v->pred)
     303                 :        2859 :     for (e = v->pred; e; e = e->pred_next)
     304                 :        2003 :       fprintf (file, " %d", e->src);
     305                 :             : 
     306                 :         856 :   fprintf (file, ") (out:");
     307                 :             : 
     308                 :         856 :   if (v->succ)
     309                 :        2715 :     for (e = v->succ; e; e = e->succ_next)
     310                 :        2003 :       fprintf (file, " %d", e->dest);
     311                 :             : 
     312                 :         856 :   fprintf (file, ")\n");
     313                 :         856 :   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
     314                 :         856 :   fprintf (file, ")\n");
     315                 :         856 : }
     316                 :             : 
     317                 :             : /* Call dump_rdg_vertex on stderr.  */
     318                 :             : 
     319                 :             : DEBUG_FUNCTION void
     320                 :           0 : debug_rdg_vertex (struct graph *rdg, int i)
     321                 :             : {
     322                 :           0 :   dump_rdg_vertex (stderr, rdg, i);
     323                 :           0 : }
     324                 :             : 
     325                 :             : /* Dump the reduced dependence graph RDG to FILE.  */
     326                 :             : 
     327                 :             : static void
     328                 :          67 : dump_rdg (FILE *file, struct graph *rdg)
     329                 :             : {
     330                 :          67 :   fprintf (file, "(rdg\n");
     331                 :         923 :   for (int i = 0; i < rdg->n_vertices; i++)
     332                 :         856 :     dump_rdg_vertex (file, rdg, i);
     333                 :          67 :   fprintf (file, ")\n");
     334                 :          67 : }
     335                 :             : 
     336                 :             : /* Call dump_rdg on stderr.  */
     337                 :             : 
     338                 :             : DEBUG_FUNCTION void
     339                 :           0 : debug_rdg (struct graph *rdg)
     340                 :             : {
     341                 :           0 :   dump_rdg (stderr, rdg);
     342                 :           0 : }
     343                 :             : 
     344                 :             : static void
     345                 :           0 : dot_rdg_1 (FILE *file, struct graph *rdg)
     346                 :             : {
     347                 :           0 :   int i;
     348                 :           0 :   pretty_printer buffer;
     349                 :           0 :   pp_needs_newline (&buffer) = false;
     350                 :           0 :   buffer.buffer->stream = file;
     351                 :             : 
     352                 :           0 :   fprintf (file, "digraph RDG {\n");
     353                 :             : 
     354                 :           0 :   for (i = 0; i < rdg->n_vertices; i++)
     355                 :             :     {
     356                 :           0 :       struct vertex *v = &(rdg->vertices[i]);
     357                 :           0 :       struct graph_edge *e;
     358                 :             : 
     359                 :           0 :       fprintf (file, "%d [label=\"[%d] ", i, i);
     360                 :           0 :       pp_gimple_stmt_1 (&buffer, RDGV_STMT (v), 0, TDF_SLIM);
     361                 :           0 :       pp_flush (&buffer);
     362                 :           0 :       fprintf (file, "\"]\n");
     363                 :             : 
     364                 :             :       /* Highlight reads from memory.  */
     365                 :           0 :       if (RDG_MEM_READS_STMT (rdg, i))
     366                 :           0 :        fprintf (file, "%d [style=filled, fillcolor=green]\n", i);
     367                 :             : 
     368                 :             :       /* Highlight stores to memory.  */
     369                 :           0 :       if (RDG_MEM_WRITE_STMT (rdg, i))
     370                 :           0 :        fprintf (file, "%d [style=filled, fillcolor=red]\n", i);
     371                 :             : 
     372                 :           0 :       if (v->succ)
     373                 :           0 :        for (e = v->succ; e; e = e->succ_next)
     374                 :           0 :          switch (RDGE_TYPE (e))
     375                 :             :            {
     376                 :           0 :            case flow_dd:
      377                 :             :              /* These are the most common dependences: don't print a label for them.  */
     378                 :           0 :              fprintf (file, "%d -> %d \n", i, e->dest);
     379                 :           0 :              break;
     380                 :             : 
     381                 :           0 :            case control_dd:
     382                 :           0 :              fprintf (file, "%d -> %d [label=control] \n", i, e->dest);
     383                 :           0 :              break;
     384                 :             : 
     385                 :           0 :            default:
     386                 :           0 :              gcc_unreachable ();
     387                 :             :            }
     388                 :             :     }
     389                 :             : 
     390                 :           0 :   fprintf (file, "}\n\n");
     391                 :           0 : }
     392                 :             : 
     393                 :             : /* Display the Reduced Dependence Graph using dotty.  */
     394                 :             : 
     395                 :             : DEBUG_FUNCTION void
     396                 :           0 : dot_rdg (struct graph *rdg)
     397                 :             : {
     398                 :             :   /* When debugging, you may want to enable the following code.  */
     399                 :             : #ifdef HAVE_POPEN
     400                 :           0 :   FILE *file = popen ("dot -Tx11", "w");
     401                 :           0 :   if (!file)
     402                 :             :     return;
     403                 :           0 :   dot_rdg_1 (file, rdg);
     404                 :           0 :   fflush (file);
     405                 :           0 :   close (fileno (file));
     406                 :           0 :   pclose (file);
     407                 :             : #else
     408                 :             :   dot_rdg_1 (stderr, rdg);
     409                 :             : #endif
     410                 :             : }
     411                 :             : 
     412                 :             : /* Returns the index of STMT in RDG.  */
     413                 :             : 
     414                 :             : static int
     415                 :    12772596 : rdg_vertex_for_stmt (struct graph *rdg ATTRIBUTE_UNUSED, gimple *stmt)
     416                 :             : {
     417                 :    12772596 :   int index = gimple_uid (stmt);
     418                 :    12772596 :   gcc_checking_assert (index == -1 || RDG_STMT (rdg, index) == stmt);
     419                 :    12772596 :   return index;
     420                 :             : }
     421                 :             : 
     422                 :             : /* Creates dependence edges in RDG for all the uses of DEF.  IDEF is
     423                 :             :    the index of DEF in RDG.  */
     424                 :             : 
     425                 :             : static void
     426                 :     1543760 : create_rdg_edges_for_scalar (struct graph *rdg, tree def, int idef)
     427                 :             : {
     428                 :     1543760 :   use_operand_p imm_use_p;
     429                 :     1543760 :   imm_use_iterator iterator;
     430                 :             : 
     431                 :     4146220 :   FOR_EACH_IMM_USE_FAST (imm_use_p, iterator, def)
     432                 :             :     {
     433                 :     2602460 :       struct graph_edge *e;
     434                 :     2602460 :       int use = rdg_vertex_for_stmt (rdg, USE_STMT (imm_use_p));
     435                 :             : 
     436                 :     2602460 :       if (use < 0)
     437                 :      348891 :         continue;
     438                 :             : 
     439                 :     2253569 :       e = add_edge (rdg, idef, use);
     440                 :     2253569 :       e->data = XNEW (struct rdg_edge);
     441                 :     2253569 :       RDGE_TYPE (e) = flow_dd;
     442                 :             :     }
     443                 :     1543760 : }
     444                 :             : 
     445                 :             : /* Creates an edge for the control dependences of BB to the vertex V.  */
     446                 :             : 
     447                 :             : static void
     448                 :     1943568 : create_edge_for_control_dependence (struct graph *rdg, basic_block bb,
     449                 :             :                                     int v, control_dependences *cd)
     450                 :             : {
     451                 :     1943568 :   bitmap_iterator bi;
     452                 :     1943568 :   unsigned edge_n;
     453                 :     5514169 :   EXECUTE_IF_SET_IN_BITMAP (cd->get_edges_dependent_on (bb->index),
     454                 :             :                             0, edge_n, bi)
     455                 :             :     {
     456                 :     3570601 :       basic_block cond_bb = cd->get_edge_src (edge_n);
     457                 :     3570601 :       gimple *stmt = *gsi_last_bb (cond_bb);
     458                 :     3570601 :       if (stmt && is_ctrl_stmt (stmt))
     459                 :             :         {
     460                 :     3205852 :           struct graph_edge *e;
     461                 :     3205852 :           int c = rdg_vertex_for_stmt (rdg, stmt);
     462                 :     3205852 :           if (c < 0)
     463                 :     1030396 :             continue;
     464                 :             : 
     465                 :     2175456 :           e = add_edge (rdg, c, v);
     466                 :     2175456 :           e->data = XNEW (struct rdg_edge);
     467                 :     2175456 :           RDGE_TYPE (e) = control_dd;
     468                 :             :         }
     469                 :             :     }
     470                 :     1943568 : }
     471                 :             : 
      472                 :             : /* Creates the flow dependence edges of the reduced dependence graph RDG.  */
     473                 :             : 
     474                 :             : static void
     475                 :      120133 : create_rdg_flow_edges (struct graph *rdg)
     476                 :             : {
     477                 :      120133 :   int i;
     478                 :      120133 :   def_operand_p def_p;
     479                 :      120133 :   ssa_op_iter iter;
     480                 :             : 
     481                 :     2006944 :   for (i = 0; i < rdg->n_vertices; i++)
     482                 :     5317382 :     FOR_EACH_PHI_OR_STMT_DEF (def_p, RDG_STMT (rdg, i),
     483                 :             :                               iter, SSA_OP_DEF)
     484                 :     1543760 :       create_rdg_edges_for_scalar (rdg, DEF_FROM_PTR (def_p), i);
     485                 :      120133 : }
     486                 :             : 
      487                 :             : /* Creates the control dependence edges of the reduced dependence graph RDG.  */
     488                 :             : 
     489                 :             : static void
     490                 :      114653 : create_rdg_cd_edges (struct graph *rdg, control_dependences *cd, loop_p loop)
     491                 :             : {
     492                 :      114653 :   int i;
     493                 :             : 
     494                 :     1963586 :   for (i = 0; i < rdg->n_vertices; i++)
     495                 :             :     {
     496                 :     1848933 :       gimple *stmt = RDG_STMT (rdg, i);
     497                 :     1848933 :       if (gimple_code (stmt) == GIMPLE_PHI)
     498                 :             :         {
     499                 :      348627 :           edge_iterator ei;
     500                 :      348627 :           edge e;
     501                 :     1043326 :           FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->preds)
     502                 :      694699 :             if (flow_bb_inside_loop_p (loop, e->src))
     503                 :      443262 :               create_edge_for_control_dependence (rdg, e->src, i, cd);
     504                 :             :         }
     505                 :             :       else
     506                 :     1500306 :         create_edge_for_control_dependence (rdg, gimple_bb (stmt), i, cd);
     507                 :             :     }
     508                 :      114653 : }
     509                 :             : 
     510                 :             : 
     511                 :             : class loop_distribution
     512                 :             : {
     513                 :             :   private:
     514                 :             :   /* The loop (nest) to be distributed.  */
     515                 :             :   vec<loop_p> loop_nest;
     516                 :             : 
     517                 :             :   /* Vector of data references in the loop to be distributed.  */
     518                 :             :   vec<data_reference_p> datarefs_vec;
     519                 :             : 
      520                 :             :   /* True if there is a nonaddressable data reference in the above vector.  */
     521                 :             :   bool has_nonaddressable_dataref_p;
     522                 :             : 
     523                 :             :   /* Store index of data reference in aux field.  */
     524                 :             : 
     525                 :             :   /* Hash table for data dependence relation in the loop to be distributed.  */
     526                 :             :   hash_table<ddr_hasher> *ddrs_table;
     527                 :             : 
     528                 :             :   /* Array mapping basic block's index to its topological order.  */
     529                 :             :   int *bb_top_order_index;
     530                 :             :   /* And size of the array.  */
     531                 :             :   int bb_top_order_index_size;
     532                 :             : 
     533                 :             :   /* Build the vertices of the reduced dependence graph RDG.  Return false
     534                 :             :      if that failed.  */
     535                 :             :   bool create_rdg_vertices (struct graph *rdg, const vec<gimple *> &stmts,
     536                 :             :                             loop_p loop);
     537                 :             : 
     538                 :             :   /* Initialize STMTS with all the statements of LOOP.  We use topological
     539                 :             :      order to discover all statements.  The order is important because
     540                 :             :      generate_loops_for_partition is using the same traversal for identifying
     541                 :             :      statements in loop copies.  */
     542                 :             :   void stmts_from_loop (class loop *loop, vec<gimple *> *stmts);
     543                 :             : 
     544                 :             : 
     545                 :             :   /* Build the Reduced Dependence Graph (RDG) with one vertex per statement of
     546                 :             :      LOOP, and one edge per flow dependence or control dependence from control
      547                 :             :      dependence CD.  While visiting each statement, data references are also
     548                 :             :      collected and recorded in global data DATAREFS_VEC.  */
     549                 :             :   struct graph * build_rdg (class loop *loop, control_dependences *cd);
     550                 :             : 
     551                 :             : /* Merge PARTITION into the partition DEST.  RDG is the reduced dependence
      552                 :             :    graph; if it is non-NULL we update the type of the result partition.  */
     553                 :             :   void partition_merge_into (struct graph *rdg,
     554                 :             :                              partition *dest, partition *partition,
     555                 :             :                              enum fuse_type ft);
     556                 :             : 
     557                 :             : 
      558                 :             :   /* Return the data dependence relation for data references A and B.  The
      559                 :             :      two data references must be in lexicographic order with respect to the
      560                 :             :      reduced dependence graph RDG.  We first look up the ddr in the global
      561                 :             :      ddr hash table; if it isn't there, compute the ddr and cache it.  */
     562                 :             :   data_dependence_relation * get_data_dependence (struct graph *rdg,
     563                 :             :                                                   data_reference_p a,
     564                 :             :                                                   data_reference_p b);
     565                 :             : 
     566                 :             : 
      567                 :             :   /* In the reduced dependence graph RDG for loop distribution, return true
      568                 :             :      if the dependence between references DR1 and DR2 leads to a dependence
      569                 :             :      cycle and such a cycle can't be resolved by a runtime alias check.  */
     570                 :             :   bool data_dep_in_cycle_p (struct graph *rdg, data_reference_p dr1,
     571                 :             :                             data_reference_p dr2);
     572                 :             : 
     573                 :             : 
     574                 :             :   /* Given reduced dependence graph RDG, PARTITION1 and PARTITION2, update
     575                 :             :      PARTITION1's type after merging PARTITION2 into PARTITION1.  */
     576                 :             :   void update_type_for_merge (struct graph *rdg,
     577                 :             :                               partition *partition1, partition *partition2);
     578                 :             : 
     579                 :             : 
     580                 :             :   /* Returns a partition with all the statements needed for computing
     581                 :             :      the vertex V of the RDG, also including the loop exit conditions.  */
     582                 :             :   partition *build_rdg_partition_for_vertex (struct graph *rdg, int v);
     583                 :             : 
      584                 :             :   /* Given data references DST_DR and SRC_DR in loop nest LOOP and RDG,
      585                 :             :      classify whether they form a builtin memcpy or memmove call.  */
     586                 :             :   void classify_builtin_ldst (loop_p loop, struct graph *rdg, partition *partition,
     587                 :             :                               data_reference_p dst_dr, data_reference_p src_dr);
     588                 :             : 
     589                 :             :   /* Classifies the builtin kind we can generate for PARTITION of RDG and LOOP.
     590                 :             :      For the moment we detect memset, memcpy and memmove patterns.  Bitmap
     591                 :             :      STMT_IN_ALL_PARTITIONS contains statements belonging to all partitions.
     592                 :             :      Returns true if there is a reduction in all partitions and we
     593                 :             :      possibly did not mark PARTITION as having one for this reason.  */
     594                 :             : 
     595                 :             :   bool
     596                 :             :   classify_partition (loop_p loop,
     597                 :             :                       struct graph *rdg, partition *partition,
     598                 :             :                       bitmap stmt_in_all_partitions);
     599                 :             : 
     600                 :             : 
     601                 :             :   /* Returns true when PARTITION1 and PARTITION2 access the same memory
     602                 :             :      object in RDG.  */
     603                 :             :   bool share_memory_accesses (struct graph *rdg,
     604                 :             :                               partition *partition1, partition *partition2);
     605                 :             : 
      606                 :             :   /* For each seed statement in STARTING_STMTS, this function builds a
      607                 :             :      partition for it by adding dependent statements according to RDG.
     608                 :             :      All partitions are recorded in PARTITIONS.  */
     609                 :             :   void rdg_build_partitions (struct graph *rdg,
     610                 :             :                              vec<gimple *> starting_stmts,
     611                 :             :                              vec<partition *> *partitions);
     612                 :             : 
      613                 :             :   /* Compute the partition dependence created by the data references in DRS1
      614                 :             :      and DRS2, and modify and return DIR according to that.  If ALIAS_DDRS is
      615                 :             :      not NULL, we record dependences introduced by possible aliasing between
      616                 :             :      two data references in ALIAS_DDRS; otherwise, we simply ignore such
      617                 :             :      dependences as if they didn't exist at all.  */
     618                 :             :   int pg_add_dependence_edges (struct graph *rdg, int dir, bitmap drs1,
     619                 :             :                                bitmap drs2, vec<ddr_p> *alias_ddrs);
     620                 :             : 
     621                 :             : 
      622                 :             :   /* Build and return the partition dependence graph for PARTITIONS.  RDG is
      623                 :             :      the reduced dependence graph for the loop to be distributed.  If
      624                 :             :      IGNORE_ALIAS_P is true, data dependences caused by possible aliasing
      625                 :             :      between references are ignored, as if they didn't exist at all;
      626                 :             :      otherwise all dependences are considered.  */
     627                 :             :   struct graph *build_partition_graph (struct graph *rdg,
     628                 :             :                                        vec<struct partition *> *partitions,
     629                 :             :                                        bool ignore_alias_p);
     630                 :             : 
      631                 :             :   /* Given the reduced dependence graph RDG, merge the strongly connected
      632                 :             :      components of PARTITIONS.  If IGNORE_ALIAS_P is true, data dependences
      633                 :             :      caused by possible aliasing between references are ignored, as if they
      634                 :             :      didn't exist at all; otherwise all dependences are considered.  */
     635                 :             :   void merge_dep_scc_partitions (struct graph *rdg, vec<struct partition *>
     636                 :             :                                  *partitions, bool ignore_alias_p);
     637                 :             : 
      638                 :             : /* This is the main function for breaking strongly connected components in
      639                 :             :    PARTITIONS, given the reduced dependence graph RDG.  Store data dependence
      640                 :             :    relations for the runtime alias check in ALIAS_DDRS.  */
     641                 :             :   void break_alias_scc_partitions (struct graph *rdg, vec<struct partition *>
     642                 :             :                                    *partitions, vec<ddr_p> *alias_ddrs);
     643                 :             : 
     644                 :             : 
     645                 :             :   /* Fuse PARTITIONS of LOOP if necessary before finalizing distribution.
      646                 :             :      ALIAS_DDRS contains the ddrs which need a runtime alias check.  */
     647                 :             :   void finalize_partitions (class loop *loop, vec<struct partition *>
     648                 :             :                             *partitions, vec<ddr_p> *alias_ddrs);
     649                 :             : 
     650                 :             :   /* Distributes the code from LOOP in such a way that producer statements
     651                 :             :      are placed before consumer statements.  Tries to separate only the
     652                 :             :      statements from STMTS into separate loops.  Returns the number of
     653                 :             :      distributed loops.  Set NB_CALLS to number of generated builtin calls.
     654                 :             :      Set *DESTROY_P to whether LOOP needs to be destroyed.  */
     655                 :             :   int distribute_loop (class loop *loop, const vec<gimple *> &stmts,
     656                 :             :                        control_dependences *cd, int *nb_calls, bool *destroy_p,
     657                 :             :                        bool only_patterns_p);
     658                 :             : 
     659                 :             :   /* Transform loops which mimic the effects of builtins rawmemchr or strlen and
     660                 :             :      replace them accordingly.  */
     661                 :             :   bool transform_reduction_loop (loop_p loop);
     662                 :             : 
     663                 :             :   /* Compute topological order for basic blocks.  Topological order is
     664                 :             :      needed because data dependence is computed for data references in
     665                 :             :      lexicographical order.  */
     666                 :             :   void bb_top_order_init (void);
     667                 :             : 
     668                 :             :   void bb_top_order_destroy (void);
     669                 :             : 
     670                 :             :   public:
     671                 :             : 
     672                 :             :   /* Getter for bb_top_order.  */
     673                 :             : 
     674                 :     2412466 :   inline int get_bb_top_order_index_size (void)
     675                 :             :     {
     676                 :     2412466 :       return bb_top_order_index_size;
     677                 :             :     }
     678                 :             : 
     679                 :     4824932 :   inline int get_bb_top_order_index (int i)
     680                 :             :     {
     681                 :     4824932 :       return bb_top_order_index[i];
     682                 :             :     }
     683                 :             : 
     684                 :             :   unsigned int execute (function *fun);
     685                 :             : };
     686                 :             : 
     687                 :             : 
      688                 :             : /* If X has a smaller topological sort number than Y, returns a negative
      689                 :             :    value; if greater, returns a positive value.  */
     690                 :             : static int
     691                 :     2412466 : bb_top_order_cmp_r (const void *x, const void *y, void *loop)
     692                 :             : {
     693                 :     2412466 :   loop_distribution *_loop =
     694                 :             :     (loop_distribution *) loop;
     695                 :             : 
     696                 :     2412466 :   basic_block bb1 = *(const basic_block *) x;
     697                 :     2412466 :   basic_block bb2 = *(const basic_block *) y;
     698                 :             : 
     699                 :     2412466 :   int bb_top_order_index_size = _loop->get_bb_top_order_index_size ();
     700                 :             : 
     701                 :     2412466 :   gcc_assert (bb1->index < bb_top_order_index_size
     702                 :             :               && bb2->index < bb_top_order_index_size);
     703                 :     2412466 :   gcc_assert (bb1 == bb2
     704                 :             :               || _loop->get_bb_top_order_index(bb1->index)
     705                 :             :                  != _loop->get_bb_top_order_index(bb2->index));
     706                 :             : 
     707                 :     2412466 :   return (_loop->get_bb_top_order_index(bb1->index) - 
     708                 :     2412466 :           _loop->get_bb_top_order_index(bb2->index));
     709                 :             : }
     710                 :             : 
     711                 :             : bool
     712                 :      123147 : loop_distribution::create_rdg_vertices (struct graph *rdg,
     713                 :             :                                         const vec<gimple *> &stmts,
     714                 :             :                                         loop_p loop)
     715                 :             : {
     716                 :      123147 :   int i;
     717                 :      123147 :   gimple *stmt;
     718                 :             : 
     719                 :     2030895 :   FOR_EACH_VEC_ELT (stmts, i, stmt)
     720                 :             :     {
     721                 :     1910762 :       struct vertex *v = &(rdg->vertices[i]);
     722                 :             : 
     723                 :             :       /* Record statement to vertex mapping.  */
     724                 :     1910762 :       gimple_set_uid (stmt, i);
     725                 :             : 
     726                 :     1910762 :       v->data = XNEW (struct rdg_vertex);
     727                 :     1910762 :       RDGV_STMT (v) = stmt;
     728                 :     1910762 :       RDGV_DATAREFS (v).create (0);
     729                 :     1910762 :       RDGV_HAS_MEM_WRITE (v) = false;
     730                 :     1910762 :       RDGV_HAS_MEM_READS (v) = false;
     731                 :     1910762 :       if (gimple_code (stmt) == GIMPLE_PHI)
     732                 :      367062 :         continue;
     733                 :             : 
     734                 :     1543700 :       unsigned drp = datarefs_vec.length ();
     735                 :     1543700 :       if (!find_data_references_in_stmt (loop, stmt, &datarefs_vec))
     736                 :             :         return false;
     737                 :     3818644 :       for (unsigned j = drp; j < datarefs_vec.length (); ++j)
     738                 :             :         {
     739                 :      368636 :           data_reference_p dr = datarefs_vec[j];
     740                 :      368636 :           if (DR_IS_READ (dr))
     741                 :      203435 :             RDGV_HAS_MEM_READS (v) = true;
     742                 :             :           else
     743                 :      165201 :             RDGV_HAS_MEM_WRITE (v) = true;
     744                 :      368636 :           RDGV_DATAREFS (v).safe_push (dr);
     745                 :      368636 :           has_nonaddressable_dataref_p |= may_be_nonaddressable_p (dr->ref);
     746                 :             :         }
     747                 :             :     }
     748                 :             :   return true;
     749                 :             : }
     750                 :             : 
     751                 :             : void
     752                 :      123147 : loop_distribution::stmts_from_loop (class loop *loop, vec<gimple *> *stmts)
     753                 :             : {
     754                 :      123147 :   unsigned int i;
     755                 :      123147 :   basic_block *bbs = get_loop_body_in_custom_order (loop, this, bb_top_order_cmp_r);
     756                 :             : 
     757                 :      519550 :   for (i = 0; i < loop->num_nodes; i++)
     758                 :             :     {
     759                 :      396403 :       basic_block bb = bbs[i];
     760                 :             : 
     761                 :      887745 :       for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
     762                 :      491342 :            gsi_next (&bsi))
     763                 :      982684 :         if (!virtual_operand_p (gimple_phi_result (bsi.phi ())))
     764                 :      368196 :           stmts->safe_push (bsi.phi ());
     765                 :             : 
     766                 :     3092096 :       for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
     767                 :     2299290 :            gsi_next (&bsi))
     768                 :             :         {
     769                 :     2299290 :           gimple *stmt = gsi_stmt (bsi);
     770                 :     2299290 :           if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
     771                 :     1567711 :             stmts->safe_push (stmt);
     772                 :             :         }
     773                 :             :     }
     774                 :             : 
     775                 :      123147 :   free (bbs);
     776                 :      123147 : }
     777                 :             : 
     778                 :             : /* Free the reduced dependence graph RDG.  */
     779                 :             : 
     780                 :             : static void
     781                 :      123147 : free_rdg (struct graph *rdg)
     782                 :             : {
     783                 :      123147 :   int i;
     784                 :             : 
     785                 :     2059054 :   for (i = 0; i < rdg->n_vertices; i++)
     786                 :             :     {
     787                 :     1935907 :       struct vertex *v = &(rdg->vertices[i]);
     788                 :     1935907 :       struct graph_edge *e;
     789                 :             : 
     790                 :     6364932 :       for (e = v->succ; e; e = e->succ_next)
     791                 :     4429025 :         free (e->data);
     792                 :             : 
     793                 :     1935907 :       if (v->data)
     794                 :             :         {
     795                 :     1910762 :           gimple_set_uid (RDGV_STMT (v), -1);
     796                 :     1910762 :           (RDGV_DATAREFS (v)).release ();
     797                 :     1910762 :           free (v->data);
     798                 :             :         }
     799                 :             :     }
     800                 :             : 
     801                 :      123147 :   free_graph (rdg);
     802                 :      123147 : }
     803                 :             : 
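                          :             : /* Build the reduced dependence graph RDG for LOOP: create one vertex
                          :             :    per statement, add def-use (flow) edges and, when control dependences
                          :             :    CD are given, control dependence edges.  Return NULL if the data
                          :             :    references of the loop cannot be analyzed.  */
                          :             : 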
     804                 :             : struct graph *
     805                 :      123147 : loop_distribution::build_rdg (class loop *loop, control_dependences *cd)
     806                 :             : {
     807                 :      123147 :   struct graph *rdg;
     808                 :             : 
     809                 :             :   /* Create the RDG vertices from the stmts of the loop nest.  */
     810                 :      123147 :   auto_vec<gimple *, 10> stmts;
     811                 :      123147 :   stmts_from_loop (loop, &stmts);
     812                 :      246294 :   rdg = new_graph (stmts.length ());
     813                 :      123147 :   if (!create_rdg_vertices (rdg, stmts, loop))
     814                 :             :     {
     815                 :        3014 :       free_rdg (rdg);
     816                 :        3014 :       return NULL;
     817                 :             :     }
     818                 :      120133 :   stmts.release ();
     819                 :             : 
     820                 :      120133 :   create_rdg_flow_edges (rdg);
     821                 :      120133 :   if (cd)
     822                 :      114653 :     create_rdg_cd_edges (rdg, cd, loop);
     823                 :             : 
     824                 :             :   return rdg;
     825                 :      123147 : }
     826                 :             : 
     827                 :             : 
     828                 :             : /* Allocate and initialize a partition from BITMAP.  */
     829                 :             : 
     830                 :             : static partition *
     831                 :      204085 : partition_alloc (void)
     832                 :             : {
     833                 :      204085 :   partition *partition = XCNEW (struct partition);
     834                 :      204085 :   partition->stmts = BITMAP_ALLOC (NULL);
     835                 :      204085 :   partition->reduction_p = false;
     836                 :      204085 :   partition->loc = UNKNOWN_LOCATION;
     837                 :      204085 :   partition->kind = PKIND_NORMAL;
     838                 :      204085 :   partition->type = PTYPE_PARALLEL;
     839                 :      204085 :   partition->datarefs = BITMAP_ALLOC (NULL);
     840                 :      204085 :   return partition;
     841                 :             : }
     842                 :             : 
     843                 :             : /* Free PARTITION.  */
     844                 :             : 
     845                 :             : static void
     846                 :      204085 : partition_free (partition *partition)
     847                 :             : {
     848                 :      204085 :   BITMAP_FREE (partition->stmts);
     849                 :      204085 :   BITMAP_FREE (partition->datarefs);
     850                 :      204085 :   if (partition->builtin)
     851                 :       11260 :     free (partition->builtin);
     852                 :             : 
     853                 :      204085 :   free (partition);
     854                 :      204085 : }
     855                 :             : 
     856                 :             : /* Returns true if the partition can be generated as a builtin.  */
     857                 :             : 
     858                 :             : static bool
     859                 :      293952 : partition_builtin_p (partition *partition)
     860                 :             : {
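                          :             :   /* I.e. its kind is PKIND_MEMSET, PKIND_MEMCPY or PKIND_MEMMOVE.  */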
     861                 :      293952 :   return partition->kind > PKIND_PARTIAL_MEMSET;
     862                 :             : }
     863                 :             : 
     864                 :             : /* Returns true if the partition contains a reduction.  */
     865                 :             : 
     866                 :             : static bool
     867                 :     2622046 : partition_reduction_p (partition *partition)
     868                 :             : {
     869                 :     2622046 :   return partition->reduction_p;
     870                 :             : }
     871                 :             : 
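                          :             : /* Fuse PARTITION into DEST: union their statement and data reference
                          :             :    bitmaps, propagate the reduction flag and re-check whether the merged
                          :             :    partition can still be executed in parallel.  FT records why the
                          :             :    partitions are fused, for dumping purposes.  */
                          :             : 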
     872                 :             : void
     873                 :       28079 : loop_distribution::partition_merge_into (struct graph *rdg,
     874                 :             :                       partition *dest, partition *partition, enum fuse_type ft)
     875                 :             : {
     876                 :       28079 :   if (dump_file && (dump_flags & TDF_DETAILS))
     877                 :             :     {
     878                 :          42 :       fprintf (dump_file, "Fuse partitions because %s:\n", fuse_message[ft]);
     879                 :          42 :       fprintf (dump_file, "  Part 1: ");
     880                 :          42 :       dump_bitmap (dump_file, dest->stmts);
     881                 :          42 :       fprintf (dump_file, "  Part 2: ");
     882                 :          42 :       dump_bitmap (dump_file, partition->stmts);
     883                 :             :     }
     884                 :             : 
     885                 :       28079 :   dest->kind = PKIND_NORMAL;
     886                 :       28079 :   if (dest->type == PTYPE_PARALLEL)
     887                 :       22593 :     dest->type = partition->type;
     888                 :             : 
     889                 :       28079 :   bitmap_ior_into (dest->stmts, partition->stmts);
     890                 :       28079 :   if (partition_reduction_p (partition))
     891                 :        2835 :     dest->reduction_p = true;
     892                 :             : 
     893                 :             :   /* Further check if any data dependence prevents us from executing the
      894                 :             :      new partition in parallel.  */
     895                 :       28079 :   if (dest->type == PTYPE_PARALLEL && rdg != NULL)
     896                 :        5361 :     update_type_for_merge (rdg, dest, partition);
     897                 :             : 
     898                 :       28079 :   bitmap_ior_into (dest->datarefs, partition->datarefs);
     899                 :       28079 : }
     900                 :             : 
     901                 :             : 
     902                 :             : /* Returns true when DEF is an SSA_NAME defined in LOOP and used after
     903                 :             :    the LOOP.  */
     904                 :             : 
     905                 :             : static bool
     906                 :     4176754 : ssa_name_has_uses_outside_loop_p (tree def, loop_p loop)
     907                 :             : {
     908                 :     4176754 :   imm_use_iterator imm_iter;
     909                 :     4176754 :   use_operand_p use_p;
     910                 :             : 
     911                 :    12297515 :   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
     912                 :             :     {
     913                 :     8235211 :       if (is_gimple_debug (USE_STMT (use_p)))
     914                 :      780477 :         continue;
     915                 :             : 
     916                 :     7454734 :       basic_block use_bb = gimple_bb (USE_STMT (use_p));
     917                 :     7454734 :       if (!flow_bb_inside_loop_p (loop, use_bb))
     918                 :             :         return true;
     919                 :             :     }
     920                 :             : 
     921                 :             :   return false;
     922                 :             : }
     923                 :             : 
     924                 :             : /* Returns true when STMT defines a scalar variable used after the
     925                 :             :    loop LOOP.  */
     926                 :             : 
     927                 :             : static bool
     928                 :     5915799 : stmt_has_scalar_dependences_outside_loop (loop_p loop, gimple *stmt)
     929                 :             : {
     930                 :     5915799 :   def_operand_p def_p;
     931                 :     5915799 :   ssa_op_iter op_iter;
     932                 :             : 
     933                 :     5915799 :   if (gimple_code (stmt) == GIMPLE_PHI)
     934                 :      984265 :     return ssa_name_has_uses_outside_loop_p (gimple_phi_result (stmt), loop);
     935                 :             : 
     936                 :     8054115 :   FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_DEF)
     937                 :     3192489 :     if (ssa_name_has_uses_outside_loop_p (DEF_FROM_PTR (def_p), loop))
     938                 :             :       return true;
     939                 :             : 
     940                 :             :   return false;
     941                 :             : }
     942                 :             : 
     943                 :             : /* Return a copy of LOOP placed before LOOP.  */
     944                 :             : 
     945                 :             : static class loop *
     946                 :         942 : copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
     947                 :             : {
     948                 :         942 :   class loop *res;
     949                 :         942 :   edge preheader = loop_preheader_edge (loop);
     950                 :             : 
     951                 :         942 :   initialize_original_copy_tables ();
     952                 :         942 :   res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
     953                 :             :                                                 NULL, preheader, NULL, false);
     954                 :         942 :   gcc_assert (res != NULL);
     955                 :             : 
      956                 :             :   /* When a partition other than the last one is supposed to keep the
      957                 :             :      LC PHIs computed, adjust their definitions.  */
     958                 :         942 :   if (redirect_lc_phi_defs)
     959                 :             :     {
     960                 :           0 :       edge exit = single_exit (loop);
     961                 :           0 :       for (gphi_iterator si = gsi_start_phis (exit->dest); !gsi_end_p (si);
     962                 :           0 :            gsi_next (&si))
     963                 :             :         {
     964                 :           0 :           gphi *phi = si.phi ();
     965                 :           0 :           if (virtual_operand_p (gimple_phi_result (phi)))
     966                 :           0 :             continue;
     967                 :           0 :           use_operand_p use_p = PHI_ARG_DEF_PTR_FROM_EDGE (phi, exit);
     968                 :           0 :           tree new_def = get_current_def (USE_FROM_PTR (use_p));
     969                 :           0 :           SET_USE (use_p, new_def);
     970                 :             :         }
     971                 :             :     }
     972                 :             : 
     973                 :         942 :   free_original_copy_tables ();
     974                 :         942 :   delete_update_ssa ();
     975                 :             : 
     976                 :         942 :   return res;
     977                 :             : }
     978                 :             : 
     979                 :             : /* Creates an empty basic block after LOOP.  */
     980                 :             : 
     981                 :             : static void
     982                 :         942 : create_bb_after_loop (class loop *loop)
     983                 :             : {
     984                 :         942 :   edge exit = single_exit (loop);
     985                 :             : 
     986                 :         942 :   if (!exit)
     987                 :             :     return;
     988                 :             : 
     989                 :         942 :   split_edge (exit);
     990                 :             : }
     991                 :             : 
     992                 :             : /* Generate code for PARTITION from the code in LOOP.  The loop is
     993                 :             :    copied when COPY_P is true.  All the statements not flagged in the
     994                 :             :    PARTITION bitmap are removed from the loop or from its copy.  The
     995                 :             :    statements are indexed in sequence inside a basic block, and the
     996                 :             :    basic blocks of a loop are taken in dom order.  */
     997                 :             : 
     998                 :             : static void
     999                 :        2541 : generate_loops_for_partition (class loop *loop, partition *partition,
    1000                 :             :                               bool copy_p, bool keep_lc_phis_p)
    1001                 :             : {
    1002                 :        2541 :   unsigned i;
    1003                 :        2541 :   basic_block *bbs;
    1004                 :             : 
    1005                 :        2541 :   if (copy_p)
    1006                 :             :     {
    1007                 :         942 :       int orig_loop_num = loop->orig_loop_num;
    1008                 :         942 :       loop = copy_loop_before (loop, keep_lc_phis_p);
    1009                 :         942 :       gcc_assert (loop != NULL);
    1010                 :         942 :       loop->orig_loop_num = orig_loop_num;
    1011                 :         942 :       create_preheader (loop, CP_SIMPLE_PREHEADERS);
    1012                 :         942 :       create_bb_after_loop (loop);
    1013                 :             :     }
    1014                 :             :   else
    1015                 :             :     {
    1016                 :             :       /* Origin number is set to the new versioned loop's num.  */
    1017                 :        1599 :       gcc_assert (loop->orig_loop_num != loop->num);
    1018                 :             :     }
    1019                 :             : 
    1020                 :             :   /* Remove stmts not in the PARTITION bitmap.  */
    1021                 :        2541 :   bbs = get_loop_body_in_dom_order (loop);
    1022                 :             : 
    1023                 :        2541 :   if (MAY_HAVE_DEBUG_BIND_STMTS)
    1024                 :        6362 :     for (i = 0; i < loop->num_nodes; i++)
    1025                 :             :       {
    1026                 :        4897 :         basic_block bb = bbs[i];
    1027                 :             : 
    1028                 :        9348 :         for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
    1029                 :        4451 :              gsi_next (&bsi))
    1030                 :             :           {
    1031                 :        4451 :             gphi *phi = bsi.phi ();
    1032                 :        7427 :             if (!virtual_operand_p (gimple_phi_result (phi))
    1033                 :        4451 :                 && !bitmap_bit_p (partition->stmts, gimple_uid (phi)))
    1034                 :         302 :               reset_debug_uses (phi);
    1035                 :             :           }
    1036                 :             : 
    1037                 :       41883 :         for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
    1038                 :             :           {
    1039                 :       32089 :             gimple *stmt = gsi_stmt (bsi);
    1040                 :       32089 :             if (gimple_code (stmt) != GIMPLE_LABEL
    1041                 :       32089 :                 && !is_gimple_debug (stmt)
    1042                 :       52196 :                 && !bitmap_bit_p (partition->stmts, gimple_uid (stmt)))
    1043                 :        5198 :               reset_debug_uses (stmt);
    1044                 :             :           }
    1045                 :             :       }
    1046                 :             : 
    1047                 :       11250 :   for (i = 0; i < loop->num_nodes; i++)
    1048                 :             :     {
    1049                 :        8709 :       basic_block bb = bbs[i];
    1050                 :        8709 :       edge inner_exit = NULL;
    1051                 :             : 
    1052                 :        8709 :       if (loop != bb->loop_father)
    1053                 :         110 :         inner_exit = single_exit (bb->loop_father);
    1054                 :             : 
    1055                 :       16697 :       for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);)
    1056                 :             :         {
    1057                 :        7988 :           gphi *phi = bsi.phi ();
    1058                 :       13310 :           if (!virtual_operand_p (gimple_phi_result (phi))
    1059                 :        7988 :               && !bitmap_bit_p (partition->stmts, gimple_uid (phi)))
    1060                 :         710 :             remove_phi_node (&bsi, true);
    1061                 :             :           else
    1062                 :        7278 :             gsi_next (&bsi);
    1063                 :             :         }
    1064                 :             : 
    1065                 :       65295 :       for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);)
    1066                 :             :         {
    1067                 :       47877 :           gimple *stmt = gsi_stmt (bsi);
    1068                 :       47877 :           if (gimple_code (stmt) != GIMPLE_LABEL
    1069                 :       47869 :               && !is_gimple_debug (stmt)
    1070                 :       83764 :               && !bitmap_bit_p (partition->stmts, gimple_uid (stmt)))
    1071                 :             :             {
     1072                 :             :               /* In distribution of a loop nest, if bb is the inner loop's
     1073                 :             :                  exit_bb, we choose its exit edge/path to avoid generating an
    1074                 :             :                  infinite loop.  For all other cases, we choose an arbitrary
    1075                 :             :                  path through the empty CFG part that this unnecessary
    1076                 :             :                  control stmt controls.  */
    1077                 :       10918 :               if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
    1078                 :             :                 {
    1079                 :         518 :                   if (inner_exit && inner_exit->flags & EDGE_TRUE_VALUE)
    1080                 :           5 :                     gimple_cond_make_true (cond_stmt);
    1081                 :             :                   else
    1082                 :         513 :                     gimple_cond_make_false (cond_stmt);
    1083                 :         518 :                   update_stmt (stmt);
    1084                 :             :                 }
    1085                 :       10400 :               else if (gimple_code (stmt) == GIMPLE_SWITCH)
    1086                 :             :                 {
    1087                 :           0 :                   gswitch *switch_stmt = as_a <gswitch *> (stmt);
    1088                 :           0 :                   gimple_switch_set_index
    1089                 :           0 :                       (switch_stmt, CASE_LOW (gimple_switch_label (switch_stmt, 1)));
    1090                 :           0 :                   update_stmt (stmt);
    1091                 :             :                 }
    1092                 :             :               else
    1093                 :             :                 {
    1094                 :       10400 :                   unlink_stmt_vdef (stmt);
    1095                 :       10400 :                   gsi_remove (&bsi, true);
    1096                 :       10400 :                   release_defs (stmt);
    1097                 :       10400 :                   continue;
    1098                 :             :                 }
    1099                 :             :             }
    1100                 :       37477 :           gsi_next (&bsi);
    1101                 :             :         }
    1102                 :             :     }
    1103                 :             : 
    1104                 :        2541 :   free (bbs);
    1105                 :        2541 : }
    1106                 :             : 
     1107                 :             : /* If VAL's memory representation contains the same value in all bytes,
    1108                 :             :    return that value, otherwise return -1.
    1109                 :             :    E.g. for 0x24242424 return 0x24, for IEEE double
    1110                 :             :    747708026454360457216.0 return 0x44, etc.  */
    1111                 :             : 
    1112                 :             : static int
    1113                 :       62135 : const_with_all_bytes_same (tree val)
    1114                 :             : {
    1115                 :       62135 :   unsigned char buf[64];
    1116                 :       62135 :   int i, len;
    1117                 :             : 
    1118                 :       62135 :   if (integer_zerop (val)
    1119                 :       62135 :       || (TREE_CODE (val) == CONSTRUCTOR
    1120                 :         976 :           && !TREE_CLOBBER_P (val)
    1121                 :       21688 :           && CONSTRUCTOR_NELTS (val) == 0))
    1122                 :             :     return 0;
    1123                 :             : 
    1124                 :       43331 :   if (real_zerop (val))
    1125                 :             :     {
    1126                 :             :       /* Only return 0 for +0.0, not for -0.0, which doesn't have
     1127                 :             :          a memory representation with all bytes the same.  Don't transform
    1128                 :             :          -0.0 stores into +0.0 even for !HONOR_SIGNED_ZEROS.  */
    1129                 :        2912 :       switch (TREE_CODE (val))
    1130                 :             :         {
    1131                 :        2905 :         case REAL_CST:
    1132                 :        2905 :           if (!real_isneg (TREE_REAL_CST_PTR (val)))
    1133                 :             :             return 0;
    1134                 :             :           break;
    1135                 :           0 :         case COMPLEX_CST:
    1136                 :           0 :           if (!const_with_all_bytes_same (TREE_REALPART (val))
    1137                 :           0 :               && !const_with_all_bytes_same (TREE_IMAGPART (val)))
    1138                 :             :             return 0;
    1139                 :             :           break;
    1140                 :           7 :         case VECTOR_CST:
    1141                 :           7 :           {
    1142                 :           7 :             unsigned int count = vector_cst_encoded_nelts (val);
    1143                 :           7 :             unsigned int j;
    1144                 :          21 :             for (j = 0; j < count; ++j)
    1145                 :          14 :               if (const_with_all_bytes_same (VECTOR_CST_ENCODED_ELT (val, j)))
    1146                 :             :                 break;
    1147                 :           7 :             if (j == count)
    1148                 :             :               return 0;
    1149                 :             :             break;
    1150                 :             :           }
    1151                 :             :         default:
    1152                 :             :           break;
    1153                 :             :         }
    1154                 :             :     }
    1155                 :             : 
    1156                 :       40447 :   if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
    1157                 :             :     return -1;
    1158                 :             : 
    1159                 :       40447 :   len = native_encode_expr (val, buf, sizeof (buf));
    1160                 :       40447 :   if (len == 0)
    1161                 :             :     return -1;
    1162                 :       27097 :   for (i = 1; i < len; i++)
    1163                 :       23909 :     if (buf[i] != buf[0])
    1164                 :             :       return -1;
    1165                 :        3188 :   return buf[0];
    1166                 :             : }
    1167                 :             : 
    1168                 :             : /* Generate a call to memset for PARTITION in LOOP.  */
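                          :             : /* For example (an illustrative partition; A and N stand for the
                          :             :    destination array and the element count and are not taken from any
                          :             :    particular test case), a memset partition such as
                          :             : 
                          :             :    |for (i = 0; i < N; i++)
                          :             :    |  A[i] = 0;
                          :             : 
                          :             :    is emitted as a single call
                          :             : 
                          :             :    |memset (A, 0, N * sizeof (A[0]));
                          :             : 
                          :             :    using the byte count and destination address recorded in
                          :             :    PARTITION->builtin.  */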
    1169                 :             : 
    1170                 :             : static void
    1171                 :        7297 : generate_memset_builtin (class loop *loop, partition *partition)
    1172                 :             : {
    1173                 :        7297 :   gimple_stmt_iterator gsi;
    1174                 :        7297 :   tree mem, fn, nb_bytes;
    1175                 :        7297 :   tree val;
    1176                 :        7297 :   struct builtin_info *builtin = partition->builtin;
    1177                 :        7297 :   gimple *fn_call;
    1178                 :             : 
    1179                 :             :   /* The new statements will be placed before LOOP.  */
    1180                 :        7297 :   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
    1181                 :             : 
    1182                 :        7297 :   nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
    1183                 :        7297 :   nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
    1184                 :             :                                        false, GSI_CONTINUE_LINKING);
    1185                 :        7297 :   mem = rewrite_to_non_trapping_overflow (builtin->dst_base);
    1186                 :        7297 :   mem = force_gimple_operand_gsi (&gsi, mem, true, NULL_TREE,
    1187                 :             :                                   false, GSI_CONTINUE_LINKING);
    1188                 :             : 
    1189                 :             :   /* This exactly matches the pattern recognition in classify_partition.  */
    1190                 :        7297 :   val = gimple_assign_rhs1 (DR_STMT (builtin->dst_dr));
    1191                 :             :   /* Handle constants like 0x15151515 and similarly
    1192                 :             :      floating point constants etc. where all bytes are the same.  */
    1193                 :        7297 :   int bytev = const_with_all_bytes_same (val);
    1194                 :        7297 :   if (bytev != -1)
    1195                 :        7185 :     val = build_int_cst (integer_type_node, bytev);
    1196                 :         112 :   else if (TREE_CODE (val) == INTEGER_CST)
    1197                 :           0 :     val = fold_convert (integer_type_node, val);
    1198                 :         112 :   else if (!useless_type_conversion_p (integer_type_node, TREE_TYPE (val)))
    1199                 :             :     {
    1200                 :         112 :       tree tem = make_ssa_name (integer_type_node);
    1201                 :         112 :       gimple *cstmt = gimple_build_assign (tem, NOP_EXPR, val);
    1202                 :         112 :       gsi_insert_after (&gsi, cstmt, GSI_CONTINUE_LINKING);
    1203                 :         112 :       val = tem;
    1204                 :             :     }
    1205                 :             : 
    1206                 :       14594 :   fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
    1207                 :        7297 :   fn_call = gimple_build_call (fn, 3, mem, val, nb_bytes);
    1208                 :        7297 :   gimple_set_location (fn_call, partition->loc);
    1209                 :        7297 :   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
    1210                 :        7297 :   fold_stmt (&gsi);
    1211                 :             : 
    1212                 :        7297 :   if (dump_file && (dump_flags & TDF_DETAILS))
    1213                 :             :     {
    1214                 :          36 :       fprintf (dump_file, "generated memset");
    1215                 :          36 :       if (bytev == 0)
    1216                 :          30 :         fprintf (dump_file, " zero\n");
    1217                 :             :       else
    1218                 :           6 :         fprintf (dump_file, "\n");
    1219                 :             :     }
    1220                 :        7297 : }
    1221                 :             : 
    1222                 :             : /* Generate a call to memcpy for PARTITION in LOOP.  */
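                          :             : /* For example (illustrative; A, B and N are not taken from any
                          :             :    particular test case), a copy partition such as
                          :             : 
                          :             :    |for (i = 0; i < N; i++)
                          :             :    |  A[i] = B[i];
                          :             : 
                          :             :    is emitted as a single memcpy of PARTITION->builtin->size bytes, or
                          :             :    as a memmove when the source and destination regions may overlap.  */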
    1223                 :             : 
    1224                 :             : static void
    1225                 :        3516 : generate_memcpy_builtin (class loop *loop, partition *partition)
    1226                 :             : {
    1227                 :        3516 :   gimple_stmt_iterator gsi;
    1228                 :        3516 :   gimple *fn_call;
    1229                 :        3516 :   tree dest, src, fn, nb_bytes;
    1230                 :        3516 :   enum built_in_function kind;
    1231                 :        3516 :   struct builtin_info *builtin = partition->builtin;
    1232                 :             : 
    1233                 :             :   /* The new statements will be placed before LOOP.  */
    1234                 :        3516 :   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
    1235                 :             : 
    1236                 :        3516 :   nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
    1237                 :        3516 :   nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
    1238                 :             :                                        false, GSI_CONTINUE_LINKING);
    1239                 :        3516 :   dest = rewrite_to_non_trapping_overflow (builtin->dst_base);
    1240                 :        3516 :   src = rewrite_to_non_trapping_overflow (builtin->src_base);
    1241                 :        3516 :   if (partition->kind == PKIND_MEMCPY
    1242                 :        3516 :       || ! ptr_derefs_may_alias_p (dest, src))
    1243                 :             :     kind = BUILT_IN_MEMCPY;
    1244                 :             :   else
    1245                 :         248 :     kind = BUILT_IN_MEMMOVE;
    1246                 :             :   /* Try harder if we're copying a constant size.  */
    1247                 :         248 :   if (kind == BUILT_IN_MEMMOVE && poly_int_tree_p (nb_bytes))
    1248                 :             :     {
    1249                 :         294 :       aff_tree asrc, adest;
    1250                 :         147 :       tree_to_aff_combination (src, ptr_type_node, &asrc);
    1251                 :         147 :       tree_to_aff_combination (dest, ptr_type_node, &adest);
    1252                 :         147 :       aff_combination_scale (&adest, -1);
    1253                 :         147 :       aff_combination_add (&asrc, &adest);
    1254                 :         294 :       if (aff_comb_cannot_overlap_p (&asrc, wi::to_poly_widest (nb_bytes),
    1255                 :         294 :                                      wi::to_poly_widest (nb_bytes)))
    1256                 :          47 :         kind = BUILT_IN_MEMCPY;
    1257                 :         147 :     }
    1258                 :             : 
    1259                 :        3516 :   dest = force_gimple_operand_gsi (&gsi, dest, true, NULL_TREE,
    1260                 :             :                                    false, GSI_CONTINUE_LINKING);
    1261                 :        3516 :   src = force_gimple_operand_gsi (&gsi, src, true, NULL_TREE,
    1262                 :             :                                   false, GSI_CONTINUE_LINKING);
    1263                 :        3516 :   fn = build_fold_addr_expr (builtin_decl_implicit (kind));
    1264                 :        3516 :   fn_call = gimple_build_call (fn, 3, dest, src, nb_bytes);
    1265                 :        3516 :   gimple_set_location (fn_call, partition->loc);
    1266                 :        3516 :   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
    1267                 :        3516 :   fold_stmt (&gsi);
    1268                 :             : 
    1269                 :        3516 :   if (dump_file && (dump_flags & TDF_DETAILS))
    1270                 :             :     {
    1271                 :          13 :       if (kind == BUILT_IN_MEMCPY)
    1272                 :          10 :         fprintf (dump_file, "generated memcpy\n");
    1273                 :             :       else
    1274                 :           3 :         fprintf (dump_file, "generated memmove\n");
    1275                 :             :     }
    1276                 :        3516 : }
    1277                 :             : 
    1278                 :             : /* Remove and destroy the loop LOOP.  */
    1279                 :             : 
    1280                 :             : static void
    1281                 :        9147 : destroy_loop (class loop *loop)
    1282                 :             : {
    1283                 :        9147 :   unsigned nbbs = loop->num_nodes;
    1284                 :        9147 :   edge exit = single_exit (loop);
    1285                 :        9147 :   basic_block src = loop_preheader_edge (loop)->src, dest = exit->dest;
    1286                 :        9147 :   basic_block *bbs;
    1287                 :        9147 :   unsigned i;
    1288                 :             : 
    1289                 :        9147 :   bbs = get_loop_body_in_dom_order (loop);
    1290                 :             : 
    1291                 :        9147 :   gimple_stmt_iterator dst_gsi = gsi_after_labels (exit->dest);
    1292                 :        9147 :   bool safe_p = single_pred_p (exit->dest);
    1293                 :       28961 :   for (unsigned i = 0; i < nbbs; ++i)
    1294                 :             :     {
     1295                 :             :       /* We have made sure not to leave any dangling uses of SSA
     1296                 :             :          names defined in the loop, with the exception of virtuals.
    1297                 :             :          Make sure we replace all uses of virtual defs that will remain
    1298                 :             :          outside of the loop with the bare symbol as delete_basic_block
    1299                 :             :          will release them.  */
    1300                 :       46806 :       for (gphi_iterator gsi = gsi_start_phis (bbs[i]); !gsi_end_p (gsi);
    1301                 :       26992 :            gsi_next (&gsi))
    1302                 :             :         {
    1303                 :       26992 :           gphi *phi = gsi.phi ();
    1304                 :       64016 :           if (virtual_operand_p (gimple_phi_result (phi)))
    1305                 :       10032 :             mark_virtual_phi_result_for_renaming (phi);
    1306                 :             :         }
    1307                 :      113527 :       for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);)
    1308                 :             :         {
    1309                 :       73899 :           gimple *stmt = gsi_stmt (gsi);
    1310                 :       73899 :           tree vdef = gimple_vdef (stmt);
    1311                 :       43578 :           if (vdef && TREE_CODE (vdef) == SSA_NAME)
    1312                 :       10412 :             mark_virtual_operand_for_renaming (vdef);
     1313                 :             :           /* Also move and possibly reset debug stmts.  We can leave
    1314                 :             :              constant values in place in case the stmt dominates the exit.
    1315                 :             :              ???  Non-constant values from the last iteration can be
    1316                 :             :              replaced with final values if we can compute them.  */
    1317                 :       73899 :           if (gimple_debug_bind_p (stmt))
    1318                 :             :             {
    1319                 :       13783 :               tree val = gimple_debug_bind_get_value (stmt);
    1320                 :       13783 :               gsi_move_before (&gsi, &dst_gsi);
    1321                 :       13783 :               if (val
    1322                 :       13783 :                   && (!safe_p
    1323                 :        8767 :                       || !is_gimple_min_invariant (val)
    1324                 :         418 :                       || !dominated_by_p (CDI_DOMINATORS, exit->src, bbs[i])))
    1325                 :             :                 {
    1326                 :        8378 :                   gimple_debug_bind_reset_value (stmt);
    1327                 :        8378 :                   update_stmt (stmt);
    1328                 :             :                 }
    1329                 :             :             }
    1330                 :             :           else
    1331                 :       60116 :             gsi_next (&gsi);
    1332                 :             :         }
    1333                 :             :     }
    1334                 :             : 
    1335                 :        9147 :   redirect_edge_pred (exit, src);
    1336                 :        9147 :   exit->flags &= ~(EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
    1337                 :        9147 :   exit->flags |= EDGE_FALLTHRU;
    1338                 :        9147 :   cancel_loop_tree (loop);
    1339                 :        9147 :   rescan_loop_exit (exit, false, true);
    1340                 :             : 
    1341                 :        9147 :   i = nbbs;
    1342                 :       19814 :   do
    1343                 :             :     {
    1344                 :       19814 :       --i;
    1345                 :       19814 :       delete_basic_block (bbs[i]);
    1346                 :             :     }
    1347                 :       19814 :   while (i != 0);
    1348                 :             : 
    1349                 :        9147 :   free (bbs);
    1350                 :             : 
    1351                 :        9147 :   set_immediate_dominator (CDI_DOMINATORS, dest,
    1352                 :             :                            recompute_dominator (CDI_DOMINATORS, dest));
    1353                 :        9147 : }
    1354                 :             : 
     1355                 :             : /* Generate code for PARTITION.  Return whether LOOP needs to be destroyed.  */
    1356                 :             : 
    1357                 :             : static bool 
    1358                 :       13354 : generate_code_for_partition (class loop *loop,
    1359                 :             :                              partition *partition, bool copy_p,
    1360                 :             :                              bool keep_lc_phis_p)
    1361                 :             : {
    1362                 :       13354 :   switch (partition->kind)
    1363                 :             :     {
    1364                 :        2541 :     case PKIND_NORMAL:
    1365                 :        2541 :     case PKIND_PARTIAL_MEMSET:
    1366                 :             :       /* Reductions all have to be in the last partition.  */
    1367                 :        2541 :       gcc_assert (!partition_reduction_p (partition)
    1368                 :             :                   || !copy_p);
    1369                 :        2541 :       generate_loops_for_partition (loop, partition, copy_p,
    1370                 :             :                                     keep_lc_phis_p);
    1371                 :        2541 :       return false;
    1372                 :             : 
    1373                 :        7297 :     case PKIND_MEMSET:
    1374                 :        7297 :       generate_memset_builtin (loop, partition);
    1375                 :        7297 :       break;
    1376                 :             : 
    1377                 :        3516 :     case PKIND_MEMCPY:
    1378                 :        3516 :     case PKIND_MEMMOVE:
    1379                 :        3516 :       generate_memcpy_builtin (loop, partition);
    1380                 :        3516 :       break;
    1381                 :             : 
    1382                 :           0 :     default:
    1383                 :           0 :       gcc_unreachable ();
    1384                 :             :     }
    1385                 :             : 
    1386                 :             :   /* Common tail for partitions we turn into a call.  If this was the last
    1387                 :             :      partition for which we generate code, we have to destroy the loop.  */
    1388                 :       10813 :   if (!copy_p)
    1389                 :             :     return true;
    1390                 :             :   return false;
    1391                 :             : }
    1392                 :             : 
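                          :             : /* Return the data dependence relation between data references A and B,
                          :             :    which must be ordered by their vertex position in RDG.  Relations are
                          :             :    cached in DDRS_TABLE so that each pair is analyzed only once.  */
                          :             : 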
    1393                 :             : data_dependence_relation *
    1394                 :     1693289 : loop_distribution::get_data_dependence (struct graph *rdg, data_reference_p a,
    1395                 :             :                                         data_reference_p b)
    1396                 :             : {
    1397                 :     1693289 :   struct data_dependence_relation ent, **slot;
    1398                 :     1693289 :   struct data_dependence_relation *ddr;
    1399                 :             : 
    1400                 :     1693289 :   gcc_assert (DR_IS_WRITE (a) || DR_IS_WRITE (b));
    1401                 :     1693289 :   gcc_assert (rdg_vertex_for_stmt (rdg, DR_STMT (a))
    1402                 :             :               <= rdg_vertex_for_stmt (rdg, DR_STMT (b)));
    1403                 :     1693289 :   ent.a = a;
    1404                 :     1693289 :   ent.b = b;
    1405                 :     1693289 :   slot = ddrs_table->find_slot (&ent, INSERT);
    1406                 :     1693289 :   if (*slot == NULL)
    1407                 :             :     {
    1408                 :      957126 :       ddr = initialize_data_dependence_relation (a, b, loop_nest);
    1409                 :      957126 :       compute_affine_dependence (ddr, loop_nest[0]);
    1410                 :      957126 :       *slot = ddr;
    1411                 :             :     }
    1412                 :             : 
    1413                 :     1693289 :   return *slot;
    1414                 :             : }
    1415                 :             : 
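                          :             : /* Return true if the data dependence between references DR1 and DR2 of
                          :             :    RDG creates a dependence cycle, i.e. prevents executing their
                          :             :    partition in parallel.  */
                          :             : 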
    1416                 :             : bool
    1417                 :      239489 : loop_distribution::data_dep_in_cycle_p (struct graph *rdg,
    1418                 :             :                                         data_reference_p dr1,
    1419                 :             :                                         data_reference_p dr2)
    1420                 :             : {
    1421                 :      239489 :   struct data_dependence_relation *ddr;
    1422                 :             : 
    1423                 :             :   /* Re-shuffle data-refs to be in topological order.  */
    1424                 :      478978 :   if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
    1425                 :      239489 :       > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
    1426                 :       56455 :     std::swap (dr1, dr2);
    1427                 :             : 
    1428                 :      239489 :   ddr = get_data_dependence (rdg, dr1, dr2);
    1429                 :             : 
    1430                 :             :   /* In case of no data dependence.  */
    1431                 :      239489 :   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
    1432                 :             :     return false;
     1433                 :             :   /* For an unknown data dependence, or a known data dependence that can't
     1434                 :             :      be expressed as a classic distance vector, we check whether it can be
     1435                 :             :      resolved by a runtime alias check.  If so, we still consider the data
     1436                 :             :      dependence as not introducing a data dependence cycle.  */
    1437                 :       81139 :   else if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
    1438                 :       81139 :            || DDR_NUM_DIST_VECTS (ddr) == 0)
    1439                 :       42569 :     return !runtime_alias_check_p (ddr, NULL, true);
    1440                 :       38570 :   else if (DDR_NUM_DIST_VECTS (ddr) > 1)
    1441                 :             :     return true;
    1442                 :       34435 :   else if (DDR_REVERSED_P (ddr)
    1443                 :      102117 :            || lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), DDR_NB_LOOPS (ddr)))
    1444                 :       30576 :     return false;
    1445                 :             : 
    1446                 :             :   return true;
    1447                 :             : }
    1448                 :             : 
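                          :             : /* Check all data dependences between the data references of PARTITION1
                          :             :    and PARTITION2 in RDG; mark PARTITION1 as PTYPE_SEQUENTIAL as soon as
                          :             :    one of them introduces a dependence cycle.  */
                          :             : 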
    1449                 :             : void
    1450                 :      196660 : loop_distribution::update_type_for_merge (struct graph *rdg,
    1451                 :             :                                            partition *partition1,
    1452                 :             :                                            partition *partition2)
    1453                 :             : {
    1454                 :      196660 :   unsigned i, j;
    1455                 :      196660 :   bitmap_iterator bi, bj;
    1456                 :      196660 :   data_reference_p dr1, dr2;
    1457                 :             : 
    1458                 :      613440 :   EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
    1459                 :             :     {
    1460                 :      424775 :       unsigned start = (partition1 == partition2) ? i + 1 : 0;
    1461                 :             : 
    1462                 :      424775 :       dr1 = datarefs_vec[i];
    1463                 :     1029064 :       EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, start, j, bj)
    1464                 :             :         {
    1465                 :      612284 :           dr2 = datarefs_vec[j];
    1466                 :      612284 :           if (DR_IS_READ (dr1) && DR_IS_READ (dr2))
    1467                 :      372795 :             continue;
    1468                 :             : 
    1469                 :             :           /* Partition can only be executed sequentially if there is any
    1470                 :             :              data dependence cycle.  */
    1471                 :      239489 :           if (data_dep_in_cycle_p (rdg, dr1, dr2))
    1472                 :             :             {
    1473                 :        7995 :               partition1->type = PTYPE_SEQUENTIAL;
    1474                 :        7995 :               return;
    1475                 :             :             }
    1476                 :             :         }
    1477                 :             :     }
    1478                 :             : }
    1479                 :             : 
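                          :             : /* Return a new partition containing vertex V of RDG together with all
                          :             :    the statements V depends on.  The partition is marked PTYPE_SEQUENTIAL
                          :             :    if it contains an unanalyzable data reference or one involved in a
                          :             :    dependence cycle.  */
                          :             : 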
    1480                 :             : partition *
    1481                 :      204085 : loop_distribution::build_rdg_partition_for_vertex (struct graph *rdg, int v)
    1482                 :             : {
    1483                 :      204085 :   partition *partition = partition_alloc ();
    1484                 :      204085 :   auto_vec<int, 3> nodes;
    1485                 :      204085 :   unsigned i, j;
    1486                 :      204085 :   int x;
    1487                 :      204085 :   data_reference_p dr;
    1488                 :             : 
    1489                 :      204085 :   graphds_dfs (rdg, &v, 1, &nodes, false, NULL);
    1490                 :             : 
    1491                 :     2965071 :   FOR_EACH_VEC_ELT (nodes, i, x)
    1492                 :             :     {
    1493                 :     2760986 :       bitmap_set_bit (partition->stmts, x);
    1494                 :             : 
    1495                 :     3615923 :       for (j = 0; RDG_DATAREFS (rdg, x).iterate (j, &dr); ++j)
    1496                 :             :         {
    1497                 :      429569 :           unsigned idx = (unsigned) DR_INDEX (dr);
    1498                 :      429569 :           gcc_assert (idx < datarefs_vec.length ());
    1499                 :             : 
    1500                 :             :           /* Partition can only be executed sequentially if there is any
    1501                 :             :              unknown data reference.  */
    1502                 :      429569 :           if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr)
    1503                 :      408268 :               || !DR_INIT (dr) || !DR_STEP (dr))
    1504                 :       21301 :             partition->type = PTYPE_SEQUENTIAL;
    1505                 :             : 
    1506                 :      429569 :           bitmap_set_bit (partition->datarefs, idx);
    1507                 :             :         }
    1508                 :             :     }
    1509                 :             : 
    1510                 :      204085 :   if (partition->type == PTYPE_SEQUENTIAL)
    1511                 :             :     return partition;
    1512                 :             : 
    1513                 :             :   /* Further check if any data dependence prevents us from executing the
     1514                 :             :      partition in parallel.  */
    1515                 :      191299 :   update_type_for_merge (rdg, partition, partition);
    1516                 :             : 
    1517                 :      191299 :   return partition;
    1518                 :      204085 : }
    1519                 :             : 
     1520                 :             : /* Given PARTITION of LOOP and RDG, record the single load/store data
     1521                 :             :    references for a builtin partition in SRC_DR/DST_DR.  Return false if
     1522                 :             :    there are no such data references.  */
    1523                 :             : 
    1524                 :             : static bool
    1525                 :      186055 : find_single_drs (class loop *loop, struct graph *rdg, const bitmap &partition_stmts,
    1526                 :             :                  data_reference_p *dst_dr, data_reference_p *src_dr)
    1527                 :             : {
    1528                 :      186055 :   unsigned i;
    1529                 :      186055 :   data_reference_p single_ld = NULL, single_st = NULL;
    1530                 :      186055 :   bitmap_iterator bi;
    1531                 :             : 
    1532                 :     1942281 :   EXECUTE_IF_SET_IN_BITMAP (partition_stmts, 0, i, bi)
    1533                 :             :     {
    1534                 :     1810155 :       gimple *stmt = RDG_STMT (rdg, i);
    1535                 :     1810155 :       data_reference_p dr;
    1536                 :             : 
    1537                 :     1810155 :       if (gimple_code (stmt) == GIMPLE_PHI)
    1538                 :      402990 :         continue;
    1539                 :             : 
    1540                 :             :       /* Any scalar stmts are ok.  */
    1541                 :     2627131 :       if (!gimple_vuse (stmt))
    1542                 :     1116724 :         continue;
    1543                 :             : 
    1544                 :             :       /* Otherwise just regular loads/stores.  */
    1545                 :      290441 :       if (!gimple_assign_single_p (stmt))
    1546                 :      186055 :         return false;
    1547                 :             : 
    1548                 :             :       /* But exactly one store and/or load.  */
    1549                 :     2286753 :       for (unsigned j = 0; RDG_DATAREFS (rdg, i).iterate (j, &dr); ++j)
    1550                 :             :         {
    1551                 :      294028 :           tree type = TREE_TYPE (DR_REF (dr));
    1552                 :             : 
    1553                 :             :           /* The memset, memcpy and memmove library calls are only
    1554                 :             :              able to deal with generic address space.  */
    1555                 :      294028 :           if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (type)))
    1556                 :             :             return false;
    1557                 :             : 
    1558                 :      294010 :           if (DR_IS_READ (dr))
    1559                 :             :             {
    1560                 :      176725 :               if (single_ld != NULL)
    1561                 :             :                 return false;
    1562                 :             :               single_ld = dr;
    1563                 :             :             }
    1564                 :             :           else
    1565                 :             :             {
    1566                 :      117285 :               if (single_st != NULL)
    1567                 :             :                 return false;
    1568                 :             :               single_st = dr;
    1569                 :             :             }
    1570                 :             :         }
    1571                 :             :     }
    1572                 :             : 
    1573                 :      132126 :   if (!single_ld && !single_st)
    1574                 :             :     return false;
    1575                 :             : 
    1576                 :      127182 :   basic_block bb_ld = NULL;
    1577                 :      127182 :   basic_block bb_st = NULL;
    1578                 :      127182 :   edge exit = single_exit (loop);
    1579                 :             : 
    1580                 :      127182 :   if (single_ld)
    1581                 :             :     {
    1582                 :             :       /* Bail out if this is a bitfield memory reference.  */
    1583                 :       68956 :       if (TREE_CODE (DR_REF (single_ld)) == COMPONENT_REF
    1584                 :       68956 :           && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_ld), 1)))
    1585                 :             :         return false;
    1586                 :             : 
    1587                 :             :       /* Data reference must be executed exactly once per iteration of each
    1588                 :             :          loop in the loop nest.  We only need to check dominance information
    1589                 :             :          against the outermost one in a perfect loop nest because a bb can't
    1590                 :             :          dominate outermost loop's latch without dominating inner loop's.  */
    1591                 :       68888 :       bb_ld = gimple_bb (DR_STMT (single_ld));
    1592                 :       68888 :       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
    1593                 :             :         return false;
    1594                 :             : 
    1595                 :             :       /* The data reference must also be executed before possibly exiting
    1596                 :             :          the loop as otherwise we'd for example unconditionally execute
    1597                 :             :          memset (ptr, 0, n) which even with n == 0 implies ptr is non-NULL.  */
    1598                 :       62904 :       if (bb_ld != loop->header
    1599                 :       62904 :           && (!exit
    1600                 :        8719 :               || !dominated_by_p (CDI_DOMINATORS, exit->src, bb_ld)))
    1601                 :        1585 :         return false;
    1602                 :             :     }
    1603                 :             : 
    1604                 :      119545 :   if (single_st)
    1605                 :             :     {
    1606                 :             :       /* Bail out if this is a bitfield memory reference.  */
    1607                 :      108819 :       if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
    1608                 :      108819 :           && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
    1609                 :             :         return false;
    1610                 :             : 
    1611                 :             :       /* Data reference must be executed exactly once per iteration.
    1612                 :             :          Same as single_ld, we only need to check against the outermost
    1613                 :             :          loop.  */
    1614                 :      108684 :       bb_st = gimple_bb (DR_STMT (single_st));
    1615                 :      108684 :       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
    1616                 :             :         return false;
    1617                 :             : 
    1618                 :             :       /* And before exiting the loop.  */
    1619                 :      104631 :       if (bb_st != loop->header
    1620                 :      104631 :           && (!exit
    1621                 :       14500 :               || !dominated_by_p (CDI_DOMINATORS, exit->src, bb_st)))
    1622                 :         991 :         return false;
    1623                 :             :     }
    1624                 :             : 
    1625                 :      114366 :   if (single_ld && single_st)
    1626                 :             :     {
    1627                 :             :       /* Load and store must be in the same loop nest.  */
    1628                 :       48945 :       if (bb_st->loop_father != bb_ld->loop_father)
    1629                 :             :         return false;
    1630                 :             : 
    1631                 :       48338 :       edge e = single_exit (bb_st->loop_father);
    1632                 :       48338 :       bool dom_ld = dominated_by_p (CDI_DOMINATORS, e->src, bb_ld);
    1633                 :       48338 :       bool dom_st = dominated_by_p (CDI_DOMINATORS, e->src, bb_st);
    1634                 :       48338 :       if (dom_ld != dom_st)
    1635                 :             :         return false;
    1636                 :             :     }
    1637                 :             : 
    1638                 :      113759 :   *src_dr = single_ld;
    1639                 :      113759 :   *dst_dr = single_st;
    1640                 :      113759 :   return true;
    1641                 :             : }
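As an illustration (assumed C examples, not lines from the covered file), the loop shapes find_single_drs accepts and rejects:

    /* One store, no other memory accesses: single_st is set, single_ld stays
       NULL, and the partition remains a builtin candidate.  */
    void candidate (int *a, int n)
    {
      for (int i = 0; i < n; i++)
        a[i] = 0;
    }

    /* Two stores in the body: the second data reference makes the routine
       return false, so the partition cannot become a builtin.  */
    void rejected (int *a, int *b, int n)
    {
      for (int i = 0; i < n; i++)
        {
          a[i] = 0;
          b[i] = 1;
        }
    }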
    1642                 :             : 
    1643                 :             : /* Given data reference DR in LOOP_NEST, this function checks the enclosing
    1644                 :             :    loops from inner to outer to see if the loop's step equals the access size
    1645                 :             :    at each level of the loop nest.  Return 2 if we can prove this at all loop
    1646                 :             :    levels; record the access base and size in BASE and SIZE; save the loop's
    1647                 :             :    step at each level in STEPS if it is not null.  For example:
    1648                 :             : 
    1649                 :             :      int arr[100][100][100];
    1650                 :             :      for (i = 0; i < 100; i++)       ;steps[2] = 40000
    1651                 :             :        for (j = 100; j > 0; j--)     ;steps[1] = -400
    1652                 :             :          for (k = 0; k < 100; k++)   ;steps[0] = 4
    1653                 :             :            arr[i][j - 1][k] = 0;     ;base = &arr, size = 4000000
    1654                 :             : 
    1655                 :             :    Return 1 if we can prove the equality only at the innermost loop, not at
    1656                 :             :    all loop levels.  In this case, no information is recorded.
    1657                 :             : 
    1658                 :             :    Return 0 if the equality cannot be proven at any loop level.  */
    1659                 :             : 
    1660                 :             : static int
    1661                 :       59563 : compute_access_range (loop_p loop_nest, data_reference_p dr, tree *base,
    1662                 :             :                       tree *size, vec<tree> *steps = NULL)
    1663                 :             : {
    1664                 :       59563 :   location_t loc = gimple_location (DR_STMT (dr));
    1665                 :       59563 :   basic_block bb = gimple_bb (DR_STMT (dr));
    1666                 :       59563 :   class loop *loop = bb->loop_father;
    1667                 :       59563 :   tree ref = DR_REF (dr);
    1668                 :       59563 :   tree access_base = build_fold_addr_expr (ref);
    1669                 :       59563 :   tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (ref));
    1670                 :       59563 :   int res = 0;
    1671                 :             : 
    1672                 :       60934 :   do {
    1673                 :       60934 :       tree scev_fn = analyze_scalar_evolution (loop, access_base);
    1674                 :       60934 :       if (TREE_CODE (scev_fn) != POLYNOMIAL_CHREC)
    1675                 :       28951 :         return res;
    1676                 :             : 
    1677                 :       59557 :       access_base = CHREC_LEFT (scev_fn);
    1678                 :       59557 :       if (tree_contains_chrecs (access_base, NULL))
    1679                 :           0 :         return res;
    1680                 :             : 
    1681                 :       59557 :       tree scev_step = CHREC_RIGHT (scev_fn);
    1682                 :             :       /* Only support constant steps.  */
    1683                 :       59557 :       if (TREE_CODE (scev_step) != INTEGER_CST)
    1684                 :        3790 :         return res;
    1685                 :             : 
    1686                 :       55767 :       enum ev_direction access_dir = scev_direction (scev_fn);
    1687                 :       55767 :       if (access_dir == EV_DIR_UNKNOWN)
    1688                 :           0 :         return res;
    1689                 :             : 
    1690                 :       55767 :       if (steps != NULL)
    1691                 :       39190 :         steps->safe_push (scev_step);
    1692                 :             : 
    1693                 :       55767 :       scev_step = fold_convert_loc (loc, sizetype, scev_step);
    1694                 :             :       /* Compute absolute value of scev step.  */
    1695                 :       55767 :       if (access_dir == EV_DIR_DECREASES)
    1696                 :        1209 :         scev_step = fold_build1_loc (loc, NEGATE_EXPR, sizetype, scev_step);
    1697                 :             : 
    1698                 :             :       /* At each level of the loop nest, the scev step must equal the access size.
    1699                 :             :          In other words, DR must access consecutive memory between loop iterations.  */
    1700                 :       55767 :       if (!operand_equal_p (scev_step, access_size, 0))
    1701                 :       23784 :         return res;
    1702                 :             : 
    1703                 :             :       /* The access stride can be computed for the data reference at least in
    1704                 :             :          the innermost loop.  */
    1705                 :       31983 :       res = 1;
    1706                 :             : 
    1707                 :             :       /* Compute DR's execution times in loop.  */
    1708                 :       31983 :       tree niters = number_of_latch_executions (loop);
    1709                 :       31983 :       niters = fold_convert_loc (loc, sizetype, niters);
    1710                 :       31983 :       if (dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src, bb))
    1711                 :       31983 :         niters = size_binop_loc (loc, PLUS_EXPR, niters, size_one_node);
    1712                 :             : 
    1713                 :             :       /* Compute DR's overall access size in loop.  */
    1714                 :       31983 :       access_size = fold_build2_loc (loc, MULT_EXPR, sizetype,
    1715                 :             :                                      niters, scev_step);
    1716                 :             :       /* Adjust base address in case of negative step.  */
    1717                 :       31983 :       if (access_dir == EV_DIR_DECREASES)
    1718                 :             :         {
    1719                 :         975 :           tree adj = fold_build2_loc (loc, MINUS_EXPR, sizetype,
    1720                 :             :                                       scev_step, access_size);
    1721                 :         975 :           access_base = fold_build_pointer_plus_loc (loc, access_base, adj);
    1722                 :             :         }
    1723                 :       31983 :   } while (loop != loop_nest && (loop = loop_outer (loop)) != NULL);
    1724                 :             : 
    1725                 :       30612 :   *base = access_base;
    1726                 :       30612 :   *size = access_size;
    1727                 :             :   /* The access stride can be computed for the data reference at each loop level.  */
    1728                 :       30612 :   return 2;
    1729                 :             : }
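To complement the example in the comment above, an assumed illustration (not from the covered file) of why the function may return 1 instead of 2: the innermost stride matches the element size, but the outer-level step does not match the inner loop's overall access size:

    int arr[100][200];
    for (int i = 0; i < 100; i++)      /* outer step: 200 * 4 = 800 bytes      */
      for (int j = 0; j < 100; j++)    /* inner step: 4 bytes == access size   */
        arr[i][j] = 0;                 /* the inner loop only covers 400 bytes
                                          of each 800-byte row, so the equality
                                          fails at the outer level: return 1   */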
    1730                 :             : 
    1731                 :             : /* Allocate and return builtin struct.  Record information like DST_DR,
    1732                 :             :    SRC_DR, DST_BASE, SRC_BASE and SIZE in the allocated struct.  */
    1733                 :             : 
    1734                 :             : static struct builtin_info *
    1735                 :       11260 : alloc_builtin (data_reference_p dst_dr, data_reference_p src_dr,
    1736                 :             :                tree dst_base, tree src_base, tree size)
    1737                 :             : {
    1738                 :           0 :   struct builtin_info *builtin = XNEW (struct builtin_info);
    1739                 :       11260 :   builtin->dst_dr = dst_dr;
    1740                 :       11260 :   builtin->src_dr = src_dr;
    1741                 :       11260 :   builtin->dst_base = dst_base;
    1742                 :       11260 :   builtin->src_base = src_base;
    1743                 :       11260 :   builtin->size = size;
    1744                 :       11260 :   return builtin;
    1745                 :             : }
    1746                 :             : 
    1747                 :             : /* Given data reference DR in loop nest LOOP, classify whether it forms a
    1748                 :             :    builtin memset call.  */
    1749                 :             : 
    1750                 :             : static void
    1751                 :       54678 : classify_builtin_st (loop_p loop, partition *partition, data_reference_p dr)
    1752                 :             : {
    1753                 :       54678 :   gimple *stmt = DR_STMT (dr);
    1754                 :       54678 :   tree base, size, rhs = gimple_assign_rhs1 (stmt);
    1755                 :             : 
    1756                 :       54678 :   if (const_with_all_bytes_same (rhs) == -1
    1757                 :       54678 :       && (!INTEGRAL_TYPE_P (TREE_TYPE (rhs))
    1758                 :       49348 :           || (TYPE_MODE (TREE_TYPE (rhs))
    1759                 :       24674 :               != TYPE_MODE (unsigned_char_type_node))))
    1760                 :       47255 :     return;
    1761                 :             : 
    1762                 :       22268 :   if (TREE_CODE (rhs) == SSA_NAME
    1763                 :        4730 :       && !SSA_NAME_IS_DEFAULT_DEF (rhs)
    1764                 :       26480 :       && flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (rhs))))
    1765                 :             :     return;
    1766                 :             : 
    1767                 :       18216 :   int res = compute_access_range (loop, dr, &base, &size);
    1768                 :       18216 :   if (res == 0)
    1769                 :             :     return;
    1770                 :        7500 :   if (res == 1)
    1771                 :             :     {
    1772                 :          77 :       partition->kind = PKIND_PARTIAL_MEMSET;
    1773                 :          77 :       return;
    1774                 :             :     }
    1775                 :             : 
    1776                 :        7423 :   tree base_offset;
    1777                 :        7423 :   tree base_base;
    1778                 :        7423 :   split_constant_offset (base, &base_base, &base_offset);
    1779                 :        7423 :   if (!cst_and_fits_in_hwi (base_offset))
    1780                 :             :     return;
    1781                 :        7423 :   unsigned HOST_WIDE_INT const_base_offset = int_cst_value (base_offset);
    1782                 :             : 
    1783                 :        7423 :   struct builtin_info *builtin;
    1784                 :        7423 :   builtin = alloc_builtin (dr, NULL, base, NULL_TREE, size);
    1785                 :        7423 :   builtin->dst_base_base = base_base;
    1786                 :        7423 :   builtin->dst_base_offset = const_base_offset;
    1787                 :        7423 :   partition->builtin = builtin;
    1788                 :        7423 :   partition->kind = PKIND_MEMSET;
    1789                 :             : }
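A minimal assumed illustration (not from the covered file) of the distinction classify_builtin_st draws: a stored value whose bytes are all identical can become a memset, while a value defined inside the loop cannot:

    for (int i = 0; i < n; i++)
      a[i] = 0;        /* constant with all bytes equal -> PKIND_MEMSET      */

    for (int i = 0; i < n; i++)
      a[i] = i;        /* RHS is an SSA name defined inside the loop -> the
                          partition is not classified as a builtin           */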
    1790                 :             : 
    1791                 :             : /* Given data references DST_DR and SRC_DR in loop nest LOOP and RDG, classify
    1792                 :             :    whether they form a builtin memcpy or memmove call.  */
    1793                 :             : 
    1794                 :             : void
    1795                 :       27416 : loop_distribution::classify_builtin_ldst (loop_p loop, struct graph *rdg,
    1796                 :             :                                           partition *partition,
    1797                 :             :                                           data_reference_p dst_dr,
    1798                 :             :                                           data_reference_p src_dr)
    1799                 :             : {
    1800                 :       27416 :   tree base, size, src_base, src_size;
    1801                 :       27416 :   auto_vec<tree> dst_steps, src_steps;
    1802                 :             : 
    1803                 :             :   /* Compute access range of both load and store.  */
    1804                 :       27416 :   int res = compute_access_range (loop, dst_dr, &base, &size, &dst_steps);
    1805                 :       27416 :   if (res != 2)
    1806                 :             :     return;
    1807                 :       13931 :   res = compute_access_range (loop, src_dr, &src_base, &src_size, &src_steps);
    1808                 :       13931 :   if (res != 2)
    1809                 :             :     return;
    1810                 :             : 
    1811                 :             :   /* They must have the same access size.  */
    1812                 :        9258 :   if (!operand_equal_p (size, src_size, 0))
    1813                 :             :     return;
    1814                 :             : 
    1815                 :             :   /* They must have the same storage order.  */
    1816                 :       18516 :   if (reverse_storage_order_for_component_p (DR_REF (dst_dr))
    1817                 :        9258 :       != reverse_storage_order_for_component_p (DR_REF (src_dr)))
    1818                 :             :     return;
    1819                 :             : 
    1820                 :             :   /* The load and store in the loop nest must access memory in the same way,
    1821                 :             :      i.e., they must have the same steps in each loop of the nest.  */
    1822                 :       27774 :   if (dst_steps.length () != src_steps.length ())
    1823                 :             :     return;
    1824                 :       37278 :   for (unsigned i = 0; i < dst_steps.length (); ++i)
    1825                 :        9629 :     if (!operand_equal_p (dst_steps[i], src_steps[i], 0))
    1826                 :             :       return;
    1827                 :             : 
    1828                 :             :   /* Now check whether there is a dependence.  */
    1829                 :        9010 :   ddr_p ddr = get_data_dependence (rdg, src_dr, dst_dr);
    1830                 :             : 
    1831                 :             :   /* Classify as memmove if no dependence between load and store.  */
    1832                 :        9010 :   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
    1833                 :             :     {
    1834                 :        3666 :       partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, size);
    1835                 :        3666 :       partition->kind = PKIND_MEMMOVE;
    1836                 :        3666 :       return;
    1837                 :             :     }
    1838                 :             : 
    1839                 :             :   /* Can't do memmove in case of unknown dependence or dependence without
    1840                 :             :      classical distance vector.  */
    1841                 :        5344 :   if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
    1842                 :       27807 :       || DDR_NUM_DIST_VECTS (ddr) == 0)
    1843                 :             :     return;
    1844                 :             : 
    1845                 :         391 :   unsigned i;
    1846                 :         391 :   lambda_vector dist_v;
    1847                 :         391 :   int num_lev = (DDR_LOOP_NEST (ddr)).length ();
    1848                 :         739 :   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
    1849                 :             :     {
    1850                 :         348 :       unsigned dep_lev = dependence_level (dist_v, num_lev);
    1851                 :             :       /* Can't do memmove if load depends on store.  */
    1852                 :         374 :       if (dep_lev > 0 && dist_v[dep_lev - 1] > 0 && !DDR_REVERSED_P (ddr))
    1853                 :             :         return;
    1854                 :             :     }
    1855                 :             : 
    1856                 :         171 :   partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, size);
    1857                 :         171 :   partition->kind = PKIND_MEMMOVE;
    1858                 :         171 :   return;
    1859                 :       27416 : }
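An assumed contrast (not from the covered file, with distinct non-aliasing arrays) for the classification above: an element-wise copy has equal access sizes and steps and no forward dependence, whereas a loop reading its own previous store keeps the partition non-builtin:

    for (int i = 0; i < n; i++)
      dst[i] = src[i];     /* equal size and steps, no dependence -> PKIND_MEMMOVE */

    for (int i = 1; i < n; i++)
      a[i] = a[i - 1];     /* the load depends on the store of the previous
                              iteration -> not classified as a builtin             */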
    1860                 :             : 
    1861                 :             : bool
    1862                 :      204085 : loop_distribution::classify_partition (loop_p loop,
    1863                 :             :                                        struct graph *rdg, partition *partition,
    1864                 :             :                                        bitmap stmt_in_all_partitions)
    1865                 :             : {
    1866                 :      204085 :   bitmap_iterator bi;
    1867                 :      204085 :   unsigned i;
    1868                 :      204085 :   data_reference_p single_ld = NULL, single_st = NULL;
    1869                 :      204085 :   bool volatiles_p = false, has_reduction = false;
    1870                 :             : 
    1871                 :     2965071 :   EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, bi)
    1872                 :             :     {
    1873                 :     2760986 :       gimple *stmt = RDG_STMT (rdg, i);
    1874                 :             : 
    1875                 :     4702126 :       if (gimple_has_volatile_ops (stmt))
    1876                 :     2760986 :         volatiles_p = true;
    1877                 :             : 
    1878                 :             :       /* If the stmt is not included in all partitions and there are uses
    1879                 :             :          outside of the loop, then mark the partition as a reduction.  */
    1880                 :     2760986 :       if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
    1881                 :             :         {
    1882                 :             :           /* Due to a limitation in the transform phase we have to fuse all
    1883                 :             :              reduction partitions.  As a result, this could cancel valid
    1884                 :             :              loop distribution, especially for loops whose induction variable
    1885                 :             :              is used outside the loop.  To work around this issue, we skip
    1886                 :             :              marking a partition as a reduction if the reduction stmt belongs
    1887                 :             :              to all partitions.  In that case, the reduction will be computed
    1888                 :             :              correctly no matter how partitions are fused/distributed.  */
    1889                 :       54185 :           if (!bitmap_bit_p (stmt_in_all_partitions, i))
    1890                 :       23040 :             partition->reduction_p = true;
    1891                 :             :           else
    1892                 :             :             has_reduction = true;
    1893                 :             :         }
    1894                 :             :     }
    1895                 :             : 
    1896                 :             :   /* Simple workaround to prevent classifying the partition as a builtin
    1897                 :             :      if it contains any use outside of the loop.  For the case where all
    1898                 :             :      partitions have the reduction, this simple workaround is delayed
    1899                 :             :      to only affect the last partition.  */
    1900                 :      204085 :   if (partition->reduction_p)
    1901                 :             :      return has_reduction;
    1902                 :             : 
    1903                 :             :   /* Perform general partition disqualification for builtins.  */
    1904                 :      182325 :   if (volatiles_p
    1905                 :      182325 :       || !flag_tree_loop_distribute_patterns)
    1906                 :             :     return has_reduction;
    1907                 :             : 
    1908                 :             :   /* Find single load/store data references for builtin partition.  */
    1909                 :      180575 :   if (!find_single_drs (loop, rdg, partition->stmts, &single_st, &single_ld)
    1910                 :      180575 :       || !single_st)
    1911                 :             :     return has_reduction;
    1912                 :             : 
    1913                 :      102921 :   if (single_ld && single_st)
    1914                 :             :     {
    1915                 :       48243 :       gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
    1916                 :             :       /* Direct aggregate copy or via an SSA name temporary.  */
    1917                 :       48243 :       if (load != store
    1918                 :       48243 :           && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
    1919                 :             :         return has_reduction;
    1920                 :             :     }
    1921                 :             : 
    1922                 :       82094 :   partition->loc = gimple_location (DR_STMT (single_st));
    1923                 :             : 
    1924                 :             :   /* Classify the builtin kind.  */
    1925                 :       82094 :   if (single_ld == NULL)
    1926                 :       54678 :     classify_builtin_st (loop, partition, single_st);
    1927                 :             :   else
    1928                 :       27416 :     classify_builtin_ldst (loop, rdg, partition, single_st, single_ld);
    1929                 :             :   return has_reduction;
    1930                 :             : }
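A hedged sketch (assumed code; use () is a placeholder call) of the reduction situation handled above: the induction variable is live after the loop, but its statements belong to every partition, so no partition is marked reduction_p and distribution remains possible:

    int i;
    for (i = 0; i < n; i++)
      {
        a[i] = 0;          /* seed of partition 1 */
        b[i] = 1;          /* seed of partition 2 */
      }
    use (i);               /* i is used outside the loop, but its increment is
                              part of all partitions                           */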
    1931                 :             : 
    1932                 :             : bool
    1933                 :      657352 : loop_distribution::share_memory_accesses (struct graph *rdg,
    1934                 :             :                        partition *partition1, partition *partition2)
    1935                 :             : {
    1936                 :      657352 :   unsigned i, j;
    1937                 :      657352 :   bitmap_iterator bi, bj;
    1938                 :      657352 :   data_reference_p dr1, dr2;
    1939                 :             : 
    1940                 :             :   /* First check whether there are any loads or stores in the intersection
    1941                 :             :      of the two partitions.  Common loads are the situation that happens
    1942                 :             :      most often.  */
    1943                 :     4832918 :   EXECUTE_IF_AND_IN_BITMAP (partition1->stmts, partition2->stmts, 0, i, bi)
    1944                 :     4179479 :     if (RDG_MEM_WRITE_STMT (rdg, i)
    1945                 :     4179479 :         || RDG_MEM_READS_STMT (rdg, i))
    1946                 :             :       return true;
    1947                 :             : 
    1948                 :             :   /* Then check whether the two partitions access the same memory object.  */
    1949                 :     1374856 :   EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
    1950                 :             :     {
    1951                 :      723353 :       dr1 = datarefs_vec[i];
    1952                 :             : 
    1953                 :      723353 :       if (!DR_BASE_ADDRESS (dr1)
    1954                 :      722592 :           || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1))
    1955                 :         761 :         continue;
    1956                 :             : 
    1957                 :     1572032 :       EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj)
    1958                 :             :         {
    1959                 :      851376 :           dr2 = datarefs_vec[j];
    1960                 :             : 
    1961                 :      851376 :           if (!DR_BASE_ADDRESS (dr2)
    1962                 :      851080 :               || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2))
    1963                 :         296 :             continue;
    1964                 :             : 
    1965                 :      851080 :           if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0)
    1966                 :      733714 :               && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0)
    1967                 :      733056 :               && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0)
    1968                 :      853044 :               && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0))
    1969                 :             :             return true;
    1970                 :             :         }
    1971                 :             :     }
    1972                 :             : 
    1973                 :             :   return false;
    1974                 :             : }
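As an assumed illustration (not from the covered file), two partitions that read the same data reference are reported as sharing a memory access, which later biases the pass towards fusing them for cache locality:

    for (int i = 0; i < n; i++)
      {
        x += a[i];         /* partition 1 reads a[i]                           */
        y *= a[i];         /* partition 2 reads the same base/offset/init/step,
                              so share_memory_accesses returns true            */
      }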
    1975                 :             : 
    1976                 :             : /* For each seed statement in STARTING_STMTS, this function builds a
    1977                 :             :    partition for it by adding the statements it depends on, according to RDG.
    1978                 :             :    All partitions are recorded in PARTITIONS.  */
    1979                 :             : 
    1980                 :             : void
    1981                 :      114653 : loop_distribution::rdg_build_partitions (struct graph *rdg,
    1982                 :             :                                          vec<gimple *> starting_stmts,
    1983                 :             :                                          vec<partition *> *partitions)
    1984                 :             : {
    1985                 :      114653 :   auto_bitmap processed;
    1986                 :      114653 :   int i;
    1987                 :      114653 :   gimple *stmt;
    1988                 :             : 
    1989                 :      438454 :   FOR_EACH_VEC_ELT (starting_stmts, i, stmt)
    1990                 :             :     {
    1991                 :      209148 :       int v = rdg_vertex_for_stmt (rdg, stmt);
    1992                 :             : 
    1993                 :      209148 :       if (dump_file && (dump_flags & TDF_DETAILS))
    1994                 :         138 :         fprintf (dump_file,
    1995                 :             :                  "ldist asked to generate code for vertex %d\n", v);
    1996                 :             : 
    1997                 :             :       /* If the vertex is already contained in another partition, so
    1998                 :             :          is the partition rooted at it.  */
    1999                 :      209148 :       if (bitmap_bit_p (processed, v))
    2000                 :        5063 :         continue;
    2001                 :             : 
    2002                 :      204085 :       partition *partition = build_rdg_partition_for_vertex (rdg, v);
    2003                 :      204085 :       bitmap_ior_into (processed, partition->stmts);
    2004                 :             : 
    2005                 :      204085 :       if (dump_file && (dump_flags & TDF_DETAILS))
    2006                 :             :         {
    2007                 :         138 :           fprintf (dump_file, "ldist creates useful %s partition:\n",
    2008                 :         138 :                    partition->type == PTYPE_PARALLEL ? "parallel" : "sequent");
    2009                 :         138 :           bitmap_print (dump_file, partition->stmts, "  ", "\n");
    2010                 :             :         }
    2011                 :             : 
    2012                 :      204085 :       partitions->safe_push (partition);
    2013                 :             :     }
    2014                 :             : 
    2015                 :             :   /* All vertices should have been assigned to at least one partition now,
    2016                 :             :      other than vertices belonging to dead code.  */
    2017                 :      114653 : }
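An assumed picture (not from the covered file) of how seed statements become partitions: each store acts as a seed, and the partition rooted at it pulls in the statements it depends on through the RDG:

    for (int i = 0; i < n; i++)
      {
        t = b[i] + c[i];   /* pulled into the partition seeded by the a[i] store */
        a[i] = t;          /* seed 1 -> partition { t = ..., a[i] = t }          */
        d[i] = e[i] * 2;   /* seed 2 -> partition { d[i] = ... }                 */
      }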
    2018                 :             : 
    2019                 :             : /* Dump to FILE the PARTITIONS.  */
    2020                 :             : 
    2021                 :             : static void
    2022                 :          39 : dump_rdg_partitions (FILE *file, const vec<partition *> &partitions)
    2023                 :             : {
    2024                 :          39 :   int i;
    2025                 :          39 :   partition *partition;
    2026                 :             : 
    2027                 :         102 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    2028                 :          63 :     debug_bitmap_file (file, partition->stmts);
    2029                 :          39 : }
    2030                 :             : 
    2031                 :             : /* Debug PARTITIONS.  */
    2032                 :             : extern void debug_rdg_partitions (const vec<partition *> &);
    2033                 :             : 
    2034                 :             : DEBUG_FUNCTION void
    2035                 :           0 : debug_rdg_partitions (const vec<partition *> &partitions)
    2036                 :             : {
    2037                 :           0 :   dump_rdg_partitions (stderr, partitions);
    2038                 :           0 : }
    2039                 :             : 
    2040                 :             : /* Returns the number of read and write operations in the RDG.  */
    2041                 :             : 
    2042                 :             : static int
    2043                 :        2303 : number_of_rw_in_rdg (struct graph *rdg)
    2044                 :             : {
    2045                 :        2303 :   int i, res = 0;
    2046                 :             : 
    2047                 :       35281 :   for (i = 0; i < rdg->n_vertices; i++)
    2048                 :             :     {
    2049                 :       32978 :       if (RDG_MEM_WRITE_STMT (rdg, i))
    2050                 :        6707 :         ++res;
    2051                 :             : 
    2052                 :       32978 :       if (RDG_MEM_READS_STMT (rdg, i))
    2053                 :        4762 :         ++res;
    2054                 :             :     }
    2055                 :             : 
    2056                 :        2303 :   return res;
    2057                 :             : }
    2058                 :             : 
    2059                 :             : /* Returns the number of read and write operations in a PARTITION of
    2060                 :             :    the RDG.  */
    2061                 :             : 
    2062                 :             : static int
    2063                 :        5030 : number_of_rw_in_partition (struct graph *rdg, partition *partition)
    2064                 :             : {
    2065                 :        5030 :   int res = 0;
    2066                 :        5030 :   unsigned i;
    2067                 :        5030 :   bitmap_iterator ii;
    2068                 :             : 
    2069                 :       48671 :   EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, ii)
    2070                 :             :     {
    2071                 :       43641 :       if (RDG_MEM_WRITE_STMT (rdg, i))
    2072                 :        6680 :         ++res;
    2073                 :             : 
    2074                 :       43641 :       if (RDG_MEM_READS_STMT (rdg, i))
    2075                 :        4762 :         ++res;
    2076                 :             :     }
    2077                 :             : 
    2078                 :        5030 :   return res;
    2079                 :             : }
    2080                 :             : 
    2081                 :             : /* Returns true when one of the PARTITIONS contains all the read or
    2082                 :             :    write operations of RDG.  */
    2083                 :             : 
    2084                 :             : static bool
    2085                 :        2303 : partition_contains_all_rw (struct graph *rdg,
    2086                 :             :                            const vec<partition *> &partitions)
    2087                 :             : {
    2088                 :        2303 :   int i;
    2089                 :        2303 :   partition *partition;
    2090                 :        2303 :   int nrw = number_of_rw_in_rdg (rdg);
    2091                 :             : 
    2092                 :        7273 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    2093                 :        5030 :     if (nrw == number_of_rw_in_partition (rdg, partition))
    2094                 :             :       return true;
    2095                 :             : 
    2096                 :             :   return false;
    2097                 :             : }
    2098                 :             : 
    2099                 :             : int
    2100                 :     1265069 : loop_distribution::pg_add_dependence_edges (struct graph *rdg, int dir,
    2101                 :             :                          bitmap drs1, bitmap drs2, vec<ddr_p> *alias_ddrs)
    2102                 :             : {
    2103                 :     1265069 :   unsigned i, j;
    2104                 :     1265069 :   bitmap_iterator bi, bj;
    2105                 :     1265069 :   data_reference_p dr1, dr2, saved_dr1;
    2106                 :             : 
    2107                 :             :   /* dependence direction - 0 is no dependence, -1 is back,
    2108                 :             :      1 is forth, 2 is both (we can stop then, merging will occur).  */
    2109                 :     2584528 :   EXECUTE_IF_SET_IN_BITMAP (drs1, 0, i, bi)
    2110                 :             :     {
    2111                 :     1332766 :       dr1 = datarefs_vec[i];
    2112                 :             : 
    2113                 :     2850859 :       EXECUTE_IF_SET_IN_BITMAP (drs2, 0, j, bj)
    2114                 :             :         {
    2115                 :     1531400 :           int res, this_dir = 1;
    2116                 :     1531400 :           ddr_p ddr;
    2117                 :             : 
    2118                 :     1531400 :           dr2 = datarefs_vec[j];
    2119                 :             : 
    2120                 :             :           /* Skip all <read, read> data dependence.  */
    2121                 :     1531400 :           if (DR_IS_READ (dr1) && DR_IS_READ (dr2))
    2122                 :       86610 :             continue;
    2123                 :             : 
    2124                 :     1444790 :           saved_dr1 = dr1;
    2125                 :             :           /* Re-shuffle data-refs to be in topological order.  */
    2126                 :     2889580 :           if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
    2127                 :     1444790 :               > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
    2128                 :             :             {
    2129                 :       26368 :               std::swap (dr1, dr2);
    2130                 :       26368 :               this_dir = -this_dir;
    2131                 :             :             }
    2132                 :     1444790 :           ddr = get_data_dependence (rdg, dr1, dr2);
    2133                 :     1444790 :           if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
    2134                 :             :             {
    2135                 :       24438 :               this_dir = 0;
    2136                 :       24438 :               res = data_ref_compare_tree (DR_BASE_ADDRESS (dr1),
    2137                 :             :                                            DR_BASE_ADDRESS (dr2));
    2138                 :             :               /* Be conservative.  If data references are not well analyzed,
    2139                 :             :                  or the two data references have the same base address and
    2140                 :             :                  offset, add a dependence and consider them aliases of each other.
    2141                 :             :                  In other words, the dependence cannot be resolved by
    2142                 :             :                  runtime alias check.  */
    2143                 :       24438 :               if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
    2144                 :       24127 :                   || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
    2145                 :       24127 :                   || !DR_INIT (dr1) || !DR_INIT (dr2)
    2146                 :       24127 :                   || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1))
    2147                 :       16726 :                   || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2))
    2148                 :       16715 :                   || res == 0)
    2149                 :             :                 this_dir = 2;
    2150                 :             :               /* Data dependence could be resolved by runtime alias check,
    2151                 :             :                  record it in ALIAS_DDRS.  */
    2152                 :       12316 :               else if (alias_ddrs != NULL)
    2153                 :        6101 :                 alias_ddrs->safe_push (ddr);
    2154                 :             :               /* Or simply ignore it.  */
    2155                 :             :             }
    2156                 :     1420352 :           else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE)
    2157                 :             :             {
    2158                 :             :               /* Known dependences can still be unordered throughout the
    2159                 :             :                  iteration space, see gcc.dg/tree-ssa/ldist-16.c and
    2160                 :             :                  gcc.dg/tree-ssa/pr94969.c.  */
    2161                 :        2057 :               if (DDR_NUM_DIST_VECTS (ddr) != 1)
    2162                 :             :                 this_dir = 2;
    2163                 :             :               /* If the overlap is exact preserve stmt order.  */
    2164                 :        1798 :               else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0),
    2165                 :        1798 :                                             DDR_NB_LOOPS (ddr)))
    2166                 :             :                 ;
    2167                 :             :               /* Else as the distance vector is lexicographic positive swap
    2168                 :             :                  the dependence direction.  */
    2169                 :             :               else
    2170                 :             :                 {
    2171                 :         854 :                   if (DDR_REVERSED_P (ddr))
    2172                 :          54 :                     this_dir = -this_dir;
    2173                 :         854 :                   this_dir = -this_dir;
    2174                 :             : 
    2175                 :             :                   /* When the dependence distance of the innermost common
    2176                 :             :                      loop of the DRs is zero we have a conflict.  */
    2177                 :         854 :                   auto l1 = gimple_bb (DR_STMT (dr1))->loop_father;
    2178                 :         854 :                   auto l2 = gimple_bb (DR_STMT (dr2))->loop_father;
    2179                 :         854 :                   int idx = index_in_loop_nest (find_common_loop (l1, l2)->num,
    2180                 :         854 :                                                 DDR_LOOP_NEST (ddr));
    2181                 :         854 :                   if (DDR_DIST_VECT (ddr, 0)[idx] == 0)
    2182                 :             :                     this_dir = 2;
    2183                 :             :                 }
    2184                 :             :             }
    2185                 :             :           else
    2186                 :             :             this_dir = 0;
    2187                 :        7030 :           if (this_dir == 2)
    2188                 :       13307 :             return 2;
    2189                 :     1431496 :           else if (dir == 0)
    2190                 :             :             dir = this_dir;
    2191                 :        8164 :           else if (this_dir != 0 && dir != this_dir)
    2192                 :             :             return 2;
    2193                 :             :           /* Shuffle "back" dr1.  */
    2194                 :     1431483 :           dr1 = saved_dr1;
    2195                 :             :         }
    2196                 :             :     }
    2197                 :             :   return dir;
    2198                 :             : }
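A hedged worked example (assumed, with distinct non-aliasing arrays) of the direction encoding returned above: the only dependence below runs from the first partition's store to the second partition's load, so the result is 1 ("forth"); if each partition also wrote data the other one reads, the directions would conflict and 2 would be returned:

    for (int i = 0; i < n; i++)
      {
        a[i] = b[i];       /* partition 1 writes a[i]                  */
        c[i] = a[i] + 1;   /* partition 2 reads a[i] -> direction 1    */
      }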
    2199                 :             : 
    2200                 :             : /* Compare postorder number of the partition graph vertices V1 and V2.  */
    2201                 :             : 
    2202                 :             : static int
    2203                 :      707853 : pgcmp (const void *v1_, const void *v2_)
    2204                 :             : {
    2205                 :      707853 :   const vertex *v1 = (const vertex *)v1_;
    2206                 :      707853 :   const vertex *v2 = (const vertex *)v2_;
    2207                 :      707853 :   return v2->post - v1->post;
    2208                 :             : }
    2209                 :             : 
    2210                 :             : /* Data attached to vertices of partition dependence graph.  */
    2211                 :             : struct pg_vdata
    2212                 :             : {
    2213                 :             :   /* ID of the corresponding partition.  */
    2214                 :             :   int id;
    2215                 :             :   /* The partition.  */
    2216                 :             :   struct partition *partition;
    2217                 :             : };
    2218                 :             : 
    2219                 :             : /* Data attached to edges of partition dependence graph.  */
    2220                 :             : struct pg_edata
    2221                 :             : {
    2222                 :             :   /* If the dependence edge can be resolved by runtime alias check,
    2223                 :             :      this vector contains data dependence relations for runtime alias
    2224                 :             :      check.  On the other hand, if the dependence edge is introduced
    2225                 :             :      because of compilation time known data dependence, this vector
    2226                 :             :      contains nothing.  */
    2227                 :             :   vec<ddr_p> alias_ddrs;
    2228                 :             : };
    2229                 :             : 
    2230                 :             : /* Callback data for traversing edges in graph.  */
    2231                 :             : struct pg_edge_callback_data
    2232                 :             : {
    2233                 :             :   /* Bitmap containing the strongly connected components that should be merged.  */
    2234                 :             :   bitmap sccs_to_merge;
    2235                 :             :   /* Array containing component information for all vertices.  */
    2236                 :             :   int *vertices_component;
    2237                 :             :   /* Vector to record all data dependence relations which are needed
    2238                 :             :      to break strongly connected components by runtime alias checks.  */
    2239                 :             :   vec<ddr_p> *alias_ddrs;
    2240                 :             : };
    2241                 :             : 
    2242                 :             : /* Initialize the vertices' data for partition dependence graph PG with
    2243                 :             :    PARTITIONS.  */
    2244                 :             : 
    2245                 :             : static void
    2246                 :       10750 : init_partition_graph_vertices (struct graph *pg,
    2247                 :             :                                vec<struct partition *> *partitions)
    2248                 :             : {
    2249                 :       10750 :   int i;
    2250                 :       10750 :   partition *partition;
    2251                 :       10750 :   struct pg_vdata *data;
    2252                 :             : 
    2253                 :       60842 :   for (i = 0; partitions->iterate (i, &partition); ++i)
    2254                 :             :     {
    2255                 :       50092 :       data = new pg_vdata;
    2256                 :       50092 :       pg->vertices[i].data = data;
    2257                 :       50092 :       data->id = i;
    2258                 :       50092 :       data->partition = partition;
    2259                 :             :     }
    2260                 :       10750 : }
    2261                 :             : 
    2262                 :             : /* Add edge <I, J> to partition dependence graph PG.  Attach vector of data
    2263                 :             :    dependence relations to the EDGE if DDRS isn't NULL.  */
    2264                 :             : 
    2265                 :             : static void
    2266                 :       37792 : add_partition_graph_edge (struct graph *pg, int i, int j, vec<ddr_p> *ddrs)
    2267                 :             : {
    2268                 :       37792 :   struct graph_edge *e = add_edge (pg, i, j);
    2269                 :             : 
    2270                 :             :   /* If the edge is attached with data dependence relations, it means this
    2271                 :             :      dependence edge can be resolved by runtime alias checks.  */
    2272                 :       37792 :   if (ddrs != NULL)
    2273                 :             :     {
    2274                 :        8968 :       struct pg_edata *data = new pg_edata;
    2275                 :             : 
    2276                 :        8968 :       gcc_assert (ddrs->length () > 0);
    2277                 :        8968 :       e->data = data;
    2278                 :        8968 :       data->alias_ddrs = vNULL;
    2279                 :        8968 :       data->alias_ddrs.safe_splice (*ddrs);
    2280                 :             :     }
    2281                 :       37792 : }
    2282                 :             : 
    2283                 :             : /* Callback function for the graph traversal algorithm.  It returns true
    2284                 :             :    if edge E should be skipped when traversing the graph.  */
    2285                 :             : 
    2286                 :             : static bool
    2287                 :         571 : pg_skip_alias_edge (struct graph_edge *e)
    2288                 :             : {
    2289                 :         571 :   struct pg_edata *data = (struct pg_edata *)e->data;
    2290                 :         571 :   return (data != NULL && data->alias_ddrs.length () > 0);
    2291                 :             : }
    2292                 :             : 
    2293                 :             : /* Callback function freeing data attached to edge E of graph.  */
    2294                 :             : 
    2295                 :             : static void
    2296                 :       37792 : free_partition_graph_edata_cb (struct graph *, struct graph_edge *e, void *)
    2297                 :             : {
    2298                 :       37792 :   if (e->data != NULL)
    2299                 :             :     {
    2300                 :        8967 :       struct pg_edata *data = (struct pg_edata *)e->data;
    2301                 :        8967 :       data->alias_ddrs.release ();
    2302                 :        8967 :       delete data;
    2303                 :             :     }
    2304                 :       37792 : }
    2305                 :             : 
    2306                 :             : /* Free the data attached to the vertices of partition dependence graph PG.  */
    2307                 :             : 
    2308                 :             : static void
    2309                 :       10750 : free_partition_graph_vdata (struct graph *pg)
    2310                 :             : {
    2311                 :       10750 :   int i;
    2312                 :       10750 :   struct pg_vdata *data;
    2313                 :             : 
    2314                 :       60842 :   for (i = 0; i < pg->n_vertices; ++i)
    2315                 :             :     {
    2316                 :       50092 :       data = (struct pg_vdata *)pg->vertices[i].data;
    2317                 :       50092 :       delete data;
    2318                 :             :     }
    2319                 :       10750 : }
    2320                 :             : 
    2321                 :             : /* Build and return the partition dependence graph for PARTITIONS.  RDG is
    2322                 :             :    the reduced dependence graph for the loop to be distributed.  If
    2323                 :             :    IGNORE_ALIAS_P is true, data dependences caused by possible aliasing
    2324                 :             :    between references are ignored, as if they don't exist at all; otherwise
    2325                 :             :    all dependences are considered.  */
    2326                 :             : 
    2327                 :             : struct graph *
    2328                 :       10750 : loop_distribution::build_partition_graph (struct graph *rdg,
    2329                 :             :                                           vec<struct partition *> *partitions,
    2330                 :             :                                           bool ignore_alias_p)
    2331                 :             : {
    2332                 :       10750 :   int i, j;
    2333                 :       10750 :   struct partition *partition1, *partition2;
    2334                 :       21500 :   graph *pg = new_graph (partitions->length ());
    2335                 :       10750 :   auto_vec<ddr_p> alias_ddrs, *alias_ddrs_p;
    2336                 :             : 
    2337                 :       10750 :   alias_ddrs_p = ignore_alias_p ? NULL : &alias_ddrs;
    2338                 :             : 
    2339                 :       10750 :   init_partition_graph_vertices (pg, partitions);
    2340                 :             : 
    2341                 :       10750 :   for (i = 0; partitions->iterate (i, &partition1); ++i)
    2342                 :             :     {
    2343                 :     1376003 :       for (j = i + 1; partitions->iterate (j, &partition2); ++j)
    2344                 :             :         {
    2345                 :             :           /* dependence direction - 0 is no dependence, -1 is back,
    2346                 :             :              1 is forth, 2 is both (we can stop then, merging will occur).  */
    2347                 :     1265069 :           int dir = 0;
    2348                 :             : 
    2349                 :             :           /* If the first partition has a reduction, add a back edge; if the
    2350                 :             :              second partition has a reduction, add a forth edge.  This makes
    2351                 :             :              sure that the reduction partition will be sorted as the last one.  */
    2352                 :     1265069 :           if (partition_reduction_p (partition1))
    2353                 :             :             dir = -1;
    2354                 :     1264928 :           else if (partition_reduction_p (partition2))
    2355                 :        1773 :             dir = 1;
    2356                 :             : 
    2357                 :             :           /* Cleanup the temporary vector.  */
    2358                 :     1265069 :           alias_ddrs.truncate (0);
    2359                 :             : 
    2360                 :     1265069 :           dir = pg_add_dependence_edges (rdg, dir, partition1->datarefs,
    2361                 :             :                                          partition2->datarefs, alias_ddrs_p);
    2362                 :             : 
    2363                 :             :           /* Add an edge to the partition graph if a dependence exists.  There
    2364                 :             :              are two types of edges: one type is caused by a compilation-time
    2365                 :             :              known dependence and cannot be resolved by a runtime alias
    2366                 :             :              check; the other type can be resolved by a runtime alias
    2367                 :             :              check.  */
    2368                 :     1265069 :           if (dir == 1 || dir == 2
    2369                 :     1270311 :               || alias_ddrs.length () > 0)
    2370                 :             :             {
    2371                 :             :               /* Attach data dependence relations to the edge if it can be
    2372                 :             :                  resolved by a runtime alias check.  */
    2373                 :       19781 :               bool alias_edge_p = (dir != 1 && dir != 2);
    2374                 :       35100 :               add_partition_graph_edge (pg, i, j,
    2375                 :             :                                         (alias_edge_p) ? &alias_ddrs : NULL);
    2376                 :             :             }
    2377                 :     1265069 :           if (dir == -1 || dir == 2
    2378                 :     2534644 :               || alias_ddrs.length () > 0)
    2379                 :             :             {
    2380                 :             :               /* Attach data dependence relations to the edge if it can be
    2381                 :             :                  resolved by a runtime alias check.  */
    2382                 :       18011 :               bool alias_edge_p = (dir != -1 && dir != 2);
    2383                 :       31516 :               add_partition_graph_edge (pg, j, i,
    2384                 :             :                                         (alias_edge_p) ? &alias_ddrs : NULL);
    2385                 :             :             }
    2386                 :             :         }
    2387                 :             :     }
    2388                 :       10750 :   return pg;
    2389                 :       10750 : }
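                         :             : 
                         :             : /* Illustrative sketch (editorial addition, not part of the measured
                         :             :    source): for two partitions P_i and P_j, a compilation time known
                         :             :    dependence from P_i to P_j (dir == 1 above) adds a plain edge i -> j,
                         :             :    while a dependence known only through possible aliasing (dir == 0 but
                         :             :    the alias_ddrs vector non-empty) adds edges in both directions with
                         :             :    the ddrs attached, so those edges can later be resolved by a runtime
                         :             :    alias check.  */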
    2390                 :             : 
    2391                 :             : /* Sort partitions in PG in descending post order and store them in
    2392                 :             :    PARTITIONS.  */
    2393                 :             : 
    2394                 :             : static void
    2395                 :       10750 : sort_partitions_by_post_order (struct graph *pg,
    2396                 :             :                                vec<struct partition *> *partitions)
    2397                 :             : {
    2398                 :       10750 :   int i;
    2399                 :       10750 :   struct pg_vdata *data;
    2400                 :             : 
    2401                 :             :   /* Now order the remaining nodes in descending postorder.  */
    2402                 :       10750 :   qsort (pg->vertices, pg->n_vertices, sizeof (vertex), pgcmp);
    2403                 :       10750 :   partitions->truncate (0);
    2404                 :       71592 :   for (i = 0; i < pg->n_vertices; ++i)
    2405                 :             :     {
    2406                 :       50092 :       data = (struct pg_vdata *)pg->vertices[i].data;
    2407                 :       50092 :       if (data->partition)
    2408                 :       46700 :         partitions->safe_push (data->partition);
    2409                 :             :     }
    2410                 :       10750 : }
    2411                 :             : 
    2412                 :             : void
    2413                 :        5802 : loop_distribution::merge_dep_scc_partitions (struct graph *rdg,
    2414                 :             :                                              vec<struct partition *> *partitions,
    2415                 :             :                                              bool ignore_alias_p)
    2416                 :             : {
    2417                 :        5802 :   struct partition *partition1, *partition2;
    2418                 :        5802 :   struct pg_vdata *data;
    2419                 :        5802 :   graph *pg = build_partition_graph (rdg, partitions, ignore_alias_p);
    2420                 :        5802 :   int i, j, num_sccs = graphds_scc (pg, NULL);
    2421                 :             : 
    2422                 :             :   /* A strongly connected component means a dependence cycle, which we cannot
    2423                 :             :      distribute.  So fuse its partitions together.  */
    2424                 :       11604 :   if ((unsigned) num_sccs < partitions->length ())
    2425                 :             :     {
    2426                 :        1384 :       for (i = 0; i < num_sccs; ++i)
    2427                 :             :         {
    2428                 :         756 :           for (j = 0; partitions->iterate (j, &partition1); ++j)
    2429                 :         756 :             if (pg->vertices[j].component == i)
    2430                 :             :               break;
    2431                 :        4307 :           for (j = j + 1; partitions->iterate (j, &partition2); ++j)
    2432                 :        2883 :             if (pg->vertices[j].component == i)
    2433                 :             :               {
    2434                 :        2586 :                 partition_merge_into (NULL, partition1,
    2435                 :             :                                       partition2, FUSE_SAME_SCC);
    2436                 :        2586 :                 partition1->type = PTYPE_SEQUENTIAL;
    2437                 :        2586 :                 (*partitions)[j] = NULL;
    2438                 :        2586 :                 partition_free (partition2);
    2439                 :        2586 :                 data = (struct pg_vdata *)pg->vertices[j].data;
    2440                 :        2586 :                 data->partition = NULL;
    2441                 :             :               }
    2442                 :             :         }
    2443                 :             :     }
    2444                 :             : 
    2445                 :        5802 :   sort_partitions_by_post_order (pg, partitions);
    2446                 :       11604 :   gcc_assert (partitions->length () == (unsigned)num_sccs);
    2447                 :        5802 :   free_partition_graph_vdata (pg);
    2448                 :        5802 :   for_each_edge (pg, free_partition_graph_edata_cb, NULL);
    2449                 :        5802 :   free_graph (pg);
    2450                 :        5802 : }
    2451                 :             : 
    2452                 :             : /* Callback function for traversing edge E in graph G.  DATA is private
    2453                 :             :    callback data.  */
    2454                 :             : 
    2455                 :             : static void
    2456                 :         180 : pg_collect_alias_ddrs (struct graph *g, struct graph_edge *e, void *data)
    2457                 :             : {
    2458                 :         180 :   int i, j, component;
    2459                 :         180 :   struct pg_edge_callback_data *cbdata;
    2460                 :         180 :   struct pg_edata *edata = (struct pg_edata *) e->data;
    2461                 :             : 
    2462                 :             :   /* If the edge doesn't have attached data dependences, it represents
    2463                 :             :      compilation time known dependences.  This type of dependence cannot
    2464                 :             :      be resolved by a runtime alias check.  */
    2465                 :         180 :   if (edata == NULL || edata->alias_ddrs.length () == 0)
    2466                 :             :     return;
    2467                 :             : 
    2468                 :         172 :   cbdata = (struct pg_edge_callback_data *) data;
    2469                 :         172 :   i = e->src;
    2470                 :         172 :   j = e->dest;
    2471                 :         172 :   component = cbdata->vertices_component[i];
    2472                 :             :   /* Vertices are topologically sorted according to compilation time
    2473                 :             :      known dependences, so we can break strongly connected components
    2474                 :             :      by removing edges of the opposite direction, i.e., edges pointing
    2475                 :             :      from a vertex with a smaller post number to a vertex with a bigger
    2476                 :             :      post number.  */
    2477                 :         172 :   if (g->vertices[i].post < g->vertices[j].post
    2478                 :             :       /* We only need to remove edges connecting vertices in the same
    2479                 :             :          strongly connected component to break it.  */
    2480                 :          88 :       && component == cbdata->vertices_component[j]
    2481                 :             :       /* Check if we want to break the strong connected component or not.  */
    2482                 :         260 :       && !bitmap_bit_p (cbdata->sccs_to_merge, component))
    2483                 :          88 :     cbdata->alias_ddrs->safe_splice (edata->alias_ddrs);
    2484                 :             : }
    2485                 :             : 
    2486                 :             : /* Callback function for traversing edge E.  DATA is private
    2487                 :             :    callback data.  */
    2488                 :             : 
    2489                 :             : static void
    2490                 :         180 : pg_unmark_merged_alias_ddrs (struct graph *, struct graph_edge *e, void *data)
    2491                 :             : {
    2492                 :         180 :   int i, j, component;
    2493                 :         180 :   struct pg_edge_callback_data *cbdata;
    2494                 :         180 :   struct pg_edata *edata = (struct pg_edata *) e->data;
    2495                 :             : 
    2496                 :         180 :   if (edata == NULL || edata->alias_ddrs.length () == 0)
    2497                 :             :     return;
    2498                 :             : 
    2499                 :         173 :   cbdata = (struct pg_edge_callback_data *) data;
    2500                 :         173 :   i = e->src;
    2501                 :         173 :   j = e->dest;
    2502                 :         173 :   component = cbdata->vertices_component[i];
    2503                 :             :   /* Make sure to not skip vertices inside SCCs we are going to merge.  */
    2504                 :         173 :   if (component == cbdata->vertices_component[j]
    2505                 :         173 :       && bitmap_bit_p (cbdata->sccs_to_merge, component))
    2506                 :             :     {
    2507                 :           1 :       edata->alias_ddrs.release ();
    2508                 :           1 :       delete edata;
    2509                 :           1 :       e->data = NULL;
    2510                 :             :     }
    2511                 :             : }
    2512                 :             : 
    2513                 :             : /* This is the main function breaking strongly connected components in
    2514                 :             :    PARTITIONS, given the reduced dependence graph RDG.  Store data dependence
    2515                 :             :    relations for runtime alias checks in ALIAS_DDRS.  */
    2516                 :             : void
    2517                 :        4948 : loop_distribution::break_alias_scc_partitions (struct graph *rdg,
    2518                 :             :                                                vec<struct partition *> *partitions,
    2519                 :             :                                                vec<ddr_p> *alias_ddrs)
    2520                 :             : {
    2521                 :        4948 :   int i, j, k, num_sccs, num_sccs_no_alias = 0;
    2522                 :             :   /* Build partition dependence graph.  */
    2523                 :        4948 :   graph *pg = build_partition_graph (rdg, partitions, false);
    2524                 :             : 
    2525                 :        4948 :   alias_ddrs->truncate (0);
    2526                 :             :   /* Find strongly connected components in the graph, with all dependence
    2527                 :             :      edges considered.  */
    2528                 :        4948 :   num_sccs = graphds_scc (pg, NULL);
    2529                 :             :   /* All SCCs now can be broken by runtime alias checks because SCCs caused by
    2530                 :             :      compilation time known dependences are merged before this function.  */
    2531                 :        9896 :   if ((unsigned) num_sccs < partitions->length ())
    2532                 :             :     {
    2533                 :         230 :       struct pg_edge_callback_data cbdata;
    2534                 :         230 :       auto_bitmap sccs_to_merge;
    2535                 :         230 :       auto_vec<enum partition_type> scc_types;
    2536                 :         230 :       struct partition *partition, *first;
    2537                 :             : 
    2538                 :             :       /* If all partitions in an SCC have the same type, we can simply merge
    2539                 :             :          the SCC.  This loop finds such SCCs and records them in a bitmap.  */
    2540                 :         230 :       bitmap_set_range (sccs_to_merge, 0, (unsigned) num_sccs);
    2541                 :         474 :       for (i = 0; i < num_sccs; ++i)
    2542                 :             :         {
    2543                 :         264 :           for (j = 0; partitions->iterate (j, &first); ++j)
    2544                 :         264 :             if (pg->vertices[j].component == i)
    2545                 :             :               break;
    2546                 :             : 
    2547                 :         244 :           bool same_type = true, all_builtins = partition_builtin_p (first);
    2548                 :        1175 :           for (++j; partitions->iterate (j, &partition); ++j)
    2549                 :             :             {
    2550                 :         956 :               if (pg->vertices[j].component != i)
    2551                 :          88 :                 continue;
    2552                 :             : 
    2553                 :         868 :               if (first->type != partition->type)
    2554                 :             :                 {
    2555                 :             :                   same_type = false;
    2556                 :             :                   break;
    2557                 :             :                 }
    2558                 :         843 :               all_builtins &= partition_builtin_p (partition);
    2559                 :             :             }
    2560                 :             :           /* Merge the SCC if all its partitions have the same type, even
    2561                 :             :              though the resulting partition is sequential, because the
    2562                 :             :              vectorizer can do a better runtime alias check.  One exception
    2563                 :             :              is when all partitions in the SCC are builtins.  */
    2564                 :         244 :           if (!same_type || all_builtins)
    2565                 :          51 :             bitmap_clear_bit (sccs_to_merge, i);
    2566                 :             :         }
    2567                 :             : 
    2568                 :             :       /* Initialize callback data for traversing.  */
    2569                 :         230 :       cbdata.sccs_to_merge = sccs_to_merge;
    2570                 :         230 :       cbdata.alias_ddrs = alias_ddrs;
    2571                 :         230 :       cbdata.vertices_component = XNEWVEC (int, pg->n_vertices);
    2572                 :             :       /* Record the component information, which will be corrupted by the
    2573                 :             :          next SCC finding call on the graph.  */
    2574                 :        1342 :       for (i = 0; i < pg->n_vertices; ++i)
    2575                 :        1112 :         cbdata.vertices_component[i] = pg->vertices[i].component;
    2576                 :             : 
    2577                 :             :       /* Collect data dependences for runtime alias checks to break SCCs.  */
    2578                 :         230 :       if (bitmap_count_bits (sccs_to_merge) != (unsigned) num_sccs)
    2579                 :             :         {
    2580                 :             :           /* For SCCs we want to merge, clear all alias_ddrs for edges
    2581                 :             :              inside the component.  */
    2582                 :          47 :           for_each_edge (pg, pg_unmark_merged_alias_ddrs, &cbdata);
    2583                 :             : 
    2584                 :             :           /* Run SCC finding algorithm again, with alias dependence edges
    2585                 :             :              skipped.  This is to topologically sort partitions according to
    2586                 :             :              compilation time known dependence.  Note the topological order
    2587                 :             :              is stored in the form of pg's post order number.  */
    2588                 :          47 :           num_sccs_no_alias = graphds_scc (pg, NULL, pg_skip_alias_edge);
    2589                 :             :           /* We cannot assert partitions->length () == num_sccs_no_alias
    2590                 :             :              since we are not ignoring alias edges in cycles we are
    2591                 :             :              going to merge.  That's required to compute correct postorder.  */
    2592                 :             :           /* With the topological order, we can construct two subgraphs L and
    2593                 :             :              R.  L contains edges <x, y> where x < y in terms of post order,
    2594                 :             :              while R contains edges <x, y> where x > y.  Edges for compilation
    2595                 :             :              time known dependences all fall in R, so we break SCCs by removing
    2596                 :             :              all (alias) edges in subgraph L.  */
    2597                 :          47 :           for_each_edge (pg, pg_collect_alias_ddrs, &cbdata);
    2598                 :             :         }
    2599                 :             : 
    2600                 :             :       /* For each SCC that doesn't need to be broken, merge it.  */
    2601                 :         474 :       for (i = 0; i < num_sccs; ++i)
    2602                 :             :         {
    2603                 :         244 :           if (!bitmap_bit_p (sccs_to_merge, i))
    2604                 :          51 :             continue;
    2605                 :             : 
    2606                 :         208 :           for (j = 0; partitions->iterate (j, &first); ++j)
    2607                 :         208 :             if (cbdata.vertices_component[j] == i)
    2608                 :             :               break;
    2609                 :        1323 :           for (k = j + 1; partitions->iterate (k, &partition); ++k)
    2610                 :             :             {
    2611                 :         886 :               struct pg_vdata *data;
    2612                 :             : 
    2613                 :         886 :               if (cbdata.vertices_component[k] != i)
    2614                 :          80 :                 continue;
    2615                 :             : 
    2616                 :         806 :               partition_merge_into (NULL, first, partition, FUSE_SAME_SCC);
    2617                 :         806 :               (*partitions)[k] = NULL;
    2618                 :         806 :               partition_free (partition);
    2619                 :         806 :               data = (struct pg_vdata *)pg->vertices[k].data;
    2620                 :         806 :               gcc_assert (data->id == k);
    2621                 :         806 :               data->partition = NULL;
    2622                 :             :               /* The result partition of merged SCC must be sequential.  */
    2623                 :         806 :               first->type = PTYPE_SEQUENTIAL;
    2624                 :             :             }
    2625                 :             :         }
    2626                 :             :       /* If the reduction partition's SCC is broken by runtime alias checks,
    2627                 :             :          we force a negative post order on it, making sure it will be
    2628                 :             :          scheduled last.  */
    2629                 :         230 :       if (num_sccs_no_alias > 0)
    2630                 :             :         {
    2631                 :             :           j = -1;
    2632                 :         162 :           for (i = 0; i < pg->n_vertices; ++i)
    2633                 :             :             {
    2634                 :         115 :               struct pg_vdata *data = (struct pg_vdata *)pg->vertices[i].data;
    2635                 :         115 :               if (data->partition && partition_reduction_p (data->partition))
    2636                 :             :                 {
    2637                 :           4 :                   gcc_assert (j == -1);
    2638                 :             :                   j = i;
    2639                 :             :                 }
    2640                 :             :             }
    2641                 :          47 :           if (j >= 0)
    2642                 :           4 :             pg->vertices[j].post = -1;
    2643                 :             :         }
    2644                 :             : 
    2645                 :         230 :       free (cbdata.vertices_component);
    2646                 :         230 :     }
    2647                 :             : 
    2648                 :        4948 :   sort_partitions_by_post_order (pg, partitions);
    2649                 :        4948 :   free_partition_graph_vdata (pg);
    2650                 :        4948 :   for_each_edge (pg, free_partition_graph_edata_cb, NULL);
    2651                 :        4948 :   free_graph (pg);
    2652                 :             : 
    2653                 :        4948 :   if (dump_file && (dump_flags & TDF_DETAILS))
    2654                 :             :     {
    2655                 :          15 :       fprintf (dump_file, "Possible alias data dependence to break:\n");
    2656                 :          15 :       dump_data_dependence_relations (dump_file, *alias_ddrs);
    2657                 :             :     }
    2658                 :        4948 : }
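                         :             : 
                         :             : /* Worked example (editorial sketch, not part of the measured source):
                         :             :    suppose partitions P0 and P1 have a compilation time known dependence
                         :             :    P0 -> P1 plus an alias edge P1 -> P0, forming one SCC.  Re-running SCC
                         :             :    finding with alias edges skipped gives P0 a bigger post number than P1,
                         :             :    so the alias edge points from the smaller to the bigger post number;
                         :             :    pg_collect_alias_ddrs therefore collects its ddrs into ALIAS_DDRS and
                         :             :    the SCC is broken subject to the runtime alias check.  */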
    2659                 :             : 
    2660                 :             : /* Compute and return an expression whose value is the segment length which
    2661                 :             :    will be accessed by DR in NITERS iterations.  */
    2662                 :             : 
    2663                 :             : static tree
    2664                 :        1332 : data_ref_segment_size (struct data_reference *dr, tree niters)
    2665                 :             : {
    2666                 :        1332 :   niters = size_binop (MINUS_EXPR,
    2667                 :             :                        fold_convert (sizetype, niters),
    2668                 :             :                        size_one_node);
    2669                 :        1332 :   return size_binop (MULT_EXPR,
    2670                 :             :                      fold_convert (sizetype, DR_STEP (dr)),
    2671                 :             :                      fold_convert (sizetype, niters));
    2672                 :             : }
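                         :             : 
                         :             : /* Worked example (editorial, illustrative values): for a data reference
                         :             :    with a DR_STEP of 4 bytes and NITERS equal to 100, the segment length
                         :             :    computed above is 4 * (100 - 1) = 396 bytes.  */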
    2673                 :             : 
    2674                 :             : /* Return true if LOOP's latch is dominated by the statement for data
    2675                 :             :    reference DR.  */
    2676                 :             : 
    2677                 :             : static inline bool
    2678                 :        1332 : latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
    2679                 :             : {
    2680                 :        1332 :   return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
    2681                 :        1332 :                          gimple_bb (DR_STMT (dr)));
    2682                 :             : }
    2683                 :             : 
    2684                 :             : /* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
    2685                 :             :    data dependence relations ALIAS_DDRS.  */
    2686                 :             : 
    2687                 :             : static void
    2688                 :          47 : compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
    2689                 :             :                            vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
    2690                 :             : {
    2691                 :          47 :   unsigned int i;
    2692                 :          47 :   unsigned HOST_WIDE_INT factor = 1;
    2693                 :          47 :   tree niters_plus_one, niters = number_of_latch_executions (loop);
    2694                 :             : 
    2695                 :          47 :   gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
    2696                 :          47 :   niters = fold_convert (sizetype, niters);
    2697                 :          47 :   niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
    2698                 :             : 
    2699                 :          47 :   if (dump_file && (dump_flags & TDF_DETAILS))
    2700                 :           0 :     fprintf (dump_file, "Creating alias check pairs:\n");
    2701                 :             : 
    2702                 :             :   /* Iterate all data dependence relations and compute alias check pairs.  */
    2703                 :        1426 :   for (i = 0; i < alias_ddrs->length (); i++)
    2704                 :             :     {
    2705                 :         666 :       ddr_p ddr = (*alias_ddrs)[i];
    2706                 :         666 :       struct data_reference *dr_a = DDR_A (ddr);
    2707                 :         666 :       struct data_reference *dr_b = DDR_B (ddr);
    2708                 :         666 :       tree seg_length_a, seg_length_b;
    2709                 :             : 
    2710                 :         666 :       if (latch_dominated_by_data_ref (loop, dr_a))
    2711                 :         662 :         seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
    2712                 :             :       else
    2713                 :           4 :         seg_length_a = data_ref_segment_size (dr_a, niters);
    2714                 :             : 
    2715                 :         666 :       if (latch_dominated_by_data_ref (loop, dr_b))
    2716                 :         662 :         seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
    2717                 :             :       else
    2718                 :           4 :         seg_length_b = data_ref_segment_size (dr_b, niters);
    2719                 :             : 
    2720                 :         666 :       unsigned HOST_WIDE_INT access_size_a
    2721                 :         666 :         = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
    2722                 :         666 :       unsigned HOST_WIDE_INT access_size_b
    2723                 :         666 :         = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
    2724                 :         666 :       unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
    2725                 :         666 :       unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
    2726                 :             : 
    2727                 :         666 :       dr_with_seg_len_pair_t dr_with_seg_len_pair
    2728                 :        1332 :         (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
    2729                 :        1332 :          dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
    2730                 :             :          /* ??? Would WELL_ORDERED be safe?  */
    2731                 :         666 :          dr_with_seg_len_pair_t::REORDERED);
    2732                 :             : 
    2733                 :         666 :       comp_alias_pairs->safe_push (dr_with_seg_len_pair);
    2734                 :             :     }
    2735                 :             : 
    2736                 :          47 :   if (tree_fits_uhwi_p (niters))
    2737                 :          30 :     factor = tree_to_uhwi (niters);
    2738                 :             : 
    2739                 :             :   /* Prune alias check pairs.  */
    2740                 :          47 :   prune_runtime_alias_test_list (comp_alias_pairs, factor);
    2741                 :          47 :   if (dump_file && (dump_flags & TDF_DETAILS))
    2742                 :           0 :     fprintf (dump_file,
    2743                 :             :              "Improved number of alias checks from %d to %d\n",
    2744                 :             :              alias_ddrs->length (), comp_alias_pairs->length ());
    2745                 :          47 : }
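                         :             : 
                         :             : /* Editorial note restating the logic above: a reference whose statement
                         :             :    dominates the loop latch gets a segment length computed from
                         :             :    NITERS + 1 iterations, any other reference (e.g. a store guarded by a
                         :             :    condition inside the loop) from NITERS, where NITERS is the number of
                         :             :    latch executions.  When NITERS is a compile-time constant it is also
                         :             :    passed as the factor to prune_runtime_alias_test_list.  */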
    2746                 :             : 
    2747                 :             : /* Given data dependence relations in ALIAS_DDRS, generate runtime alias
    2748                 :             :    checks and version LOOP under the condition of these runtime alias checks.  */
    2749                 :             : 
    2750                 :             : static void
    2751                 :          47 : version_loop_by_alias_check (vec<struct partition *> *partitions,
    2752                 :             :                              class loop *loop, vec<ddr_p> *alias_ddrs)
    2753                 :             : {
    2754                 :          47 :   profile_probability prob;
    2755                 :          47 :   basic_block cond_bb;
    2756                 :          47 :   class loop *nloop;
    2757                 :          47 :   tree lhs, arg0, cond_expr = NULL_TREE;
    2758                 :          47 :   gimple_seq cond_stmts = NULL;
    2759                 :          47 :   gimple *call_stmt = NULL;
    2760                 :          47 :   auto_vec<dr_with_seg_len_pair_t> comp_alias_pairs;
    2761                 :             : 
    2762                 :             :   /* Generate code for runtime alias checks if necessary.  */
    2763                 :          47 :   gcc_assert (alias_ddrs->length () > 0);
    2764                 :             : 
    2765                 :          47 :   if (dump_file && (dump_flags & TDF_DETAILS))
    2766                 :           0 :     fprintf (dump_file,
    2767                 :             :              "Version loop <%d> with runtime alias check\n", loop->num);
    2768                 :             : 
    2769                 :          47 :   compute_alias_check_pairs (loop, alias_ddrs, &comp_alias_pairs);
    2770                 :          47 :   create_runtime_alias_checks (loop, &comp_alias_pairs, &cond_expr);
    2771                 :          47 :   cond_expr = force_gimple_operand_1 (cond_expr, &cond_stmts,
    2772                 :             :                                       is_gimple_val, NULL_TREE);
    2773                 :             : 
    2774                 :             :   /* Depend on vectorizer to fold IFN_LOOP_DIST_ALIAS.  */
    2775                 :          47 :   bool cancelable_p = flag_tree_loop_vectorize;
    2776                 :          47 :   if (cancelable_p)
    2777                 :             :     {
    2778                 :             :       unsigned i = 0;
    2779                 :             :       struct partition *partition;
    2780                 :          92 :       for (; partitions->iterate (i, &partition); ++i)
    2781                 :          74 :         if (!partition_builtin_p (partition))
    2782                 :             :           break;
    2783                 :             : 
    2784                 :             :      /* If all partitions are builtins, distributing the loop would be
    2785                 :             :         profitable and we don't want to cancel the runtime alias checks.  */
    2786                 :          86 :       if (i == partitions->length ())
    2787                 :             :         cancelable_p = false;
    2788                 :             :     }
    2789                 :             : 
    2790                 :             :   /* Generate internal function call for loop distribution alias check if the
    2791                 :             :      runtime alias check should be cancelable.  */
    2792                 :          29 :   if (cancelable_p)
    2793                 :             :     {
    2794                 :          25 :       call_stmt = gimple_build_call_internal (IFN_LOOP_DIST_ALIAS,
    2795                 :             :                                               2, NULL_TREE, cond_expr);
    2796                 :          25 :       lhs = make_ssa_name (boolean_type_node);
    2797                 :          25 :       gimple_call_set_lhs (call_stmt, lhs);
    2798                 :             :     }
    2799                 :             :   else
    2800                 :             :     lhs = cond_expr;
    2801                 :             : 
    2802                 :          47 :   prob = profile_probability::guessed_always ().apply_scale (9, 10);
    2803                 :          47 :   initialize_original_copy_tables ();
    2804                 :          47 :   nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
    2805                 :             :                         prob, prob.invert (), true);
    2806                 :          47 :   free_original_copy_tables ();
    2807                 :             :   /* Record the original loop number in newly generated loops.  In case of
    2808                 :             :      distribution, the original loop will be distributed and the new loop
    2809                 :             :      is kept.  */
    2810                 :          47 :   loop->orig_loop_num = nloop->num;
    2811                 :          47 :   nloop->orig_loop_num = nloop->num;
    2812                 :          47 :   nloop->dont_vectorize = true;
    2813                 :          47 :   nloop->force_vectorize = false;
    2814                 :             : 
    2815                 :          47 :   if (call_stmt)
    2816                 :             :     {
    2817                 :             :       /* Record new loop's num in IFN_LOOP_DIST_ALIAS because the original
    2818                 :             :          loop could be destroyed.  */
    2819                 :          25 :       arg0 = build_int_cst (integer_type_node, loop->orig_loop_num);
    2820                 :          25 :       gimple_call_set_arg (call_stmt, 0, arg0);
    2821                 :          25 :       gimple_seq_add_stmt_without_update (&cond_stmts, call_stmt);
    2822                 :             :     }
    2823                 :             : 
    2824                 :          47 :   if (cond_stmts)
    2825                 :             :     {
    2826                 :          47 :       gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
    2827                 :          47 :       gsi_insert_seq_before (&cond_gsi, cond_stmts, GSI_SAME_STMT);
    2828                 :             :     }
    2829                 :          47 :   update_ssa (TODO_update_ssa_no_phi);
    2830                 :          47 : }
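                         :             : 
                         :             : /* Schematic of the versioned CFG produced above (editorial sketch, not
                         :             :    part of the measured source):
                         :             : 
                         :             :      _x = IFN_LOOP_DIST_ALIAS (<new loop num>, <alias check condition>);
                         :             :      if (_x)
                         :             :        <LOOP, which the pass goes on to distribute>
                         :             :      else
                         :             :        <NLOOP, an unmodified copy with dont_vectorize set>
                         :             : 
                         :             :    When the check is not cancelable (loop vectorization disabled, or all
                         :             :    partitions are builtins), the alias check condition is used directly
                         :             :    instead of the internal function call.  */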
    2831                 :             : 
    2832                 :             : /* Return true if loop versioning is needed to distribute PARTITIONS.
    2833                 :             :    ALIAS_DDRS are data dependence relations for the runtime alias check.  */
    2834                 :             : 
    2835                 :             : static inline bool
    2836                 :       10646 : version_for_distribution_p (vec<struct partition *> *partitions,
    2837                 :             :                             vec<ddr_p> *alias_ddrs)
    2838                 :             : {
    2839                 :             :   /* No need to version loop if we have only one partition.  */
    2840                 :       12889 :   if (partitions->length () == 1)
    2841                 :             :     return false;
    2842                 :             : 
    2843                 :             :   /* Need to version loop if runtime alias check is necessary.  */
    2844                 :        2243 :   return (alias_ddrs->length () > 0);
    2845                 :             : }
    2846                 :             : 
    2847                 :             : /* Compare base offset of builtin mem* partitions P1 and P2.  */
    2848                 :             : 
    2849                 :             : static int
    2850                 :         291 : offset_cmp (const void *vp1, const void *vp2)
    2851                 :             : {
    2852                 :         291 :   struct partition *p1 = *(struct partition *const *) vp1;
    2853                 :         291 :   struct partition *p2 = *(struct partition *const *) vp2;
    2854                 :         291 :   unsigned HOST_WIDE_INT o1 = p1->builtin->dst_base_offset;
    2855                 :         291 :   unsigned HOST_WIDE_INT o2 = p2->builtin->dst_base_offset;
    2856                 :         291 :   return (o2 < o1) - (o1 < o2);
    2857                 :             : }
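                         :             : 
                         :             : /* Editorial note: the expression (o2 < o1) - (o1 < o2) above is the
                         :             :    usual three-way comparison idiom; it yields -1, 0 or 1 and, unlike
                         :             :    returning o1 - o2, cannot produce a wrong sign when the difference
                         :             :    does not fit in an int.  */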
    2858                 :             : 
    2859                 :             : /* Fuse adjacent memset builtin PARTITIONS if possible.  This is a special
    2860                 :             :    case optimization transforming the code below:
    2861                 :             : 
    2862                 :             :      __builtin_memset (&obj, 0, 100);
    2863                 :             :      _1 = &obj + 100;
    2864                 :             :      __builtin_memset (_1, 0, 200);
    2865                 :             :      _2 = &obj + 300;
    2866                 :             :      __builtin_memset (_2, 0, 100);
    2867                 :             : 
    2868                 :             :    into:
    2869                 :             : 
    2870                 :             :      __builtin_memset (&obj, 0, 400);
    2871                 :             : 
    2872                 :             :    Note we don't have dependence information between different partitions
    2873                 :             :    at this point; as a result, we can't handle nonadjacent memset builtin
    2874                 :             :    partitions since dependences might be broken.  */
    2875                 :             : 
    2876                 :             : static void
    2877                 :        2263 : fuse_memset_builtins (vec<struct partition *> *partitions)
    2878                 :             : {
    2879                 :        2263 :   unsigned i, j;
    2880                 :        2263 :   struct partition *part1, *part2;
    2881                 :        2263 :   tree rhs1, rhs2;
    2882                 :             : 
    2883                 :        7201 :   for (i = 0; partitions->iterate (i, &part1);)
    2884                 :             :     {
    2885                 :        4938 :       if (part1->kind != PKIND_MEMSET)
    2886                 :             :         {
    2887                 :        2952 :           i++;
    2888                 :        2952 :           continue;
    2889                 :             :         }
    2890                 :             : 
    2891                 :             :       /* Find sub-array of memset builtins of the same base.  Index range
    2892                 :             :          of the sub-array is [i, j) with "j > i".  */
    2893                 :        2039 :       for (j = i + 1; partitions->iterate (j, &part2); ++j)
    2894                 :             :         {
    2895                 :        1541 :           if (part2->kind != PKIND_MEMSET
    2896                 :        2059 :               || !operand_equal_p (part1->builtin->dst_base_base,
    2897                 :         518 :                                    part2->builtin->dst_base_base, 0))
    2898                 :             :             break;
    2899                 :             : 
    2900                 :             :           /* Memset calls setting different values can't be merged.  */
    2901                 :          81 :           rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
    2902                 :          81 :           rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
    2903                 :          81 :           if (!operand_equal_p (rhs1, rhs2, 0))
    2904                 :             :             break;
    2905                 :             :         }
    2906                 :             : 
    2907                 :             :       /* Stable sort is required in order to avoid breaking dependence.  */
    2908                 :        1986 :       gcc_stablesort (&(*partitions)[i], j - i, sizeof (*partitions)[i],
    2909                 :             :                       offset_cmp);
    2910                 :             :       /* Continue with next partition.  */
    2911                 :        1986 :       i = j;
    2912                 :             :     }
    2913                 :             : 
    2914                 :             :   /* Merge all consecutive memset builtin partitions.  */
    2915                 :        9982 :   for (i = 0; i < partitions->length () - 1;)
    2916                 :             :     {
    2917                 :        2728 :       part1 = (*partitions)[i];
    2918                 :        2728 :       if (part1->kind != PKIND_MEMSET)
    2919                 :             :         {
    2920                 :        1187 :           i++;
    2921                 :        2701 :           continue;
    2922                 :             :         }
    2923                 :             : 
    2924                 :        1541 :       part2 = (*partitions)[i + 1];
    2925                 :             :       /* Only merge memset partitions of the same base and with constant
    2926                 :             :          access sizes.  */
    2927                 :        3009 :       if (part2->kind != PKIND_MEMSET
    2928                 :         518 :           || TREE_CODE (part1->builtin->size) != INTEGER_CST
    2929                 :         250 :           || TREE_CODE (part2->builtin->size) != INTEGER_CST
    2930                 :        1791 :           || !operand_equal_p (part1->builtin->dst_base_base,
    2931                 :         250 :                                part2->builtin->dst_base_base, 0))
    2932                 :             :         {
    2933                 :        1468 :           i++;
    2934                 :        1468 :           continue;
    2935                 :             :         }
    2936                 :          73 :       rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
    2937                 :          73 :       rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
    2938                 :          73 :       int bytev1 = const_with_all_bytes_same (rhs1);
    2939                 :          73 :       int bytev2 = const_with_all_bytes_same (rhs2);
    2940                 :             :       /* Only merge memset partitions of the same value.  */
    2941                 :          73 :       if (bytev1 != bytev2 || bytev1 == -1)
    2942                 :             :         {
    2943                 :          24 :           i++;
    2944                 :          24 :           continue;
    2945                 :             :         }
    2946                 :          98 :       wide_int end1 = wi::add (part1->builtin->dst_base_offset,
    2947                 :          49 :                                wi::to_wide (part1->builtin->size));
    2948                 :             :       /* Only merge adjacent memset partitions.  */
    2949                 :          49 :       if (wi::ne_p (end1, part2->builtin->dst_base_offset))
    2950                 :             :         {
    2951                 :          22 :           i++;
    2952                 :          22 :           continue;
    2953                 :             :         }
    2954                 :             :       /* Merge partitions[i] and partitions[i+1].  */
    2955                 :          27 :       part1->builtin->size = fold_build2 (PLUS_EXPR, sizetype,
    2956                 :             :                                           part1->builtin->size,
    2957                 :             :                                           part2->builtin->size);
    2958                 :          27 :       partition_free (part2);
    2959                 :          27 :       partitions->ordered_remove (i + 1);
    2960                 :          49 :     }
    2961                 :        2263 : }
    2962                 :             : 
    2963                 :             : void
    2964                 :       32437 : loop_distribution::finalize_partitions (class loop *loop,
    2965                 :             :                                         vec<struct partition *> *partitions,
    2966                 :             :                                         vec<ddr_p> *alias_ddrs)
    2967                 :             : {
    2968                 :       32437 :   unsigned i;
    2969                 :       32437 :   struct partition *partition, *a;
    2970                 :             : 
    2971                 :       37433 :   if (partitions->length () == 1
    2972                 :       32484 :       || alias_ddrs->length () > 0)
    2973                 :       32437 :     return;
    2974                 :             : 
    2975                 :        4949 :   unsigned num_builtin = 0, num_normal = 0, num_partial_memset = 0;
    2976                 :        4949 :   bool same_type_p = true;
    2977                 :        4949 :   enum partition_type type = ((*partitions)[0])->type;
    2978                 :       28060 :   for (i = 0; partitions->iterate (i, &partition); ++i)
    2979                 :             :     {
    2980                 :       23111 :       same_type_p &= (type == partition->type);
    2981                 :       23111 :       if (partition_builtin_p (partition))
    2982                 :             :         {
    2983                 :        2404 :           num_builtin++;
    2984                 :        2404 :           continue;
    2985                 :             :         }
    2986                 :       20707 :       num_normal++;
    2987                 :       20707 :       if (partition->kind == PKIND_PARTIAL_MEMSET)
    2988                 :           4 :         num_partial_memset++;
    2989                 :             :     }
    2990                 :             : 
    2991                 :             :   /* Don't distribute the current loop into too many loops, given we don't
    2992                 :             :      have a memory stream cost model.  Be even more conservative in the case
    2993                 :             :      of loop nest distribution.  */
    2994                 :        4949 :   if ((same_type_p && num_builtin == 0
    2995                 :        2666 :        && (loop->inner == NULL || num_normal != 2 || num_partial_memset != 1))
    2996                 :        2287 :       || (loop->inner != NULL
    2997                 :          57 :           && i >= NUM_PARTITION_THRESHOLD && num_normal > 1)
    2998                 :        2280 :       || (loop->inner == NULL
    2999                 :        2230 :           && i >= NUM_PARTITION_THRESHOLD && num_normal > num_builtin))
    3000                 :             :     {
    3001                 :             :       a = (*partitions)[0];
    3002                 :       18120 :       for (i = 1; partitions->iterate (i, &partition); ++i)
    3003                 :             :         {
    3004                 :       15434 :           partition_merge_into (NULL, a, partition, FUSE_FINALIZE);
    3005                 :       15434 :           partition_free (partition);
    3006                 :             :         }
    3007                 :        2686 :       partitions->truncate (1);
    3008                 :             :     }
    3009                 :             : 
    3010                 :             :   /* Fuse memset builtins if possible.  */
    3011                 :        4949 :   if (partitions->length () > 1)
    3012                 :        2263 :     fuse_memset_builtins (partitions);
    3013                 :             : }
    3014                 :             : 
    3015                 :             : /* Distributes the code from LOOP in such a way that producer statements
    3016                 :             :    are placed before consumer statements.  Tries to separate only the
    3017                 :             :    statements from STMTS into separate loops.  Returns the number of
    3018                 :             :    distributed loops.  Sets NB_CALLS to the number of generated builtin
    3019                 :             :    calls.  Sets *DESTROY_P to whether LOOP needs to be destroyed.  */
    3020                 :             : 
    3021                 :             : int
    3022                 :      116963 : loop_distribution::distribute_loop (class loop *loop,
    3023                 :             :                  const vec<gimple *> &stmts,
    3024                 :             :                  control_dependences *cd, int *nb_calls, bool *destroy_p,
    3025                 :             :                  bool only_patterns_p)
    3026                 :             : {
    3027                 :      116963 :   ddrs_table = new hash_table<ddr_hasher> (389);
    3028                 :      116963 :   struct graph *rdg;
    3029                 :      116963 :   partition *partition;
    3030                 :      116963 :   int i, nbp;
    3031                 :             : 
    3032                 :      116963 :   *destroy_p = false;
    3033                 :      116963 :   *nb_calls = 0;
    3034                 :      116963 :   loop_nest.create (0);
    3035                 :      116963 :   if (!find_loop_nest (loop, &loop_nest))
    3036                 :             :     {
    3037                 :           0 :       loop_nest.release ();
    3038                 :           0 :       delete ddrs_table;
    3039                 :           0 :       return 0;
    3040                 :             :     }
    3041                 :             : 
    3042                 :      116963 :   datarefs_vec.create (20);
    3043                 :      116963 :   has_nonaddressable_dataref_p = false;
    3044                 :      116963 :   rdg = build_rdg (loop, cd);
    3045                 :      116963 :   if (!rdg)
    3046                 :             :     {
    3047                 :        2310 :       if (dump_file && (dump_flags & TDF_DETAILS))
    3048                 :           0 :         fprintf (dump_file,
    3049                 :             :                  "Loop %d not distributed: failed to build the RDG.\n",
    3050                 :             :                  loop->num);
    3051                 :             : 
    3052                 :        2310 :       loop_nest.release ();
    3053                 :        2310 :       free_data_refs (datarefs_vec);
    3054                 :        2310 :       delete ddrs_table;
    3055                 :        2310 :       return 0;
    3056                 :             :     }
    3057                 :             : 
    3058                 :      229306 :   if (datarefs_vec.length () > MAX_DATAREFS_NUM)
    3059                 :             :     {
    3060                 :           0 :       if (dump_file && (dump_flags & TDF_DETAILS))
    3061                 :           0 :         fprintf (dump_file,
    3062                 :             :                  "Loop %d not distributed: too many memory references.\n",
    3063                 :             :                  loop->num);
    3064                 :             : 
    3065                 :           0 :       free_rdg (rdg);
    3066                 :           0 :       loop_nest.release ();
    3067                 :           0 :       free_data_refs (datarefs_vec);
    3068                 :           0 :       delete ddrs_table;
    3069                 :           0 :       return 0;
    3070                 :             :     }
    3071                 :             : 
    3072                 :             :   data_reference_p dref;
    3073                 :      469907 :   for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
    3074                 :      355254 :     dref->aux = (void *) (uintptr_t) i;
    3075                 :             : 
    3076                 :      114653 :   if (dump_file && (dump_flags & TDF_DETAILS))
    3077                 :          67 :     dump_rdg (dump_file, rdg);
    3078                 :             : 
    3079                 :      114653 :   auto_vec<struct partition *, 3> partitions;
    3080                 :      114653 :   rdg_build_partitions (rdg, stmts, &partitions);
    3081                 :             : 
    3082                 :      114653 :   auto_vec<ddr_p> alias_ddrs;
    3083                 :             : 
    3084                 :      114653 :   auto_bitmap stmt_in_all_partitions;
    3085                 :      114653 :   bitmap_copy (stmt_in_all_partitions, partitions[0]->stmts);
    3086                 :      318738 :   for (i = 1; partitions.iterate (i, &partition); ++i)
    3087                 :       89432 :     bitmap_and_into (stmt_in_all_partitions, partitions[i]->stmts);
    3088                 :             : 
    3089                 :             :   bool any_builtin = false;
    3090                 :             :   bool reduction_in_all = false;
    3091                 :      318738 :   int reduction_partition_num = -1;
    3092                 :      318738 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    3093                 :             :     {
    3094                 :      204085 :       reduction_in_all
    3095                 :      204085 :         |= classify_partition (loop, rdg, partition, stmt_in_all_partitions);
    3096                 :      204085 :       any_builtin |= partition_builtin_p (partition);
    3097                 :             :     }
    3098                 :             : 
    3099                 :             :   /* If we are only distributing patterns but did not detect any,
    3100                 :             :      simply bail out.  */
    3101                 :      114653 :   if (only_patterns_p
    3102                 :      114653 :       && !any_builtin)
    3103                 :             :     {
    3104                 :       82216 :       nbp = 0;
    3105                 :       82216 :       goto ldist_done;
    3106                 :             :     }
    3107                 :             : 
    3108                 :             :   /* If we are only distributing patterns, fuse all partitions that
    3109                 :             :      were not classified as builtins.  This also avoids chopping
    3110                 :             :      a loop into pieces separated by builtin calls.  That is, we
    3111                 :             :      want at most a single loop body remaining.  */
    3112                 :       32437 :   struct partition *into;
    3113                 :       32437 :   if (only_patterns_p)
    3114                 :             :     {
    3115                 :       15575 :       for (i = 0; partitions.iterate (i, &into); ++i)
    3116                 :        9229 :         if (!partition_builtin_p (into))
    3117                 :             :           break;
    3118                 :       10183 :       for (++i; partitions.iterate (i, &partition); ++i)
    3119                 :        2525 :         if (!partition_builtin_p (partition))
    3120                 :             :           {
    3121                 :        1996 :             partition_merge_into (NULL, into, partition, FUSE_NON_BUILTIN);
    3122                 :        1996 :             partitions.unordered_remove (i);
    3123                 :        1996 :             partition_free (partition);
    3124                 :        1996 :             i--;
    3125                 :             :           }
    3126                 :             :     }
    3127                 :             : 
    3128                 :             :   /* Due to limitations in the transform phase we have to fuse all
    3129                 :             :      reduction partitions into the last partition so the existing
    3130                 :             :      loop will contain all loop-closed PHI nodes.  */
    3131                 :       90506 :   for (i = 0; partitions.iterate (i, &into); ++i)
    3132                 :       59700 :     if (partition_reduction_p (into))
    3133                 :             :       break;
    3134                 :       34052 :   for (i = i + 1; partitions.iterate (i, &partition); ++i)
    3135                 :        1615 :     if (partition_reduction_p (partition))
    3136                 :             :       {
    3137                 :        1408 :         partition_merge_into (rdg, into, partition, FUSE_REDUCTION);
    3138                 :        1408 :         partitions.unordered_remove (i);
    3139                 :        1408 :         partition_free (partition);
    3140                 :        1408 :         i--;
    3141                 :             :       }
    3142                 :             : 
    3143                 :             :   /* Apply our simple cost model - fuse partitions with similar
    3144                 :             :      memory accesses.  */
    3145                 :       89451 :   for (i = 0; partitions.iterate (i, &into); ++i)
    3146                 :             :     {
    3147                 :       57014 :       bool changed = false;
    3148                 :      714366 :       for (int j = i + 1; partitions.iterate (j, &partition); ++j)
    3149                 :             :         {
    3150                 :      657352 :           if (share_memory_accesses (rdg, into, partition))
    3151                 :             :             {
    3152                 :        5849 :               partition_merge_into (rdg, into, partition, FUSE_SHARE_REF);
    3153                 :        5849 :               partitions.unordered_remove (j);
    3154                 :        5849 :               partition_free (partition);
    3155                 :        5849 :               j--;
    3156                 :        5849 :               changed = true;
    3157                 :             :             }
    3158                 :             :         }
    3159                 :             :       /* If in the first pass we fused 0 and 2 into {0,2} because 0 and 2
    3160                 :             :          have similar accesses, and 1 and 2 also have similar accesses but
    3161                 :             :          0 and 1 do not, then we would fail to consider merging 1 into
    3162                 :             :          {0,2}.  So try again if we did any merging into 0.  */
    3163                 :       57014 :       if (changed)
    3164                 :        2956 :         i--;
    3165                 :             :     }
    3166                 :             : 
    3167                 :             :   /* Put a non-builtin partition last if we need to preserve a reduction.
    3168                 :             :      In most cases this keeps a normal partition last and avoids spilling
    3169                 :             :      a reduction result across builtin calls.
    3170                 :             :      ???  The proper way would be to use dependences to see whether we
    3171                 :             :      can move builtin partitions earlier during merge_dep_scc_partitions
    3172                 :             :      and its sort_partitions_by_post_order.  Especially when the
    3173                 :             :      dependence graph is composed of multiple independent subgraphs the
    3174                 :             :      heuristic does not work reliably.  */
    3175                 :       32437 :   if (reduction_in_all
    3176                 :       37590 :       && partition_builtin_p (partitions.last()))
    3177                 :          29 :     FOR_EACH_VEC_ELT (partitions, i, partition)
    3178                 :          19 :       if (!partition_builtin_p (partition))
    3179                 :             :         {
    3180                 :           8 :           partitions.unordered_remove (i);
    3181                 :           8 :           partitions.quick_push (partition);
    3182                 :           8 :           break;
    3183                 :             :         }
    3184                 :             : 
    3185                 :             :   /* Build the partition dependency graph and fuse partitions in strong
    3186                 :             :      connected component.  */
    3187                 :       32437 :   if (partitions.length () > 1)
    3188                 :             :     {
    3189                 :             :       /* Don't support loop nest distribution under runtime alias check
    3190                 :             :          since it's not likely to enable many vectorization opportunities.
    3191                 :             :          Also skip it if the loop has any data reference that may not be addressable,
    3192                 :             :          since the alias check needs to take and compare its address.  */
    3193                 :        5802 :       if (loop->inner || has_nonaddressable_dataref_p)
    3194                 :         327 :         merge_dep_scc_partitions (rdg, &partitions, false);
    3195                 :             :       else
    3196                 :             :         {
    3197                 :        5475 :           merge_dep_scc_partitions (rdg, &partitions, true);
    3198                 :        5475 :           if (partitions.length () > 1)
    3199                 :        4948 :             break_alias_scc_partitions (rdg, &partitions, &alias_ddrs);
    3200                 :             :         }
    3201                 :             :     }
    3202                 :             : 
    3203                 :       32437 :   finalize_partitions (loop, &partitions, &alias_ddrs);
    3204                 :             : 
    3205                 :             :   /* If there is a reduction in all partitions, make sure the last
    3206                 :             :      non-builtin partition provides the LC PHI defs.  */
    3207                 :       32437 :   if (reduction_in_all)
    3208                 :             :     {
    3209                 :       10334 :       FOR_EACH_VEC_ELT (partitions, i, partition)
    3210                 :        5181 :         if (!partition_builtin_p (partition))
    3211                 :        5162 :           reduction_partition_num = i;
    3212                 :        5153 :       if (reduction_partition_num == -1)
    3213                 :             :         {
    3214                 :             :           /* If all partitions are builtin, force the last one to
    3215                 :             :              be code generated as normal partition.  */
    3216                 :          10 :           partition = partitions.last ();
    3217                 :          10 :           partition->kind = PKIND_NORMAL;
    3218                 :             :         }
    3219                 :             :     }
    3220                 :             : 
    3221                 :       32437 :   nbp = partitions.length ();
    3222                 :       32437 :   if (nbp == 0
    3223                 :       62571 :       || (nbp == 1 && !partition_builtin_p (partitions[0]))
    3224                 :       43143 :       || (nbp > 1 && partition_contains_all_rw (rdg, partitions)))
    3225                 :             :     {
    3226                 :       21791 :       nbp = 0;
    3227                 :       21791 :       goto ldist_done;
    3228                 :             :     }
    3229                 :             : 
    3230                 :       10693 :   if (version_for_distribution_p (&partitions, &alias_ddrs))
    3231                 :          47 :     version_loop_by_alias_check (&partitions, loop, &alias_ddrs);
    3232                 :             : 
    3233                 :       10646 :   if (dump_file && (dump_flags & TDF_DETAILS))
    3234                 :             :     {
    3235                 :          39 :       fprintf (dump_file,
    3236                 :             :                "distribute loop <%d> into partitions:\n", loop->num);
    3237                 :          39 :       dump_rdg_partitions (dump_file, partitions);
    3238                 :             :     }
    3239                 :             : 
    3240                 :       24000 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    3241                 :             :     {
    3242                 :       13354 :       if (partition_builtin_p (partition))
    3243                 :       10813 :         (*nb_calls)++;
    3244                 :       13354 :       *destroy_p |= generate_code_for_partition (loop, partition, i < nbp - 1,
    3245                 :             :                                                  i == reduction_partition_num);
    3246                 :             :     }
    3247                 :             : 
    3248                 :      114653 :  ldist_done:
    3249                 :      114653 :   loop_nest.release ();
    3250                 :      114653 :   free_data_refs (datarefs_vec);
    3251                 :      114653 :   for (hash_table<ddr_hasher>::iterator iter = ddrs_table->begin ();
    3252                 :     2028905 :        iter != ddrs_table->end (); ++iter)
    3253                 :             :     {
    3254                 :      957126 :       free_dependence_relation (*iter);
    3255                 :      957126 :       *iter = NULL;
    3256                 :             :     }
    3257                 :      114653 :   delete ddrs_table;
    3258                 :             : 
    3259                 :      290632 :   FOR_EACH_VEC_ELT (partitions, i, partition)
    3260                 :      175979 :     partition_free (partition);
    3261                 :             : 
    3262                 :      114653 :   free_rdg (rdg);
    3263                 :      114653 :   return nbp - *nb_calls;
    3264                 :      114653 : }
    3265                 :             : 
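   To make the pattern-only mode of distribute_loop concrete, here is a minimal
   self-contained C sketch (illustrative only; the function and variable names
   are hypothetical, not taken from GCC or its testsuite).  Under
   -ftree-loop-distribute-patterns the zero-initializing statements can be
   classified as a memset builtin partition while the remaining statement stays
   in an ordinary loop partition:

       #include <stddef.h>

       void
       zero_and_bump (int *a, int *b, size_t n)
       {
         for (size_t i = 0; i < n; ++i)
           {
             a[i] = 0;          /* candidate for a memset builtin partition */
             b[i] = b[i] + 1;   /* kept as a normal loop partition */
           }
       }
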
    3266                 :             : 
    3267                 :      181162 : void loop_distribution::bb_top_order_init (void)
    3268                 :             : {
    3269                 :      181162 :   int rpo_num;
    3270                 :      181162 :   int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
    3271                 :      181162 :   edge entry = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
    3272                 :      181162 :   bitmap exit_bbs = BITMAP_ALLOC (NULL);
    3273                 :             : 
    3274                 :      181162 :   bb_top_order_index = XNEWVEC (int, last_basic_block_for_fn (cfun));
    3275                 :      181162 :   bb_top_order_index_size = last_basic_block_for_fn (cfun);
    3276                 :             : 
    3277                 :      181162 :   entry->flags &= ~EDGE_DFS_BACK;
    3278                 :      181162 :   bitmap_set_bit (exit_bbs, EXIT_BLOCK);
    3279                 :      181162 :   rpo_num = rev_post_order_and_mark_dfs_back_seme (cfun, entry, exit_bbs, true,
    3280                 :             :                                                    rpo, NULL);
    3281                 :      181162 :   BITMAP_FREE (exit_bbs);
    3282                 :             : 
    3283                 :     6096193 :   for (int i = 0; i < rpo_num; i++)
    3284                 :     5915031 :     bb_top_order_index[rpo[i]] = i;
    3285                 :             : 
    3286                 :      181162 :   free (rpo);
    3287                 :      181162 : }
    3288                 :             : 
    3289                 :      181162 : void loop_distribution::bb_top_order_destroy ()
    3290                 :             : {
    3291                 :      181162 :   free (bb_top_order_index);
    3292                 :      181162 :   bb_top_order_index = NULL;
    3293                 :      181162 :   bb_top_order_index_size = 0;
    3294                 :      181162 : }
    3295                 :             : 
    3296                 :             : 
    3297                 :             : /* Given LOOP, this function records seed statements for distribution in
    3298                 :             :    WORK_LIST.  Return false if there is nothing for distribution.  */
    3299                 :             : 
    3300                 :             : static bool
    3301                 :      153041 : find_seed_stmts_for_distribution (class loop *loop, vec<gimple *> *work_list)
    3302                 :             : {
    3303                 :      153041 :   basic_block *bbs = get_loop_body_in_dom_order (loop);
    3304                 :             : 
    3305                 :             :   /* Initialize the worklist with stmts we seed the partitions with.  */
    3306                 :      553721 :   for (unsigned i = 0; i < loop->num_nodes; ++i)
    3307                 :             :     {
    3308                 :             :       /* In irreducible sub-regions we don't know how to redirect
    3309                 :             :          conditions, so fail.  See PR100492.  */
    3310                 :      436581 :       if (bbs[i]->flags & BB_IRREDUCIBLE_LOOP)
    3311                 :             :         {
    3312                 :           4 :           if (dump_file && (dump_flags & TDF_DETAILS))
    3313                 :           0 :             fprintf (dump_file, "loop %d contains an irreducible region.\n",
    3314                 :             :                      loop->num);
    3315                 :           4 :           work_list->truncate (0);
    3316                 :           4 :           break;
    3317                 :             :         }
    3318                 :      436577 :       for (gphi_iterator gsi = gsi_start_phis (bbs[i]);
    3319                 :     1030650 :            !gsi_end_p (gsi); gsi_next (&gsi))
    3320                 :             :         {
    3321                 :      594073 :           gphi *phi = gsi.phi ();
    3322                 :     1188146 :           if (virtual_operand_p (gimple_phi_result (phi)))
    3323                 :      160499 :             continue;
    3324                 :             :           /* Distribute stmts which have defs that are used outside of
    3325                 :             :              the loop.  */
    3326                 :      433574 :           if (!stmt_has_scalar_dependences_outside_loop (loop, phi))
    3327                 :      415709 :             continue;
    3328                 :       17865 :           work_list->safe_push (phi);
    3329                 :             :         }
    3330                 :      873154 :       for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]);
    3331                 :     2997199 :            !gsi_end_p (gsi); gsi_next (&gsi))
    3332                 :             :         {
    3333                 :     2596519 :           gimple *stmt = gsi_stmt (gsi);
    3334                 :             : 
    3335                 :             :           /* Ignore clobbers, they do not have true side effects.  */
    3336                 :     2596519 :           if (gimple_clobber_p (stmt))
    3337                 :     2328198 :             continue;
    3338                 :             : 
    3339                 :             :           /* If there is a stmt with side-effects bail out - we
    3340                 :             :              cannot and should not distribute this loop.  */
    3341                 :     2592212 :           if (gimple_has_side_effects (stmt))
    3342                 :             :             {
    3343                 :       35897 :               free (bbs);
    3344                 :       35897 :               return false;
    3345                 :             :             }
    3346                 :             : 
    3347                 :             :           /* Distribute stmts which have defs that are used outside of
    3348                 :             :              the loop.  */
    3349                 :     2556315 :           if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
    3350                 :             :             ;
    3351                 :             :           /* Otherwise only distribute stores for now.  */
    3352                 :     4044427 :           else if (!gimple_vdef (stmt))
    3353                 :     2323891 :             continue;
    3354                 :             : 
    3355                 :      232424 :           work_list->safe_push (stmt);
    3356                 :             :         }
    3357                 :             :     }
    3358                 :      117144 :   bool res = work_list->length () > 0;
    3359                 :      116972 :   if (res && !can_copy_bbs_p (bbs, loop->num_nodes))
    3360                 :             :     {
    3361                 :           8 :       if (dump_file && (dump_flags & TDF_DETAILS))
    3362                 :           0 :         fprintf (dump_file, "cannot copy loop %d.\n", loop->num);
    3363                 :             :       res = false;
    3364                 :             :     }
    3365                 :      117144 :   free (bbs);
    3366                 :      117144 :   return res;
    3367                 :             : }
    3368                 :             : 
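   As a rough source-level illustration of what find_seed_stmts_for_distribution
   collects (a hypothetical example, not from the pass itself): stores seed a
   partition because they carry a virtual definition, and so do definitions
   whose values are still live after the loop.

       int
       sum_and_scale (int *a, const int *b, int n)
       {
         int s = 0;
         for (int i = 0; i < n; ++i)
           {
             a[i] = 2 * b[i];   /* store: seeds a partition */
             s += b[i];         /* def used outside the loop: also a seed */
           }
         return s;             /* the final value of s escapes the loop */
       }
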
    3369                 :             : /* A helper function for generate_{rawmemchr,strlen}_builtin functions in order
    3370                 :             :    to place new statements SEQ before LOOP and replace the old reduction
    3371                 :             :    variable with the new one.  */
    3372                 :             : 
    3373                 :             : static void
    3374                 :         100 : generate_reduction_builtin_1 (loop_p loop, gimple_seq &seq,
    3375                 :             :                               tree reduction_var_old, tree reduction_var_new,
    3376                 :             :                               const char *info, machine_mode load_mode)
    3377                 :             : {
    3378                 :         100 :   gcc_assert (flag_tree_loop_distribute_patterns);
    3379                 :             : 
    3380                 :             :   /* Place new statements before LOOP.  */
    3381                 :         100 :   gimple_stmt_iterator gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
    3382                 :         100 :   gsi_insert_seq_after (&gsi, seq, GSI_CONTINUE_LINKING);
    3383                 :             : 
    3384                 :             :   /* Replace old reduction variable with new one.  */
    3385                 :         100 :   imm_use_iterator iter;
    3386                 :         100 :   gimple *stmt;
    3387                 :         100 :   use_operand_p use_p;
    3388                 :         474 :   FOR_EACH_IMM_USE_STMT (stmt, iter, reduction_var_old)
    3389                 :             :     {
    3390                 :        1496 :       FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
    3391                 :         374 :         SET_USE (use_p, reduction_var_new);
    3392                 :             : 
    3393                 :         374 :       update_stmt (stmt);
    3394                 :         100 :     }
    3395                 :             : 
    3396                 :         100 :   if (dump_file && (dump_flags & TDF_DETAILS))
    3397                 :           7 :     fprintf (dump_file, info, GET_MODE_NAME (load_mode));
    3398                 :         100 : }
    3399                 :             : 
    3400                 :             : /* Generate a call to rawmemchr and place it before LOOP.  REDUCTION_VAR is
    3401                 :             :    replaced with a fresh SSA name representing the result of the call.  */
    3402                 :             : 
    3403                 :             : static void
    3404                 :           0 : generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
    3405                 :             :                             data_reference_p store_dr, tree base, tree pattern,
    3406                 :             :                             location_t loc)
    3407                 :             : {
    3408                 :           0 :   gimple_seq seq = NULL;
    3409                 :             : 
    3410                 :           0 :   tree mem = force_gimple_operand (base, &seq, true, NULL_TREE);
    3411                 :           0 :   gimple *fn_call = gimple_build_call_internal (IFN_RAWMEMCHR, 2, mem, pattern);
    3412                 :           0 :   tree reduction_var_new = copy_ssa_name (reduction_var);
    3413                 :           0 :   gimple_call_set_lhs (fn_call, reduction_var_new);
    3414                 :           0 :   gimple_set_location (fn_call, loc);
    3415                 :           0 :   gimple_seq_add_stmt (&seq, fn_call);
    3416                 :             : 
    3417                 :           0 :   if (store_dr)
    3418                 :             :     {
    3419                 :           0 :       gassign *g = gimple_build_assign (DR_REF (store_dr), reduction_var_new);
    3420                 :           0 :       gimple_seq_add_stmt (&seq, g);
    3421                 :             :     }
    3422                 :             : 
    3423                 :           0 :   generate_reduction_builtin_1 (loop, seq, reduction_var, reduction_var_new,
    3424                 :             :                                 "generated rawmemchr%s\n",
    3425                 :           0 :                                 TYPE_MODE (TREE_TYPE (TREE_TYPE (base))));
    3426                 :           0 : }
    3427                 :             : 
    3428                 :             : /* Helper function for generate_strlen_builtin(,_using_rawmemchr)  */
    3429                 :             : 
    3430                 :             : static void
    3431                 :         100 : generate_strlen_builtin_1 (loop_p loop, gimple_seq &seq,
    3432                 :             :                            tree reduction_var_old, tree reduction_var_new,
    3433                 :             :                            machine_mode mode, tree start_len)
    3434                 :             : {
    3435                 :             :   /* REDUCTION_VAR_NEW has either size type or ptrdiff type and must be
    3436                 :             :      converted if the old and new reduction variables have incompatible types.  */
    3437                 :         100 :   reduction_var_new = gimple_convert (&seq, TREE_TYPE (reduction_var_old),
    3438                 :             :                                       reduction_var_new);
    3439                 :             : 
    3440                 :             :   /* Loops of the form `for (i=42; s[i]; ++i);` have an additional start
    3441                 :             :      length.  */
    3442                 :         100 :   if (!integer_zerop (start_len))
    3443                 :             :     {
    3444                 :          96 :       tree lhs = make_ssa_name (TREE_TYPE (reduction_var_new));
    3445                 :          96 :       gimple *g = gimple_build_assign (lhs, PLUS_EXPR, reduction_var_new,
    3446                 :             :                                        start_len);
    3447                 :          96 :       gimple_seq_add_stmt (&seq, g);
    3448                 :          96 :       reduction_var_new = lhs;
    3449                 :             :     }
    3450                 :             : 
    3451                 :         100 :   generate_reduction_builtin_1 (loop, seq, reduction_var_old, reduction_var_new,
    3452                 :             :                                 "generated strlen%s\n", mode);
    3453                 :         100 : }
    3454                 :             : 
    3455                 :             : /* Generate a call to strlen and place it before LOOP.  REDUCTION_VAR is
    3456                 :             :    replaced with a fresh SSA name representing the result of the call.  */
    3457                 :             : 
    3458                 :             : static void
    3459                 :         100 : generate_strlen_builtin (loop_p loop, tree reduction_var, tree base,
    3460                 :             :                          tree start_len, location_t loc)
    3461                 :             : {
    3462                 :         100 :   gimple_seq seq = NULL;
    3463                 :             : 
    3464                 :         100 :   tree reduction_var_new = make_ssa_name (size_type_node);
    3465                 :             : 
    3466                 :         100 :   tree mem = force_gimple_operand (base, &seq, true, NULL_TREE);
    3467                 :         200 :   tree fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_STRLEN));
    3468                 :         100 :   gimple *fn_call = gimple_build_call (fn, 1, mem);
    3469                 :         100 :   gimple_call_set_lhs (fn_call, reduction_var_new);
    3470                 :         100 :   gimple_set_location (fn_call, loc);
    3471                 :         100 :   gimple_seq_add_stmt (&seq, fn_call);
    3472                 :             : 
    3473                 :         100 :   generate_strlen_builtin_1 (loop, seq, reduction_var, reduction_var_new,
    3474                 :             :                              QImode, start_len);
    3475                 :         100 : }
    3476                 :             : 
    3477                 :             : /* Generate code that mimics the behaviour of strlen, but this time over
    3478                 :             :    an array of elements whose mode differs from QI.  REDUCTION_VAR is replaced
    3479                 :             :    with a fresh SSA name representing the result, i.e., the length.  */
    3480                 :             : 
    3481                 :             : static void
    3482                 :           0 : generate_strlen_builtin_using_rawmemchr (loop_p loop, tree reduction_var,
    3483                 :             :                                          tree base, tree load_type,
    3484                 :             :                                          tree start_len, location_t loc)
    3485                 :             : {
    3486                 :           0 :   gimple_seq seq = NULL;
    3487                 :             : 
    3488                 :           0 :   tree start = force_gimple_operand (base, &seq, true, NULL_TREE);
    3489                 :           0 :   tree zero = build_zero_cst (load_type);
    3490                 :           0 :   gimple *fn_call = gimple_build_call_internal (IFN_RAWMEMCHR, 2, start, zero);
    3491                 :           0 :   tree end = make_ssa_name (TREE_TYPE (base));
    3492                 :           0 :   gimple_call_set_lhs (fn_call, end);
    3493                 :           0 :   gimple_set_location (fn_call, loc);
    3494                 :           0 :   gimple_seq_add_stmt (&seq, fn_call);
    3495                 :             : 
    3496                 :             :   /* Determine the number of elements between START and END by
    3497                 :             :      evaluating (END - START) / sizeof (*START).  */
    3498                 :           0 :   tree diff = make_ssa_name (ptrdiff_type_node);
    3499                 :           0 :   gimple *diff_stmt = gimple_build_assign (diff, POINTER_DIFF_EXPR, end, start);
    3500                 :           0 :   gimple_seq_add_stmt (&seq, diff_stmt);
    3501                 :             :   /* Let SIZE be the size of each character.  */
    3502                 :           0 :   tree size = gimple_convert (&seq, ptrdiff_type_node,
    3503                 :           0 :                               TYPE_SIZE_UNIT (load_type));
    3504                 :           0 :   tree count = make_ssa_name (ptrdiff_type_node);
    3505                 :           0 :   gimple *count_stmt = gimple_build_assign (count, TRUNC_DIV_EXPR, diff, size);
    3506                 :           0 :   gimple_seq_add_stmt (&seq, count_stmt);
    3507                 :             : 
    3508                 :           0 :   generate_strlen_builtin_1 (loop, seq, reduction_var, count,
    3509                 :           0 :                              TYPE_MODE (load_type),
    3510                 :             :                              start_len);
    3511                 :           0 : }
    3512                 :             : 
    3513                 :             : /* Return true if we can count at least as many characters by taking pointer
    3514                 :             :    difference as we can count via reduction_var without an overflow.  That is,
    3515                 :             :    check that 2^n < (2^(m-1) / s) where n = TYPE_PRECISION (reduction_var_type),
    3516                 :             :    m = TYPE_PRECISION (ptrdiff_type_node), and s = size of each character.  */
    3517                 :             : static bool
    3518                 :           0 : reduction_var_overflows_first (tree reduction_var_type, tree load_type)
    3519                 :             : {
    3520                 :           0 :   widest_int n2 = wi::lshift (1, TYPE_PRECISION (reduction_var_type));
    3521                 :           0 :   widest_int m2 = wi::lshift (1, TYPE_PRECISION (ptrdiff_type_node) - 1);
    3522                 :           0 :   widest_int s = wi::to_widest (TYPE_SIZE_UNIT (load_type));
    3523                 :           0 :   return wi::ltu_p (n2, wi::udiv_trunc (m2, s));
    3524                 :           0 : }
    3525                 :             : 
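   A worked instance of the check above, under assumed widths (illustrative
   only, not tied to any particular target): with a 32-bit reduction variable,
   a 64-bit ptrdiff_type_node and 4-byte characters, 2^32 < 2^63 / 4 = 2^61
   holds, so counting via pointer difference overflows no earlier than the
   reduction variable.

       #include <stdint.h>
       #include <stdio.h>

       int
       main (void)
       {
         uint64_t n2 = UINT64_C (1) << 32;  /* 2^n, n = reduction var precision */
         uint64_t m2 = UINT64_C (1) << 63;  /* 2^(m-1), m = ptrdiff precision */
         uint64_t s = 4;                    /* size of each character */
         printf ("%s\n", n2 < m2 / s ? "safe" : "unsafe");  /* prints "safe" */
         return 0;
       }
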
    3526                 :             : static gimple *
    3527                 :       19368 : determine_reduction_stmt_1 (const loop_p loop, const basic_block *bbs)
    3528                 :             : {
    3529                 :       19368 :   gimple *reduction_stmt = NULL;
    3530                 :             : 
    3531                 :       68907 :   for (unsigned i = 0, ninsns = 0; i < loop->num_nodes; ++i)
    3532                 :             :     {
    3533                 :       54107 :       basic_block bb = bbs[i];
    3534                 :             : 
    3535                 :      106092 :       for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
    3536                 :       51985 :            gsi_next_nondebug (&bsi))
    3537                 :             :         {
    3538                 :       52781 :           gphi *phi = bsi.phi ();
    3539                 :      105562 :           if (virtual_operand_p (gimple_phi_result (phi)))
    3540                 :       17459 :             continue;
    3541                 :       35322 :           if (stmt_has_scalar_dependences_outside_loop (loop, phi))
    3542                 :             :             {
    3543                 :        6650 :               if (reduction_stmt)
    3544                 :         796 :                 return NULL;
    3545                 :             :               reduction_stmt = phi;
    3546                 :             :             }
    3547                 :             :         }
    3548                 :             : 
    3549                 :      238679 :       for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
    3550                 :      132057 :            gsi_next_nondebug (&bsi), ++ninsns)
    3551                 :             :         {
    3552                 :             :           /* Bail out early for loops which are unlikely to match.  */
    3553                 :      135829 :           if (ninsns > 16)
    3554                 :        3772 :             return NULL;
    3555                 :      133749 :           gimple *stmt = gsi_stmt (bsi);
    3556                 :      133749 :           if (gimple_clobber_p (stmt))
    3557                 :        3401 :             continue;
    3558                 :      130348 :           if (gimple_code (stmt) == GIMPLE_LABEL)
    3559                 :         702 :             continue;
    3560                 :      217718 :           if (gimple_has_volatile_ops (stmt))
    3561                 :             :             return NULL;
    3562                 :      129602 :           if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
    3563                 :             :             {
    3564                 :        4729 :               if (reduction_stmt)
    3565                 :             :                 return NULL;
    3566                 :             :               reduction_stmt = stmt;
    3567                 :             :             }
    3568                 :             :         }
    3569                 :             :     }
    3570                 :             : 
    3571                 :             :   return reduction_stmt;
    3572                 :             : }
    3573                 :             : 
    3574                 :             : /* If LOOP has a single non-volatile reduction statement, then return a pointer
    3575                 :             :    to it.  Otherwise return NULL.  */
    3576                 :             : static gimple *
    3577                 :       19368 : determine_reduction_stmt (const loop_p loop)
    3578                 :             : {
    3579                 :       19368 :   basic_block *bbs = get_loop_body (loop);
    3580                 :       19368 :   gimple *reduction_stmt = determine_reduction_stmt_1 (loop, bbs);
    3581                 :       19368 :   XDELETEVEC (bbs);
    3582                 :       19368 :   return reduction_stmt;
    3583                 :             : }
    3584                 :             : 
    3585                 :             : /* Transform loops which mimic the effects of builtins rawmemchr or strlen and
    3586                 :             :    replace them accordingly.  For example, a loop of the form
    3587                 :             : 
    3588                 :             :      for (; *p != 42; ++p);
    3589                 :             : 
    3590                 :             :    is replaced by
    3591                 :             : 
    3592                 :             :      p = rawmemchr<MODE> (p, 42);
    3593                 :             : 
    3594                 :             :    under the assumption that rawmemchr is available for a particular MODE.
    3595                 :             :    Another example is
    3596                 :             : 
    3597                 :             :      int i;
    3598                 :             :      for (i = 42; s[i]; ++i);
    3599                 :             : 
    3600                 :             :    which is replaced by
    3601                 :             : 
    3602                 :             :      i = (int)strlen (&s[42]) + 42;
    3603                 :             : 
    3604                 :             :    for some character array S.  In case array S is not of type character array
    3605                 :             :    we end up with
    3606                 :             : 
    3607                 :             :      i = (int)(rawmemchr<MODE> (&s[42], 0) - &s[42]) + 42;
    3608                 :             : 
    3609                 :             :    assuming that rawmemchr is available for a particular MODE.  */
    3610                 :             : 
    3611                 :             : bool
    3612                 :       72335 : loop_distribution::transform_reduction_loop (loop_p loop)
    3613                 :             : {
    3614                 :       72335 :   gimple *reduction_stmt;
    3615                 :       72335 :   data_reference_p load_dr = NULL, store_dr = NULL;
    3616                 :             : 
    3617                 :       72335 :   edge e = single_exit (loop);
    3618                 :      211525 :   gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (e->src));
    3619                 :       54893 :   if (!cond)
    3620                 :             :     return false;
    3621                 :             :   /* Ensure loop condition is an (in)equality test and loop is exited either if
    3622                 :             :      the inequality test fails or the equality test succeeds.  */
    3623                 :       40866 :   if (!(e->flags & EDGE_FALSE_VALUE && gimple_cond_code (cond) == NE_EXPR)
    3624                 :       71895 :       && !(e->flags & EDGE_TRUE_VALUE && gimple_cond_code (cond) == EQ_EXPR))
    3625                 :             :     return false;
    3626                 :             :   /* A limitation of the current implementation is that we only support
    3627                 :             :      constant patterns in (in)equality tests.  */
    3628                 :       25447 :   tree pattern = gimple_cond_rhs (cond);
    3629                 :       25447 :   if (TREE_CODE (pattern) != INTEGER_CST)
    3630                 :             :     return false;
    3631                 :             : 
    3632                 :       19368 :   reduction_stmt = determine_reduction_stmt (loop);
    3633                 :             : 
    3634                 :             :   /* A limitation of the current implementation is that we require a reduction
    3635                 :             :      statement.  Therefore, loops without a reduction statement as in the
    3636                 :             :      following are not recognized:
    3637                 :             :      int *p;
    3638                 :             :      void foo (void) { for (; *p; ++p); } */
    3639                 :       19368 :   if (reduction_stmt == NULL)
    3640                 :             :     return false;
    3641                 :             : 
    3642                 :             :   /* Reduction variables are guaranteed to be SSA names.  */
    3643                 :        6318 :   tree reduction_var;
    3644                 :        6318 :   switch (gimple_code (reduction_stmt))
    3645                 :             :     {
    3646                 :        6184 :     case GIMPLE_ASSIGN:
    3647                 :        6184 :     case GIMPLE_PHI:
    3648                 :        6184 :       reduction_var = gimple_get_lhs (reduction_stmt);
    3649                 :        6184 :       break;
    3650                 :             :     default:
    3651                 :             :       /* Bail out e.g. for GIMPLE_CALL.  */
    3652                 :             :       return false;
    3653                 :             :     }
    3654                 :             : 
    3655                 :        6184 :   struct graph *rdg = build_rdg (loop, NULL);
    3656                 :        6184 :   if (rdg == NULL)
    3657                 :             :     {
    3658                 :         704 :       if (dump_file && (dump_flags & TDF_DETAILS))
    3659                 :           0 :         fprintf (dump_file,
    3660                 :             :                  "Loop %d not transformed: failed to build the RDG.\n",
    3661                 :             :                  loop->num);
    3662                 :             : 
    3663                 :         704 :       return false;
    3664                 :             :     }
    3665                 :       10960 :   auto_bitmap partition_stmts;
    3666                 :        5480 :   bitmap_set_range (partition_stmts, 0, rdg->n_vertices);
    3667                 :        5480 :   find_single_drs (loop, rdg, partition_stmts, &store_dr, &load_dr);
    3668                 :        5480 :   free_rdg (rdg);
    3669                 :             : 
    3670                 :             :   /* Bail out if there is no single load.  */
    3671                 :        5480 :   if (load_dr == NULL)
    3672                 :             :     return false;
    3673                 :             : 
    3674                 :             :   /* Reaching this point we have a loop with a single reduction variable,
    3675                 :             :      a single load, and an optional single store.  */
    3676                 :             : 
    3677                 :        2514 :   tree load_ref = DR_REF (load_dr);
    3678                 :        2514 :   tree load_type = TREE_TYPE (load_ref);
    3679                 :        2514 :   tree load_access_base = build_fold_addr_expr (load_ref);
    3680                 :        2514 :   tree load_access_size = TYPE_SIZE_UNIT (load_type);
    3681                 :        2514 :   affine_iv load_iv, reduction_iv;
    3682                 :             : 
    3683                 :        2514 :   if (!INTEGRAL_TYPE_P (load_type)
    3684                 :        2514 :       || !type_has_mode_precision_p (load_type))
    3685                 :        1129 :     return false;
    3686                 :             : 
    3687                 :             :   /* We already ensured that the loop condition tests for (in)equality where the
    3688                 :             :      rhs is a constant pattern. Now ensure that the lhs is the result of the
    3689                 :             :      load.  */
    3690                 :        1385 :   if (gimple_cond_lhs (cond) != gimple_assign_lhs (DR_STMT (load_dr)))
    3691                 :             :     return false;
    3692                 :             : 
    3693                 :             :   /* Bail out if no affine induction variable with constant step can be
    3694                 :             :      determined.  */
    3695                 :        1235 :   if (!simple_iv (loop, loop, load_access_base, &load_iv, false))
    3696                 :             :     return false;
    3697                 :             : 
    3698                 :             :   /* Bail out if memory accesses are not consecutive or not growing.  */
    3699                 :        1226 :   if (!operand_equal_p (load_iv.step, load_access_size, 0))
    3700                 :             :     return false;
    3701                 :             : 
    3702                 :        1206 :   if (!simple_iv (loop, loop, reduction_var, &reduction_iv, false))
    3703                 :             :     return false;
    3704                 :             : 
    3705                 :             :   /* Handle rawmemchr like loops.  */
    3706                 :        1088 :   if (operand_equal_p (load_iv.base, reduction_iv.base)
    3707                 :        1088 :       && operand_equal_p (load_iv.step, reduction_iv.step))
    3708                 :             :     {
    3709                 :         262 :       if (store_dr)
    3710                 :             :         {
    3711                 :             :           /* Ensure that we store to X and load from X+I where I>0.  */
    3712                 :           0 :           if (TREE_CODE (load_iv.base) != POINTER_PLUS_EXPR
    3713                 :           0 :               || !integer_onep (TREE_OPERAND (load_iv.base, 1)))
    3714                 :           0 :             return false;
    3715                 :           0 :           tree ptr_base = TREE_OPERAND (load_iv.base, 0);
    3716                 :           0 :           if (TREE_CODE (ptr_base) != SSA_NAME)
    3717                 :             :             return false;
    3718                 :           0 :           gimple *def = SSA_NAME_DEF_STMT (ptr_base);
    3719                 :           0 :           if (!gimple_assign_single_p (def)
    3720                 :           0 :               || gimple_assign_rhs1 (def) != DR_REF (store_dr))
    3721                 :             :             return false;
    3722                 :             :           /* Ensure that the reduction value is stored.  */
    3723                 :           0 :           if (gimple_assign_rhs1 (DR_STMT (store_dr)) != reduction_var)
    3724                 :             :             return false;
    3725                 :             :         }
    3726                 :             :       /* Bail out if target does not provide rawmemchr for a certain mode.  */
    3727                 :         262 :       machine_mode mode = TYPE_MODE (load_type);
    3728                 :         262 :       if (direct_optab_handler (rawmemchr_optab, mode) == CODE_FOR_nothing)
    3729                 :             :         return false;
    3730                 :           0 :       location_t loc = gimple_location (DR_STMT (load_dr));
    3731                 :           0 :       generate_rawmemchr_builtin (loop, reduction_var, store_dr, load_iv.base,
    3732                 :             :                                   pattern, loc);
    3733                 :           0 :       return true;
    3734                 :             :     }
    3735                 :             : 
    3736                 :             :   /* Handle strlen like loops.  */
    3737                 :         826 :   if (store_dr == NULL
    3738                 :         814 :       && integer_zerop (pattern)
    3739                 :         814 :       && INTEGRAL_TYPE_P (TREE_TYPE (reduction_var))
    3740                 :         804 :       && TREE_CODE (reduction_iv.base) == INTEGER_CST
    3741                 :         800 :       && TREE_CODE (reduction_iv.step) == INTEGER_CST
    3742                 :        1626 :       && integer_onep (reduction_iv.step))
    3743                 :             :     {
    3744                 :         767 :       location_t loc = gimple_location (DR_STMT (load_dr));
    3745                 :         767 :       tree reduction_var_type = TREE_TYPE (reduction_var);
    3746                 :             :       /* While determining the length of a string an overflow might occur.
    3747                 :             :          If an overflow only occurs in the loop implementation and not in the
    3748                 :             :          strlen implementation, then either the overflow is undefined or the
    3749                 :             :          truncated result of strlen equals the one of the loop.  Otherwise if
    3750                 :             :          an overflow may also occur in the strlen implementation, then
    3751                 :             :          replacing a loop by a call to strlen is sound whenever we ensure that
    3752                 :             :          if an overflow occurs in the strlen implementation, then also an
    3753                 :             :          overflow occurs in the loop implementation which is undefined.  It
    3754                 :             :          seems reasonable to relax this and assume that the strlen
    3755                 :             :          implementation cannot overflow in case sizetype is big enough in the
    3756                 :             :          sense that an overflow can only happen for string objects which are
    3757                 :             :          bigger than half of the address space; at least for 32-bit targets and
    3758                 :             :          up.
    3759                 :             : 
    3760                 :             :          For strlen which makes use of rawmemchr the maximal length of a string
    3761                 :             :          which can be determined without an overflow is PTRDIFF_MAX / S where
    3762                 :             :          each character has size S.  Since an overflow for ptrdiff type is
    3763                 :             :          undefined we have to make sure that if an overflow occurs, then an
    3764                 :             :          overflow occurs in the loop implementation, too, and this is
    3765                 :             :          undefined, too.  As before, we relax this and assume that no
    3766                 :             :          string object is larger than half of the address space; at least for
    3767                 :             :          32-bit targets and up.  */
    3768                 :         767 :       if (TYPE_MODE (load_type) == TYPE_MODE (char_type_node)
    3769                 :         431 :           && TYPE_PRECISION (load_type) == TYPE_PRECISION (char_type_node)
    3770                 :         431 :           && ((TYPE_PRECISION (sizetype) >= TYPE_PRECISION (ptr_type_node) - 1
    3771                 :         431 :                && TYPE_PRECISION (ptr_type_node) >= 32)
    3772                 :           0 :               || (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
    3773                 :           0 :                   && TYPE_PRECISION (reduction_var_type) <= TYPE_PRECISION (sizetype)))
    3774                 :         867 :           && builtin_decl_implicit (BUILT_IN_STRLEN))
    3775                 :         100 :         generate_strlen_builtin (loop, reduction_var, load_iv.base,
    3776                 :             :                                  reduction_iv.base, loc);
    3777                 :         667 :       else if (direct_optab_handler (rawmemchr_optab, TYPE_MODE (load_type))
    3778                 :             :                != CODE_FOR_nothing
    3779                 :         667 :                && ((TYPE_PRECISION (ptrdiff_type_node) == TYPE_PRECISION (ptr_type_node)
    3780                 :           0 :                     && TYPE_PRECISION (ptrdiff_type_node) >= 32)
    3781                 :           0 :                    || (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
    3782                 :           0 :                        && reduction_var_overflows_first (reduction_var_type, load_type))))
    3783                 :           0 :         generate_strlen_builtin_using_rawmemchr (loop, reduction_var,
    3784                 :             :                                                  load_iv.base,
    3785                 :             :                                                  load_type,
    3786                 :             :                                                  reduction_iv.base, loc);
    3787                 :             :       else
    3788                 :         667 :         return false;
    3789                 :         100 :       return true;
    3790                 :             :     }
    3791                 :             : 
    3792                 :             :   return false;
    3793                 :             : }
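
A hedged sketch (not taken from this file; the function and variable names are
invented) of the kind of counting loop the reduction-loop recognizer above can
replace with a builtin call:

   /* With -ftree-loop-distribute-patterns, a loop of this shape may be
      replaced by a call to the strlen builtin (or, for element types wider
      than char, by a rawmemchr-based sequence when the target provides a
      rawmemchr optab), subject to the overflow checks discussed above.  */
   #include <stddef.h>

   size_t my_strlen (const char *s)
   {
     size_t n = 0;            /* reduction variable */
     while (s[n] != '\0')     /* load through an induction variable */
       n++;
     return n;
   }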
    3794                 :             : 
    3795                 :             : /* Given innermost LOOP, return the outermost enclosing loop that forms a
    3796                 :             :    perfect loop nest.  */
    3797                 :             : 
    3798                 :             : static class loop *
    3799                 :      139594 : prepare_perfect_loop_nest (class loop *loop)
    3800                 :             : {
    3801                 :      139594 :   class loop *outer = loop_outer (loop);
    3802                 :      139594 :   tree niters = number_of_latch_executions (loop);
    3803                 :             : 
    3804                 :             :   /* TODO: We only support the innermost 3-level loop nest distribution
    3805                 :             :      because of compilation time issues for now.  This should be relaxed
    3806                 :             :      in the future.  Note that we only allow 3-level loop nest distribution
    3807                 :             :      when parallelizing loops.  */
    3808                 :      139594 :   while ((loop->inner == NULL
    3809                 :       13958 :           || (loop->inner->inner == NULL && flag_tree_parallelize_loops > 1))
    3810                 :      139664 :          && loop_outer (outer)
    3811                 :       32656 :          && outer->inner == loop && loop->next == NULL
    3812                 :       21650 :          && single_exit (outer)
    3813                 :       20299 :          && !chrec_contains_symbols_defined_in_loop (niters, outer->num)
    3814                 :       14852 :          && (niters = number_of_latch_executions (outer)) != NULL_TREE
    3815                 :      168404 :          && niters != chrec_dont_know)
    3816                 :             :     {
    3817                 :       13958 :       loop = outer;
    3818                 :       13958 :       outer = loop_outer (loop);
    3819                 :             :     }
    3820                 :             : 
    3821                 :      139594 :   return loop;
    3822                 :             : }
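
For orientation, a nest is perfect when each outer loop body consists of
nothing but the next inner loop; a minimal C sketch (names invented here) of a
nest the walk above could accept:

   /* The outer body is exactly the inner loop, the outer loop has a single
      exit, and its iteration count is computable, so prepare_perfect_loop_nest
      can move from the innermost loop out to the enclosing one.  */
   void scale (double a[100][100], double c)
   {
     for (int i = 0; i < 100; i++)
       for (int j = 0; j < 100; j++)
         a[i][j] *= c;
   }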
    3823                 :             : 
    3824                 :             : 
    3825                 :             : unsigned int
    3826                 :      181162 : loop_distribution::execute (function *fun)
    3827                 :             : {
    3828                 :      181162 :   bool changed = false;
    3829                 :      181162 :   basic_block bb;
    3830                 :      181162 :   control_dependences *cd = NULL;
    3831                 :      181162 :   auto_vec<loop_p> loops_to_be_destroyed;
    3832                 :             : 
    3833                 :      536697 :   if (number_of_loops (fun) <= 1)
    3834                 :             :     return 0;
    3835                 :             : 
    3836                 :      181162 :   bb_top_order_init ();
    3837                 :             : 
    3838                 :     6458517 :   FOR_ALL_BB_FN (bb, fun)
    3839                 :             :     {
    3840                 :     6277355 :       gimple_stmt_iterator gsi;
    3841                 :     9897766 :       for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    3842                 :     3620411 :         gimple_set_uid (gsi_stmt (gsi), -1);
    3843                 :    48714818 :       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    3844                 :    36160108 :         gimple_set_uid (gsi_stmt (gsi), -1);
    3845                 :             :     }
    3846                 :             : 
    3847                 :             :   /* We can at the moment only distribute non-nested loops, thus restrict
    3848                 :             :      walking to innermost loops.  */
    3849                 :      896662 :   for (auto loop : loops_list (cfun, LI_ONLY_INNERMOST))
    3850                 :             :     {
    3851                 :             :       /* Don't distribute loops with multiple exit edges, or cold loops
    3852                 :             :          when not doing pattern detection.  */
    3853                 :      353176 :       if (!single_exit (loop)
    3854                 :      353176 :           || (!flag_tree_loop_distribute_patterns
    3855                 :        1121 :               && !optimize_loop_for_speed_p (loop)))
    3856                 :      141031 :         continue;
    3857                 :             : 
    3858                 :             :       /* If niters is unknown, don't distribute the loop but rather try to
    3859                 :             :          transform it into a call to a builtin.  */
    3860                 :      212145 :       tree niters = number_of_latch_executions (loop);
    3861                 :      212145 :       if (niters == NULL_TREE || niters == chrec_dont_know)
    3862                 :             :         {
    3863                 :       72551 :           datarefs_vec.create (20);
    3864                 :       72551 :           if (flag_tree_loop_distribute_patterns
    3865                 :       72551 :               && transform_reduction_loop (loop))
    3866                 :             :             {
    3867                 :         100 :               changed = true;
    3868                 :         100 :               loops_to_be_destroyed.safe_push (loop);
    3869                 :         100 :               if (dump_enabled_p ())
    3870                 :             :                 {
    3871                 :           7 :                   dump_user_location_t loc = find_loop_location (loop);
    3872                 :           7 :                   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
    3873                 :             :                                    loc, "Loop %d transformed into a builtin.\n",
    3874                 :             :                                    loop->num);
    3875                 :             :                 }
    3876                 :             :             }
    3877                 :       72551 :           free_data_refs (datarefs_vec);
    3878                 :       72551 :           continue;
    3879                 :       72551 :         }
    3880                 :             : 
    3881                 :             :       /* Get the perfect loop nest for distribution.  */
    3882                 :      139594 :       loop = prepare_perfect_loop_nest (loop);
    3883                 :      281989 :       for (; loop; loop = loop->inner)
    3884                 :             :         {
    3885                 :      153041 :           auto_vec<gimple *> work_list;
    3886                 :      153041 :           if (!find_seed_stmts_for_distribution (loop, &work_list))
    3887                 :       36078 :             continue;
    3888                 :             : 
    3889                 :      116963 :           const char *str = loop->inner ? " nest" : "";
    3890                 :      116963 :           dump_user_location_t loc = find_loop_location (loop);
    3891                 :      116963 :           if (!cd)
    3892                 :             :             {
    3893                 :       53669 :               calculate_dominance_info (CDI_DOMINATORS);
    3894                 :       53669 :               calculate_dominance_info (CDI_POST_DOMINATORS);
    3895                 :       53669 :               cd = new control_dependences ();
    3896                 :       53669 :               free_dominance_info (CDI_POST_DOMINATORS);
    3897                 :             :             }
    3898                 :             : 
    3899                 :      116963 :           bool destroy_p;
    3900                 :      116963 :           int nb_generated_loops, nb_generated_calls;
    3901                 :      116963 :           bool only_patterns = !optimize_loop_for_speed_p (loop)
    3902                 :      116963 :                                || !flag_tree_loop_distribution;
    3903                 :             :           /* Do not try to distribute loops that are not expected to iterate.  */
    3904                 :       25012 :           if (!only_patterns)
    3905                 :             :             {
    3906                 :       25012 :               HOST_WIDE_INT iterations = estimated_loop_iterations_int (loop);
    3907                 :       25012 :               if (iterations < 0)
    3908                 :       11864 :                 iterations = likely_max_loop_iterations_int (loop);
    3909                 :       25012 :               if (!iterations)
    3910                 :       91961 :                 only_patterns = true;
    3911                 :             :             }
    3912                 :      116963 :           nb_generated_loops
    3913                 :      116963 :             = distribute_loop (loop, work_list, cd, &nb_generated_calls,
    3914                 :             :                                &destroy_p, only_patterns);
    3915                 :      116963 :           if (destroy_p)
    3916                 :        9047 :             loops_to_be_destroyed.safe_push (loop);
    3917                 :             : 
    3918                 :      116963 :           if (nb_generated_loops + nb_generated_calls > 0)
    3919                 :             :             {
    3920                 :       10646 :               changed = true;
    3921                 :       10646 :               if (dump_enabled_p ())
    3922                 :          68 :                 dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
    3923                 :             :                                  loc, "Loop%s %d distributed: split to %d loops "
    3924                 :             :                                  "and %d library calls.\n", str, loop->num,
    3925                 :             :                                  nb_generated_loops, nb_generated_calls);
    3926                 :             : 
    3927                 :       10646 :               break;
    3928                 :             :             }
    3929                 :             : 
    3930                 :      106317 :           if (dump_file && (dump_flags & TDF_DETAILS))
    3931                 :          28 :             fprintf (dump_file, "Loop%s %d not distributed.\n", str, loop->num);
    3932                 :      153041 :         }
    3933                 :      181162 :     }
    3934                 :             : 
    3935                 :      181162 :   if (cd)
    3936                 :       53669 :     delete cd;
    3937                 :             : 
    3938                 :      181162 :   if (bb_top_order_index != NULL)
    3939                 :      181162 :     bb_top_order_destroy ();
    3940                 :             : 
    3941                 :      181162 :   if (changed)
    3942                 :             :     {
    3943                 :             :       /* Destroy loop bodies that could not be reused.  Do this late as we
    3944                 :             :          otherwise can end up referring to stale data in control dependences.  */
    3945                 :             :       unsigned i;
    3946                 :             :       class loop *loop;
    3947                 :       15936 :       FOR_EACH_VEC_ELT (loops_to_be_destroyed, i, loop)
    3948                 :        9147 :         destroy_loop (loop);
    3949                 :             : 
    3950                 :             :       /* Cached scalar evolutions now may refer to wrong or non-existing
    3951                 :             :          loops.  */
    3952                 :        6789 :       scev_reset ();
    3953                 :        6789 :       mark_virtual_operands_for_renaming (fun);
    3954                 :        6789 :       rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
    3955                 :             :     }
    3956                 :             : 
    3957                 :      181162 :   checking_verify_loop_structure ();
    3958                 :             : 
    3959                 :      181162 :   return changed ? TODO_cleanup_cfg : 0;
    3960                 :      181162 : }
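
To make the driver's outcome concrete, a hedged example (not from this file;
the function name is invented) of a loop that distribution plus pattern
detection can split, with one partition becoming a library call:

   /* With -ftree-loop-distribution and -ftree-loop-distribute-patterns the
      single loop below may be distributed into a memset call for the
      zero-initializing store plus a separate loop for the remaining
      statement, matching the "split to N loops and M library calls"
      message printed above.  */
   void init_and_bump (int *a, int *b, int n)
   {
     for (int i = 0; i < n; i++)
       {
         a[i] = 0;          /* candidate for a memset builtin call */
         b[i] = b[i] + 1;   /* kept as a distributed loop */
       }
   }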
    3961                 :             : 
    3962                 :             : 
    3963                 :             : /* Distribute all loops in the current function.  */
    3964                 :             : 
    3965                 :             : namespace {
    3966                 :             : 
    3967                 :             : const pass_data pass_data_loop_distribution =
    3968                 :             : {
    3969                 :             :   GIMPLE_PASS, /* type */
    3970                 :             :   "ldist", /* name */
    3971                 :             :   OPTGROUP_LOOP, /* optinfo_flags */
    3972                 :             :   TV_TREE_LOOP_DISTRIBUTION, /* tv_id */
    3973                 :             :   ( PROP_cfg | PROP_ssa ), /* properties_required */
    3974                 :             :   0, /* properties_provided */
    3975                 :             :   0, /* properties_destroyed */
    3976                 :             :   0, /* todo_flags_start */
    3977                 :             :   0, /* todo_flags_finish */
    3978                 :             : };
    3979                 :             : 
    3980                 :             : class pass_loop_distribution : public gimple_opt_pass
    3981                 :             : {
    3982                 :             : public:
    3983                 :      280455 :   pass_loop_distribution (gcc::context *ctxt)
    3984                 :      560910 :     : gimple_opt_pass (pass_data_loop_distribution, ctxt)
    3985                 :             :   {}
    3986                 :             : 
    3987                 :             :   /* opt_pass methods: */
    3988                 :      217395 :   bool gate (function *) final override
    3989                 :             :     {
    3990                 :      217395 :       return flag_tree_loop_distribution
    3991                 :      217395 :         || flag_tree_loop_distribute_patterns;
    3992                 :             :     }
    3993                 :             : 
    3994                 :             :   unsigned int execute (function *) final override;
    3995                 :             : 
    3996                 :             : }; // class pass_loop_distribution
    3997                 :             : 
    3998                 :             : unsigned int
    3999                 :      181162 : pass_loop_distribution::execute (function *fun)
    4000                 :             : {
    4001                 :      181162 :   return loop_distribution ().execute (fun);
    4002                 :             : }
    4003                 :             : 
    4004                 :             : } // anon namespace
    4005                 :             : 
    4006                 :             : gimple_opt_pass *
    4007                 :      280455 : make_pass_loop_distribution (gcc::context *ctxt)
    4008                 :             : {
    4009                 :      280455 :   return new pass_loop_distribution (ctxt);
    4010                 :             : }
        

Generated by: LCOV version 2.1-beta

The LCOV profile was generated on an x86_64 machine using the following configure options: configure --disable-bootstrap --enable-coverage=opt --enable-languages=c,c++,fortran,go,jit,lto,rust,m2 --enable-host-shared. The GCC test suite was run with the built compiler.