bool | maybe_hot_afdo_count_p (profile_count count) |
static char * | autofdo::get_original_name (const char *name, bool alloc=true) |
static unsigned | autofdo::get_combined_location (location_t loc, tree decl) |
static tree | autofdo::get_function_decl_from_block (tree block) |
static void | autofdo::dump_afdo_loc (FILE *f, unsigned loc) |
static const char * | autofdo::raw_symbol_name (const char *asmname) |
static const char * | autofdo::raw_symbol_name (tree decl) |
static void | autofdo::dump_inline_stack (FILE *f, inline_stack *stack) |
static void | autofdo::get_inline_stack (location_t locus, inline_stack *stack, tree fn=current_function_decl) |
static void | autofdo::get_inline_stack_in_node (location_t locus, inline_stack *stack, cgraph_node *node) |
static unsigned | autofdo::get_relative_location_for_locus (tree fn, tree block, location_t locus) |
static unsigned | autofdo::get_relative_location_for_stmt (tree fn, gimple *stmt) |
static int | autofdo::match_with_target (cgraph_node *n, gimple *stmt, function_instance *inlined_fn, cgraph_node *orig_callee) |
static void | autofdo::dump_stmt (gimple *stmt, count_info *info, function_instance *inlined_fn, inline_stack &stack) |
void | autofdo::mark_expr_locations (function_instance *f, tree t, cgraph_node *node, hash_set< const count_info * > &counts) |
static void | autofdo::walk_block (tree fn, function_instance *s, tree block) |
static void | autofdo::fake_read_autofdo_module_profile () |
static void | autofdo::read_profile (void) |
static bool | autofdo::afdo_indirect_call (gcall *stmt, const icall_target_map &map, bool transform, cgraph_edge *indirect_edge) |
static bool | autofdo::afdo_vpt (gcall *gs, const icall_target_map &map, bool transform, cgraph_edge *indirect_edge) |
static bool | autofdo::is_bb_annotated (const basic_block bb, const bb_set &annotated) |
static void | autofdo::set_bb_annotated (basic_block bb, bb_set *annotated) |
static void | autofdo::update_count_by_afdo_count (profile_count *count, gcov_type c) |
static void | autofdo::update_count_by_afdo_count (profile_count *count, profile_count c) |
static bool | autofdo::afdo_set_bb_count (basic_block bb, hash_set< basic_block > &zero_bbs) |
static void | autofdo::afdo_find_equiv_class (bb_set *annotated_bb) |
static bool | autofdo::afdo_propagate_edge (bool is_succ, bb_set *annotated_bb) |
static void | autofdo::afdo_propagate_circuit (const bb_set &annotated_bb) |
static void | autofdo::afdo_propagate (bb_set *annotated_bb) |
static int | autofdo::cmp (const void *a, const void *b) |
static void | autofdo::add_scale (vec< scale > *scales, profile_count annotated, profile_count orig) |
static void | autofdo::scale_bbs (const vec< basic_block > &bbs, sreal scale) |
void | autofdo::afdo_adjust_guessed_profile (bb_set *annotated_bb) |
static void | autofdo::afdo_calculate_branch_prob (bb_set *annotated_bb) |
static void | autofdo::afdo_annotate_cfg (void) |
static unsigned int | autofdo::auto_profile (void) |
void | read_autofdo_file (void) |
void | end_auto_profile (void) |
bool | afdo_callsite_hot_enough_for_early_inline (struct cgraph_edge *edge) |
bool | afdo_vpt_for_early_inline (cgraph_node *node) |
void | remove_afdo_speculative_target (cgraph_edge *e) |
simple_ipa_opt_pass * | make_pass_ipa_auto_profile (gcc::context *ctxt) |
simple_ipa_opt_pass * | make_pass_ipa_auto_profile_offline (gcc::context *ctxt) |
#define DEFAULT_AUTO_PROFILE_FILE "fbdata.afdo" |
The following routines implement the AutoFDO optimization.
This optimization uses sampling profiles to annotate basic block counts
and uses heuristics to estimate branch probabilities.
There are five phases in AutoFDO:
Phase 1: At startup.
Read the profile from the profile data file.
The following information is read from the profile data file:
* string_table: a map between function names and their indices.
* autofdo_source_profile: a map from function_instance name to
  function_instance. This is represented as a forest of
  function_instances.
* WorkingSet: a histogram of how many instructions are covered for a
  given percentage of total cycles. It describes binary-level
  information (not source-level) and is used to help decide whether
  to apply aggressive optimizations that could increase the code
  footprint (e.g. loop unrolling).
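As a rough illustration of what a working-set entry means, the sketch below computes how many of the hottest instructions are needed to cover a given percentage of all samples. It assumes a flat list of per-instruction sample counts; `working_set_size` is a hypothetical helper, not a GCC function, and the profile described above stores the histogram itself rather than raw counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper, not part of GCC: returns how many of the hottest
// instructions are needed to cover `percent` percent of all samples.
std::size_t working_set_size(std::vector<std::uint64_t> counts, double percent) {
  // Hottest instructions first.
  std::sort(counts.begin(), counts.end(), std::greater<std::uint64_t>());
  std::uint64_t total = 0;
  for (std::uint64_t c : counts) total += c;
  if (total == 0) return 0;

  const double target = static_cast<double>(total) * percent / 100.0;
  std::uint64_t covered = 0;
  std::size_t n = 0;
  // Take instructions in decreasing order of hotness until the target is met.
  for (std::uint64_t c : counts) {
    if (static_cast<double>(covered) >= target) break;
    covered += c;
    ++n;
  }
  return n;
}
```

A small working set for a high percentage suggests that a few hot instructions dominate, which is the kind of signal the text says is used when weighing code-growing optimizations.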
A function instance is an instance of a function that is either a
standalone symbol or a clone of a function that has been inlined into
another function.
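The forest structure can be pictured with a small model: each instance holds per-location sample counts plus the instances inlined at its callsites, and function names are interned through a string table. All names here (`string_table_sketch`, `function_instance_sketch`, `sum_counts`) are hypothetical simplifications; GCC's own `function_instance` and string table classes have different interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical name interning: maps each function name to a small index.
struct string_table_sketch {
  std::vector<std::string> names;    // index -> name
  std::map<std::string, int> index;  // name -> index

  int get_index(const std::string &name) {
    auto it = index.find(name);
    if (it != index.end()) return it->second;
    int id = static_cast<int>(names.size());
    names.push_back(name);
    index.emplace(name, id);
    return id;
  }
};

// Hypothetical function instance: one node of the profile forest.
struct function_instance_sketch {
  int name;                                      // string table index
  std::map<unsigned, std::uint64_t> pos_counts;  // location -> samples
  // Callees inlined into this body, keyed by (callsite location, callee name).
  std::map<std::pair<unsigned, int>, std::unique_ptr<function_instance_sketch>>
      callsites;

  explicit function_instance_sketch(int n) : name(n) {}

  function_instance_sketch *get_or_add_callsite(unsigned loc, int callee) {
    std::unique_ptr<function_instance_sketch> &slot =
        callsites[std::make_pair(loc, callee)];
    if (!slot) slot.reset(new function_instance_sketch(callee));
    return slot.get();
  }
};

// Total samples in an instance, including everything inlined into it.
std::uint64_t sum_counts(const function_instance_sketch &f) {
  std::uint64_t total = 0;
  for (const auto &p : f.pos_counts) total += p.second;
  for (const auto &c : f.callsites) total += sum_counts(*c.second);
  return total;
}
```

For example, `foo` with `bar` inlined at location 2 becomes a root for `foo` with one child keyed by `(2, index of "bar")`; each top-level root of such a tree is one tree of the forest.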
Phase 2: In the afdo_offline pass.
Remove function instances from other translation units
and offline all cross-translation-unit inlining done during the
training-run compilation. This is necessary so that profiles are
not lost when the training run was built with LTO.
Phase 3: During early optimization.
AFDO inlining and value profile transformation.
During early inlining, the AFDO inliner is executed; it
uses autofdo_source_profile to determine whether a callsite:
* was inlined in the profiled binary, and
* calls a callee whose body was hot in the profiling run.
If both conditions are satisfied, early inlining inlines the callsite
regardless of code growth.
Performing this early has the benefit of running early optimizations
before the real IPA passes and of making the profile read more
context-sensitive: the profile of an inlined function may differ
significantly from one inline instance to another and from the
offline version.
This is controlled by -fauto-profile-inlining and is independent
of -fearly-inlining.
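The two-part inlining test above reduces to a lookup and a comparison. In this sketch the profile of one caller is modeled as a map from (callsite location, callee name) to the samples recorded in that inlined callee body, and `hot_threshold` stands in for GCC's real hotness check; `inlined_callsites` and `afdo_should_inline` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical: for one caller, (callsite location, callee name) -> samples
// recorded in the callee body that was inlined there in the profiled binary.
using inlined_callsites =
    std::map<std::pair<unsigned, std::string>, std::uint64_t>;

// True if the call should be inlined regardless of code growth.
bool afdo_should_inline(const inlined_callsites &profile, unsigned loc,
                        const std::string &callee, std::uint64_t hot_threshold) {
  auto it = profile.find(std::make_pair(loc, callee));
  if (it == profile.end()) return false;  // not inlined in the profiled binary
  return it->second >= hot_threshold;     // callee body hot in the training run
}
```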
Phase 4: In the AFDO pass.
Offline all functions that were inlined in the
training run but were inlined neither by early inlining nor by the
AFDO inliner.
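Phase 4 can be sketched in the same simplified terms: a callee that the training run inlined at a callsite which is not inlined in the current compilation has its samples moved to the callee's standalone (offline) profile. `fn_profile` and `offline_uninlined` are hypothetical; merging a single sample total also ignores the per-location detail that a real profile keeps.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical simplified profile for one function.
struct fn_profile {
  std::uint64_t self_count = 0;
  // Callsite location -> (callee name, samples in the callee inlined there).
  std::map<unsigned, std::pair<std::string, std::uint64_t>> inlined;
};

// Move the profile of each callee that was inlined in the training run at a
// callsite that is not inlined now into that callee's standalone profile.
void offline_uninlined(fn_profile &caller, const std::set<unsigned> &inlined_now,
                       std::map<std::string, fn_profile> &standalone) {
  for (auto it = caller.inlined.begin(); it != caller.inlined.end();) {
    if (inlined_now.count(it->first) != 0) {
      ++it;  // still inlined: keep the context-sensitive profile
      continue;
    }
    standalone[it->second.first].self_count += it->second.second;
    it = caller.inlined.erase(it);
  }
}
```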
Phase 5: In the AFDO pass.
Annotate the control flow graph.
* Annotate basic block counts
* Estimate branch probabilities
* Use the earlier static profile to fill in the gaps
  if the AFDO profile is ambiguous
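The first two annotation steps can be illustrated with a toy model. This sketch assumes, as a simplification, that a block's count is the largest sample count among its statements, and that each successor is reached only from the block in question, so successor counts can be read as edge counts; the even split used when nothing is known is merely a stand-in for GCC's static profile. `bb_count_from_stmts` and `branch_probs` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplification: a block's count is the largest sample count
// among its statements.
std::uint64_t bb_count_from_stmts(const std::vector<std::uint64_t> &stmt_counts) {
  std::uint64_t best = 0;
  for (std::uint64_t c : stmt_counts) best = std::max(best, c);
  return best;
}

// Split a block's outgoing flow in proportion to its successors' counts.
// Assumes each successor is reached only from this block; with no samples
// at all, it falls back to an even split (a stand-in for a static guess).
std::vector<double> branch_probs(const std::vector<std::uint64_t> &succ_counts) {
  std::uint64_t total = 0;
  for (std::uint64_t c : succ_counts) total += c;
  std::vector<double> probs(succ_counts.size());
  for (std::size_t i = 0; i < succ_counts.size(); ++i)
    probs[i] = total != 0
                   ? static_cast<double>(succ_counts[i]) / static_cast<double>(total)
                   : 1.0 / static_cast<double>(succ_counts.size());
  return probs;
}
```

With successor counts of 30 and 10, the edges get probabilities 0.75 and 0.25.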
After the above five phases, the profile is fully annotated on the GCC IR.
AutoFDO tries to reuse the existing FDO infrastructure as much as possible
to make use of the profile. For example, it uses the existing mechanism to
calculate basic block/edge frequencies, as well as cgraph node/edge counts.
Referenced by read_autofdo_file().
Perform indirect call promotion during early inlining so that the
IR matches the profiled binary before the actual annotation.
This is needed because an indirect call might have been promoted
and inlined in the profiled binary. If we do not promote and
inline these indirect calls before annotation, the profile for
the promoted functions will be lost.
e.g. foo() --indirect_call--> bar()
In the profiled binary, the callsite is promoted and inlined, making
the profile look like:
  foo: {
    loc_foo_1: count_1
    bar@loc_foo_2: {
      loc_bar_1: count_2
      loc_bar_2: count_3
    }
  }
Before the AutoFDO pass, the call at loc_foo_2 is not promoted and thus
not inlined. If we annotated it as is, the profile inside bar@loc_foo_2
would be wasted.
To avoid this, we promote the call at loc_foo_2 and inline the promoted
bar function before annotation, so that the profile inside bar@loc_foo_2
is used.
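A source-level sketch of the idea, under stated assumptions: `target_map` is a hypothetical stand-in for a per-callsite target histogram (callee name to samples), `pick_promotion_target` picks the hottest target if it clears a hypothetical `min_count`, and `foo_indirect`/`foo_promoted` show the shape of the code before and after promotion. GCC performs this transformation on GIMPLE, not on C++ source, and its actual selection criteria are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-callsite target histogram: callee name -> samples.
using target_map = std::map<std::string, std::uint64_t>;

// Return the hottest target, or an empty string if it is not hot enough.
std::string pick_promotion_target(const target_map &targets,
                                  std::uint64_t min_count) {
  std::string best;
  std::uint64_t best_count = 0;
  for (const auto &t : targets)
    if (t.second > best_count) {
      best = t.first;
      best_count = t.second;
    }
  return best_count >= min_count ? best : std::string();
}

int bar(int x) { return x * 2; }
int baz(int x) { return x + 1; }

// foo() before promotion: the callee is only known at run time.
int foo_indirect(int (*fp)(int), int x) { return fp(x); }

// foo() after promoting the call to bar: the guarded direct call can be
// inlined, giving the bar@loc_foo_2 profile a body to annotate.
int foo_promoted(int (*fp)(int), int x) {
  if (fp == bar) return bar(x);  // promoted direct call
  return fp(x);                  // fallback keeps other targets correct
}
```

The fallback indirect call is what keeps the promotion safe: if the target map was collected on different inputs, other callees still behave exactly as before.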
References autofdo::afdo_source_profile, autofdo::afdo_vpt(), changed, autofdo::count_info::count, symtab_node::decl, gimple_bb(), gsi_end_p(), gsi_next(), gsi_start_bb(), gsi_stmt(), cgraph_node::indirect_calls, cgraph_node::inlined_to, MAX, NULL, and autofdo::count_info::targets.
Referenced by inline_functions_by_afdo().