Industrial Linux is dedicated to professional Linux system administrators who need a portal site for the best Linux news, downloads, documentation, and resources for creating fast, secure, and reliable Linux servers.
pgcc
I got tired of not knowing how far I could push the -O flag in gcc. Did it stop at -O3, like the doc says, or were even higher settings possible? And what might they do? So I cracked open the source for pgcc, and here's what I found (original post located here).

Straight gcc stops at -O3; pgcc goes to -O7. You can specify bigger numbers (like -O9) with no harm or additional effect; you just get the highest setting.

For pgcc, there seems to be some schizophrenia surrounding the optimization of instruction scheduling. Normal gcc -O2 turns it on; gcc i386 -O2 then turns it off; pgcc i386 -O2 turns it back on, and pgcc i386 -O5 used to turn it on *again*, but that line is commented out with a note that it hurts performance. I guess some tests with -fschedule-insns vs. -fno-schedule-insns are in order. The option is documented thusly:

"If supported for the target machine, attempt to reorder instructions to eliminate execution stalls due to required data being unavailable. This helps machines that have slow floating point or memory load instructions by allowing other instructions to be issued until the result of the load or floating point instruction is required."

pgcc 2.95.2 adds the option "sibling_call", which looked interesting. You get it (and unroll_loops) with the max setting, -O7.

For pgcc 2.95.2 on an i386 platform, the following levels of optimization give you the indicated '-f' flags. Don't forget that the -On flag is just shorthand for setting these flags individually. You can use an -On flag and augment it with specific -f flags.
Note: higher levels build on the earlier levels:

-O: defer_pop, thread_jumps, delayed_branch, omit_frame_pointer, opt_reg_use, reduce_index_givs

-O2: cse_follow_jumps, cse_skip_blocks, gcse, expensive_optimizations, strength_reduce, rerun_cse_after_loop, rerun_loop_opt, caller_saves, force_mem, regmove, schedule_insns, schedule_insns_after_reload

-O3: inline_functions, jump_back, copy_prop, compare_elim, sftwr_pipe, reg_reg_copy_opt, peep_spills, replace_stack_mem, opt_jumps_out, replace_mem, correct_cse_mistakes, push_load_into_loop, replace_reload_regs, sign_extension_elim, lift_stores

-O4: swap_for_agi, risc, risc_const, interleave_stack_non_stack, schedule_stack_reg_insns

-O5: runtime_lift_stores, omit_frame_pointer

-O6: all_mem_givs, do_offload, risc_mem_dest

-O7: unroll_loops, sibling_call (note that in the listing below the level >= 7 block sits inside a comment, so these two may not actually be enabled as shown)

Here's the relevant source code:
=== toplev.c ===

    if (optimize >= 1) {
        flag_defer_pop = 1;
        flag_thread_jumps = 1;
    #ifdef DELAY_SLOTS
        flag_delayed_branch = 1;
    #endif
    #ifdef CAN_DEBUG_WITHOUT_FP
        flag_omit_frame_pointer = 1;
    #endif
    }

    if (optimize >= 2) {
        flag_cse_follow_jumps = 1;
        flag_cse_skip_blocks = 1;
        flag_gcse = 1;
        flag_expensive_optimizations = 1;
        flag_strength_reduce = 1;
        flag_rerun_cse_after_loop = 1;
        flag_rerun_loop_opt = 1;
        flag_caller_saves = 1;
        flag_force_mem = 1;
    #ifdef INSN_SCHEDULING
        flag_schedule_insns = 1;
        flag_schedule_insns_after_reload = 1;
    #endif
        /* flag_sibling_call = 1;*//*D*/
        flag_regmove = 1;
    }

    if (optimize >= 3) {
        flag_inline_functions = 1;
    }

=== config/i386/i386.c ===

    optimization_options (level, size)
         int level;
         int size ATTRIBUTE_UNUSED;
    {
        /* For -O2 and beyond, turn off -fschedule-insns by default.  It tends
           to make the problem with not enough registers even worse.  */
    #ifdef INSN_SCHEDULING
        if (level > 1)
            flag_schedule_insns = 0;
    #endif
        optimization_options_intel1(level, size);
    }

...and...

    static void
    optimization_options_intel1 (level, size)
         int level;
         int size;
    {
        if (level > 0) {
            flag_opt_reg_use = 2;
            flag_reduce_index_givs = 2;
        }

        if (level >= 2) {
            flag_schedule_insns = 2;
            flag_schedule_insns_after_reload = 2;
        }

        if (level >= 3) {
            flag_inline_functions = 2;
            flag_jump_back = 2;
            flag_copy_prop = 2;
            flag_compare_elim = 2;
            flag_sftwr_pipe = 2;
            flag_reg_reg_copy_opt = 2;
            /*flag_opt_reg_stack = 2;*//*D*/
            /*flag_loop_after_global = 2;*//*D*/
            flag_peep_spills = 2;
            flag_replace_stack_mem = 2;
            flag_opt_jumps_out = 2;
            flag_replace_mem = 2;
            flag_correct_cse_mistakes = 2;
            flag_push_load_into_loop = 2;
            flag_replace_reload_regs = 2;
            flag_sign_extension_elim = 2;
            flag_lift_stores = 2;
        }

        if (level >= 4) {
            /*flag_combine_222 = 2;*/ /*D*/
    #ifdef INSN_SCHEDULING
            flag_schedule_insns_after_reload = 2;
            flag_swap_for_agi = 2;
            flag_risc = 2;
            flag_risc_const = 2;
            /*flag_recombine = 2;*//*D*/ /* ??? actually slows down */
            flag_interleave_stack_non_stack = 2;
            flag_schedule_stack_reg_insns = 2;
    #endif
        }

        if (level >= 5) {
            flag_runtime_lift_stores = 2; /* big space penalty */
            flag_omit_frame_pointer = 2;
    #ifdef INSN_SCHEDULING
            /*flag_schedule_insns = 2;*/ /* hurts performance! */
    #endif
        }

        if (level >= 6) {
            flag_all_mem_givs = 2;
            flag_do_offload = 2;
            flag_risc_mem_dest = 2;
        }

        /* if (level >= 7) {
            flag_unroll_loops = 2;
            flag_sibling_call = 2;
        }*/
    }
Copyright 2000 by Scott Thomason