  • optimize hot paths of getc with manual shrink-wrapping · dd8f02b7
    Rich Felker authored
    with these changes, in a program that has not created any threads
    besides the main thread and that has not called f[try]lockfile, getc
    performs indistinguishably from getc_unlocked. this was measured on
    several i386 and x86_64 models, and should hold on other archs too
    simply by the properties of the code generation.
    
    the case where the caller already holds the lock (via flockfile) is
    improved significantly as well (40-60% reduction in time on machines
    tested) and the case where locking is needed is improved somewhat
    (roughly 10%).
    
    the key technique used here is forcing the non-hot path out-of-line
    and enabling it to be a tail call. a static noinline function
    (conditional on __GNUC__) is used rather than the extern hiddens used
    elsewhere for this purpose, so that the compiler can choose
    non-default calling conventions, making it possible to tail-call to a
    callee that takes more arguments than the caller on archs where
    arguments are passed on the stack or must have space reserved on the
    stack for spilling them. the tid could just be reloaded via the thread
    pointer in locking_getc, but that would be ridiculously expensive on
    some archs where thread pointer load requires a trap or syscall.
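
    the shape described above can be sketched roughly as follows. this is
    a minimal single-threaded illustration, not the code from this commit:
    struct mini_file, current_tid_sketch, getc_unlocked_sketch,
    locking_getc_sketch, getc_sketch and the slow_calls counter are all
    hypothetical names, and the plain stores in the cold path stand in
    for the atomic operations and waiter handling a real implementation
    would need.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical miniature stream for illustration; not a real FILE layout. */
struct mini_file {
    const unsigned char *rpos, *rend; /* next unread byte, end of buffer */
    int lock;       /* <0: locking never needed; 0: unlocked; >0: owner tid */
    int slow_calls; /* instrumentation for this sketch only */
};

/* Stand-in for loading the tid via the thread pointer (assumed value). */
static int current_tid_sketch(void)
{
    return 42;
}

static inline int getc_unlocked_sketch(struct mini_file *f)
{
    return f->rpos != f->rend ? *f->rpos++ : EOF;
}

/* Cold path, forced out of line. It is static, so the compiler is free to
 * pick a non-default calling convention for it. The tid is passed in as an
 * extra argument so it does not have to be loaded a second time. */
#ifdef __GNUC__
__attribute__((__noinline__))
#endif
static int locking_getc_sketch(struct mini_file *f, int tid)
{
    assert(f->lock == 0); /* this sketch models only the uncontended case */
    f->slow_calls++;
    f->lock = tid;        /* real code: atomic compare-and-swap, wait if held */
    int c = getc_unlocked_sketch(f);
    f->lock = 0;          /* real code: atomic release, wake any waiters */
    return c;
}

/* Hot path: no locking needed, or the caller already owns the lock. */
static inline int getc_sketch(struct mini_file *f)
{
    int l = f->lock;
    if (l < 0)
        return getc_unlocked_sketch(f);
    int tid = current_tid_sketch();
    if (l == tid)
        return getc_unlocked_sketch(f);
    /* The call is the last thing the function does, so it can be a tail call. */
    return locking_getc_sketch(f, tid);
}
```

    with lock set to a negative value, or to the caller's own tid,
    getc_sketch never reaches locking_getc_sketch; only the unlocked
    state (lock == 0) takes the out-of-line call.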