Skip to content

timocafe/poly

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

poly

Let's play with 236 exp(x) implementation. This work is the code of my exponential paper, how make an exponential faster than vendor exp(x). All details are in this paper. The idea is to perform factorization for the polynomial evaluation of the exp(x) function. For each factor choose a method evaluation and bench. As processors are out of order, you never know what will be the results, surprise!

At least it works on X86, I did not tested since a long time on Power because I do not have any machine available.

Minimum Recquirements:

  • GCC > 4.9 (primary compiler because inlining is better)
  • Intel Compiler (to compare to vendor implementation)
  • Power/X86 system
  • linux system
  • cmake > 2.9
  • machine that understand x86/ppc ASM and the inline GCC mode

Arborescence:

  poly -- bench (contains the benchmarks for latency/throughtput/ulp + header for the timer library)
          -- latency (latency benchmark)
          -- throughtput (throughtput benchmark)
          -- lib (contains implementation of exp, scalar/vector version)
             -- exp
                -- scalar (implementation of the exp scalar version) 
                -- vector (implementation of the exp vector version)          
             -- poly
                -- scalar (implementation of the polynomial evaluation scalar version) 
                -- vector (implementation of the polynomial evaluation vector version)          
             -- tool
                -- scalar(implementation of 2^k and the branching part for the scalar version)
                -- vector(implementation of 2^k and the branching part for the vector version)            
          -- ulp (ulp benchmark)
       -- cyme (DSL for the vectorial version)
       -- dot (contains ASM - DAG graphiz format/ATT)
       -- llc (tiny library to measure the throughput, read hardware counter)  
       -- poly (contains the program that generate all variations of the exp implementation for poly/lib directory)

Compilation

  mkdir b
  cd b
  cmake ..
  make // can be long > 3000 files to compile

Modification

  ccmake .
  POLY_CYME buidl the vectorial version using cyme DSL (ON DEFAULT)
  POLY_BENCH build the benchmark throughput/ulp/latency (ON DEFAULT)
  POLY_TEST build the test (ON DEFAULT)
  CMAKE_BUILD_TYPE DEBUG (default) / RELEASE (mandatory for the perf)

Run (all)

  run.sh b exp > out // b for the build directory and exp for the results

Run (by hand)

All numbers are fictives

Latency

   ./b/bench/latency/scalar_vector_latency_ed10 poly // run ed10 for the polynomial scalar and vector version
   ./b/bench/latency/scalar_vector_latency_ed10 exp  // run ed10 for the polynomial scalar and vector version
   ./b/bench/latency/scalar_vector_latency_ed10 tool  // run the 2^k and the boundary
   [ewart@super_machine b]$ ./b/bench/latency/scalar_vector_latency_ed10 poly
   scalar::poly 35.0671
   vector::poly 30.2791
   [ewart@super_machine b]$ ./b/bench/latency/scalar_vector_latency_ed10 exp
   scalar::exp 75.0317
   vector::exp 61.4446
   [ewart@super_machine b]$ ./b/bench/latency/scalar_vector_latency_ed10 tool
   scalar::twok 36.006
   scalar::boundary 26.0249
   vector::twok 31.0046 // 2^k
   vector::boundary 20.96371 // the boundary condition
   [ewart@super_machine b]$ ./b/bench/latency/scalar_vector_latency_ed10 vendor
   imf 76.9489 // scalar version of intel
   svml4d 75.166 //vec version of intel

ULP

   ./b/bench/ulp/exp/exp_scalar_ulp_ed10
   3 // the ulp is 3 compare to std::exp (IEEE std)

Throughput

   ./bench/throughput/exp/exp_scalar_throughput_ed10
   4.63

to postprocess (all)

  perl pp_exp.pl out > out.hmtl

For the story: The directory poly/lib/scalar and poly/lib/vector contain the implementation of every exp(x). The generation of all theses files is performed with main.cpp of the lib directory.

If your machine is super experimental you may switch off (POLY_CYME,POLY_BENCH,POLY_TEST). Then you will a get a library for every implementation of the exp, free to you to work a simple benchmark with it.

About

Have fun with polynomial evaluation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages