Function          Baseline (ms)   Optimized (ms)   Speedup
matrix_multiply   342.12          189.44           1.81x

5.1 Targeted tuning via annotation

Add to your C++ code:

    [[xbestpp::hot(iterations=1000000)]]
    void compute() ...

Then run:

    xbestpp tune --annotated-only -- ./my_program

    xbestpp profile --gpu --kernel="myKernel" -- ./cuda_app

Reports: occupancy, global load/store efficiency, bank conflicts.

5.3 Regression testing in CI

    xbestpp ci --baseline=golden.json --max-regression=0.05 -- ./test_suite

Fails if any metric worsens >5%.

6. Configuration File (xbestpp.toml)

Example:

    [profiling]
    events = ["cycles", "cache-misses", "instructions"]
    duration = 10  # seconds

    [optimization]
    max_unroll = 8
    allow_fp_contract = true
    gpu_grid_size = [256, 1, 1]

    [output]
    format = "html"
    threshold_speedup = 1.10  # only show improvements >10%

Apply with: