Introduction
HammerDB is an industry-leading benchmarking and load-testing tool used to characterize database performance. Run against MySQL, the pair exercises a large portion of the Linux kernel. When a performance regression appears in HammerDB with MySQL on the latest upstream kernel release, identifying the exact commit responsible can be challenging. In this blog, I demonstrate an approach to tackling this using Claude Code and skills. Skills are reusable, task-focused capabilities that let AI agents reliably perform specific jobs in a repeatable way. Recently, after the release of v6.17, a large regression was observed for HammerDB with MySQL relative to v6.12, an LTS kernel release. Traditional profiling with perf stat indicated an increase in front-end stalls per transaction on v6.17 compared to v6.12. The challenge was to identify an upstream change that could explain this observed behavior, and manually reviewing every commit between these two versions is impractical.
Solution
This approach uses two inputs:
- A workload profile consisting of the kernel functions the workload executes in steady state.
- A git log restricted to commits, between selected tags, that modify the files associated with the kernel functions in the workload profile.
Claude Code then produces a ranked list of commits most likely to affect performance of this workload. At a high level, the loop looks like this:
- Profile the workload to identify hot kernel functions and the files they belong to
- Use a Claude Code skill to list relevant upstream commits between upstream tags for those paths
- Ask Claude Code to rank the commits most likely to affect performance, supplying additional profiling information
- Repeat for the next pair of upstream tags
- Validate by reverting/cherry-picking the commit and building a kernel
Let's walk through what this looks like for HammerDB with MySQL with a Claude Code skill.
Step 1: Profile HammerDB and MySQL to identify hot kernel functions
This workload profile helps filter commits between two upstream tags, since the commits of interest most likely touch kernel functions the workload executes. For a large workload like HammerDB with MySQL, profile a steady-state interval with perf record, sampling kernel cycles. Parse the resulting perf report to identify hot kernel functions. The output of this step is a simple CSV containing:
- Kernel function name
- Sample count
- Source file path
Example kernel_func.csv file:
function,samples,sys_percent,file_path
cpuidle_enter_state,5381,25.72,drivers/cpuidle/coupled.c
tick_nohz_idle_exit,615,2.94,kernel/sched/idle.c
__wake_up_common_lock,553,2.64,kernel/sched/wait.c
task_sched_runtime,324,1.55,kernel/sched/core.c
mutex_lock,145,0.69,drivers/dma/acpi-dma.c
mutex_unlock,144,0.69,drivers/dma/acpi-dma.c
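One way to produce a CSV like the one above is to parse `perf report --stdio` output. The following is a minimal Python sketch, not the exact tooling used here: the report line format and the 0.5% threshold are assumptions, and mapping each symbol to its source file (the `file_path` column) is left as a separate step, e.g. via kernel debug info.

```python
import re
import subprocess

# A typical kernel line in `perf report --stdio` output looks like:
#     25.72%  mysqld  [kernel.kallsyms]  [k] cpuidle_enter_state
_KERNEL_LINE = re.compile(r"^\s*(\d+\.\d+)%.*\[k\]\s+(\S+)", re.M)

def parse_hot_functions(report_text, min_percent=0.5):
    """Extract (function, overhead %) pairs for kernel symbols above a threshold."""
    return [(m.group(2), float(m.group(1)))
            for m in _KERNEL_LINE.finditer(report_text)
            if float(m.group(1)) >= min_percent]

def hot_kernel_functions(perf_data="perf.data", min_percent=0.5):
    """Run perf report over a recording and return the hot kernel symbols."""
    out = subprocess.run(
        ["perf", "report", "--stdio", "--sort", "symbol", "-i", perf_data],
        capture_output=True, text=True, check=True).stdout
    return parse_hot_functions(out, min_percent)
```

Keeping the parsing separate from the perf invocation makes the threshold and format easy to adjust against your own perf output.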
Step 2: Use a Claude Code skill to list relevant upstream commits between v6.12 and v6.13 for those paths
Instead of listing all 72,482 commits between v6.12 and v6.17 (the target upstream tags) at once, a chunked approach that works one pair of adjacent tags at a time keeps Claude Code focused and prevents context poisoning. A skill was defined to extract all upstream commits between two tags. Create a skill using the skill builder or manually; the following steps add a skill titled find-perf-commits by hand:
- Create a folder under ~/.claude/skills/ with the name of the skill.
- Add a SKILL.md file with a short description and other relevant information that allows Claude Code to invoke the skill and effectively use it.
- Optionally, add any scripts under ~/.claude/skills/find-perf-commits/scripts/ as references. I had a basic Python script that used git log and regular expressions to retrieve and clean commit messages.
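The script behind this skill is roughly the following shape. This is a hedged sketch rather than the exact script; the helper names are mine, and the commit-message cleanup mentioned above is omitted.

```python
import csv
import os
import subprocess

def load_paths(input_csv):
    """Collect the unique source files named in the workload profile CSV."""
    with open(input_csv, newline="") as f:
        return sorted({row["file_path"]
                       for row in csv.DictReader(f) if row.get("file_path")})

def git_log_command(tag1, tag2, paths):
    """Build a git log invocation listing commits between two tags
    that touch only the profiled source files."""
    return ["git", "log", "--oneline", "--no-merges",
            f"{tag1}..{tag2}", "--"] + paths

def write_commit_list(tag1, tag2, input_csv, repo=os.path.expanduser("~/linux")):
    """Run the filtered git log in the kernel repo and write the result
    next to the input CSV."""
    paths = load_paths(input_csv)
    out = subprocess.run(git_log_command(tag1, tag2, paths), cwd=repo,
                         capture_output=True, text=True, check=True).stdout
    dest = os.path.join(os.path.dirname(input_csv),
                        f"git_log_{tag1}_{tag2}.txt")
    with open(dest, "w") as f:
        f.write(out)
    return dest
```

The pathspec after `--` is what narrows tens of thousands of commits per release down to the handful touching the workload's hot files.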
Example find-perf-commits SKILL.md:
---
name: find-perf-commits
description: Use this skill when you need to find Linux kernel commits between two version tags that touch source files associated with a workload profile. Invoke this during performance regression investigations to narrow a large commit range down to candidates relevant to the workload.
---
## Purpose
Filter Linux upstream kernel commits between two tags to those touching source files associated with a given workload profile. Used to narrow down performance regressions between kernel versions without reviewing every commit manually.
## Inputs
- `tag1`: start tag (e.g., `6.12`)
- `tag2`: end tag (e.g., `6.13`)
- `input_csv`: path to a CSV file listing hot kernel functions from a perf profile
### Expected CSV format
```
function,samples,sys_percent,file_path
cpuidle_enter_state,5381,25.72,drivers/cpuidle/coupled.c
tick_nohz_idle_exit,615,2.94,kernel/sched/idle.c
```
## Running the script
Run the script using its full path:
```
python3 ~/.claude/skills/find-perf-commits/scripts/generate_git_log.py v6.12 v6.13 ~/results/hammerdb/kernel_func.csv
```
The script expects the Linux upstream kernel repository at `~/linux/`. If it is elsewhere, pass the repo path as a fourth argument:
```
python3 ~/.claude/skills/find-perf-commits/scripts/generate_git_log.py v6.12 v6.13 ~/results/hammerdb/kernel_func.csv ~/src/linux
```
## Output
The script writes `git_log_tag1_tag2.txt` to the same directory as `input_csv`.
After the script completes:
1. Read the output file into the current context.
2. Present the commit list to the user.
3. Ask the user for any additional profiling context (e.g., perf metrics, front-end stall counts) to help rank commits by likelihood of causing a performance regression.
## Error handling
- If the repository path is wrong, the script will report a `git` error — ask the user to confirm the repo path.
- If no commits are found, the `file_path` values in the CSV may not match the kernel source tree layout — ask the user to verify them.
Step 3: Ask Claude Code to rank commits that affect performance using the additional front-end stall data
In this case, the short-listed commits were mostly changes to the kernel scheduler (kernel/sched/). This makes sense, as a scheduler-related change could plausibly introduce front-end stalls.
Example prompt: It was observed that HammerDB with MySQL produced a higher number of front-end stalls per transaction on 6.17 when compared to 6.12. Use this information to re-check the commit list. Which commits may lead to an increase in front-end stalls?
Step 4: Repeat between v6.13 and v6.14, v6.14 and v6.15...
Repeating steps 2 and 3 for every adjacent pair of tags in the target window produces a ranked list of commits per tag range. Next, I asked Claude Code to produce a super ranking across all ranges. To give it more context, I asked it to run git show on every short-listed commit, which pulls each commit's actual code changes into the context. An example commit in this super ranking was:
155213a — sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails
This commit was introduced in the v6.17 development window. Example ranking:
- 155213a2aed4 sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails
- c1f43c342e1f sched/fair: Fix sched_can_stop_tick() for fair tasks
- e932c4ab38f0 sched/core: Prevent wakeup of ksoftirqd during idle load balance
- dcbb598e689e block: blk-mq: fix uninit-value in blk_rq_prep_clone and refactor
- c9a3019603b8 fs/file.c: conditionally clear full_fds
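Collecting the git show output for the short-listed commits is easy to script. A minimal sketch follows; the repo path, output file, and the hashes shown are from this investigation, so adjust them for your own run.

```shell
# Append each short-listed commit's full diff to one file, which can then be
# fed to Claude Code as additional context for the super ranking.
REPO=~/linux
OUT=~/results/hammerdb/shortlist_diffs.txt
for c in 155213a2aed4 c1f43c342e1f e932c4ab38f0; do
    git -C "$REPO" show "$c" >> "$OUT"
done
```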
Step 5: Validate by reverting 155213a in v6.17 and building a kernel
Commit 155213a backs off newidle balancing by increasing its cost each time it fails. The argument for the patch is that when newidle balancing is unsuccessful, continued scanning brings little benefit, and low-latency workloads like schbench benefit from aggressive backoff. The argument against it is that the backoff is too aggressive and can leave CPUs under-utilized. This commit was tested first because it topped the super ranking and because upstream had already reported regressions in multiple workloads caused by it. A v6.17 kernel built with this commit reverted restored HammerDB performance to the v6.12 level.
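The validation flow can be sketched as follows. This assumes the kernel tree is at ~/linux with a working .config; the branch name is mine, and the hash is the suspect commit from the super ranking.

```shell
cd ~/linux
git checkout -b revert-155213a v6.17
git revert --no-edit 155213a2aed4   # revert the suspect scheduler commit
make olddefconfig                   # refresh the config for this tree
make -j"$(nproc)"                   # build, then install and boot the kernel
```

Booting this kernel and rerunning the HammerDB steady-state measurement closes the loop on the hypothesis.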
Limitations
This approach relies on Claude Code, a generative AI tool, and inherits its limitations. The rankings are probabilistic because Claude Code's output is probabilistic, so a commit that affects the performance of your workload may be missed. Attributing a performance issue to specific upstream changes is also non-trivial: multiple changes may together be responsible for the observed behavior. Commit messages may not tell the whole story either; kernel mailing lists often discuss each patch set in detail and share supporting test data with developers as feedback, and that discussion is not captured in the final commit message.
Conclusion
This approach helped pick a high-value first revert to test, reducing the overall time-to-answer. Claude Code successfully helps narrow down a large candidate space of commits potentially affecting performance. To recap, if you are trying to evaluate upstream changes that may affect the performance of your workload, capture (1) a workload profile and (2) a candidate commit list, then (3) validate the results.