On 2025-07-28 16:30, Vlastimil Babka wrote:

> On 7/28/25 07:41, siddhartha@kenip.in wrote:
> 
>> On 2025-07-07 14:26, Vlastimil Babka wrote:
>> Hi Lorenzo, Dev, Mel,
>> 
>> I'm following up on this patch submission from earlier this month:
>> "[PATCH] mm: limit THP alignment - performance gain observed in AI
>> inference workloads."
> 
> I'm confused. That wasn't a patch submission, but reporting performance
> results for my patch from late 2024? (and thanks for those!)
> 
> The patch was also already merged in late 2024:
> 
> commit d4148aeab412432bf928f311eca8a2ba52bb05df
> Author: Vlastimil Babka <vbabka@suse.cz>
> Date:   Thu Oct 24 17:12:29 2024 +0200
> 
> mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned 
> sizes
> 
> So there's nothing more to do here AFAIK.

> Hello Vlastimil,
> 
> Hope you are doing great!
> 
> Sorry about the late reply, my inbox made your email invisible somehow.
> 
> Thank you for the clarification -- yes, I am aware that the mm, mmap: 
> limit THP alignment of anonymous mappings to PMD-aligned sizes patch 
> was merged in late 2024 (commit 
> d4148aeab412432bf928f311eca8a2ba52bb05df).
> 
> The performance results I shared were generated much later because of 
> my working setup:
> 
> *
> 
> The tests were conducted on Intel Developer Cloud workloads as part of 
> a broader benchmarking exercise involving OpenVINO-based inference 
> pipelines.
> *
> 
> The specific environment, dataset, and configuration scripts were 
> stored on an SSD that unfortunately suffered corruption. I am currently 
> working to recover them so I can share the exact test harness and 
> commit-specific diffs. If and when I get that access back from Intel 
> Developer Cloud, I can surely provide all those relevant files.
> 
> Although this is not a new patch submission, I thought the numbers 
> might still be valuable -- they show notable throughput and latency 
> changes when aligning the current behavior with OpenVINO's large 
> contiguous allocation preferences in certain inference scenarios.
> 
> Summary of observed improvements:
> 
> *
> 
> Throughput: +7.3% average increase in model inference throughput on 
> ResNet-50 with mixed batch sizes (64/128)
> *
> 
> Latency: -5.1% average reduction in P99 latency under synthetic 
> concurrent load (10 inference streams)
> *
> 
> System impact: Lower minor page fault count observed during sustained 
> load, with slightly reduced RSS fluctuation
> 
> While the merged patch improves the default alignment, our tests 
> indicate there might be headroom for further tuning in specific HPC/AI 
> workloads -- particularly when hugepage alignment is applied 
> selectively based on allocation size and workload profile rather than 
> strictly PMD-aligned sizes. I was also working on specifics and pseudo 
> diffs from the working Linux code that I can generate to send that 
> email via git send-email.
> 
> I'd be happy to collaborate on a deeper investigation once I recover 
> the original scripts -- or I can try to replicate the environment on a 
> fresh setup and collect new diffs for comparison.
> 
> Best regards,
> Siddhartha Sharma