On 2025-07-28 16:30, Vlastimil Babka wrote: > On 7/28/25 07:41, siddhartha@kenip.in wrote: > >> On 2025-07-07 14:26, Vlastimil Babka wrote: >> Hi Lorenzo, Dev, Mel, >> >> I'm following up on this patch submission from earlier this month: >> "[PATCH] mm: limit THP alignment - performance gain observed in AI >> inference workloads." > > I'm confused. That wasn't a patch submission, but reporting performance > results for my patch from late 2024? (and thanks for those!) > > The patch was also already merged in late 2024: > > commit d4148aeab412432bf928f311eca8a2ba52bb05df > Author: Vlastimil Babka > Date: Thu Oct 24 17:12:29 2024 +0200 > > mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned > sizes > > So there's nothing more to do here AFAIK. > Hello Vlastimil, > > Hope you are doing great! > > Sorry about the late reply, my inbox made your email invisible somehow. > > Thank you for the clarification -- yes, I am aware that the mm, mmap: > limit THP alignment of anonymous mappings to PMD-aligned sizes patch > was merged in late 2024 (commit > d4148aeab412432bf928f311eca8a2ba52bb05df). > > The performance results I shared were generated much later because of > my working setup: > > * > > The tests were conducted on Intel Developer Cloud workloads as part of > a broader benchmarking exercise involving OpenVINO-based inference > pipelines. > * > > The specific environment, dataset, and configuration scripts were > stored on an SSD that unfortunately suffered corruption. I am currently > working to recover them so I can share the exact test harness and > commit-specific diffs. If and when I get that access back from Intel > Developer Cloud, I can surely provide all those relevant files. > > Although this is not a new patch submission, I thought the numbers > might still be valuable -- they show notable throughput and latency > changes when aligning the current behavior with OpenVINO's large > contiguous allocation preferences in certain inference scenarios. > > Summary of observed improvements: > > * > > Throughput: +7.3% average increase in model inference throughput on > ResNet-50 with mixed batch sizes (64/128) > * > > Latency: -5.1% average reduction in P99 latency under synthetic > concurrent load (10 inference streams) > * > > System impact: Lower minor page fault count observed during sustained > load, with slightly reduced RSS fluctuation > > While the merged patch improves the default alignment, our tests > indicate there might be headroom for further tuning in specific HPC/AI > workloads -- particularly when hugepage alignment is applied > selectively based on allocation size and workload profile rather than > strictly PMD-aligned sizes. I was also working on specifics and pseudo > diffs from the working Linux code that I can generate to send that > email via git send-email. > > I'd be happy to collaborate on a deeper investigation once I recover > the original scripts -- or I can try to replicate the environment on a > fresh setup and collect new diffs for comparison. > > Best regards, > Siddhartha Sharma