From: siddhartha@kenip.in
To: Vlastimil Babka
Cc: Dev Jain, Lorenzo Stoakes, linux-mm@kvack.org, LKML
Date: Tue, 12 Aug 2025 03:44:57 +0530
Subject: Re: [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads

On 2025-07-28 16:30, Vlastimil Babka wrote:
> On 7/28/25 07:41, siddhartha@kenip.in wrote:
>> On 2025-07-07 14:26, Vlastimil Babka wrote:
>> Hi Lorenzo, Dev, Mel,
>>
>> I'm following up on this patch submission from earlier this month:
>> "[PATCH] mm: limit THP alignment – performance gain observed in AI
>> inference workloads."
>
> I'm confused. That wasn't a patch submission, but reporting performance
> results for my patch from late 2024? (and thanks for those!)
>
> The patch was also already merged in late 2024:
>
> commit d4148aeab412432bf928f311eca8a2ba52bb05df
> Author: Vlastimil Babka <vbabka@suse.cz>
> Date:   Thu Oct 24 17:12:29 2024 +0200
>
>     mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
>
> So there's nothing more to do here AFAIK.

Hello Vlastimil,

Hope you are doing great!

Sorry for the late reply; your email somehow did not show up in my inbox.

Thank you for the clarification. Yes, I am aware that the "mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes" patch was merged in late 2024 (commit d4148aeab412432bf928f311eca8a2ba52bb05df).
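To make the merged behavior concrete: as I understand commit d4148aeab412, a private anonymous mapping requested without an address hint is steered to a PMD-aligned address only when its length is a multiple of the PMD size. Below is a simplified user-space model of that condition, not kernel code. The function names and the 2 MiB PMD size (x86-64 with 4 KiB base pages) are assumptions, and the other paths in the real kernel code are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2 MiB PMD size (x86-64 with 4 KiB base pages). */
#define PMD_SIZE_ASSUMED ((size_t)2 << 20)

/* Same test as the kernel's IS_ALIGNED() for a power-of-two alignment. */
bool is_aligned_pow2(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value & (alignment - 1)) == 0;
}

/*
 * Hypothetical model (not kernel code): is a new mapping a candidate for
 * THP alignment after the merged commit? Assumes a THP-enabled kernel and
 * ignores file-backed, shared and other special cases.
 */
bool thp_align_candidate(bool private_anon, uintptr_t addr_hint, size_t len)
{
    return private_anon &&
           addr_hint == 0 &&          /* caller gave no address hint */
           len != 0 &&
           is_aligned_pow2(len, PMD_SIZE_ASSUMED);
}
```

For example, a 6 MiB request qualifies under this model, while a 2 MiB + 4 KiB request does not.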

The performance results I shared were generated well after the merge, and a few details about the setup are relevant:

  • The tests were run on Intel Developer Cloud as part of a broader benchmarking exercise involving OpenVINO-based inference pipelines.

  • The exact environment, dataset, and configuration scripts were stored on an SSD that was unfortunately corrupted. I am working to recover them so I can share the exact test harness and commit-specific diffs. If I regain access to Intel Developer Cloud, I will provide all the relevant files.

Although this is not a new patch submission, I thought the numbers might still be useful: in certain inference scenarios they show notable throughput and latency changes when the current alignment behavior lines up with OpenVINO's preference for large contiguous allocations.

Summary of observed improvements:

  • Throughput: +7.3% average increase in model inference throughput on ResNet-50 with mixed batch sizes (64/128)

  • Latency: -5.1% average reduction in P99 latency under synthetic concurrent load (10 inference streams)

  • System impact: Lower minor page fault count observed during sustained load, with slightly reduced RSS fluctuation
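If the environment is replicated on a fresh setup, one simple way to quantify the minor page fault item is to read ru_minflt from getrusage(2) before and after a workload. The sketch below assumes Linux with glibc; faults_to_populate() is a hypothetical helper for illustration, not part of the original test harness.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Minor page faults (served without disk I/O) taken so far by this process. */
long minor_faults_self(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_minflt;
}

/*
 * Hypothetical helper: map `len` bytes of private anonymous memory, write to
 * all of it, and return the number of minor faults that cost (-1 on error).
 */
long faults_to_populate(size_t len)
{
    long before = minor_faults_self();
    if (before < 0)
        return -1;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* The first write to each page faults; a THP fault may fill 2 MiB at once. */
    memset(p, 0xA5, len);

    long after = minor_faults_self();
    munmap(p, len);
    return after < 0 ? -1 : after - before;
}
```

Comparing this count for the same workload with THP enabled and disabled (via /sys/kernel/mm/transparent_hugepage/enabled) is one way to check that a fault reduction comes from huge pages rather than from run-to-run noise.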

While the merged patch improves the default alignment, our tests suggest there may be headroom for further tuning in specific HPC/AI workloads, particularly if hugepage alignment were applied selectively based on allocation size and workload profile rather than only to PMD-aligned sizes. I have also been working on the specifics, along with pseudo-diffs against the current Linux code, which I can then send with git send-email.
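For experimenting with size-based selective alignment without kernel changes, one common user-space approach is to over-allocate with mmap(), trim the mapping so it starts on a 2 MiB boundary, and request THP with madvise(MADV_HUGEPAGE). This is only an illustration under stated assumptions: alloc_selective(), the 64 MiB cutoff, and the 2 MiB huge page size are hypothetical choices, not the policy used in the tests described above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_ALIGN ((size_t)2 << 20)          /* assumed PMD size: 2 MiB */
#define SELECTIVE_CUTOFF ((size_t)64 << 20)   /* hypothetical size cutoff */

/*
 * Hypothetical allocator: requests of at least SELECTIVE_CUTOFF bytes get a
 * 2 MiB-aligned private anonymous mapping with MADV_HUGEPAGE; smaller ones
 * get a plain mmap(). Returns NULL on failure; release with munmap(p, len).
 */
void *alloc_selective(size_t len)
{
    if (len == 0)
        return NULL;

    if (len < SELECTIVE_CUTOFF) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    /* Round up to whole pages so the trimmed tail stays page-aligned. */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    len = (len + page - 1) / page * page;

    /* Over-allocate by one huge page so an aligned start is guaranteed. */
    size_t span = len + HUGE_ALIGN;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + HUGE_ALIGN - 1) & ~(uintptr_t)(HUGE_ALIGN - 1);
    char *aligned = (char *)start;
    size_t head = (size_t)(aligned - raw);
    size_t tail = span - head - len;

    /* Give the unused head and tail back to the kernel. */
    if (head != 0)
        munmap(raw, head);
    if (tail != 0)
        munmap(aligned + len, tail);

    /* Advisory only; errors (e.g. no THP support) are ignored. */
    (void)madvise(aligned, len, MADV_HUGEPAGE);
    return aligned;
}
```

Because the alignment is done explicitly in user space, it applies to any request above the cutoff, whether or not its length is a multiple of 2 MiB.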

I'd be happy to collaborate on a deeper investigation once I recover the original scripts, or I can try to replicate the environment on a fresh setup and collect new diffs for comparison.

Best regards,
Siddhartha Sharma
