From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
Date: Fri, 27 Jun 2025 16:09:16 +0530
From: siddhartha@kenip.in
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, mgorman@suse.de
Subject: [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads
Message-ID: <28338f055b3c9afa8b69ff6f05ea20ed@kenip.in>
Content-Type: text/plain; charset=US-ASCII;
 format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Hi all,

I wanted to share validation data from a Hugging Face-based AI
inferencing workload, which was significantly impacted by the THP
alignment logic introduced in commit efa7df3e3bb5.

Using transformer models with dynamic input lengths on Intel Xeon
(Cooper Lake), we observed up to a 3200% throughput improvement after
applying the patch from Oct 2024:

   mm: limit THP alignment of anonymous mappings to PMD-aligned sizes

Metrics:
- Model: BERT-base
- Inference engine: Transformers + ONNX Runtime
- Kernel: 6.6 vs patched 6.6.8
- Batch size: 8-32, input length: 64-512 tokens
- Metric: inference throughput (samples/sec)

Thanks for the fix -- this change had real impact on a
production-relevant workload.

Best Regards,
Siddhartha Sharma
ISV @ Kenip
Solution Link: https://www.intel.com/content/www/us/en/partner/showcase/offering/a5bHo00000045YUIAY/deadlock-clearance.html