From: Mitchell Augustin <mitchell.augustin@canonical.com>
Date: Fri, 2 May 2025 15:32:27 -0500
Subject: Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
To: Nico Pache <npache@redhat.com>
Cc: akpm@linux-foundation.org,
    20250211152341.3431089327c5e0ec6ba6064d@linux-foundation.org,
    21cnbao@gmail.com, aneesh.kumar@kernel.org, anshuman.khandual@arm.com,
    apopple@nvidia.com, baohua@kernel.org, catalin.marinas@arm.com,
    cl@gentwo.org, dave.hansen@linux.intel.com, david@redhat.com,
    dev.jain@arm.com, haowenchao22@gmail.com, hughd@google.com,
    ioworker0@gmail.com, jack@suse.cz, jglisse@google.com, John Hubbard,
    kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, mhocko@suse.com, Peter Xu, ryan.roberts@arm.com,
    srivatsa@csail.mit.edu, surenb@google.com, vbabka@suse.cz,
    vishal.moola@gmail.com, wangkefeng.wang@huawei.com, will@kernel.org,
    willy@infradead.org, yang@os.amperecomputing.com,
    zhengqi.arch@bytedance.com, Zi Yan, zokeefe@google.com, Jacob Martin,
    Vanda Hendrychová

Hi Nico,

As suggested, I did some new runs of my workloads with your recommended
configurations (on akpm/mm-new this time). The results for the subset that
my team is most interested in still do not show significant improvements
(in the context of the delta between the control test and the thp=always
case).

On the bright side, I did observe that the Rodinia OpenMP tests show
slightly more noticeable performance improvements when defer+collapse are
in use than without, and I also did not observe any concerning regression
indicators in any of these results.

My report for these tests is attached if you'd like to take a look. [0]
Thanks!

[0] https://pastebin.ubuntu.com/p/432KtgnXH3/

On Thu, Apr 24, 2025 at 2:45 PM Mitchell Augustin
<mitchell.augustin@canonical.com> wrote:
> Hi Nico,
>
> Thank you for the quick response and suggestions! I'll see if we can
> find some time to test our workload with your suggested settings
> and will let you know what we find (although it may be a few weeks).
>
> -Mitchell Augustin
>
> On Thu, Apr 24, 2025 at 1:57 PM Nico Pache <npache@redhat.com> wrote:
> >
> > On Thu, Apr 24, 2025 at 12:18 PM Mitchell Augustin
> > <mitchell.augustin@canonical.com> wrote:
> > >
> > > Hello,
> > >
> > > I realize this is an older version of the series, but @Vanda
> > > Hendrychová and I started a benchmarking effort on this version
> > > before the most recent revision was posted, and we wanted to provide
> > > our results as feedback for this discussion.
> > >
> > > For context, my team and I previously identified that some of the
> > > benchmarks in this Phoronix benchmark suite [0] perform worse with
> > > thp=madvise than with thp=always, so I suspected that the THP=defer
> > > and khugepaged collapse functionality outlined in this article [6]
> > > might yield performance between madvise and always for the
> > > following benchmarks from that suite:
> > > - GraphicsMagick (all tests), which were substantially improved when
> > >   switching from thp=madvise to thp=always
> > > - 7-Zip compression rating, which was substantially improved when
> > >   switching from thp=madvise to thp=always
> > > - Compilation time tests, which were slightly improved when switching
> > >   from thp=madvise to thp=always
> > >
> > > There were more benchmarks in this suite, but these three were the
> > > ones we had previously identified as being significantly impacted by
> > > the THP setting, and thus they are the primary focus of our results.
> > >
> > > To analyze this, we ran the benchmarks outlined in this article on the
> > > upstream 6.14 kernel with the following configurations:
> > > - linux v6.14 thp=defer-v1: Transparent Huge Pages: defer
> > > - linux v6.14 thp=defer-v2: Transparent Huge Pages: defer
> > > - linux v6.14 thp=always: Transparent Huge Pages: always
> > > - linux v6.14 thp=never: Transparent Huge Pages: never
> > > - linux v6.14 thp=madvise: Transparent Huge Pages: madvise
> > >
> > > "defer-v1" refers to the THP collapse implementation by Nico Pache
> > > [3], and "defer-v2" refers to the implementation in this thread [4].
> > > Both use defer as implemented by series [5].
> > >
> > > Ultimately, we did observe that some of the GraphicsMagick tests
> > > performed marginally better with Nico Pache's khugepaged collapse
> > > implementation and thp=defer than with just thp=madvise, which
> > > partially aligns with my theory. However, these improvements
> > > unfortunately did not appear to be statistically significant and
> > > closed only a small part of the performance gap between thp=madvise
> > > and thp=always in our workloads of interest.
> > >
> > > Results for other benchmarks in this set also did not show any
> > > conclusive performance gains from mTHP=defer (however, I was not
> > > expecting those to change significantly with this series, since they
> > > weren't heavily impacted by THP settings in my prior tests).
> > >
> > > I can't speak for the impact of this series on other workloads; I
> > > just wanted to share results for the ones we were aware of and
> > > interested in.
> >
> > Hi Mitchell,
> >
> > Thank you very much for both testing and sharing the results! I'm glad
> > no major regressions were noted, and in some cases performance was
> > marginally better. Another good set of workloads to test for defer
> > would be latency tests: THP=always can increase page-fault latencies,
> > while "defer" should eliminate that penalty, with the hope of
> > regaining some of the THP benefits after the khugepaged collapse.
> >
> > I wanted to note one thing: with the default of max_ptes_none=511 and
> > no mTHP sizes configured, the khugepaged series (both mine and Dev's)
> > should have very little impact. This is a good test of the defer
> > feature, while also confirming that neither Dev nor I regressed the
> > legacy PMD khugepaged case; however, it is not a good test of the
> > actual mTHP collapsing.
> >
> > If you plan on testing the mTHP changes for performance, I would
> > suggest enabling all the mTHP orders and setting max_ptes_none=0
> > (Dev's series requires 0 or 511 for mTHP collapse to work). Given that
> > this is a new feature, it may be hard to find something to compare it
> > to other than each other's series. Enabling defer during these tests
> > has the added benefit of pushing everything to khugepaged and really
> > stressing its mTHP collapse performance.
> >
> > Once again, thank you for taking the time to test these features :)
> > -- Nico
> >
> > >
> > > Full results from our tests on the DGX A100 [1] and Lenovo SR670v2 [2]
> > > are linked below.
> > >
> > > [0]: https://www.phoronix.com/review/linux-os-ampereone/5
> > > [1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
> > > [2]: https://pastebin.ubuntu.com/p/nqbWxyC33d/
> > > [3]: https://lwn.net/ml/all/20250211003028.213461-1-npache@redhat.com
> > > [4]: https://lwn.net/ml/all/20250211111326.14295-1-dev.jain@arm.com
> > > [5]: https://lwn.net/ml/all/20250211004054.222931-1-npache@redhat.com
> > > [6]: https://lwn.net/Articles/1009039/
> > > --
> > > Mitchell Augustin
> > > Software Engineer - Ubuntu Partner Engineering
> > >
> >
>
> --
> Mitchell Augustin
> Software Engineer - Ubuntu Partner Engineering

-- 
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
Email: mitchell.augustin@canonical.com
Location: United States of America
canonical.com
ubuntu.com
Hi Nico,

As suggested,= I did some new runs of my workloads with your recommended configurations (= on akpm/mm-new this time). The results for the subset that my team is most = interested in still do not show significant=C2=A0improvements (in the conte= xt of the delta between the=C2=A0control test=C2=A0and the thp=3Dalways cas= e).

On the bright side, I did observe that the Rod= inia OpenMP tests show slightly=C2=A0more noticeable performance improvemen= ts when defer+collapse are in use than without, and I also did not observe = any concerning regression indicators in any of these results.
My report for these tests is attached if you'd like to take= a look. [0] Thanks!



On Thu, Apr 24, 2025 at 2:45= =E2=80=AFPM Mitchell Augustin <mitchell.augustin@canonical.com> wrote:
Hi Nico,

Thank you for the quick response and suggestions! I'll see if we can find some time to test our workload out with your suggested settings
and will let you know what we find (although it may be a few weeks).

-Mitchell Augustin

On Thu, Apr 24, 2025 at 1:57=E2=80=AFPM Nico Pache <npache@redhat.com> wrote:
>
> On Thu, Apr 24, 2025 at 12:18=E2=80=AFPM Mitchell Augustin
> <mitchell.augustin@canonical.com> wrote:
> >
> > Hello,
> >
> > I realize this is an older version of the series, but @Vanda
> > Hendrychov=C3=A1 and I started on a benchmark effort of this vers= ion prior
> > to the most recent revision's introduction and wanted to prov= ide our
> > results as feedback for this discussion.
> >
> > For context, my team and I previously identified that some of the=
> > benchmarks outlined in this phoronix benchmark suite [0] perform = more
> > poorly with thp=3Dmadvise than thp=3Dalways - so I suspected that= the
> > THP=3Ddefer and khugepaged collapse functionality outlined in thi= s
> > article [6] might yield performance in between madvise and always= for
> > the following benchmarks from that suite:
> > - GraphicsMagick (all tests), which were substantially improved w= hen
> > switching from thp=3Dmadvise to thp=3Dalways
> > - 7-Zip Compression rating, which was substantially improved when=
> > switching from thp=3Dmadvise to thp=3Dalways
> > - Compilation time tests, which were slightly improved when switc= hing
> > from thp=3Dmadvise to thp=3Dalways
> >
> > There were more benchmarks in this suite, but these three were th= e
> > ones we had previously identified as being significantly impacted= by
> > the thp setting, and thus are the primary focus of our results. > >
> > To analyze this, we ran the benchmarks outlined in this article o= n the
> > upstream 6.14 kernel with the following configurations:
> > - linux v6.14 thp=3Ddefer-v1: Transparent Huge Pages: defer
> > - linux v6.14 thp=3Ddefer-v2: Transparent Huge Pages: defer
> > - linux v6.14 thp=3Dalways: Transparent Huge Pages: always
> > - linux v6.14 thp=3Dnever: Transparent Huge Pages: never
> > - linux v6.14 thp=3Dmadvise: Transparent Huge Pages: madvise
> >
> > "defer-v1" refers to the thp collapse implementation by= Nico Pache
> > [3], and "defer-v2" refers to the implementation in thi= s thread [4].
> > Both use defer as implemented by series [5].
> >
> >
> > Ultimately, we did observe that some of the GraphicsMagick tests<= br> > > performed marginally better with Nico Pache's khugepaged coll= apse
> > implementation and thp=3Ddefer than with just thp=3Dmadvise, whic= h aligns
> > a bit with my theory - however, these improvements unfortunately = did
> > not appear to be statistically significant and gained only margin= al
> > ground in the performance gap between thp=3Dmadvise and thp=3Dalw= ays in
> > our workloads of interest.
> >
> > Results for other benchmarks in this set also did not show any > > conclusive performance gains from mTHP=3Ddefer (however I was not=
> > expecting those to change significantly with this series, since t= hey
> > weren=E2=80=99t heavily impacted by thp settings in my prior test= s).
> >
> > I can't speak for the impact of this series on other workload= s - I
> > just wanted to share results for the ones we were aware of and > > interested in.
> Hi Mitchell,
>
> Thank you very much for both testing and sharing the results! I'm = glad
> no major regressions were noted, and in some cases performance was
> marginally better. Another good set of workloads to test for defer
> would be latency tests... THP=3Dalways can increase PF latencies, whil= e
> "defer" should eliminate that penalty, with the hopes of reg= aining
> some of the THP benefits after the khugepaged collapse.
>
> I wanted to note one thing, with the default of max_ptes_none=3D511 an= d
> no mTHP sizes configured, the khugepaged series' (both mine and De= vs)
> should have very little impact. This is a good test of the defer
> feature, while confirming that neither me nor Dev regressed the legacy=
> PMD khugepaged case; however, this is not a good test of the actual > mTHP collapsing.
>
> If you plan on testing the mTHP changes for performance changes, I
> would suggest enabling all the mTHP orders and setting max_ptes_none= =3D0
> (Devs series requires 0 or 511 for mTHP collapse to work). Given this<= br> > is a new feature, it may be hard to find something to compare it to, > other than each other's series'. enabling defer during these t= ests has
> the added benefit of pushing everything to khugepaged and really
> stressing its mTHP collapse performance.
>
> Once again thank you for taking the time to test these features :)
> -- Nico
>
>
> >
> > Full results from our tests on the DGX A100 [1] and Lenovo SR670v= 2 [2]
> > are linked below.
> >
> > [0]: https://www.phoronix.com/review/l= inux-os-ampereone/5
> > [1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
> > [2]:
https://pastebin.ubuntu.com/p/nqbWxyC33d/
> > [3]:
https://lwn.net/ml/al= l/20250211003028.213461-1-npache@redhat.com
> > [4]: https://lwn.net/ml/all/= 20250211111326.14295-1-dev.jain@arm.com
> > [5]: https://lwn.net/ml/al= l/20250211004054.222931-1-npache@redhat.com
> > [6]: https://lwn.net/Articles/1009039/
> > --
> > Mitchell Augustin
> > Software Engineer - Ubuntu Partner Engineering
> >
>


--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering


--
<= tr><= td style=3D"vertical-align:top;padding:0px 4px 0px 5px;font-size:0px">

mitchell.augustin@canonic= al.com

3D"Canonical-20th-anniversary"

Mitchell Augustin

Software Engineer - Ubuntu Partner Engineering

Email:

Location:

United States of America



canonical.= com

ubuntu.com

--000000000000232bc706342d0db5--
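Nico's suggested setup for actually exercising mTHP collapse (all mTHP orders enabled, max_ptes_none=0, and defer turned on) could be scripted roughly as below. This is a minimal Python sketch, not code from either series. It assumes the standard `/sys/kernel/mm/transparent_hugepage` layout (one `hugepages-<size>kB/enabled` file per supported size, plus `khugepaged/max_ptes_none`). It also assumes that writing `defer` to the top-level `enabled` file only works with the defer series [5] applied, since `defer` is not an upstream v6.14 value, and that `inherit` is the right per-size value for making every order follow that mode. The helper names `plan_mthp_test_config` and `apply_plan` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Standard THP sysfs root. NOTE: "defer" is assumed to be accepted only
# when the out-of-tree defer series is applied; upstream v6.14 rejects it.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")

# 511 = 512 PTEs per PMD minus one, i.e. the khugepaged default on a
# 4 KiB base-page system (assumption: other page sizes differ).
PMD_MAX_PTES_NONE = 511


def plan_mthp_test_config(root: Path, top_mode: str = "defer",
                          per_size_mode: str = "inherit",
                          max_ptes_none: int = 0,
                          dev_series: bool = False) -> dict[Path, str]:
    """Return an ordered {sysfs file: value} plan for the suggested setup."""
    if not 0 <= max_ptes_none <= PMD_MAX_PTES_NONE:
        raise ValueError("max_ptes_none must be in [0, 511] (4 KiB pages)")
    # Per the thread, Dev's series only performs mTHP collapse with 0 or 511.
    if dev_series and max_ptes_none not in (0, PMD_MAX_PTES_NONE):
        raise ValueError("Dev's series needs max_ptes_none of 0 or 511")

    # Top-level mode first, so per-size "inherit" has something to follow.
    plan = {root / "enabled": top_mode}
    # One hugepages-<size>kB directory exists per supported (m)THP size;
    # writing to each one enables "all the mTHP orders".
    for size_dir in sorted(root.glob("hugepages-*kB")):
        plan[size_dir / "enabled"] = per_size_mode
    plan[root / "khugepaged" / "max_ptes_none"] = str(max_ptes_none)
    return plan


def apply_plan(plan: dict[Path, str]) -> None:
    """Write each value in order; on a real system this requires root."""
    for path, value in plan.items():
        path.write_text(value + "\n")


if __name__ == "__main__":
    # Dry run: print the planned writes without touching sysfs.
    for path, value in plan_mthp_test_config(THP_ROOT).items():
        print(f"{path} <- {value}")
```

Running the module directly only prints the planned writes. Passing the returned mapping to `apply_plan()` as root would perform them. Keeping planning separate from writing means the plan can be inspected, or checked against a fake directory tree, before any kernel setting changes.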