From: Mitchell Augustin
Date: Fri, 2 May 2025 15:34:53 -0500
Subject: Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
To: Nico Pache

Hi Nico,

As suggested, I did some new runs of my workloads with your recommended
configurations (on akpm/mm-new this time).
The results for the subset that my team is most interested in still do
not show significant improvements (in the context of the delta between
the control test and the thp=always case). On the bright side, the
Rodinia OpenMP tests showed slightly larger performance improvements
with defer+collapse enabled than without, and I did not observe any
concerning regression indicators in any of these results. My report for
these tests is linked at [0] if you'd like to take a look.

Thanks!

[0] https://pastebin.ubuntu.com/p/432KtgnXH3/

On Thu, Apr 24, 2025 at 2:45 PM Mitchell Augustin wrote:
>
> Hi Nico,
>
> Thank you for the quick response and suggestions! I'll see if we can
> find some time to test our workload with your suggested settings
> and will let you know what we find (although it may be a few weeks).
>
> -Mitchell Augustin
>
> On Thu, Apr 24, 2025 at 1:57 PM Nico Pache wrote:
> >
> > On Thu, Apr 24, 2025 at 12:18 PM Mitchell Augustin
> > wrote:
> > >
> > > Hello,
> > >
> > > I realize this is an older version of the series, but @Vanda
> > > Hendrychová and I started a benchmark effort on this version prior
> > > to the most recent revision's introduction and wanted to provide our
> > > results as feedback for this discussion.
> > >
> > > For context, my team and I previously identified that some of the
> > > benchmarks outlined in this Phoronix benchmark suite [0] perform more
> > > poorly with thp=madvise than with thp=always, so I suspected that the
> > > THP=defer and khugepaged collapse functionality outlined in this
> > > article [6] might yield performance in between madvise and always for
> > > the following benchmarks from that suite:
> > > - GraphicsMagick (all tests), which were substantially improved when
> > >   switching from thp=madvise to thp=always
> > > - 7-Zip Compression rating, which was substantially improved when
> > >   switching from thp=madvise to thp=always
> > > - Compilation time tests, which were slightly improved when switching
> > >   from thp=madvise to thp=always
> > >
> > > There were more benchmarks in this suite, but these three were the
> > > ones we had previously identified as being significantly impacted by
> > > the thp setting, and thus are the primary focus of our results.
> > >
> > > To analyze this, we ran the benchmarks outlined in the Phoronix
> > > article [0] on the upstream 6.14 kernel with the following
> > > configurations:
> > > - linux v6.14 thp=defer-v1: Transparent Huge Pages: defer
> > > - linux v6.14 thp=defer-v2: Transparent Huge Pages: defer
> > > - linux v6.14 thp=always: Transparent Huge Pages: always
> > > - linux v6.14 thp=never: Transparent Huge Pages: never
> > > - linux v6.14 thp=madvise: Transparent Huge Pages: madvise
> > >
> > > "defer-v1" refers to the thp collapse implementation by Nico Pache
> > > [3], and "defer-v2" refers to the implementation in this thread [4].
> > > Both use defer as implemented by series [5].
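Each result is only meaningful if you know which THP policy was actually active during the run, so it can help to record the mode next to every benchmark result. The following is a minimal Python sketch. It assumes the standard `/sys/kernel/mm/transparent_hugepage/enabled` interface, where the kernel lists every supported option and brackets the active one. `active_thp_mode` and `read_thp_mode` are hypothetical helper names. Whether `defer` shows up in the option list at all depends on the defer series [5] being applied.

```python
import re
from pathlib import Path

# Standard sysfs knob holding the system-wide THP policy.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (currently active) option from a THP sysfs file.

    The kernel prints all supported options and wraps the active one in
    brackets, e.g. "always [madvise] never".
    """
    match = re.search(r"\[([^\]]+)\]", text)
    if match is None:
        raise ValueError(f"no active option found in {text!r}")
    return match.group(1)


def read_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read and parse the active THP mode from sysfs (Linux only)."""
    return active_thp_mode(path.read_text())
```

Logging `read_thp_mode()` at the start of each run makes it easy to confirm afterwards that a "defer" result really came from a defer-enabled kernel.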
> > >
> > > Ultimately, we did observe that some of the GraphicsMagick tests
> > > performed marginally better with Nico Pache's khugepaged collapse
> > > implementation and thp=defer than with just thp=madvise, which aligns
> > > a bit with my theory; however, these improvements did not appear to
> > > be statistically significant and closed only a small part of the
> > > performance gap between thp=madvise and thp=always in our workloads
> > > of interest.
> > >
> > > Results for other benchmarks in this set also did not show any
> > > conclusive performance gains from mTHP=defer (however, I was not
> > > expecting those to change significantly with this series, since they
> > > weren't heavily impacted by thp settings in my prior tests).
> > >
> > > I can't speak for the impact of this series on other workloads; I
> > > just wanted to share results for the ones we were aware of and
> > > interested in.
> >
> > Hi Mitchell,
> >
> > Thank you very much for both testing and sharing the results! I'm glad
> > no major regressions were noted, and in some cases performance was
> > marginally better. Another good set of workloads to test for defer
> > would be latency tests: THP=always can increase page fault (PF)
> > latencies, while "defer" should eliminate that penalty, with the hope
> > of regaining some of the THP benefits after the khugepaged collapse.
> >
> > I wanted to note one thing: with the default of max_ptes_none=511 and
> > no mTHP sizes configured, both khugepaged series (mine and Dev's)
> > should have very little impact. This is a good test of the defer
> > feature, and it confirms that neither Dev nor I regressed the legacy
> > PMD khugepaged case; however, it is not a good test of the actual
> > mTHP collapsing.
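Nico's point about max_ptes_none is easier to see with a deliberately simplified Python model of how a khugepaged-style scanner might choose collapse targets within one PMD range. Several things here are assumptions. The model assumes 4 KiB base pages (512 PTEs per PMD). It scales the tolerance to smaller orders as `max_ptes_none >> (PMD_ORDER - order)`, which is an illustrative choice and not necessarily what either series does (Dev's series only supports 0 or 511). It also uses a greedy largest-order-first strategy, and `plan_collapse` is a hypothetical name.

```python
PMD_ORDER = 9                     # assumption: 4 KiB pages, as on x86-64
PTES_PER_PMD = 1 << PMD_ORDER     # 512 PTEs per PMD range


def plan_collapse(present, enabled_orders, max_ptes_none):
    """Simplified model of picking collapse targets in one PMD range.

    present: 512 booleans, True where the PTE maps a page.
    enabled_orders: orders allowed as collapse targets (9 = PMD size).
    Returns a list of (start_index, order) tuples, largest orders first.
    """
    if len(present) != PTES_PER_PMD:
        raise ValueError("expected one full PMD range of PTEs")
    covered = [False] * PTES_PER_PMD
    plan = []
    for order in sorted(enabled_orders, reverse=True):
        size = 1 << order
        # Hypothetical proportional scaling of the "none" tolerance.
        threshold = max_ptes_none >> (PMD_ORDER - order)
        for start in range(0, PTES_PER_PMD, size):
            block = range(start, start + size)
            if any(covered[i] for i in block):
                continue  # already part of a larger collapse
            nr_none = sum(1 for i in block if not present[i])
            if nr_none == size:
                continue  # nothing mapped here, nothing to collapse
            if nr_none <= threshold:
                plan.append((start, order))
                for i in block:
                    covered[i] = True
    return plan
```

In this model, the default of 511 lets a PMD range with even one mapped page qualify for a full PMD collapse, so smaller orders are never selected. With 0, only fully populated blocks qualify, which is what leaves room for smaller mTHP orders to be chosen.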
> >
> > If you plan on testing the mTHP changes for performance, I would
> > suggest enabling all the mTHP orders and setting max_ptes_none=0
> > (Dev's series requires 0 or 511 for mTHP collapse to work). Given this
> > is a new feature, it may be hard to find something to compare it to,
> > other than each other's series. Enabling defer during these tests has
> > the added benefit of pushing everything to khugepaged and really
> > stressing its mTHP collapse performance.
> >
> > Once again, thank you for taking the time to test these features :)
> > -- Nico
> >
> > >
> > > Full results from our tests on the DGX A100 [1] and Lenovo SR670v2 [2]
> > > are linked below.
> > >
> > > [0]: https://www.phoronix.com/review/linux-os-ampereone/5
> > > [1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
> > > [2]: https://pastebin.ubuntu.com/p/nqbWxyC33d/
> > > [3]: https://lwn.net/ml/all/20250211003028.213461-1-npache@redhat.com
> > > [4]: https://lwn.net/ml/all/20250211111326.14295-1-dev.jain@arm.com
> > > [5]: https://lwn.net/ml/all/20250211004054.222931-1-npache@redhat.com
> > > [6]: https://lwn.net/Articles/1009039/
> > > --
> > > Mitchell Augustin
> > > Software Engineer - Ubuntu Partner Engineering

-- 
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
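Nico's suggested mTHP test setup has three parts: enable all mTHP orders, set max_ptes_none=0, and optionally enable defer. Below is a minimal Python sketch that applies these settings through sysfs. It uses the standard `enabled`, `hugepages-<size>kB/enabled`, and `khugepaged/max_ptes_none` files under `/sys/kernel/mm/transparent_hugepage`. A few choices are assumptions. Writing `defer` only works with the defer series [5] applied. Setting each size to `inherit` (so it follows the top-level mode) is one reasonable way to "enable all orders", not necessarily the one Nico intended. `configure_mthp_test` is a hypothetical name. The `root` parameter exists so the function can be pointed at a scratch directory.

```python
from pathlib import Path

# Standard sysfs directory for THP and mTHP controls.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")


def configure_mthp_test(root: Path = THP_ROOT,
                        top_level_mode: str = "defer",
                        per_size_mode: str = "inherit",
                        max_ptes_none: int = 0) -> list[str]:
    """Apply the suggested test settings and return the mTHP sizes touched.

    Needs root privileges when pointed at the real sysfs tree.
    """
    # System-wide policy; "defer" is only accepted with the defer series.
    (root / "enabled").write_text(top_level_mode)
    # Each supported size has its own directory, e.g. hugepages-64kB.
    sizes = sorted(p.name for p in root.glob("hugepages-*kB") if p.is_dir())
    for name in sizes:
        (root / name / "enabled").write_text(per_size_mode)
    # 0: do not allocate any extra pages, i.e. only collapse ranges whose
    # PTEs are all already mapped.
    (root / "khugepaged" / "max_ptes_none").write_text(str(max_ptes_none))
    return sizes
```

On a stock kernel without the defer series, passing `top_level_mode="always"` keeps the remaining settings usable.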