From: Mateusz Guzik
Date: Mon, 12 Aug 2024 10:18:35 +0200
Subject: Re: [linus:master] [mm] c0bff412e6: stress-ng.clone.ops_per_sec -2.9% regression
To: David Hildenbrand
Cc: Yin Fengwei, kernel test robot, Peter Xu, oe-lkp@lists.linux.dev,
 lkp@intel.com, linux-kernel@vger.kernel.org, Andrew Morton, Huacai Chen,
 Jason Gunthorpe, Matthew Wilcox, Nathan Chancellor, Ryan Roberts,
 WANG Xuerui, linux-mm@kvack.org, ying.huang@intel.com, feng.tang@intel.com
In-Reply-To: <66c4fcc5-47f6-438c-a73a-3af6e19c3200@redhat.com>
References: <202407301049.5051dc19-oliver.sang@intel.com>
 <193e302c-4401-4756-a552-9f1e07ecedcf@redhat.com>
 <439265d8-e71e-41db-8a46-55366fdd334e@intel.com>
 <90477952-fde2-41d7-8ff4-2102c45e341d@redhat.com>
 <6uxnuf2gysgabyai2r77xrqegb7t7cc2dlzjz6upwsgwrnfk3x@cjj6on3wqm4x>
 <5a67c103-1d9d-440d-8bed-bbfa7d3ecf71@redhat.com>
 <5c0979a2-9a56-4284-82d2-42da62bda4a5@redhat.com>
 <66c4fcc5-47f6-438c-a73a-3af6e19c3200@redhat.com>

On Mon, Aug 12, 2024 at 10:12 AM David Hildenbrand wrote:
>
> On 12.08.24 06:49, Mateusz Guzik wrote:
> > On Mon, Aug 12, 2024 at 12:43:08PM +0800, Yin Fengwei wrote:
> >> Hi David,
> >>
> >> On 8/1/24 09:44, David Hildenbrand wrote:
> >>> On 01.08.24 15:37, Mateusz Guzik wrote:
> >>>> On Thu, Aug 1, 2024 at
> >>>> 3:34 PM David Hildenbrand wrote:
> >>>>>
> >>>>> On 01.08.24 15:30, Mateusz Guzik wrote:
> >>>>>> On Thu, Aug 01, 2024 at 08:49:27AM +0200, David Hildenbrand wrote:
> >>>>>>> Yes indeed. fork() can be extremely sensitive to each
> >>>>>>> added instruction.
> >>>>>>>
> >>>>>>> I even pointed out to Peter why I didn't add the
> >>>>>>> PageHuge check in there
> >>>>>>> originally [1].
> >>>>>>>
> >>>>>>> "Well, and I didn't want to have runtime-hugetlb checks in
> >>>>>>> PageAnonExclusive code called on certainly-not-hugetlb code paths."
> >>>>>>>
> >>>>>>>
> >>>>>>> We now have to do a page_folio(page) and then test for hugetlb.
> >>>>>>>
> >>>>>>>     return folio_test_hugetlb(page_folio(page));
> >>>>>>>
> >>>>>>> Nowadays, folio_test_hugetlb() will be faster than at
> >>>>>>> c0bff412e6 times, so
> >>>>>>> maybe at least part of the overhead is gone.
> >>>>>>>
> >>>>>>
> >>>>>> I'll note page_folio expands to a call to _compound_head.
> >>>>>>
> >>>>>> While _compound_head is declared as an inline, it ends up being big
> >>>>>> enough that the compiler decides to emit a real function instead and
> >>>>>> real func calls are not particularly cheap.
> >>>>>>
> >>>>>> I had a brief look with a profiler myself and for single-threaded usage
> >>>>>> the func is quite high up there, while it manages to get out with the
> >>>>>> first branch -- that is to say there is definitely performance lost for
> >>>>>> having a func call instead of an inlined branch.
> >>>>>>
> >>>>>> The routine is deinlined because of a call to page_fixed_fake_head,
> >>>>>> which itself is annotated with always_inline.
> >>>>>>
> >>>>>> This is of course patchable with minor shoveling.
> >>>>>>
> >>>>>> I did not go for it because stress-ng results were too unstable for me
> >>>>>> to confidently state win/loss.
> >>>>>>
> >>>>>> But should you want to whack the regression, this is what I would look
> >>>>>> into.
> >>>>>>
> >>>>>
> >>>>> This might improve it, at least for small folios I guess:
> >> Do you want us to test this change? Or you have further optimization
> >> ongoing? Thanks.
> >
> > I verified the thing below boots, I have no idea about performance. If
> > it helps it can be massaged later from style perspective.
>
> As quite a lot of setups already run with the vmemmap optimization enabled, I
> wonder how effective this would be (might need more fine tuning, did not look
> at the generated code):
>
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 085dd8dcbea2..7ddcdbd712ec 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -233,7 +233,7 @@ static __always_inline int page_is_fake_head(const struct page *page)
>  	return page_fixed_fake_head(page) != page;
>  }
>
> -static inline unsigned long _compound_head(const struct page *page)
> +static __always_inline unsigned long _compound_head(const struct page *page)
>  {
>  	unsigned long head = READ_ONCE(page->compound_head);
>
>

Well one may need to justify it with bloat-o-meter which is why I did
not just straight up inline the entire thing.

But if you are down to fight opposition of the sort I agree this is
the patch to benchmark. :)

-- 
Mateusz Guzik
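
For readers following the inlining argument, here is a rough userspace
sketch of the difference between a plain "inline" hint and a forced
inline. It is not kernel code: struct demo_page, the demo_* helpers and
the demo_always_inline macro are hypothetical names made up for
illustration, the low-bit tagging is only a simplified model of how a
tail page can point at its head, and the fake-head handling discussed in
the thread is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page; NOT the kernel definition. */
struct demo_page {
	uintptr_t compound_head;	/* bit 0 set: tail page, rest is the head address */
};

/* GCC/Clang attribute; assumed to behave like the kernel's __always_inline. */
#define demo_always_inline inline __attribute__((__always_inline__))

/* Plain "inline" is only a hint: the compiler may still emit a real call. */
static inline uintptr_t demo_head_hint(const struct demo_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return head - 1;	/* tail page: strip the tag bit */
	return (uintptr_t)page;		/* not a tail: the page is its own head */
}

/* Same logic, but the attribute asks for expansion at every call site. */
static demo_always_inline uintptr_t demo_head_forced(const struct demo_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return head - 1;
	return (uintptr_t)page;
}

/* Helper for experiments: mark "tail" as a tail page of "head". */
static inline void demo_set_tail(struct demo_page *tail,
				 const struct demo_page *head)
{
	/* The tag lives in bit 0, so the head address must have it clear. */
	assert(((uintptr_t)head & 1) == 0);
	tail->compound_head = (uintptr_t)head | 1;
}
```

Both variants return the same value; the only difference is whether the
compiler is allowed to leave a call behind. Whether it actually does so
depends on the compiler, the optimization level and the callers, so the
generated code has to be inspected (for example with objdump -d) rather
than assumed. For the real kernel change, the size cost of forcing the
inline is what scripts/bloat-o-meter would be used to report.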