Subject: Re: [PATCH 2/2] mm: Remember young bit for page migrations
From: Nadav Amit
Date: Wed, 3 Aug 2022 23:42:39 -0700
To: Peter Xu
Cc: LKML, Linux MM, Andrea Arcangeli, Andi Kleen, Andrew Morton, Hugh Dickins, Huang Ying, Kirill A. Shutemov, Vlastimil Babka, David Hildenbrand
References: <20220803012159.36551-1-peterx@redhat.com> <20220803012159.36551-3-peterx@redhat.com>

On Aug 3, 2022, at 9:45 AM, Peter Xu wrote:

> On Wed, Aug 03, 2022 at 12:42:54AM -0700, Nadav Amit wrote:
>> On Aug 2, 2022, at 6:21 PM, Peter Xu wrote:
>>
>> On the negative side, I am not sure whether, on other archs that might
>> require a TLB flush for resetting the access bit, the overhead of doing
>> an atomic operation to clear the access bit would not induce more
>> overhead than it would save.
>
> I think your proposal makes sense and looks clean, maybe even cleaner than
> the new max_swapfile_size() approach (and definitely nicer than my old
> one). It's just that I still want this to happen even without page idle
> enabled - at least Fedora doesn't have page idle enabled by default. I'm
> not sure whether it'll be worth it to define the Young bit just for this
> (note: IIUC we don't need the Idle bit in this case, only the Young bit).
>
> The other thing is whether there's any other side effect of losing
> pte-level granularity of the young bit, since right after we merge it into
> the page flags, that granularity is lost. So far I don't worry a lot
> about the TLB flush overhead, but hopefully there's nothing else we
> missed.

I agree with your arguments. I missed the fact that the page young bit is
only defined if PAGE_IDLE is defined. So unless it makes sense in general
to have all pages marked as accessed if the page is young, adding the bit
would cause additional overhead, especially on 32-bit systems.

I also agree that the solution you provided would improve page-idle
behavior. However, while not being wrong, users who try to clear page-idle
indications would not succeed in doing so for pages that are undergoing a
migration.

There are some additional implications that I do not remember that you or
anyone else (including me) mentioned, specifically for MADV_COLD and
MADV_FREE.
You may want to teach madvise_cold_or_pageout_pte_range() and
madvise_free_pte_range() to deal with this new type of entry.

On the other hand, madvise is already "broken" today in that regard, since
IIUC it does not even clear PageReferenced (on MADV_COLD) for migrated
pages.

>> One more unrelated point - note that remove_migration_pte() would always
>> set a clean PTE even when the old one was dirty…
>
> Correct. Said another way, at least initial-write performance will still
> suffer after migration on x86.
>
> The dirty bit is kind of different in this case, so I didn't yet try to
> cover it. E.g., we won't lose it even without this patchset, because it
> is consolidated into PageDirty already, or it'll be a bug.
>
> I think PageDirty could be cleared during the migration procedure; if so,
> we could be wrongly applying the dirty bit to the recovered pte. I
> thought about this before posting this series, but I hesitated to add the
> dirty bit along with it, at least in these initial versions, since the
> dirty bit may need some more justification.
>
> Please feel free to share any further thoughts on the dirty bit.

I fully understand that the dirty bit can stay for a different patch(-set).
But I do not see a problem in setting the dirty bit if the PTE is mapped as
writable, since anyhow the page can be dirtied at any given moment
afterwards without kernel involvement.

If you are concerned that the page will be written back in between and the
PTE would be marked dirty unnecessarily, you can keep the bit only if both
PageDirty and a new "swap-entry dirty bit" are set.