Date: Thu, 4 Aug 2022 13:07:35 -0400
From: Peter Xu
To: Nadav Amit
Cc: LKML, Linux MM, Andrea Arcangeli, Andi Kleen, Andrew Morton, Hugh Dickins, Huang Ying, "Kirill A . Shutemov", Vlastimil Babka, David Hildenbrand, Minchan Kim
Subject: Re: [PATCH 2/2] mm: Remember young bit for page migrations
References: <20220803012159.36551-1-peterx@redhat.com> <20220803012159.36551-3-peterx@redhat.com>

On Wed, Aug 03, 2022 at 11:42:39PM -0700, Nadav Amit wrote:
> On Aug 3, 2022, at 9:45 AM, Peter Xu wrote:
>
> > On Wed, Aug 03, 2022 at 12:42:54AM -0700, Nadav Amit wrote:
> >> On Aug 2, 2022, at 6:21 PM, Peter Xu wrote:
> >>
> >> On the negative side, I am not sure whether other archs, that might require
> >> a TLB flush for resetting the access-bit, and the overhead of doing atomic
> >> operation to clear the access-bit, would not induce more overhead than they
> >> would save.
> >
> > I think your proposal makes sense and looks clean, maybe even cleaner than
> > the new max_swapfile_size() approach (and definitely nicer than the old one
> > of mine). It's just that I still want this to happen even without page idle
> > enabled - at least Fedora doesn't have page idle enabled by default. I'm
> > not sure whether it'll be worth it to define Young bit just for this (note:
> > iiuc we don't need Idle bit in this case, but only the Young bit).
> >
> > The other thing is whether there's other side effect of losing pte level
> > granularity of young bit, since right after we merge them into the page
> > flags, then that granule is lost. So far I don't worry a lot on the tlb
> > flush overhead, but hopefully nothing else we missed.
>
> I agree with your arguments.
> I missed the fact that page young bit is only
> defined if PAGE_IDLE is defined. So unless it makes sense in general to have
> all pages marked as accessed if the page is young, adding the bit would
> cause additional overheads, especially for 32-bit systems.
>
> I also agree that the solution that you provided would improve page-idle
> behavior. However, while not being wrong, users who try to clear page-idle
> indications would not succeed doing so for pages that are undergoing a
> migration.

Right. Since I don't have a clear picture of reusing PageYoung here
unconditionally, I think I'll stick with the current approach unless
something else comes up.

I still see page idle tracking on migration entries as a long-standing and
relatively separate issue, so IMHO we can move one step at a time: solve
"page idle tracking for migrated page" first, and leave "page idle reset
during page migrating" for later.

>
> There are some additional implications that I do not remember that you or
> anyone else (including me) mentioned, specifically for MADV_COLD and
> MADV_FREE. You may want to teach madvise_cold_or_pageout_pte_range() and
> madvise_free_pte_range() to deal with these new type of entries.
>
> On the other hand, madvise is already “broken” today in that regard, since
> IIUC, it does not even clear PageReferenced (on MADV_COLD) for migrated
> pages.

Yeah, afaict we don't handle migration entries in either madvise. Maybe
that's because it's racy to access the page while it's being operated on by
the thread doing the migration (without the page lock)? For real migrations
(not THP split), we'd also be changing the old page, not the new one. So
maybe migration entries are just not yet a major target for either of the
madvises.
For this series, I can think more about dropping the young bit from the
migration entry during these madvises (which should be relatively safe with
the pgtable lock held, since I don't need to touch the page but only modify
the swap entry itself). But that's probably not the major problem here, so
I'm not sure whether it matters a whole lot (e.g., for FREE, should we
really drop the whole entry?).

Copying Minchan too.

>
> >> One more unrelated point - note that remove_migration_pte() would always set
> >> a clean PTE even when the old one was dirty…
> >
> > Correct. Say it in another way, at least initial writes perf will still
> > suffer after migration on x86.
> >
> > Dirty bit is kind of different in this case so I didn't yet try to cover
> > it. E.g., we won't lose it even without this patchset but consolidates it
> > into PageDirty already or it'll be a bug.
> >
> > I think PageDirty could be cleared during migration procedure, if so we
> > could be wrongly applying the dirty bit to the recovered pte. I thought
> > about this before posting this series, but I hesitated on adding dirty bit
> > altogether with it at least in these initial versions since dirty bit may
> > need some more justifications.
> >
> > Please feel free to share any further thoughts on the dirty bit.
>
> I fully understand that the dirty-bit can stay for a different patch(-set).
> But I do not see a problem in setting the dirty-bit if the PTE is mapped as
> writable, since anyhow the page can be dirtied at any given moment
> afterwards without the kernel involvement.
>
> If you are concerned that the page will be written back in between and the
> PTE would be marked as dirty unnecessarily, you can keep the bit only if the
> both PageDirty and a new "swap entry dirty-bit" are set.

Sounds good, I'll think more about it and see whether I'll cover that too.

Thanks,

--
Peter Xu
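
For readers following the thread, here is a minimal userspace sketch (not kernel code) of the two ideas discussed above: carrying young/dirty hints in spare bits of a migration entry, and restoring the dirty bit only when both the entry's bit and PageDirty are set. It also shows a madvise-style path clearing the young hint by modifying only the entry. The bit layout and every sketch_* name are assumptions made for illustration; they are not taken from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the low 58 bits hold the PFN, the next two hold A/D. */
#define SKETCH_PFN_BITS   58
#define SKETCH_PFN_MASK   ((UINT64_C(1) << SKETCH_PFN_BITS) - 1)
#define SKETCH_MIG_YOUNG  (UINT64_C(1) << SKETCH_PFN_BITS)
#define SKETCH_MIG_DIRTY  (UINT64_C(1) << (SKETCH_PFN_BITS + 1))

/* Toy stand-in for a present PTE; only the fields this sketch needs. */
struct sketch_pte {
	uint64_t pfn;
	bool young;
	bool dirty;
};

/* Unmap side: encode the old PTE's young/dirty state into the entry. */
static uint64_t sketch_make_migration_entry(const struct sketch_pte *pte)
{
	uint64_t entry = pte->pfn;

	assert((pte->pfn & ~SKETCH_PFN_MASK) == 0);
	if (pte->young)
		entry |= SKETCH_MIG_YOUNG;
	if (pte->dirty)
		entry |= SKETCH_MIG_DIRTY;
	return entry;
}

/*
 * Remap side: young is restored as recorded. Dirty is restored only if the
 * entry recorded it AND the page-level dirty flag is still set, so a page
 * written back during migration is not marked dirty unnecessarily.
 */
static struct sketch_pte sketch_restore_pte(uint64_t entry, bool page_dirty)
{
	struct sketch_pte pte = {
		.pfn = entry & SKETCH_PFN_MASK,
		.young = (entry & SKETCH_MIG_YOUNG) != 0,
		.dirty = (entry & SKETCH_MIG_DIRTY) != 0 && page_dirty,
	};

	return pte;
}

/* madvise-style path: drop the young hint by touching only the entry. */
static uint64_t sketch_clear_young(uint64_t entry)
{
	return entry & ~SKETCH_MIG_YOUNG;
}
```

The point of the AND in sketch_restore_pte() is that neither source alone is trusted: the entry bit says the old PTE was dirty, and the page flag says that state has not been cleaned in the meantime.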