Date: Fri, 26 Aug 2022 10:32:10 -0400
From: Peter Xu
To: Alistair Popple
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Nadav Amit, huang ying,
 LKML, "Sierra Guiza, Alejandro (Alex)", Felix Kuehling, Jason Gunthorpe,
 John Hubbard, David Hildenbrand, Ralph Campbell, Matthew Wilcox,
 Karol Herbst, Lyude Paul, Ben Skeggs, Logan Gunthorpe, paulus@ozlabs.org,
 linuxppc-dev@lists.ozlabs.org, stable@vger.kernel.org, Huang Ying
Subject: Re: [PATCH v3 2/3] mm/migrate_device.c: Copy pte dirty bit to page
In-Reply-To: <8735dj7qwb.fsf@nvdebian.thelocal>
References: <3b01af093515ce2960ac39bb16ff77473150d179.1661309831.git-series.apopple@nvidia.com>
 <8735dkeyyg.fsf@nvdebian.thelocal>
 <8735dj7qwb.fsf@nvdebian.thelocal>
On Fri, Aug 26, 2022 at 11:02:58AM +1000, Alistair Popple wrote:
>
> Peter Xu writes:
>
> > On Fri, Aug 26, 2022 at 08:21:44AM +1000, Alistair Popple wrote:
> >>
> >> Peter Xu writes:
> >>
> >> > On Wed, Aug 24, 2022 at 01:03:38PM +1000, Alistair Popple wrote:
> >> >> migrate_vma_setup() has a fast path in migrate_vma_collect_pmd() that
> >> >> installs migration entries directly if it can lock the migrating page.
> >> >> When removing a dirty pte the dirty bit is supposed to be carried over
> >> >> to the underlying page to prevent it being lost.
> >> >>
> >> >> Currently migrate_vma_*() can only be used for private anonymous
> >> >> mappings. That means loss of the dirty bit usually doesn't result in
> >> >> data loss because these pages are typically not file-backed. However
> >> >> pages may be backed by swap storage which can result in data loss if an
> >> >> attempt is made to migrate a dirty page that doesn't yet have the
> >> >> PageDirty flag set.
> >> >>
> >> >> In this case migration will fail due to unexpected references but the
> >> >> dirty pte bit will be lost. If the page is subsequently reclaimed data
> >> >> won't be written back to swap storage as it is considered uptodate,
> >> >> resulting in data loss if the page is subsequently accessed.
> >> >>
> >> >> Prevent this by copying the dirty bit to the page when removing the pte
> >> >> to match what try_to_migrate_one() does.
> >> >>
> >> >> Signed-off-by: Alistair Popple
> >> >> Acked-by: Peter Xu
> >> >> Reported-by: Huang Ying
> >> >> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
> >> >> Cc: stable@vger.kernel.org
> >> >>
> >> >> ---
> >> >>
> >> >> Changes for v3:
> >> >>
> >> >> - Defer TLB flushing
> >> >> - Split a TLB flushing fix into a separate change.
> >> >>
> >> >> Changes for v2:
> >> >>
> >> >> - Fixed up Reported-by tag.
> >> >> - Added Peter's Acked-by.
> >> >> - Atomically read and clear the pte to prevent the dirty bit getting
> >> >>   set after reading it.
> >> >> - Added fixes tag
> >> >> ---
> >> >>  mm/migrate_device.c | 9 +++++++--
> >> >>  1 file changed, 7 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> >> >> index 6a5ef9f..51d9afa 100644
> >> >> --- a/mm/migrate_device.c
> >> >> +++ b/mm/migrate_device.c
> >> >> @@ -7,6 +7,7 @@
> >> >>  #include
> >> >>  #include
> >> >>  #include
> >> >> +#include
> >> >>  #include
> >> >>  #include
> >> >>  #include
> >> >> @@ -196,7 +197,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >> >>  			anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
> >> >>  			if (anon_exclusive) {
> >> >>  				flush_cache_page(vma, addr, pte_pfn(*ptep));
> >> >> -				ptep_clear_flush(vma, addr, ptep);
> >> >> +				pte = ptep_clear_flush(vma, addr, ptep);
> >> >>
> >> >>  				if (page_try_share_anon_rmap(page)) {
> >> >>  					set_pte_at(mm, addr, ptep, pte);
> >> >> @@ -206,11 +207,15 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >> >>  					goto next;
> >> >>  				}
> >> >>  			} else {
> >> >> -				ptep_get_and_clear(mm, addr, ptep);
> >> >> +				pte = ptep_get_and_clear(mm, addr, ptep);
> >> >>  			}
> >> >
> >> > I remember that in v2 both flush_cache_page() and ptep_get_and_clear() are
> >> > moved above the condition check so they're called unconditionally. Could
> >> > you explain the rationale on why it's changed back (since I think v2 was
> >> > the correct approach)?
> >>
> >> Mainly because I agree with your original comments, that it would be
> >> better to keep the batching of TLB flushing if possible. After the
> >> discussion I don't think there are any issues with HW pte dirty bits
> >> here. There are already other cases where HW needs to get that right
> >> anyway (eg. zap_pte_range).
> >
> > Yes, tlb batching was kept, thanks for doing it that way. Though if we only
> > apply patch 1 we'll have both ptep_clear_flush() and the batched flush,
> > which seems redundant.
> >
> >> > The other question is if we want to split the patch, would it be better to
> >> > move the tlb changes to patch 1, and leave the dirty bit fix in patch 2?
> >>
> >> Isn't that already the case? Patch 1 moves the TLB flush before the PTL
> >> as suggested, patch 2 atomically copies the dirty bit without changing
> >> any TLB flushing.
> >
> > IMHO it's cleaner to have patch 1 fix batch flush, replace
> > ptep_clear_flush() with ptep_get_and_clear() and update pte properly.
>
> Which ptep_clear_flush() are you referring to? This one?
>
>         if (anon_exclusive) {
>                 flush_cache_page(vma, addr, pte_pfn(*ptep));
>                 ptep_clear_flush(vma, addr, ptep);

Correct.

>
> My understanding is that we need to do a flush for anon_exclusive.

To me anon exclusive only shows that this mm exclusively owns this page. I
didn't quickly figure out why that requires different handling on tlb
flushes. Did I perhaps miss something?

Thanks,

-- 
Peter Xu