Date: Thu, 25 Aug 2022 19:27:00 -0400
From: Peter Xu <peterx@redhat.com>
To: Alistair Popple
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Nadav Amit, huang ying, LKML, "Sierra Guiza, Alejandro (Alex)", Felix Kuehling, Jason Gunthorpe, John Hubbard, David Hildenbrand, Ralph Campbell, Matthew Wilcox, Karol Herbst, Lyude Paul, Ben Skeggs, Logan Gunthorpe, paulus@ozlabs.org, linuxppc-dev@lists.ozlabs.org, stable@vger.kernel.org, Huang Ying
Subject: Re: [PATCH v3 2/3] mm/migrate_device.c: Copy pte dirty bit to page
In-Reply-To: <8735dkeyyg.fsf@nvdebian.thelocal>
References: <3b01af093515ce2960ac39bb16ff77473150d179.1661309831.git-series.apopple@nvidia.com> <8735dkeyyg.fsf@nvdebian.thelocal>
On Fri, Aug 26, 2022 at 08:21:44AM +1000, Alistair Popple wrote:
>
> Peter Xu writes:
>
> > On Wed, Aug 24, 2022 at 01:03:38PM +1000, Alistair Popple wrote:
> >> migrate_vma_setup() has a fast path in migrate_vma_collect_pmd() that
> >> installs migration entries directly if it can lock the migrating page.
> >> When removing a dirty pte the dirty bit is supposed to be carried over
> >> to the underlying page to prevent it being lost.
> >>
> >> Currently migrate_vma_*() can only be used for private anonymous
> >> mappings. That means loss of the dirty bit usually doesn't result in
> >> data loss because these pages are typically not file-backed. However
> >> pages may be backed by swap storage which can result in data loss if an
> >> attempt is made to migrate a dirty page that doesn't yet have the
> >> PageDirty flag set.
> >>
> >> In this case migration will fail due to unexpected references but the
> >> dirty pte bit will be lost. If the page is subsequently reclaimed data
> >> won't be written back to swap storage as it is considered uptodate,
> >> resulting in data loss if the page is subsequently accessed.
> >> Prevent this by copying the dirty bit to the page when removing the pte
> >> to match what try_to_migrate_one() does.
> >>
> >> Signed-off-by: Alistair Popple
> >> Acked-by: Peter Xu
> >> Reported-by: Huang Ying
> >> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
> >> Cc: stable@vger.kernel.org
> >>
> >> ---
> >>
> >> Changes for v3:
> >>
> >> - Defer TLB flushing
> >> - Split a TLB flushing fix into a separate change.
> >>
> >> Changes for v2:
> >>
> >> - Fixed up Reported-by tag.
> >> - Added Peter's Acked-by.
> >> - Atomically read and clear the pte to prevent the dirty bit getting
> >>   set after reading it.
> >> - Added fixes tag
> >> ---
> >>  mm/migrate_device.c | 9 +++++++--
> >>  1 file changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> >> index 6a5ef9f..51d9afa 100644
> >> --- a/mm/migrate_device.c
> >> +++ b/mm/migrate_device.c
> >> @@ -7,6 +7,7 @@
> >>  #include
> >>  #include
> >>  #include
> >> +#include
> >>  #include
> >>  #include
> >>  #include
> >> @@ -196,7 +197,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>  			anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
> >>  			if (anon_exclusive) {
> >>  				flush_cache_page(vma, addr, pte_pfn(*ptep));
> >> -				ptep_clear_flush(vma, addr, ptep);
> >> +				pte = ptep_clear_flush(vma, addr, ptep);
> >>
> >>  				if (page_try_share_anon_rmap(page)) {
> >>  					set_pte_at(mm, addr, ptep, pte);
> >> @@ -206,11 +207,15 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>  				goto next;
> >>  			}
> >>  		} else {
> >> -			ptep_get_and_clear(mm, addr, ptep);
> >> +			pte = ptep_get_and_clear(mm, addr, ptep);
> >>  		}
> >
> > I remember that in v2 both flush_cache_page() and ptep_get_and_clear() were
> > moved above the condition check so they're called unconditionally. Could
> > you explain the rationale for why it's changed back (since I think v2 was
> > the correct approach)?
> Mainly because I agree with your original comments, that it would be
> better to keep the batching of TLB flushing if possible. After the
> discussion I don't think there are any issues with HW pte dirty bits
> here. There are already other cases where HW needs to get that right
> anyway (eg. zap_pte_range).

Yes, TLB batching was kept, thanks for doing it that way. Though if only
patch 1 is applied we'll have both ptep_clear_flush() and the batched
flush, which seems redundant.

> > The other question is if we want to split the patch, would it be better to
> > move the tlb changes to patch 1, and leave the dirty bit fix in patch 2?
>
> Isn't that already the case? Patch 1 moves the TLB flush before the PTL
> as suggested, patch 2 atomically copies the dirty bit without changing
> any TLB flushing.

IMHO it's cleaner to have patch 1 fix the batched flush, replace
ptep_clear_flush() with ptep_get_and_clear(), and update the pte
properly. No strong opinion on the layout, but I still think we should
drop the redundant ptep_clear_flush() above, and meanwhile add
flush_cache_page() properly for the !exclusive case too.

Thanks,

--
Peter Xu