From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <72146725-3d70-0427-50d4-165283a5a85d@redhat.com>
Date: Fri, 26 Aug 2022 16:47:22 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.12.0
Subject: Re: [PATCH v3 2/3] mm/migrate_device.c: Copy pte dirty bit to page
To: Peter Xu , Alistair Popple
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Nadav Amit , huang ying ,
 LKML , "Sierra Guiza, Alejandro (Alex)" , Felix Kuehling , Jason Gunthorpe ,
 John Hubbard , Ralph Campbell , Matthew Wilcox , Karol Herbst , Lyude Paul ,
 Ben Skeggs , Logan Gunthorpe , paulus@ozlabs.org,
 linuxppc-dev@lists.ozlabs.org, stable@vger.kernel.org, Huang Ying
References: <3b01af093515ce2960ac39bb16ff77473150d179.1661309831.git-series.apopple@nvidia.com>
 <8735dkeyyg.fsf@nvdebian.thelocal> <8735dj7qwb.fsf@nvdebian.thelocal>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To:
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
List-ID: 

On 26.08.22 16:32, Peter Xu wrote:
> On Fri, Aug 26, 2022 at 11:02:58AM +1000, Alistair Popple wrote:
>>
>> Peter Xu writes:
>>
>>> On Fri, Aug 26, 2022 at 08:21:44AM +1000, Alistair Popple wrote:
>>>>
>>>> Peter Xu writes:
>>>>
>>>>> On Wed, Aug 24, 2022 at 01:03:38PM +1000, Alistair Popple wrote:
>>>>>> migrate_vma_setup() has a fast path in migrate_vma_collect_pmd() that
>>>>>> installs migration entries directly if it can lock the migrating page.
>>>>>> When removing a dirty pte the dirty bit is supposed to be carried over
>>>>>> to the underlying page to prevent it being lost.
>>>>>>
>>>>>> Currently migrate_vma_*() can only be used for private anonymous
>>>>>> mappings. That means loss of the dirty bit usually doesn't result in
>>>>>> data loss because these pages are typically not file-backed. However
>>>>>> pages may be backed by swap storage which can result in data loss if an
>>>>>> attempt is made to migrate a dirty page that doesn't yet have the
>>>>>> PageDirty flag set.
>>>>>>
>>>>>> In this case migration will fail due to unexpected references but the
>>>>>> dirty pte bit will be lost. If the page is subsequently reclaimed data
>>>>>> won't be written back to swap storage as it is considered uptodate,
>>>>>> resulting in data loss if the page is subsequently accessed.
>>>>>>
>>>>>> Prevent this by copying the dirty bit to the page when removing the pte
>>>>>> to match what try_to_migrate_one() does.
>>>>>>
>>>>>> Signed-off-by: Alistair Popple
>>>>>> Acked-by: Peter Xu
>>>>>> Reported-by: Huang Ying
>>>>>> Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
>>>>>> Cc: stable@vger.kernel.org
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Changes for v3:
>>>>>>
>>>>>> - Defer TLB flushing
>>>>>> - Split a TLB flushing fix into a separate change.
>>>>>>
>>>>>> Changes for v2:
>>>>>>
>>>>>> - Fixed up Reported-by tag.
>>>>>> - Added Peter's Acked-by.
>>>>>> - Atomically read and clear the pte to prevent the dirty bit getting
>>>>>>   set after reading it.
>>>>>> - Added fixes tag
>>>>>> ---
>>>>>>  mm/migrate_device.c | 9 +++++++--
>>>>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>> index 6a5ef9f..51d9afa 100644
>>>>>> --- a/mm/migrate_device.c
>>>>>> +++ b/mm/migrate_device.c
>>>>>> @@ -7,6 +7,7 @@
>>>>>>  #include
>>>>>>  #include
>>>>>>  #include
>>>>>> +#include
>>>>>>  #include
>>>>>>  #include
>>>>>>  #include
>>>>>> @@ -196,7 +197,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>>>>  			anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
>>>>>>  			if (anon_exclusive) {
>>>>>>  				flush_cache_page(vma, addr, pte_pfn(*ptep));
>>>>>> -				ptep_clear_flush(vma, addr, ptep);
>>>>>> +				pte = ptep_clear_flush(vma, addr, ptep);
>>>>>>
>>>>>>  				if (page_try_share_anon_rmap(page)) {
>>>>>>  					set_pte_at(mm, addr, ptep, pte);
>>>>>> @@ -206,11 +207,15 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>>>>  					goto next;
>>>>>>  				}
>>>>>>  			} else {
>>>>>> -				ptep_get_and_clear(mm, addr, ptep);
>>>>>> +				pte = ptep_get_and_clear(mm, addr, ptep);
>>>>>>  			}
>>>>>
>>>>> I remember that in v2 both flush_cache_page() and ptep_get_and_clear() are
>>>>> moved above the condition check so they're called unconditionally. Could
>>>>> you explain the rationale for why it's changed back (since I think v2 was
>>>>> the correct approach)?
>>>>
>>>> Mainly because I agree with your original comments, that it would be
>>>> better to keep the batching of TLB flushing if possible. After the
>>>> discussion I don't think there are any issues with HW pte dirty bits
>>>> here. There are already other cases where HW needs to get that right
>>>> anyway (eg. zap_pte_range).
>>>
>>> Yes tlb batching was kept, thanks for doing it that way. Though if we only
>>> apply patch 1 we'll have both ptep_clear_flush() and batched flush, which
>>> seems to be redundant.
>>>
>>>>
>>>>> The other question is if we want to split the patch, would it be better to
>>>>> move the tlb changes to patch 1, and leave the dirty bit fix in patch 2?
>>>>
>>>> Isn't that already the case? Patch 1 moves the TLB flush before the PTL
>>>> as suggested, patch 2 atomically copies the dirty bit without changing
>>>> any TLB flushing.
>>>
>>> IMHO it's cleaner to have patch 1 fix batch flush, replace
>>> ptep_clear_flush() with ptep_get_and_clear() and update pte properly.
>>
>> Which ptep_clear_flush() are you referring to? This one?
>>
>>   if (anon_exclusive) {
>>           flush_cache_page(vma, addr, pte_pfn(*ptep));
>>           ptep_clear_flush(vma, addr, ptep);
>
> Correct.
>
>>
>> My understanding is that we need to do a flush for anon_exclusive.
>
> To me anon exclusive only shows this mm exclusively owns this page. I
> didn't quickly figure out why that requires different handling on tlb
> flushes. Did I perhaps miss something?

GUP-fast is the magic bit, we have to make sure that we won't see new GUP
pins, thus the TLB flush. include/linux/mm.h:gup_must_unshare() contains
documentation.

Without GUP-fast, some things would be significantly easier to handle.

-- 
Thanks,

David / dhildenb
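[Editor's note: the invariant the patch above enforces — when a dirty pte is torn down for migration, its hardware dirty bit must be transferred to the page, or reclaim will later skip writeback — can be sketched as a small user-space model. Everything below (`sketch_pte_t`, `sketch_get_and_clear()`, the flag values) is a stand-in for illustration, not the kernel API.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's pte bits (illustration only). */
#define SKETCH_PTE_PRESENT 0x1u
#define SKETCH_PTE_DIRTY   0x2u

typedef struct { uint32_t val; } sketch_pte_t;

struct sketch_page {
    bool dirty;                 /* models the PageDirty flag */
};

static bool sketch_pte_dirty(sketch_pte_t pte)
{
    return (pte.val & SKETCH_PTE_DIRTY) != 0;
}

/* Models ptep_get_and_clear(): read the old pte and clear it in one step
 * (the kernel does this atomically so a racing HW dirty-bit set is not
 * missed). */
static sketch_pte_t sketch_get_and_clear(sketch_pte_t *ptep)
{
    sketch_pte_t old = *ptep;
    ptep->val = 0;
    return old;
}

/* Models the fixed path in migrate_vma_collect_pmd(): the dirty bit of
 * the *returned* pte is copied to the page, matching what
 * try_to_migrate_one() does. */
static void sketch_unmap_for_migration(sketch_pte_t *ptep,
                                       struct sketch_page *page)
{
    sketch_pte_t pte = sketch_get_and_clear(ptep);

    if (sketch_pte_dirty(pte))
        page->dirty = true;     /* kernel: set_page_dirty(page) */
}
```

The bug being fixed corresponds to discarding the return value of `sketch_get_and_clear()`: a pte-dirty, swap-backed page would keep `dirty == false`, so reclaim would treat the swap copy as up to date and the last writes would be lost.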