From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21146C433EF for ; Wed, 20 Jul 2022 15:10:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9F8576B0071; Wed, 20 Jul 2022 11:10:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9A7366B0073; Wed, 20 Jul 2022 11:10:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 870416B0074; Wed, 20 Jul 2022 11:10:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 775EF6B0071 for ; Wed, 20 Jul 2022 11:10:48 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 45D8A1205FE for ; Wed, 20 Jul 2022 15:10:48 +0000 (UTC) X-FDA: 79707815376.24.7CACBBB Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf22.hostedemail.com (Postfix) with ESMTP id B2BC0C0079 for ; Wed, 20 Jul 2022 15:10:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1658329847; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Gza5PmGLKy1qSecZj4RWFadUQnxBcv3jx7XSNkb+j80=; b=HMGbMCSbkbVEoBpO9yqvmUEPO83i0WWPNlW+7RqpjlWm1FfK4hxRvpbw8AdVM3paOg6RUO /pphBXAk41V9CXBbc278iJuVwRWbeLbyrA0bcd+oNi1t+ih2L6m5Nj9NtuCceQzs7OjJHu 8jy3b/JgH7MlCK6MxaLXrm1KzQm/c/U= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-29-w_rM84p2Ol-cS6Vk7Eaxig-1; Wed, 20 Jul 2022 11:10:41 -0400 X-MC-Unique: w_rM84p2Ol-cS6Vk7Eaxig-1 Received: by mail-wm1-f71.google.com with SMTP id v11-20020a1cf70b000000b003a318238826so907162wmh.2 for ; Wed, 20 Jul 2022 08:10:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:organization:in-reply-to :content-transfer-encoding; bh=Gza5PmGLKy1qSecZj4RWFadUQnxBcv3jx7XSNkb+j80=; b=Fk7DuTb6xGQEvny9HfVx/VVq7mObGrKMSTS+k1may6/AabS/e2hG0dsvKK701deGuk oD/QibIE5W0m3TOB9VgpgVWdkTDI4SYJyOnnC+8O9FHYapXE1OTjyWKXQZWeEPyzycsT ZIQGUQKmaHENCMMxD8UGxbqvhv/GJHjL9VoN79M0WiJAwzurCKCUicLK6GDLvlybd24k pZEtFHdBxwQpYC50xTZdJ2HXBFhAz3RVPpq+h8g5KfZx499qowANxstQEOSxC1aY9qze gEdFPHRNykEOpIxN9HssK7L77Th+Aiqb38EyTX+ciqnzS1x3ACn/alLlDn4+TOv5SrwQ mceg== X-Gm-Message-State: AJIora8LbGDVn21Y/Qki0oXBlX3+RBLbsjfQ8/lWypbSV82+YT2LyPLc 1J63aqVJx1ey3GsbxNHHzHkRJOKtprPFgg3WUKpQXGHu0Y26hDAS1Jzai1zvp/83+G5a7YVYlE6 LYT1Jjq+A3HE= X-Received: by 2002:a5d:6d86:0:b0:21d:a3ca:bad0 with SMTP id l6-20020a5d6d86000000b0021da3cabad0mr31601061wrs.12.1658329839050; Wed, 20 Jul 2022 08:10:39 -0700 (PDT) X-Google-Smtp-Source: AGRyM1visr78sAwmLgNUyShdyjWna04Tp6reUTZ5r1mIq++86IWNldc+Y4tOgDKX5dzjGAzBtAWpKA== X-Received: by 2002:a5d:6d86:0:b0:21d:a3ca:bad0 with SMTP id l6-20020a5d6d86000000b0021da3cabad0mr31601036wrs.12.1658329838752; Wed, 20 Jul 2022 08:10:38 -0700 (PDT) Received: from ?IPV6:2003:cb:c706:e00:8d96:5dba:6bc4:6e89? (p200300cbc7060e008d965dba6bc46e89.dip0.t-ipconnect.de. [2003:cb:c706:e00:8d96:5dba:6bc4:6e89]) by smtp.gmail.com with ESMTPSA id j23-20020a05600c1c1700b003a32251c3f9sm3410885wms.5.2022.07.20.08.10.37 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 20 Jul 2022 08:10:38 -0700 (PDT) Message-ID: <2b4393ce-95c9-dd3e-8495-058a139e771e@redhat.com> Date: Wed, 20 Jul 2022 17:10:37 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: Re: [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect To: Peter Xu Cc: Nadav Amit , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mike Rapoport , Axel Rasmussen , Nadav Amit , Andrea Arcangeli , Andrew Cooper , Andy Lutomirski , Dave Hansen , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin References: <20220718120212.3180-1-namit@vmware.com> <20220718120212.3180-2-namit@vmware.com> <017facf0-7ef8-3faf-138d-3013a20b37db@redhat.com> From: David Hildenbrand Organization: Red Hat In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1658329847; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Gza5PmGLKy1qSecZj4RWFadUQnxBcv3jx7XSNkb+j80=; b=6HPCZAkN+6diyTxNU1GrtVIrrNa8C/i+3W8vjljGA8HAev6Gu6+vNfoKEptmRHx1toK6kE aCHzJpUhj0Fc2IjdaslPRhEeDWlvDqPvplB8YNThXNc4hFM1Nto26LKNp/rUl4G8DCF4+6 sO5beJVMWsebpLWSuCX2OrAL8laUffc= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1658329847; a=rsa-sha256; cv=none; b=fhsXUB2w1EeQ8vn+Qcb1QFrn0FmM+wE3JNUywFYTblT3f5uOTjbB9ZtDG+rBBiXK5PBYGd tefGpG/TgVJOWLi4xuAvjhDaVDaMNI/6l09czpZSEvY17Og5QgfiHOLfLuclCTCDXy26s4 U3kSr5c6C3KCl7DF9PZtvilRFc/OmHY= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=HMGbMCSb; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf22.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=david@redhat.com X-Rspamd-Queue-Id: B2BC0C0079 Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=HMGbMCSb; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf22.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=david@redhat.com X-Rspamd-Server: rspam12 X-Rspam-User: X-Stat-Signature: o5j1aoqaxjfihptgcz4odxci4bqoji5k X-HE-Tag: 1658329847-610385 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 20.07.22 15:10, Peter Xu wrote: > On Wed, Jul 20, 2022 at 11:39:23AM +0200, David Hildenbrand wrote: >> On 19.07.22 22:47, Peter Xu wrote: >>> On Mon, Jul 18, 2022 at 05:01:59AM -0700, Nadav Amit wrote: >>>> From: Nadav Amit >>>> >>>> When userfaultfd makes a PTE writable, it can now change the PTE >>>> directly, in some cases, without going triggering a page-fault first. >>>> Yet, doing so might leave the PTE that was write-unprotected as old and >>>> clean. At least on x86, this would cause a >500 cycles overhead when the >>>> PTE is first accessed. >>>> >>>> Use MM_CP_WILL_NEED to set the PTE as young and dirty when userfaultfd >>>> gets a hint that the page is likely to be used. Avoid changing the PTE >>>> to young and dirty in other cases to avoid excessive writeback and >>>> messing with the page reclamation logic. >>>> >>>> Cc: Andrea Arcangeli >>>> Cc: Andrew Cooper >>>> Cc: Andrew Morton >>>> Cc: Andy Lutomirski >>>> Cc: Dave Hansen >>>> Cc: David Hildenbrand >>>> Cc: Peter Xu >>>> Cc: Peter Zijlstra >>>> Cc: Thomas Gleixner >>>> Cc: Will Deacon >>>> Cc: Yu Zhao >>>> Cc: Nick Piggin >>>> --- >>>> include/linux/mm.h | 2 ++ >>>> mm/mprotect.c | 9 ++++++++- >>>> mm/userfaultfd.c | 8 ++++++-- >>>> 3 files changed, 16 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/include/linux/mm.h b/include/linux/mm.h >>>> index 9cc02a7e503b..4afd75ce5875 100644 >>>> --- a/include/linux/mm.h >>>> +++ b/include/linux/mm.h >>>> @@ -1988,6 +1988,8 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, >>>> /* Whether this change is for write protecting */ >>>> #define MM_CP_UFFD_WP (1UL << 2) /* do wp */ >>>> #define MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */ >>>> +/* Whether to try to mark entries as dirty as they are to be written */ >>>> +#define MM_CP_WILL_NEED (1UL << 4) >>>> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ >>>> MM_CP_UFFD_WP_RESOLVE) >>>> >>>> diff --git a/mm/mprotect.c b/mm/mprotect.c >>>> index 996a97e213ad..34c2dfb68c42 100644 >>>> --- a/mm/mprotect.c >>>> +++ b/mm/mprotect.c >>>> @@ -82,6 +82,7 @@ static unsigned long change_pte_range(struct mmu_gather *tlb, >>>> bool prot_numa = cp_flags & MM_CP_PROT_NUMA; >>>> bool uffd_wp = cp_flags & MM_CP_UFFD_WP; >>>> bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; >>>> + bool will_need = cp_flags & MM_CP_WILL_NEED; >>>> >>>> tlb_change_page_size(tlb, PAGE_SIZE); >>>> >>>> @@ -172,6 +173,9 @@ static unsigned long change_pte_range(struct mmu_gather *tlb, >>>> ptent = pte_clear_uffd_wp(ptent); >>>> } >>>> >>>> + if (will_need) >>>> + ptent = pte_mkyoung(ptent); >>> >>> For uffd path, UFFD_FLAGS_ACCESS_LIKELY|UFFD_FLAGS_WRITE_LIKELY are new >>> internal flags used with or without the new feature bit set. It means even >>> with !ACCESS_HINT we'll start to set young bit while we used not to? Is >>> that some kind of a light abi change? >>> >>> I'd suggest we only set will_need if ACCESS_HINT is set. >>> >>>> + >>>> /* >>>> * In some writable, shared mappings, we might want >>>> * to catch actual write access -- see >>>> @@ -187,8 +191,11 @@ static unsigned long change_pte_range(struct mmu_gather *tlb, >>>> */ >>>> if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && >>>> !pte_write(ptent) && >>>> - can_change_pte_writable(vma, addr, ptent)) >>>> + can_change_pte_writable(vma, addr, ptent)) { >>>> ptent = pte_mkwrite(ptent); >>>> + if (will_need) >>>> + ptent = pte_mkdirty(ptent); >>> >>> Can we make this unconditional? IOW to cover both: >>> >>> (1) When will_need is not set, or >>> (2) mprotect() too >>> >>> David's patch is good in that we merged the unprotect and CoW. However >>> that's not complete because the dirty bit ops are missing. >>> >>> Here IMHO we should have a standalone patch to just add the dirty bit into >>> this logic when we'll grant write bit. IMHO it'll make the write+dirty >>> bits coherent again in all paths. >> >> I'm not sure I follow. >> >> We *surely* don't want to dirty random pages (especially once in the >> pagecache/swapcache) simply because we change protection. >> >> Just like we don't set all pages write+dirty in a writable VMA on a read >> fault. > > IMO unmprotect (in generic mprotect form or uffd form) has a stronger sign > of page being written, unlike read faults, as many of them happen because > page being written and message generated. I'm sorry, but I am very skeptical about such statements. I don't buy it. > > But yeah you have a point too, maybe we shouldn't assume such a condition. > Especially as long as we won't set write=1 without soft-dirty tracking > enabled, I think it should be safe. For pagecache pages it may as well be *plain wrong* to bypass the write fault handler and simply mark pages dirty+map them writable. Please, let's keep this protection change handler here as simple as possible and *not* try to replicate the whole write fault handler logic in here by relying on statements as above. If we try optimizing for corner cases by making the implementation here overly complicated, then we are clearly doing something wrong. -- Thanks, David / dhildenb