From: Peter Xu <peterx@redhat.com>
To: Nadav Amit
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
 Mike Rapoport, Axel Rasmussen, Nadav Amit, Andrea Arcangeli,
 Andrew Cooper, Andy Lutomirski, Dave Hansen, David Hildenbrand,
 Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin
Subject: Re: [RFC PATCH 01/14] userfaultfd: set dirty and young on writeprotect
Date: Tue, 19 Jul 2022 16:47:16 -0400
References: <20220718120212.3180-1-namit@vmware.com> <20220718120212.3180-2-namit@vmware.com>
In-Reply-To: <20220718120212.3180-2-namit@vmware.com>
On Mon, Jul 18, 2022 at 05:01:59AM -0700, Nadav Amit wrote:
> From: Nadav Amit
>
> When userfaultfd makes a PTE writable, it can now change the PTE
> directly, in some cases, without triggering a page fault first. Yet,
> doing so might leave the write-unprotected PTE old and clean. At least
> on x86, this causes a >500-cycle overhead when the PTE is first
> accessed.
>
> Use MM_CP_WILL_NEED to set the PTE young and dirty when userfaultfd
> gets a hint that the page is likely to be used. Avoid making the PTE
> young and dirty in other cases, to avoid excessive writeback and
> interference with the page reclamation logic.
>
> Cc: Andrea Arcangeli
> Cc: Andrew Cooper
> Cc: Andrew Morton
> Cc: Andy Lutomirski
> Cc: Dave Hansen
> Cc: David Hildenbrand
> Cc: Peter Xu
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Will Deacon
> Cc: Yu Zhao
> Cc: Nick Piggin
> ---
>  include/linux/mm.h | 2 ++
>  mm/mprotect.c      | 9 ++++++++-
>  mm/userfaultfd.c   | 8 ++++++--
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9cc02a7e503b..4afd75ce5875 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1988,6 +1988,8 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
>  /* Whether this change is for write protecting */
>  #define MM_CP_UFFD_WP          (1UL << 2) /* do wp */
>  #define MM_CP_UFFD_WP_RESOLVE  (1UL << 3) /* Resolve wp */
> +/* Whether to try to mark entries as dirty as they are to be written */
> +#define MM_CP_WILL_NEED        (1UL << 4)
>  #define MM_CP_UFFD_WP_ALL      (MM_CP_UFFD_WP | \
>                                  MM_CP_UFFD_WP_RESOLVE)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 996a97e213ad..34c2dfb68c42 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -82,6 +82,7 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>  	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
>  	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
>  	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
> +	bool will_need = cp_flags & MM_CP_WILL_NEED;
>
>  	tlb_change_page_size(tlb, PAGE_SIZE);
>
> @@ -172,6 +173,9 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>  				ptent = pte_clear_uffd_wp(ptent);
>  			}
>
> +			if (will_need)
> +				ptent = pte_mkyoung(ptent);

For the uffd path, UFFD_FLAGS_ACCESS_LIKELY|UFFD_FLAGS_WRITE_LIKELY are new
internal flags that get used whether or not the new feature bit is set.
Doesn't that mean that even with !ACCESS_HINT we'll start setting the young
bit where we used not to? Isn't that a subtle ABI change? I'd suggest we
only set will_need when ACCESS_HINT is set.
> +
>  			/*
>  			 * In some writable, shared mappings, we might want
>  			 * to catch actual write access -- see
> @@ -187,8 +191,11 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>  			 */
>  			if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
>  			    !pte_write(ptent) &&
> -			    can_change_pte_writable(vma, addr, ptent))
> +			    can_change_pte_writable(vma, addr, ptent)) {
>  				ptent = pte_mkwrite(ptent);
> +				if (will_need)
> +					ptent = pte_mkdirty(ptent);

Can we make this unconditional? IOW, cover both (1) when will_need is not
set, and (2) mprotect() too.

David's patch is good in that it merged the unprotect and CoW paths.
However, it's not complete, because the dirty-bit ops are missing. IMHO we
should have a standalone patch that adds the dirty bit to this logic
whenever we grant the write bit; that would make the write and dirty bits
coherent again in all paths.

> +			}
>
>  			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>  			if (pte_needs_flush(oldpte, ptent))
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 954c6980b29f..e0492f5f06a0 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -749,6 +749,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
>  	bool enable_wp = uffd_flags & UFFD_FLAGS_WP;
>  	struct vm_area_struct *dst_vma;
>  	unsigned long page_mask;
> +	unsigned long cp_flags;
>  	struct mmu_gather tlb;
>  	pgprot_t newprot;
>  	int err;
> @@ -795,9 +796,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
>  	else
>  		newprot = vm_get_page_prot(dst_vma->vm_flags);
>
> +	cp_flags = enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE;
> +	if (uffd_flags & (UFFD_FLAGS_ACCESS_LIKELY|UFFD_FLAGS_WRITE_LIKELY))
> +		cp_flags |= MM_CP_WILL_NEED;
> +
>  	tlb_gather_mmu(&tlb, dst_mm);
> -	change_protection(&tlb, dst_vma, start, start + len, newprot,
> -			  enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
> +	change_protection(&tlb, dst_vma, start, start + len, newprot, cp_flags);
>  	tlb_finish_mmu(&tlb);
>
>  	err = 0;
> --
> 2.25.1
>

-- 
Peter Xu