From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44859C77B73 for ; Tue, 2 May 2023 16:53:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AB2466B0071; Tue, 2 May 2023 12:53:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A63416B0072; Tue, 2 May 2023 12:53:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 92A9A6B0074; Tue, 2 May 2023 12:53:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) by kanga.kvack.org (Postfix) with ESMTP id 42A596B0071 for ; Tue, 2 May 2023 12:53:49 -0400 (EDT) Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-3f315735514so177977015e9.1 for ; Tue, 02 May 2023 09:53:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1683046428; x=1685638428; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=H/XhiryUevVm3Cw0bRPhAFmwGi+VKwNTD6d8PLLNIJU=; b=CRk7F9cD3K3ahmGmI3nRMbzR6bOv/GR4Lw1i0CqadZ2gfc3QYRy4U3WMKo6dHYddaw F8xPCOiOdqT1rEQQp2FPHuCb88EFVUEVNa8YxWeRDgtLuZVuu6DU2AbkCFYYFncKM+62 trHryjBobn6ZkM3VnwGlqM+tffzT49q3t5FeZzlJiTKDpiRNqHydcmR3m9geS9mWPfXw /K2t8KAtmSDybpVkB2mi0K5H3UYQ7xfBc8AwQKQrZdSEVdOVt2IVg2+1qpZKMHomAsTq sR2FWuit4tnLak2PzqyaZV8Db616SiuaVBe7DjzxNVnSF2Pu502pJOHmUJyi2WtvtsZV c3oA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683046428; x=1685638428; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=H/XhiryUevVm3Cw0bRPhAFmwGi+VKwNTD6d8PLLNIJU=; b=JxT8lxf2VpJnvno8dhDmIY+UcJZY905IJ9WMw1ZThmuwcG1u/CNdydhf+Us1oopJSC ErImsDe0wBnTPLVrTzhBkOVYHrtueNKUQbCz6KyY4N8PnHuaSMKXnNSjt4WCl2L+0sqL WZrUDR9UjiHpk9ycbi+fl2pJ47j+wH/ySL6KwsF6zjp2Sm9gfoolkhOAkU1Vjfr7gIHf 4hI5V0RkCv8L3cK7uTJ4omgA0howDWddO2dElnT567rI8/w43i8g2gK/uiAS4t9d9/q3 VPtIrNn20bfTqZBb3kWQ92ussMlkKaq7/6iUIOfXlccU+XCU1Mi5f38zoxQEqVMNH3BP iXSQ== X-Gm-Message-State: AC+VfDzY0xmzcbWs4JPe4hzdrJOTyvePu47iAkmm/52oa0JRLxoLg7sJ ilFM+J+i2z0/K/LerebNe6k= X-Google-Smtp-Source: ACHHUZ4ppGi+iR+Jsy3zNozzU1DV4Xar0EemcXOeMdxzPpuTIJs+25x9DbjqElAlMxgprhLNLh/F0g== X-Received: by 2002:a05:600c:82c5:b0:3f1:76d9:c788 with SMTP id eo5-20020a05600c82c500b003f176d9c788mr16116146wmb.8.1683046428364; Tue, 02 May 2023 09:53:48 -0700 (PDT) Received: from localhost (host86-156-84-164.range86-156.btcentralplus.com. [86.156.84.164]) by smtp.gmail.com with ESMTPSA id g10-20020a7bc4ca000000b003f171234a08sm35756623wmk.20.2023.05.02.09.53.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 May 2023 09:53:47 -0700 (PDT) Date: Tue, 2 May 2023 17:53:46 +0100 From: Lorenzo Stoakes To: David Hildenbrand Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Jason Gunthorpe , Jens Axboe , Matthew Wilcox , Dennis Dalessandro , Leon Romanovsky , Christian Benvenuti , Nelson Escobar , Bernard Metzler , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Bjorn Topel , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , "David S . Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Richard Cochran , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Oleg Nesterov , Jason Gunthorpe , John Hubbard , Jan Kara , "Kirill A . Shutemov" , Pavel Begunkov , Mika Penttila , Dave Chinner , Theodore Ts'o , Peter Xu , Matthew Rosato , "Paul E . McKenney" , Christian Borntraeger Subject: Re: [PATCH v7 1/3] mm/mmap: separate writenotify and dirty tracking logic Message-ID: References: <72a90af5a9e4445a33ae44efa710f112c2694cb1.1683044162.git.lstoakes@gmail.com> <56696a72-24fa-958e-e6a1-7a17c9e54081@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <56696a72-24fa-958e-e6a1-7a17c9e54081@redhat.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 02, 2023 at 06:38:53PM +0200, David Hildenbrand wrote: > On 02.05.23 18:34, Lorenzo Stoakes wrote: > > vma_wants_writenotify() is specifically intended for setting PTE page table > > flags, accounting for existing PTE flag state and whether that might > > already be read-only while mixing this check with a check whether the > > filesystem performs dirty tracking. > > > > Separate out the notions of dirty tracking and a PTE write notify checking > > in order that we can invoke the dirty tracking check from elsewhere. > > > > Note that this change introduces a very small duplicate check of the > > separated out vm_ops_needs_writenotify(). This is necessary to avoid making > > vma_needs_dirty_tracking() needlessly complicated (e.g. passing a > > check_writenotify flag or having it assume this check was already > > performed). This is such a small check that it doesn't seem too egregious > > to do this. > > > > Signed-off-by: Lorenzo Stoakes > > Reviewed-by: John Hubbard > > Reviewed-by: Mika Penttilä > > Reviewed-by: Jan Kara > > Reviewed-by: Jason Gunthorpe > > --- > > include/linux/mm.h | 1 + > > mm/mmap.c | 36 +++++++++++++++++++++++++++--------- > > 2 files changed, 28 insertions(+), 9 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 27ce77080c79..7b1d4e7393ef 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -2422,6 +2422,7 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, > > #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ > > MM_CP_UFFD_WP_RESOLVE) > > +bool vma_needs_dirty_tracking(struct vm_area_struct *vma); > > int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot); > > static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma) > > { > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 5522130ae606..295c5f2e9bd9 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -1475,6 +1475,31 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg) > > } > > #endif /* __ARCH_WANT_SYS_OLD_MMAP */ > > +/* Do VMA operations imply write notify is required? */ > > +static bool vm_ops_needs_writenotify(const struct vm_operations_struct *vm_ops) > > +{ > > + return vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite); > > +} > > + > > +/* > > + * Does this VMA require the underlying folios to have their dirty state > > + * tracked? > > + */ > > +bool vma_needs_dirty_tracking(struct vm_area_struct *vma) > > +{ > > Sorry for not noticing this earlier, but ... pints_owed++ > > what about MAP_PRIVATE mappings? When we write, we populate an anon page, > which will work as expected ... because we don't have to notify the fs? > > I think you really also want the "If it was private or non-writable, the > write bit is already clear */" part as well and remove "false" in that case. > Not sure a 'write bit is already clear' case is relevant to checking whether a filesystem dirty tracks? That seems specific entirely to the page table bits. That's why I didn't include it, A !VM_WRITE shouldn't be GUP-writable except for FOLL_FORCE, and that surely could be problematic if VM_MAYWRITE later? Thinking about it though a !VM_SHARE should probably can be safely assumed to not be dirty-trackable, so we probably do need to add a check for !VM_SHARED -> !vma_needs_dirty_tracking > Or was there a good reason to disallow private mappings as well? > Until the page is CoW'd walking the page tables will get you to the page cache page right? This was the reason I (perhaps rather too quickly) felt MAP_PRIVATE should be excluded. However a FOLL_WRITE would trigger CoW... and then we'd be trivially OK. So yeah, ok perhaps I dismissed that a little too soon. I was concerned about some sort of egregious FOLL_FORCE case where somehow we'd end up with the page cache folio. But actually, that probably can't happen... > -- > Thanks, > > David / dhildenb >