Date: Mon, 15 May 2023 12:16:21 +0100
From: Lorenzo Stoakes
To: "Kirill A . Shutemov"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
    Jason Gunthorpe, Jens Axboe, Matthew Wilcox, Dennis Dalessandro,
    Leon Romanovsky, Christian Benvenuti, Nelson Escobar, Bernard Metzler,
    Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
    Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
    "David S . Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Christian Brauner, Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
    Jesper Dangaard Brouer, John Fastabend, linux-fsdevel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, Oleg Nesterov, Jason Gunthorpe, John Hubbard,
    Jan Kara, Pavel Begunkov, Mika Penttila, David Hildenbrand,
    Dave Chinner, Theodore Ts'o, Peter Xu, Matthew Rosato,
    "Paul E . McKenney", Christian Borntraeger
Subject: Re: [PATCH v9 0/3] mm/gup: disallow GUP writing to file-backed mappings by default
Message-ID: <7f6dbe36-88f2-468e-83c1-c97e666d8317@lucifer.local>
In-Reply-To: <20230515110315.uqifqgqkzcrrrubv@box.shutemov.name>

On Mon, May 15, 2023 at 02:03:15PM +0300, Kirill A . Shutemov wrote:
> On Thu, May 04, 2023 at 10:27:50PM +0100, Lorenzo Stoakes wrote:
> > Writing to file-backed mappings which require folio dirty tracking using
> > GUP is a fundamentally broken operation, as kernel write access to GUP
> > mappings do not adhere to the semantics expected by a file system.
> >
> > A GUP caller uses the direct mapping to access the folio, which does not
> > cause write notify to trigger, nor does it enforce that the caller marks
> > the folio dirty.
>
> Okay, problem is clear and the patchset look good to me. But I'm worried
> breaking existing users.
>
> Do we expect the change to be visible to real world users? If yes, are we
> okay to break them?

The general consensus at the moment is that there is no entirely
reasonable use case for this, and you already run the risk of a kernel
oops if you do it, so it's already broken.

>
> One thing that came to mind is KVM with "qemu -object memory-backend-file,share=on..."
> It is mostly used for pmem emulation.
>
> Do we have plan B?

Yes, we can make it opt-in or opt-out via a FOLL_* flag. This would be
easy to implement in the event of any issues arising.
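As a rough illustration of what such an escape hatch could look like, here is a userspace toy model of the policy. It is a sketch under stated assumptions, not the patch's code: struct toy_vma, toy_needs_dirty_tracking(), toy_gup_write_allowed() and the TOY_FOLL_ALLOW_FILE_WRITE flag are all hypothetical, and "needs dirty tracking" is simplified to a shared, writable, file-backed mapping whose filesystem relies on dirty tracking.

```c
#include <stdbool.h>

/* Toy GUP flags. TOY_FOLL_WRITE loosely mirrors FOLL_WRITE;
 * TOY_FOLL_ALLOW_FILE_WRITE is a HYPOTHETICAL opt-out flag,
 * not a real kernel flag. */
enum {
	TOY_FOLL_WRITE            = 1 << 0,
	TOY_FOLL_ALLOW_FILE_WRITE = 1 << 1,
};

/* Toy stand-in for the few VMA properties this policy looks at. */
struct toy_vma {
	bool shared;          /* MAP_SHARED rather than MAP_PRIVATE */
	bool writable;        /* mapped with write permission */
	bool file_backed;     /* has a backing file */
	bool fs_tracks_dirty; /* filesystem relies on write notify / dirty tracking */
};

/* Would a write through this mapping need the filesystem to observe it? */
static bool toy_needs_dirty_tracking(const struct toy_vma *vma)
{
	return vma->shared && vma->writable && vma->file_backed &&
	       vma->fs_tracks_dirty;
}

/* Refuse GUP writes to dirty-tracked file mappings by default, but let a
 * caller opt out of the restriction with the hypothetical flag. */
static bool toy_gup_write_allowed(const struct toy_vma *vma,
				  unsigned int gup_flags)
{
	if (!(gup_flags & TOY_FOLL_WRITE))
		return true; /* read-only access never dirties the folio */
	if (gup_flags & TOY_FOLL_ALLOW_FILE_WRITE)
		return true; /* explicit opt-out: caller accepts the old behaviour */
	return !toy_needs_dirty_tracking(vma);
}
```

The opposite polarity (restrict only when a caller passes a flag) would simply invert the flag test.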
>
> Just a random/crazy/broken idea:
>
>  - Allow folio_mkclean() (and folio_clear_dirty_for_io()) to fail,
>    indicating that the page cannot be cleared because it is pinned;
>
>  - Introduce a new vm_operations_struct::mkclean() that would be called by
>    page_vma_mkclean_one() before clearing the range and can fail;
>
>  - On GUP, create an in-kernel fake VMA that represents the file, but with
>    custom vm_ops. The VMA registered in rmap to get notified on
>    folio_mkclean() and fail it because of GUP.
>
>  - folio_clear_dirty_for_io() callers will handle the new failure as
>    indication that the page can be written back but will stay dirty and
>    fs-specific data that is associated with the page writeback cannot be
>    freed.
>
> I'm sure the idea is broken on many levels (I have never looked closely at
> the writeback path). But maybe it is good enough as conversation started?
>

Yeah, there are definitely a few ideas down this road that might be
possible. I am not sure how a filesystem can be expected to cope, or how
this could reasonably be used, without dirty/writeback tracking though,
because you'd just not track anything. Or I guess you mean the mapping
would be read-only but somehow stay dirty?

I also had ideas along these lines, e.g. having a special vmalloc mode
which mimics the correct wrprotect settings and does the right thing, but
of course that does nothing to help DMA writing to a GUP-pinned page.

Though if the issue is at the point of the kernel marking the page dirty
unexpectedly, perhaps we can just invoke mkwrite() _there_, before
marking it dirty? There are probably some synchronisation issues there
too.

Jason will have some thoughts on this, I'm sure.

I guess the key question here is: is it actually feasible for this to
work at all? Once we establish that, the rest are details :)

> --
> Kiryl Shutsemau / Kirill A. Shutemov
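Kirill's proposal above (a pinned folio makes folio_mkclean() fail, and writeback then leaves the folio dirty and keeps its fs-specific data) can be sketched as a small userspace C model. This is a hypothetical toy rather than kernel code: struct toy_folio, toy_folio_mkclean() and toy_writeback() are invented for illustration, a plain pin counter stands in for the fake VMA and vm_operations_struct::mkclean() veto, and all locking and rmap walking (where the synchronisation issues would live) is ignored.

```c
#include <stdbool.h>

/* Toy folio: only the state the proposal talks about. */
struct toy_folio {
	int  pin_count;  /* > 0 while a GUP pin is held */
	bool dirty;
	bool fs_private; /* fs data tied to writeback (e.g. buffers) */
	int  writebacks; /* how many times the contents were written back */
};

/* HYPOTHETICAL: mkclean refuses to clean a pinned folio instead of
 * silently write-protecting it, standing in for the proposed veto. */
static bool toy_folio_mkclean(struct toy_folio *f)
{
	if (f->pin_count > 0)
		return false; /* a DMA or kernel write may still land */
	f->dirty = false;
	return true;
}

/* Writeback caller treating the new failure as "write back, stay dirty". */
static void toy_writeback(struct toy_folio *f)
{
	if (!f->dirty)
		return;

	bool cleaned = toy_folio_mkclean(f);

	f->writebacks++; /* contents are written back either way */
	if (cleaned)
		f->fs_private = false; /* clean: fs-private data may be released */
	/* otherwise the folio stays dirty and fs-private data must be kept */
}
```

The model only shows the state transitions; it says nothing about whether a filesystem can live with a folio that is written back repeatedly yet never becomes clean while pinned.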