From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B107C77B73 for ; Tue, 2 May 2023 18:01:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 961EE6B0071; Tue, 2 May 2023 14:01:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9381A6B0072; Tue, 2 May 2023 14:01:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 801DE6B0075; Tue, 2 May 2023 14:01:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 738E26B0071 for ; Tue, 2 May 2023 14:01:50 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 50DBE81777 for ; Tue, 2 May 2023 18:01:50 +0000 (UTC) X-FDA: 80746083180.10.10F8785 Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com [209.85.218.41]) by imf21.hostedemail.com (Postfix) with ESMTP id 4000D1C122C for ; Tue, 2 May 2023 12:37:39 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=jkGnapr+; spf=pass (imf21.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.218.41 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1683031288; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ZN+pooP8nd+FXQmX2bQBwmYVFsxwnGNnZBxQw0pqYZA=; b=b9dW+OH6YlypZv4V7YKhfcvQ+pNVRo8swSM7og763dNd20xP2BNrfPwIj+SJfl1QFYncof j2APdL7ap8S2R8iKTVX2RGG6z3nIvwjz3CS7uyqM3UeUzA1mRSRX2t6isU70vzeFk7qBvH fFzNkRMJuBkVKG9/G1XduPXEh3KbLPs= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1683031288; a=rsa-sha256; cv=none; b=YNuLsVWXoUWbD9Jtp/wF6XuypiAAsIhdY7e0FN51p/RQu3TamTBSvsEzOAD54OD+52F4I/ m/IjbKtMipwV5HXkobs+CGFydNhaSOIsJVwliNw9sdogMkxxT6a3t1Lewm/ss4KueLbTg4 S3nTJEH/4Wad5PjssT/bBLVHewhAe1I= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=jkGnapr+; spf=pass (imf21.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.218.41 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-ej1-f41.google.com with SMTP id a640c23a62f3a-94ef8b88a5bso574265966b.2 for ; Tue, 02 May 2023 05:36:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1683030912; x=1685622912; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=ZN+pooP8nd+FXQmX2bQBwmYVFsxwnGNnZBxQw0pqYZA=; b=jkGnapr+QzlkPJWg6aqOqrgiFc2PrcUJXX/PZm/YvmXxE/VEG1ISXTYUybH8D9KUzD TppsTY/14F1o05VBpZm3vQQV2m03MAJ9ea9GOXuMXWtGjFUgACwX2phXkJc8ZKoUYOKv gplBPlhOzsx7nr6XuFHYT7AqHlZT7GHS8pwdGGoeJ8o4scKojB96RvlQuVM7gQHOrAtw wIrrrmpgzO+ZgS+DP/TjUnIYxgAxrcWho8u5ZsvC80xEmiwK/uavFNaIrqaiz2nXvBSm wFVTfprx8XAof2OLeBQ0VHzVonW2GI5vwk7T7dA/hqi+23OB+PpyCQvJCkPQJpDJDLIu jSZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683030912; x=1685622912; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=ZN+pooP8nd+FXQmX2bQBwmYVFsxwnGNnZBxQw0pqYZA=; b=OLuRTC/iIMMhgoFGTlP1uUuyk618gbDMlQU/kgtMs1nnU0mA3uiwa6PjDNyYUoHGCI MO82L0mzY/ld+9vfd4z0vbH01ygKTMohu6duxWo2oXj81sBWz+dJq0qgcu5Zh4tyruxJ cmGsRpedDkyj8OltJD4uL2XferWQtSfstVz8cnOzD5hto1GtQ2r42lblbTaSyr45vQce OINt9YIExVIWhTye9bzvxC7LpxYVAk+cQ9/fLztjXWgR0Jair5olpfJbdf3dhGWOF59P HDAjYW5Bjgs6ZEXIw7xM3tQ5J0GHLxwPKO4UB9yhmh+SeuqN0V77P0YpObsTlbGXlxZT rScA== X-Gm-Message-State: AC+VfDxw7ov9sXEKUGprr73HFqRD+ApYJ1CKNUVkwbzIdMExvemwM1g8 Jfzk/yLRsH5VAWlry2iItNa5g1mOsJQ1GQ== X-Google-Smtp-Source: ACHHUZ4+DQOWWJ1+EFDDG7oTmHd8+g/1rY3vrIA73s5XXOKVAggQWkPQcEm0YidgqeEZ5RK9+IFzLw== X-Received: by 2002:a05:600c:2309:b0:3ea:f73e:9d8a with SMTP id 9-20020a05600c230900b003eaf73e9d8amr12689569wmo.30.1683030463792; Tue, 02 May 2023 05:27:43 -0700 (PDT) Received: from localhost (host86-156-84-164.range86-156.btcentralplus.com. [86.156.84.164]) by smtp.gmail.com with ESMTPSA id p5-20020a05600c468500b003f18141a016sm38543498wmo.18.2023.05.02.05.27.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 May 2023 05:27:43 -0700 (PDT) Date: Tue, 2 May 2023 13:27:42 +0100 From: Lorenzo Stoakes To: Peter Zijlstra Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Jason Gunthorpe , Jens Axboe , Matthew Wilcox , Dennis Dalessandro , Leon Romanovsky , Christian Benvenuti , Nelson Escobar , Bernard Metzler , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Bjorn Topel , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , "David S . Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Richard Cochran , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Oleg Nesterov , Jason Gunthorpe , John Hubbard , Jan Kara , "Kirill A . Shutemov" , Pavel Begunkov , Mika Penttila , David Hildenbrand , Dave Chinner , Theodore Ts'o , Peter Xu , Paul McKenney Subject: Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings Message-ID: <44ee78e9-6cc9-4aee-92fd-e5335576a55c@lucifer.local> References: <20230502111334.GP1597476@hirez.programming.kicks-ass.net> <20230502120810.GD1597538@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230502120810.GD1597538@hirez.programming.kicks-ass.net> X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 4000D1C122C X-Stat-Signature: 6zx9umcw83odcfgexdtupnsuze8yyqhx X-Rspam-User: X-HE-Tag: 1683031059-662636 X-HE-Meta: U2FsdGVkX1+1s5axcwd0QZUz+6zBus8FrEei2ILJmopbgt6Qp2jPoPr6vVvaLHMPO3UaFr8Ux0E8sJ1NehXkTIfcjw1e33RQZXg6OQYpgQi2tWzVvQf5KM7UEs/sc/RCjFSmVfykA/ThVDkPnTY6ZpcRpGiGsgfz6uWmbG0dflo/RGNoRdDH6rQaaw0e2ld6XnawJqg1+7IjxN6TT4sRwCQHhazqNk0ABl+LSpwGCZ1+DdNOoIAXOQkeEXlHjSGF50Qrs5161i4Z6hlpoveMsX/YwQDulSaiGHRLns9CI4PktPJu8S11hLiuciiLO/BuoxZSZfjAP7oJPGvyPXkzUKWAujO5Qi9JSANZd25BDazhvgFRC3+2eOl3utZZvLxosTb6CV+EeSzoDAgj752hV4cN1KxdK2mGxHr+imm34OBQMIrEyFW2C5n0K/zkl0KfELVXQQMwaUhpKMijlWBHvU3q8+89TlNBI+pWg4+geZbPlk4EKm9/rPlQV783uPnckScYEQIV5WINTy3wexWAs8K0hdz6V0nfBkn3lPTWmv4AhQ39FObXHFbB0RWEGrq0XUX0QHHPMChof0uiA0OUX0opm9dI7S8ZVvr7sv9bJAn2J90axTM/JdyB+/NtSunCenEYTixIeaf+P68o+GWAdrc5iBPQlVBG72/9noC7mwaCpBkU5NxHk6tJ7dKf0Nf8jPF1c39JCw81Bhvtlokq43jVIDfP6W9NwiHiMQGebqkk6pi+hAVugpITAfm4aC2JCWFvONbEzcKUEoA87J4Ow0eoopjYoOLQR2hBale3w8TJRN/chnq1lOwsFCIBWfp7QFS7VjFRURkWhyYEGW/iZ1hcpDGDJRQqOagxC+gjodl/WeaUCyiTj1s5/8ZaUWsmxh+M5TQs20TadETmEoRJilOSzQgTygG9yJTAAK56w9DjYW9MEQ6g2GPCxrPEJ12DGoKdcY9z8fdA6SDBy9H OAq/LQoL 3fMKO8r/gWI0hiQXxSupyu5gRX+SfbQDFTeOKoai7LrwWjQ1l7T7Tg2Irgtal0IURKCOjGpWHoci88U2jCJfYES/c2j2CTZz28+UyIxfwp/NWtn7f2TbdQwEJ0i15yq60hkw3sTYjKIimJ2IV9QVsq382GA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 02, 2023 at 02:08:10PM +0200, Peter Zijlstra wrote: > On Tue, May 02, 2023 at 12:25:54PM +0100, Lorenzo Stoakes wrote: > > On Tue, May 02, 2023 at 01:13:34PM +0200, Peter Zijlstra wrote: > > > On Tue, May 02, 2023 at 12:11:49AM +0100, Lorenzo Stoakes wrote: > > > > @@ -95,6 +96,77 @@ static inline struct folio *try_get_folio(struct page *page, int refs) > > > > return folio; > > > > } > > > > > > > > +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE > > > > +static bool stabilise_mapping_rcu(struct folio *folio) > > > > +{ > > > > + struct address_space *mapping = READ_ONCE(folio->mapping); > > > > + > > > > + rcu_read_lock(); > > > > + > > > > + return mapping == READ_ONCE(folio->mapping); > > > > > > This doesn't make sense; why bother reading the same thing twice? > > > > The intent is to see whether the folio->mapping has been truncated from > > underneath us, as per the futex code that Kirill referred to which does > > something similar [1]. > > Yeah, but per that 3rd load you got nothing here. Also that futex code > did the early load to deal with the !mapping case, but you're not doing > that. > OK I drafted a response three times then deleted which shows you how this stuff messes with your mind :) I realise now that literally it is checking whether the previous !mapping case and lack of action taken on that was valid for futex, rendering this pointless for the logic here. We do check !mapping later but obviously with the 'stable' mapping whose relation to pre-rcu lock is irrelevant. Thanks for patiently explaining this :) RCU remains an area I need to take a closer look at generally. > > > Who cares if the thing changes from before; what you care about is that > > > the value you see has stable storage, this doesn't help with that. > > > > > > > +} > > > > + > > > > +static void unlock_rcu(void) > > > > +{ > > > > + rcu_read_unlock(); > > > > +} > > > > +#else > > > > +static bool stabilise_mapping_rcu(struct folio *) > > > > +{ > > > > + return true; > > > > +} > > > > + > > > > +static void unlock_rcu(void) > > > > +{ > > > > +} > > > > +#endif > > > > > > Anyway, this all can go away. RCU can't progress while you have > > > interrupts disabled anyway. > > > > There seems to be other code in the kernel that assumes that this is not > > the case, > > Yeah, so Paul went back on forth on that a bit. It used to be true in > the good old days when everything was simple. Then Paul made things > complicated by separating out sched-RCU bh-RCU and 'regular' RCU > flavours. > > At that point disabling IRQs would only (officially) inhibit sched and > bh RCU flavours, but not the regular RCU. > > But then some years ago Linus convinced Paul that having all these > separate RCU flavours with separate QS rules was a big pain in the > backside and Paul munged them all together again. > > So now, anything that inhibits any of the RCU flavours inhibits them > all. So disabling IRQs is sufficient. > > > i.e. the futex code, though not sure if that's being run with > > IRQs disabled... > > That futex code runs in preemptible context, per the lock_page() that > can sleep etc.. :-) OK I am actually really happy to hear this because this means I can go simplify this code significantly! > > > > > +/* > > > > + * Used in the GUP-fast path to determine whether a FOLL_PIN | FOLL_LONGTERM | > > > > + * FOLL_WRITE pin is permitted for a specific folio. > > > > + * > > > > + * This assumes the folio is stable and pinned. > > > > + * > > > > + * Writing to pinned file-backed dirty tracked folios is inherently problematic > > > > + * (see comment describing the writeable_file_mapping_allowed() function). We > > > > + * therefore try to avoid the most egregious case of a long-term mapping doing > > > > + * so. > > > > + * > > > > + * This function cannot be as thorough as that one as the VMA is not available > > > > + * in the fast path, so instead we whitelist known good cases. > > > > + * > > > > + * The folio is stable, but the mapping might not be. When truncating for > > > > + * instance, a zap is performed which triggers TLB shootdown. IRQs are disabled > > > > + * so we are safe from an IPI, but some architectures use an RCU lock for this > > > > + * operation, so we acquire an RCU lock to ensure the mapping is stable. > > > > + */ > > > > +static bool folio_longterm_write_pin_allowed(struct folio *folio) > > > > +{ > > > > + bool ret; > > > > + > > > > + /* hugetlb mappings do not require dirty tracking. */ > > > > + if (folio_test_hugetlb(folio)) > > > > + return true; > > > > + > > > > > > This: > > > > > > > + if (stabilise_mapping_rcu(folio)) { > > > > + struct address_space *mapping = folio_mapping(folio); > > > > > > And this is 3rd read of folio->mapping, just for giggles? > > > > I like to giggle :) > > > > Actually this is to handle the various cases in which the mapping might not > > be what we want (i.e. have PAGE_MAPPING_FLAGS set) which doesn't appear to > > have a helper exposed for a check. Given previous review about duplication > > I felt best to reuse this even though it does access again... yes I felt > > weird about doing that. > > Right, I had a peek inside folio_mapping(), but the point is that this > 3rd load might see yet *another* value of mapping from the prior two > loads, rendering them somewhat worthless. > > > > > + > > > > + /* > > > > + * Neither anonymous nor shmem-backed folios require > > > > + * dirty tracking. > > > > + */ > > > > + ret = folio_test_anon(folio) || > > > > + (mapping && shmem_mapping(mapping)); > > > > + } else { > > > > + /* If the mapping is unstable, fallback to the slow path. */ > > > > + ret = false; > > > > + } > > > > + > > > > + unlock_rcu(); > > > > + > > > > + return ret; > > > > > > then becomes: > > > > > > > > > if (folio_test_anon(folio)) > > > return true; > > > > This relies on the mapping so belongs below the lockdep assert imo. > > Oh, right you are. > > > > > > > /* > > > * Having IRQs disabled (as per GUP-fast) also inhibits RCU > > > * grace periods from making progress, IOW. they imply > > > * rcu_read_lock(). > > > */ > > > lockdep_assert_irqs_disabled(); > > > > > > /* > > > * Inodes and thus address_space are RCU freed and thus safe to > > > * access at this point. > > > */ > > > mapping = folio_mapping(folio); > > > if (mapping && shmem_mapping(mapping)) > > > return true; > > > > > > return false; > > > > > > > +} > > > > I'm more than happy to do this (I'd rather drop the RCU bits if possible) > > but need to be sure it's safe. > > GUP-fast as a whole relies on it :-) Indeed, the only question was what happened with CONFIG_MMU_GATHER_RCU_TABLE_FREE arches which appeared to require special handling, but I'm very happy to hear they don't! Will respin along the lines of your suggestion.