From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49BA8C77B73 for ; Tue, 2 May 2023 17:54:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D5B756B0080; Tue, 2 May 2023 13:54:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CE5BA6B0081; Tue, 2 May 2023 13:54:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B37406B0082; Tue, 2 May 2023 13:54:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 9C4836B0080 for ; Tue, 2 May 2023 13:54:16 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 37BDA160FD7 for ; Tue, 2 May 2023 15:24:51 +0000 (UTC) X-FDA: 80745687582.10.F79C3A5 Received: from mail-lj1-f176.google.com (mail-lj1-f176.google.com [209.85.208.176]) by imf04.hostedemail.com (Postfix) with ESMTP id 23ECC40BF7 for ; Tue, 2 May 2023 11:37:06 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=kAg8E6m4; spf=pass (imf04.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.208.176 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1683027446; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=kRs10HvNSyY4aofcI1ZBj0OmWfT5cmFsMr/bJt88upc=; b=E4KdgkArMhiH1njBmoEy6u4rK/G451d5R1+jiGnpE59rRAAibckCvNn5mZszLsLHLqSLz6 0YcE6jNiaJdyRJFRiM8NrqgltxEDx5qmStXZzlnqUopr5JMp16kOi3VfGcKLtjD1q13K2C caR6qE84CTAG5ME4ke6y2bzw/Edmgpk= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1683027446; a=rsa-sha256; cv=none; b=greMODkv78qNiTJafSY0DYBpA3BgscWqGusU7K+hSyWzlC0ND6JlU0zaNRGrwzCQms5y06 aEisp3hU5xODhOzVOvKdVx161HOG6XGOX+HyXTg1mbqUnctgQvD3+3TnQtVUwRFNAS8djY kQthMnoE47QyrJK+Wi4ROjq+xswHxmo= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=kAg8E6m4; spf=pass (imf04.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.208.176 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-lj1-f176.google.com with SMTP id 38308e7fff4ca-2a7ac89b82dso35877041fa.1 for ; Tue, 02 May 2023 04:36:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1683027329; x=1685619329; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=kRs10HvNSyY4aofcI1ZBj0OmWfT5cmFsMr/bJt88upc=; b=kAg8E6m4lnjgelYEUczRzEs8jFuiImODk9AYfNdJuJedADOhJKCEYuPLTsJwXydyqi 9OIB2pnSBtMYO8idnubF0sCeTPIcsBlnLNqJe2JTZwKA2hhcq8/FVDvz7wD/gY+Bt5mN tKsjk5hv8zUm4jTtqpzCCBD/cDeOtU1MNcuU2tv+dteX4S9eOlOlVLCVGqT/AuQ8bI+G bg2wlXLDbg99uXDiDTZ07YpK0dL2CsZi753dbHHqdAtaO7l78MkhYYm9NmGe0dYPjCDr 3kXUkUEfLK2GzV0aaB7o8Xe8JnJR5615AA65qm4grh0339Cb/JFq12+UYfEXQRhrcrxX YNJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683027329; x=1685619329; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=kRs10HvNSyY4aofcI1ZBj0OmWfT5cmFsMr/bJt88upc=; b=SyNPiub+X6EhOPJ99VT7lJLv8qmhEx/cILCEjdl4863F7vQwgG7vZ/16ibWhO04zDA 8xbLjaGgzCDSc5dUEX8FYMmCuwKWRmwZCSYU/IYaT01O20a58sOGpqVVw+6jMRrYxCch 8VekHZ4y8GGULsYih1IpAccygAix/buMc4PIaik8EyoWDAqzKX097TPquJoYJO/VSFId o97IooaqiQt5CnJWwcdCSxCs5ryxlYzhTd4C3xj9bFO5TROcQRpVsEroQ8QWqyE8FIDx jyK8tBCfKsmznMwDNvWLKGaE+JXH7FecIySnMZNjy+DHmD4Sm8ImruND3owMKp23UFHm qv2w== X-Gm-Message-State: AC+VfDxTOuQTDXPTEifv6csXCeyF6dCizJK8kjZDiF+9/Qnb6irvJM+a U/6pxPlAxb46mWqVuWU3m/mmRZmGFlarUw== X-Google-Smtp-Source: ACHHUZ5LhZMn8zC/XVOdRH0O9q5eFynI6mY/ADOayr/C7Ur8HRCDGD1BQqumNxthDJQRB5HwenjdKQ== X-Received: by 2002:a5d:544c:0:b0:306:264d:5667 with SMTP id w12-20020a5d544c000000b00306264d5667mr6360015wrv.41.1683026756161; Tue, 02 May 2023 04:25:56 -0700 (PDT) Received: from localhost (host86-156-84-164.range86-156.btcentralplus.com. [86.156.84.164]) by smtp.gmail.com with ESMTPSA id s4-20020adfeb04000000b003047f7a7ad1sm21455310wrn.71.2023.05.02.04.25.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 May 2023 04:25:55 -0700 (PDT) Date: Tue, 2 May 2023 12:25:54 +0100 From: Lorenzo Stoakes To: Peter Zijlstra Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Jason Gunthorpe , Jens Axboe , Matthew Wilcox , Dennis Dalessandro , Leon Romanovsky , Christian Benvenuti , Nelson Escobar , Bernard Metzler , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Bjorn Topel , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , "David S . Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Richard Cochran , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Oleg Nesterov , Jason Gunthorpe , John Hubbard , Jan Kara , "Kirill A . Shutemov" , Pavel Begunkov , Mika Penttila , David Hildenbrand , Dave Chinner , Theodore Ts'o , Peter Xu Subject: Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings Message-ID: References: <20230502111334.GP1597476@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230502111334.GP1597476@hirez.programming.kicks-ass.net> X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 23ECC40BF7 X-Stat-Signature: ap3aytyj9zwurrd8ef1cxof57kh16hwp X-Rspam-User: X-HE-Tag: 1683027426-171708 X-HE-Meta: U2FsdGVkX1/uwRQnyaUYU3Fx1y6URmVrzKf0fuS9rte5iX7GNKVunAx9pM/nqzasbRoBNTevXAMBQCGxbigygCVVr6cwvwvwPRaO6G6g3h55MGpgfR7F1/WimYblcvO3eMiT7NNlIeWiQFiO6l80NIU7eWmjg6yYUFubctOJzVFkDl8Dp1XoHfeRbzoz4T/W9tW7Dvx4rZYg3xWOOxbjZ+TNmYJL4Jj6DZCvQerYwaO4XQghTMMIxhxF781tTum1RsBSXjeohFcvB9FDXG358NyFI07gu8DIkjDrZfle4NQycLDYtQ1lNsmWLNpN2W8kD9GPAFhBehZIYnlCM3rKwQ1NfJWLQrnXgP85ox1rxP8fkUp0C1rJwQwkNpXNT1u/7FQrHmMspRh3pb/ETQetBH1PtY8VJLm1elTKNdX5UJ6hD4N5ey2FIx7S70DHjlm6m9vUoJ0cljU0fTiHUP8gqFs+J9dk11Rwz4+HnTyOVTnMg3ky1m1VJcZm/MBpz9rogQNb4yje89OW0U5v3dPCpqyRwvmIZegxJ/wq+5XXSKUK3WCqkuD6io9MqnaxwhEhG2JHkFLcaR7k4ppaVPWL2aV/4qH17UXeOQrDDxx16nMouVBuh17Nq27OOMtm7FCVQO6TFrnksF9aBJnbnArysPuzN4ZF3CJFPM85/qgdm5H68fyI6swiUNifxdCVzH0psGlv869Up6tePTSc2KmWBUsLQPLCM+6jagOUQoC2G0ScfJ4omSegKsHwQasGD8W45dRQ86oM4+BncRNIJNAlKsxJWvXSXAufWJBMWvI9xenAZ5jVZbsXgZNi41exRu4Gl5mb//XCZ4+3F3EkMSvmzhUAXfFc26w1WnIlDDhTXtPUoHFO4RGVatca96DDfg28kVc48NM0BxZQJKsLMRtIjDEVTNGxqJPUJt5JBvxJ6rxEc2+9AgF1oi2l+6Mo/9kucU9ZfPKbQl/kAsjUP2K PgQDfWcR PEqsfcVE8G/wEEhCsYFtiH/MZwvZ32PzleJOV1YEJBeDq+0oTJ4BC/W+tMM861UfoJARc4NNcqVtqZ94ibg8P81GAHzuSF95BTAxWbDFBDmMRNZwSj6wWxL7OzIKyHbIzjwRibEzoZEmW70/kmyLf3AlQYr9fw8R2WC+l X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 02, 2023 at 01:13:34PM +0200, Peter Zijlstra wrote: > On Tue, May 02, 2023 at 12:11:49AM +0100, Lorenzo Stoakes wrote: > > @@ -95,6 +96,77 @@ static inline struct folio *try_get_folio(struct page *page, int refs) > > return folio; > > } > > > > +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE > > +static bool stabilise_mapping_rcu(struct folio *folio) > > +{ > > + struct address_space *mapping = READ_ONCE(folio->mapping); > > + > > + rcu_read_lock(); > > + > > + return mapping == READ_ONCE(folio->mapping); > > This doesn't make sense; why bother reading the same thing twice? The intent is to see whether the folio->mapping has been truncated from underneath us, as per the futex code that Kirill referred to which does something similar [1]. > > Who cares if the thing changes from before; what you care about is that > the value you see has stable storage, this doesn't help with that. > > > +} > > + > > +static void unlock_rcu(void) > > +{ > > + rcu_read_unlock(); > > +} > > +#else > > +static bool stabilise_mapping_rcu(struct folio *) > > +{ > > + return true; > > +} > > + > > +static void unlock_rcu(void) > > +{ > > +} > > +#endif > > Anyway, this all can go away. RCU can't progress while you have > interrupts disabled anyway. There seems to be other code in the kernel that assumes that this is not the case, i.e. the futex code, though not sure if that's being run with IRQs disabled... if not and it's absolutely certain that we need no special handling for the RCU case, then happy days and more than glad to remove this bit. I'm far from an expert on RCU (I need to gain a better understanding of it) so I'm deferring how best to proceed on _this part_ to the community. > > > +/* > > + * Used in the GUP-fast path to determine whether a FOLL_PIN | FOLL_LONGTERM | > > + * FOLL_WRITE pin is permitted for a specific folio. > > + * > > + * This assumes the folio is stable and pinned. > > + * > > + * Writing to pinned file-backed dirty tracked folios is inherently problematic > > + * (see comment describing the writeable_file_mapping_allowed() function). We > > + * therefore try to avoid the most egregious case of a long-term mapping doing > > + * so. > > + * > > + * This function cannot be as thorough as that one as the VMA is not available > > + * in the fast path, so instead we whitelist known good cases. > > + * > > + * The folio is stable, but the mapping might not be. When truncating for > > + * instance, a zap is performed which triggers TLB shootdown. IRQs are disabled > > + * so we are safe from an IPI, but some architectures use an RCU lock for this > > + * operation, so we acquire an RCU lock to ensure the mapping is stable. > > + */ > > +static bool folio_longterm_write_pin_allowed(struct folio *folio) > > +{ > > + bool ret; > > + > > + /* hugetlb mappings do not require dirty tracking. */ > > + if (folio_test_hugetlb(folio)) > > + return true; > > + > > This: > > > + if (stabilise_mapping_rcu(folio)) { > > + struct address_space *mapping = folio_mapping(folio); > > And this is 3rd read of folio->mapping, just for giggles? I like to giggle :) Actually this is to handle the various cases in which the mapping might not be what we want (i.e. have PAGE_MAPPING_FLAGS set) which doesn't appear to have a helper exposed for a check. Given previous review about duplication I felt best to reuse this even though it does access again... yes I felt weird about doing that. > > > + > > + /* > > + * Neither anonymous nor shmem-backed folios require > > + * dirty tracking. > > + */ > > + ret = folio_test_anon(folio) || > > + (mapping && shmem_mapping(mapping)); > > + } else { > > + /* If the mapping is unstable, fallback to the slow path. */ > > + ret = false; > > + } > > + > > + unlock_rcu(); > > + > > + return ret; > > then becomes: > > > if (folio_test_anon(folio)) > return true; This relies on the mapping so belongs below the lockdep assert imo. > > /* > * Having IRQs disabled (as per GUP-fast) also inhibits RCU > * grace periods from making progress, IOW. they imply > * rcu_read_lock(). > */ > lockdep_assert_irqs_disabled(); > > /* > * Inodes and thus address_space are RCU freed and thus safe to > * access at this point. > */ > mapping = folio_mapping(folio); > if (mapping && shmem_mapping(mapping)) > return true; > > return false; > > > +} I'm more than happy to do this (I'd rather drop the RCU bits if possible) but need to be sure it's safe.