From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16FC3C433F5 for ; Fri, 21 Jan 2022 08:59:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 717816B00A9; Fri, 21 Jan 2022 03:59:38 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6C7506B00AB; Fri, 21 Jan 2022 03:59:38 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5B62C6B00AC; Fri, 21 Jan 2022 03:59:38 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0041.hostedemail.com [216.40.44.41]) by kanga.kvack.org (Postfix) with ESMTP id 4CABB6B00A9 for ; Fri, 21 Jan 2022 03:59:38 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id EAE9C8249980 for ; Fri, 21 Jan 2022 08:59:37 +0000 (UTC) X-FDA: 79053695994.14.6681E56 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf07.hostedemail.com (Postfix) with ESMTP id E295B40003 for ; Fri, 21 Jan 2022 08:59:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=7wc2zUSfhyENFO6ymFjjrv1H+4Rj+PnhH77Hsl91gjc=; b=jrGOS1JRI2plfq4yl+kTVyT8QB lplUqiCN6OZhxp39ABXcb17U2+hx8Zi3Ii7dFt2lQV4Yi+H+jHOQjMkudyjhYmrnmhNmxV2EB8qIK Y7is+ME0ziH3XbydM6URLERYWx+utHOCGGx7fZm6czg8Ry+nqh5edj+D+gJTOaHi55kyzjU9AezKR y/elJtDK+t2+A6ZRbssaNwApsBzd9VDEWE8p2TB1MqNqiEy2TueB83hlPebuGPcpIFA11a4UcN84w WLbuYn0SFqOzDnWP9nAHyd261fwWH8WPAyNVBK27kmbzcAcPv5Fphr1F5PXpAdpJwakqPMD4CYclj 53nSlfJA==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=worktop.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nAplO-00FROw-8s; Fri, 21 Jan 2022 08:59:18 +0000 Received: by worktop.programming.kicks-ass.net (Postfix, from userid 1000) id 0AB9A9853F1; Fri, 21 Jan 2022 09:59:17 +0100 (CET) Date: Fri, 21 Jan 2022 09:59:17 +0100 From: Peter Zijlstra To: David Hildenbrand Cc: mingo@redhat.com, tglx@linutronix.de, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-api@vger.kernel.org, x86@kernel.org, pjt@google.com, posk@google.com, avagin@google.com, jannh@google.com, tdelisle@uwaterloo.ca, mark.rutland@arm.com, posk@posk.io Subject: Re: [RFC][PATCH v2 1/5] mm: Avoid unmapping pinned pages Message-ID: <20220121085917.GA22849@worktop.programming.kicks-ass.net> References: <20220120155517.066795336@infradead.org> <20220120160822.666778608@infradead.org> <20220121075157.GA20638@worktop.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220121075157.GA20638@worktop.programming.kicks-ass.net> X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: E295B40003 X-Stat-Signature: tmtmdejhqw5ufdtzo47itnzympjeotwx Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=jrGOS1JR; dmarc=none; spf=none (imf07.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=peterz@infradead.org X-HE-Tag: 1642755576-23478 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Jan 21, 2022 at 08:51:57AM +0100, Peter Zijlstra wrote: > On Thu, Jan 20, 2022 at 07:25:08PM +0100, David Hildenbrand wrote: > > On 20.01.22 16:55, Peter Zijlstra wrote: > > > Add a guarantee for Anon pages that pin_user_page*() ensures the > > > user-mapping of these pages stay preserved. In order to ensure this > > > all rmap users have been audited: > > > > > > vmscan: already fails eviction due to page_maybe_dma_pinned() > > > > > > migrate: migration will fail on pinned pages due to > > > expected_page_refs() not matching, however that is > > > *after* try_to_migrate() has already destroyed the > > > user mapping of these pages. Add an early exit for > > > this case. > > > > > > numa-balance: as per the above, pinned pages cannot be migrated, > > > however numa balancing scanning will happily PROT_NONE > > > them to get usage information on these pages. Avoid > > > this for pinned pages. > > > > page_maybe_dma_pinned() can race with GUP-fast without > > mm->write_protect_seq. This is a real problem for vmscan() with > > concurrent GUP-fast as it can result in R/O mappings of pinned pages and > > GUP will lose synchronicity to the page table on write faults due to > > wrong COW. > > Urgh, so yeah, that might be a problem. Follow up code uses it like > this: > > +/* > + * Pinning a page inhibits rmap based unmap for Anon pages. Doing a load > + * through the user mapping ensures the user mapping exists. > + */ > +#define umcg_pin_and_load(_self, _pagep, _member) \ > +({ \ > + __label__ __out; \ > + int __ret = -EFAULT; \ > + \ > + if (pin_user_pages_fast((unsigned long)(_self), 1, 0, &(_pagep)) != 1) \ > + goto __out; \ > + \ > + if (!PageAnon(_pagep) || \ > + get_user(_member, &(_self)->_member)) { \ > + unpin_user_page(_pagep); \ > + goto __out; \ > + } \ > + __ret = 0; \ > +__out: __ret; \ > +}) > > And after that hard assumes (on the penalty of SIGKILL) that direct user > access works. Specifically it does RmW ops on it. So I suppose I'd > better upgrade that load to a RmW at the very least. > > But is that sufficient? Let me go find that race you mention... OK, so copy_page_range() vs lockless_pages_from_mm(). Since I use FOLL_PIN that should be sorted, it'll fall back the slow path and use mmap_sem and serialize against the fork(). (Also, can I express my hate for __gup_longterm_unlocked(), that function name is utter garbage) However, I'm not quite sure what fork() does with pages that have a pin. There's been a number of GUP vs fork() problems over the years, but I'm afraid I have lost track of that and I can't quickly find anything in the code.. Naively, a page that has async DMA activity should not be CoW'ed, or if it is, care must be taken to ensure the original pages stays in the original process, but I realize that's somewhat hard. Let me dig in a bit more.