Date: Wed, 8 Feb 2023 20:25:10 +0000
From: Matthew Wilcox
To: Peter Xu
Cc: linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel,
    David Hildenbrand, "Yin, Fengwei"
Subject: Re: Folio mapcount

On Wed, Feb 08, 2023 at 02:40:11PM -0500, Peter Xu wrote:
> On Tue, Feb 07, 2023 at 11:27:17PM +0000, Matthew Wilcox wrote:
> > I've been thinking about this one, and I wonder if we can do it
> > without taking any pgtable locks. The locking environment we're in
> > is the page fault handler, so we have the mmap_lock for read (for now
> > anyway ...). We also hold the folio lock, so _if_ the folio is mapped,
> > those entries can't disappear under us.
>
> Could MADV_DONTNEED do that from another pgtable that we don't hold the
> pgtable lock?

Oh, ugh, yes.
And zap_pte_range() has the PTL first, so we can't sleep to get the
folio lock. And we can't decline to zap the pte on a failed
folio_trylock() (well, we could for MADV_DONTNEED, but not in general).

So ... how about this for a solution:

 - If the folio overlaps into the next PMD table, spin_lock it.
 - If the folio overlaps into the previous PMD table, unlock our PTL,
   lock the previous PTL, re-lock our PTL.
 - Do the pvmw, telling it we already have the PTLs held (new PVMW flag).

[explanation simplified; if there is no prior PMD table or if the VMA
limits how far to search, we can skip this]

We have prior art for taking two PTLs in copy_page_range(). There,
the hierarchy is clear; one VMA belongs to the process parent and one
to the child. I don't believe we have precedent for taking two PTLs
in the same VMA, but I think my proposal (order by ascending address
in the process) is the obvious order to choose.
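
To make the ascending-address ordering concrete, here is a minimal userspace sketch. It is not kernel code: the PTLs are modelled as C11 atomic_flag spinlocks, and struct pt_model, model_lock(), lock_folio_ptls() and the table indices are all hypothetical names invented for illustration. The sketch handles the previous table before the next one so that every acquisition happens in ascending order; that sequencing is one reading of the ordering rule, not something the steps above spell out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_TABLES  4   /* hypothetical: number of page tables modelled */
#define MAX_EVENTS 16  /* size of the acquisition log */

/* Toy model: one spinlock per page table, indexed by ascending address. */
struct pt_model {
	atomic_flag ptl[NR_TABLES];
	int order[MAX_EVENTS];  /* table indices, in the order they were locked */
	size_t nr_events;
};

static void model_init(struct pt_model *m)
{
	for (int i = 0; i < NR_TABLES; i++)
		atomic_flag_clear(&m->ptl[i]);
	m->nr_events = 0;
}

static void model_lock(struct pt_model *m, int idx)
{
	assert(idx >= 0 && idx < NR_TABLES);
	/* Spin until the flag was previously clear, i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&m->ptl[idx],
						 memory_order_acquire))
		;
	assert(m->nr_events < MAX_EVENTS);
	m->order[m->nr_events++] = idx;
}

static void model_unlock(struct pt_model *m, int idx)
{
	atomic_flag_clear_explicit(&m->ptl[idx], memory_order_release);
}

/*
 * Caller already holds ptl[cur]. The folio is assumed to span tables
 * first..last with cur - 1 <= first <= cur <= last <= cur + 1.
 */
static void lock_folio_ptls(struct pt_model *m, int cur, int first, int last)
{
	assert(first >= cur - 1 && first <= cur);
	assert(last >= cur && last <= cur + 1);

	if (first < cur) {
		/* Drop our lock so the lower-address lock is taken first. */
		model_unlock(m, cur);
		model_lock(m, first);
		model_lock(m, cur);
		/* Real code would have to revalidate anything read under
		 * ptl[cur] before it was dropped. */
	}
	if (last > cur)
		model_lock(m, last);
}

/* Release the neighbouring locks again; ptl[cur] stays held. */
static void unlock_folio_ptls(struct pt_model *m, int cur, int first, int last)
{
	if (last > cur)
		model_unlock(m, last);
	if (first < cur)
		model_unlock(m, first);
}
```

A caller that holds table 1 and has a folio spanning tables 0 to 2 would call lock_folio_ptls(&m, 1, 0, 2), leaving tables 0, 1 and 2 locked in that order before starting the walk.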