Date: Wed, 8 Feb 2023 15:58:54 -0500
From: Peter Xu
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel, David Hildenbrand, "Yin, Fengwei"
Subject: Re: Folio mapcount

On Wed, Feb 08, 2023 at 08:25:10PM +0000, Matthew Wilcox wrote:
> On Wed, Feb 08, 2023 at 02:40:11PM -0500, Peter Xu wrote:
> > On Tue, Feb 07, 2023 at 11:27:17PM +0000, Matthew Wilcox wrote:
> > > I've been thinking about this one, and I wonder if we can do it
> > > without taking any pgtable locks. The locking environment we're in
> > > is the page fault handler, so we have the mmap_lock for read (for now
> > > anyway ...). We also hold the folio lock, so _if_ the folio is mapped,
> > > those entries can't disappear under us.
> >
> > Could MADV_DONTNEED do that from another pgtable that we don't hold the
> > pgtable lock?
>
> Oh, ugh, yes. And zap_pte_range() has the PTL first, so we can't sleep
> to get the folio lock. And we can't decline to zap the pte on a failed
> folio_trylock() (well, we could for MADV_DONTNEED, but not in general).
>
> So ... how about this for a solution:
>
> - If the folio overlaps into the next PMD table, spin_lock it.
> - If the folio overlaps into the previous PMD table, unlock our
>   PTL, lock the previous PTL, re-lock our PTL.
> - Do the pvmw, telling it we already have the PTLs held (new PVMW flag).
>
> [explanation simplified; if there is no prior PMD table or if the VMA
> limits how far to search, we can skip this]
>
> We have prior art for taking two PTLs in copy_page_range(). There,
> the hierarchy is clear; one VMA belongs to the process parent and one
> to the child. I don't believe we have precedent for taking two PTLs
> in the same VMA, but I think my proposal (order by ascending address in
> the process) is the obvious order to choose.

Maybe it'll work? Not sure, but it seems to be something we'd need to be
extremely careful with. Having a single mmap read lock covering both seems
to guarantee that the lock order is stable, which is a good start. But I
have no good idea about the other implications across the whole kernel.

IMHO copy_page_range() is not a great example when reasoning about
deadlocks, because the dst_mm should not be exposed to the rest of the
world at all while copying. That is, I don't see any case where some other
thread could try to take the dst mm pgtable lock until it's all set up.
I'm even wondering whether it would be safe to not take the dst mm pgtable
lock at all during a fork()..

-- 
Peter Xu
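
For illustration only, the ordering rule proposed in the quoted text (take the PTLs in ascending address order, and drop and re-take the current PTL when the previous table is also needed) can be sketched in userspace C. Everything named here is hypothetical: `struct toy_ptl`, its `start` field and the `toy_*` helpers are stand-ins built on C11 `atomic_flag`, not kernel types or APIs, and the sketch says nothing about whether the scheme is safe against the rest of the kernel's locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one page table lock (PTL). */
struct toy_ptl {
    atomic_flag locked;  /* toy spinlock state */
    uintptr_t start;     /* assumed: lowest virtual address the table covers */
};

static void toy_lock(struct toy_ptl *p)
{
    while (atomic_flag_test_and_set_explicit(&p->locked, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

/* Returns true if the lock was taken, false if it was already held. */
static bool toy_trylock(struct toy_ptl *p)
{
    return !atomic_flag_test_and_set_explicit(&p->locked, memory_order_acquire);
}

static void toy_unlock(struct toy_ptl *p)
{
    atomic_flag_clear_explicit(&p->locked, memory_order_release);
}

/* Take both locks, lower address first, regardless of argument order. */
static void toy_lock_pair(struct toy_ptl *a, struct toy_ptl *b)
{
    assert(a != NULL && b != NULL);
    struct toy_ptl *first = a->start <= b->start ? a : b;
    struct toy_ptl *second = first == a ? b : a;

    toy_lock(first);
    if (second != first)
        toy_lock(second);
}

/* Release in the reverse order of toy_lock_pair(). */
static void toy_unlock_pair(struct toy_ptl *a, struct toy_ptl *b)
{
    assert(a != NULL && b != NULL);
    struct toy_ptl *first = a->start <= b->start ? a : b;
    struct toy_ptl *second = first == a ? b : a;

    if (second != first)
        toy_unlock(second);
    toy_unlock(first);
}

/*
 * Caller already holds cur and also needs prev, which covers a lower
 * address: drop cur, take prev, then re-take cur, so the ascending
 * order is never violated. State protected by cur may change while
 * it is dropped, so the caller would have to revalidate afterwards.
 */
static void toy_extend_to_prev(struct toy_ptl *cur, struct toy_ptl *prev)
{
    assert(prev->start < cur->start);
    toy_unlock(cur);
    toy_lock(prev);
    toy_lock(cur);
}
```

Two threads that call `toy_lock_pair()` on the same pair of tables with the arguments swapped still acquire them in the same order, which is the property the ascending-address rule relies on. The window in `toy_extend_to_prev()` where `cur` is not held is the part that would need the most care in a real implementation.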