From: Chih-En Lin
Date: Thu, 9 Feb 2023 23:10:12 +0800
Subject: Re: Folio mapcount
To: Peter Xu
Cc: Matthew Wilcox, linux-mm@kvack.org, Vishal Moola, Hugh Dickins,
    Rik van Riel, David Hildenbrand, "Yin, Fengwei"

On Thu, Feb 9, 2023 at 4:59 AM Peter Xu wrote:
>
> On Wed, Feb 08, 2023 at 08:25:10PM +0000, Matthew Wilcox wrote:
> > On Wed, Feb 08, 2023 at 02:40:11PM -0500, Peter Xu wrote:
> > > On Tue, Feb 07, 2023 at 11:27:17PM +0000, Matthew Wilcox wrote:
> > > > I've been thinking about this one, and I wonder if we can do it
> > > > without taking any pgtable locks. The locking environment we're in
> > > > is the page fault handler, so we have the mmap_lock for read (for now
> > > > anyway ...). We also hold the folio lock, so _if_ the folio is mapped,
> > > > those entries can't disappear under us.
> > >
> > > Could MADV_DONTNEED do that from another pgtable that we don't hold the
> > > pgtable lock?
> >
> > Oh, ugh, yes. And zap_pte_range() has the PTL first, so we can't sleep
> > to get the folio lock. And we can't decline to zap the pte on a failed
> > folio_trylock() (well, we could for MADV_DONTNEED, but not in general).
> >
> > So ... how about this for a solution:
> >
> > - If the folio overlaps into the next PMD table, spin_lock it.
> > - If the folio overlaps into the previous PMD table, unlock our
> >   PTL, lock the previous PTL, re-lock our PTL.
> > - Do the pvmw, telling it we already have the PTLs held (new PVMW flag).
> >
> > [explanation simplified; if there is no prior PMD table or if the VMA
> > limits how far to search, we can skip this]
> >
> > We have prior art for taking two PTLs in copy_page_range(). There,
> > the hierarchy is clear; one VMA belongs to the process parent and one
> > to the child. I don't believe we have precedent for taking two PTLs
> > in the same VMA, but I think my proposal (order by ascending address in
> > the process) is the obvious order to choose.
>
> Maybe it'll work? Not sure, but seems be something we'd be extremely
> careful with. Having a single mmap read lock covering both seems to
> guarantee that the order of the lock is stable, which is a good start..
> But I have no good idea on other implications across the whole kernel.
>
> IMHO copy_page_range() is not a great example for proving deadlocks,
> because the dst_mm should not be exposed to the whole world yet at all when
> copying. Say, I don't see any case some thread can try to take the dst mm
> pgtable lock at all until it's all set. I'm even wondering whether it's
> safe to not take the dst mm pgtable lock at all during a fork()..

I don't think it's safe to skip the dst mm pgtable lock during fork().
Since copy_present_page() adds the page to the anon_vma, the page can be
found through the rmap. So even before fork() finishes duplicating the
page tables, the existing (COW-mapped) page can be used to reach the dst
pgtable via rmap + page_vma_mapped_walk().

However, I didn't consider the mmap_write_lock() here, so I might be
wrong. Just sharing some thoughts.

Thanks,
Chih-En Lin