Date: Thu, 9 Feb 2023 10:43:43 -0500
From: Peter Xu
To: Chih-En Lin
Cc: Matthew Wilcox, linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel, David Hildenbrand, "Yin, Fengwei"
Subject: Re: Folio mapcount

On Thu, Feb 09, 2023 at 11:10:12PM +0800, Chih-En Lin wrote:
> On Thu, Feb 9, 2023 at 4:59 AM Peter Xu wrote:
> >
> > On Wed, Feb 08, 2023 at 08:25:10PM +0000, Matthew Wilcox wrote:
> > > On Wed, Feb 08, 2023 at 02:40:11PM -0500, Peter Xu wrote:
> > > > On Tue, Feb 07, 2023 at 11:27:17PM +0000, Matthew Wilcox wrote:
> > > > > I've been thinking about this one, and I wonder if we can do it
> > > > > without taking any pgtable locks. The locking environment we're in
> > > > > is the page fault handler, so we have the mmap_lock for read (for now
> > > > > anyway ...). We also hold the folio lock, so _if_ the folio is mapped,
> > > > > those entries can't disappear under us.
> > > >
> > > > Could MADV_DONTNEED do that from another pgtable whose pgtable lock
> > > > we don't hold?
> > >
> > > Oh, ugh, yes. And zap_pte_range() has the PTL first, so we can't sleep
> > > to get the folio lock. And we can't decline to zap the pte on a failed
> > > folio_trylock() (well, we could for MADV_DONTNEED, but not in general).
> > >
> > > So ... how about this for a solution:
> > >
> > >  - If the folio overlaps into the next PMD table, spin_lock it.
> > >  - If the folio overlaps into the previous PMD table, unlock our
> > >    PTL, lock the previous PTL, re-lock our PTL.
> > >  - Do the pvmw, telling it we already have the PTLs held (new PVMW flag).
> > >
> > > [explanation simplified; if there is no prior PMD table or if the VMA
> > > limits how far to search, we can skip this]
> > >
> > > We have prior art for taking two PTLs in copy_page_range(). There,
> > > the hierarchy is clear; one VMA belongs to the process parent and one
> > > to the child. I don't believe we have precedent for taking two PTLs
> > > in the same VMA, but I think my proposal (order by ascending address in
> > > the process) is the obvious order to choose.
> >
> > Maybe it'll work? Not sure, but it seems to be something we'd need to be
> > extremely careful with. Having a single mmap read lock covering both seems
> > to guarantee that the lock order is stable, which is a good start.
> > But I have no good idea of the other implications across the whole kernel.
> >
> > IMHO copy_page_range() is not a great example for proving deadlocks,
> > because the dst_mm should not be exposed to the rest of the world at all
> > while copying. I don't see any case where some thread could try to take
> > the dst mm pgtable lock before it's all set up. I'm even wondering whether
> > it's safe not to take the dst mm pgtable lock at all during fork().
>
> I don't think it's safe without taking the dst mm pgtable lock during fork().
> Since copy_present_page() will add the page to the anon_vma, the page
> can be found via the rmap.
> So, even if fork() hasn't finished duplicating the pgtable,
> we can still use the existing (COW-mapped) page to access the dst
> pgtable via rmap + page_vma_mapped_walk().
> But I didn't consider the mmap_write_lock() here, so I might be wrong.
> Just providing some thoughts.

Yes, I think you're right. I thought the rmap locks were held, but after
rechecking, they're not.

It's safe because AFAICT any rmap walker can only take one of the two
pgtable locks, never both.
To trigger a deadlock on the two spinlocks, we would need another
concurrent thread trying to take the same two locks, and that will not
happen in this case.

Thanks,

-- 
Peter Xu
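The lock-ordering argument in this thread can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: pthread mutexes stand in for the per-PMD-table PTL spinlocks, a table index stands in for the table's virtual address, and `lock_neighbours()`, `fault_thread()`, `zap_thread()` and `run_demo()` are hypothetical names. The fault threads follow the proposal above (take the next table's PTL directly; for the previous table, drop our PTL and re-take both in ascending order), while the zap threads, like an rmap walker or zap_pte_range(), only ever hold one PTL at a time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model: one lock ("PTL") per PMD table; a lower index
 * stands for a lower virtual address within the same VMA. */
#define NR_TABLES 3
static pthread_mutex_t ptl[NR_TABLES] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Fault path: the caller already holds ptl[cur]; the folio spans tables
 * first..last, where first is cur or cur - 1 and last is cur or cur + 1. */
static void lock_neighbours(int cur, int first, int last)
{
    assert(first >= 0 && last < NR_TABLES && first <= cur && cur <= last);
    if (first < cur) {
        /* Previous table: drop our PTL so both are taken in ascending
         * order. Any PTE state read under the old ptl[cur] is now stale. */
        pthread_mutex_unlock(&ptl[cur]);
        pthread_mutex_lock(&ptl[first]);
        pthread_mutex_lock(&ptl[cur]);
    }
    if (last > cur)
        pthread_mutex_lock(&ptl[last]); /* next table: already ascending */
}

static void unlock_range(int first, int last)
{
    for (int i = last; i >= first; i--)
        pthread_mutex_unlock(&ptl[i]);
}

struct fault_arg { int cur, first, last, iters; long done; };
struct zap_arg { int table, iters; long done; };

static void *fault_thread(void *p)
{
    struct fault_arg *a = p;
    for (int n = 0; n < a->iters; n++) {
        pthread_mutex_lock(&ptl[a->cur]);   /* our PTL, held on entry */
        lock_neighbours(a->cur, a->first, a->last);
        a->done++;                          /* stand-in for the pvmw */
        unlock_range(a->first, a->last);
    }
    return NULL;
}

/* Stand-in for zap_pte_range() or an rmap walker: one PTL at a time. */
static void *zap_thread(void *p)
{
    struct zap_arg *a = p;
    for (int n = 0; n < a->iters; n++) {
        pthread_mutex_lock(&ptl[a->table]);
        a->done++;
        pthread_mutex_unlock(&ptl[a->table]);
    }
    return NULL;
}

/* Runs fault and zap threads concurrently; returns completed operations. */
static long run_demo(int iters)
{
    struct fault_arg f[3] = {
        { .cur = 1, .first = 0, .last = 1, .iters = iters }, /* overlaps previous */
        { .cur = 0, .first = 0, .last = 1, .iters = iters }, /* overlaps next */
        { .cur = 1, .first = 1, .last = 2, .iters = iters },
    };
    struct zap_arg z[3] = {
        { .table = 0, .iters = iters }, { .table = 1, .iters = iters },
        { .table = 2, .iters = iters },
    };
    pthread_t t[6];
    long total = 0;

    for (int i = 0; i < 3; i++) {
        if (pthread_create(&t[i], NULL, fault_thread, &f[i]) != 0 ||
            pthread_create(&t[3 + i], NULL, zap_thread, &z[i]) != 0)
            abort();
    }
    for (int i = 0; i < 6; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < 3; i++)
        total += f[i].done + z[i].done;
    return total;
}
```

`run_demo(1000)` should return 6000 (six threads, 1000 iterations each). The first two fault threads are the interesting pair: one starts in table 1 and the other in table 0, and both need tables 0 and 1. Because every multi-PTL acquisition goes in ascending table order, and the single-lock threads never wait while holding a PTL, no two threads can each hold a lock the other is waiting for; a hang would point to an ordering violation.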