From: James Houghton
Date: Tue, 7 Feb 2023 16:26:02 -0800
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
To: Peter Xu
Cc: Mike Kravetz, David Hildenbrand, Muchun Song, David Rientjes,
 Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra,
 Naoya Horiguchi, "Dr . David Alan Gilbert", "Matthew Wilcox (Oracle)",
 Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Tue, Feb 7, 2023 at 3:13 PM Peter Xu wrote:
>
> James,
>
> On Tue, Feb 07, 2023 at 02:46:04PM -0800, James Houghton wrote:
> > > Here is the result: [1] (sorry it took a little while heh). The
>
> Thanks.  From what I can tell, that number shows that it'll be great we
> start with your rfcv1 mapcount approach, which mimics what's proposed by
> Matthew for generic folio.

Do you think the RFC v1 way is better than doing the THP-like way
*with the additional MMU notifier*?

>
> > > implementation of the "RFC v1" way is pretty horrible[2] (and this
>
> Any more information on why it's horrible? :)

I figured the code would speak for itself, heh. It's quite complicated.

I really didn't like:
1. The 'inc' business in copy_hugetlb_page_range.
2. How/where I call put_page()/folio_put() to keep the refcount and
   mapcount synced up.
3. Having to check the page cache in UFFDIO_CONTINUE.

>
> A quick comment is I'm wondering whether that "whether we should boost the
> mapcount" value can be hidden in hugetlb_pte* so you don't need to pass
> over a lot of bool* deep into the hgm walk routines.

Oh yeah, that's a great idea.

>
> > > implementation probably has bugs anyway; it doesn't account for the
> > > folio_referenced() problem).
>
> I thought we reached a consensus on the resolution, by a proposal to remove
> folio_referenced_arg.mapcount.  Is it not working for some reason?

I think that works, I just didn't bother here. I just wanted to show
you approximately what it would look like to implement the RFC v1
approach.

>
> > >
> > > Matthew is trying to solve the same problem with THPs right now: [3].
> > > I haven't figured out how we can apply Matthews's approach to HGM
> > > right now, but there probably is a way. (If we left the mapcount
> > > increment bits in the same place, we couldn't just check the
> > > hstate-level PTE; it would have already been made present.)
>
> I'm just worried that (1) this may add yet another dependency to your work
> which is still during discussion phase, and (2) whether the folio approach
> is easily applicable here, e.g., we may not want to populate all the ptes
> for hugetlb HGMs by default.

That's true. I definitely don't want to wait for this either. It seems
like Matthew's approach won't work very well for us -- when doing a
lot of high-granularity UFFDIO_CONTINUEs on a 1G page, checking all
the PTEs to see if any of them are mapped would get really slow.
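
To put a rough number on that concern: assuming 4 KiB base pages, a 1 GiB
page covers 2^30 / 2^12 = 262144 PTE-sized slots. The userspace sketch
below is only an illustration of that scan cost, not kernel code and not
Matthew's actual implementation; fake_pte_t, any_mapped_scan() and
scan_cost() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BASE_PAGE_SIZE	(1UL << 12)	/* assumption: 4 KiB base pages */
#define HUGE_PAGE_SIZE	(1UL << 30)	/* 1 GiB hugepage */
#define NR_SLOTS	(HUGE_PAGE_SIZE / BASE_PAGE_SIZE)	/* 262144 */

/* Hypothetical stand-in for a PTE: nonzero means "present". */
typedef uint64_t fake_pte_t;

/*
 * Return true if any slot is present. *checked receives the number of
 * entries examined; when nothing is present that is all nr of them.
 */
static bool any_mapped_scan(const fake_pte_t *ptes, size_t nr, size_t *checked)
{
	size_t i;

	assert(nr > 0);
	for (i = 0; i < nr; i++) {
		if (ptes[i]) {
			*checked = i + 1;
			return true;
		}
	}
	*checked = nr;
	return false;
}

/*
 * Build a zeroed table with at most one present slot (pass
 * present_idx >= nr for none), scan it, and return how many entries
 * the scan examined. Returns 0 if the allocation fails.
 */
static size_t scan_cost(size_t nr, size_t present_idx)
{
	fake_pte_t *ptes = calloc(nr, sizeof(*ptes));
	size_t checked = 0;

	if (!ptes)
		return 0;
	if (present_idx < nr)
		ptes[present_idx] = 1;
	any_mapped_scan(ptes, nr, &checked);
	free(ptes);
	return checked;
}
```

With nothing mapped, or with only the last slot mapped, the scan has to
examine all 262144 entries, and that cost would be paid again on each
operation that needs the answer.
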

>
> > >
> > > We could:
> > > - use the THP-like way and tolerate ~1 second collapses
> >
> > Another thought here. We don't necessarily *need* to collapse the page
> > table mappings in between mmu_notifier_invalidate_range_start() and
> > mmu_notifier_invalidate_range_end(), as the pfns aren't changing,
> > we aren't punching any holes, and we aren't changing permission bits.
> > If we had an MMU notifier that simply informed KVM that we collapsed
> > the page tables *after* we finished collapsing, then it would be ok
> > for hugetlb_collapse() to be slow.
>
> That's a great point!  It'll definitely apply to either approach.
>
> >
> > If this MMU notifier is something that makes sense, it probably
> > applies to MADV_COLLAPSE for THPs as well.
>
> THPs are definitely different, mmu notifiers should be required there,
> afaict.  Isn't that what the current code does?
>
> See collapse_and_free_pmd() for shmem and collapse_huge_page() for anon.

Oh, yes, of course, MADV_COLLAPSE can actually move things around and
properly make THPs. Thanks. But it would apply if we were only
collapsing PTE-mapped THPs, I think?

- James