Date: Tue, 7 Feb 2023 18:13:21 -0500
From: Peter Xu
To: James Houghton
Cc: Mike Kravetz, David Hildenbrand, Muchun Song, David Rientjes,
    Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra,
    Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)",
    Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range

James,

On Tue, Feb 07, 2023 at 02:46:04PM -0800, James Houghton wrote:
> > Here is the result: [1] (sorry it took a little while heh). The

Thanks.  From what I can tell, that number shows it'd be great if we start
with your RFC v1 mapcount approach, which mimics what Matthew proposed for
generic folios.

> > implementation of the "RFC v1" way is pretty horrible[2] (and this

Any more information on why it's horrible? :)

A quick comment: I'm wondering whether that "should we boost the mapcount"
value can be hidden in hugetlb_pte*, so you don't need to pass a lot of
bool* deep into the HGM walk routines.

> > implementation probably has bugs anyway; it doesn't account for the
> > folio_referenced() problem).

I thought we reached a consensus on the resolution, with the proposal to
remove folio_referenced_arg.mapcount.  Is it not working for some reason?

> >
> > Matthew is trying to solve the same problem with THPs right now: [3].
> > I haven't figured out how we can apply Matthews's approach to HGM
> > right now, but there probably is a way. (If we left the mapcount
> > increment bits in the same place, we couldn't just check the
> > hstate-level PTE; it would have already been made present.)

I'm just worried that (1) this may add yet another dependency to your work
which is still in the discussion phase, and (2) whether the folio approach
is easily applicable here, e.g., we may not want to populate all the ptes
for hugetlb HGMs by default.

> >
> > We could:
> > - use the THP-like way and tolerate ~1 second collapses
>
> Another thought here. We don't necessarily *need* to collapse the page
> table mappings in between mmu_notifier_invalidate_range_start() and
> mmu_notifier_invalidate_range_end(), as the pfns aren't changing,
> we aren't punching any holes, and we aren't changing permission bits.
> If we had an MMU notifier that simply informed KVM that we collapsed
> the page tables *after* we finished collapsing, then it would be ok
> for hugetlb_collapse() to be slow.

That's a great point!  It'll definitely apply to either approach.

>
> If this MMU notifier is something that makes sense, it probably
> applies to MADV_COLLAPSE for THPs as well.

THPs are definitely different; mmu notifiers should be required there,
afaict.  Isn't that what the current code does?

See collapse_and_free_pmd() for shmem and collapse_huge_page() for anon.

> >
> > - use the (non-RFC) v1 way and tolerate the migration/smaps differences
> > - use the RFC v1 way and tolerate the complicated mapcount accounting
> > - flesh out [3] and see if it can be applied to HGM nicely
> >
> > I'm happy to go with any of these approaches.
> >
> > [1]: https://pastebin.com/raw/hJzFJHiD
> > [2]: https://github.com/48ca/linux/commit/4495f16a09b660aff44b3edcc125aa3a3df85976
> > [3]: https://lore.kernel.org/linux-mm/Y+FkV4fBxHlp6FTH@casper.infradead.org/
> > - James
>

-- 
Peter Xu
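
To make the hugetlb_pte comment above a bit more concrete, here is a rough
userspace sketch, not kernel code.  Everything in it is an assumption made
for illustration: struct hgm_walk_state stands in for struct hugetlb_pte
(whose real layout differs), the boost_mapcount field is hypothetical, and
hgm_walk_sketch() / mapcount_after_map() are made-up names.  The only point
is that the walk records the decision in the state it already carries, so no
bool * has to be threaded through every level of the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the walk state; NOT the real struct hugetlb_pte. */
struct hgm_walk_state {
	unsigned int shift;   /* log2 of the size mapped at the level we stopped at */
	bool boost_mapcount;  /* assumed field: set by the walk, read by the caller */
};

/*
 * Hypothetical walk step.  hstate_entry_was_none models "the hstate-level
 * entry was still empty when the walk started", i.e. this is the first piece
 * of the hugepage being mapped here.
 */
void hgm_walk_sketch(struct hgm_walk_state *st, unsigned int hstate_shift,
		     unsigned int target_shift, bool hstate_entry_was_none)
{
	assert(st != NULL && target_shift <= hstate_shift);

	st->shift = target_shift;
	/* Record the decision in the state instead of writing through a bool *. */
	st->boost_mapcount = hstate_entry_was_none;
}

/* The caller consults the state after the walk; no extra out-parameter. */
int mapcount_after_map(int mapcount, const struct hgm_walk_state *st)
{
	return st->boost_mapcount ? mapcount + 1 : mapcount;
}
```

A caller would run the walk once and then pass the same state to
mapcount_after_map(); whether a flag like this really belongs in hugetlb_pte
depends on how long the state lives in the actual series.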