From: "Zach O'Keefe" <zokeefe@google.com>
Date: Thu, 26 May 2022 17:54:27 -0700
Subject: Re: mm/khugepaged: collapse file/shmem compound pages
To: Matthew Wilcox
Cc: David Rientjes, linux-mm@kvack.org

Hey Matthew,

Thanks for the response!

On Wed, May 25, 2022 at 8:36 PM Matthew Wilcox wrote:
>
> On Wed, May 25, 2022 at 06:23:52PM -0700, Zach O'Keefe wrote:
> > On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox wrote:
> > > The khugepaged code (like much of the mm used to) assumes that memory
> > > comes in two sizes, PTE and PMD. That's still true for anon and shmem
> > > for now, but hopefully we'll start managing both anon & shmem memory in
> > > larger chunks, without necessarily going as far as PMD.
> > >
> > > I think the purpose of khugepaged should continue to be to construct
> > > PMD-size pages; I don't see the point of it wandering through process VMs
> > > replacing order-2 pages with order-5 pages. I may be wrong about that,
> > > of course, so feel free to argue with me.
> >
> > I'd agree here.
> >
> > > Anyway, the meaning behind that comment is that the PageTransCompound()
> > > test is going to be true on any compound page (TransCompound doesn't
> > > check that the page is necessarily a THP). So that particular test should
> > > be folio_test_pmd_mappable(), but there are probably other things which
> > > ought to be changed, including converting the entire file from dealing
> > > in pages to dealing in folios.
> >
> > Right, at this point, the page might be a pmd-mapped THP, or it could
> > be a pte-mapped compound page (I'm unsure if we can encounter compound
> > pages outside hugepages).
>
> Today, there is a way. We can find a folio with an order between 0 and
> PMD_ORDER if the underlying filesystem supports large folios and the
> file is executable and we've enabled CONFIG_READ_ONLY_THP_FOR_FS.
> In this case, we'll simply skip over it because the code believes that
> means it's already a PMD.

I think I'm missing something here - sorry. If the folio order is
< HPAGE_PMD_ORDER, why does the code think it's a PMD?

> > If we could tell it's already pmd-mapped, we're done :) IIUC,
> > folio_test_pmd_mappable() is a necessary but not sufficient condition
> > to determine this.
>
> It is necessary, but from khugepaged's point of view, it's sufficient
> because khugepaged's job is to create PMD-sized folios -- it's not up to
> khugepaged to ensure that PMD-sized folios are actually mapped using
> a PMD.

I thought the point / benefit of khugepaged was precisely to try and
find places where we can collapse many pte entries into a single pmd
mapping?

> There may be some other component of the system (e.g. DAMON?)
> which has chosen to temporarily map the PMD-sized folio using PTEs
> in order to track whether the memory is all being used. It may also
> be the case that (for file-based memory) the VMA is mis-aligned and
> despite creating a PMD-sized folio, it can't be mapped with a PMD.

AFAIK DAMON doesn't do this kind of pmd splitting for subpage tracking
of THPs. Also, I believe retract_page_tables() does check whether the
address is suitably hugepage aligned/sized.
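
Coming back to the PageTransCompound() point above, just to check I'm
reading the suggestion right: the scan-side change would be roughly the
below? Totally untested sketch, and the helper name is made up -- the
idea being that only an already-PMD-sized folio gets skipped, rather
than any compound page:

#include <linux/mm.h>
#include <linux/huge_mm.h>

/*
 * Untested sketch, name invented for illustration: only treat a page
 * cache entry as "nothing to do" when it is already backed by a
 * PMD-sized folio.  A compound page of order < HPAGE_PMD_ORDER is
 * still a collapse candidate, unlike with the PageTransCompound() test
 * used today.
 */
static inline bool hpage_collapse_entry_done(struct page *page)
{
	struct folio *folio = page_folio(page);

	return folio_test_pmd_mappable(folio);
}

If that's the shape of it, then the folio conversion you mention seems
like the natural place to fold this in.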

> > Else, if it's not, is it safe to try and continue? Suppose we find a
> > folio of 0 < order < HPAGE_PMD_ORDER. Are we safely able to try and
> > extend it, or will we break some filesystems that expect a certain
> > order folio?
>
> We're not giving filesystems the opportunity to request that ;-)
> Filesystems are expected to handle folios of arbitrary order (if they
> claim the ability to support large folios at all). In practice, I've
> capped the folio creation size at PMD_ORDER (because I don't want to track
> down all the places that assume pmd_page() is necessarily a head page),
> but filesystems shouldn't rely on it.

Ok, that's good to hear :)

> shmem still expects folios to be of order either 0 or PMD_ORDER.
> That assumption extends into the swap code and I haven't had the heart
> to go and fix all those places yet. Plus Neil was doing major surgery
> to the swap code in the most recent development cycle and I didn't want
> to get in his way.
>
> So I am absolutely fine with khugepaged allocating a PMD-size folio for
> any inode that claims mapping_large_folio_support(). If any filesystems
> break, we'll fix them.

Just for clarification, what is the equivalent code today that enforces
mapping_large_folio_support()? I.e. today, khugepaged can successfully
collapse a file without checking whether the inode supports it (we only
check that it's a regular file not opened for writing).

Also, just to check, there isn't anything wrong with following
collapse_file()'s approach, even for folios of 0 < order <
HPAGE_PMD_ORDER? I.e. this part:

 * Basic scheme is simple, details are more complex:
 *  - allocate and lock a new huge page;
 *  - scan page cache replacing old pages with the new one
 *    + swap/gup in pages if necessary;
 *    + fill in gaps;
 *    + keep old pages around in case rollback is required;
 *  - if replacing succeeds:
 *    + copy data over;
 *    + free old pages;
 *    + unlock huge page;
 *  - if replacing failed;
 *    + put all pages back and unfreeze them;
 *    + restore gaps in the page cache;
 *    + unlock and free huge page;
 */

> > > I actually have one patch which starts in that direction, but I haven't
> > > followed it up yet with all the other patches to that file which will
> > > be needed:
> >
> > Thanks for the head start! Not an expert here, but would you say
> > converting this file to use folios is a necessary first step?
>
> Not _necessary_, but I find it helps keep things clearer. Plus it's
> something that needs to happen anyway.

Got it. Perhaps something I can help with then.

Thanks again for your time advising on this!

Best,
Zach
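
P.S. To be concrete about the mapping_large_folio_support() question
above, the kind of gate I was imagining is the untested sketch below
(helper name invented; nothing like it exists in khugepaged today as
far as I can tell, and the CONFIG_READ_ONLY_THP_FOR_FS case would
presumably need its own carve-out):

#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Untested sketch, name invented for illustration: before khugepaged
 * tries to build a PMD-sized folio for a file, check that the inode's
 * mapping actually claims large folio support.  Today the only checks
 * are "regular file" and "not opened for writing".
 */
static inline bool hpage_collapse_file_supported(struct file *file)
{
	return mapping_large_folio_support(file->f_mapping);
}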