Date: Tue, 15 Aug 2023 03:24:43 +0100
From: Matthew Wilcox
To: Zach O'Keefe
Cc: Saurabh Singh Sengar, Dan Williams, "linux-mm@kvack.org", Yang Shi, "linux-kernel@vger.kernel.org"
Subject: Re: [EXTERNAL] [PATCH] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
References: <20230812210053.2325091-1-zokeefe@google.com>

On Mon, Aug 14, 2023 at 05:04:47PM -0700, Zach O'Keefe wrote:
> > From a large folios perspective, filesystems do not implement a special
> > handler. They call filemap_fault() (directly or indirectly) from their
> > ->fault handler. If there is already a folio in the page cache which
> > satisfies this fault, we insert it into the page tables (no matter what
> > size it is).
> > If there is no folio, we call readahead to populate that
> > index in the page cache, and probably some other indices around it.
> > That's do_sync_mmap_readahead().
> >
> > If you look at that, you'll see that we check the VM_HUGEPAGE flag, and
> > if set we align to a PMD boundary and read two PMD-size pages (so that we
> > can do async readahead for the second page, if we're doing a linear scan).
> > If the VM_HUGEPAGE flag isn't set, we'll use the readahead algorithm to
> > decide how large the folio should be that we're reading into; if it's a
> > random read workload, we'll stick to order-0 pages, but if we're getting
> > good hit rate from the linear scan, we'll increase the size (although
> > we won't go past PMD size)
> >
> > There's also the ->map_pages() optimisation which handles page faults
> > locklessly, and will fail back to ->fault() if there's even a light
> > breeze. I don't think that's of any particular use in answering your
> > question, so I'm not going into details about it.
> >
> > I'm not sure I understand the code that's being modified well enough to
> > be able to give you a straight answer to your question, but hopefully
> > this is helpful to you.
>
> Thank you, this was great info. I had thought, incorrectly, that large
> folio work would eventually tie into that ->huge_fault() handler
> (should be dax_huge_fault() ?)
>
> If that's the case, then faulting file-backed, non-DAX memory as
> (pmd-mapped-)THPs isn't supported at all, and no fault lies with the
> aforementioned patches.

Ah, wait, hang on. You absolutely can get a PMD mapping by calling into
->fault. Look at how finish_fault() works:

	if (pmd_none(*vmf->pmd)) {
		if (PageTransCompound(page)) {
			ret = do_set_pmd(vmf, page);
			if (ret != VM_FAULT_FALLBACK)
				return ret;
		}

		if (vmf->prealloc_pte)
			pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);

So if we find a large folio that is PMD mappable, and there's nothing
at vmf->pmd, we install a PMD-sized mapping at that spot.
If that fails, we install the preallocated PTE table at vmf->pmd and
continue on to try to set one or more PTEs to satisfy this page fault.

So why, you may be asking, do we have ->huge_fault? Well, you should ask
the clown who did commit b96375f74a6d ... in fairness to me, finish_fault()
did not exist at the time, and the ability to return a PMD-sized page was
added later.
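
To make the finish_fault() ordering above concrete, here is a minimal
userspace sketch of the decision it describes. It is a toy model, not kernel
code: struct toy_fault, enum toy_mapping and toy_finish_fault() are
hypothetical names invented for illustration, and each boolean merely stands
in for the kernel check named in its comment. Branches the excerpt does not
show (for example, what happens when no PTE table was preallocated) are
deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels for this toy model; not kernel identifiers. */
enum toy_mapping {
	TOY_MAPPED_PMD,		/* a single PMD-sized mapping was installed */
	TOY_MAPPED_PTES,	/* fell through to mapping one or more PTEs */
};

/* Hypothetical, heavily simplified stand-in for the fault state. */
struct toy_fault {
	bool pmd_is_none;		/* models pmd_none(*vmf->pmd) */
	bool folio_pmd_mappable;	/* models PageTransCompound(page) */
	bool set_pmd_falls_back;	/* models do_set_pmd() returning VM_FAULT_FALLBACK */
	bool have_prealloc_pte;		/* models vmf->prealloc_pte being non-NULL */
	bool pte_table_installed;	/* output: preallocated table placed at the PMD slot */
};

/*
 * Mirrors only the order of checks in the quoted excerpt:
 * try a PMD mapping first, and only if that is not possible
 * install the preallocated PTE table and carry on with PTEs.
 */
enum toy_mapping toy_finish_fault(struct toy_fault *vmf)
{
	assert(vmf != NULL);

	if (vmf->pmd_is_none) {
		/* Nothing at the PMD slot: a PMD-sized mapping may be possible. */
		if (vmf->folio_pmd_mappable && !vmf->set_pmd_falls_back)
			return TOY_MAPPED_PMD;

		/* PMD mapping not possible: put the preallocated PTE table there. */
		if (vmf->have_prealloc_pte) {
			vmf->pte_table_installed = true;
			vmf->have_prealloc_pte = false;	/* the model treats it as consumed */
		}
	}

	/* Either something was already at the PMD slot, or we fell back. */
	return TOY_MAPPED_PTES;
}
```

In this model, a fault with pmd_is_none and folio_pmd_mappable set yields
TOY_MAPPED_PMD; setting set_pmd_falls_back as well yields TOY_MAPPED_PTES,
with the preallocated table marked as installed if one was available.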