Date: Fri, 23 Oct 2020 13:55:16 +0100
From: Matthew Wilcox
To: Rik van Riel
Cc: Hugh Dickins, Yu Xu, Andrew Morton, Mel Gorman, Andrea Arcangeli,
 Michal Hocko, Vlastimil Babka, "Kirill A. Shutemov",
 linux-mm@kvack.org, kernel-team@fb.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm,thp,shmem: limit shmem THP alloc gfp_mask
Message-ID: <20201023125516.GA20115@casper.infradead.org>
References: <20201022124511.72448a5f@imladris.surriel.com>
 <932f5931911e5ad7d730127b0784b0913045639c.camel@surriel.com>
In-Reply-To: <932f5931911e5ad7d730127b0784b0913045639c.camel@surriel.com>

On Thu, Oct 22, 2020 at 11:40:53PM -0400, Rik van Riel wrote:
> On Thu, 2020-10-22 at 19:54 -0700, Hugh Dickins wrote:
> > Michal is right to remember pushback before, because tmpfs is a
> > filesystem, and "huge=" is a mount option: in using a huge=always
> > filesystem, the user has already declared a preference for huge
> > pages.
> > Whereas the original anon THP had to deduce that preference from sys
> > tunables and vma madvice.
>
> ...
>
> > But it's likely that they have accumulated some defrag wisdom, which
> > tmpfs can take on board - but please accept that in using a huge
> > mount, the preference for huge has already been expressed, so I
> > don't expect anon THP alloc_hugepage_direct_gfpmask() choices will
> > map one to one.
>
> In my mind, the huge= mount options for tmpfs corresponded
> to the "enabled" anon THP options, denoting a desired end
> state, not necessarily how much we will stall allocations
> to get there immediately.
>
> The underlying allocation behavior has been changed repeatedly,
> with changes to the direct reclaim code and the compaction
> deferral code.
>
> The shmem THP gfp_mask never tried really hard anyway,
> with __GFP_NORETRY being the default, which matches what
> is used for non-VM_HUGEPAGE anon VMAs.
>
> Likewise, the direct reclaim done from the opportunistic
> THP allocations done by the shmem code limited itself to
> reclaiming 32 4kB pages per THP allocation.
>
> In other words, mounting with huge=always has never behaved
> the same as the more aggressive allocations done for
> MADV_HUGEPAGE VMAs.
>
> This patch would leave shmem THP allocations for non-MADV_HUGEPAGE
> mapped files opportunistic like today, and make shmem THP
> allocations for files mapped with MADV_HUGEPAGE more aggressive
> than today.
>
> However, I would like to know what people think the shmem
> huge= mount options should do, and how things should behave
> when memory gets low, before pushing in a patch just because
> it makes the system run smoother "without changing current
> behavior too much".
>
> What do people want tmpfs THP allocations to do?

I'm also interested for non-tmpfs THP allocations.  In my patchset, THPs
are no longer limited to being PMD sized, and allocating smaller pages
isn't such a tax on the VM.  So currently I'm doing:

	gfp_t gfp = readahead_gfp_mask(mapping);
...
	struct page *page = __page_cache_alloc_order(gfp, order);

which translates to:

	mapping_gfp_mask(mapping) | __GFP_NORETRY | __GFP_NOWARN;
	gfp |= GFP_TRANSHUGE_LIGHT;
	gfp &= ~__GFP_DIRECT_RECLAIM;

Everything's very willing to fall back to order-0 pages, but I can see
that, eg, for a VM_HUGEPAGE vma, we should perhaps be less willing to
fall back to small pages.  I would prefer not to add a mount option to
every filesystem.  People will only get it wrong.
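
For anyone who wants to poke at the flag arithmetic outside the kernel,
here is a rough userspace sketch of the three steps above.  It is only an
illustration: the SK_* names and their bit positions are made up for this
example and do not match the kernel's real gfp values, and
SK_GFP_TRANSHUGE_LIGHT is a simplified stand-in that models just one
property of the real flag set, namely that it carries no reclaim bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t gfp_t;

/* Hypothetical bit positions for this sketch; NOT the kernel's values. */
#define SK_GFP_DIRECT_RECLAIM	(1u << 0)
#define SK_GFP_KSWAPD_RECLAIM	(1u << 1)
#define SK_GFP_IO		(1u << 2)
#define SK_GFP_FS		(1u << 3)
#define SK_GFP_NORETRY		(1u << 4)
#define SK_GFP_NOWARN		(1u << 5)
#define SK_GFP_COMP		(1u << 6)

/* Simplified stand-in (assumption): a "light" mask with no reclaim bits. */
#define SK_GFP_TRANSHUGE_LIGHT \
	(SK_GFP_IO | SK_GFP_FS | SK_GFP_COMP | SK_GFP_NOWARN)

/*
 * Mirror the three steps from the mail: start from the mapping's mask
 * plus NORETRY/NOWARN, OR in the light THP flags, then strip direct
 * reclaim so a failed attempt falls back instead of stalling.
 */
static gfp_t sk_large_page_gfp(gfp_t mapping_mask)
{
	gfp_t gfp = mapping_mask | SK_GFP_NORETRY | SK_GFP_NOWARN;

	gfp |= SK_GFP_TRANSHUGE_LIGHT;
	gfp &= ~SK_GFP_DIRECT_RECLAIM;

	/* Post-condition: the caller can never enter direct reclaim. */
	assert(!(gfp & SK_GFP_DIRECT_RECLAIM));
	return gfp;
}

/* True if an allocation with this mask may block in direct reclaim. */
static bool sk_may_direct_reclaim(gfp_t gfp)
{
	return (gfp & SK_GFP_DIRECT_RECLAIM) != 0;
}
```

Calling sk_large_page_gfp() with a mask that has both reclaim bits set
returns a mask for which sk_may_direct_reclaim() is false while
SK_GFP_KSWAPD_RECLAIM is still set: ORing in a mask that lacks a bit never
clears it, so only the explicit &= removes direct reclaim, and background
reclaim can still be woken.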