Date: Mon, 23 Aug 2021 17:26:41 -0400
From: Johannes Weiner
To: Matthew Wilcox
Cc: Linus Torvalds, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton
Subject: Re: [GIT PULL] Memory folios for v5.15

On Mon, Aug 23, 2021 at 08:01:44PM +0100, Matthew Wilcox wrote:
> Hi Linus,
>
> I'm sending this pull request a few days before the merge window
> opens so you have time to think about it. I don't intend to make any
> further changes to the branch, so I've created the tag and signed it.
> It's been in Stephen's next tree for a few weeks with only minor problems
> (now addressed).
>
> The point of all this churn is to allow filesystems and the page cache
> to manage memory in larger chunks than PAGE_SIZE. The original plan was
> to use compound pages like THP does, but I ran into problems with some
> functions that take a struct page expect only a head page while others
> expect the precise page containing a particular byte.
>
> This pull request converts just parts of the core MM and the page cache.
> For 5.16, we intend to convert various filesystems (XFS and AFS are ready;
> other filesystems may make it) and also convert more of the MM and page
> cache to folios. For 5.17, multi-page folios should be ready.
>
> The multi-page folios offer some improvement to some workloads. The 80%
> win is real, but appears to be an artificial benchmark (postgres startup,
> which isn't a serious workload). Real workloads (eg building the kernel,
> running postgres in a steady state, etc) seem to benefit between 0-10%.
> I haven't heard of any performance losses as a result of this series.
> Nobody has done any serious performance tuning; I imagine that tweaking
> the readahead algorithm could provide some more interesting wins.
> There are also other places where we could choose to create large folios
> and currently do not, such as writes that are larger than PAGE_SIZE.
>
> I'd like to thank all my reviewers who've offered review/ack tags:
>
> Christoph Hellwig
> David Howells
> Jan Kara
> Jeff Layton
> Johannes Weiner

Just to clarify, I'm only on this list because I acked 3 smaller,
independent memcg cleanup patches in this series. I have repeatedly
expressed strong reservations over folios themselves.

The arguments for a better data interface between mm and filesystem in
light of variable page sizes are plentiful and convincing. But from an
MM point of view, it's anything but clear where the delineation between
the page and folio is, and what the endgame is supposed to look like.

On one hand, the ambition appears to be to substitute folio for
everything that could be a base page or a compound page, even inside
core MM code. Since there are very few places in the MM code that
expressly deal with tail pages in the first place, this amounts to a
conversion of most MM code - including the LRU management, reclaim,
rmap, migrate, swap, page fault code etc. - away from "the page".

However, this far exceeds the goal of a better mm-fs interface.
And the value proposition of a full MM-internal conversion, including
e.g. the less exposed anon page handling, is much more nebulous. It's
been proposed to leave anon pages out, but IMO to keep that direction
maintainable, the folio would have to be translated to a page quite
early when entering MM code, rather than propagating it inward, in
order to avoid huge, massively overlapping page and folio APIs.

It's also not clear to me that using the same abstraction for compound
pages and the file cache object is future-proof. It's evident from
scalability issues in the allocator, reclaim, compaction, etc. that
with current memory sizes and IO devices, we're hitting the limits of
efficiently managing memory in 4k base pages by default. It's also
clear that we'll continue to have a need for 4k cache granularity for
quite a few workloads that work with large numbers of small files. I'm
not sure how this could be resolved other than by divorcing the idea of
a (larger) base page from the idea of cache entries that can
correspond, if necessary, to memory chunks smaller than a default page.

A longer thread on that can be found here:
https://lore.kernel.org/linux-fsdevel/YFja%2FLRC1NI6quL6@cmpxchg.org/

As an MM stakeholder, I don't think folios are the answer for MM code.