Date: Tue, 21 Sep 2021 17:59:33 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: Matthew Wilcox
Cc: Kent Overstreet, Linus Torvalds, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andrew Morton, "Darrick J. Wong", Christoph Hellwig, David Howells
Subject: Re: Folio discussion recap

On Tue, Sep 21, 2021 at 09:38:54PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 21, 2021 at 03:47:29PM -0400, Johannes Weiner wrote:
> > This discussion is now about whether folios are suitable for anon
> > pages as well. I'd like to reiterate that regardless of the outcome
> > of this discussion, I think we should probably move ahead with the
> > page cache bits, since people are specifically blocked on those and
> > there is no dependency on the anon stuff, as the conversion is
> > incremental.
>
> So you withdraw your NAK for the 5.15 pull request which is now four
> weeks old and has utterly missed the merge window?

Once you drop the bits that convert shared anon and file
infrastructure, yes.
Because we haven't yet discussed, let alone agreed, that folios are
the way forward for anon pages.

> > and so the justification for replacing page with folio *below*
> > those entry points to address tail-page confusion becomes nil:
> > there is no confusion. Move the anon bits to anon_page and leave
> > the shared bits in page. That's 912 lines of swap_state.c we could
> > mostly leave alone.
>
> Your argument seems to be based on "minimising churn". Which is
> certainly a goal that one could have, but I think in this case is
> actually harmful. There are hundreds, maybe thousands, of functions
> throughout the kernel (certainly throughout filesystems) which
> assume that a struct page is PAGE_SIZE bytes. Yes, every single one
> of them is buggy to assume that, but tracking them all down is a
> never-ending task, as new ones will be added as fast as they can be
> removed.

What does that have to do with anon pages?

> > The same is true for the LRU code in swap.c. Conceptually, no tail
> > pages *should* make it onto the LRU in the first place. Once the
> > high-level page instantiation functions - add_to_page_cache_lru,
> > do_anonymous_page - have type safety, you really do not need to
> > worry about tail pages deep in the LRU code. 1155 more lines of
> > swap.c.
>
> It's actually impossible in practice as well as conceptually. The
> list LRU is in the union with compound_head, so you cannot put a
> tail page onto the LRU. But yet we call compound_head() on every one
> of them multiple times, because our current type system does not
> allow us to express "this is not a tail page".

No, because we haven't identified *who actually needs* these calls
and moved them up and out of the low-level helpers. It was a mistake
to add them there, yes. But they were added recently, for rather few
callers, and people have already sent patches to move them to where
they are actually needed.

Of course, converting *absolutely everybody else* to not-tail-pages
instead will also fix the problem...
I just don't agree that this is an appropriate response to the issue.

Asking again: who conceptually deals with tail pages in MM? LRU and
reclaim don't. The page cache doesn't. Compaction doesn't. Migration
doesn't. All these data structures and operations are structured
around head pages, because that's the logical unit they operate on.

The notable exception, of course, is the page tables, because they
map the pfns of tail pages. But is that it? Does it come down to page
table walkers encountering pte-mapped tail pages, and needing
compound_head() before calling mark_page_accessed() or
set_page_dirty()? Couldn't we fix vm_normal_page() to handle this,
and switch khugepaged to a new vm_raw_page() or whatever?

It should be possible to answer this question as part of the case for
converting tens of thousands of lines of code to folios.