From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 11 Jul 2025 12:08:16 -0400
From: Brian Foster
To: Baolin Wang
Cc: Hugh Dickins, linux-mm@kvack.org, Matthew Wilcox, Usama Arif
Subject: Re: [PATCH] tmpfs: zero post-eof folio range on file extension
References: <20250625184930.269727-1-bfoster@redhat.com>
 <297e44e9-1b58-d7c4-192c-9408204ab1e3@google.com>
 <67f0461b-3359-41e7-a7cd-b059cbef4154@linux.alibaba.com>
 <097c0b07-1f43-51c3-3591-aaa2015226c2@google.com>
 <0224ed0f-d207-4c79-8c9d-f4915a91c11d@linux.alibaba.com>
In-Reply-To: <0224ed0f-d207-4c79-8c9d-f4915a91c11d@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, Jul 11, 2025 at 11:50:05AM +0800, Baolin Wang wrote:
>
>
> On 2025/7/11 06:20, Hugh Dickins wrote:
> > On Thu, 10 Jul 2025, Baolin Wang wrote:
> > > On 2025/7/9 15:57, Hugh Dickins wrote:
> > ...
> > > >
> > > > The problem is with huge pages (or large folios) in shmem_writeout():
> > > > what goes in as a large folio may there have to be split into small
> > > > pages; or it may be swapped out as one large folio, but fragmentation
> > > > at swapin time demand that it be split into small pages when swapped in.
> > >
> > > Good point.
> > >
> > > > So, if there has been swapout since the large folio was modified beyond
> > > > EOF, the folio that shmem_zero_eof() brings in does not guarantee what
> > > > length needs to be zeroed.
> > > >
> > > > We could set that aside as a deficiency to be fixed later on: that
> > > > would not be unreasonable, but I'm guessing that won't satisfy you.
> > > >
> > > > We could zero the maximum (the remainder of PMD size I believe) in
> > > > shmem_zero_eof(): looping over small folios within the range, skipping
> > > > !uptodate ones (but we do force them uptodate when swapping out, in
> > > > order to keep the space reservation). TBH I've ignored that as a bad
> > > > option, but it doesn't seem so bad to me now: ugly, but maybe not bad.
> > >
> > > However, IIUC, if the large folios are split in shmem_writeout(), and those
> > > small folios which beyond EOF will be dropped and freed in
> > > __split_unmapped_folio(), should we still consider them?
> >
> > You're absolutely right about the normal case, and thank you for making
> > that point. Had I forgotten that when writing? Or was I already
> > jumping ahead to the problem case? I don't recall, but was certainly
> > wrong for not mentioning it.
> >
> > The abnormal case is when there's a "fallocend" beyond i_size (or beyond
> > the small page extent spanning i_size) i.e. fallocate() has promised to
> > keep pages allocated beyond EOF. In that case, __split_unmapped_folio()
> > is keeping those pages.
>
> Ah, yes, you are right.
>
> > There could well be some optimization, involving fallocend, to avoid
> > zeroing more than necessary; but I wouldn't want to say what in a hurry,
> > it's quite confusing!
>
> Like you said, not only can a large folio split occur during swapout, but it
> can also happen during a punch hole operation. Moreover, considering the
> abnormal case of fallocate() you mentioned, we should find a more common
> approach to mitigate the impact of fallocate().
>
> For instance, when splitting, we could clear the 'uptodate' flag for these
> EOF small folios that are beyond 'i_size' but less than the 'fallocend', so
> that these EOF small folios will be re-initialized if they are used again.
> What do you think?
>
...

Hi Baolin,

So I'm still digesting Hugh's clarification wrt the falloc case, but I'm
a little curious here given that I intended to implement the writeout
zeroing suggestion regardless of that discussion..

I see the hole punch case falls into truncate_inode_[partial_]folio(),
which looks to me like it handles zeroing. The full truncate case just
tosses the folio of course, but the partial case zeroes according to the
target range prior to doing any potential split from that codepath.

That looks kind of similar to what I have prototyped for the
shmem_writeout() case: tail zero the EOF straddling folio before falling
into the split call. [1]

Does that not solve the same general issue in the swapout path as
potentially clearing uptodate via the split? I'm mainly trying to
understand if that is just a potential alternative approach, or if this
solves a corner case that I'm missing. Hm?

If the former, I suspect we'd need to tail zero on writeout regardless
of folio size. Given that, and IIUC that clearing uptodate as such will
basically cause the split folios to fall back into the !uptodate -> zero
-> mark_uptodate sequence of shmem_writeout(), I wonder what the
advantage of that is.
It feels a bit circular to me when considered along with the tail
zeroing below, but again I'm peeling away at complexity as I go here.. ;)

Thoughts?

Brian

[1] prototype writeout logic:

diff --git a/mm/shmem.c b/mm/shmem.c
index 634e499b6197..535021ae5a2f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1579,7 +1579,8 @@ int shmem_writeout(struct folio *folio, struct writeback_control *wbc)
 	struct inode *inode = mapping->host;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
-	pgoff_t index;
+	loff_t i_size = i_size_read(inode);
+	pgoff_t index = i_size >> PAGE_SHIFT;
 	int nr_pages;
 	bool split = false;
 
@@ -1592,6 +1593,17 @@ int shmem_writeout(struct folio *folio, struct writeback_control *wbc)
 	if (!total_swap_pages)
 		goto redirty;
 
+	/*
+	 * If the folio straddles EOF, the tail portion must be zeroed on
+	 * every swapout.
+	 */
+	if (folio_test_uptodate(folio) &&
+	    folio->index <= index && folio_next_index(folio) > index) {
+		size_t from = offset_in_folio(folio, i_size);
+		if (from)
+			folio_zero_segment(folio, from, folio_size(folio));
+	}
+
 	/*
 	 * If CONFIG_THP_SWAP is not enabled, the large folio should be
 	 * split when swapping.