From: Baolin Wang
Date: Thu, 10 Jul 2025 14:47:20 +0800
Subject: Re: [PATCH] tmpfs: zero post-eof folio range on file extension
To: Hugh Dickins, Brian Foster
Cc: linux-mm@kvack.org, Matthew Wilcox, Usama Arif
Message-ID: <67f0461b-3359-41e7-a7cd-b059cbef4154@linux.alibaba.com>
In-Reply-To: <297e44e9-1b58-d7c4-192c-9408204ab1e3@google.com>
References: <20250625184930.269727-1-bfoster@redhat.com> <297e44e9-1b58-d7c4-192c-9408204ab1e3@google.com>

On 2025/7/9 15:57, Hugh Dickins wrote:
> Thanks for suggestions, let me come back to comment on your original.
>
> On Wed, 25 Jun 2025, Brian Foster wrote:
>>
>
> Although Matthew's alternative commit was insufficient and wrong
> (because use of truncation risks deleting folios already promised
> by fallocate), I found his brief commit message very very helpful
> in understanding your patch - it makes clear what POSIX promises,
> is a better justification than just passing an xfstest you wrote,
> and explains why you're right to target places where i_size changes:
>
> POSIX requires that "If the file size is increased, the extended area
> shall appear as if it were zero-filled". It is possible to use mmap to
> write past EOF and that data will become visible instead of zeroes.
>
> Please add that paragraph here, right at the head of your commit message.
>
>> Most traditional filesystems zero the post-eof portion of the eof
>> folio at writeback time, or when the file size is extended by
>> truncate or extending writes. This ensures that the previously
>> post-eof range of the folio is zeroed before it is exposed to the
>> file.
>>
>> tmpfs doesn't implement the writeback path the way a traditional
>> filesystem does, so zeroing behavior won't be exactly the same.
>> However, it can still perform explicit zeroing from the various
>> operations that extend a file and expose a post-eof portion of the
>> eof folio. The current lack of zeroing is observed via failure of
>> fstests test generic/363 on tmpfs. This test injects post-eof mapped
>> writes in certain situations to detect gaps in zeroing behavior.
>>
>> Add a new eof zeroing helper for file extending operations. Look up
>> the current eof folio, and if one exists, zero the range about to be
>> exposed. This allows generic/363 to pass on tmpfs.
>>
>> Signed-off-by: Brian Foster
>> ---
>>
>> Hi all,
>>
>> This survives the aforementioned reproducer, an fstests regression run, and
>> ~100m fsx operations without issues. Let me know if there are any other
>> recommended tests for tmpfs and I'm happy to run them.
>> Otherwise, a couple notes as I'm not terribly familiar with tmpfs...
>>
>> First, I used _get_partial_folio() because we really only want to zero
>> an eof folio if one has been previously allocated. My understanding is
>> that the lookup path will avoid unnecessary folio allocation in such
>> cases, but let me know if that's wrong.
>>
>> Also, it seems that the falloc path leaves newly preallocated folios
>> !uptodate until they are used. This had me wondering if perhaps
>> shmem_zero_eof() could just skip out if the eof folio happens to be
>> !uptodate. Hm?
>
> Yes, you were right to think that it's better to skip the !uptodates,
> and Baolin made a good suggestion there (though I'll unravel it a bit).
>
>>
>> Thoughts, reviews, flames appreciated.
>
> Sorry, much as you'd appreciate a flame, I cannot oblige: I think you've
> done a very good job here (but it's not ready yet), and you've done it
> in such a way that huge=always passes generic/363 with no trouble.
>
> I did keep on wanting to change this and that of your patch below; but
> then later came around to seeing why your choices were better than what
> I was going to suggest.
>
> I had difficulty getting deep enough into it, but think I'm there now.
> And have identified one missed aspect, which rather changes around what
> you should do - I'd have preferred to get into that at the end, but
> since it affects what shmem_zero_eof() should look like, I'd better
> talk about it here first.
>
> The problem is with huge pages (or large folios) in shmem_writeout():
> what goes in as a large folio may there have to be split into small
> pages; or it may be swapped out as one large folio, but fragmentation
> at swapin time demands that it be split into small pages when swapped in.

Good point.

> So, if there has been swapout since the large folio was modified beyond
> EOF, the folio that shmem_zero_eof() brings in does not guarantee what
> length needs to be zeroed.
>
> We could set that aside as a deficiency to be fixed later on: that
> would not be unreasonable, but I'm guessing that won't satisfy you.
>
> We could zero the maximum (the remainder of PMD size I believe) in
> shmem_zero_eof(): looping over small folios within the range, skipping
> !uptodate ones (but we do force them uptodate when swapping out, in
> order to keep the space reservation). TBH I've ignored that as a bad
> option, but it doesn't seem so bad to me now: ugly, but maybe not bad.

However, IIUC, if the large folios are split in shmem_writeout(), the
small folios that lie beyond EOF will be dropped and freed in
__split_unmapped_folio(), so should we still consider them? Please
correct me if I missed anything.

> The solution I've had in mind (and pursue in comments below) is to do
> the EOF zeroing in shmem_writeout() before it splits; and then avoid
> swapin in shmem_zero_eof() when i_size is raised.
>
> That solution was partly inspired by the shmem symlink uninit-value bug
> https://lore.kernel.org/linux-mm/670793eb.050a0220.8109b.0003.GAE@google.com/
> which I haven't rushed to fix, but ought to be fixed along with this one
> (by "along with" I don't mean that both have to be fixed in one single
> patch, but it makes sense to consider them together). I was inclined
> not to zero the whole page in shmem_symlink(), but zero before swapout.
>
> It worries me that an EOF page might be swapped out and in a thousand
> times, but i_size set only once: I'm going for a solution which memsets
> a thousand times rather than once? But if that actually shows up as an
> issue in any workload, then we can add a shmem inode flag to say whether
> the EOF folio has been exposed via mmap (or symlink) so may need zeroing.
>
> What's your preference? My comments below assume the latter solution,
> but that may be wrong.
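
To make the range under discussion concrete, here is a rough userspace
model of "zero the part of the EOF folio that an i_size increase is about
to expose". It is only a sketch, not the patch's shmem_zero_eof(): the
struct model_folio type and the model_zero_eof() name are hypothetical,
the "folio" is a plain buffer, and it assumes zeroing stops at the smaller
of the new size and the end of the folio. It says nothing about swapout,
splitting or !uptodate handling, which are the open questions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model, not kernel code: a "folio" is a plain
 * buffer covering file offsets [pos, pos + size).
 */
struct model_folio {
	uint64_t pos;		/* file offset of the folio's first byte */
	size_t size;		/* folio length in bytes */
	unsigned char *data;	/* backing memory */
};

/*
 * Zero the bytes of the EOF folio that become visible when the file
 * size grows from old_size to new_size. Returns the number of bytes
 * zeroed, or 0 if the old EOF does not fall inside this folio or the
 * file is not being extended.
 */
static size_t model_zero_eof(struct model_folio *folio,
			     uint64_t old_size, uint64_t new_size)
{
	uint64_t folio_end = folio->pos + folio->size;
	uint64_t end;

	assert(folio->data != NULL);

	if (new_size <= old_size)
		return 0;	/* not an extension */
	if (old_size < folio->pos || old_size >= folio_end)
		return 0;	/* old EOF is not in this folio */

	/* Stop at the new EOF or the end of the folio, whichever is first. */
	end = new_size < folio_end ? new_size : folio_end;
	memset(folio->data + (old_size - folio->pos), 0, end - old_size);
	return (size_t)(end - old_size);
}
```

In this model, extending a file from 100 to 300 bytes inside one 4096-byte
folio zeroes bytes 100..299 and leaves everything else alone; stale bytes
from 300 onward stay in place until a later extension exposes them.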