Message-ID: <4a8f64f8-cdef-4081-8654-7e8c43b6f18b@linux.alibaba.com>
Date: Fri, 11 Jul 2025 16:42:53 +0800
Subject: Re: [PATCH] tmpfs: zero post-eof folio range on file extension
To: Hugh Dickins
Cc: Brian Foster, linux-mm@kvack.org, Matthew Wilcox, Usama Arif
References: <20250625184930.269727-1-bfoster@redhat.com>
 <297e44e9-1b58-d7c4-192c-9408204ab1e3@google.com>
 <67f0461b-3359-41e7-a7cd-b059cbef4154@linux.alibaba.com>
 <097c0b07-1f43-51c3-3591-aaa2015226c2@google.com>
 <0224ed0f-d207-4c79-8c9d-f4915a91c11d@linux.alibaba.com>
From: Baolin Wang

On 2025/7/11 15:50, Hugh Dickins wrote:
> On Fri, 11 Jul 2025, Baolin Wang wrote:
>> On 2025/7/11 06:20, Hugh Dickins wrote:
>>> On Thu, 10 Jul 2025,
>>> Baolin Wang wrote:
>>>> On 2025/7/9 15:57, Hugh Dickins wrote:
>>> ...
>>>>>
>>>>> The problem is with huge pages (or large folios) in shmem_writeout():
>>>>> what goes in as a large folio may there have to be split into small
>>>>> pages; or it may be swapped out as one large folio, but fragmentation
>>>>> at swapin time demand that it be split into small pages when swapped in.
>>>>
>>>> Good point.
>>>>
>>>>> So, if there has been swapout since the large folio was modified beyond
>>>>> EOF, the folio that shmem_zero_eof() brings in does not guarantee what
>>>>> length needs to be zeroed.
>>>>>
>>>>> We could set that aside as a deficiency to be fixed later on: that
>>>>> would not be unreasonable, but I'm guessing that won't satisfy you.
>>>>>
>>>>> We could zero the maximum (the remainder of PMD size I believe) in
>>>>> shmem_zero_eof(): looping over small folios within the range, skipping
>>>>> !uptodate ones (but we do force them uptodate when swapping out, in
>>>>> order to keep the space reservation). TBH I've ignored that as a bad
>>>>> option, but it doesn't seem so bad to me now: ugly, but maybe not bad.
>>>>
>>>> However, IIUC, if the large folios are split in shmem_writeout(), and those
>>>> small folios which beyond EOF will be dropped and freed in
>>>> __split_unmapped_folio(), should we still consider them?
>>>
>>> You're absolutely right about the normal case, and thank you for making
>>> that point. Had I forgotten that when writing? Or was I already
>>> jumping ahead to the problem case? I don't recall, but was certainly
>>> wrong for not mentioning it.
>>>
>>> The abnormal case is when there's a "fallocend" beyond i_size (or beyond
>>> the small page extent spanning i_size) i.e. fallocate() has promised to
>>> keep pages allocated beyond EOF. In that case, __split_unmapped_folio()
>>> is keeping those pages.
>>
>> Ah, yes, you are right.
>>
>>> There could well be some optimization, involving fallocend, to avoid
>>> zeroing more than necessary; but I wouldn't want to say what in a hurry,
>>> it's quite confusing!
>>
>> Like you said, not only can a large folio split occur during swapout, but it
>> can also happen during a punch hole operation. Moreover, considering the
>> abnormal case of fallocate() you mentioned, we should find a more common
>> approach to mitigate the impact of fallocate().
>>
>> For instance, when splitting, we could clear the 'uptodate' flag for these EOF
>> small folios that are beyond 'i_size' but less than the 'fallocend', so that
>> these EOF small folios will be re-initialized if they are used again. What do
>> you think?
>
> First impression: that's a great idea, much better than anything I was
> proposing. Let's hope I don't perceive some drawback overnight and renege.
>
> I don't love your patch below so much, we would probably want to gather
> the shmem_mapping() peculiarities together better (and seeing that
> repeated i_size_read(): IIRC 32-bit locking doesn't allow it in there).

Absolutely, yes. The code below is only meant to illustrate the idea :)

> And there tends to be an assumption (don't ask me where) that a page once
> uptodate remains that way until it's freed: maybe no problem before it's
> inserted in the xarray (as you have), or maybe better before unfreezing,
> or maybe the page lock is already enough.

These EOF small folios are unmapped, frozen, locked, and not yet in the
xarray. As far as I can see, nothing else can get hold of them (a pfn
walker could find them, but the page lock or the freeze will make the pfn
walker abort), so it appears to be safe to change the 'uptodate' flag.
Anyway, let me investigate this further.

> Those my initial reactions.

Thanks for the comments.

>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index ce130225a8e5..2ccb442525d1 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3546,6 +3546,18 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>  			lru_add_split_folio(origin_folio, release, lruvec,
>>  					list);
>>
>> +			/*
>> +			 * fallocate() will keep folios allocated beyond EOF, we should
>> +			 * clear the uptodate flag for these folios to re-zero them
>> +			 * if necessary.
>> +			 */
>> +			if (shmem_mapping(mapping)) {
>> +				loff_t i_size = i_size_read(mapping->host);
>> +
>> +				if (i_size < end && release->index >= i_size)
>> +					folio_clear_uptodate(release);
>> +			}
>> +
>>  			/* Some pages can be beyond EOF: drop them from cache */
>>  			if (release->index >= end) {
>>  				if (shmem_mapping(mapping))
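
For illustration only, here is a minimal userspace sketch of the decision
discussed above: which small folios produced by a split are kept, which are
kept but marked not uptodate so that they get re-zeroed on reuse, and which
are dropped. It is a toy model, not kernel code. All toy_* names and the
4 KiB page size are hypothetical. It assumes the folio index and 'fallocend'
are page indices while i_size is in bytes, so i_size is rounded up to a page
index before any comparison, and it assumes the effective end of the kept
range is the larger of the EOF index and 'fallocend'.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* What the toy model decides for one small folio after a split. */
enum toy_action {
	TOY_KEEP,			/* within i_size: keep as is */
	TOY_KEEP_CLEAR_UPTODATE,	/* beyond EOF, below fallocend */
	TOY_DROP			/* beyond both: drop from cache */
};

/* First page index lying wholly beyond EOF: i_size rounded up to pages. */
static unsigned long toy_eof_index(unsigned long long i_size_bytes)
{
	return (unsigned long)((i_size_bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE);
}

/*
 * Classify the small folio at page index 'index'.
 * 'fallocend_index' is the page index up to which fallocate() promised to
 * keep pages allocated (0 when there is no such promise).
 */
static enum toy_action toy_classify(unsigned long index,
				    unsigned long long i_size_bytes,
				    unsigned long fallocend_index)
{
	unsigned long eof = toy_eof_index(i_size_bytes);
	unsigned long end = fallocend_index > eof ? fallocend_index : eof;

	if (index >= end)
		return TOY_DROP;
	if (index >= eof)
		return TOY_KEEP_CLEAR_UPTODATE;
	return TOY_KEEP;
}
```

With i_size = 10000 bytes and a hypothetical fallocend of 6 pages, indices
0..2 are kept, 3..5 are kept with 'uptodate' cleared, and 6 onwards are
dropped; without a fallocend, index 3 onwards is dropped. Reading i_size once
and converting it to a page index up front also avoids mixing byte and page
units in the comparisons.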