* [PATCH 0/2] mm: memfd_luo: fixes for folio flag preservation
@ 2026-02-23 17:39 Pratyush Yadav
  To: Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Andrew Morton
  Cc: linux-kernel, linux-mm

From: "Pratyush Yadav (Google)" <pratyush@kernel.org>

Hi,

This series contains a couple of fixes for flag preservation for memfd
live update.

The first patch fixes memfd preservation when fallocate() was used to
pre-allocate some pages. For these memfds, all writes to fallocated
pages made after preserve were lost.

The second patch fixes dirty flag tracking. If the dirty flag is not
tracked correctly, the next kernel might incorrectly reclaim some folios
under memory pressure, losing user data. This is a theoretical bug that
I noticed when reading the code; I have not been able to reproduce it.

Regards,
Pratyush Yadav

Pratyush Yadav (Google) (2):
  mm: memfd_luo: always make all folios uptodate
  mm: memfd_luo: always dirty all folios

 mm/memfd_luo.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 6 deletions(-)


base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
-- 
2.53.0.371.g1d285c8824-goog
* [PATCH 1/2] mm: memfd_luo: always make all folios uptodate
@ 2026-02-23 17:39 Pratyush Yadav
  To: Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Andrew Morton
  Cc: linux-kernel, linux-mm, stable

From: "Pratyush Yadav (Google)" <pratyush@kernel.org>

When a folio is added to a shmem file via fallocate, it is not zeroed on
allocation. This is done as a performance optimization since it is
possible the folio will never end up being used at all. When the folio
is used, shmem checks for the uptodate flag, and if absent, zeroes the
folio (and sets the flag) before returning to user.

With LUO, the flags of each folio are saved at preserve time. It is
possible to have a memfd with some folios fallocated but not uptodate.
For those, the uptodate flag doesn't get saved. The folios might later
end up being used and become uptodate. They would get passed to the next
kernel via KHO correctly since they did get preserved. But they won't
have the MEMFD_LUO_FOLIO_UPTODATE flag.

This means that when the memfd is retrieved, the folios will be added to
the shmem file without the uptodate flag. They will be zeroed before
first use, losing the data in those folios.

Since we take a big performance hit in allocating, zeroing, and pinning
all folios at prepare time anyway, take some more and zero all
non-uptodate ones too.

Later when there is a stronger need to make prepare faster, this can be
optimized.

To avoid racing with another uptodate operation, take the folio lock.
Fixes: b3749f174d68 ("mm: memfd_luo: allow preserving memfd")
Cc: stable@vger.kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
---
 mm/memfd_luo.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c
index a34fccc23b6a..ccbf1337f650 100644
--- a/mm/memfd_luo.c
+++ b/mm/memfd_luo.c
@@ -152,10 +152,31 @@ static int memfd_luo_preserve_folios(struct file *file,
 		if (err)
 			goto err_unpreserve;
 
+		folio_lock(folio);
+
 		if (folio_test_dirty(folio))
 			flags |= MEMFD_LUO_FOLIO_DIRTY;
-		if (folio_test_uptodate(folio))
-			flags |= MEMFD_LUO_FOLIO_UPTODATE;
+
+		/*
+		 * If the folio is not uptodate, it was fallocated but never
+		 * used. Saving this flag at prepare() doesn't work since it
+		 * might change later when someone uses the folio.
+		 *
+		 * Since we have taken the performance penalty of allocating,
+		 * zeroing, and pinning all the folios in the holes, take a bit
+		 * more and zero all non-uptodate folios too.
+		 *
+		 * NOTE: For someone looking to improve preserve performance,
+		 * this is a good place to look.
+		 */
+		if (!folio_test_uptodate(folio)) {
+			folio_zero_range(folio, 0, folio_size(folio));
+			flush_dcache_folio(folio);
+			folio_mark_uptodate(folio);
+		}
+		flags |= MEMFD_LUO_FOLIO_UPTODATE;
+
+		folio_unlock(folio);
 
 		pfolio->pfn = folio_pfn(folio);
 		pfolio->flags = flags;
-- 
2.53.0.371.g1d285c8824-goog
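The failure mode fixed above can be exercised from userspace: a page created with fallocate() only becomes uptodate once something actually writes to it. Below is a minimal, hypothetical demo (not part of the patch; the function name is made up) of the fallocate-then-write pattern whose data was lost across live update before this fix. It only shows the pattern on a running kernel; reproducing the actual loss requires a live update.

```c
#define _GNU_SOURCE
#include <fcntl.h>	/* fallocate() */
#include <string.h>
#include <sys/mman.h>	/* memfd_create() */
#include <unistd.h>

/*
 * Hypothetical reproducer sketch: fallocate a page in a memfd (folio
 * allocated but not uptodate), then write to it (making it uptodate and
 * dirty). Before this fix, a folio preserved in the fallocated state
 * came back after live update without MEMFD_LUO_FOLIO_UPTODATE and was
 * re-zeroed on first use, losing the write.
 * Returns 0 when the data reads back intact.
 */
static int fallocate_then_write_demo(void)
{
	const char msg[] = "data that must survive live update";
	char buf[sizeof(msg)];
	int fd;

	fd = memfd_create("luo-demo", 0);
	if (fd < 0)
		return -1;

	/* Pre-allocate one page: the folio exists but is not yet uptodate. */
	if (fallocate(fd, 0, 0, 4096))
		goto err;

	/* The first write makes the folio uptodate (and dirty). */
	if (pwrite(fd, msg, sizeof(msg), 0) != (ssize_t)sizeof(msg))
		goto err;

	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto err;

	close(fd);
	return memcmp(buf, msg, sizeof(msg)) ? -1 : 0;
err:
	close(fd);
	return -1;
}
```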
* Re: [PATCH 1/2] mm: memfd_luo: always make all folios uptodate
@ 2026-02-25 8:53 Mike Rapoport
  To: Pratyush Yadav
  Cc: Pasha Tatashin, Andrew Morton, linux-kernel, linux-mm, stable

On Mon, Feb 23, 2026 at 06:39:28PM +0100, Pratyush Yadav wrote:
> From: "Pratyush Yadav (Google)" <pratyush@kernel.org>
> 
> When a folio is added to a shmem file via fallocate, it is not zeroed on
> allocation. This is done as a performance optimization since it is
> possible the folio will never end up being used at all. When the folio
> is used, shmem checks for the uptodate flag, and if absent, zeroes the
> folio (and sets the flag) before returning to user.
> 
> With LUO, the flags of each folio are saved at preserve time. It is
> possible to have a memfd with some folios fallocated but not uptodate.
> For those, the uptodate flag doesn't get saved. The folios might later
> end up being used and become uptodate. They would get passed to the next
> kernel via KHO correctly since they did get preserved. But they won't
> have the MEMFD_LUO_FOLIO_UPTODATE flag.
> 
> This means that when the memfd is retrieved, the folios will be added to
> the shmem file without the uptodate flag. They will be zeroed before
> first use, losing the data in those folios.
> 
> Since we take a big performance hit in allocating, zeroing, and pinning
> all folios at prepare time anyway, take some more and zero all
> non-uptodate ones too.
> 
> Later when there is a stronger need to make prepare faster, this can be
> optimized.
> 
> To avoid racing with another uptodate operation, take the folio lock.
> 
> Fixes: b3749f174d68 ("mm: memfd_luo: allow preserving memfd")
> Cc: stable@vger.kernel.org
> Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/memfd_luo.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c
> index a34fccc23b6a..ccbf1337f650 100644
> --- a/mm/memfd_luo.c
> +++ b/mm/memfd_luo.c
> @@ -152,10 +152,31 @@ static int memfd_luo_preserve_folios(struct file *file,
>  		if (err)
>  			goto err_unpreserve;
>  
> +		folio_lock(folio);
> +
>  		if (folio_test_dirty(folio))
>  			flags |= MEMFD_LUO_FOLIO_DIRTY;
> -		if (folio_test_uptodate(folio))
> -			flags |= MEMFD_LUO_FOLIO_UPTODATE;
> +
> +		/*
> +		 * If the folio is not uptodate, it was fallocated but never
> +		 * used. Saving this flag at prepare() doesn't work since it
> +		 * might change later when someone uses the folio.
> +		 *
> +		 * Since we have taken the performance penalty of allocating,
> +		 * zeroing, and pinning all the folios in the holes, take a bit
> +		 * more and zero all non-uptodate folios too.
> +		 *
> +		 * NOTE: For someone looking to improve preserve performance,
> +		 * this is a good place to look.

I'd add a larger comment above memfd_luo_preserve_folios() that says
that it allocates, pins etc and fold the last two paragraphs of this
comment there.

> +		 */
> +		if (!folio_test_uptodate(folio)) {
> +			folio_zero_range(folio, 0, folio_size(folio));
> +			flush_dcache_folio(folio);
> +			folio_mark_uptodate(folio);
> +		}
> +		flags |= MEMFD_LUO_FOLIO_UPTODATE;
> +
> +		folio_unlock(folio);
> 
>  		pfolio->pfn = folio_pfn(folio);
>  		pfolio->flags = flags;
> -- 
> 2.53.0.371.g1d285c8824-goog
> 

-- 
Sincerely yours,
Mike.
* [PATCH 2/2] mm: memfd_luo: always dirty all folios
@ 2026-02-23 17:39 Pratyush Yadav
  To: Pasha Tatashin, Mike Rapoport, Pratyush Yadav, Andrew Morton
  Cc: linux-kernel, linux-mm, stable

From: "Pratyush Yadav (Google)" <pratyush@kernel.org>

A dirty folio is one which has been written to. A clean folio is its
opposite. Since a clean folio has no user data, it can be freed under
memory pressure.

memfd preservation with LUO saves the flag at preserve(). This is
problematic. The folio might get dirtied later. Saving it at freeze()
also doesn't work, since the dirty bit from PTE is normally synced at
unmap and there might still be mappings of the file at freeze().

To see why this is a problem, say a folio is clean at preserve, but gets
dirtied later. The serialized state of the folio will mark it as clean.
After retrieve, the next kernel will see the folio as clean and might
try to reclaim it under memory pressure. This will result in losing user
data.

Mark all folios of the file as dirty, and always set the
MEMFD_LUO_FOLIO_DIRTY flag. This comes with the side effect of making
all clean folios un-reclaimable. This is a cost that has to be paid for
participants of live update. It is not expected to be a common use case
to preserve a lot of clean folios anyway.

Since the value of pfolio->flags is a constant now, drop the flags
variable and set it directly.
Fixes: b3749f174d68 ("mm: memfd_luo: allow preserving memfd")
Cc: stable@vger.kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
---
 mm/memfd_luo.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c
index ccbf1337f650..9eac02d06b5a 100644
--- a/mm/memfd_luo.c
+++ b/mm/memfd_luo.c
@@ -146,7 +146,6 @@ static int memfd_luo_preserve_folios(struct file *file,
 	for (i = 0; i < nr_folios; i++) {
 		struct memfd_luo_folio_ser *pfolio = &folios_ser[i];
 		struct folio *folio = folios[i];
-		unsigned int flags = 0;
 
 		err = kho_preserve_folio(folio);
 		if (err)
@@ -154,8 +153,26 @@ static int memfd_luo_preserve_folios(struct file *file,
 
 		folio_lock(folio);
 
-		if (folio_test_dirty(folio))
-			flags |= MEMFD_LUO_FOLIO_DIRTY;
+		/*
+		 * A dirty folio is one which has been written to. A clean folio
+		 * is its opposite. Since a clean folio does not carry user
+		 * data, it can be freed by page reclaim under memory pressure.
+		 *
+		 * Saving the dirty flag at prepare() time doesn't work since it
+		 * can change later. Saving it at freeze() also won't work
+		 * because the dirty bit is normally synced at unmap and there
+		 * might still be a mapping of the file at freeze().
+		 *
+		 * To see why this is a problem, say a folio is clean at
+		 * preserve, but gets dirtied later. The pfolio flags will mark
+		 * it as clean. After retrieve, the next kernel might try to
+		 * reclaim this folio under memory pressure, losing user data.
+		 *
+		 * Unconditionally mark it dirty to avoid this problem. This
+		 * comes at the cost of making clean folios un-reclaimable after
+		 * live update.
+		 */
+		folio_mark_dirty(folio);
 
 		/*
 		 * If the folio is not uptodate, it was fallocated but never
@@ -174,12 +191,11 @@ static int memfd_luo_preserve_folios(struct file *file,
 			flush_dcache_folio(folio);
 			folio_mark_uptodate(folio);
 		}
-		flags |= MEMFD_LUO_FOLIO_UPTODATE;
 
 		folio_unlock(folio);
 
 		pfolio->pfn = folio_pfn(folio);
-		pfolio->flags = flags;
+		pfolio->flags = MEMFD_LUO_FOLIO_DIRTY | MEMFD_LUO_FOLIO_UPTODATE;
 		pfolio->index = folio->index;
 	}
 
-- 
2.53.0.371.g1d285c8824-goog
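The freeze()-time problem described in the commit message, dirty data living only behind a still-active mapping, is easy to produce from userspace. A small hypothetical demo (not part of the patch; the function name is made up): write to a memfd purely through an mmap(), with no msync() or munmap(), and the data is nevertheless visible through pread(). It exists only as dirty page cache behind a live PTE, which is exactly the state a freeze()-time dirty-flag snapshot could misclassify as clean.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>	/* memfd_create(), mmap() */
#include <unistd.h>	/* ftruncate(), pread() */

/*
 * Hypothetical demo: dirty a memfd page only through a shared mapping,
 * without msync() or munmap(). The data is visible via pread() because
 * it lives in (dirty) page cache, but the PTE dirty bit has not
 * necessarily been synced back to the folio yet. This is why the dirty
 * flag cannot be reliably snapshotted at freeze() time and the patch
 * marks every preserved folio dirty instead.
 * Returns 0 when the data reads back intact.
 */
static int dirty_via_mapping_demo(void)
{
	const char msg[] = "dirtied through a mapping";
	char buf[sizeof(msg)];
	int fd, ret = -1;
	char *map;

	fd = memfd_create("luo-dirty-demo", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, 4096))
		goto out_fd;

	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_fd;

	/* Dirty the page through the PTE only; no write() or msync(). */
	memcpy(map, msg, sizeof(msg));

	if (pread(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf) &&
	    !memcmp(buf, msg, sizeof(msg)))
		ret = 0;

	munmap(map, 4096);
out_fd:
	close(fd);
	return ret;
}
```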
* Re: [PATCH 2/2] mm: memfd_luo: always dirty all folios
@ 2026-02-25 8:58 Mike Rapoport
  To: Pratyush Yadav
  Cc: Pasha Tatashin, Andrew Morton, linux-kernel, linux-mm, stable

On Mon, Feb 23, 2026 at 06:39:29PM +0100, Pratyush Yadav wrote:
> From: "Pratyush Yadav (Google)" <pratyush@kernel.org>
> 
> A dirty folio is one which has been written to. A clean folio is its
> opposite. Since a clean folio has no user data, it can be freed under
> memory pressure.
> 
> memfd preservation with LUO saves the flag at preserve(). This is
> problematic. The folio might get dirtied later. Saving it at freeze()
> also doesn't work, since the dirty bit from PTE is normally synced at
> unmap and there might still be mappings of the file at freeze().
> 
> To see why this is a problem, say a folio is clean at preserve, but gets
> dirtied later. The serialized state of the folio will mark it as clean.
> After retrieve, the next kernel will see the folio as clean and might
> try to reclaim it under memory pressure. This will result in losing user
> data.
> 
> Mark all folios of the file as dirty, and always set the
> MEMFD_LUO_FOLIO_DIRTY flag. This comes with the side effect of making
> all clean folios un-reclaimable. This is a cost that has to be paid for
> participants of live update. It is not expected to be a common use case
> to preserve a lot of clean folios anyway.
> 
> Since the value of pfolio->flags is a constant now, drop the flags
> variable and set it directly.
> 
> Fixes: b3749f174d68 ("mm: memfd_luo: allow preserving memfd")
> Cc: stable@vger.kernel.org
> Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> ---
>  mm/memfd_luo.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c
> index ccbf1337f650..9eac02d06b5a 100644
> --- a/mm/memfd_luo.c
> +++ b/mm/memfd_luo.c
> @@ -146,7 +146,6 @@ static int memfd_luo_preserve_folios(struct file *file,
>  	for (i = 0; i < nr_folios; i++) {
>  		struct memfd_luo_folio_ser *pfolio = &folios_ser[i];
>  		struct folio *folio = folios[i];
> -		unsigned int flags = 0;
> 
>  		err = kho_preserve_folio(folio);
>  		if (err)
> @@ -154,8 +153,26 @@ static int memfd_luo_preserve_folios(struct file *file,
> 
>  		folio_lock(folio);
> 
> -		if (folio_test_dirty(folio))
> -			flags |= MEMFD_LUO_FOLIO_DIRTY;
> +		/*
> +		 * A dirty folio is one which has been written to. A clean folio
> +		 * is its opposite. Since a clean folio does not carry user
> +		 * data, it can be freed by page reclaim under memory pressure.
> +		 *
> +		 * Saving the dirty flag at prepare() time doesn't work since it
> +		 * can change later. Saving it at freeze() also won't work
> +		 * because the dirty bit is normally synced at unmap and there
> +		 * might still be a mapping of the file at freeze().
> +		 *
> +		 * To see why this is a problem, say a folio is clean at
> +		 * preserve, but gets dirtied later. The pfolio flags will mark
> +		 * it as clean. After retrieve, the next kernel might try to
> +		 * reclaim this folio under memory pressure, losing user data.
> +		 *
> +		 * Unconditionally mark it dirty to avoid this problem. This
> +		 * comes at the cost of making clean folios un-reclaimable after
> +		 * live update.
> +		 */

Can we make the comment here shorter to only contain the gist of the
issue?

> +		folio_mark_dirty(folio);
> 
>  		/*
>  		 * If the folio is not uptodate, it was fallocated but never
> @@ -174,12 +191,11 @@ static int memfd_luo_preserve_folios(struct file *file,
>  			flush_dcache_folio(folio);
>  			folio_mark_uptodate(folio);
>  		}
> -		flags |= MEMFD_LUO_FOLIO_UPTODATE;
> 
>  		folio_unlock(folio);
> 
>  		pfolio->pfn = folio_pfn(folio);
> -		pfolio->flags = flags;
> +		pfolio->flags = MEMFD_LUO_FOLIO_DIRTY | MEMFD_LUO_FOLIO_UPTODATE;
>  		pfolio->index = folio->index;
>  	}
> 
> -- 
> 2.53.0.371.g1d285c8824-goog
> 

-- 
Sincerely yours,
Mike.