* [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
From: Nhat Pham @ 2024-02-06 18:08 UTC
To: akpm
Cc: hannes, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel

When a folio is swapped in, the protection size of the corresponding
zswap LRU is incremented, so that the zswap shrinker is more
conservative with its reclaiming action. This field is embedded within
the struct lruvec, so updating it requires looking up the folio's memcg
and lruvec. However, currently this lookup can happen after the folio is
unlocked, for instance if a new folio is allocated, and
swap_read_folio() unlocks the folio before returning. In this scenario,
there is no stability guarantee for the binding between a folio and its
memcg and lruvec:

* A folio's memcg and lruvec can be freed between the lookup and the
  update, leading to a UAF.
* Folio migration can clear the now-unlocked folio's memcg_data, which
  directs the zswap LRU protection size update towards the root memcg
  instead of the original memcg. This was recently picked up by the
  syzbot thanks to a warning in the inlined folio_lruvec() call.

Move the zswap LRU protection range update above the swap_read_folio()
call, and only when a new page is allocated, to prevent this.
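
An illustrative before/after sketch of the ordering in the readahead
paths (simplified from the hunks below; not additional code):

	/* Before: zswap_folio_swapin() runs after swap_read_folio(),
	 * which unlocks the folio, so the memcg/lruvec binding that
	 * folio_lruvec() looks up is no longer stable.
	 */
	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
					&page_allocated, false);
	if (unlikely(page_allocated))
		swap_read_folio(folio, false, NULL);
	zswap_folio_swapin(folio);

	/* After: the update is done only for newly allocated folios,
	 * while the folio lock taken by __read_swap_cache_async() is
	 * still held.
	 */
	if (unlikely(page_allocated)) {
		zswap_folio_swapin(folio);
		swap_read_folio(folio, false, NULL);
	}
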
Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
mm/swap_state.c | 10 ++++++----
mm/zswap.c | 1 +
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266ad772..7255c01a1e4e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	/* The page was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated))
+	if (unlikely(page_allocated)) {
+		zswap_folio_swapin(folio);
 		swap_read_folio(folio, false, NULL);
-	zswap_folio_swapin(folio);
+	}
 	return folio;
 }
 
@@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	/* The folio was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated))
+	if (unlikely(page_allocated)) {
+		zswap_folio_swapin(folio);
 		swap_read_folio(folio, false, NULL);
-	zswap_folio_swapin(folio);
+	}
 	return folio;
 }
 
diff --git a/mm/zswap.c b/mm/zswap.c
index 4aea03285532..8c548f73d52e 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
 	struct lruvec *lruvec;
 
 	if (folio) {
+		VM_WARN_ON_ONCE(!folio_test_locked(folio));
 		lruvec = folio_lruvec(folio);
 		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
 	}

base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d
--
2.39.3
* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
From: Johannes Weiner @ 2024-02-06 18:51 UTC
To: Nhat Pham
Cc: akpm, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
On Tue, Feb 06, 2024 at 10:08:55AM -0800, Nhat Pham wrote:
> When a folio is swapped in, the protection size of the corresponding
> zswap LRU is incremented, so that the zswap shrinker is more
> conservative with its reclaiming action. This field is embedded within
> the struct lruvec, so updating it requires looking up the folio's memcg
> and lruvec. However, currently this lookup can happen after the folio is
> unlocked, for instance if a new folio is allocated, and
> swap_read_folio() unlocks the folio before returning. In this scenario,
> there is no stability guarantee for the binding between a folio and its
> memcg and lruvec:
>
> * A folio's memcg and lruvec can be freed between the lookup and the
> update, leading to a UAF.
> * Folio migration can clear the now-unlocked folio's memcg_data, which
> directs the zswap LRU protection size update towards the root memcg
> instead of the original memcg. This was recently picked up by the
> syzbot thanks to a warning in the inlined folio_lruvec() call.
>
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated, to prevent this.
>
> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Looks great, thanks for updating it!
One more thing I just realized:
> ---
> mm/swap_state.c | 10 ++++++----
> mm/zswap.c | 1 +
> 2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..7255c01a1e4e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> /* The page was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> /* The folio was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4aea03285532..8c548f73d52e 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
> struct lruvec *lruvec;
>
> if (folio) {
> + VM_WARN_ON_ONCE(!folio_test_locked(folio));
> lruvec = folio_lruvec(folio);
> atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> }
The NULL check is now also no longer necessary.

zswap_folio_swapin() used to be called unconditionally, even when
__read_swap_cache_async() failed and returned NULL.

However, page_allocated == true implies success: the newly allocated
and locked folio is always returned.
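
A minimal sketch of the invariant being relied on (illustration only,
reusing the call shape from the patch above):

	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
					&page_allocated, false);
	if (page_allocated) {
		/* Implied by page_allocated == true: */
		VM_WARN_ON_ONCE(!folio);
		VM_WARN_ON_ONCE(!folio_test_locked(folio));
		zswap_folio_swapin(folio);
	}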
* [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked (fix)
From: Nhat Pham @ 2024-02-06 19:13 UTC
To: akpm
Cc: hannes, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
The if (folio) check inside zswap_folio_swapin() is no longer needed.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
mm/zswap.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 8c548f73d52e..e91e3f10a5c8 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -826,11 +826,9 @@ void zswap_folio_swapin(struct folio *folio)
 {
 	struct lruvec *lruvec;
 
-	if (folio) {
-		VM_WARN_ON_ONCE(!folio_test_locked(folio));
-		lruvec = folio_lruvec(folio);
-		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-	}
+	VM_WARN_ON_ONCE(!folio_test_locked(folio));
+	lruvec = folio_lruvec(folio);
+	atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
 }
 
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
--
2.39.3
* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
From: Nhat Pham @ 2024-02-06 19:15 UTC
To: Johannes Weiner
Cc: akpm, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
On Tue, Feb 6, 2024 at 10:51 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Feb 06, 2024 at 10:08:55AM -0800, Nhat Pham wrote:
> > When a folio is swapped in, the protection size of the corresponding
> > zswap LRU is incremented, so that the zswap shrinker is more
> > conservative with its reclaiming action. This field is embedded within
> > the struct lruvec, so updating it requires looking up the folio's memcg
> > and lruvec. However, currently this lookup can happen after the folio is
> > unlocked, for instance if a new folio is allocated, and
> > swap_read_folio() unlocks the folio before returning. In this scenario,
> > there is no stability guarantee for the binding between a folio and its
> > memcg and lruvec:
> >
> > * A folio's memcg and lruvec can be freed between the lookup and the
> > update, leading to a UAF.
> > * Folio migration can clear the now-unlocked folio's memcg_data, which
> > directs the zswap LRU protection size update towards the root memcg
> > instead of the original memcg. This was recently picked up by the
> > syzbot thanks to a warning in the inlined folio_lruvec() call.
> >
> > Move the zswap LRU protection range update above the swap_read_folio()
> > call, and only when a new page is allocated, to prevent this.
> >
> > Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> > Signed-off-by: Nhat Pham <nphamcs@gmail.com>
>
> Looks great, thanks for updating it!
>
> One more thing I just realized:
>
> > ---
> > mm/swap_state.c | 10 ++++++----
> > mm/zswap.c | 1 +
> > 2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index e671266ad772..7255c01a1e4e 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > /* The page was likely read above, so no need for plugging here */
> > folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> > &page_allocated, false);
> > - if (unlikely(page_allocated))
> > + if (unlikely(page_allocated)) {
> > + zswap_folio_swapin(folio);
> > swap_read_folio(folio, false, NULL);
> > - zswap_folio_swapin(folio);
> > + }
> > return folio;
> > }
> >
> > @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> > /* The folio was likely read above, so no need for plugging here */
> > folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
> > &page_allocated, false);
> > - if (unlikely(page_allocated))
> > + if (unlikely(page_allocated)) {
> > + zswap_folio_swapin(folio);
> > swap_read_folio(folio, false, NULL);
> > - zswap_folio_swapin(folio);
> > + }
> > return folio;
> > }
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 4aea03285532..8c548f73d52e 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
> > struct lruvec *lruvec;
> >
> > if (folio) {
> > + VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > lruvec = folio_lruvec(folio);
> > atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > }
>
> The NULL check is now also no longer necessary.
>
> zswap_folio_swapin() used to be called unconditionally, even when
> __read_swap_cache_async() failed and returned NULL.
>
> However, page_allocated == true implies success: the newly allocated
> and locked folio is always returned.
Ah yeah, I forgot the context of that :) Just sent a fixlet to do away
with the check. Thanks for picking that out!
* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked (fix)
From: Johannes Weiner @ 2024-02-06 20:25 UTC
To: Nhat Pham
Cc: akpm, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
On Tue, Feb 06, 2024 at 11:13:55AM -0800, Nhat Pham wrote:
> The if (folio) check inside zswap_folio_swapin() is no longer needed.
>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
From: Johannes Weiner @ 2024-02-06 20:26 UTC
To: Nhat Pham
Cc: akpm, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
On Tue, Feb 06, 2024 at 10:08:55AM -0800, Nhat Pham wrote:
> When a folio is swapped in, the protection size of the corresponding
> zswap LRU is incremented, so that the zswap shrinker is more
> conservative with its reclaiming action. This field is embedded within
> the struct lruvec, so updating it requires looking up the folio's memcg
> and lruvec. However, currently this lookup can happen after the folio is
> unlocked, for instance if a new folio is allocated, and
> swap_read_folio() unlocks the folio before returning. In this scenario,
> there is no stability guarantee for the binding between a folio and its
> memcg and lruvec:
>
> * A folio's memcg and lruvec can be freed between the lookup and the
> update, leading to a UAF.
> * Folio migration can clear the now-unlocked folio's memcg_data, which
> directs the zswap LRU protection size update towards the root memcg
> instead of the original memcg. This was recently picked up by the
> syzbot thanks to a warning in the inlined folio_lruvec() call.
>
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated, to prevent this.
>
> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
With the fixlet applied,
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
From: Chengming Zhou @ 2024-02-07 3:03 UTC
To: Nhat Pham, akpm; +Cc: hannes, yosryahmed, linux-mm, kernel-team, linux-kernel
On 2024/2/7 02:08, Nhat Pham wrote:
> When a folio is swapped in, the protection size of the corresponding
> zswap LRU is incremented, so that the zswap shrinker is more
> conservative with its reclaiming action. This field is embedded within
> the struct lruvec, so updating it requires looking up the folio's memcg
> and lruvec. However, currently this lookup can happen after the folio is
> unlocked, for instance if a new folio is allocated, and
> swap_read_folio() unlocks the folio before returning. In this scenario,
> there is no stability guarantee for the binding between a folio and its
> memcg and lruvec:
>
> * A folio's memcg and lruvec can be freed between the lookup and the
> update, leading to a UAF.
> * Folio migration can clear the now-unlocked folio's memcg_data, which
> directs the zswap LRU protection size update towards the root memcg
> instead of the original memcg. This was recently picked up by the
> syzbot thanks to a warning in the inlined folio_lruvec() call.
>
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated, to prevent this.
>
> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
LGTM, thanks!
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
> mm/swap_state.c | 10 ++++++----
> mm/zswap.c | 1 +
> 2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..7255c01a1e4e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> /* The page was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> /* The folio was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4aea03285532..8c548f73d52e 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
> struct lruvec *lruvec;
>
> if (folio) {
> + VM_WARN_ON_ONCE(!folio_test_locked(folio));
> lruvec = folio_lruvec(folio);
> atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> }
>
> base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d