* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
2024-02-06 18:08 [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked Nhat Pham
@ 2024-02-06 18:51 ` Johannes Weiner
2024-02-06 19:15 ` Nhat Pham
2024-02-06 19:13 ` [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked (fix) Nhat Pham
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2024-02-06 18:51 UTC (permalink / raw)
To: Nhat Pham
Cc: akpm, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
On Tue, Feb 06, 2024 at 10:08:55AM -0800, Nhat Pham wrote:
> When a folio is swapped in, the protection size of the corresponding
> zswap LRU is incremented, so that the zswap shrinker is more
> conservative with its reclaiming action. This field is embedded within
> the struct lruvec, so updating it requires looking up the folio's memcg
> and lruvec. However, currently this lookup can happen after the folio is
> unlocked, for instance if a new folio is allocated, and
> swap_read_folio() unlocks the folio before returning. In this scenario,
> there is no stability guarantee for the binding between a folio and its
> memcg and lruvec:
>
> * A folio's memcg and lruvec can be freed between the lookup and the
> update, leading to a UAF.
> * Folio migration can clear the now-unlocked folio's memcg_data, which
> directs the zswap LRU protection size update towards the root memcg
> instead of the original memcg. This was recently picked up by the
> syzbot thanks to a warning in the inlined folio_lruvec() call.
>
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated, to prevent this.
>
> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Looks great, thanks for updating it!
One more thing I just realized:
> ---
> mm/swap_state.c | 10 ++++++----
> mm/zswap.c | 1 +
> 2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..7255c01a1e4e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> /* The page was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> /* The folio was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4aea03285532..8c548f73d52e 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
> struct lruvec *lruvec;
>
> if (folio) {
> + VM_WARN_ON_ONCE(!folio_test_locked(folio));
> lruvec = folio_lruvec(folio);
> atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> }
The NULL check is now also no longer necessary.
zswap_folio_swapin() used to be called unconditionally, even if
__read_swap_cache_async() failed and returned NULL.
However, page_allocated == true implies success. That newly allocated
and locked folio is always returned.
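To make the ordering concrete, here is a minimal C sketch of the before/after
flow in the readahead paths. The function names mirror the diff above; the
surrounding code is abbreviated and the comments are illustrative, not taken
from the kernel source.

	/* Before the patch: the lruvec lookup can run on an unlocked folio. */
	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
					&page_allocated, false);
	if (unlikely(page_allocated))
		swap_read_folio(folio, false, NULL);	/* may unlock the folio */
	zswap_folio_swapin(folio);	/* folio may be NULL here (hence the check
					 * inside the helper), and folio_lruvec()
					 * runs on an unstable memcg/lruvec
					 * binding: UAF window */

	/* After the patch: the update happens while the new folio is locked. */
	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
					&page_allocated, false);
	if (unlikely(page_allocated)) {
		/* page_allocated == true implies a non-NULL, locked folio */
		zswap_folio_swapin(folio);
		swap_read_folio(folio, false, NULL);
	}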
* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
2024-02-06 18:51 ` Johannes Weiner
@ 2024-02-06 19:15 ` Nhat Pham
0 siblings, 0 replies; 7+ messages in thread
From: Nhat Pham @ 2024-02-06 19:15 UTC (permalink / raw)
To: Johannes Weiner
Cc: akpm, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
On Tue, Feb 6, 2024 at 10:51 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Feb 06, 2024 at 10:08:55AM -0800, Nhat Pham wrote:
> > When a folio is swapped in, the protection size of the corresponding
> > zswap LRU is incremented, so that the zswap shrinker is more
> > conservative with its reclaiming action. This field is embedded within
> > the struct lruvec, so updating it requires looking up the folio's memcg
> > and lruvec. However, currently this lookup can happen after the folio is
> > unlocked, for instance if a new folio is allocated, and
> > swap_read_folio() unlocks the folio before returning. In this scenario,
> > there is no stability guarantee for the binding between a folio and its
> > memcg and lruvec:
> >
> > * A folio's memcg and lruvec can be freed between the lookup and the
> > update, leading to a UAF.
> > * Folio migration can clear the now-unlocked folio's memcg_data, which
> > directs the zswap LRU protection size update towards the root memcg
> > instead of the original memcg. This was recently picked up by the
> > syzbot thanks to a warning in the inlined folio_lruvec() call.
> >
> > Move the zswap LRU protection range update above the swap_read_folio()
> > call, and only when a new page is allocated, to prevent this.
> >
> > Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> > Signed-off-by: Nhat Pham <nphamcs@gmail.com>
>
> Looks great, thanks for updating it!
>
> One more thing I just realized:
>
> > ---
> > mm/swap_state.c | 10 ++++++----
> > mm/zswap.c | 1 +
> > 2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index e671266ad772..7255c01a1e4e 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > /* The page was likely read above, so no need for plugging here */
> > folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> > &page_allocated, false);
> > - if (unlikely(page_allocated))
> > + if (unlikely(page_allocated)) {
> > + zswap_folio_swapin(folio);
> > swap_read_folio(folio, false, NULL);
> > - zswap_folio_swapin(folio);
> > + }
> > return folio;
> > }
> >
> > @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> > /* The folio was likely read above, so no need for plugging here */
> > folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
> > &page_allocated, false);
> > - if (unlikely(page_allocated))
> > + if (unlikely(page_allocated)) {
> > + zswap_folio_swapin(folio);
> > swap_read_folio(folio, false, NULL);
> > - zswap_folio_swapin(folio);
> > + }
> > return folio;
> > }
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 4aea03285532..8c548f73d52e 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
> > struct lruvec *lruvec;
> >
> > if (folio) {
> > + VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > lruvec = folio_lruvec(folio);
> > atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > }
>
> The NULL check is now also no longer necessary.
>
> zswap_folio_swapin() used to be called unconditionally, even if
> __read_swap_cache_async() failed and returned NULL.
>
> However, page_allocated == true implies success. That newly allocated
> and locked folio is always returned.
Ah yeah, I forgot the context of that :) Just sent a fixlet to do away
with the check. Thanks for picking that out!
* [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked (fix)
2024-02-06 18:08 [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked Nhat Pham
2024-02-06 18:51 ` Johannes Weiner
@ 2024-02-06 19:13 ` Nhat Pham
2024-02-06 20:25 ` Johannes Weiner
2024-02-06 20:26 ` [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked Johannes Weiner
2024-02-07 3:03 ` Chengming Zhou
3 siblings, 1 reply; 7+ messages in thread
From: Nhat Pham @ 2024-02-06 19:13 UTC (permalink / raw)
To: akpm
Cc: hannes, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
The if (folio) check inside zswap_folio_swapin() is no longer needed.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
mm/zswap.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 8c548f73d52e..e91e3f10a5c8 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -826,11 +826,9 @@ void zswap_folio_swapin(struct folio *folio)
{
struct lruvec *lruvec;
- if (folio) {
- VM_WARN_ON_ONCE(!folio_test_locked(folio));
- lruvec = folio_lruvec(folio);
- atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
- }
+ VM_WARN_ON_ONCE(!folio_test_locked(folio));
+ lruvec = folio_lruvec(folio);
+ atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
}
void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
--
2.39.3
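A quick hedged usage sketch of the contract after this fixlet:
zswap_folio_swapin() now assumes a non-NULL folio that the caller still holds
locked, so any call site that could see a NULL or unlocked folio has to check
on its side. lookup_swapped_folio() below is a hypothetical placeholder, not
an API from this series.

	struct folio *folio = lookup_swapped_folio(entry);	/* hypothetical helper */

	if (folio) {
		/* folio assumed returned locked by the hypothetical helper;
		 * the lock keeps the memcg/lruvec binding stable here */
		zswap_folio_swapin(folio);
		folio_unlock(folio);
	}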
* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
2024-02-06 18:08 [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked Nhat Pham
2024-02-06 18:51 ` Johannes Weiner
2024-02-06 19:13 ` [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked (fix) Nhat Pham
@ 2024-02-06 20:26 ` Johannes Weiner
2024-02-07 3:03 ` Chengming Zhou
3 siblings, 0 replies; 7+ messages in thread
From: Johannes Weiner @ 2024-02-06 20:26 UTC (permalink / raw)
To: Nhat Pham
Cc: akpm, chengming.zhou, yosryahmed, linux-mm, kernel-team, linux-kernel
On Tue, Feb 06, 2024 at 10:08:55AM -0800, Nhat Pham wrote:
> When a folio is swapped in, the protection size of the corresponding
> zswap LRU is incremented, so that the zswap shrinker is more
> conservative with its reclaiming action. This field is embedded within
> the struct lruvec, so updating it requires looking up the folio's memcg
> and lruvec. However, currently this lookup can happen after the folio is
> unlocked, for instance if a new folio is allocated, and
> swap_read_folio() unlocks the folio before returning. In this scenario,
> there is no stability guarantee for the binding between a folio and its
> memcg and lruvec:
>
> * A folio's memcg and lruvec can be freed between the lookup and the
> update, leading to a UAF.
> * Folio migration can clear the now-unlocked folio's memcg_data, which
> directs the zswap LRU protection size update towards the root memcg
> instead of the original memcg. This was recently picked up by the
> syzbot thanks to a warning in the inlined folio_lruvec() call.
>
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated, to prevent this.
>
> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
With the fixlet applied,
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked
2024-02-06 18:08 [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked Nhat Pham
` (2 preceding siblings ...)
2024-02-06 20:26 ` [PATCH v2] mm/swap_state: update zswap LRU's protection range with the folio locked Johannes Weiner
@ 2024-02-07 3:03 ` Chengming Zhou
3 siblings, 0 replies; 7+ messages in thread
From: Chengming Zhou @ 2024-02-07 3:03 UTC (permalink / raw)
To: Nhat Pham, akpm; +Cc: hannes, yosryahmed, linux-mm, kernel-team, linux-kernel
On 2024/2/7 02:08, Nhat Pham wrote:
> When a folio is swapped in, the protection size of the corresponding
> zswap LRU is incremented, so that the zswap shrinker is more
> conservative with its reclaiming action. This field is embedded within
> the struct lruvec, so updating it requires looking up the folio's memcg
> and lruvec. However, currently this lookup can happen after the folio is
> unlocked, for instance if a new folio is allocated, and
> swap_read_folio() unlocks the folio before returning. In this scenario,
> there is no stability guarantee for the binding between a folio and its
> memcg and lruvec:
>
> * A folio's memcg and lruvec can be freed between the lookup and the
> update, leading to a UAF.
> * Folio migration can clear the now-unlocked folio's memcg_data, which
> directs the zswap LRU protection size update towards the root memcg
> instead of the original memcg. This was recently picked up by the
> syzbot thanks to a warning in the inlined folio_lruvec() call.
>
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated, to prevent this.
>
> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
LGTM, thanks!
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
> mm/swap_state.c | 10 ++++++----
> mm/zswap.c | 1 +
> 2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..7255c01a1e4e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> /* The page was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> /* The folio was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4aea03285532..8c548f73d52e 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -827,6 +827,7 @@ void zswap_folio_swapin(struct folio *folio)
> struct lruvec *lruvec;
>
> if (folio) {
> + VM_WARN_ON_ONCE(!folio_test_locked(folio));
> lruvec = folio_lruvec(folio);
> atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> }
>
> base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d