* [PATCH] mm: Always initialise folio->_deferred_list
@ 2024-03-12 3:50 Matthew Wilcox (Oracle)
2024-03-12 13:17 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-03-12 3:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm
For compound pages which are at least order-2 (and hence have a
deferred_list), initialise it and then we can check at free that the
page is not part of a deferred list. We recently found this useful to
rule out a source of corruption.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/huge_memory.c | 2 --
mm/internal.h | 2 ++
mm/memcontrol.c | 2 ++
mm/page_alloc.c | 9 +++++----
4 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..04fb994a7b0b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
{
if (!folio || !folio_test_large(folio))
return;
- if (folio_order(folio) > 1)
- INIT_LIST_HEAD(&folio->_deferred_list);
folio_set_large_rmappable(folio);
}
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..10895ec52546 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
atomic_set(&folio->_entire_mapcount, -1);
atomic_set(&folio->_nr_pages_mapped, 0);
atomic_set(&folio->_pincount, 0);
+ if (order > 1)
+ INIT_LIST_HEAD(&folio->_deferred_list);
}
static inline void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 138bcfa18234..e2334c4ee550 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7483,6 +7483,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
struct obj_cgroup *objcg;
VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+ VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
+ !list_empty(&folio->_deferred_list), folio);
/*
* Nobody should be changing or seriously looking at
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62240014a498..8374ba9b6249 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
}
break;
case 2:
- /*
- * the second tail page: ->mapping is
- * deferred_list.next -- ignore value.
- */
+ /* the second tail page: deferred_list overlaps ->mapping */
+ if (unlikely(!list_empty(&folio->_deferred_list))) {
+ bad_page(page, "on deferred list");
+ goto out;
+ }
break;
default:
if (page->mapping != TAIL_MAPPING) {
--
2.43.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: Always initialise folio->_deferred_list
2024-03-12 3:50 [PATCH] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
@ 2024-03-12 13:17 ` David Hildenbrand
2024-03-12 14:01 ` Matthew Wilcox
0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-03-12 13:17 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: linux-mm
On 12.03.24 04:50, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list. We recently found this useful to
> rule out a source of corruption.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/huge_memory.c | 2 --
> mm/internal.h | 2 ++
> mm/memcontrol.c | 2 ++
> mm/page_alloc.c | 9 +++++----
> 4 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9859aa4f7553..04fb994a7b0b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
> {
> if (!folio || !folio_test_large(folio))
> return;
> - if (folio_order(folio) > 1)
> - INIT_LIST_HEAD(&folio->_deferred_list);
> folio_set_large_rmappable(folio);
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 7e486f2c502c..10895ec52546 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
> atomic_set(&folio->_entire_mapcount, -1);
> atomic_set(&folio->_nr_pages_mapped, 0);
> atomic_set(&folio->_pincount, 0);
> + if (order > 1)
> + INIT_LIST_HEAD(&folio->_deferred_list);
> }
>
> static inline void prep_compound_tail(struct page *head, int tail_idx)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 138bcfa18234..e2334c4ee550 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7483,6 +7483,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
> struct obj_cgroup *objcg;
>
> VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> + VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
> + !list_empty(&folio->_deferred_list), folio);
>
> /*
> * Nobody should be changing or seriously looking at
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62240014a498..8374ba9b6249 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
> }
> break;
> case 2:
> - /*
> - * the second tail page: ->mapping is
> - * deferred_list.next -- ignore value.
> - */
> + /* the second tail page: deferred_list overlaps ->mapping */
> + if (unlikely(!list_empty(&folio->_deferred_list))) {
> + bad_page(page, "on deferred list");
> + goto out;
> + }
IIRC, hugetlb might overwrite this with _hugetlb_subpool? Isn't that a
problem we have to handle?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: Always initialise folio->_deferred_list
2024-03-12 13:17 ` David Hildenbrand
@ 2024-03-12 14:01 ` Matthew Wilcox
2024-03-12 14:34 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2024-03-12 14:01 UTC (permalink / raw)
To: David Hildenbrand; +Cc: Andrew Morton, linux-mm
On Tue, Mar 12, 2024 at 02:17:01PM +0100, David Hildenbrand wrote:
> On 12.03.24 04:50, Matthew Wilcox (Oracle) wrote:
> > +++ b/mm/page_alloc.c
> > @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
> > }
> > break;
> > case 2:
> > - /*
> > - * the second tail page: ->mapping is
> > - * deferred_list.next -- ignore value.
> > - */
> > + /* the second tail page: deferred_list overlaps ->mapping */
> > + if (unlikely(!list_empty(&folio->_deferred_list))) {
> > + bad_page(page, "on deferred list");
> > + goto out;
> > + }
>
> IIRC, hugetlb might overwrite this with _hugetlb_subpool? Isn't that a
> problem we have to handle?
Ah yes, you're right. I think this should do the trick?
+++ b/mm/hugetlb.c
@@ -1796,6 +1796,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
destroy_compound_gigantic_folio(folio, huge_page_order(h));
free_gigantic_folio(folio, huge_page_order(h));
} else {
+ INIT_LIST_HEAD(&folio->_deferred_list);
__free_pages(&folio->page, huge_page_order(h));
}
}
(that should also be a folio_put(), IMO)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: Always initialise folio->_deferred_list
2024-03-12 14:01 ` Matthew Wilcox
@ 2024-03-12 14:34 ` David Hildenbrand
2024-03-12 15:38 ` Matthew Wilcox
0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-03-12 14:34 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Andrew Morton, linux-mm
On 12.03.24 15:01, Matthew Wilcox wrote:
> On Tue, Mar 12, 2024 at 02:17:01PM +0100, David Hildenbrand wrote:
>> On 12.03.24 04:50, Matthew Wilcox (Oracle) wrote:
>>> +++ b/mm/page_alloc.c
>>> @@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
>>> }
>>> break;
>>> case 2:
>>> - /*
>>> - * the second tail page: ->mapping is
>>> - * deferred_list.next -- ignore value.
>>> - */
>>> + /* the second tail page: deferred_list overlaps ->mapping */
>>> + if (unlikely(!list_empty(&folio->_deferred_list))) {
>>> + bad_page(page, "on deferred list");
>>> + goto out;
>>> + }
>>
>> IIRC, hugetlb might overwrite this with _hugetlb_subpool? Isn't that a
>> problem we have to handle?
>
> Ah yes, you're right. I think this should do the trick?
>
> +++ b/mm/hugetlb.c
> @@ -1796,6 +1796,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> destroy_compound_gigantic_folio(folio, huge_page_order(h));
> free_gigantic_folio(folio, huge_page_order(h));
> } else {
> + INIT_LIST_HEAD(&folio->_deferred_list);
> __free_pages(&folio->page, huge_page_order(h));
Heh, __free_pages() says:
"This function can free multi-page allocations that are not compound
pages". I suspect that means "can also free compound pages", but it's
confusing.
Especially, I thought we recently learned that free hugetlb folios do
have a refcount of 0? Confusing.
Anyhow, I suspect your change does the trick and agree that
__free_pages() might need a replacement.
... but it's confusing.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: Always initialise folio->_deferred_list
2024-03-12 14:34 ` David Hildenbrand
@ 2024-03-12 15:38 ` Matthew Wilcox
2024-03-12 15:45 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2024-03-12 15:38 UTC (permalink / raw)
To: David Hildenbrand; +Cc: Andrew Morton, linux-mm
On Tue, Mar 12, 2024 at 03:34:13PM +0100, David Hildenbrand wrote:
> On 12.03.24 15:01, Matthew Wilcox wrote:
> > __free_pages(&folio->page, huge_page_order(h));
>
> Heh, __free_pages() says:
>
> "This function can free multi-page allocations that are not compound pages".
> I suspect that means "can also free compound pages", but it's confusing.
Sorry. I wrote that documentation and I was focused on one thing,
but failed to think of the other thing ... it can indeed free compound
pages. I'll add this:
* If the pages were allocated with __GFP_COMP, prefer using put_page()
* to using this function. Also prefer to call put_page() rather than
* calling __free_page() or __free_pages(page, 0).
> Especially, I thought we recently learned that free hugetlb folios do have a
> refcount of 0? Confusing.
Yes, that is confusing. This function wouldn't work if the refcount
were 0 (put_page_testzero() would complain). Ah, here we go:
In __folio_put(), we know that refcount is 0.
__folio_put_large() -> destroy_large_folio -> free_huge_folio
(refcount still 0 here, indeed it asserts it)
If we add it back to the pool, the refcount stays at 0.
But if we're removing it, we call __remove_hugetlb_folio()
which calls folio_ref_unfreeze().
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: Always initialise folio->_deferred_list
2024-03-12 15:38 ` Matthew Wilcox
@ 2024-03-12 15:45 ` David Hildenbrand
0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-03-12 15:45 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Andrew Morton, linux-mm
On 12.03.24 16:38, Matthew Wilcox wrote:
> On Tue, Mar 12, 2024 at 03:34:13PM +0100, David Hildenbrand wrote:
>> On 12.03.24 15:01, Matthew Wilcox wrote:
>>> __free_pages(&folio->page, huge_page_order(h));
>>
>> Heh, __free_pages() says:
>>
>> "This function can free multi-page allocations that are not compound pages".
>> I suspect that means "can also free compound pages", but it's confusing.
>
> Sorry. I wrote that documentation and I was focused on one thing,
> but failed to think of the other thing ... it can indeed free compound
> pages. I'll add this:
>
> * If the pages were allocated with __GFP_COMP, prefer using put_page()
> * to using this function. Also prefer to call put_page() rather than
> * calling __free_page() or __free_pages(page, 0).
Sounds good.
>
>> Especially, I thought we recently learned that free hugetlb folios do have a
>> refcount of 0? Confusing.
>
> Yes, that is confusing. This function wouldn't work if the refcount
> were 0 (put_page_testzero() would complain). Ah, here we go:
>
> In __folio_put(), we know that refcount is 0.
> __folio_put_large() -> destroy_large_folio -> free_huge_folio
> (refcount still 0 here, indeed it asserts it)
> If we add it back to the pool, the refcount stays at 0.
> But if we're removing it, we call __remove_hugetlb_folio()
> which calls folio_ref_unfreeze().
Ah, that's the case I was missing. The joy of hugetlb :)
As a side note: hugetlb.c still talks about "destructor" and "dtor". I
assume that dates back to the times where we identified hugetlb folios
by a deconstrucor in the compound page.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-03-12 15:45 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-12 3:50 [PATCH] mm: Always initialise folio->_deferred_list Matthew Wilcox (Oracle)
2024-03-12 13:17 ` David Hildenbrand
2024-03-12 14:01 ` Matthew Wilcox
2024-03-12 14:34 ` David Hildenbrand
2024-03-12 15:38 ` Matthew Wilcox
2024-03-12 15:45 ` David Hildenbrand
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox