Re: Some questions about shrink_folio

linux-mm.kvack.org archive mirror

* Re: Some questions about shrink_folio_list
       [not found] <AS8P189MB165601127DEE02A7FA9C2D11F53D2@AS8P189MB1656.EURP189.PROD.OUTLOOK.COM>
@ 2024-04-04 15:35 ` Matthew Wilcox
  2024-04-04 21:32   ` yueyang.pan
  0 siblings, 1 reply; 2+ messages in thread
From: Matthew Wilcox @ 2024-04-04 15:35 UTC
  To: yueyang.pan; +Cc: linux-mm

On Wed, Apr 03, 2024 at 10:06:16PM +0000, yueyang.pan@epfl.ch wrote:
> Dear Matthew,
>     I am Yueyang Pan a PhD student from EPFL, and I am currently
> checking the swap code in the kernel. Sorry to bother you because this
> email should go to Mel Gorman. He did not reply to me so I turned to
> you for some help.

Hi Yueyang,

You'd probably have more luck if you cc'd the mailing list.  Somebody
other than Mel might have answered you.  Added it now.

> 1)  I have some questions about `try_to_unmap_flush_dirty`. I
> wonder why this is necessary in shrink_folio_list, because in
> folio_check_references we have already checked that the PTEs
> pointing to this page do not have the accessed bit set. The current
> shrink_folio_list then unmaps the page, clears the dirty bit, issues
> a TLB flush if the dirty bit was previously set, and then starts
> writing the page to swap. I wonder why we cannot take an
> opportunistic approach here. My understanding is that if we don't unmap
> the page and perform the flush, then when there is a concurrent write to
> the page, both the accessed bit and the dirty bit will be set (because
> the dirty bit was cleared), so we can simply check the accessed bit
> again after pageout to see whether we can free this page or not.
>    I checked the git blame and saw Mel's commit from 2015, where he
> mentioned that it was better to assume a writable entry exists
> in the TLB, but I wonder why this can be true if we have already used
> folio_check_references to check the PTE accessed bit. Does this imply
> that even if folio_check_references reports no references, there can
> still be entries in the TLB?

This is far outside my realm of expertise.  I suspect it's
possible that there can be stale entries in the TLB if
ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH.
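
One way to picture the stale-entry concern is a toy user-space model. This
is only a sketch under stated assumptions, not kernel code: the names
toy_pte, toy_tlb, toy_cpu_write and toy_write_noticed are hypothetical, and
the model assumes that the CPU updates the accessed/dirty bits of the
in-memory PTE only when it walks the page table, so a store that hits a
cached writable+dirty translation leaves the PTE untouched. It also assumes
the accessed bit was cleared earlier without a TLB flush, which is my
understanding of how page aging behaves on x86.

```c
#include <stdbool.h>

/* Toy model only: one in-memory PTE and one CPU's cached copy of it. */
struct toy_pte { bool present; bool writable; bool accessed; bool dirty; };
struct toy_tlb { bool valid; struct toy_pte cached; };

/* Model a store. Returns false if the store would take a page fault. */
static bool toy_cpu_write(struct toy_pte *pte, struct toy_tlb *tlb)
{
	/* Assumption: a hit on a writable, already-dirty cached entry
	 * completes without touching the in-memory PTE. */
	if (tlb->valid && tlb->cached.writable && tlb->cached.dirty)
		return true;

	/* Otherwise walk the page table and set A/D in memory. */
	if (!pte->present || !pte->writable)
		return false;
	pte->accessed = true;
	pte->dirty = true;
	tlb->cached = *pte;
	tlb->valid = true;
	return true;
}

/* Does re-checking the PTE after pageout notice a concurrent store? */
static bool toy_write_noticed(bool flush_tlb)
{
	struct toy_pte pte = { .present = true, .writable = true };
	struct toy_tlb tlb = { .valid = false };

	toy_cpu_write(&pte, &tlb);	/* earlier store fills the TLB */
	pte.accessed = false;		/* aging: A cleared, no flush */
	pte.dirty = false;		/* proposed scheme: clear D, stay mapped */
	if (flush_tlb)
		tlb.valid = false;
	toy_cpu_write(&pte, &tlb);	/* store racing with pageout */
	return pte.accessed || pte.dirty;
}
```

In this model toy_write_noticed(false) returns false: the racing store goes
through the stale entry, and neither bit is set again in the PTE, so a later
check would wrongly conclude the page is clean and unreferenced. With the
flush, the store has to walk the page table and the bits reappear.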

> 2) I have some questions about the end of shrink_folio_list, before
> ret_folios is spliced back onto folio_list. What if, at the same
> time, another function is trying to free the page or mlock
> it? Will this page be circulated again on the inactive LRU
> list and be double freed, since the page lock was released by the time
> of the list_splice? From what I understood, mlock takes the
> page directly from the inactive/active list it is on and moves
> it to the mlock list, but at this moment the page does not reside
> on either list.

I think the answer is that these folios have their LRU flag cleared
throughout shrink_folio_list() so they cannot be mlocked?

static struct lruvec *__mlock_folio(struct folio *folio, struct lruvec *lruvec)
{
        /* There is nothing more we can do while it's off LRU */
        if (!folio_test_clear_lru(folio))
                return lruvec;
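
The check quoted above can be pictured with a small user-space model. This
is a minimal sketch, assuming the LRU flag acts as an ownership token that
only one path can hold at a time; toy_folio, TOY_FLAG_LRU and the toy_*
helpers are hypothetical names, and C11 atomics stand in for the kernel's
atomic page-flag operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_FLAG_LRU 0x1u

/* Toy stand-in for a folio: only the flags word matters here. */
struct toy_folio { atomic_uint flags; };

/* True only for the one caller that found the flag set and cleared it. */
static bool toy_test_clear_lru(struct toy_folio *f)
{
	return atomic_fetch_and(&f->flags, ~TOY_FLAG_LRU) & TOY_FLAG_LRU;
}

/* Hand the folio back to the LRU, as a putback path would. */
static void toy_set_lru(struct toy_folio *f)
{
	atomic_fetch_or(&f->flags, TOY_FLAG_LRU);
}

/* mlock side, modelled on the quoted check: back off while off-LRU. */
static bool toy_mlock_move(struct toy_folio *f)
{
	if (!toy_test_clear_lru(f))
		return false;	/* someone else has isolated the folio */
	/* The real code would move the folio between lists here. */
	toy_set_lru(f);
	return true;
}
```

If reclaim has already called toy_test_clear_lru() on a folio,
toy_mlock_move() returns false until toy_set_lru() puts the folio back, so
the two paths never manipulate the list linkage at the same time.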




* Re: Some questions about shrink_folio_list
  2024-04-04 15:35 ` Some questions about shrink_folio_list Matthew Wilcox
@ 2024-04-04 21:32   ` yueyang.pan
  0 siblings, 0 replies; 2+ messages in thread
From: yueyang.pan @ 2024-04-04 21:32 UTC
  To: Matthew Wilcox; +Cc: linux-mm


On Thu, Apr 04, 2024 at 15:35 UTC, Matthew Wilcox wrote:
> On Wed, Apr 03, 2024 at 10:06:16PM +0000, yueyang.pan@epfl.ch wrote:
> > Dear Matthew,
> >     I am Yueyang Pan a PhD student from EPFL, and I am currently
> > checking the swap code in the kernel. Sorry to bother you because this
> > email should go to Mel Gorman. He did not reply to me so I turned to
> > you for some help.
>
> Hi Yueyang,
>
> You'd probably have more luck if you cc'd the mailing list.  Somebody
> other than Mel might have answered you.  Added it now.

Thanks a lot for forwarding it.

> > 1)  I have some questions about `try_to_unmap_flush_dirty`. I
> > wonder why this is necessary in shrink_folio_list, because in
> > folio_check_references we have already checked that the PTEs
> > pointing to this page do not have the accessed bit set. The current
> > shrink_folio_list then unmaps the page, clears the dirty bit, issues
> > a TLB flush if the dirty bit was previously set, and then starts
> > writing the page to swap. I wonder why we cannot take an
> > opportunistic approach here. My understanding is that if we don't unmap
> > the page and perform the flush, then when there is a concurrent write to
> > the page, both the accessed bit and the dirty bit will be set (because
> > the dirty bit was cleared), so we can simply check the accessed bit
> > again after pageout to see whether we can free this page or not.
> >    I checked the git blame and saw Mel's commit from 2015, where he
> > mentioned that it was better to assume a writable entry exists
> > in the TLB, but I wonder why this can be true if we have already used
> > folio_check_references to check the PTE accessed bit. Does this imply
> > that even if folio_check_references reports no references, there can
> > still be entries in the TLB?
>
> This is far outside my realm of expertise.  I suspect it's
> possible that there can be stale entries in the TLB if
> ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH.

If that is the case, I guess this TLB flush should be conditional on
whether this option is set? I am not sure, so more information would be
welcome.

> > 2) I have some questions about the end of shrink_folio_list, before
> > ret_folios is spliced back onto folio_list. What if, at the same
> > time, another function is trying to free the page or mlock
> > it? Will this page be circulated again on the inactive LRU
> > list and be double freed, since the page lock was released by the time
> > of the list_splice? From what I understood, mlock takes the
> > page directly from the inactive/active list it is on and moves
> > it to the mlock list, but at this moment the page does not reside
> > on either list.
>
> I think the answer is that these folios have their LRU flag cleared
> throughout shrink_folio_list() so they cannot be mlocked?
>
> static struct lruvec *__mlock_folio(struct folio *folio, struct lruvec *lruvec)
> {
>         /* There is nothing more we can do while it's off LRU */
>         if (!folio_test_clear_lru(folio))
>                 return lruvec;

I checked the code in mlock.c. If I understand correctly, the call
chain is:

  mlock_fixup -> mlock_vma_pages_range -> mlock_pte_range ->
  mlock_folio -> mlock_folio_batch -> __mlock_folio

and mlock_vma_pages_range does not return a value. Does this mean that in
the case I mentioned (the folio does not have its LRU flag set), even if
mlock returns 0, the page may still not have been successfully locked
in memory?
