linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
	yangge1116 <yangge1116@126.com>,
	akpm@linux-foundation.org, Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, liuzixing@hygon.cn
Subject: Re: [PATCH] mm/gup: don't check page lru flag before draining it
Date: Wed, 5 Jun 2024 14:20:28 +0200	[thread overview]
Message-ID: <697a9bc2-a655-4035-aa5e-7d3acb23e79d@redhat.com> (raw)
In-Reply-To: <11ef3deb-d1e3-46d5-97ed-9ba3c1fbbba9@redhat.com>

On 05.06.24 13:41, David Hildenbrand wrote:
> On 05.06.24 13:37, Baolin Wang wrote:
>>
>>
>> On 2024/6/5 17:53, David Hildenbrand wrote:
>>> On 05.06.24 11:41, David Hildenbrand wrote:
>>>> On 05.06.24 03:18, yangge1116 wrote:
>>>>>
>>>>>
>>>>> 在 2024/6/4 下午9:47, David Hildenbrand 写道:
>>>>>> On 04.06.24 12:48, yangge1116@126.com wrote:
>>>>>>> From: yangge <yangge1116@126.com>
>>>>>>>
>>>>>>> If a page is added in pagevec, its ref count increases one, remove
>>>>>>> the page from pagevec decreases one. Page migration requires the
>>>>>>> page is not referenced by others except page mapping. Before
>>>>>>> migrating a page, we should try to drain the page from pagevec in
>>>>>>> case the page is in it, however, folio_test_lru() is not sufficient
>>>>>>> to tell whether the page is in pagevec or not, if the page is in
>>>>>>> pagevec, the migration will fail.
>>>>>>>
>>>>>>> Remove the condition and drain lru once to ensure the page is not
>>>>>>> referenced by pagevec.
>>>>>>
>>>>>> What you are saying is that we might have a page on which
>>>>>> folio_test_lru() succeeds, that was added to one of the cpu_fbatches,
>>>>>> correct?
>>>>>
>>>>> Yes
>>>>>
>>>>>>
>>>>>> Can you describe under which circumstances that happens?
>>>>>>
>>>>>
>>>>> If we call folio_activate() to move a page from inactive LRU list to
>>>>> active LRU list, the page is not only in LRU list, but also in one of
>>>>> the cpu_fbatches.
>>>>>
>>>>> void folio_activate(struct folio *folio)
>>>>> {
>>>>>          if (folio_test_lru(folio) && !folio_test_active(folio) &&
>>>>>              !folio_test_unevictable(folio)) {
>>>>>              struct folio_batch *fbatch;
>>>>>
>>>>>              folio_get(folio);
>>>>>              //After this, folio is in LRU list, and its ref count have
>>>>> increased one.
>>>>>
>>>>>              local_lock(&cpu_fbatches.lock);
>>>>>              fbatch = this_cpu_ptr(&cpu_fbatches.activate);
>>>>>              folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
>>>>>              local_unlock(&cpu_fbatches.lock);
>>>>>          }
>>>>> }
>>>>
>>>> Interesting, the !SMP variant does the folio_test_clear_lru().
>>>>
>>>> It would be really helpful if we could reliably identify whether LRU
>>>> batching code has a raised reference on a folio.
>>>>
>>>> We have the same scenario in
>>>> * folio_deactivate()
>>>> * folio_mark_lazyfree()
>>>>
>>>> In folio_batch_move_lru() we do the folio_test_clear_lru(folio).
>>>>
>>>> No expert on that code, I'm wondering if we could move the
>>>> folio_test_clear_lru() out, such that we can more reliably identify
>>>> whether a folio is on the LRU batch or not.
>>>
>>> I'm sure there would be something extremely broken with the following
>>> (I don't know what I'm doing ;) ), but I wonder if there would be a way
>>> to make something like that work (and perform well enough?).
>>>
>>> diff --git a/mm/swap.c b/mm/swap.c
>>> index 67786cb771305..642e471c3ec5a 100644
>>> --- a/mm/swap.c
>>> +++ b/mm/swap.c
>>> @@ -212,10 +212,6 @@ static void folio_batch_move_lru(struct folio_batch
>>> *fbatch, move_fn_t move_fn)
>>>            for (i = 0; i < folio_batch_count(fbatch); i++) {
>>>                    struct folio *folio = fbatch->folios[i];
>>>
>>> -               /* block memcg migration while the folio moves between
>>> lru */
>>> -               if (move_fn != lru_add_fn && !folio_test_clear_lru(folio))
>>> -                       continue;
>>> -
>>>                    folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
>>>                    move_fn(lruvec, folio);
>>>
>>> @@ -255,8 +251,9 @@ static void lru_move_tail_fn(struct lruvec *lruvec,
>>> struct folio *folio)
>>>      */
>>>     void folio_rotate_reclaimable(struct folio *folio)
>>>     {
>>> -       if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
>>> -           !folio_test_unevictable(folio) && folio_test_lru(folio)) {
>>> +       if (folio_test_lru(folio) && !folio_test_locked(folio) &&
>>> +           !folio_test_dirty(folio) && !folio_test_unevictable(folio) &&
>>> +           folio_test_clear_lru(folio)) {
>>>                    struct folio_batch *fbatch;
>>>                    unsigned long flags;
>>>
>>> @@ -354,7 +351,7 @@ static void folio_activate_drain(int cpu)
>>>     void folio_activate(struct folio *folio)
>>>     {
>>>            if (folio_test_lru(folio) && !folio_test_active(folio) &&
>>> -           !folio_test_unevictable(folio)) {
>>> +           !folio_test_unevictable(folio) &&
>>> folio_test_clear_lru(folio)) {
>>
>> IMO, this seems violate the semantics of the LRU flag, since it's clear
>> that this folio is still in the LRU list.
> 
> Good point.
> 
> But regarding "violation": we already do clear the flag temporarily in
> there, so it's rather that we make the visible time where it is cleared
> "longer". (yes, it can be much longer :) )

Some random thoughts about some folio_test_lru() users:

mm/khugepaged.c: skips pages if !folio_test_lru(), but would fail skip 
it either way if there is the unexpected reference from the LRU batch!

mm/compaction.c: skips pages if !folio_test_lru(), but would fail skip 
it either way if there is the unexpected reference from the LRU batch!

mm/memory.c: would love to identify this case and to a lru_add_drain() 
to free up that reference.

mm/huge_memory.c: splitting with the additional reference will fail 
already. Maybe we'd want to drain the LRU batch.

mm/madvise.c: skips pages if !folio_test_lru(). I wonder what happens if 
we have the same page twice in an LRU batch with different target goals ...


Some other users (there are not that many that don't use it for sanity 
checks though) might likely be a bit different.

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-06-05 12:20 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-04 10:48 yangge1116
2024-06-04 13:47 ` David Hildenbrand
2024-06-05  1:18   ` yangge1116
2024-06-05  9:41     ` David Hildenbrand
2024-06-05  9:53       ` David Hildenbrand
2024-06-05 11:37         ` Baolin Wang
2024-06-05 11:41           ` David Hildenbrand
2024-06-05 12:20             ` David Hildenbrand [this message]
2024-06-06  1:57               ` Baolin Wang
2024-06-06  7:56                 ` David Hildenbrand
2024-06-08  4:38                   ` yangge1116
2024-06-08 15:15                     ` Matthew Wilcox
2024-06-08 16:03                       ` David Hildenbrand
2024-06-11 11:20                         ` yangge1116
2024-06-12  7:32                           ` David Hildenbrand
2024-06-15 11:44                             ` yangge1116
2024-06-17  9:50                             ` yangge1116
2024-06-17  9:52                               ` David Hildenbrand
2024-06-17 11:22                                 ` yangge1116
2024-06-06  1:35         ` yangge1116
2024-06-06  7:39           ` David Hildenbrand
2024-06-06  8:50             ` yangge1116
  -- strict thread matches above, loose matches on Subject: below --
2024-06-04  8:09 yangge1116
2024-06-04  8:56 ` Baolin Wang
2024-06-04  9:18   ` yangge1116

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=697a9bc2-a655-4035-aa5e-7d3acb23e79d@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liuzixing@hygon.cn \
    --cc=willy@infradead.org \
    --cc=yangge1116@126.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox