From: Hugh Dickins <hughd@google.com>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
willy@infradead.org, rientjes@google.com, surenb@google.com,
fvdl@google.com, quic_pkondeti@quicinc.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem
Date: Mon, 19 Feb 2024 21:10:13 -0800 (PST) [thread overview]
Message-ID: <c3e3117d-e752-9a34-0000-05811c7327e2@google.com> (raw)
In-Reply-To: <64ed46f4-459c-63b0-a69e-81353e9fcbc9@quicinc.com>
[-- Attachment #1: Type: text/plain, Size: 4571 bytes --]
On Wed, 14 Feb 2024, Charan Teja Kalla wrote:
> Hello Hugh,
>
> Based on offline discussion with some folks in the list, it seems that
> this syscall can be helpful. This patch might have forgotten and I hope
> this ping helps in resurrecting this thread.
Charan, it's not forgotten, but it was relayed to you through another
channel a month ago, that I did not expect to have time to think about
this for 3 months. Countdown says 2 months to go now.
I realize that it's frustrating for you; it's unpleasant for me too.
>
> On 5/18/2023 6:16 PM, Charan Teja Kalla wrote:
> > On 5/17/2023 5:02 PM, Hugh Dickins wrote:
> >>> Sure, will include those range calculations for shmem pages too.
> >> Oh, I forgot this issue, you would have liked me to look at V8 by now,
> >> to see whether I agree with your resolution there. Sorry, no, I've
> >> not been able to divert my concentration to it yet.
> >>
> >> And it's quite likely that I shall disagree, because I've a history of
> >> disagreeing even with myself on such range widening/narrowing issues -
> >> reconciling conflicting precedents is difficult 🙁
> >>
> > If you can at least help by commenting which part of the patch you
> > disagree with, I can try hard to convince you there:) .
> >
> >>> Please let me know if I'm missing something where I should be counting
> >>> these as NR_ISOLATED.
> >> Please grep for NR_ISOLATED, to see where and how they get manipulated
> >> already, and follow the existing examples. The case that sticks in my
> >> mind is in mm/mempolicy.c, where the migrate_pages() syscall can build
> >> up a gigantic quantity of transiently isolated pages: your syscall can
> >> do the same, so should account for itself in the same way.
>
> Based on the grep, it seems almost all the call stacks that isolates the
> folios is for migrating the pages where after migration the NR_ISOLATED
> is decremented (in migrate_folio_done()). The call paths are(compaction,
> memory hotplug, mempolicy).
>
> The another call path is reclaim where we isolate 'nr' pages belongs to
> a pgdat, account/unaccount them in NR_ISOLATED across the reclaim.
>
> I think it is easy to account for the above call paths as we know "which
> folio corresponds to which pgdat".
>
> Where as in this patch, we are isolating a set of folios(can corresponds
> to different nodes) and relying on the reclaim_pages() to do the swap
> out. It is straightforward to account NR_ISOLATED while isolating, but
> it requires unaccounting changes in the shrink_folio_list() where folio
> is being freed after swap out. Doing so requires changes in all the
> code places(eg: shrink_inactive_list()), where it now requires to
> account NR_ISOLATED while isolating and the shrink_folio_list()
> unaccounts it.
>
> So, accounting NR_ISOLATED requires changes in other code places where
> this patch has not touched.
That surprises me, though I do recall that there's an irritating
asymmetry in where NR_ISOLATED is accounted and unaccounted.
I have not checked what you say there, may do so in 2 months.
>
> If isolating a large amount of pages and not being recorded in
> NR_ISOLATED is really a problem, then may I please know your opinion on
> isolating(with out accounting) and reclaiming in small batches? The
> batch size can be considered as SWAP_CLUSTER_MAX of pages.
In most circumstances, omitting to account NR_ISOLATED wouldn't show
up as a problem; in low memory it would. Splitting into small batches
without accounting might be an easier and better way; but whatever I
say in a hurried unthoughtful reply is likely to be wrong.
I am not convinced that isolating is even appropriate: I think I
hinted before that I would want to compare what you do here with what
shmem_swapin_range() does in mm/madvise.c, and the shmem_collapse_swapin()
I'll be proposing to avoid swapin while building up THP in collapse_file().
But it may well be that you've found the switching of LRUs essential:
I'm not prejudging, just saying I cannot rush to judgment. And this
is also a new UAPI for tmpfs, so should not be rushed then regretted.
But if you can find another champion to force this into mm/shmem.c
for you faster than I can manage, well, I don't own any Linux source.
It's not unusual for me to limp along later and rearrange things to
suit my preference.
Hugh
>
> > I had a V8 posted without this into accounting. Let me make the changes
> > to account for the NR_ISOLATED too.
>
> Thanks,
> Charan
next prev parent reply other threads:[~2024-02-20 5:10 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-14 12:51 [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files Charan Teja Kalla
2023-02-14 12:51 ` [PATCH V7 1/2] mm: fadvise: move 'endbyte' calculations to helper function Charan Teja Kalla
2023-02-14 12:51 ` [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem Charan Teja Kalla
2023-04-06 23:44 ` Minchan Kim
2023-04-10 13:52 ` Charan Teja Kalla
2023-04-11 3:42 ` Andrew Morton
2023-04-21 0:07 ` Hugh Dickins
2023-04-24 15:04 ` Charan Teja Kalla
2023-05-17 11:32 ` Hugh Dickins
2023-05-18 12:46 ` Charan Teja Kalla
2024-02-14 9:13 ` Charan Teja Kalla
2024-02-20 5:10 ` Hugh Dickins [this message]
2023-03-28 22:50 ` [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files Andrew Morton
2023-03-29 21:36 ` Suren Baghdasaryan
2023-04-13 19:45 ` Frank van der Linden
2023-04-14 17:44 ` Frank van der Linden
2023-04-14 19:10 ` Charan Teja Kalla
2023-04-14 22:02 ` Frank van der Linden
2023-04-17 6:11 ` Charan Teja Kalla
2023-04-18 17:29 ` Frank van der Linden
2023-04-19 4:19 ` Pavan Kondeti
2023-04-19 17:27 ` Frank van der Linden
2023-04-19 14:39 ` Charan Teja Kalla
2023-04-19 17:29 ` Frank van der Linden
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c3e3117d-e752-9a34-0000-05811c7327e2@google.com \
--to=hughd@google.com \
--cc=akpm@linux-foundation.org \
--cc=fvdl@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=quic_charante@quicinc.com \
--cc=quic_pkondeti@quicinc.com \
--cc=rientjes@google.com \
--cc=surenb@google.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox