linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 willy@infradead.org, rientjes@google.com, surenb@google.com,
	 fvdl@google.com, quic_pkondeti@quicinc.com, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem
Date: Mon, 19 Feb 2024 21:10:13 -0800 (PST)	[thread overview]
Message-ID: <c3e3117d-e752-9a34-0000-05811c7327e2@google.com> (raw)
In-Reply-To: <64ed46f4-459c-63b0-a69e-81353e9fcbc9@quicinc.com>

[-- Attachment #1: Type: text/plain, Size: 4571 bytes --]

On Wed, 14 Feb 2024, Charan Teja Kalla wrote:

> Hello Hugh,
> 
> Based on offline discussion with some folks in the list, it seems that
> this syscall can be helpful. This patch might have forgotten and I hope
> this ping helps in resurrecting this thread.

Charan, it's not forgotten, but it was relayed to you through another
channel a month ago, that I did not expect to have time to think about
this for 3 months.  Countdown says 2 months to go now.

I realize that it's frustrating for you; it's unpleasant for me too.

> 
> On 5/18/2023 6:16 PM, Charan Teja Kalla wrote:
> > On 5/17/2023 5:02 PM, Hugh Dickins wrote:
> >>> Sure, will include those range calculations for shmem pages too.
> >> Oh, I forgot this issue, you would have liked me to look at V8 by now,
> >> to see whether I agree with your resolution there.  Sorry, no, I've
> >> not been able to divert my concentration to it yet.
> >>
> >> And it's quite likely that I shall disagree, because I've a history of
> >> disagreeing even with myself on such range widening/narrowing issues -
> >> reconciling conflicting precedents is difficult 🙁
> >>
> > If you can at least help by commenting which part of the patch you
> > disagree with, I can try hard to convince you there:) .
> > 
> >>> Please let me know if I'm missing something where I should be counting
> >>> these as NR_ISOLATED.
> >> Please grep for NR_ISOLATED, to see where and how they get manipulated
> >> already, and follow the existing examples.  The case that sticks in my
> >> mind is in mm/mempolicy.c, where the migrate_pages() syscall can build
> >> up a gigantic quantity of transiently isolated pages: your syscall can
> >> do the same, so should account for itself in the same way.
> 
> Based on the grep, it seems almost all the call stacks that isolates the
> folios is for migrating the pages where after migration the NR_ISOLATED
> is decremented (in migrate_folio_done()). The call paths are(compaction,
> memory hotplug, mempolicy).
> 
> The another call path is reclaim where we isolate 'nr' pages belongs to
> a pgdat, account/unaccount them in NR_ISOLATED across the reclaim.
> 
> I think it is easy to account for the above call paths as we know "which
> folio corresponds to which pgdat".
> 
> Where as in this patch, we are isolating a set of folios(can corresponds
> to different nodes) and relying on the reclaim_pages() to do the swap
> out. It is straightforward to account NR_ISOLATED while isolating, but
> it requires unaccounting changes in the shrink_folio_list() where folio
> is being freed after swap out.  Doing so requires changes in all the
> code places(eg: shrink_inactive_list()), where it now requires to
> account NR_ISOLATED while isolating and the shrink_folio_list()
> unaccounts it.
> 
> So, accounting NR_ISOLATED requires changes in other code places where
> this patch has not touched.

That surprises me, though I do recall that there's an irritating
asymmetry in where NR_ISOLATED is accounted and unaccounted.
I have not checked what you say there, may do so in 2 months.

> 
> If isolating a large amount of pages and not being recorded in
> NR_ISOLATED is really a problem, then may I please know your opinion on
> isolating(with out accounting) and reclaiming in small batches? The
> batch size can be considered as SWAP_CLUSTER_MAX of pages.

In most circumstances, omitting to account NR_ISOLATED wouldn't show
up as a problem; in low memory it would.  Splitting into small batches
without accounting might be an easier and better way; but whatever I
say in a hurried unthoughtful reply is likely to be wrong.

I am not convinced that isolating is even appropriate: I think I
hinted before that I would want to compare what you do here with what
shmem_swapin_range() does in mm/madvise.c, and the shmem_collapse_swapin()
I'll be proposing to avoid swapin while building up THP in collapse_file().

But it may well be that you've found the switching of LRUs essential:
I'm not prejudging, just saying I cannot rush to judgment.  And this
is also a new UAPI for tmpfs, so should not be rushed then regretted.

But if you can find another champion to force this into mm/shmem.c
for you faster than I can manage, well, I don't own any Linux source.
It's not unusual for me to limp along later and rearrange things to
suit my preference.

Hugh

> 
> > I had a V8 posted without this into accounting. Let me make the changes
> > to account for the NR_ISOLATED too.
> 
> Thanks,
> Charan

  reply	other threads:[~2024-02-20  5:10 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-14 12:51 [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files Charan Teja Kalla
2023-02-14 12:51 ` [PATCH V7 1/2] mm: fadvise: move 'endbyte' calculations to helper function Charan Teja Kalla
2023-02-14 12:51 ` [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem Charan Teja Kalla
2023-04-06 23:44   ` Minchan Kim
2023-04-10 13:52     ` Charan Teja Kalla
2023-04-11  3:42       ` Andrew Morton
2023-04-21  0:07   ` Hugh Dickins
2023-04-24 15:04     ` Charan Teja Kalla
2023-05-17 11:32       ` Hugh Dickins
2023-05-18 12:46         ` Charan Teja Kalla
2024-02-14  9:13           ` Charan Teja Kalla
2024-02-20  5:10             ` Hugh Dickins [this message]
2023-03-28 22:50 ` [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files Andrew Morton
2023-03-29 21:36   ` Suren Baghdasaryan
2023-04-13 19:45 ` Frank van der Linden
2023-04-14 17:44   ` Frank van der Linden
2023-04-14 19:10     ` Charan Teja Kalla
2023-04-14 22:02       ` Frank van der Linden
2023-04-17  6:11         ` Charan Teja Kalla
2023-04-18 17:29           ` Frank van der Linden
2023-04-19  4:19             ` Pavan Kondeti
2023-04-19 17:27               ` Frank van der Linden
2023-04-19 14:39             ` Charan Teja Kalla
2023-04-19 17:29               ` Frank van der Linden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c3e3117d-e752-9a34-0000-05811c7327e2@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=fvdl@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=quic_charante@quicinc.com \
    --cc=quic_pkondeti@quicinc.com \
    --cc=rientjes@google.com \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox