linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Frank van der Linden <fvdl@google.com>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org,
	 markhemm@googlemail.com, rientjes@google.com, surenb@google.com,
	 shakeelb@google.com, quic_pkondeti@quicinc.com,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files
Date: Thu, 13 Apr 2023 12:45:24 -0700	[thread overview]
Message-ID: <CAPTztWYgRORXKp83Spm3DX8qJsi1rw5s=WbPcjUYfOxFXxRAwg@mail.gmail.com> (raw)
In-Reply-To: <cover.1676378702.git.quic_charante@quicinc.com>

On Tue, Feb 14, 2023 at 4:53 AM Charan Teja Kalla
<quic_charante@quicinc.com> wrote:
>
> This patch aims to implement POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED
> advices to shmem files which can be helpful for the drivers who may want
> to manage the pages of shmem files on their own, like, that are created
> through shmem_file_setup[_with_mnt]().
>
> Changes in V7:
>  -- Use folio based interface, shmem_read_folio(), for FADV_WILLNEED.
>  -- Don't swap the SHM_LOCK'ed pages.
>
> Changes in V6:
>  -- Replaced the pages with folio's for shmem changes.
>  -- https://lore.kernel.org/all/cover.1675690847.git.quic_charante@quicinc.com/
>
> Changes in V5:
>  -- Moved the 'endbyte' calculations to a header function for use by shmem_fadvise().
>  -- Addressed comments from suren.
>  -- No changes in resend. Retested on the latest tip.
>  -- https://lore.kernel.org/all/cover.1648706231.git.quic_charante@quicinc.com/
>
> Changes in V4:
>   -- Changed the code to use reclaim_pages() to writeout the shmem pages to swap and then reclaim.
>   -- Addressed comments from Mark Hemment and Matthew.
>   -- fadvise() on shmem file may even unmap a page.
>   -- https://patchwork.kernel.org/project/linux-mm/patch/1644572051-24091-1-git-send-email-quic_charante@quicinc.com/
>
> Changes in V3:
>   -- Considered THP pages while doing FADVISE_[DONT|WILL]NEED, identified by Matthew.
>   -- xarray used properly, as identified by Matthew.
>   -- Excluded mapped pages as it requires unmapping and the man pages of fadvise don't talk about them.
>   -- RESEND: Fixed the compilation issue when CONFIG_TMPFS is not defined.
>   -- https://patchwork.kernel.org/project/linux-mm/patch/1641488717-13865-1-git-send-email-quic_charante@quicinc.com/
>
> Changes in V2:
>   -- Rearranged the code to not to sleep with rcu_lock while using xas_() functionality.
>   -- Addressed the comments from Suren.
>   -- https://patchwork.kernel.org/project/linux-mm/patch/1638442253-1591-1-git-send-email-quic_charante@quicinc.com/
>
> changes in V1:
>   -- Created the interface for fadvise(2) to work on shmem files.
>   -- https://patchwork.kernel.org/project/linux-mm/patch/1633701982-22302-1-git-send-email-charante@codeaurora.org/
>
>
> Charan Teja Kalla (2):
>   mm: fadvise: move 'endbyte' calculations to helper function
>   mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem
>
>  mm/fadvise.c  |  11 +-----
>  mm/internal.h |  21 +++++++++++
>  mm/shmem.c    | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 138 insertions(+), 10 deletions(-)
>
> --
> 2.7.4
>
>

I didn't see this patch before, so I looked a bit at the history. At
some point, in v3, dealing with mapped pages for DONTNEED was left
out, they are now skipped. Unfortunately, that makes this patch no
longer usable for a case that we have: restoring the (approximate)
swap state of a tmpfs file. This involves walking a potentially large
number of regions, and explicitly pushing them out to swap. This can
be used to e.g. restore the state VM memory that is backed by a tmpfs
file, avoiding memory usage by cold VM pages after resume.

If DONTNEED also reclaims mapped pages (e.g. they get pushed out to
swap, if any), implementing the above use case efficiently is simple:
use io_uring with a vector that contains each region and the fadvise
method.

Without DONTNEED reclaiming mapped pages, you'd have to do mmap +
madvise(MADV_PAGEOUT) for each region that you want swapped out, which
is rather inefficient.

I understand that the semantics for POSIX_FADV_DONTNEED on shmem/tmpfs
files are open to interpretation, as it is a special case. And you can
certainly make the argument that relying on behavior caused by what
can be considered an implementation detail is bad.

So, is there any way we can make this use case work efficiently using
this patch?

You state in the commit message:

> So, FADV_DONTNEED also covers the semantics of MADV_PAGEOUT for file pages
> and there is no purpose of PAGEOUT for file pages.

But that doesn't seem correct: for shmem file pages, there actually
can be a purpose, and the FADV_DONTNEED as implemented for shmem in
this patch set does not cover the semantics.

You can say that it doesn't need to cover the pageout case of mapped
shmem pages, and that's fair. But I don't think you can claim that it
covers the case as currently implemented.

I suppose there are three options here:

1) Do nothing, this use case will just have to spend more time doing
mmap+madvise
2) Don't skip mapped pages for POSIX_FADV_DONTNEED in shmem_fadvise
3) Implement something like POSIX_FADV_PAGEOUT_NP, which would include
mapped pages.

What do people think?

- Frank


  parent reply	other threads:[~2023-04-13 19:45 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-14 12:51 Charan Teja Kalla
2023-02-14 12:51 ` [PATCH V7 1/2] mm: fadvise: move 'endbyte' calculations to helper function Charan Teja Kalla
2023-02-14 12:51 ` [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem Charan Teja Kalla
2023-04-06 23:44   ` Minchan Kim
2023-04-10 13:52     ` Charan Teja Kalla
2023-04-11  3:42       ` Andrew Morton
2023-04-21  0:07   ` Hugh Dickins
2023-04-24 15:04     ` Charan Teja Kalla
2023-05-17 11:32       ` Hugh Dickins
2023-05-18 12:46         ` Charan Teja Kalla
2024-02-14  9:13           ` Charan Teja Kalla
2024-02-20  5:10             ` Hugh Dickins
2023-03-28 22:50 ` [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files Andrew Morton
2023-03-29 21:36   ` Suren Baghdasaryan
2023-04-13 19:45 ` Frank van der Linden [this message]
2023-04-14 17:44   ` Frank van der Linden
2023-04-14 19:10     ` Charan Teja Kalla
2023-04-14 22:02       ` Frank van der Linden
2023-04-17  6:11         ` Charan Teja Kalla
2023-04-18 17:29           ` Frank van der Linden
2023-04-19  4:19             ` Pavan Kondeti
2023-04-19 17:27               ` Frank van der Linden
2023-04-19 14:39             ` Charan Teja Kalla
2023-04-19 17:29               ` Frank van der Linden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPTztWYgRORXKp83Spm3DX8qJsi1rw5s=WbPcjUYfOxFXxRAwg@mail.gmail.com' \
    --to=fvdl@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=markhemm@googlemail.com \
    --cc=quic_charante@quicinc.com \
    --cc=quic_pkondeti@quicinc.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox