From: Charan Teja Kalla <quic_charante@quicinc.com>
To: Frank van der Linden <fvdl@google.com>
Cc: <akpm@linux-foundation.org>, <hughd@google.com>,
<willy@infradead.org>, <markhemm@googlemail.com>,
<rientjes@google.com>, <surenb@google.com>, <shakeelb@google.com>,
<quic_pkondeti@quicinc.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files
Date: Wed, 19 Apr 2023 20:09:36 +0530 [thread overview]
Message-ID: <4b8b370b-fc8e-0c5f-910d-552ccebf0e08@quicinc.com> (raw)
In-Reply-To: <20230418172942.740769-1-fvdl@google.com>
Hi Frank,
Lets start a separate thread to add the support for mapped pages. I
think one question that can come while review is: "what is the overhead
an application has in collecting the memory regions and issuing the
MADV_PAGEOUT, that made to add this support?". Let me try to get details
for this from my side too.
BTW, thanks for this POC code!!
Thanks,
Charan
On 4/18/2023 10:59 PM, Frank van der Linden wrote:
> Below is a quick patch to allow FADVISE_DONTNEED for shmem to reclaim
> mapped pages too. This would fit our usecase, and matches MADV_PAGEOUT
> more closely.
>
> The patch series as posted skips mapped pages even if you remove
> the folio_mapped() check, because page_referenced() in
> shrink_page_list() will find page tables with the page mapped,
> and ignore_references is not set when called from reclaim_pages().
>
> You can make this work in a similar fashion to MADV_PAGEOUT by
> first unmapping a page, but only if the mapping belongs to
> the caller. You just have to change the check for "is there
> only one mapping and am I the owner". To do that, change a few
> lines in try_to_unmap to allow for checking the mm the mapping
> belongs to, and pass in current->mm (NULL still unmaps all mappings).
>
> I lightly tested this in a somewhat older codebase, so the diff
> below isn't fully tested. But if there are no objections to
> this approach, we could add it on top of your patchset after
> better testing.
>
> - Frank
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b87d01660412..4403cc2ccc4c 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -368,6 +368,8 @@ int folio_referenced(struct folio *, int is_locked,
>
> void try_to_migrate(struct folio *folio, enum ttu_flags flags);
> void try_to_unmap(struct folio *, enum ttu_flags flags);
> +void try_to_unmap_mm(struct mm_struct *mm, struct folio *folio,
> + enum ttu_flags flags);
>
> int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> unsigned long end, struct page **pages,
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 8632e02661ac..4d30e8f5afe2 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1443,6 +1443,11 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> munlock_vma_folio(folio, vma, compound);
> }
>
> +struct unmap_arg {
> + enum ttu_flags flags;
> + struct mm_struct *mm;
> +};
> +
> /*
> * @arg: enum ttu_flags will be passed to this argument
> */
> @@ -1455,7 +1460,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> struct page *subpage;
> bool anon_exclusive, ret = true;
> struct mmu_notifier_range range;
> - enum ttu_flags flags = (enum ttu_flags)(long)arg;
> + struct unmap_arg *uap = (struct unmap_arg *)arg;
> + enum ttu_flags flags = uap->flags;
> +
> + if (uap->mm && uap->mm != mm)
> + return true;
>
> /*
> * When racing against e.g. zap_pte_range() on another cpu,
> @@ -1776,6 +1785,7 @@ static int folio_not_mapped(struct folio *folio)
>
> /**
> * try_to_unmap - Try to remove all page table mappings to a folio.
> + * @mm: mm to unmap from (NULL to unmap from all)
> * @folio: The folio to unmap.
> * @flags: action and flags
> *
> @@ -1785,11 +1795,16 @@ static int folio_not_mapped(struct folio *folio)
> *
> * Context: Caller must hold the folio lock.
> */
> -void try_to_unmap(struct folio *folio, enum ttu_flags flags)
> +void try_to_unmap_mm(struct mm_struct *mm, struct folio *folio,
> + enum ttu_flags flags)
> {
> + struct unmap_arg ua = {
> + .flags = flags,
> + .mm = mm,
> + };
> struct rmap_walk_control rwc = {
> .rmap_one = try_to_unmap_one,
> - .arg = (void *)flags,
> + .arg = (void *)&ua,
> .done = folio_not_mapped,
> .anon_lock = folio_lock_anon_vma_read,
> };
> @@ -1800,6 +1815,11 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags)
> rmap_walk(folio, &rwc);
> }
>
> +void try_to_unmap(struct folio *folio, enum ttu_flags flags)
> +{
> + try_to_unmap_mm(NULL, folio, flags);
> +}
> +
> /*
> * @arg: enum ttu_flags will be passed to this argument.
> *
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1af85259b6fc..b24af2fb3378 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2362,8 +2362,24 @@ static void shmem_isolate_pages_range(struct address_space *mapping, loff_t star
>
> if (!folio_try_get(folio))
> continue;
> - if (folio_test_unevictable(folio) || folio_mapped(folio) ||
> - folio_isolate_lru(folio)) {
> +
> + if (folio_test_unevictable(folio)) {
> + folio_put(folio);
> + continue;
> + }
> +
> + /*
> + * If the folio is mapped once, try to unmap it from the
> + * caller's page table. If it's still mapped afterwards,
> + * it belongs to someone else, and we're not going to
> + * change someone else's mapping.
> + */
> + if (folio_mapcount(folio) == 1 && folio_trylock(folio)) {
> + try_to_unmap_mm(current->mm, folio, TTU_BATCH_FLUSH);
> + folio_unlock(folio);
> + }
> +
> + if (folio_mapped(folio) || folio_isolate_lru(folio)) {
> folio_put(folio);
> continue;
> }
> @@ -2383,6 +2399,8 @@ static void shmem_isolate_pages_range(struct address_space *mapping, loff_t star
> }
> }
> rcu_read_unlock();
> +
> + try_to_unmap_flush();
> }
>
> static int shmem_fadvise_dontneed(struct address_space *mapping, loff_t start,
next prev parent reply other threads:[~2023-04-19 14:39 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-14 12:51 Charan Teja Kalla
2023-02-14 12:51 ` [PATCH V7 1/2] mm: fadvise: move 'endbyte' calculations to helper function Charan Teja Kalla
2023-02-14 12:51 ` [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem Charan Teja Kalla
2023-04-06 23:44 ` Minchan Kim
2023-04-10 13:52 ` Charan Teja Kalla
2023-04-11 3:42 ` Andrew Morton
2023-04-21 0:07 ` Hugh Dickins
2023-04-24 15:04 ` Charan Teja Kalla
2023-05-17 11:32 ` Hugh Dickins
2023-05-18 12:46 ` Charan Teja Kalla
2024-02-14 9:13 ` Charan Teja Kalla
2024-02-20 5:10 ` Hugh Dickins
2023-03-28 22:50 ` [PATCH V7 0/2] mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files Andrew Morton
2023-03-29 21:36 ` Suren Baghdasaryan
2023-04-13 19:45 ` Frank van der Linden
2023-04-14 17:44 ` Frank van der Linden
2023-04-14 19:10 ` Charan Teja Kalla
2023-04-14 22:02 ` Frank van der Linden
2023-04-17 6:11 ` Charan Teja Kalla
2023-04-18 17:29 ` Frank van der Linden
2023-04-19 4:19 ` Pavan Kondeti
2023-04-19 17:27 ` Frank van der Linden
2023-04-19 14:39 ` Charan Teja Kalla [this message]
2023-04-19 17:29 ` Frank van der Linden
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4b8b370b-fc8e-0c5f-910d-552ccebf0e08@quicinc.com \
--to=quic_charante@quicinc.com \
--cc=akpm@linux-foundation.org \
--cc=fvdl@google.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=markhemm@googlemail.com \
--cc=quic_pkondeti@quicinc.com \
--cc=rientjes@google.com \
--cc=shakeelb@google.com \
--cc=surenb@google.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox