Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED

From: Shakeel Butt <shakeelb@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Yang Shi <shy828301@gmail.com>,  Zi Yan <ziy@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED
Date: Mon, 22 Nov 2021 10:40:54 -0800
Message-ID: <CALvZod5L1C1DV_DVs9O3xZm6CJnriunAoj89YLDdCp7ef5yBxA@mail.gmail.com>
In-Reply-To: <25b36a5c-5bbd-5423-0c67-05cd6c1432a7@redhat.com>

On Mon, Nov 22, 2021 at 12:32 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 20.11.21 21:12, Shakeel Butt wrote:
> > Many applications do sophisticated management of their heap memory for
> > better performance but with low cost. We have a bunch of such
> > applications running on our production and examples include caching and
> > data storage services. These applications keep their hot data on the
> > THPs for better performance and release the cold data through
> > MADV_DONTNEED to keep the memory cost low.
> >
> > The kernel defers the split and release of THPs until there is memory
> > pressure. This complicates the memory management of these
> > sophisticated applications, which then need to look into low-level
> > kernel handling of THPs to better gauge their headroom for expansion.
>
> Can you elaborate a bit on that point? What exactly does such an
> application do? I would have assumed that it's mostly transparent for
> applications.
>

The application monitors its cgroup usage to decide whether it can expand
its memory footprint or should release some (unneeded/cold) buffers. It
calls madvise(MADV_DONTNEED) to release the memory, which basically
puts the THP on the deferred split list. These deferred THPs are still
charged to the cgroup, so the application reads a bloated usage and
makes wrong decisions. Internally we added a cgroup interface to
trigger the split of deferred THPs for that cgroup, but this is hacky
and exposes kernel internals to users. I want to solve this problem
in a more general way for the users.
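
To make the pattern concrete, here is a minimal userspace sketch of
releasing a cold 4k range inside a THP-eligible anonymous mapping. It is
only an illustration under stated assumptions, not code from any of the
applications mentioned: the helper names map_thp_region() and
release_cold() are hypothetical, and a 2 MiB THP size (x86-64) is assumed.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define THP_SIZE (2UL << 20)	/* assumption: 2 MiB PMD-sized THP (x86-64) */

/*
 * Hypothetical helper: map n_thp * 2 MiB of anonymous memory, aligned to
 * 2 MiB, and hint that it should be backed by THPs. Returns NULL on failure.
 */
static void *map_thp_region(size_t n_thp)
{
	size_t len = n_thp * THP_SIZE;
	size_t map_len = len + THP_SIZE;	/* over-allocate so we can align */
	char *raw = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	uintptr_t aligned = ((uintptr_t)raw + THP_SIZE - 1) &
			    ~(uintptr_t)(THP_SIZE - 1);
	size_t head = aligned - (uintptr_t)raw;
	size_t tail = map_len - head - len;

	/* Trim the unaligned head and the unused tail of the mapping. */
	if (head)
		munmap(raw, head);
	if (tail)
		munmap((char *)aligned + len, tail);

	/* Best effort: this fails harmlessly if THP is not available. */
	madvise((void *)aligned, len, MADV_HUGEPAGE);
	return (void *)aligned;
}

/*
 * Hypothetical helper: drop a cold range. On private anonymous memory the
 * range reads back as zeroes afterwards. If the range covers only part of
 * a THP, the kernel may keep the huge page charged until it is split.
 */
static int release_cold(void *addr, size_t len)
{
	return madvise(addr, len, MADV_DONTNEED);
}
```

A caller would fault the region in (for example with memset()), later
call release_cold(p, 4096) on a cold 4k range, and then re-read its
cgroup usage. With deferred splitting, that usage may not drop by 4k
right away.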

> > In
> > addition these applications are very latency sensitive and would prefer
> > to not face memory reclaim due to the non-deterministic nature of reclaim.
>
> That makes sense.
>
> >
> > This patch lets such applications not worry about the low-level handling
> > of THPs in the kernel and splits the THPs synchronously on
> > MADV_DONTNEED.
>
> The main user I'm concerned about is virtio-balloon, which ends up
> discarding VM memory via MADV_DONTNEED when inflating the balloon in the
> guest in 4k granularity, but also during "free page reporting"
> continuously when e.g., a 2MiB page becomes free in the guest. We want
> both activities to be fast, and especially during "free page reporting",
> to defer any heavy work.

Thanks for the info. What source does virtio-balloon use for free pages?

>
> Do we have a performance evaluation how much overhead is added e.g., for
> a single 4k MADV_DONTNEED call on a THP or on a MADV_DONTNEED call that
> covers the whole THP?

I did a simple benchmark of madvise(MADV_DONTNEED) on 10000 THPs on
x86 for both cases you suggested (a single 4k range within a THP and a
range covering the whole THP). I don't see any statistically
significant difference with and without the patch. Let me know if you
want me to try something else.
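
For reference, a measurement of this kind could be sketched as below.
This is not the benchmark that produced the result above, just a rough
illustration: time_dontneed() is a hypothetical helper, a 2 MiB THP size
is assumed, and a real comparison would need many repetitions on kernels
with and without the patch.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define THP_SIZE (2UL << 20)	/* assumption: 2 MiB PMD-sized THP (x86-64) */

/*
 * Hypothetical helper: fault in n_thp THP-eligible regions, then time one
 * madvise(MADV_DONTNEED) call of `chunk` bytes at the start of each one.
 * chunk = 4096 exercises the partial case, chunk = THP_SIZE the full case.
 * Returns elapsed nanoseconds, or -1 on error.
 */
static long long time_dontneed(size_t n_thp, size_t chunk)
{
	size_t len = n_thp * THP_SIZE;
	size_t map_len = len + THP_SIZE;	/* slack for 2 MiB alignment */
	struct timespec t0, t1;
	size_t i;

	char *raw = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -1;

	char *base = (char *)(((uintptr_t)raw + THP_SIZE - 1) &
			      ~(uintptr_t)(THP_SIZE - 1));

	madvise(base, len, MADV_HUGEPAGE);	/* best effort */
	memset(base, 1, len);			/* fault everything in */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < n_thp; i++) {
		if (madvise(base + i * THP_SIZE, chunk, MADV_DONTNEED) != 0) {
			munmap(raw, map_len);
			return -1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(raw, map_len);
	return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_dontneed(10000, 4096) and time_dontneed(10000, THP_SIZE)
would cover the two cases asked about. Whether the regions are actually
backed by THPs depends on the system's THP settings.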

Thanks for the review.
Shakeel
