From: Johannes Weiner <hannes@cmpxchg.org>
To: Arjun Roy <arjunroy.kdev@gmail.com>
Cc: akpm@linux-foundation.org, davem@davemloft.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, linux-mm@kvack.org, arjunroy@google.com,
	shakeelb@google.com, edumazet@google.com, soheil@google.com,
	kuba@kernel.org, mhocko@kernel.org, shy828301@gmail.com,
	guro@fb.com
Subject: Re: [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy
Date: Tue, 16 Mar 2021 06:26:58 -0400
Message-ID: <YFCH8vzFGmfFRCvV@cmpxchg.org>
In-Reply-To: <20210316041645.144249-1-arjunroy.kdev@gmail.com>

Hello,

On Mon, Mar 15, 2021 at 09:16:45PM -0700, Arjun Roy wrote:
> From: Arjun Roy <arjunroy@google.com>
> 
> TCP zerocopy receive is used by high performance network applications
> to scale further. For RX zerocopy, the memory containing the network
> data filled in by the network driver is mapped directly into the
> address space of high performance applications. To keep the TLB cost
> low, these applications unmap the network memory in big batches, so
> this memory can remain mapped for a long time. This can cause a
> memory isolation issue, as this memory becomes unaccounted once it is
> mapped into the application address space. This patch adds memcg
> accounting for such memory.
> 
> Accounting the network memory comes with its own unique challenges.
> High performance NIC drivers use page pooling to reuse pages and
> eliminate/reduce expensive setup steps like IOMMU mapping. These
> drivers keep an extra reference on the pages, so we cannot depend on
> the page refcount for uncharging. A page in the pool may keep a memcg
> pinned for an arbitrarily long time, or may get reused by another
> memcg.

The page pool knows when a page is unmapped again and becomes
available for recycling, right? Essentially the 'free' phase of that
private allocator. That's where the uncharge should be done.

For one, it's more aligned with the usual memcg charge lifetime rules.

It also avoids adding what is essentially a private driver callback
to the generic file unmapping path.

Finally, this will eliminate the need for making up a new charge type
(MEMCG_DATA_SOCK) and allow using the standard kmem charging API.
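
For illustration, a minimal sketch of that lifetime (not a patch:
pool_give_page()/pool_recycle_page() are hypothetical driver hooks,
and the pool bookkeeping is elided):

/*
 * Sketch: tie the memcg charge to the page pool's own
 * hand-out/recycle cycle via the standard kmem charging API,
 * instead of hooking the generic file unmapping path.
 */
static int pool_give_page(struct page *page, gfp_t gfp)
{
        /* charge to the current memcg before the page gets mapped */
        return __memcg_kmem_charge_page(page, gfp, 0);
}

static void pool_recycle_page(struct page *page)
{
        /* last user is done, the page goes back into the pool */
        __memcg_kmem_uncharge_page(page, 0);
}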

> This patch decouples the uncharging of the page from the refcount
> and associates it with the map count, i.e. the page gets uncharged
> when the last address space unmaps it. Now the question is, what if
> the driver drops its reference while the page is still mapped? That
> is fine, as the address space also holds a reference to the page,
> i.e. the reference count cannot drop to zero before the map count
> does.
> 
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Co-developed-by: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---
> 
> Changelog since v1:
> - Pages accounted for in this manner are now tracked via MEMCG_SOCK.
> - v1 allowed for a brief period of double-charging, now we have a
>   brief period of under-charging to avoid undue memory pressure.

I'm afraid we'll have to go back to v1.

Let's address the issues raised with it:

1. The NR_FILE_MAPPED accounting. It is longstanding Linux behavior
   that driver pages mapped into userspace are accounted as file
   pages, because userspace is actually doing mmap() against a driver
   file/fd (as opposed to an anon mmap). That is how they show up in
   vmstat, in meminfo, and in the per-process stats. There is no
   reason for memcg to deviate from this. If we don't like it, it
   should be addressed by changing vm_insert_page() - not by tricking
   rmap into thinking these aren't memcg pages and then fixing it up
   with additional special-cased accounting callbacks.

   v1 did this right: it charged the pages the way we handle all
   other userspace pages - before rmap - and then let the generic VM
   code do the accounting for us with the cgroup-aware vmstat
   infrastructure.
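
   Roughly, that ordering is (a sketch only - zc_map_page() is a
   made-up name):

   static int zc_map_page(struct vm_area_struct *vma,
                          unsigned long addr, struct page *page)
   {
           int err;

           /* charge like any other userspace page, before rmap */
           err = mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL);
           if (err)
                   return err;

           /* page_add_file_rmap() bumps NR_FILE_MAPPED in here */
           err = vm_insert_page(vma, addr, page);
           if (err)
                   mem_cgroup_uncharge(page);
           return err;
   }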

2. The double charging. Could you elaborate on how much we're talking
   about in any given batch? Is this a problem worth worrying about?

   The way I see it, any conflict here is caused by the pages being
   counted in the SOCK counter already, but not actually *tracked* on
   a per page basis. If it's worth addressing, we should look into
   fixing the root cause over there first if possible, before trying
   to work around it here.

   The newly-added __GFP_NOFAIL is especially worrisome. The pages
   should be charged before we make promises to userspace, not
   force-charged once it's already too late.

   We have sk context when charging the inserted pages. Can we
   uncharge MEMCG_SOCK after each batch of inserts? That's only 32
   pages worth of overcharging, so no more than the regular charge
   batch memcg is already using.
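
   Something like this, say (sketch; zc_insert_batch_done() is a
   made-up name):

   /*
    * Sketch: once a batch of inserted pages has been charged as
    * regular memcg pages, drop the equivalent MEMCG_SOCK charge
    * so the memory is double-counted for at most one batch.
    */
   static void zc_insert_batch_done(struct sock *sk,
                                    unsigned int nr_pages)
   {
           if (sk->sk_memcg)
                   mem_cgroup_uncharge_skmem(sk->sk_memcg, nr_pages);
   }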

   An even better way would be to do charge stealing where we reuse
   the existing MEMCG_SOCK charges and don't have to get any new ones
   at all - just set up page->memcg and remove the charge from the sk.
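
   Hand-waving the stats and lifetime details, the core of it would
   be something like this (zc_steal_sk_charge() is a made-up name;
   the page_counter charge the socket already took is simply left in
   place, now owned by the page):

   static void zc_steal_sk_charge(struct sock *sk, struct page *page)
   {
           struct mem_cgroup *memcg = sk->sk_memcg;

           /* the page keeps the memcg alive from here on */
           css_get(&memcg->css);
           page->memcg_data = (unsigned long)memcg;
   }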

   But yeah, it depends a bit on whether this is a practical concern.

Thanks,
Johannes

