linux-mm.kvack.org archive mirror
From: Dmitry Rokosov <ddrokosov@salutedevices.com>
To: Michal Hocko <mhocko@suse.com>
Cc: <rostedt@goodmis.org>, <mhiramat@kernel.org>,
	<hannes@cmpxchg.org>, <roman.gushchin@linux.dev>,
	<shakeelb@google.com>, <muchun.song@linux.dev>,
	<akpm@linux-foundation.org>, <kernel@sberdevices.ru>,
	<rockosov@gmail.com>, <cgroups@vger.kernel.org>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<bpf@vger.kernel.org>
Subject: Re: [PATCH v3 2/2] mm: memcg: introduce new event to trace shrink_memcg
Date: Wed, 29 Nov 2023 18:26:13 +0300
Message-ID: <20231129152613.6vfz4b675u7wbz25@CAB-WSD-L081021>
In-Reply-To: <20231129152057.x7fhbcvwtsmkbdpb@CAB-WSD-L081021>

On Wed, Nov 29, 2023 at 06:20:57PM +0300, Dmitry Rokosov wrote:
> On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote:
> > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> > > On Mon, Nov 27, 2023 at 01:50:22PM +0100, Michal Hocko wrote:
> > > > On Mon 27-11-23 14:36:44, Dmitry Rokosov wrote:
> > > > > On Mon, Nov 27, 2023 at 10:33:49AM +0100, Michal Hocko wrote:
> > > > > > On Thu 23-11-23 22:39:37, Dmitry Rokosov wrote:
> > > > > > > The shrink_memcg flow plays a crucial role in memcg reclamation.
> > > > > > > Currently, it is not possible to trace this point from non-direct
> > > > > > > reclaim paths. However, direct reclaim has its own tracepoint, so there
> > > > > > > is no issue there. In certain cases, when debugging memcg pressure,
> > > > > > > developers may need to identify all potential requests for memcg
> > > > > > > reclamation including kswapd(). The patchset introduces the tracepoints
> > > > > > > mm_vmscan_memcg_shrink_{begin|end}() to address this problem.
> > > > > > > 
> > > > > > > Example of output in the kswapd context (non-direct reclaim):
> > > > > > >     kswapd0-39      [001] .....   240.356378: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356396: mm_vmscan_memcg_shrink_end: nr_reclaimed=0 memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356420: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356454: mm_vmscan_memcg_shrink_end: nr_reclaimed=1 memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356479: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356506: mm_vmscan_memcg_shrink_end: nr_reclaimed=4 memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356525: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356593: mm_vmscan_memcg_shrink_end: nr_reclaimed=11 memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356614: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356738: mm_vmscan_memcg_shrink_end: nr_reclaimed=25 memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.356790: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
> > > > > > >     kswapd0-39      [001] .....   240.357125: mm_vmscan_memcg_shrink_end: nr_reclaimed=53 memcg=16
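A rough way to consume output like the example above: this userspace C sketch pairs each mm_vmscan_memcg_shrink_begin event with the next mm_vmscan_memcg_shrink_end for the same memcg, then totals the calls, nr_reclaimed, and begin-to-end time per memcg. It assumes the default ftrace text layout shown here (a six-digit microsecond fraction and task names without spaces) and assumes that begin/end pairs do not interleave, which holds for a single kswapd thread. feed_line(), get_stats() and struct memcg_stats are illustrative names, not kernel or libtraceevent APIs.

```c
#include <stdio.h>
#include <string.h>

#define MAX_MEMCGS 64

/* Per-memcg totals accumulated from begin/end pairs. */
struct memcg_stats {
	unsigned long id;        /* value printed as "memcg=" */
	unsigned long calls;     /* completed begin/end pairs */
	unsigned long reclaimed; /* sum of nr_reclaimed */
	unsigned long busy_us;   /* sum of (end - begin) in microseconds */
	unsigned long begin_us;  /* timestamp of the pending begin event */
	int pending;             /* 1 while a begin is waiting for its end */
};

static struct memcg_stats stats[MAX_MEMCGS];
static int nr_stats;

/* Find the entry for @id, creating it if needed; NULL if the table is full. */
static struct memcg_stats *get_stats(unsigned long id)
{
	for (int i = 0; i < nr_stats; i++)
		if (stats[i].id == id)
			return &stats[i];
	if (nr_stats == MAX_MEMCGS)
		return NULL;
	memset(&stats[nr_stats], 0, sizeof(stats[nr_stats]));
	stats[nr_stats].id = id;
	return &stats[nr_stats++];
}

/*
 * Feed one line of ftrace text output. Returns 0 if the line was a memcg
 * shrink event that was accounted, -1 otherwise.
 */
static int feed_line(const char *line)
{
	unsigned long sec, usec, id, nr = 0;
	int is_begin = strstr(line, "mm_vmscan_memcg_shrink_begin:") != NULL;
	int is_end = strstr(line, "mm_vmscan_memcg_shrink_end:") != NULL;
	const char *m = strstr(line, " memcg=");
	struct memcg_stats *s;

	if ((!is_begin && !is_end) || !m)
		return -1;
	if (sscanf(m, " memcg=%lu", &id) != 1)
		return -1;
	/* Layout assumed: "comm-pid [cpu] flags sec.usec: event: fields" */
	if (sscanf(line, " %*s [%*d] %*s %lu.%lu:", &sec, &usec) != 2)
		return -1;
	if (is_end) {
		const char *r = strstr(line, "nr_reclaimed=");

		if (!r || sscanf(r, "nr_reclaimed=%lu", &nr) != 1)
			return -1;
	}
	s = get_stats(id);
	if (!s)
		return -1;
	if (is_begin) {
		s->begin_us = sec * 1000000UL + usec;
		s->pending = 1;
		return 0;
	}
	if (!s->pending)
		return -1; /* end without a matching begin */
	s->busy_us += sec * 1000000UL + usec - s->begin_us;
	s->reclaimed += nr;
	s->calls++;
	s->pending = 0;
	return 0;
}
```

A small main() could call feed_line() on every line read with fgets() from /sys/kernel/tracing/trace_pipe (or from a saved trace) and then print the stats[] table. For the twelve lines in the example it should report 6 calls, 94 reclaimed pages and 606 us spent between begin and end for memcg=16.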
> > > > > > 
> > > > > > In the previous version I asked why we need this specific
> > > > > > tracepoint when we already have trace_mm_vmscan_lru_shrink_{in}active,
> > > > > > which already give you a very good insight. That includes the number of
> > > > > > reclaimed pages but also more. I do see that we do not include the memcg id
> > > > > > of the reclaimed LRU, but that shouldn't be a big problem to add, no?
> > > > > 
> > > > > From my point of view, memcg reclaim includes two points: LRU shrink and
> > > > > slab shrink, as mentioned in the vmscan.c file.
> > > > > 
> > > > > 
> > > > > static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
> > > > > ...
> > > > > 		reclaimed = sc->nr_reclaimed;
> > > > > 		scanned = sc->nr_scanned;
> > > > > 
> > > > > 		shrink_lruvec(lruvec, sc);
> > > > > 
> > > > > 		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> > > > > 			    sc->priority);
> > > > > ...
> > > > > 
> > > > > So, both of these operations are important for understanding whether
> > > > > memcg reclaiming was successful or not, as well as its effectiveness. I
> > > > > believe it would be beneficial to summarize them, which is why I have
> > > > > created new tracepoints.
> > > > 
> > > > This sounds like a nice-to-have rather than a must. Put it differently. If
> > > > you make existing reclaim trace points memcg aware (print memcg id) then
> > > > what prevents you from making analysis you need?
> > > 
> > > You are right, nothing prevents me from making this analysis... but...
> > > 
> > > This approach does have some disadvantages:
> > > 1) It requires more changes to vmscan. At the very least, the memcg
> > > object should be forwarded to all subfunctions for LRU and SLAB
> > > shrinkers.
> > 
> > We should have lruvec or memcg available. lruvec_memcg() could be used
> > to get memcg from the lruvec. It might mean more places to add the id, but
> > arguably this would improve them by identifying where the memory has been
> > scanned/reclaimed from.
> >  
> 
> Oh, thank you, didn't see this conversion function before...
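To illustrate the lruvec_memcg() suggestion: in the kernel the per-memcg, per-node structure embeds the lruvec, so the owning memcg can be recovered with container_of(), and the helper returns NULL when the memory controller is disabled. This userspace model keeps only that shape. The structures are heavily simplified stand-ins, and memcg_controller_disabled is a hypothetical flag standing in for mem_cgroup_disabled().

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct lruvec {
	unsigned long nr_pages;
};

struct mem_cgroup {
	unsigned long ino; /* cgroup inode number */
};

struct mem_cgroup_per_node {
	struct lruvec lruvec;     /* embedded, so container_of() works */
	struct mem_cgroup *memcg; /* back pointer to the owning memcg */
};

/* Hypothetical flag modelling mem_cgroup_disabled(). */
static bool memcg_controller_disabled;

/* Recover the memcg that owns @lruvec, or NULL if memcg is disabled. */
static struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec)
{
	struct mem_cgroup_per_node *mz;

	if (memcg_controller_disabled)
		return NULL;
	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
	return mz->memcg;
}
```

Because the lruvec is already passed to the LRU shrinking functions, a tracepoint there could derive the memcg this way instead of threading an extra argument through vmscan.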
> 
> > > 2) With this approach, we will not have the ability to trace a situation
> > > where the kernel is requesting reclaim for a specific memcg, but due to
> > > memcg protection limits, the reclaim is never actually performed.
> > 
> > I do not follow. Could you be more specific please?
> > 
> 
> I'm referring to a situation where kswapd() or other kernel mm code
> requests reclaim of some pages from a memcg, but the memcg rejects it due
> to its protection limit checks. This occurs in the shrink_node_memcgs() function.
> 
> ===
> 		mem_cgroup_calculate_protection(target_memcg, memcg);
> 
> 		if (mem_cgroup_below_min(target_memcg, memcg)) {
> 			/*
> 			 * Hard protection.
> 			 * If there is no reclaimable memory, OOM.
> 			 */
> 			continue;
> 		} else if (mem_cgroup_below_low(target_memcg, memcg)) {
> 			/*
> 			 * Soft protection.
> 			 * Respect the protection only as long as
> 			 * there is an unprotected supply
> 			 * of reclaimable memory from other cgroups.
> 			 */
> 			if (!sc->memcg_low_reclaim) {
> 				sc->memcg_low_skipped = 1;
> 				continue;
> 			}
> 			memcg_memory_event(memcg, MEMCG_LOW);
> 		}
> ===
> 
> With separate shrink begin()/end() tracepoints we can detect such a
> problem.
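The point about skipped memcgs can be made concrete with a small userspace model of the protection checks quoted above. A memcg classified as SKIP_MIN or SKIP_LOW hits `continue` before shrink_lruvec() and shrink_slab() run, so LRU and slab tracepoints would record nothing for it. The struct and enum names are hypothetical; the booleans stand in for the results of mem_cgroup_below_min() and mem_cgroup_below_low().

```c
#include <stdbool.h>

/* Hypothetical inputs: results of the kernel's protection checks. */
struct memcg_state {
	bool below_min; /* mem_cgroup_below_min() returned true */
	bool below_low; /* mem_cgroup_below_low() returned true */
};

/* Hypothetical subset of struct scan_control used by the checks. */
struct scan_ctl {
	bool memcg_low_reclaim; /* allowed to reclaim below memory.low */
	bool memcg_low_skipped; /* set when low protection caused a skip */
};

enum memcg_verdict {
	RECLAIM,            /* no protection applies */
	RECLAIM_LOW_BREACH, /* below low, but low reclaim is allowed */
	SKIP_MIN,           /* hard protection: never reclaimed */
	SKIP_LOW,           /* soft protection respected on this pass */
};

/* Mirrors the order of the if/else chain in shrink_node_memcgs(). */
static enum memcg_verdict classify(const struct memcg_state *m,
				   struct scan_ctl *sc)
{
	if (m->below_min)
		return SKIP_MIN;
	if (m->below_low) {
		if (!sc->memcg_low_reclaim) {
			sc->memcg_low_skipped = true;
			return SKIP_LOW;
		}
		/* The kernel also records a MEMCG_LOW event at this point. */
		return RECLAIM_LOW_BREACH;
	}
	return RECLAIM;
}
```

Only the RECLAIM and RECLAIM_LOW_BREACH outcomes reach the LRU and slab shrinkers; the two SKIP outcomes are invisible to tracepoints placed inside them.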
> 
> 
> > > 3) LRU and SLAB shrinkers are too generic a place to handle memcg-related
> > > tasks. Additionally, memcg can be disabled in the kernel configuration.
> > 
> > Right. This could be all hidden in the tracing code. You simply do not
> > print memcg id when the controller is disabled. Or just simply print 0.
> > I do not really see any major problems with that.
> > 
> > I would really prefer to focus on that direction rather than adding
> > another begin/end tracepoint which overlaps with existing begin/end
> > traces and provides much more limited information because I would bet we
> > will have somebody complaining that mere nr_reclaimed is not sufficient.
> 
> Okay, I will try to prepare a new patch version with memcg printing from
> lruvec and slab tracepoints.
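One possible shape for the "print 0 when the controller is disabled" idea, as a userspace sketch: a helper that yields the cgroup inode number (which patch 1/2 of this series prints as memcg=) or 0 for a NULL memcg, plus a formatter for an LRU shrink line extended with that field. The line format and all names are hypothetical and do not reproduce the real mm_vmscan_lru_shrink_inactive output.

```c
#include <stdio.h>

/* Hypothetical stand-in: only the cgroup inode number matters here. */
struct mem_cgroup {
	unsigned long ino;
};

/*
 * Value for a "memcg=" field: the cgroup inode number, or 0 when there is
 * no memcg (e.g. the controller is disabled and lookup returned NULL).
 */
static unsigned long trace_memcg_id(const struct mem_cgroup *memcg)
{
	return memcg ? memcg->ino : 0;
}

/* Hypothetical LRU shrink trace line with a memcg id appended. */
static int format_lru_shrink(char *buf, size_t len, int nid,
			     unsigned long nr_scanned,
			     unsigned long nr_reclaimed,
			     const struct mem_cgroup *memcg)
{
	return snprintf(buf, len,
			"nid=%d nr_scanned=%lu nr_reclaimed=%lu memcg=%lu",
			nid, nr_scanned, nr_reclaimed, trace_memcg_id(memcg));
}
```

Printing 0 keeps the field layout identical whether or not memcg is enabled, so tools parsing the trace do not need a separate code path.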
> 
> Then Andrew should drop the previous patchsets, I suppose. Please advise
> on the correct workflow steps here.

Actually, it has already been merged into linux-next... I just checked.
Maybe it would be better to prepare lruvec and slab memcg printing as a
separate patch series?

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=0e7f0c52a76cb22c8633f21bff6e48fabff6016e

-- 
Thank you,
Dmitry



Thread overview: 21+ messages
2023-11-23 19:39 [PATCH v3 0/2] mm: memcg: improve vmscan tracepoints Dmitry Rokosov
2023-11-23 19:39 ` [PATCH v3 1/2] mm: memcg: print out cgroup ino in the memcg tracepoints Dmitry Rokosov
2023-11-25  4:11   ` Shakeel Butt
     [not found] ` <20231123193937.11628-3-ddrokosov@salutedevices.com>
2023-11-25  6:36   ` [PATCH v3 2/2] mm: memcg: introduce new event to trace shrink_memcg Shakeel Butt
2023-11-25  8:01     ` Dmitry Rokosov
2023-11-25 17:38       ` Shakeel Butt
2023-11-25 17:47   ` Shakeel Butt
2023-11-27  9:33   ` Michal Hocko
2023-11-27 11:36     ` Dmitry Rokosov
2023-11-27 12:50       ` Michal Hocko
2023-11-27 16:16         ` Dmitry Rokosov
2023-11-28  9:32           ` Michal Hocko
2023-11-29 15:20             ` Dmitry Rokosov
2023-11-29 15:26               ` Dmitry Rokosov [this message]
2023-11-29 16:06               ` Michal Hocko
2023-11-29 16:57                 ` Dmitry Rokosov
2023-11-29 17:10                   ` Michal Hocko
2023-11-29 17:34                     ` Steven Rostedt
2023-11-29 17:35                     ` Dmitry Rokosov
2023-11-29 17:33               ` Andrew Morton
2023-11-29 17:49                 ` Dmitry Rokosov
