From: Vlastimil Babka <vbabka@suse.cz>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vladimir Davydov" <vdavydov.dev@gmail.com>,
	"Waiman Long" <longman@redhat.com>
Subject: Re: [PATCH 3/4] mm/memcg: Add a local_lock_t for IRQ and TASK object.
Date: Wed, 26 Jan 2022 17:57:14 +0100
Message-ID: <7f4928b8-16e2-88b3-2688-1519a19653a9@suse.cz>
In-Reply-To: <20220125164337.2071854-4-bigeasy@linutronix.de>

On 1/25/22 17:43, Sebastian Andrzej Siewior wrote:
> The members of the per-CPU structure memcg_stock_pcp are protected
> either by disabling interrupts or by disabling preemption if the
> invocation occurred in process context.
> Disabling interrupts protects most of the structure excluding task_obj
> while disabling preemption protects only task_obj.
> This schema is incompatible with PREEMPT_RT because it creates atomic
> context in which actions are performed which require preemptible
> context. One example is obj_cgroup_release().
> 
> The IRQ-disable and preempt-disable sections can be replaced with
> local_lock_t which preserves the explicit disabling of interrupts while
> keeping the code preemptible on PREEMPT_RT.
> 
> The task_obj has been added for performance reason on non-preemptible
> kernels where preempt_disable() is a NOP. On the PREEMPT_RT preemption
> model preempt_disable() is always implemented. Also there are no memory
> allocations in_irq() context and softirqs are processed in (preemptible)
> process context. Therefore it makes sense to avoid using task_obj.
> 
> Don't use task_obj on PREEMPT_RT and replace manual disabling of
> interrupts with a local_lock_t. This change requires some factoring:
> 
> - drain_obj_stock() drops a reference on obj_cgroup which leads to an
>   invocation of obj_cgroup_release() if it is the last object. This in
>   turn leads to recursive locking of the local_lock_t. To avoid this,
>   obj_cgroup_release() is invoked outside of the locked section.
> 
> - drain_obj_stock() gets a memcg_stock_pcp passed if the stock_lock has been
>   acquired (instead of the task_obj_lock) to avoid recursive locking later
>   in refill_stock().

Looks like this was maybe true in some previous version, but now
drain_obj_stock() gets a bool parameter that is passed on to
obj_cgroup_uncharge_pages(). Yet drain_local_stock() passes either NULL or
stock_pcp for that bool parameter, which is weird.
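
To illustrate why this goes unnoticed, here is a minimal userspace sketch
(not kernel code). struct stock_pcp and both functions are hypothetical
stand-ins, not the names used in the patch. C converts a pointer argument
to bool implicitly, so a non-NULL pointer silently becomes true and NULL
becomes false:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU stock structure. */
struct stock_pcp {
	unsigned int nr_pages;
};

/* Hypothetical callee that expects a flag, like the bool parameter above. */
static bool callee_sees_lock_held(bool stock_lock_acquired)
{
	return stock_lock_acquired;
}

/* Caller hands over a pointer (or NULL) where a bool is expected. */
static bool pass_pointer_as_flag(struct stock_pcp *stock)
{
	/* Implicit pointer-to-bool conversion: non-NULL becomes true. */
	return callee_sees_lock_held(stock);
}
```

The result happens to be what the caller intended, but the call site reads
as if a stock pointer were being passed down, which it is not.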

> - drain_all_stock() disables preemption via get_cpu() and then invokes
>   drain_local_stock() if it is the local CPU to avoid scheduling a worker
>   (which invokes the same function). Disabling preemption here is
>   problematic due to the sleeping locks in drain_local_stock().
>   This can be avoided by always scheduling a worker, even for the local
>   CPU. Using cpus_read_lock() stabilizes cpu_online_mask which ensures
>   that no worker is scheduled for an offline CPU. Since there is no
>   flush_work(), it is still possible that a worker is invoked on the wrong
>   CPU but it is okay since it operates always on the local-CPU data.
> 
> - drain_local_stock() is always invoked as a worker so it can be optimized
>   by removing in_task() (it is always true) and avoiding the "irq_save"
>   variant because interrupts are always enabled here. Operating on
>   task_obj first allows acquiring the local_lock_t without lockdep
>   complaints.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The problem is that this pattern, where get_obj_stock() sets a
stock_lock_acquried bool that is then passed down and acted upon elsewhere,
is a well-known massive red flag for Linus :/
Maybe we should indeed just revert 559271146efc? As Michal noted, there were
no hard numbers to justify it, and the previous discussion seemed to show
that the cost of irq disable/enable on recent CPUs is not as bad as
assumed.
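
For readers unfamiliar with the pattern, a rough userspace sketch follows.
It uses a pthread mutex as a stand-in for the kernel lock, and struct stock,
uncharge_flagged(), uncharge_locked() and uncharge() are all hypothetical
names, not functions from mm/memcontrol.c. The first function shows locking
that depends on a flag describing the caller's state; the other two show one
commonly used alternative, a "locked" helper plus a wrapper that takes the
lock itself. This is only an illustration of the general idea, not a
proposal for this patch:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock-protected per-CPU stock. */
struct stock {
	pthread_mutex_t lock;
	unsigned int nr_pages;
};

/* Flag-driven variant: locking behaviour depends on what the caller claims. */
static void uncharge_flagged(struct stock *s, unsigned int nr, bool lock_held)
{
	if (!lock_held)
		pthread_mutex_lock(&s->lock);
	s->nr_pages -= nr;
	if (!lock_held)
		pthread_mutex_unlock(&s->lock);
}

/* Alternative: the caller must hold s->lock; no flag is needed. */
static void uncharge_locked(struct stock *s, unsigned int nr)
{
	s->nr_pages -= nr;
}

/* Alternative: wrapper for callers that do not hold the lock. */
static void uncharge(struct stock *s, unsigned int nr)
{
	pthread_mutex_lock(&s->lock);
	uncharge_locked(s, nr);
	pthread_mutex_unlock(&s->lock);
}
```

With the split variant each call site states its locking context through the
function it calls, so a wrong flag value cannot cause a double lock or an
unlocked update.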

> ---
>  mm/memcontrol.c | 174 +++++++++++++++++++++++++++++++-----------------
>  1 file changed, 114 insertions(+), 60 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3d1b7cdd83db0..2d8be88c00888 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -260,8 +260,10 @@ bool mem_cgroup_kmem_disabled(void)
>  	return cgroup_memory_nokmem;
>  }
>  
> +struct memcg_stock_pcp;

Seems this forward declaration is unused.

>  static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
> -				      unsigned int nr_pages);
> +				      unsigned int nr_pages,
> +				      bool stock_lock_acquried);
>  
>  static void obj_cgroup_release(struct percpu_ref *ref)
>  {

