Subject: Re: [PATCH v1] cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
From: Waiman Long <longman@redhat.com>
Date: Wed, 1 May 2024 14:41:49 -0400
To: Jesper Dangaard Brouer, tj@kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, cgroups@vger.kernel.org, yosryahmed@google.com
Cc: netdev@vger.kernel.org, linux-mm@kvack.org, shakeel.butt@linux.dev, kernel-team@cloudflare.com, Arnaldo Carvalho de Melo, Sebastian Andrzej Siewior
Message-ID: <203fdb35-f4cf-4754-9709-3c024eecade9@redhat.com>
In-Reply-To: <4a680b80-b296-4466-895a-13239b982c85@kernel.org>
References: <171457225108.4159924.12821205549807669839.stgit@firesoul> <30d64e25-561a-41c6-ab95-f0820248e9b6@redhat.com> <4a680b80-b296-4466-895a-13239b982c85@kernel.org>

On 5/1/24 13:22, Jesper Dangaard Brouer wrote:


On 01/05/2024 16.24, Waiman Long wrote:
On 5/1/24 10:04, Jesper Dangaard Brouer wrote:
This closely resembles helpers added for the global cgroup_rstat_lock in
commit fc29e04ae1ad ("cgroup/rstat: add cgroup_rstat_lock helpers and
tracepoints"). This is for the per CPU lock cgroup_rstat_cpu_lock.

Based on production workloads, we observe the fast-path "update" function
cgroup_rstat_updated() is invoked around 3 million times per sec, while the
"flush" function cgroup_rstat_flush_locked(), walking each possible CPU,
can see periodic spikes of 700 invocations/sec.

For this reason, the tracepoints are split into normal and fastpath
versions for this per-CPU lock. This makes it feasible for production to
continuously monitor the non-fastpath tracepoint to detect lock contention
issues. The reason for monitoring is that the lock disables IRQs, which can
disturb e.g. softirq processing on the local CPUs involved. When the
global cgroup_rstat_lock stops disabling IRQs (e.g. converted to a mutex),
this per CPU lock becomes the next bottleneck that can introduce latency
variations.

A practical bpftrace script for monitoring contention latency:

  bpftrace -e '
    tracepoint:cgroup:cgroup_rstat_cpu_lock_contended {
      @start[tid]=nsecs; @cnt[probe]=count()}
    tracepoint:cgroup:cgroup_rstat_cpu_locked {
      if (args->contended) {
        @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}
      @cnt[probe]=count()}
    interval:s:1 {time("%H:%M:%S "); print(@wait_ns); print(@cnt); clear(@cnt);}'

This is a per-cpu lock, so any contention involves only two CPUs: a local CPU invoking cgroup_rstat_updated(), and a flusher CPU doing cgroup_rstat_flush_locked() calling into cgroup_rstat_updated_list(). With recent commits to reduce the percpu lock hold time, I doubt lock contention on the percpu lock will have a great impact on latency.

I do appreciate your recent changes to reduce the percpu lock hold time.
These tracepoints allow me to measure and differentiate the percpu lock
hold time vs. the flush time.

In production (using [1]) I'm seeing "Long lock-hold time" [L100] of up to
29 ms, which is the time spent after obtaining the lock (runtime under the
lock).  I was expecting to see "High Lock-contention wait" [L82] instead,
which is the time spent waiting to obtain the lock.

This is why I'm adding these tracepoints: they allow me to dig deeper and
understand where these high runtime variations originate.
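
For example, a minimal (untested) sketch, separate from [1], that measures
the hold time between the locked and unlock tracepoints added by this patch:

  bpftrace -e '
    tracepoint:cgroup:cgroup_rstat_cpu_locked { @locked[tid]=nsecs; }
    tracepoint:cgroup:cgroup_rstat_cpu_unlock {
      if (@locked[tid]) {
        @hold_ns=hist(nsecs-@locked[tid]); delete(@locked[tid]);}
    }
    interval:s:1 {time("%H:%M:%S "); print(@hold_ns);}'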


Data:

 16:52:09 Long lock-hold time: 14950 usec (14 ms) on CPU:34 comm:kswapd4
 16:52:09 Long lock-hold time: 14821 usec (14 ms) on CPU:34 comm:kswapd4
 16:52:09 Long lock-hold time: 11299 usec (11 ms) on CPU:98 comm:kswapd4
 16:52:09 Long lock-hold time: 17237 usec (17 ms) on CPU:113 comm:kswapd6
 16:52:09 Long lock-hold time: 29000 usec (29 ms) on CPU:36 comm:kworker/u261:12
That lock hold time is much higher than I would have expected.
 16:52:09  time elapsed: 80 sec (interval = 1 sec)
  Flushes(5033) 294/interval (avg 62/sec)
  Locks(53374) 1748/interval (avg 667/sec)
  Yields(48341) 1454/interval (avg 604/sec)
  Contended(48104) 1450/interval (avg 601/sec)


So do we really need such an elaborate scheme to monitor this? BTW, the additional code will also add to the worst-case latency.

Hmm, I designed this code to have minimal impact, as tracepoints are
no-ops until activated.  I really doubt this code will change the latency.


[1] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/cgroup_rstat_tracepoint.bt

[L100] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/cgroup_rstat_tracepoint.bt#L100

[L82] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/cgroup_rstat_tracepoint.bt#L82


Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>

More data: the histogram of time spent under the lock shows some strange
variation, with a group in the 4 ms to 65 ms area. I am investigating what
can be causing this... and the next step depends on these tracepoints.

@lock_cnt: 759146

@locked_ns:
[1K, 2K)             499 |                                                    |
[2K, 4K)          206928 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K)          147904 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |
[8K, 16K)          64453 |@@@@@@@@@@@@@@@@                                    |
[16K, 32K)        135467 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
[32K, 64K)         75943 |@@@@@@@@@@@@@@@@@@@                                 |
[64K, 128K)        38359 |@@@@@@@@@                                           |
[128K, 256K)       46597 |@@@@@@@@@@@                                         |
[256K, 512K)       32466 |@@@@@@@@                                            |
[512K, 1M)          3945 |                                                    |
[1M, 2M)             642 |                                                    |
[2M, 4M)             750 |                                                    |
[4M, 8M)            1932 |                                                    |
[8M, 16M)           2114 |                                                    |
[16M, 32M)          1039 |                                                    |
[32M, 64M)           108 |                                                    |




---
  include/trace/events/cgroup.h |   56 +++++++++++++++++++++++++++++----
  kernel/cgroup/rstat.c         |   70 ++++++++++++++++++++++++++++++++++-------
  2 files changed, 108 insertions(+), 18 deletions(-)

diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index 13f375800135..0b95865a90f3 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -206,15 +206,15 @@ DEFINE_EVENT(cgroup_event, cgroup_notify_frozen,
  DECLARE_EVENT_CLASS(cgroup_rstat,
-    TP_PROTO(struct cgroup *cgrp, int cpu_in_loop, bool contended),
+    TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
-    TP_ARGS(cgrp, cpu_in_loop, contended),
+    TP_ARGS(cgrp, cpu, contended),
      TP_STRUCT__entry(
          __field(    int,        root            )
          __field(    int,        level            )
          __field(    u64,        id            )
-        __field(    int,        cpu_in_loop        )
+        __field(    int,        cpu            )
          __field(    bool,        contended        )
      ),
@@ -222,15 +222,16 @@ DECLARE_EVENT_CLASS(cgroup_rstat,
          __entry->root = cgrp->root->hierarchy_id;
          __entry->id = cgroup_id(cgrp);
          __entry->level = cgrp->level;
-        __entry->cpu_in_loop = cpu_in_loop;
+        __entry->cpu = cpu;
          __entry->contended = contended;
      ),
-    TP_printk("root=%d id=%llu level=%d cpu_in_loop=%d lock contended:%d",
+    TP_printk("root=%d id=%llu level=%d cpu=%d lock contended:%d",
            __entry->root, __entry->id, __entry->level,
-          __entry->cpu_in_loop, __entry->contended)
+          __entry->cpu, __entry->contended)
  );
+/* Related to global: cgroup_rstat_lock */
  DEFINE_EVENT(cgroup_rstat, cgroup_rstat_lock_contended,
      TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
@@ -252,6 +253,49 @@ DEFINE_EVENT(cgroup_rstat, cgroup_rstat_unlock,
      TP_ARGS(cgrp, cpu, contended)
  );
+/* Related to per CPU: cgroup_rstat_cpu_lock */
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_lock_contended,
+
+    TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+    TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_lock_contended_fastpath,
+
+    TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+    TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_locked,
+
+    TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+    TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_locked_fastpath,
+
+    TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+    TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_unlock,
+
+    TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+    TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_unlock_fastpath,
+
+    TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+    TP_ARGS(cgrp, cpu, contended)
+);
+
  #endif /* _TRACE_CGROUP_H */
  /* This part must be outside protection */
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 52e3b0ed1cee..fb8b49437573 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -19,6 +19,60 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu)
      return per_cpu_ptr(cgrp->rstat_cpu, cpu);
  }
+/*
+ * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
+ *
+ * This makes it easier to diagnose locking issues and contention in
+ * production environments. The parameter @fast_path determines the
+ * tracepoints being added, allowing us to diagnose "flush" related
+ * operations without handling high-frequency fast-path "update" events.
+ */
+static __always_inline
+unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
+                     struct cgroup *cgrp, const bool fast_path)
+{
+    unsigned long flags;
+    bool contended;
+
+    /*
+     * The _irqsave() is needed because cgroup_rstat_lock is
+     * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
+     * this lock with the _irq() suffix only disables interrupts on
+     * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
+     * interrupts on both configurations. The _irqsave() ensures
+     * that interrupts are always disabled and later restored.
+     */
+    contended = !raw_spin_trylock_irqsave(cpu_lock, flags);
+    if (contended) {
+        if (fast_path)
+            trace_cgroup_rstat_cpu_lock_contended_fastpath(cgrp, cpu, contended);
+        else
+            trace_cgroup_rstat_cpu_lock_contended(cgrp, cpu, contended);
+
+        raw_spin_lock_irqsave(cpu_lock, flags);

Could you do a local_irq_save() before calling trace_cgroup*() and raw_spin_lock()? Would that help in eliminating this high lock hold time?

You can also do a local_irq_save() first before the trylock. That will eliminate the duplicated irq_restore() and irq_save() when there is contention.
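
A rough (untested) sketch of that restructuring, keeping the rest of
_cgroup_rstat_cpu_lock() unchanged:

	unsigned long flags;
	bool contended;

	/* Disable IRQs once up front; the trylock/lock then no longer need _irqsave */
	local_irq_save(flags);
	contended = !raw_spin_trylock(cpu_lock);
	if (contended) {
		if (fast_path)
			trace_cgroup_rstat_cpu_lock_contended_fastpath(cgrp, cpu, contended);
		else
			trace_cgroup_rstat_cpu_lock_contended(cgrp, cpu, contended);

		raw_spin_lock(cpu_lock);
	}

The unlock side would then do raw_spin_unlock(cpu_lock) followed by
local_irq_restore(flags).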

If not, there may be NMIs mixed in.

Thanks,
Longman

+    }
+
+    if (fast_path)
+        trace_cgroup_rstat_cpu_locked_fastpath(cgrp, cpu, contended);
+    else
+        trace_cgroup_rstat_cpu_locked(cgrp, cpu, contended);
+
+    return flags;
+}
+
+static __always_inline
+void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
+                  struct cgroup *cgrp, unsigned long flags,
+                  const bool fast_path)
+{
+    if (fast_path)
+        trace_cgroup_rstat_cpu_unlock_fastpath(cgrp, cpu, false);
+    else
+        trace_cgroup_rstat_cpu_unlock(cgrp, cpu, false);
+
+    raw_spin_unlock_irqrestore(cpu_lock, flags);
+}
+
  /**
   * cgroup_rstat_updated - keep track of updated rstat_cpu
   * @cgrp: target cgroup
@@ -44,7 +98,7 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
      if (data_race(cgroup_rstat_cpu(cgrp, cpu)->updated_next))
          return;
-    raw_spin_lock_irqsave(cpu_lock, flags);
+    flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
      /* put @cgrp and all ancestors on the corresponding updated lists */
      while (true) {
@@ -72,7 +126,7 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
          cgrp = parent;
      }
-    raw_spin_unlock_irqrestore(cpu_lock, flags);
+    _cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true);
  }
  /**
@@ -153,15 +207,7 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
      struct cgroup *head = NULL, *parent, *child;
      unsigned long flags;
-    /*
-     * The _irqsave() is needed because cgroup_rstat_lock is
-     * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
-     * this lock with the _irq() suffix only disables interrupts on
-     * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
-     * interrupts on both configurations. The _irqsave() ensures
-     * that interrupts are always disabled and later restored.
-     */
-    raw_spin_lock_irqsave(cpu_lock, flags);
+    flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, root, false);
      /* Return NULL if this subtree is not on-list */
      if (!rstatc->updated_next)
@@ -198,7 +244,7 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
      if (child != root)
          head = cgroup_rstat_push_children(head, child, cpu);
  unlock_ret:
-    raw_spin_unlock_irqrestore(cpu_lock, flags);
+    _cgroup_rstat_cpu_unlock(cpu_lock, cpu, root, flags, false);
      return head;
  }



