Date: Thu, 18 Jul 2024 17:44:12 -0400
From: Kent Overstreet
To: Pasha Tatashin
Cc: akpm@linux-foundation.org, jpoimboe@kernel.org, peterz@infradead.org,
	nphamcs@gmail.com, cerasuolodomenico@gmail.com, surenb@google.com,
	lizhijian@fujitsu.com, willy@infradead.org, shakeel.butt@linux.dev,
	vbabka@suse.cz, ziy@nvidia.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v4] vmstat: Kernel stack usage histogram
References: <20240718202611.1695164-1-pasha.tatashin@soleen.com>
In-Reply-To: <20240718202611.1695164-1-pasha.tatashin@soleen.com>

On Thu, Jul 18, 2024 at 08:26:11PM GMT, Pasha Tatashin wrote:
> As part of the dynamic kernel stack project, we need to know the amount
> of data that can be saved by reducing the default kernel stack size [1].
>
> Provide a kernel stack usage histogram to aid in optimizing kernel stack
> sizes and minimizing memory waste in large-scale environments. The
> histogram divides stack usage into power-of-two buckets and reports the
> results in /proc/vmstat. This information is especially valuable in
> environments with millions of machines, where even small optimizations
> can have a significant impact.
>
> The histogram data is presented in /proc/vmstat with entries like
> "kstack_1k", "kstack_2k", and so on, indicating the number of threads
> that exited with stack usage falling within each respective bucket.
>
> Example outputs:
> Intel:
> $ grep kstack /proc/vmstat
> kstack_1k 3
> kstack_2k 188
> kstack_4k 11391
> kstack_8k 243
> kstack_16k 0
>
> ARM with 64K page_size:
> $ grep kstack /proc/vmstat
> kstack_1k 1
> kstack_2k 340
> kstack_4k 25212
> kstack_8k 1659
> kstack_16k 0
> kstack_32k 0
> kstack_64k 0
>
> Note: once the dynamic kernel stack is implemented it will depend on the
> implementation the usability of this feature: On hardware that supports
> faults on kernel stacks, we will have other metrics that show the total
> number of pages allocated for stacks. On hardware where faults are not
> supported, we will most likely have some optimization where only some
> threads are extended, and for those, these metrics will still be very
> useful.
>
> [1] https://lwn.net/Articles/974367

Nice and simple, and this gets us exactly the data we want for dynamic
kernel stacks...

Reviewed-by: Kent Overstreet

>
> Signed-off-by: Pasha Tatashin
> ---
>
> Changelog:
> v4:
> - Expanded the commit message as requested by Andrew Morton.
>
>  include/linux/sched/task_stack.h | 49 ++++++++++++++++++++++++++++++--
>  include/linux/vm_event_item.h    | 42 +++++++++++++++++++++++++++
>  include/linux/vmstat.h           | 16 -----------
>  mm/vmstat.c                      | 24 ++++++++++++++++
>  4 files changed, 113 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
> index ccd72b978e1f..65e8c9fb7f9b 100644
> --- a/include/linux/sched/task_stack.h
> +++ b/include/linux/sched/task_stack.h
> @@ -95,9 +95,51 @@ static inline int object_is_on_stack(const void *obj)
>  extern void thread_stack_cache_init(void);
>
>  #ifdef CONFIG_DEBUG_STACK_USAGE
> +#ifdef CONFIG_VM_EVENT_COUNTERS
> +#include
> +
> +/* Count the maximum pages reached in kernel stacks */
> +static inline void kstack_histogram(unsigned long used_stack)
> +{
> +	if (used_stack <= 1024)
> +		this_cpu_inc(vm_event_states.event[KSTACK_1K]);
> +#if THREAD_SIZE > 1024
> +	else if (used_stack <= 2048)
> +		this_cpu_inc(vm_event_states.event[KSTACK_2K]);
> +#endif
> +#if THREAD_SIZE > 2048
> +	else if (used_stack <= 4096)
> +		this_cpu_inc(vm_event_states.event[KSTACK_4K]);
> +#endif
> +#if THREAD_SIZE > 4096
> +	else if (used_stack <= 8192)
> +		this_cpu_inc(vm_event_states.event[KSTACK_8K]);
> +#endif
> +#if THREAD_SIZE > 8192
> +	else if (used_stack <= 16384)
> +		this_cpu_inc(vm_event_states.event[KSTACK_16K]);
> +#endif
> +#if THREAD_SIZE > 16384
> +	else if (used_stack <= 32768)
> +		this_cpu_inc(vm_event_states.event[KSTACK_32K]);
> +#endif
> +#if THREAD_SIZE > 32768
> +	else if (used_stack <= 65536)
> +		this_cpu_inc(vm_event_states.event[KSTACK_64K]);
> +#endif
> +#if THREAD_SIZE > 65536
> +	else
> +		this_cpu_inc(vm_event_states.event[KSTACK_REST]);
> +#endif
> +}
> +#else /* !CONFIG_VM_EVENT_COUNTERS */
> +static inline void kstack_histogram(unsigned long used_stack) {}
> +#endif /* CONFIG_VM_EVENT_COUNTERS */
> +
>  static inline unsigned long stack_not_used(struct task_struct *p)
>  {
>  	unsigned long *n = end_of_stack(p);
> +	unsigned long unused_stack;
>
>  	do { 	/* Skip over canary */
>  # ifdef CONFIG_STACK_GROWSUP
> @@ -108,10 +150,13 @@ static inline unsigned long stack_not_used(struct task_struct *p)
>  	} while (!*n);
>
>  # ifdef CONFIG_STACK_GROWSUP
> -	return (unsigned long)end_of_stack(p) - (unsigned long)n;
> +	unused_stack = (unsigned long)end_of_stack(p) - (unsigned long)n;
>  # else
> -	return (unsigned long)n - (unsigned long)end_of_stack(p);
> +	unused_stack = (unsigned long)n - (unsigned long)end_of_stack(p);
>  # endif
> +	kstack_histogram(THREAD_SIZE - unused_stack);
> +
> +	return unused_stack;
>  }
>  #endif
>  extern void set_task_stack_end_magic(struct task_struct *tsk);
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 747943bc8cc2..73fa5fbf33a3 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -154,9 +154,51 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  		VMA_LOCK_RETRY,
>  		VMA_LOCK_MISS,
>  #endif
> +#ifdef CONFIG_DEBUG_STACK_USAGE
> +		KSTACK_1K,
> +#if THREAD_SIZE > 1024
> +		KSTACK_2K,
> +#endif
> +#if THREAD_SIZE > 2048
> +		KSTACK_4K,
> +#endif
> +#if THREAD_SIZE > 4096
> +		KSTACK_8K,
> +#endif
> +#if THREAD_SIZE > 8192
> +		KSTACK_16K,
> +#endif
> +#if THREAD_SIZE > 16384
> +		KSTACK_32K,
> +#endif
> +#if THREAD_SIZE > 32768
> +		KSTACK_64K,
> +#endif
> +#if THREAD_SIZE > 65536
> +		KSTACK_REST,
> +#endif
> +#endif /* CONFIG_DEBUG_STACK_USAGE */
>  		NR_VM_EVENT_ITEMS
>  };
>
> +#ifdef CONFIG_VM_EVENT_COUNTERS
> +/*
> + * Light weight per cpu counter implementation.
> + *
> + * Counters should only be incremented and no critical kernel component
> + * should rely on the counter values.
> + *
> + * Counters are handled completely inline. On many platforms the code
> + * generated will simply be the increment of a global address.
> + */
> +
> +struct vm_event_state {
> +	unsigned long event[NR_VM_EVENT_ITEMS];
> +};
> +
> +DECLARE_PER_CPU(struct vm_event_state, vm_event_states);
> +#endif
> +
>  #ifndef CONFIG_TRANSPARENT_HUGEPAGE
>  #define THP_FILE_ALLOC ({ BUILD_BUG(); 0; })
>  #define THP_FILE_FALLBACK ({ BUILD_BUG(); 0; })
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 735eae6e272c..131966a4af78 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -41,22 +41,6 @@ enum writeback_stat_item {
>  };
>
>  #ifdef CONFIG_VM_EVENT_COUNTERS
> -/*
> - * Light weight per cpu counter implementation.
> - *
> - * Counters should only be incremented and no critical kernel component
> - * should rely on the counter values.
> - *
> - * Counters are handled completely inline. On many platforms the code
> - * generated will simply be the increment of a global address.
> - */
> -
> -struct vm_event_state {
> -	unsigned long event[NR_VM_EVENT_ITEMS];
> -};
> -
> -DECLARE_PER_CPU(struct vm_event_state, vm_event_states);
> -
>  /*
>   * vm counters are allowed to be racy. Use raw_cpu_ops to avoid the
>   * local_irq_disable overhead.
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 8507c497218b..642d761b557b 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1416,6 +1416,30 @@ const char * const vmstat_text[] = {
>  	"vma_lock_retry",
>  	"vma_lock_miss",
>  #endif
> +#ifdef CONFIG_DEBUG_STACK_USAGE
> +	"kstack_1k",
> +#if THREAD_SIZE > 1024
> +	"kstack_2k",
> +#endif
> +#if THREAD_SIZE > 2048
> +	"kstack_4k",
> +#endif
> +#if THREAD_SIZE > 4096
> +	"kstack_8k",
> +#endif
> +#if THREAD_SIZE > 8192
> +	"kstack_16k",
> +#endif
> +#if THREAD_SIZE > 16384
> +	"kstack_32k",
> +#endif
> +#if THREAD_SIZE > 32768
> +	"kstack_64k",
> +#endif
> +#if THREAD_SIZE > 65536
> +	"kstack_rest",
> +#endif
> +#endif
>  #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
>  };
>  #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
> --
> 2.45.2.1089.g2a221341d9-goog
>
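
For anyone who wants to poke at the bucketing outside the kernel, here is
a rough userspace sketch of the same power-of-two mapping that the quoted
kstack_histogram() does with its if/else chain. It is only an
illustration: kstack_bucket_kib() and print_bucket() are hypothetical
names that are not part of the patch, the THREAD_SIZE conditionals are
left out, and a return value of 0 is an assumed stand-in for the
"kstack_rest" bucket.

```c
#include <stdio.h>

/*
 * Hypothetical userspace helper, not part of the patch.
 *
 * Map a stack usage in bytes to the upper bound (in KiB) of its
 * power-of-two bucket: 1, 2, 4, ... 64.  Usage above 64 KiB returns 0,
 * standing in for the "kstack_rest" bucket.  Unlike the kernel code,
 * this ignores THREAD_SIZE and always considers every bucket.
 */
static unsigned long kstack_bucket_kib(unsigned long used_stack)
{
	unsigned long limit = 1024;

	/* Double the bound until the usage fits or 64 KiB is reached. */
	while (used_stack > limit && limit < 65536)
		limit <<= 1;

	return used_stack > limit ? 0 : limit / 1024;
}

/* Print the /proc/vmstat style counter name a given usage would hit. */
static void print_bucket(unsigned long used_stack)
{
	unsigned long kib = kstack_bucket_kib(used_stack);

	if (kib)
		printf("%lu bytes -> kstack_%luk\n", used_stack, kib);
	else
		printf("%lu bytes -> kstack_rest\n", used_stack);
}
```

Calling print_bucket(3000) from a small main() would report the
kstack_4k bucket, which matches where a thread that used 3000 bytes of
stack would be counted by the quoted code on a configuration with
THREAD_SIZE larger than 2048.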