From: Shakeel Butt
Date: Thu, 3 Nov 2022 17:18:51 -0700
Subject: Re: [PATCH] mm: convert mm's rss stats into percpu_counter
To: Marek Szyprowski
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20221024052841.3291983-1-shakeelb@google.com> <20221103171407.ydubp43x7tzahriq@google.com> <38797f54-3287-496f-a65e-755c1f025e0a@samsung.com>
In-Reply-To: <38797f54-3287-496f-a65e-755c1f025e0a@samsung.com>

On Thu, Nov 3, 2022 at 4:02 PM Marek Szyprowski wrote:
>
> Hi,
>
> On 03.11.2022 18:14, Shakeel Butt wrote:
> > On Wed, Nov 02, 2022 at 10:09:57PM +0100, Marek Szyprowski wrote:
> >> On 24.10.2022 07:28, Shakeel Butt wrote:
> >>> Currently mm_struct maintains rss_stats which are updated on page fault
> >>> and the unmapping codepaths. For page fault codepath the updates are
> >>> cached per thread with the batch of TASK_RSS_EVENTS_THRESH which is 64.
> >>> The reason for caching is performance for multithreaded applications;
> >>> otherwise the rss_stats updates may become a hotspot for such
> >>> applications.
> >>>
> >>> However this optimization comes with the cost of error margin in the rss
> >>> stats. The rss_stats for applications with a large number of threads can
> >>> be very skewed. At worst the error margin is (nr_threads * 64) and we
> >>> have a lot of applications with 100s of threads, so the error margin can
> >>> be very high. Internally we had to reduce TASK_RSS_EVENTS_THRESH to 32.
> >>>
> >>> Recently we started seeing unbounded errors for rss_stats for
> >>> specific applications which use TCP rx0cp. It seems like the
> >>> vm_insert_pages() codepath does not sync rss_stats at all.
> >>>
> >>> This patch converts the rss_stats into percpu_counter to convert the
> >>> error margin from (nr_threads * 64) to approximately (nr_cpus ^ 2).
> >>> However this conversion enables us to get accurate stats for
> >>> situations where accuracy is more important than the cpu cost. Though
> >>> this patch does not make such tradeoffs.
> >>>
> >>> Signed-off-by: Shakeel Butt
> >> This patch landed recently in linux-next as commit d59f19a7a068 ("mm:
> >> convert mm's rss stats into percpu_counter"). Unfortunately it causes a
> >> regression on my test systems. I've noticed that it triggers a 'BUG: Bad
> >> rss-counter state' warning from time to time for random processes. This
> >> is somehow related to CPU hot-plug and/or system suspend/resume.
> >> The easiest way to reproduce this issue (although not always) on my test
> >> systems (ARM or ARM64 based) is to run the following commands:
> >>
> >> root@target:~# for i in /sys/devices/system/cpu/cpu[1-9]; do echo 0 >$i/online; done
> >> BUG: Bad rss-counter state mm:f04c7160 type:MM_FILEPAGES val:1
> >> BUG: Bad rss-counter state mm:50f1f502 type:MM_FILEPAGES val:2
> >> BUG: Bad rss-counter state mm:50f1f502 type:MM_ANONPAGES val:15
> >> BUG: Bad rss-counter state mm:63660fd0 type:MM_FILEPAGES val:2
> >> BUG: Bad rss-counter state mm:63660fd0 type:MM_ANONPAGES val:15
> >>
> >> Let me know if I can help debug this somehow or test a fix.
> >>
> > Hi Marek,
> >
> > Thanks for the report. It seems like there is a race between
> > for_each_online_cpu() in __percpu_counter_sum() and
> > percpu_counter_cpu_dead()/cpu-offlining. Normally this race is fine for
> > percpu_counter users but check_mm() is not happy with this race. Can
> > you please try the following patch:
> >
> >
> > From: Shakeel Butt
> > Date: Thu, 3 Nov 2022 06:05:13 +0000
> > Subject: [PATCH] mm: percpu_counter: use race free percpu_counter sum
> > interface
> >
> > percpu_counter_sum can race with cpu offlining. Add a new interface
> > which does not race with it and use that for check_mm().
> > ---
> >  include/linux/percpu_counter.h | 11 +++++++++++
> >  kernel/fork.c                  |  2 +-
> >  lib/percpu_counter.c           | 24 ++++++++++++++++++------
> >  3 files changed, 30 insertions(+), 7 deletions(-)
> >
> >
> Yes, this seems to fix the issue I've reported. Feel free to add:
>
> Reported-by: Marek Szyprowski
>
> Tested-by: Marek Szyprowski
>

Thanks a lot, Marek. I will send out a formal patch later with your
Reported-by and Tested-by tags.
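
For anyone following the thread, the race described above can be modelled
in userspace. The sketch below is only an illustration, not the kernel code
or the patch: the class PerCpuCounter and its methods (mark_offline,
fold_dead_cpu, sum_online, sum_all) are hypothetical names. It assumes that
offlining happens in two steps, first the CPU leaves the online set and only
later its per-CPU delta is folded into the shared count, and shows that a
sum over online CPUs taken between those two steps can miss the delta.

```python
from dataclasses import dataclass, field


@dataclass
class PerCpuCounter:
    """Toy model of a per-CPU counter (hypothetical, not the kernel API)."""

    nr_cpus: int
    count: int = 0                      # shared part of the counter
    deltas: list = field(init=False)    # per-CPU deltas not yet folded in
    online: set = field(init=False)     # CPUs currently marked online

    def __post_init__(self):
        self.deltas = [0] * self.nr_cpus
        self.online = set(range(self.nr_cpus))

    def add(self, cpu, amount):
        # Updates only touch the local CPU's delta (the cheap fast path).
        self.deltas[cpu] += amount

    def mark_offline(self, cpu):
        # Assumed step 1 of offlining: the CPU leaves the online set.
        self.online.discard(cpu)

    def fold_dead_cpu(self, cpu):
        # Assumed step 2: the dead CPU's delta moves into the shared count.
        self.count += self.deltas[cpu]
        self.deltas[cpu] = 0

    def sum_online(self):
        # Sums only CPUs that are online right now; racy during offlining.
        return self.count + sum(self.deltas[c] for c in self.online)

    def sum_all(self):
        # Sums every CPU slot regardless of online state.
        return self.count + sum(self.deltas)


if __name__ == "__main__":
    c = PerCpuCounter(nr_cpus=4)
    c.add(0, 15)     # pages accounted on CPU 0
    c.add(1, -15)    # the same pages unaccounted on CPU 1

    c.mark_offline(1)                # window between step 1 and step 2
    print("online-only sum:", c.sum_online())
    print("all-CPU sum:    ", c.sum_all())

    c.fold_dead_cpu(1)               # after folding both sums agree again
    print("after fold:     ", c.sum_online(), c.sum_all())
```

In this model a checker that expects an exact zero, as check_mm() does at
mm teardown, would see a nonzero value if it used the online-only sum inside
the window, while ordinary users that tolerate a transient error would not
care. Summing every slot is one possible way to close the window; whether
the actual patch does it this way is not shown in this thread.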