Date: Mon, 7 Nov 2022 13:05:49 -0800
From: Andrew Morton
To: Shakeel Butt
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Marek Szyprowski
Subject: Re: [PATCH] percpu_counter: add percpu_counter_sum_all interface
Message-Id: <20221107130549.db68c48afe5f711b2e99c5c0@linux-foundation.org>
In-Reply-To: <20221105014013.930636-1-shakeelb@google.com>
References: <20221105014013.930636-1-shakeelb@google.com>

On Sat, 5 Nov 2022 01:40:13 +0000 Shakeel Butt wrote:

> The percpu_counter is used for scenarios where performance is more
> important than the accuracy. For percpu_counter users, who want more
> accurate information in their slowpath, percpu_counter_sum is provided
> which traverses all the online CPUs to accumulate the data. The reason
> it only needs to traverse online CPUs is because percpu_counter does
> implement CPU offline callback which syncs the local data of the
> offlined CPU.
>
> However there is a small race window between the online CPUs traversal
> of percpu_counter_sum and the CPU offline callback. The offline callback
> has to traverse all the percpu_counters on the system to flush the CPU
> local data which can be a lot. During that time, the CPU which is going
> offline has already been published as offline to all the readers. So, as
> the offline callback is running, percpu_counter_sum can be called for
> one counter which has some state on the CPU going offline. Since
> percpu_counter_sum only traverses online CPUs, it will skip that
> specific CPU and the offline callback might not have flushed the state
> for that specific percpu_counter on that offlined CPU.

OK, got it, thanks.

> Normally this is not an issue because percpu_counter users can deal with
> some inaccuracy for small time window. However a new user i.e. mm_struct
> on the cleanup path wants to check the exact state of the percpu_counter
> through check_mm().
> For such users, this patch introduces
> percpu_counter_sum_all() which traverses all possible CPUs.

And uses it in fork.c:check_mm()!

> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -756,7 +756,7 @@ static void check_mm(struct mm_struct *mm)
>  			 "Please make sure 'struct resident_page_types[]' is updated as well");
>
>  	for (i = 0; i < NR_MM_COUNTERS; i++) {
> -		long x = percpu_counter_sum(&mm->rss_stat[i]);
> +		long x = percpu_counter_sum_all(&mm->rss_stat[i]);

check_mm() just became more expensive in some cases.  nr_possible_cpus * 4.
I wonder if this is enough for people to start caring about.

check_mm() is presently non-optional and I'd be reluctant to change
this, given how commonly we see the "BUG: Bad rss-counter state" getting
reported (22 million hits in a google search!).

We could save a ton of that cost by running percpu_counter_sum() first,
then trying percpu_counter_sum_all() if percpu_counter_sum() indicated
an error.  This is only worth bothering about if the new check_mm() cost
is a concern.
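
To make the two-stage idea concrete, here is a rough userspace sketch of
it.  It is not kernel code: struct model_counter, model_online[],
model_sum(), model_sum_all() and model_check() are all hypothetical
stand-ins invented for illustration, and it assumes that "indicated an
error" means a non-zero sum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical number of possible CPUs */

/* Userspace stand-in for a percpu counter; NOT the kernel structure. */
struct model_counter {
	int64_t count;			/* global part */
	int32_t pcpu[MODEL_NR_CPUS];	/* per-CPU deltas not yet flushed */
};

/* Which CPUs a reader currently sees as online (all, to start with). */
static bool model_online[MODEL_NR_CPUS] = { true, true, true, true };

/* Cheap sum: skips CPUs that are not marked online. */
static int64_t model_sum(const struct model_counter *c)
{
	int64_t ret = c->count;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_online[cpu])
			ret += c->pcpu[cpu];
	return ret;
}

/* Expensive sum: walks every possible CPU, online or not. */
static int64_t model_sum_all(const struct model_counter *c)
{
	int64_t ret = c->count;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		ret += c->pcpu[cpu];
	return ret;
}

/*
 * Two-stage check: trust the cheap sum when it is zero, and only fall
 * back to the full walk when the cheap sum looks like an error.
 */
static int64_t model_check(const struct model_counter *c)
{
	int64_t x = model_sum(c);

	if (x != 0)
		x = model_sum_all(c);
	return x;
}
```

With count = 5, pcpu[2] = -5 and CPU 2 marked offline before its delta
is flushed, model_sum() returns 5 while model_sum_all() and
model_check() return 0, so the false report is avoided and the full walk
only runs on the suspicious path.  One limitation of this model: the
fast path trusts a zero result, so a residue that lives only on the
skipped CPU would go unreported.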