From: Kairui Song
Date: Thu, 16 May 2024 19:50:52 +0800
Subject: Re: [RFC PATCH v2 2/2] mm: convert mm's rss stats to use atomic mode
To: "zhangpeng (AS)"
Cc: Rongwei Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, dennisszhou@gmail.com, shakeelb@google.com,
 jack@suse.cz, surenb@google.com, kent.overstreet@linux.dev,
 mhocko@suse.cz, vbabka@suse.cz, yuzhao@google.com, yu.ma@intel.com,
 wangkefeng.wang@huawei.com, sunnanyong@huawei.com
References: <20240418142008.2775308-1-zhangpeng362@huawei.com>
 <20240418142008.2775308-3-zhangpeng362@huawei.com>

On Fri, Apr 19, 2024 at 11:32 AM zhangpeng (AS) wrote:
> On 2024/4/19 10:30, Rongwei Wang wrote:
> > On 2024/4/18 22:20, Peng Zhang wrote:
> >> From: ZhangPeng
> >>
> >> Since commit f1a7941243c1 ("mm: convert mm's rss stats into
> >> percpu_counter"), the
> >> rss_stats have converted into percpu_counter,
> >> which convert the error margin from (nr_threads * 64) to approximately
> >> (nr_cpus ^ 2). However, the new percpu allocation in mm_init() causes a
> >> performance regression on fork/exec/shell. Even after commit
> >> 14ef95be6f55
> >> ("kernel/fork: group allocation/free of per-cpu counters for mm
> >> struct"),
> >> the performance of fork/exec/shell is still poor compared to previous
> >> kernel versions.
> >>
> >> To mitigate performance regression, we delay the allocation of percpu
> >> memory for rss_stats. Therefore, we convert mm's rss stats to use
> >> percpu_counter atomic mode. For single-thread processes, rss_stat is in
> >> atomic mode, which reduces the memory consumption and performance
> >> regression caused by using percpu. For multiple-thread processes,
> >> rss_stat is switched to the percpu mode to reduce the error margin.
> >> We convert rss_stats from atomic mode to percpu mode only when the
> >> second thread is created.
> > Hi, Zhang Peng
> >
> > This regression we also found it in lmbench these days. I have not
> > test your patch, but it seems will solve a lot for it.
> > And I see this patch not fix the regression in multi-threads, that's
> > because of the rss_stat switched to percpu mode?
> > (If I'm wrong, please correct me.) And It seems percpu_counter also
> > has a bad effect in exit_mmap().
> >
> > If so, I'm wondering if we can further improving it on the exit_mmap()
> > path in multi-threads scenario, e.g. to determine which CPUs the
> > process has run on (mm_cpumask()? I'm not sure).
> >
> Hi, Rongwei,
>
> Yes, this patch only fixes the regression in single-thread processes. How
> much bad effect does percpu_counter have in exit_mmap()? IMHO, the addition
> of mm counter is already in batch mode, maybe I miss something?
>

Hi, Peng Zhang, Rongwei, and all:

I have a patch series that predates commit f1a7941243c1 ("mm: convert
mm's rss stats into percpu_counter"):

https://lwn.net/ml/linux-kernel/20220728204511.56348-1-ryncsn@gmail.com/

Instead of a per-mm-per-cpu cache, it used only one global per-cpu
cache and flushed it at schedule time. Or, if the arch supports it, it
can flush and fetch using the mm bitmap as an optimization (like TLB
shootdown).

Unfortunately it didn't get much attention and I moved on to work on
other things.

I also noticed the fork regression, so I did a local rebase of my
previous patch and reverted f1a7941243c1.

The results look good: on my 32-core VM, I see an improvement similar
to the one you posted (the alloc/free on fork/exit is gone). I also see
a minor improvement in database tests, and memory usage is slightly
lower too (no more per-mm cache). I think the error margin in my patch
should be close to zero.

I hope I can get some attention here for my idea...
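
To make the single global per-cpu cache idea a bit more concrete, here
is a rough user-space sketch. It is only an illustration under stated
assumptions, not code from the series: NR_CPUS, struct mm, struct
rss_cache and all function names are hypothetical, CPUs are simulated
by an array index, and a plain long stands in for what would have to be
an atomic counter with proper preemption handling in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4            /* hypothetical, fixed for this sketch */

/* Stand-in for mm_struct: one shared RSS counter per address space. */
struct mm {
    long rss;                /* a real kernel would need an atomic type */
};

/* One cache slot per CPU, shared by every mm that runs on that CPU. */
struct rss_cache {
    struct mm *mm;           /* mm whose delta is cached here, or NULL */
    long delta;              /* pending pages not yet folded into mm->rss */
};

static struct rss_cache cpu_cache[NR_CPUS];

/* Fold the pending delta of one CPU back into the mm that owns it. */
void rss_cache_flush(int cpu)
{
    struct rss_cache *c;

    assert(cpu >= 0 && cpu < NR_CPUS);
    c = &cpu_cache[cpu];
    if (c->mm && c->delta)
        c->mm->rss += c->delta;
    c->delta = 0;
}

/* Simulated context switch: flush only when the incoming mm differs. */
void rss_cache_switch(int cpu, struct mm *next)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (cpu_cache[cpu].mm != next) {
        rss_cache_flush(cpu);
        cpu_cache[cpu].mm = next;
    }
}

/* Fast path: account pages for the mm currently running on this CPU. */
void rss_add(int cpu, struct mm *mm, long pages)
{
    rss_cache_switch(cpu, mm);   /* no-op when mm already owns the slot */
    cpu_cache[cpu].delta += pages;
}

/* Slow path: flush every CPU that caches this mm, then read the total. */
long rss_read(struct mm *mm)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu_cache[cpu].mm == mm)
            rss_cache_flush(cpu);
    }
    return mm->rss;
}
```

In this sketch nothing is allocated per mm, so fork/exit has no per-cpu
alloc/free to pay for. The reader scans all CPUs for simplicity; a
per-mm CPU bitmap would let it visit only the CPUs the mm has run on.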