From: Shakeel Butt
To: Matthew Wilcox
Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, linux-kernel@vger.kernel.org, Andrew Morton, Marek Szyprowski, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, ying.huang@intel.com, feng.tang@intel.com, zhengjun.xing@linux.intel.com, fengwei.yin@intel.com
Subject: Re: [linus:master] [mm] f1a7941243: unixbench.score -19.2% regression
Date: Tue, 31 Jan 2023 05:57:43 +0000
Message-ID: <20230131055743.tsilxx5vfl6gx4dj@google.com>
References: <202301301057.e55dad5b-oliver.sang@intel.com> <20230131052352.5qnqegzwmt7akk7t@google.com>

On Tue, Jan 31, 2023 at 05:45:21AM +0000, Matthew Wilcox wrote:
[...]
> > I ran perf and it seems like percpu counter allocation is the additional
> > cost with this patch. See the report below. However, when I made spawn a
> > bit more sophisticated by adding an mmap() of 1 GiB, the page table copy
> > became the significant cost and there was no difference with or without
> > the patch.
> >
> > I am now wondering whether this fork ping-pong is really an important
> > enough workload that we should revert the patch, or whether we should
> > ignore it for now and work on improving the performance of the
> > __alloc_percpu_gfp() code.
> >
> >
> > -   90.97%     0.06%  spawn  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
> >    - 90.91% entry_SYSCALL_64_after_hwframe
> >       - 90.86% do_syscall_64
> >          - 80.03% __x64_sys_clone
> >             - 79.98% kernel_clone
> >                - 75.97% copy_process
> >                   + 46.04% perf_event_init_task
> >                   - 21.50% copy_mm
> >                      - 10.05% mm_init
> > ----------------------> - 8.92% __percpu_counter_init
> >                            - 8.67% __alloc_percpu_gfp
> >                               - 5.70% pcpu_alloc
>
> 5.7% of our time spent in pcpu_alloc seems excessive.  Are we contending
> on pcpu_alloc_mutex perhaps?  Also, are you doing this on a 4-socket
> machine like the kernel test robot ran on?

I ran on a 2-socket machine. I am not sure about pcpu_alloc_mutex, but I
doubt it, because I ran a single instance of the spawn test, i.e. a single
fork ping-pong.

>
> We could cut down the number of calls to pcpu_alloc() by a factor of 4
> by having a pcpu_alloc_bulk() that would allocate all four RSS counters
> at once.
>
> Just throwing out ideas ...

Thanks, I will take a stab at pcpu_alloc_bulk() and will share the results
tomorrow.

thanks,
Shakeel
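The pcpu_alloc_bulk() idea can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: pcpu_alloc_bulk() is only proposed in this thread, and NR_CPUS_SIM, sim_alloc_percpu() and the sim_counters_*() helpers are hypothetical names made up for the illustration. Each counter is modelled as one long per simulated CPU, and the "per-CPU allocator" is plain calloc() plus a call counter, so the model shows only the drop from four allocator calls to one, not pcpu_alloc() internals or its locking.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace model only. NR_CPUS_SIM, sim_alloc_percpu() and the
 * sim_counters_*() helpers are hypothetical, not kernel APIs.
 */
#define NR_CPUS_SIM 8
#define NR_RSS_COUNTERS 4 /* the four per-mm RSS counters */

static int alloc_calls; /* how many times the simulated allocator ran */

/* Stand-in for one per-CPU allocation: `size` bytes for every CPU. */
static void *sim_alloc_percpu(size_t size)
{
	alloc_calls++;
	return calloc(NR_CPUS_SIM, size);
}

struct sim_counter {
	long count;    /* folded global value */
	long *pcpu;    /* this counter's slot for CPU 0 */
	size_t stride; /* distance, in longs, between consecutive CPUs' slots */
};

/* Current scheme: one allocator call per counter. */
static int sim_counters_init_each(struct sim_counter *c, int nr)
{
	assert(nr > 0);
	for (int i = 0; i < nr; i++) {
		c[i].count = 0;
		c[i].stride = 1;
		c[i].pcpu = sim_alloc_percpu(sizeof(long));
		if (!c[i].pcpu) {
			while (--i >= 0)
				free(c[i].pcpu);
			return -1;
		}
	}
	return 0;
}

/* Bulk idea: one allocator call; each CPU's area holds all nr counters. */
static int sim_counters_init_bulk(struct sim_counter *c, int nr)
{
	assert(nr > 0);
	long *block = sim_alloc_percpu((size_t)nr * sizeof(long));

	if (!block)
		return -1;
	for (int i = 0; i < nr; i++) {
		c[i].count = 0;
		c[i].stride = (size_t)nr;
		c[i].pcpu = block + i; /* counter i sits at offset i in every CPU's area */
	}
	return 0;
}

/* Apply a delta to the given CPU's slot of one counter. */
static void sim_counter_add(struct sim_counter *c, int cpu, long delta)
{
	c->pcpu[(size_t)cpu * c->stride] += delta;
}

/* Global value = folded count plus every CPU's pending delta. */
static long sim_counter_sum(const struct sim_counter *c)
{
	long sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		sum += c->pcpu[(size_t)cpu * c->stride];
	return sum;
}

static void sim_counters_free_each(struct sim_counter *c, int nr)
{
	for (int i = 0; i < nr; i++)
		free(c[i].pcpu);
}

static void sim_counters_free_bulk(struct sim_counter *c)
{
	free(c[0].pcpu); /* counter 0 points at the start of the shared block */
}
```

With four counters, sim_counters_init_each() makes four allocator calls while sim_counters_init_bulk() makes one. In the bulk layout, counter i for CPU c lives at index c * 4 + i of the shared block, so the counters stay independent while sharing a single allocation.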