From: Yang Shi
Date: Wed, 20 Dec 2023 12:14:09 -0800
Subject: Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
To: "Christoph Lameter (Ampere)"
Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp@lists.linux.dev, lkp@intel.com, Linux Memory Management List, Andrew Morton, Matthew Wilcox, ying.huang@intel.com, feng.tang@intel.com
References: <202312192310.56367035-oliver.sang@intel.com>

On Wed, Dec 20, 2023 at 7:42 AM Christoph Lameter (Ampere) wrote:
>
> On Wed, 20 Dec 2023, Yin Fengwei wrote:
>
> >> Interesting, wasn't the same regression seen last time? And I'm a
> >> little bit confused about how pthread got regressed.
> >> I didn't see the pthread benchmark do any intensive memory alloc/free
> >> operations. Do the pthread APIs do any intensive memory operations? I
> >> saw the benchmark does allocate memory for thread stack, but it should
> >> be just 8K per thread, so it should not trigger what this patch does.
> >> With 1024 threads, the thread stacks may get merged into one single
> >> VMA (8M total), but it may do so even though the patch is not applied.
> > stress-ng.pthread test code is strange here:
> >
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
> >
> > Even it allocates its own stack, but that attr is not passed
> > to pthread_create. So it's still glibc to allocate stack for
> > pthread which is 8M size. This is why this patch can impact
> > the stress-ng.pthread testing.
>
> Hmmm... The use of calloc() for 8M triggers an mmap I guess.
>
> Why is that memory slower if we align the address to a 2M boundary? Because
> THP can act faster and creates more overhead?

glibc calls madvise() to free the unused part of the stack, and that may
cost more with THP (splitting the PMD, the deferred split queue, etc.).

>
> > while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
> >    - 55.02% zap_pmd_range.isra.0
> >       - 53.42% __split_huge_pmd
> >          - 51.74% _raw_spin_lock
> >             - 51.73% native_queued_spin_lock_slowpath
> >                + 3.03% asm_sysvec_call_function
> >          - 1.67% __split_huge_pmd_locked
> >             - 0.87% pmdp_invalidate
> >                + 0.86% flush_tlb_mm_range
> >       - 1.60% zap_pte_range
> >          - 1.04% page_remove_rmap
> >               0.55% __mod_lruvec_page_state
>
> Ok so we have 2M mappings and they are split because of some action on 4K
> segments? Guess because of the guard pages?

It should not be related to guard pages. It is just due to freeing the
unused stack, which may cover only part of a 2M huge page.

>
> >> More time spent in madvise and munmap. but I'm not sure whether this
> >> is caused by tearing down the address space when exiting the test. If
> >> so it should not count in the regression.
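
To make the "partial 2M" point above concrete, here is a minimal Python
sketch. It is not kernel code, just a simplified model. It assumes 2M
PMD-sized huge pages, and the function name, the base address, and the 64K
figure are all hypothetical. Given the byte range passed to madvise(), it
lists the 2M-aligned regions that the range covers only partly. If such a
region is mapped by a huge PMD, the PMD has to be split before the covered
part can be zapped, which is where __split_huge_pmd shows up in the profile.

```python
PMD_SIZE = 2 * 1024 * 1024  # assumption: 2 MiB huge pages (4 KiB base pages)


def pmds_needing_split(start, length, pmd_size=PMD_SIZE):
    """Return base addresses of PMD-sized regions only partly covered
    by the byte range [start, start + length).

    Simplified model: a fully covered region can be dropped as a whole,
    while a partly covered one must be split first if it is huge-mapped.
    """
    if length <= 0:
        return []
    end = start + length
    first = start - start % pmd_size         # region containing `start`
    last = (end - 1) - (end - 1) % pmd_size  # region containing the last byte
    partial = []
    for base in range(first, last + 1, pmd_size):
        covered_from = max(start, base)
        covered_to = min(end, base + pmd_size)
        if covered_to - covered_from < pmd_size:
            partial.append(base)
    return partial


if __name__ == "__main__":
    stack_base = 0x7F0000000000          # hypothetical, 2 MiB aligned
    stack_size = 4 * PMD_SIZE            # 8 MiB stack, as in the thread
    keep = 64 * 1024                     # hypothetical part left in use
    for base in pmds_needing_split(stack_base, stack_size - keep):
        print(hex(base))
```

With these made-up numbers, three of the four 2M regions are covered
completely and only the last one is partly covered, so one split is needed
per call. A range that is 2M-aligned at both ends needs none.
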
> > It's not for the whole address space tearing down. It's for pthread
> > stack tearing down when pthread exit (can be treated as address space
> > tearing down? I suppose so).
> >
> > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
> >
> > Another thing is whether it's worthy to make stack use THP? It may be
> > useful for some apps which need large stack size?
>
> No can do since a calloc is used to allocate the stack. How can the kernel
> distinguish the allocation?

Just by VM_GROWSDOWN | VM_GROWSUP. User space needs to tell the kernel that
this area is a stack by setting the proper flags. For example,

ffffca1df000-ffffca200000 rw-p 00000000 00:00 0                          [stack]
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  60 kB
Pss:                  60 kB
Pss_Dirty:            60 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        60 kB
Referenced:           60 kB
Anonymous:            60 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd wr mr mw me gd ac

The "gd" flag means GROWSDOWN. But it totally depends on glibc and what it
considers a "stack". So glibc just uses calloc() to allocate the stack area.

> >
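
For anyone who wants to check this on their own system, a minimal Python
sketch for reading one smaps entry follows. The helper names
parse_smaps_entry() and is_growsdown() are made up for illustration, and
SAMPLE is a shortened copy of the entry quoted above. The sketch only
parses text; it assumes the usual "Key: value kB" layout of
/proc/<pid>/smaps and does not query the kernel in any other way.

```python
# Shortened copy of the smaps entry from the mail above.
SAMPLE = """\
ffffca1df000-ffffca200000 rw-p 00000000 00:00 0 [stack]
Size:                132 kB
AnonHugePages:         0 kB
THPeligible:    0
VmFlags: rd wr mr mw me gd ac
"""


def parse_smaps_entry(text):
    """Parse a single /proc/<pid>/smaps entry into a dict (hypothetical helper)."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    start, end = (int(x, 16) for x in header[0].split("-"))
    entry = {
        "start": start,
        "end": end,
        "perms": header[1],
        "name": " ".join(header[5:]),  # empty for unnamed anonymous mappings
        "fields": {},
        "vmflags": [],
    }
    for ln in lines[1:]:
        key, _, value = ln.partition(":")
        key, value = key.strip(), value.strip()
        if key == "VmFlags":
            entry["vmflags"] = value.split()
        elif value.endswith(" kB"):
            entry["fields"][key] = int(value[:-3])  # size in kB
        elif value.isdigit():
            entry["fields"][key] = int(value)
        else:
            entry["fields"][key] = value
    return entry


def is_growsdown(entry):
    """True if the VMA carries the "gd" (GROWSDOWN) flag."""
    return "gd" in entry["vmflags"]


if __name__ == "__main__":
    e = parse_smaps_entry(SAMPLE)
    print(e["name"], is_growsdown(e), e["fields"].get("AnonHugePages"))
```

Splitting /proc/self/smaps into entries and running each one through
parse_smaps_entry() would show which VMAs carry "gd" and how much of each
is backed by huge pages (AnonHugePages).
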