From: Mateusz Guzik
Date: Wed, 10 Jul 2024 14:24:16 +0200
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
To: Bharata B Rao
Cc: Yu Zhao, david@fromorbit.com, kent.overstreet@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, nikunj@amd.com, "Upadhyay, Neeraj", Andrew Morton, David Hildenbrand, willy@infradead.org, vbabka@suse.cz, kinseyho@google.com, Mel Gorman, linux-fsdevel@vger.kernel.org
References: <4307e984-a593-4495-b4cc-8ef509ddda03@amd.com>
In-Reply-To: <4307e984-a593-4495-b4cc-8ef509ddda03@amd.com>

On Wed, Jul 10, 2024 at 2:04 PM Bharata B Rao wrote:
>
> On 07-Jul-24 4:12 AM, Yu Zhao wrote:
> >> Some experiments tried
> >> ======================
> >> 1) When MGLRU was enabled many soft lockups were observed, no hard
> >> lockups were seen for 48 hours run. Below is once such soft lockup.
>
> >> Below preemptirqsoff trace points to preemption being disabled for more
> >> than 10s and the lock in picture is lruvec spinlock.
> >
> > Also if you could try the other patch (mglru.patch) please. It should
> > help reduce unnecessary rotations from deactivate_file_folio(), which
> > in turn should reduce the contention on the LRU lock for MGLRU.
>
> Thanks. With mglru.patch on a MGLRU-enabled system, the below latency
> trace record is no longer seen for a 30hr workload run.
>
> >
> >> # tracer: preemptirqsoff
> >> #
> >> # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-mglru-irqstrc
> >> # --------------------------------------------------------------------
> >> # latency: 10382682 us, #4/4, CPU#128 | (M:desktop VP:0, KP:0, SP:0
> >> HP:0 #P:512)
> >> #    -----------------
> >> #    | task: fio-2701523 (uid:0 nice:0 policy:0 rt_prio:0)
> >> #    -----------------
> >> #  => started at: deactivate_file_folio
> >> #  => ended at:   deactivate_file_folio
> >> #
> >> #
> >> #                    _------=> CPU#
> >> #                   / _-----=> irqs-off/BH-disabled
> >> #                  | / _----=> need-resched
> >> #                  || / _---=> hardirq/softirq
> >> #                  ||| / _--=> preempt-depth
> >> #                  |||| / _-=> migrate-disable
> >> #                  ||||| /     delay
> >> #  cmd     pid     |||||| time  |   caller
> >> #     \   /        ||||||  \    |    /
> >>      fio-2701523 128...1.    0us$: deactivate_file_folio
> >> <-deactivate_file_folio
> >>      fio-2701523 128.N.1. 10382681us : deactivate_file_folio
> >> <-deactivate_file_folio
> >>      fio-2701523 128.N.1. 10382683us : tracer_preempt_on
> >> <-deactivate_file_folio
> >>      fio-2701523 128.N.1. 10382691us : <stack trace>
> >>  => deactivate_file_folio
> >>  => mapping_try_invalidate
> >>  => invalidate_mapping_pages
> >>  => invalidate_bdev
> >>  => blkdev_common_ioctl
> >>  => blkdev_ioctl
> >>  => __x64_sys_ioctl
> >>  => x64_sys_call
> >>  => do_syscall_64
> >>  => entry_SYSCALL_64_after_hwframe
>
> However the contention now has shifted to inode_hash_lock.
> Around 55 softlockups in ilookup() were observed:
>
> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
> # --------------------------------------------------------------------
> # latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0
> #P:512)
> #    -----------------
> #    | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
> #    -----------------
> #  => started at: ilookup
> #  => ended at:   ilookup
> #
> #
> #                    _------=> CPU#
> #                   / _-----=> irqs-off/BH-disabled
> #                  | / _----=> need-resched
> #                  || / _---=> hardirq/softirq
> #                  ||| / _--=> preempt-depth
> #                  |||| / _-=> migrate-disable
> #                  ||||| /     delay
> #  cmd     pid     |||||| time  |   caller
> #     \   /        ||||||  \    |    /
>      fio-3244715 260...1.    0us$: _raw_spin_lock <-ilookup
>      fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
>      fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
>      fio-3244715 260.N.1. 10620440us : <stack trace>
>  => _raw_spin_unlock
>  => ilookup
>  => blkdev_get_no_open
>  => blkdev_open
>  => do_dentry_open
>  => vfs_open
>  => path_openat
>  => do_filp_open
>  => do_sys_openat2
>  => __x64_sys_openat
>  => x64_sys_call
>  => do_syscall_64
>  => entry_SYSCALL_64_after_hwframe
>
> It appears that scalability issues with inode_hash_lock has been brought
> up multiple times in the past and there were patches to address the same.
>
> https://lore.kernel.org/all/20231206060629.2827226-9-david@fromorbit.com/
> https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@gmail.com/
>
> CC'ing FS folks/list for awareness/comments.

Note my patch does not enable RCU usage in ilookup, but this can be
trivially added.

I can't even compile-test at the moment, but the diff below should do
it. Also note the patches are present here, not yet integrated
anywhere:
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.inode.rcu

That said, if fio is operating on the same target inode every time,
then this is merely going to shift contention to the inode spinlock
usage in find_inode_fast.

diff --git a/fs/inode.c b/fs/inode.c
index ad7844ca92f9..70b0e6383341 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1524,10 +1524,14 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
 {
 	struct hlist_head *head = inode_hashtable + hash(sb, ino);
 	struct inode *inode;
+
 again:
-	spin_lock(&inode_hash_lock);
-	inode = find_inode_fast(sb, head, ino, true);
-	spin_unlock(&inode_hash_lock);
+	inode = find_inode_fast(sb, head, ino, false);
+	if (IS_ERR_OR_NULL(inode)) {
+		spin_lock(&inode_hash_lock);
+		inode = find_inode_fast(sb, head, ino, true);
+		spin_unlock(&inode_hash_lock);
+	}
 
 	if (inode) {
 		if (IS_ERR(inode))

-- 
Mateusz Guzik
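
For anyone unfamiliar with the shape of the diff above, here is a
minimal userspace sketch of the same idea: try the hash lookup without
the lock first, and take the lock only when that pass finds nothing.
This is an analogy under stated assumptions, not kernel code. struct
node, table_insert() and table_lookup() are hypothetical names, an
atomic_flag spinlock stands in for inode_hash_lock, and the unlocked
walk is only safe here because the sketch never removes or frees
nodes. The real lockless pass depends on RCU for that guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NBUCKETS 64

/* Hypothetical stand-in for an entry on an inode hash chain. */
struct node {
	unsigned long ino;
	struct node *_Atomic next;
};

static struct node *_Atomic buckets[NBUCKETS];

/* Toy spinlock standing in for inode_hash_lock. */
static atomic_flag hash_lock = ATOMIC_FLAG_INIT;

static void hash_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&hash_lock,
						 memory_order_acquire))
		; /* spin until the flag is clear */
}

static void hash_lock_release(void)
{
	atomic_flag_clear_explicit(&hash_lock, memory_order_release);
}

static size_t bucket_of(unsigned long ino)
{
	return ino % NBUCKETS;
}

/*
 * Walk one chain. Callable with or without the lock; the unlocked use
 * is only valid because nodes are never unlinked in this sketch.
 */
static struct node *chain_find(unsigned long ino)
{
	struct node *n = atomic_load_explicit(&buckets[bucket_of(ino)],
					      memory_order_acquire);

	while (n && n->ino != ino)
		n = atomic_load_explicit(&n->next, memory_order_acquire);
	return n;
}

/* Writers always serialize on the lock and publish with release. */
void table_insert(struct node *n)
{
	size_t b;

	assert(n != NULL);
	b = bucket_of(n->ino);

	hash_lock_acquire();
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&buckets[b],
						   memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&buckets[b], n, memory_order_release);
	hash_lock_release();
}

/* Optimistic unlocked pass first, locked pass only on a miss. */
struct node *table_lookup(unsigned long ino)
{
	struct node *n = chain_find(ino);

	if (n == NULL) {
		hash_lock_acquire();
		n = chain_find(ino);
		hash_lock_release();
	}
	return n;
}
```

Hits never touch the lock, so readers that keep finding the same
entry stop contending on it. As noted above, that only moves the
pressure to whatever per-object lock the hit path takes next.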