From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mateusz Guzik <mjguzik@gmail.com>
Date: Fri, 19 Jul 2024 16:26:56 +0200
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
To: Bharata B Rao
Cc: Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nikunj@amd.com, "Upadhyay, Neeraj", Andrew Morton,
	David Hildenbrand, willy@infradead.org, yuzhao@google.com,
	kinseyho@google.com, Mel Gorman
In-Reply-To: <584ecb5e-b1fc-4b43-ba36-ad396d379fad@amd.com>
References: <3128c3c0-ede2-4930-a841-a1da56e797d7@suse.cz>
	<44fb1971-f3d3-4af8-9bac-aceb2fedd2a6@amd.com>
	<584ecb5e-b1fc-4b43-ba36-ad396d379fad@amd.com>

On Fri, Jul 19, 2024 at 8:16 AM Bharata B Rao wrote:
>
> On 18-Jul-24 5:41 PM, Mateusz Guzik wrote:
> > On Thu, Jul 18, 2024 at 11:00 AM Bharata B Rao wrote:
> >>
> >> On 17-Jul-24 4:59 PM, Mateusz Guzik wrote:
> >>> As for clear_shadow_entry mentioned in the opening mail, the content is:
> >>>         spin_lock(&mapping->host->i_lock);
> >>>         xa_lock_irq(&mapping->i_pages);
> >>>         __clear_shadow_entry(mapping, index, entry);
> >>>         xa_unlock_irq(&mapping->i_pages);
> >>>         if (mapping_shrinkable(mapping))
> >>>                 inode_add_lru(mapping->host);
> >>>         spin_unlock(&mapping->host->i_lock);
> >>>
> >>> so for all I know it's all about the xarray thing, not the i_lock per se.
> >>
> >> The soft lockup signature has _raw_spin_lock and not _raw_spin_lock_irq,
> >> hence we concluded it to be i_lock.
> >
> > I'm not disputing it was i_lock. I am claiming that i_pages is
> > taken immediately after, and it may be that in your workload this is
> > the thing with the actual contention problem, making i_lock a red
> > herring.
> >
> > I tried to match up offsets to my own kernel binary, but things went haywire.
> >
> > Can you please resolve a bunch of symbols, like this:
> >
> > ./scripts/faddr2line vmlinux clear_shadow_entry+92
> >
> > and then paste the source code from the reported lines? (I presume you are
> > running with some local patches, so opening the relevant files in my repo
> > may still give bogus results.)
> >
> > Addresses are: clear_shadow_entry+92 __remove_mapping+98 __filemap_add_folio+332
>
> clear_shadow_entry+92
>
> $ ./scripts/faddr2line vmlinux clear_shadow_entry+92
> clear_shadow_entry+92/0x180:
> spin_lock_irq at include/linux/spinlock.h:376
> (inlined by) clear_shadow_entry at mm/truncate.c:51
>
> 42 static void clear_shadow_entry(struct address_space *mapping,
> 43                                struct folio_batch *fbatch, pgoff_t *indices)
> 44 {
> 45         int i;
> 46
> 47         if (shmem_mapping(mapping) || dax_mapping(mapping))
> 48                 return;
> 49
> 50         spin_lock(&mapping->host->i_lock);
> 51         xa_lock_irq(&mapping->i_pages);
>
> __remove_mapping+98
>
> $ ./scripts/faddr2line vmlinux __remove_mapping+98
> __remove_mapping+98/0x230:
> spin_lock_irq at include/linux/spinlock.h:376
> (inlined by) __remove_mapping at mm/vmscan.c:695
>
> 684 static int __remove_mapping(struct address_space *mapping, struct folio *folio,
> 685                             bool reclaimed, struct mem_cgroup *target_memcg)
> 686 {
> 687         int refcount;
> 688         void *shadow = NULL;
> 689
> 690         BUG_ON(!folio_test_locked(folio));
> 691         BUG_ON(mapping != folio_mapping(folio));
> 692
> 693         if (!folio_test_swapcache(folio))
> 694                 spin_lock(&mapping->host->i_lock);
> 695         xa_lock_irq(&mapping->i_pages);
>
> __filemap_add_folio+332
>
> $ ./scripts/faddr2line vmlinux __filemap_add_folio+332
> __filemap_add_folio+332/0x480:
> spin_lock_irq at include/linux/spinlock.h:377
> (inlined by) __filemap_add_folio at mm/filemap.c:878
>
> 851 noinline int __filemap_add_folio(struct address_space *mapping,
> 852                 struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
> 853 {
> 854         XA_STATE(xas, &mapping->i_pages, index);
> ...
> 874         for (;;) {
> 875                 int order = -1, split_order = 0;
> 876                 void *entry, *old = NULL;
> 877
> 878                 xas_lock_irq(&xas);
> 879                 xas_for_each_conflict(&xas, entry) {
>
> > Most notably in __remove_mapping i_lock is conditional:
> >         if (!folio_test_swapcache(folio))
> >                 spin_lock(&mapping->host->i_lock);
> >         xa_lock_irq(&mapping->i_pages);
> >
> > and the disasm of the offset in my case does not match either acquire.
> > For all I know i_lock in this routine is *not* taken, and all the
> > queued-up __remove_mapping callers increase i_lock -> i_pages wait
> > times in clear_shadow_entry.
>
> So the first two are on the i_pages lock and the last one is the xa_lock.

Bottom line though: messing with i_lock removal is not justified afaics.

> > To my cursory reading i_lock in clear_shadow_entry can be hacked away
> > with some effort, but should this happen the contention is going to
> > shift to i_pages, presumably with more soft lockups (except on that
> > lock). I am not convinced messing with it is justified. From looking
> > at other places, the i_lock is not a problem in other spots fwiw.
> >
> > All that said, even if it is i_lock in both cases *and* someone whacks
> > it, the mm folk should look into what happens when the (maybe i_lock ->)
> > i_pages lock is held.
> > To that end perhaps you could provide a
> > flamegraph or output of perf record -a -g, I don't know what's
> > preferred.
>
> I have attached the flamegraph, but this is for the kernel that has been
> running with all the accumulated fixes so far. The original one (w/o
> fixes) did show considerable time spent on
> native_queued_spin_lock_slowpath, but unfortunately I am unable to locate it now.

So I think the problems at this point are all mm, so I'm kicking the ball
to that side.

-- 
Mateusz Guzik