From: Yafang Shao
Date: Fri, 8 Mar 2024 16:57:08 +0800
Subject: Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
To: Andrew Morton
Cc: yuzhao@google.com, linux-mm@kvack.org, stable@vger.kernel.org
References: <20240307031952.2123-1-laoar.shao@gmail.com> <20240307090618.50da28040e1263f8af39046f@linux-foundation.org>
In-Reply-To: <20240307090618.50da28040e1263f8af39046f@linux-foundation.org>

On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton wrote:
>
> On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao wrote:
>
> > After we enabled mglru on our 384C1536GB production servers, we
> > encountered frequent soft lockups attributed to scanning folios.
> >
> > The soft lockup as follows,
> >
> > ...
> >
> > There were a total of 22 tasks waiting for this spinlock
> > (RDI: ffff99d2b6ff9050):
> >
> >  crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
> >  22
>
> If we're holding the lock for this long then there's a possibility of
> getting hit by the NMI watchdog also.

The NMI watchdog is disabled because these servers are KVM guests:

    kernel.nmi_watchdog = 0
    kernel.soft_watchdog = 1

>
> > Additionally, two other threads were also engaged in scanning folios, one
> > with 19 waiters and the other with 15 waiters.
> >
> > To address this issue under heavy reclaim conditions, we introduced a
> > hotfix version of the fix, incorporating cond_resched() in scan_folios().
> > Following the application of this hotfix to our servers, the soft lockup
> > issue ceased.
> >
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> >
> >               if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> >                       break;
> > +
> > +             spin_unlock_irq(&lruvec->lru_lock);
> > +             cond_resched();
> > +             spin_lock_irq(&lruvec->lru_lock);
> >       }
>
> Presumably wrapping this with `if (need_resched())' will save some work.

Good suggestion.

>
> This lock is held for a reason.  I'd like to see an analysis of why
> this change is safe.

I believe the key point here is whether we can reduce the scope of
this lock from:

  evict_folios
      spin_lock_irq(&lruvec->lru_lock);
      scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
      scanned += try_to_inc_min_seq(lruvec, swappiness);
      if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
          scanned = 0;
      spin_unlock_irq(&lruvec->lru_lock);

to:

  evict_folios
      spin_lock_irq(&lruvec->lru_lock);
      scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
      spin_unlock_irq(&lruvec->lru_lock);

      spin_lock_irq(&lruvec->lru_lock);
      scanned += try_to_inc_min_seq(lruvec, swappiness);
      if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
          scanned = 0;
      spin_unlock_irq(&lruvec->lru_lock);

isolate_folios() only reads min_seq to look up the generation; it does
not modify it. If multiple tasks are running evict_folios()
concurrently, it seems inconsequential whether min_seq is incremented
by one task or another. I'd appreciate Yu's confirmation on this
matter.
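
To make the locking pattern above concrete, here is a minimal userspace
sketch. It is only an analogy, not the kernel code: a pthread mutex
stands in for lruvec->lru_lock, sched_yield() stands in for
cond_resched(), and every demo_* name and SCAN_BATCH are hypothetical.
The check in demo_need_resched() is also an assumption; it simply fires
once per batch, whereas the real need_resched() reflects scheduler state.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define SCAN_BATCH 64 /* hypothetical batch size, not a kernel constant */

struct demo_lru {
	pthread_mutex_t lock; /* stands in for lruvec->lru_lock */
	int min_seq;          /* stands in for the oldest generation number */
	size_t nr_items;      /* number of entries available to scan */
	size_t lock_drops;    /* how many times the scan released the lock */
};

static void demo_init(struct demo_lru *lru, size_t nr_items)
{
	int err = pthread_mutex_init(&lru->lock, NULL);

	assert(err == 0);
	lru->min_seq = 0;
	lru->nr_items = nr_items;
	lru->lock_drops = 0;
}

/* Hypothetical stand-in for need_resched(): true at each batch boundary. */
static int demo_need_resched(size_t scanned)
{
	return scanned % SCAN_BATCH == 0;
}

/* Called with lru->lock held; returns with it held. Only reads min_seq. */
static size_t demo_scan(struct demo_lru *lru, size_t remaining)
{
	size_t scanned = 0;

	while (remaining > 0 && scanned < lru->nr_items) {
		remaining--;
		scanned++;

		if (!demo_need_resched(scanned))
			continue;

		/* Give waiters a chance: drop the lock, yield, retake it. */
		pthread_mutex_unlock(&lru->lock);
		sched_yield();
		pthread_mutex_lock(&lru->lock);
		lru->lock_drops++;
	}
	return scanned;
}

/* Two separate critical sections, mirroring the proposed split. */
static size_t demo_evict(struct demo_lru *lru, size_t nr_to_scan)
{
	size_t scanned;

	pthread_mutex_lock(&lru->lock);
	scanned = demo_scan(lru, nr_to_scan);
	pthread_mutex_unlock(&lru->lock);

	/* Another task may run here and advance min_seq first. */
	pthread_mutex_lock(&lru->lock);
	if (scanned)
		lru->min_seq++;
	pthread_mutex_unlock(&lru->lock);

	return scanned;
}
```

The sketch does not answer the safety question. It only shows where the
window opens: anything read under the first critical section may be
stale by the time the second one runs, so the second section must not
depend on it.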

-- 
Regards
Yafang