From: Yosry Ahmed
Date: Wed, 17 Jul 2024 09:05:15 -0700
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Waiman Long
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org, shakeel.butt@linux.dev, hannes@cmpxchg.org, lizefan.x@bytedance.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <172070450139.2992819.13210624094367257881.stgit@firesoul> <623e62c5-3045-4dca-9f2c-ed15b8d3bad8@redhat.com>
In-Reply-To: <623e62c5-3045-4dca-9f2c-ed15b8d3bad8@redhat.com>

On Tue, Jul 16, 2024 at 8:00 PM Waiman Long wrote:
>
> On 7/16/24 20:35, Yosry Ahmed wrote:
> > [..]
> >>
> >> This is a clean (meaning no cadvisor interference) example of kswapd
> >> starting simultaneously on many NUMA nodes, that in 27 out of 98 cases
> >> hit the race (which is handled in V6 and V7).
> >>
> >> The BPF "cnt" maps are getting cleared every second, so this
> >> approximates per sec numbers. This patch reduces pressure on the lock,
> >> but we are still seeing (kfunc:vmlinux:cgroup_rstat_flush_locked) full
> >> flushes approx 37 per sec (every 27 ms).
> >> On the positive side
> >> ongoing_flusher mitigation stopped 98 per sec of these.
> >>
> >> In this clean kswapd case the patch removes the lock contention issue
> >> for kswapd. The lock_contended cases 27 seem to be all related to
> >> handled_race cases 27.
> >>
> >> The remaining high flush rate should also be addressed, and we should
> >> also work on approaches to limit this like my earlier proposal[1].
> > I honestly don't think a high number of flushes is a problem on its
> > own as long as we are not spending too much time flushing, especially
> > when we have magnitude-based thresholding so we know there is
> > something to flush (although it may not be relevant to what we are
> > doing).
> >
> > If we keep observing a lot of lock contention, one thing that I
> > thought about is to have a variant of spin_lock with a timeout. This
> > limits the flushing latency, instead of limiting the number of flushes
> > (which I believe is the wrong metric to optimize).
>
> Except for semaphore, none of our locking primitives allow for a timeout
> parameter. For sleeping locks, I don't think it is hard to add variants
> with timeout parameter, but not the spinning locks.

Thanks for pointing this out. I am assuming a mutex with a timeout
will also address the priority inversion problem that Shakeel was
talking about AFAICT.