From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95B5EC3DA59 for ; Tue, 16 Jul 2024 21:54:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CBF2B6B007B; Tue, 16 Jul 2024 17:54:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C6EB36B0082; Tue, 16 Jul 2024 17:54:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B36B96B0083; Tue, 16 Jul 2024 17:54:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 932536B007B for ; Tue, 16 Jul 2024 17:54:55 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 01247120657 for ; Tue, 16 Jul 2024 21:54:54 +0000 (UTC) X-FDA: 82346971350.07.78EB720 Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) by imf20.hostedemail.com (Postfix) with ESMTP id 2C85E1C000E for ; Tue, 16 Jul 2024 21:54:51 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=fg+3SnnN; spf=pass (imf20.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721166853; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=TCz2Kt6Np14OnybqMGjWHHC2wl/sK9b6rMuszryzTDE=; b=Pu75g7Kq0wg/HC4yOZ8Plh69C3xc6FYKpnJcvKktf9wjS5MXEivdUSeq923w0o3QRaKesl xZrX0jdbbcBqbx1RRRvQrb0KwZtKosA2s9mo6lNgN7b2rFms0++JfZfOnN+BiKQvnPf85l l5nG+X7GPLUYZe5vWZnn9QbQi50FWLw= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721166853; a=rsa-sha256; cv=none; b=vCqkaXAR8GlnrrkZkh0taLqzzfdujwEtxvmUNf+U/KRs3B4JM1gbM3IlmO9uJc9aZ+LgYU 66dswADotZeBd1mtMq27dN+w0lg6mkpyqf62NR/wsUqPLDE3iIufL/Sx+nrgQlh1ose5jN 4YNgaeNouuUjZq2ktVsoPVMQjssYdIA= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=fg+3SnnN; spf=pass (imf20.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-a797c62565aso636181366b.2 for ; Tue, 16 Jul 2024 14:54:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1721166890; x=1721771690; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=TCz2Kt6Np14OnybqMGjWHHC2wl/sK9b6rMuszryzTDE=; b=fg+3SnnNwgxjgzR33FV8Wy0CW11Ca240ipdMbOtkm47LESD9rAtYwXYGLGaoYHwEEv Zm6M6s0a550ujK+Oyu8pm0Eu/83/WqaVO9XFiGq/P0YXhXpRQsGKTKb4VnKN9NxSAVz4 oe9N7gvafofLwzxnmJoA/oRF9k99G5QSOms6EPU7NW8Q3YIY0yddc7pT9yTGgnv6WzRh fUs2SnhdjEoUInJODFRioCevA18ZNp9sM6Xo52lJDMEHNQAFVoqFL9wKq/OaThQzV+5t d63qCW1YwlNU/4tV13HNdMJNPDnKzZl99Pv9hPj6VPJjXfGiEHZLq7Hms4CjpKDUmDKP Ew7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721166890; x=1721771690; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TCz2Kt6Np14OnybqMGjWHHC2wl/sK9b6rMuszryzTDE=; b=fO7iUvjNChWVstk8g+aO7QQdTYvzJdTtu1lNHZAU7Xu5dBxkKAs8PRq4SYIjBNlS5L IlAYJrdfjVICEHG+vOgV5GCpWoAqCz3eCi1K7YUYs5lOtZZ4IaOWeEwvxfbGww/pEMTU KuU8XzzkXpCN3KlqaCbHN1Kyxkz2sV4mPq8Iiac1pyv5QBzbusSoxXqP1O8wlGSTxKH5 6MGziHWWRtoZzvcM1QgPzNoig9aZbtGyr4agxnIKpkbUF2XprilxxoAgofmFVmFxSdRr FgV44ouk8LefD/1Ac8+GwOze5pM9Z5zxM4A/9NLZ1GtYkcBlrlUe/11KHj7JfUfFpSV2 YDdg== X-Forwarded-Encrypted: i=1; AJvYcCU0m5Qq/oJ3JKiZkI22eYNK6U9k1O1rrOo7FWg0hcWahT/JEmZGsCrR0lDIdOIjOhPB9msu0v1V834hsEftS1rOqeA= X-Gm-Message-State: AOJu0YyGGCL8pNbIgfhw2yg4jx2L19+P1Ig12tzcej8S6EwfLIoTs6d3 qVmGpsbAN7KENlm+9PCU0iITOzEKjkJiSH6C7iDyGhpVNZYL8iAKX6ivzSsQW8TH3zzm/IY13er l3Nsqb5GV3sJRHoEhmDrs0sRRKPHmQZzoeNka X-Google-Smtp-Source: AGHT+IH+L98WmINi2A/vrSb7b+4tKg+cB+n8wwUAWNPhhTY5kTA7Q7W01G2LTbnOXEgy7vUpJa5vdfkX9A2/O6+vCQ8= X-Received: by 2002:a17:906:488:b0:a72:5bb9:b140 with SMTP id a640c23a62f3a-a79eaa73fc1mr237672566b.54.1721166889832; Tue, 16 Jul 2024 14:54:49 -0700 (PDT) MIME-Version: 1.0 References: <171952310959.1810550.17003659816794335660.stgit@firesoul> <171952312320.1810550.13209360603489797077.stgit@firesoul> <4n3qu75efpznkomxytm7irwfiq44hhi4hb5igjbd55ooxgmvwa@tbgmwvcqsy75> <7ecdd625-37a0-49f1-92fc-eef9791fbe5b@kernel.org> <9a7930b9-dec0-418c-8475-5a7e18b3ec68@kernel.org> In-Reply-To: <9a7930b9-dec0-418c-8475-5a7e18b3ec68@kernel.org> From: Yosry Ahmed Date: Tue, 16 Jul 2024 14:54:12 -0700 Message-ID: Subject: Re: [PATCH V4 2/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes To: Jesper Dangaard Brouer Cc: Shakeel Butt , tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 2C85E1C000E X-Stat-Signature: ur96yzttkrcq3t68n8ugg83nhn1h97t1 X-HE-Tag: 1721166891-241849 X-HE-Meta: U2FsdGVkX1/mt6XvCrOspm+pvjBdfDenJtcCYhx63QGiW+NDfhra4y5j6nbwYSzt7P3F54gEWPwCEGMqlMUJsYNzylj6tfjSlaKOglGhX4HvoZbMjW94+dQ4lWjcrY6lHAQgRJ0pS/rqrQeCcbvv6tjlQg5qyzYXUR0JOKWthzILZR0FYe4HYzqApaTFn7oJv02+vqJHK3+eMdF4qlF8V2k++F2QTVnbY+JMUlj7VjIJUTKp3Qkks89GP2qfqXj+IVshKkbgQq5NWh2Tnr4rdc32IeTpGVb4/J6ixaF9xj3SEIbLXbg5uCmzprVLiLGDhjb519dwY0XcrFqRJT4nc16jXEwF/mm7nE4Y8osPev6HEJse3duuV8pZbXTLESIwEAhjurebFchhxslCCBoOogsxEp86je/0W19ruMKFXczXgdx1UCFC63OqvOskLdo9ZSglM12CQm/57BR4ckwDdjelQm7SKjuxRhU1tvauIfkLfcV154/dqmj0ZJnI6Nw+ibdBYjP+DH3t2tFe2XWVct4roBGwwOnepiQ3pSuKHXIcMm7Xe4TtGpebfXkV4HS2PVP8WKAZgYgMUPhwuc3ok+99pIc8xUA0Dc/88NURvnQlmh7iu8nLyNMEMxXGH69I2yhhoaaKdvOD1LjGKyRisHMgBnHJXp52OdX/qaiZHEk1avuvc4mC/bg23gR4C3/VPf6ywwPV2BW5MtMEt7ojFwW5hr/xv5F667mFqzbmbnc+xQuuarNrLKsQWPMCvtCLnoN3nxr5wq6k/6KL7uzvEjGrim+Xq1IG/txmp4QIJlho1i2oF5OwpnzRlACXcN4zmWLq4XMBi0dYiLMv/skTKWMs9Xv+wTTpkjCOUp5bMpQv13Zk68iQKr5lE5erec+Vh8VRQK1GxPufIGT/LaYZnVqeCjPyV94Z7MZmyR/0OPbjuJj3ufYJzNGky1c04WibyUYwT79i0CfcmwMsGaU Y/4hQxtb 182zGujmRT6pUbO9yugO5anbpRN0V++cfZ81QoOtdsLK/wZnsaFcNwArOsOhDxxjym6oFugS/O1AUUhQkWSXRmFlmSDiLwMc8sE1Jmi+bIP6mkbSt7mBLaUSYLF4DEY5/sISGwuFNpQsHuFdzD3hwpNkFRqXpaGvZ9S5hcSGxrK8ya7+rP+rHNf7YuXXatHInHkSMFWohixtS6l0fkdZtkDPU7XDA6ENccfeV X-Bogosity: Ham, tests=bogofilter, spamicity=0.044453, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Jul 8, 2024 at 8:26=E2=80=AFAM Jesper Dangaard Brouer wrote: > > > On 28/06/2024 11.39, Jesper Dangaard Brouer wrote: > > > > > > On 28/06/2024 01.34, Shakeel Butt wrote: > >> On Thu, Jun 27, 2024 at 11:18:56PM GMT, Jesper Dangaard Brouer wrote: > >>> Avoid lock contention on the global cgroup rstat lock caused by kswap= d > >>> starting on all NUMA nodes simultaneously. At Cloudflare, we observed > >>> massive issues due to kswapd and the specific mem_cgroup_flush_stats(= ) > >>> call inlined in shrink_node, which takes the rstat lock. > >>> > [...] > >>> static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu); > >>> @@ -312,6 +315,45 @@ static inline void __cgroup_rstat_unlock(struct > >>> cgroup *cgrp, int cpu_in_loop) > >>> spin_unlock_irq(&cgroup_rstat_lock); > >>> } > >>> +#define MAX_WAIT msecs_to_jiffies(100) > >>> +/* Trylock helper that also checks for on ongoing flusher */ > >>> +static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp) > >>> +{ > >>> + bool locked =3D __cgroup_rstat_trylock(cgrp, -1); > >>> + if (!locked) { > >>> + struct cgroup *cgrp_ongoing; > >>> + > >>> + /* Lock is contended, lets check if ongoing flusher is alrea= dy > >>> + * taking care of this, if we are a descendant. > >>> + */ > >>> + cgrp_ongoing =3D READ_ONCE(cgrp_rstat_ongoing_flusher); > >>> + if (cgrp_ongoing && cgroup_is_descendant(cgrp, cgrp_ongoing)= ) { > >> > >> I wonder if READ_ONCE() and cgroup_is_descendant() needs to happen > >> within in rcu section. On a preemptable kernel, let's say we got > >> preempted in between them, the flusher was unrelated and got freed > >> before we get the CPU. In that case we are accessing freed memory. > >> > > > > I have to think about this some more. > > > > I don't think this is necessary. We are now waiting (for completion) and > not skipping flush, because as part of take down function > cgroup_rstat_exit() is called, which will call cgroup_rstat_flush(). > > > void cgroup_rstat_exit(struct cgroup *cgrp) > { > int cpu; > cgroup_rstat_flush(cgrp); > > Sorry for the late response, I was traveling for a bit. I will take a look at your most recent version shortly. But I do have a comment here. I don't see how this addresses Shakeel's concern. IIUC, if the cgroup was freed after READ_ONCE() (and cgroup_rstat_flush() was called), then cgroup_is_descendant() will access freed memory. We are not holding the lock here so we are not preventing cgroup_rstat_flush() from being called for the freed cgroup, right?