From: Yosry Ahmed
Date: Wed, 5 Oct 2022 19:13:24 -0700
Subject: Re: [RFC] memcg rstat flushing optimization
To: Tejun Heo
Cc: Zefan Li, Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin, Michal Koutný, Andrew Morton, Linux-MM, Cgroups, Greg Thelen

On Wed, Oct 5, 2022 at 11:38 AM Yosry Ahmed wrote:
>
> On Wed, Oct 5, 2022 at 11:22 AM Tejun Heo wrote:
> >
> > Hello,
> >
> > On Wed, Oct 05, 2022 at 11:02:23AM -0700, Yosry Ahmed wrote:
> > > > I was thinking more that being done inside the flush function.
> > >
> > > I think the flush function already does that in some sense if
> > > might_sleep is true, right? The problem here is that we are using
> >
> > Oh I forgot about that. Right.
> >
> > ...
> > > I took a couple of crashed machines kdumps and ran a script to
> > > traverse updated memcgs and check how many cpus have updates and how
> > > many updates are there on each cpu.
> > > I found that on average only a
> > > couple of stats are updated per-cpu per-cgroup, and that less than 25%
> > > of cpus have updates (but this is on a large machine, I expect the
> > > number to go higher on smaller machines). Which is why I suggested a
> > > bitmask. I understand though that this depends on whatever workloads
> > > were running on those machines, and that in cases where most stats are
> > > updated the bitmask will actually make things slightly worse.
> >
> > One worry I have about selective flushing is that it's only gonna improve
> > things by some multiples while we can reasonably increase the problem size
> > by orders of magnitude.
>
> I think we would usually want to flush a few stats (< 5?) in irqsafe
> contexts out of over 100, so I would say the improvement would be
> good, but yeah, the problem size can reasonably increase more than
> that. It also depends on which stats we selectively flush. If they are
> not in the same cache line we might end up bringing a lot of stats
> into the cpu cache anyway.
>
> >
> > The only real ways out I can think of are:
> >
> > * Implement a periodic flusher which keeps the stats needed in irqsafe path
> >   acceptably up-to-date to avoid flushing with irq disabled. We can make this
> >   adaptive too - no reason to do all this if the number to flush isn't huge.
>
> We do have a periodic flusher today for memcg stats (see
> flush_memcg_stats_dwork). It calls __mem_cgroup_flush_stats() which
> only flushes if the total number of updates is over a certain
> threshold.
> mem_cgroup_flush_stats_delayed(), which is called in the page fault
> path, only does a flush if the last flush was a certain while ago. We
> don't use the delayed version in all irqsafe contexts though, and I am
> not the right person to tell if we can.
>
> But I think this is not what you meant.
> I think you meant only
> flushing the specific stats needed in irqsafe contexts more frequently
> and not invoking a flush at all in irqsafe contexts (or using
> mem_cgroup_flush_stats_delayed()..?). Right?
>
> I am not the right person to judge what is acceptably up-to-date to be
> honest, so I would wait for other memcg folks to chime in on this.
>
> >
> > * Shift some work to the updaters. e.g. in many cases, propagating per-cpu
> >   updates a couple levels up from update path will significantly reduce the
> >   fanouts and thus the number of entries which need to be flushed later. It
> >   does add on-going overhead, so it prolly should be adaptive or configurable,
> >   hopefully the former.
>
> If we are adding overhead to the updaters, would it be better to
> maintain a bitmask of updated stats, or do you think it would be more
> effective to propagate updates a couple of levels up? I think to
> propagate updates up in updaters context we would need percpu versions
> of the "pending" stats, which would also add memory consumption.
>

A potential problem that I also noticed with propagating percpu
updates up on the update path is that we will need to update
memcg->vmstats_percpu->[state/event/..]_prev. Currently these percpu
prev variables are only updated by rstat flushing code. If they can
also be updated on the update path then we might need some locking
primitive to protect them, which would add more overhead.

> >
> > Thanks.
> >
> > --
> > tejun
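
To make the "bitmask of updated stats" idea from this thread concrete, here
is a minimal user-space sketch in C. It is not kernel code and not a proposed
patch: the names (pcpu, stat_update, stat_flush), the sizes NR_CPUS and
NR_STATS, and the plain arrays standing in for per-cpu data are all
assumptions made for illustration. It uses the GCC/Clang builtin
__builtin_ctzll to find set bits; kernel code would more likely use something
like for_each_set_bit(). Locking and irq handling are ignored entirely.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS    4     /* hypothetical CPU count */
#define NR_STATS   128   /* stands in for the "over 100" stats in the thread */
#define MASK_WORDS (NR_STATS / 64)

/* Hypothetical stand-in for per-cpu stat deltas plus an "updated" bitmask. */
struct percpu_stats {
    long delta[NR_STATS];
    uint64_t updated[MASK_WORDS];
};

static struct percpu_stats pcpu[NR_CPUS];
static long total[NR_STATS];

/* Update path: one extra OR per update to record which stat changed. */
static void stat_update(int cpu, int idx, long val)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(idx >= 0 && idx < NR_STATS);
    pcpu[cpu].delta[idx] += val;
    pcpu[cpu].updated[idx / 64] |= (uint64_t)1 << (idx % 64);
}

/*
 * Flush path: visit only the stats whose bit is set on each CPU.
 * Returns the number of per-cpu stat slots actually visited.
 */
static int stat_flush(void)
{
    int visited = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        for (int w = 0; w < MASK_WORDS; w++) {
            uint64_t mask = pcpu[cpu].updated[w];

            pcpu[cpu].updated[w] = 0;
            while (mask) {
                int idx = w * 64 + __builtin_ctzll(mask);

                total[idx] += pcpu[cpu].delta[idx];
                pcpu[cpu].delta[idx] = 0;
                mask &= mask - 1;   /* clear the lowest set bit */
                visited++;
            }
        }
    }
    return visited;
}
```

With only a couple of stats dirty per CPU, the flush touches a handful of
slots instead of all NR_CPUS * NR_STATS of them. As noted in the thread, the
mask becomes pure overhead when most stats are updated, and stats that do not
share a cache line may still pull a lot of data into the cpu cache.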