Date: Wed, 5 Oct 2022 06:30:06 -1000
From: Tejun Heo
To: Yosry Ahmed
Cc: Zefan Li, Johannes Weiner, Michal Hocko, Shakeel Butt, Roman Gushchin, Michal Koutný, Andrew Morton, Linux-MM, Cgroups, Greg Thelen
Subject: Re: [RFC] memcg rstat flushing optimization

Hello,

On Tue, Oct 04, 2022 at 06:17:40PM -0700, Yosry Ahmed wrote:
> We have recently ran into a hard lockup on a machine with hundreds of
> CPUs and thousands of memcgs during an rstat flush. There have also
> been some discussions during LPC between myself, Michal Koutný, and
> Shakeel about memcg rstat flushing optimization. This email is a
> follow up on that, discussing possible ideas to optimize memcg rstat
> flushing.
>
> Currently, mem_cgroup_flush_stats() is the main interface to flush
> memcg stats. It has some internal optimizations that can skip a flush
> if there hasn't been significant updates in general. It always flushes
> the entire memcg hierarchy, and always invokes flushing using
> cgroup_rstat_flush_irqsafe(), which has interrupts disabled and does
> not sleep. As you can imagine, with a sufficiently large number of
> memcgs and cpus, a call to mem_cgroup_flush_stats() might be slow, or
> in an extreme case like the one we ran into, cause a hard lockup
> (despite periodically flushing every 4 seconds).

How long were the stalls? Given that rstats are usually flushed by their
consumers, flushing taking some time might be acceptable, but what's
really problematic is that the whole thing is done with irqs disabled.

We can think about other optimizations later too, but I think the first
thing to do is to make the flush code able to pause and resume, i.e.
flush in batches and re-enable irqs / resched between batches.

We'd have to pay attention to guaranteeing forward progress. It'd be
ideal if we could structure the iteration so that resuming doesn't end
up flushing nodes which got added after it started flushing.

Thanks.

-- 
tejun