From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4241C74A5B for ; Thu, 23 Mar 2023 16:37:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 214D06B0071; Thu, 23 Mar 2023 12:37:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1C4766B0072; Thu, 23 Mar 2023 12:37:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 08CF96B0074; Thu, 23 Mar 2023 12:37:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id EDDF86B0071 for ; Thu, 23 Mar 2023 12:37:00 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 1F053A06E7 for ; Thu, 23 Mar 2023 16:37:00 +0000 (UTC) X-FDA: 80600717400.04.46D684D Received: from mail-ed1-f44.google.com (mail-ed1-f44.google.com [209.85.208.44]) by imf07.hostedemail.com (Postfix) with ESMTP id 3804A4000B for ; Thu, 23 Mar 2023 16:36:56 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=A+p59gcS; spf=pass (imf07.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.44 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679589417; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zI1OjhppbjxP5gm//hDBy+RdPuj4ToAxdCG6kel4DWE=; b=NqBJ/x2kO53U+vubb6+NB0iNyZc9qQFSfosiiA4PdQZTaVvz0b01zgf/vn8mPhNsP2TRxq W/feOQ2/sU6aptOJESQubl1J1NnNker90g++glVd3G+wEm2qR3ltZUukJEVZyHX1fgnYKz TKKbebfYeSydxSEe3OfTqVZH+iFqamc= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=A+p59gcS; spf=pass (imf07.hostedemail.com: domain of yosryahmed@google.com designates 209.85.208.44 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679589417; a=rsa-sha256; cv=none; b=UOd4/oA/MpH2xqIcJhBwOzjGMSMVrLEaeXA6HqNXW6DovP3LDxXalqVa02oPCDh3zGuxZd cx8xp8idnGaUgHWjg8gvMWtewYpAGfgVd1ilegMKJj2+Z84AuVSiihm+ZTWy4ELM7NANbM DNXmSjH5sgYLfvZLXfk6XqkuXWCmP2Q= Received: by mail-ed1-f44.google.com with SMTP id cn12so43747551edb.4 for ; Thu, 23 Mar 2023 09:36:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679589415; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=zI1OjhppbjxP5gm//hDBy+RdPuj4ToAxdCG6kel4DWE=; b=A+p59gcSMpaGuinyKCOkPl7rgUJJ70UmgDqUkVTD3B9rgQ7uHV6j0HDtHmK2b6R8J+ 2/W1xAnGe9EtNhtO3J4UWlZYNadlYe/pQgiFRbc72HImcZJAX4zF33moW/GMZWhD5z2r 9tIEbrowrjIaYTkv6X9H/wnQxjsoPpkwT4hqIqavlmqQ6bRyNCx8YthG45c3lvwquAjs 2MQdXNkp0JfThOZdK+pUJdSsddDI72vGYhy7ZNth5151+8iycOuvjTfc1vGbgovDUghT XGmNOxuIyQvAxdyTwpxdf/mtm1jm2UDm0hbpIGDh2ACT1/fVvTSV9URC48IavjDL6qaO +aoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679589415; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zI1OjhppbjxP5gm//hDBy+RdPuj4ToAxdCG6kel4DWE=; b=LfAvD6X5nHqJrOSBTZtuEyF56TdKkGsl7QjzXR+RckzNtKeG54OMIEtJxU/0GtOJEj 1f4lfhNuT/HRuKHWjENxcjoqAtktRqHnZ+UGQjT4AKI9GiNNurNyOuXiI3jBuBWVAHPI fQ0E3+b3Kk8kYT9aZgHxQHO5RW1vbu+W7t3N9E8/kV0lYZ4Ums6pmsvmHVbVFEV1Kn52 P3EMEd7/RAGO5Y2tpKIY/MDdP2XDdb/qKjHyYTBeLmSCM8GXjirvdze8iShzKS5jyaNY 2lyhQS0ps4a9xm+aTi8AFUquL+iepYq1ebhy9vUjkHZis743shogDwVPbukSD5UF8+fC HDLQ== X-Gm-Message-State: AO0yUKWjYgZGq1EquJPlmNLRMlBTEll5NlZaREBUpMkrOWpeNk1T67J2 PPmnFIND1eDkAivadDOLgoO+PF32azgjf4f7KBLpVQ== X-Google-Smtp-Source: AK7set/zgDFAhT/AM8a6Iyv/UQT+nfT7sT7yCYOdcZTocVNNSH5tLAORJonWRVtQoR3BIHOfcuMdY9wtMzm3fWoaxJM= X-Received: by 2002:a17:906:bccd:b0:8b1:28f6:8ab3 with SMTP id lw13-20020a170906bccd00b008b128f68ab3mr5657477ejb.15.1679589415489; Thu, 23 Mar 2023 09:36:55 -0700 (PDT) MIME-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> <20230323040037.2389095-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Thu, 23 Mar 2023 09:36:19 -0700 Message-ID: Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock To: Shakeel Butt Cc: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton , Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: jsarnhqpbi3xmegw7hzhw48kgr6c4sk5 X-Rspamd-Queue-Id: 3804A4000B X-HE-Tag: 1679589416-814637 X-HE-Meta: U2FsdGVkX1/uKpDHRAim3axmXwHFzSO1qJC40PoYbD5SqSNUD4S6E8GlSC/fyXSyWuYw8VAvYXnqAE67Dr8k/PcNgRTH/z0s6wW4xAtMBGP5RFMPpdu4H5ErdewnOZvua/M8ImlL//9GG4QAEntPOWiS0j/W00uMpds/X3Aats3mk3KSzFN/ceoAgAh920kuiUrEqz65etxhA38AiMYV6dNaj9xOPRws3zNuM+9lswWBpss1o8pcLLR7AXmOrsDIGsEpL0Zt0rVkt7mSUsjI7exd6vyGdfs5R7evTppdoj//NBcgXnmytTWIUGBUZukbQQRHvbis29K+rVZscSRgcjJ19BhY1jhyuW088Cmsgmrl+qFpPV2OH9SjIhfMwjQAvcz4SqDSRvwtIPOzLusnh1qhQHYaiIGhkYARjr6IWb9EyR/mDlZAAqFDT8Pp9IpS9VhGFBpuII0AYEAslsjeT/mTZ9TT8bik4PI4bwwae+jCnTxaODDtFfGP5P4AzKnDDCD1vfbjpHUmSHmexBh5YfnzEts/xlZNIOo45pn7bwfdlFHPFSn9H8ANLnTmYtQKXAOq25+0619EEGfRNDgEguShW2qF+wcAyszlQvNgbzrN3sPboFDzcSY6+8bAEDMprFw3iQnjjA7n4VmyzyowlMTikAac2IStdyMA7ZKQFj9Kl5TBHrX8rCpn3pTLCcDsGh9de3xWTudUViqtpKTvPkhJ7Jnxh/GryDiSwwjeHEKg/E1+X2JlmuPgarIMLVqlqHRQrILkE5K5hF/fUkkuEOJPr1RxKMFAD5iYhfaRbFFR6Egn2p4u+Pk6m+QMVsHRB6oLl0a3M5LKleKwabg3qLmwptORasMUb8FwDrCcb9UAoXt87GdlHXV6t5Jsjegw2hIFCsh6OkGSbkICdgdWJydvw7ZK0EEbPuF9wR0Qg28IMH+hJow9Mie877Ia+3D7JLdfpNXJ3DsC0vPAjpd YsuukD+r a1kqqtLl18j8TYIafxQtOfA2pCpyb9BAwL71IXQHXd7W8ZfyBYhiXycjO9S2nFXRtvrHtkAaWGtycAKkI5RSrLKiX8WnSkVhGDNKzWI/RAgqSUvWp7t7ArW9RZcW+dMKqNEjiXz96peISxr8C/UhT7xVCxctwufGbOS7wY+HVbeTalM7AHWAtjXFpIj+8JlRDXgUB1RmTvv2hvxIYzEcCqBy8lf84kMHtVuWEEfG/RfNxxsEJIkotbtSKawOLGtSlGHJRMhsuhZA1krw= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Mar 23, 2023 at 9:29=E2=80=AFAM Shakeel Butt = wrote: > > On Thu, Mar 23, 2023 at 9:18=E2=80=AFAM Yosry Ahmed wrote: > > > > On Thu, Mar 23, 2023 at 9:10=E2=80=AFAM Shakeel Butt wrote: > > > > > > On Thu, Mar 23, 2023 at 8:46=E2=80=AFAM Shakeel Butt wrote: > > > > > > > > On Thu, Mar 23, 2023 at 8:43=E2=80=AFAM Yosry Ahmed wrote: > > > > > > > > > > On Thu, Mar 23, 2023 at 8:40=E2=80=AFAM Shakeel Butt wrote: > > > > > > > > > > > > On Thu, Mar 23, 2023 at 6:36=E2=80=AFAM Yosry Ahmed wrote: > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > 2. Are we really calling rstat flush in irq context? > > > > > > > > > > > > > > > > > > I think it is possible through the charge/uncharge path: > > > > > > > > > memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_= usage(). I > > > > > > > > > added the protection against flushing in an interrupt con= text for > > > > > > > > > future callers as well, as it may cause a deadlock if we = don't disable > > > > > > > > > interrupts when acquiring cgroup_rstat_lock. > > > > > > > > > > > > > > > > > > > 3. The mem_cgroup_flush_stats() call in mem_cgroup_usag= e() is only > > > > > > > > > > done for root memcg. Why is mem_cgroup_threshold() inte= rested in root > > > > > > > > > > memcg usage? Why not ignore root memcg in mem_cgroup_th= reshold() ? > > > > > > > > > > > > > > > > > > I am not sure, but the code looks like event notification= s may be set > > > > > > > > > up on root memcg, which is why we need to check threshold= s. > > > > > > > > > > > > > > > > This is something we should deprecate as root memcg's usage= is ill defined. > > > > > > > > > > > > > > Right, but I think this would be orthogonal to this patch ser= ies. > > > > > > > > > > > > > > > > > > > I don't think we can make cgroup_rstat_lock a non-irq-disabling= lock > > > > > > without either breaking a link between mem_cgroup_threshold and > > > > > > cgroup_rstat_lock or make mem_cgroup_threshold work without dis= abling > > > > > > irqs. > > > > > > > > > > > > So, this patch can not be applied before either of those two ta= sks are > > > > > > done (and we may find more such scenarios). > > > > > > > > > > > > > > > Could you elaborate why? > > > > > > > > > > My understanding is that with an in_task() check to make sure we = only > > > > > acquire cgroup_rstat_lock from non-irq context it should be fine = to > > > > > acquire cgroup_rstat_lock without disabling interrupts. > > > > > > > > From mem_cgroup_threshold() code path, cgroup_rstat_lock will be ta= ken > > > > with irq disabled while other code paths will take cgroup_rstat_loc= k > > > > with irq enabled. This is a potential deadlock hazard unless > > > > cgroup_rstat_lock is always taken with irq disabled. > > > > > > Oh you are making sure it is not taken in the irq context through > > > should_skip_flush(). Hmm seems like a hack. Normally it is recommende= d > > > to actually remove all such users instead of silently > > > ignoring/bypassing the functionality. > > > > It is a workaround, we simply accept to read stale stats in irq > > context instead of the expensive flush operation. > > > > > > > > So, how about removing mem_cgroup_flush_stats() from > > > mem_cgroup_usage(). It will break the known chain which is taking > > > cgroup_rstat_lock with irq disabled and you can add > > > WARN_ON_ONCE(!in_task()). > > > > This changes the behavior in a more obvious way because: > > 1. The memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usage() > > path is also exercised in a lot of paths outside irq context, this > > will change the behavior for any event thresholds on the root memcg. > > With proposed skipped flushing in irq context we only change the > > behavior in a small subset of cases. > > > > I think we can skip flushing in irq context for now, and separately > > deprecate threshold events for the root memcg. When that is done we > > can come back and remove should_skip_flush() and add a VM_BUG_ON or > > WARN_ON_ONCE instead. WDYT? > > > > 2. mem_cgroup_usage() is also used when reading usage from userspace. > > This should be an easy workaround though. > > This is a cgroup v1 behavior and to me it is totally reasonable to get > the 2 second stale root's usage. Even if you want to skip flushing in > irq, do that in the memcg code and keep VM_BUG_ON/WARN_ON_ONCE in the > rstat core code. This way we will know if other subsystems are doing > the same or not. We can do that. Basically in mem_cgroup_usage() have: /* Some useful comment */ if (in_task()) mem_cgroup_flush_stats(); and in cgroup_rstat_flush() have: WARN_ON_ONCE(!in_task()); I am assuming VM_BUG_ON is not used outside mm code. The only thing that worries me is that if there is another unlikely path somewhere that flushes stats in irq context we may run into a deadlock. I am a little bit nervous about not skipping flushing if !in_task() in cgroup_rstat_flush().