From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B93AC48286 for ; Thu, 1 Feb 2024 21:02:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1392A6B007B; Thu, 1 Feb 2024 16:02:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0E9FC6B007D; Thu, 1 Feb 2024 16:02:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EF3076B007E; Thu, 1 Feb 2024 16:02:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id E00B66B007B for ; Thu, 1 Feb 2024 16:02:20 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 5988A1A0936 for ; Thu, 1 Feb 2024 21:02:20 +0000 (UTC) X-FDA: 81744458040.21.33968EC Received: from mail-ot1-f45.google.com (mail-ot1-f45.google.com [209.85.210.45]) by imf02.hostedemail.com (Postfix) with ESMTP id 9197380020 for ; Thu, 1 Feb 2024 21:02:17 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Obxz428s; spf=pass (imf02.hostedemail.com: domain of tjmercier@google.com designates 209.85.210.45 as permitted sender) smtp.mailfrom=tjmercier@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706821337; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jOZ2TseNRwk51MKJZKHvasS/mSGbsWbhCUKsCqe4rtM=; b=PRGp99vIfiOOj527Zuep5BSmhbEQv/siMy6Yv0Y2gZrWRHkLzRWMqQavnMbLP3wiLgPT8b /6J2uHbqJO3HHk4NU9867KVZztnQliQP9pgCHOwvpOxj2g6RJQpTaRxSboAr5mde/CE9VL WbAS9d3GNpWgDr/lktAuuTDmQZp2JKE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706821337; a=rsa-sha256; cv=none; b=th2QIpCXbhc4R6FPdAShuOSLbh1c2HOWzuwSZddoy5fgV72jTIxHBBFQ/U7kxO76rzhhlW huPFM/FRZnU+OEm3dUEtzI9GrCRN3+kzL9pbxLZVyIfJccgp9KNGQcus1IIgHYO7tYFYEF IBf2DdPDLzgwRu0Ce+FFdD9grXZWtcs= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Obxz428s; spf=pass (imf02.hostedemail.com: domain of tjmercier@google.com designates 209.85.210.45 as permitted sender) smtp.mailfrom=tjmercier@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ot1-f45.google.com with SMTP id 46e09a7af769-6e118da997cso729891a34.3 for ; Thu, 01 Feb 2024 13:02:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706821336; x=1707426136; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=jOZ2TseNRwk51MKJZKHvasS/mSGbsWbhCUKsCqe4rtM=; b=Obxz428s0uMr/QQdvmIvkEtnbjI0SVnwZemUGXbjsW3pKvbPxJBGMl6pBoxO04rziq F0rlph4eqIW83t50AHaXbFA0yF2vyMFN1ubBpz/W40zLMhVgvrU6UwMLxm32mlB8j5dl mB2mtdpmdmIbBfFqoM0kIUR+l0DS+V2h0a6NYMpZ4gNXGQPdZE5msUvm1TxvDtyvq7Z9 nnVegRnAQCBKU8uu57rahMhez9oEzr9DLg6EvnnNm7/EUCLXNVK5Rt6AR3XMbBU//C5R OJivDwQ8ctIvBejO3+jAXgaDvBxQhjsvRNT7cpH3sY0q4yqWo7oismxHYroH+pgF7QeV SHXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706821336; x=1707426136; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jOZ2TseNRwk51MKJZKHvasS/mSGbsWbhCUKsCqe4rtM=; b=pJDd65yyy0FiTAxUrYP1uidb9CUR6xiQGqFT7sDx5fE+ODb4OGfZ80rNgOZQX/4YzO wlYkzZ6X6mKIxw3RobuBwWqiFwkxXmZ+Z9s/XuxwtHcG770XXvMPdE/uInpS2e9pYKBO PG2GdHL1BNy5PFuaIQ5YOUSULMQ4VLp9GmXhIBUNkAXFpsai3SFZtAC1UzpC1feZnqss BNTaTmV1LPTqrnCNeEzyo4yp+plAhooNwwQJnTF8FsiEJ+76flXP9dxpAVnwWZp/waxq mqHV6OuqIni8uWORQYcBDOgrJ5ylV2C+LRtiVv+dBV7DLvMFV1PfJP9F6lSzXxh477kx mtfw== X-Gm-Message-State: AOJu0YwlDVvt0itjMv4spgk+bT9VQUt5CHKUWQQkQXa/w8LaV3RYgnRt uRMBuIAlFdUK0g1CoA9YctZDgly+Rk9YEgpFwRSYvmW/dBchi71Lsu/37TNdSftEqYQkGB+DFG+ Pow6LfdhD0oVM7gvS4KJ7V9UV6mXweu5HeEIO X-Google-Smtp-Source: AGHT+IHCUKaSHtSi9u6hTQ7MDouhvCivlBO94D4qN5GxJ/UQSh2zmLA9mHcjhvIF66SjdJ8igcZQLuiLZksx2q8EZm0= X-Received: by 2002:a05:6830:100d:b0:6e1:34d7:27b1 with SMTP id a13-20020a056830100d00b006e134d727b1mr6035791otp.8.1706821336502; Thu, 01 Feb 2024 13:02:16 -0800 (PST) MIME-Version: 1.0 References: <20240126211927.1171338-1-tjmercier@google.com> In-Reply-To: From: "T.J. Mercier" Date: Thu, 1 Feb 2024 13:02:04 -0800 Message-ID: Subject: Re: [PATCH v2] mm: memcg: Don't periodically flush stats when memcg is disabled To: =?UTF-8?Q?Michal_Koutn=C3=BD?= Cc: Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton , android-mm@google.com, Minchan Kim , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 9197380020 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: ynqzmkrxmhxbmitz9aowo8zz76ifrgzg X-HE-Tag: 1706821337-551107 X-HE-Meta: U2FsdGVkX1/ThVmOZ+p0KfHvm2MbpsnyitwK2OxiKyjGfyO7d9xJwPIbuLSwTKONJhH/Dlt7puIWTOgJVqRXmGt0t8ga5jmR3XwQvmT24FojWGhvcFGhsUwg6+8H8fjTrKl6bISm1yUiDEQHKRcDqmNB8ScUYDdtOhA1c/pCAaxyzM7MlccfrsHtI71s+wiNF/yjpI1/GC5tYTNs/bPnliNR0nAlpgBmiPsUnjUuPgpJl03FwqDTGgQT0jruUNoA5viOV32diE5VlxFj5hI/CKZyVWroQ62YHU7PNgnNUjfkSXuuQcutr7oYGvf6su+QjEMz0i1n2FVkufVo8//UjzpB77o7ACJEfwNsYKuBfUN1g/ARWg3luz+hEMvLevXx5hFgwfX36n55hYVtHeEy9q7Y3wdhqP6nbFzGBQ7TH0js2FHi6/fA53+qHki6Olpe5dL4FZI3Ze+hhdR3x4jqmT0KMAxpXY7FJrDoNqdaVZgOsKhgivfULm79H1yMoBe0zlRz5miOj8DAnZwF+47Iv9DzjhP38w7qpfgdFK52pHtGP0cQOuOCiTYTQzEqxK7fpqdZEVDUuNhCX0pDB6WCWJb7LpNh/WFkSunikNU/63EvEoSWSa4upSev0yhmBUg0IuwCUlDFgUZSYsWUIe1SMBSlOue4w4em+1Fs/TpRX299z66dYFl+AmkcUa+Xr8wSNTcMqF4/d7wpkDhqR9Lw8s3807iaKQ0WA+Gr0sdbP61j5HfeSjcP5hZJaTU6JLb1uENCZ8ZFm7RAX1fb5o7cTYGVmqB4LIiN+aQ/R57MItfKf7cnBU4+Y4HnTE6P795vPQSrevQYDR2H5G2pu5ByUY5DjYy2caewCN9L3VIiCte4l0ilGeeJHD6z55/S+dHujTpcgOmnpcLFitvN5T/SMOBz/3c/WL34q9QgqEupDWqrHxaXVQ6Zr40eiHV+xk0w7YUhWf2yqgeVvhOd1zg y2A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Feb 1, 2024 at 6:26=E2=80=AFAM Michal Koutn=C3=BD wrote: > > On Fri, Jan 26, 2024 at 09:19:25PM +0000, "T.J. Mercier" wrote: > > The root memcg is onlined even when memcg is disabled. When it's online= d > > a 2 second periodic stat flush is started, but no stat flushing is > > required when memcg is disabled because there can be no child memcgs. > > Most calls to flush memcg stats are avoided when memcg is disabled as a > > result of the mem_cgroup_disabled check added in 7d7ef0a4686a > > ("mm: memcg: restore subtree stats flushing"), but the periodic flushin= g > > started in mem_cgroup_css_online is not. Skip it. > > Have you tried > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -6099,6 +6099,9 @@ int __init cgroup_init(void) > cgroup_unlock(); > > for_each_subsys(ss, ssid) { > + if (!cgroup_ssid_enabled(ssid)) > + continue; > + > if (ss->early_init) { > struct cgroup_subsys_state *css =3D > init_css_set.subsys[ss->id]; > @@ -6118,9 +6121,6 @@ int __init cgroup_init(void) > * disabled flag and cftype registration needs kmalloc, > * both of which aren't available during early_init. > */ > - if (!cgroup_ssid_enabled(ssid)) > - continue; > - > if (cgroup1_ssid_disabled(ssid)) > pr_info("Disabling %s control group subsystem in = v1 mounts\n", > ss->legacy_name); > ? > I'm asking about a try because I'm not sure whether this does not blow > up due to something missing. But I think disabled controllers would not > need to be (root-)onlined at all. > > Thanks, > Michal Hi Michal, It does blow up, but not how I was expecting. There's a null pointer dereference inside find_css_set when trying to get a css pointer for the memory controller, I think because the allocation in cgroup_init_subsys is skipped: [ 9.591766] BUG: kernel NULL pointer dereference, address: 0000000000000= 000 [ 9.591909] #PF: supervisor read access in kernel mode [ 9.592023] #PF: error_code(0x0000) - not-present page [ 9.592138] PGD 0 P4D 0 [ 9.592199] Oops: 0000 [#1] PREEMPT SMP PTI [ 9.592289] CPU: 1 PID: 1 Comm: init Tainted: G E 6.7.0-mainline-maybe-dirty #1 [ 9.592490] Hardware name: emulation qemu-x86/qemu-x86, BIOS 2023.07-rc6-gb8315a3184b2-ab11347395 07/01/2023 [ 9.592731] RIP: 0010:find_css_set+0x5c3/0x7a0 [ 9.592850] Code: 20 69 cf b2 4e 8b 3c 33 48 c7 c7 1d 3b 41 b2 4c 89 fe e8 10 b0 f7 00 49 8b b4 24 a0 00 00 00 4e 8d 24 2b 49 81 c4 b0 fc ff ff <49> 8b 0f 4c 01 e9 48 c7 c7 c4 c0 46 b2 4c 89 e2 e8 e8 af f7 00 49 [ 9.593251] RSP: 0018:ffffb6218000bb90 EFLAGS: 00010087 [ 9.593370] RAX: 0000000000000021 RBX: ffff99181044a200 RCX: 4f5e789f089= a0c00 [ 9.593554] RDX: ffffb6218000ba50 RSI: ffffffffb2448284 RDI: ffffffffb2c= 91950 [ 9.593735] RBP: ffffb6218000bc28 R08: 0000000000000fff R09: ffffffffb2c= 79950 [ 9.593909] R10: 0000000000002ffd R11: 0000000000000004 R12: ffff9918104= 4a2d8 [ 9.594102] R13: 0000000000000428 R14: 0000000000000020 R15: 00000000000= 00000 [ 9.594291] FS: 00007f3d2f986fc8(0000) GS:ffff99182bd00000(0000) knlGS:0000000000000000 [ 9.594467] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.594610] CR2: 0000000000000000 CR3: 00000001001d8001 CR4: 00000000003= 70eb0 [ 9.594818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000= 00000 [ 9.595007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 00000000000= 00400 [ 9.595178] Call Trace: [ 9.595237] [ 9.595297] ? __die_body+0x5e/0xb0 [ 9.595392] ? __die+0x9e/0xb0 [ 9.595481] ? page_fault_oops+0x35f/0x3d0 [ 9.595574] ? do_user_addr_fault+0x6ab/0x780 [ 9.595690] ? prb_read_valid+0x28/0x60 [ 9.595782] ? exc_page_fault+0x83/0x220 [ 9.595874] ? asm_exc_page_fault+0x2b/0x30 [ 9.595966] ? find_css_set+0x5c3/0x7a0 [ 9.596059] cgroup_migrate_prepare_dst+0x75/0x2b0 [ 9.596194] cgroup_attach_task+0x293/0x450 [ 9.596305] ? cgroup_attach_task+0xb6/0x450 [ 9.596449] __cgroup_procs_write+0xef/0x1a0 [ 9.596589] cgroup_procs_write+0x16/0x30 [ 9.596733] cgroup_file_write+0x9d/0x260 [ 9.596840] kernfs_fop_write_iter+0x145/0x1e0 [ 9.596981] vfs_write+0x276/0x2e0 [ 9.597092] ksys_write+0x73/0xe0 [ 9.597198] __x64_sys_write+0x1a/0x30 [ 9.597303] do_syscall_64+0x5a/0x100 [ 9.597430] entry_SYSCALL_64_after_hwframe+0x6e/0x76 But there are also calls to mem_cgroup_css_from_folio that I think would cause a different null pointer deref even if we had the css but no root_mem_cgroup. There seems to be an assumption that we'll have a memcg to charge against even when the controller is disabled. To me it looks like that's to simplify the possible combinations of CONFIG_MEMCG and memcg being boot-time disabled or not. Best, T.J.