From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CADDC433F5 for ; Sat, 1 Oct 2022 12:38:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 24C898D0002; Sat, 1 Oct 2022 08:38:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1D59C8D0001; Sat, 1 Oct 2022 08:38:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 078158D0002; Sat, 1 Oct 2022 08:38:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id E5BB98D0001 for ; Sat, 1 Oct 2022 08:38:47 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id B2D16804A1 for ; Sat, 1 Oct 2022 12:38:47 +0000 (UTC) X-FDA: 79972334694.23.D2814C2 Received: from mail-lf1-f48.google.com (mail-lf1-f48.google.com [209.85.167.48]) by imf27.hostedemail.com (Postfix) with ESMTP id 237BB40018 for ; Sat, 1 Oct 2022 12:38:46 +0000 (UTC) Received: by mail-lf1-f48.google.com with SMTP id q14so3870475lfo.11 for ; Sat, 01 Oct 2022 05:38:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:from:to:cc:subject:date; bh=B3MMnLV5KzQSBoCIoVn465sXrjYLB6L1MBWhvVFk+Wo=; b=iTppsY5fg3g/uw+5/L4JVR/cOGxBqqmX6QFi/dlcjF4Gj/uCqY6Fw87YT2wQ5L4yC4 4wswXeBu3WYaaXdu7BUHd3izGnicy6GvHqIo3ZJZDJdxko7kG2cnLcWs2wcQSsDP+6xG k+/ucgrdMwxpa3sl0Ut3iS/Wasft+r6tW32ROxsSGZz6JwoE6XPQ9zkpFSi7T3Fk0DFS mBkR5u6UjERYtCo1Coin33UzSslT0C0m1bfxNEZT+1TVa//OA07wMZuHB7o5IS3B6vNV pN3PnGGpuObZhC3W1seXHh3SYaQmLrSUKb3izDeF5ga9W7F+k/fYQ/qHINs94Jv/zdXg BBhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date; bh=B3MMnLV5KzQSBoCIoVn465sXrjYLB6L1MBWhvVFk+Wo=; b=7RtdWSRNBgknaHI4pzbNIALtx+4ZYvZsVXJxX38bhaUMk8WnPRXhysSoSSbx+XwNTw 1caHRlr8OjEDct/IqPWd6/cSwUMIZTRd3gggk0AGXokoB6A7LEKb2LroR4vCRoBFoVAF BncQ6GqIKulwvtgTPMGBg7P1BKZSeZceUHwgJjwWc8bh42ZzI0d3sce5GR74qG0kcBmy uMW65OVnmPQhLhYT4oLBsMXzCP9d966RUBOt0PKQk3cIvfNAnSuIEk0errdwkTDcc7O9 FPqZPcg78BJHkxxfsTPJVvGWPCtcqgYPPubVXDxXnIJqoxp2NlfqUqofhevWT9RebSYh Qezw== X-Gm-Message-State: ACrzQf0NGpMM05YxuKtPqhrs5Q30uJkObxwCCXZVnMGcsCIC06OA8u3y 6xINYctFQNOW3MPnxA93tsQ= X-Google-Smtp-Source: AMsMyM5SlQ6HMYbCBD3iKsdFCda7LA49aSJrDxTgEh66h7PLt9fF0FC6c+cgy39dtbAY0tBtf6r1kA== X-Received: by 2002:a05:6512:b01:b0:4a1:baae:327d with SMTP id w1-20020a0565120b0100b004a1baae327dmr4571907lfu.668.1664627925261; Sat, 01 Oct 2022 05:38:45 -0700 (PDT) Received: from ?IPV6:2a02:2168:a11:244b::1? ([2a02:2168:a11:244b::1]) by smtp.gmail.com with ESMTPSA id bi9-20020a0565120e8900b00497aa190523sm750952lfb.248.2022.10.01.05.38.44 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 01 Oct 2022 05:38:44 -0700 (PDT) Message-ID: Date: Sat, 1 Oct 2022 15:38:43 +0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.3.1 Subject: Re: Possible race in obj_stock_flush_required() vs drain_obj_stock() To: Roman Gushchin Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Vladimir Davydov , Muchun Song , Sebastian Andrzej Siewior , cgroups@vger.kernel.org, linux-mm@kvack.org References: <1664546131660.1777662787.1655319815@gmail.com> From: Alexander Fedorov In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664627927; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=B3MMnLV5KzQSBoCIoVn465sXrjYLB6L1MBWhvVFk+Wo=; b=RPGIjTLvYqVeU+OaB3AyQo1UL4arfVOGICMQvUoJYcWTsxXq/uyVzLeXnaLQ60tGiNXVTL aV3BZjljgmZbhI2grAK4e/TBCJIhkdcPgKiZuTQmTSmX5CPcJHBMa9Mm9dZqQhL2CRZBMC DooqQ3qywl0jy0JeGpu7TZM0pC86Y3A= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=iTppsY5f; spf=pass (imf27.hostedemail.com: domain of halcien@gmail.com designates 209.85.167.48 as permitted sender) smtp.mailfrom=halcien@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664627927; a=rsa-sha256; cv=none; b=7Ms/e9P/2lhRm3Yz2XpDRUsKYPZBACWH8gQLq8+7ytehp2JkhMurOn6NohF1pY2bDAiIuB FnyjLmvJexT3Zq6NfNqprdlcNSwCG176kDFu+ZfQzGSFqqgm5Ncb675nuFsS6F4rGkYkOo xkDzmCbc9LNQrQ00SyUWWMv8yKHjPq4= X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 237BB40018 Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=iTppsY5f; spf=pass (imf27.hostedemail.com: domain of halcien@gmail.com designates 209.85.167.48 as permitted sender) smtp.mailfrom=halcien@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Stat-Signature: a7o4gs1u551mqucocfrjngma9kijs8zk X-HE-Tag: 1664627926-253091 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 30.09.2022 21:26, Roman Gushchin wrote: > On Fri, Sep 30, 2022 at 02:06:48PM +0000, Alexander Fedorov wrote: >> 1) First CPU: >> css_killed_work_fn() -> mem_cgroup_css_offline() -> >> drain_all_stock() -> obj_stock_flush_required() >> if (stock->cached_objcg) { >> >> This check sees a non-NULL pointer for *another* CPU's `memcg_stock` >> instance. >> >> 2) Second CPU: >> css_free_rwork_fn() -> __mem_cgroup_free() -> free_percpu() -> >> obj_cgroup_uncharge() -> drain_obj_stock() >> It frees `cached_objcg` pointer in its own `memcg_stock` instance: >> struct obj_cgroup *old = stock->cached_objcg; >> < ... > >> obj_cgroup_put(old); >> stock->cached_objcg = NULL; >> >> 3) First CPU continues after the 'if' check and re-reads the pointer >> again, now it is NULL and dereferencing it leads to kernel panic: >> static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, >> struct mem_cgroup *root_memcg) >> { >> < ... > >> if (stock->cached_objcg) { >> memcg = obj_cgroup_memcg(stock->cached_objcg); > > Great catch! > > I'm not sure about switching to rcu primitives though. In all other cases > stock->cached_objcg is accessed only from a local cpu, so using rcu_* > function is an overkill. > > How's something about this? (completely untested) Tested READ_ONCE() patch and it works. But are rcu primitives an overkill? For me they are documenting how actually complex is synchronization here. For example, only after converting to rcu I noticed this (5.10.131-rt72): static void drain_obj_stock(struct memcg_stock_pcp *stock) { struct obj_cgroup *old = stock->cached_objcg; < ... > obj_cgroup_put(old); stock->cached_objcg = NULL; } On kernel with enabled preemption this function can be preempted between obj_cgroup_put() -> kfree_rcu() call and `cached_objcg` assignment, and since scheduling marks the end of grace period then another task running drain_all_stock() could access freed memory. Since `cached_objcg` was not marked as synchronized variable this problem could not be seen just by reading drain_obj_stock(), but after adding rcu markings it is easier to notice (and fix with rcu_replace_pointer()). Checked in mainline, this use-after-free was fixed when fixing another problem: 5675114623872300aa9fcd72aef2b8b7f421fe12 "mm/memcg: protect memcg_stock with a local_lock_t" > Also, please add > Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API") Done, and reposted original patch because my email client malformed it (+ comment about use-after-free)... mm/memcg: fix race in obj_stock_flush_required() vs drain_obj_stock() When obj_stock_flush_required() is called from drain_all_stock() it reads the `memcg_stock->cached_objcg` field twice for another CPU's per-cpu variable, leading to TOCTTOU race: another CPU can simultaneously enter drain_obj_stock() and clear its own instance of `memcg_stock->cached_objcg`. Another problem is in drain_obj_stock() which sets `cached_objcg` to NULL after freeing which might lead to use-after-free. To fix it mark `cached_objcg` as RCU protected field and use rcu_*() accessors everywhere. Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Alexander Fedorov diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c1152f8747..2ab205f529 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2215,7 +2215,7 @@ struct memcg_stock_pcp { unsigned int nr_pages; #ifdef CONFIG_MEMCG_KMEM - struct obj_cgroup *cached_objcg; + struct obj_cgroup __rcu *cached_objcg; unsigned int nr_bytes; #endif @@ -3148,7 +3148,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) local_lock_irqsave(&memcg_stock.lock, flags); stock = this_cpu_ptr(&memcg_stock); - if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) { + if (objcg == rcu_access_pointer(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) { stock->nr_bytes -= nr_bytes; ret = true; } @@ -3160,7 +3160,8 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) static void drain_obj_stock(struct memcg_stock_pcp *stock) { - struct obj_cgroup *old = stock->cached_objcg; + struct obj_cgroup *old = rcu_replace_pointer(stock->cached_objcg, NULL, + lockdep_is_held(&stock->lock)); if (!old) return; @@ -3198,16 +3199,16 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock) } obj_cgroup_put(old); - stock->cached_objcg = NULL; } static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, struct mem_cgroup *root_memcg) { struct mem_cgroup *memcg; + struct obj_cgroup *cached_objcg = rcu_dereference(stock->cached_objcg); - if (stock->cached_objcg) { - memcg = obj_cgroup_memcg(stock->cached_objcg); + if (cached_objcg) { + memcg = obj_cgroup_memcg(cached_objcg); if (memcg && mem_cgroup_is_descendant(memcg, root_memcg)) return true; } @@ -3223,11 +3224,11 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) local_lock_irqsave(&memcg_stock.lock, flags); stock = this_cpu_ptr(&memcg_stock); - if (stock->cached_objcg != objcg) { /* reset if necessary */ + if (rcu_access_pointer(stock->cached_objcg) != objcg) { /* reset if necessary */ drain_obj_stock(stock); obj_cgroup_get(objcg); - stock->cached_objcg = objcg; stock->nr_bytes = atomic_xchg(&objcg->nr_charged_bytes, 0); + rcu_assign_pointer(stock->cached_objcg, objcg); } stock->nr_bytes += nr_bytes; @@ -7162,6 +7163,7 @@ static int __init mem_cgroup_init(void) stock = per_cpu_ptr(&memcg_stock, cpu); INIT_WORK(&stock->work, drain_local_stock); + RCU_INIT_POINTER(stock->cached_objcg, NULL); local_lock_init(&stock->lock); }