Date: Fri, 8 Apr 2022 10:15:49 +0200
From: Peter Zijlstra
To: Nico Pache
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Rafael Aquini,
 Waiman Long, Baoquan He, Christoph von Recklinghausen, Don Dutile,
 "Herton R . Krzesinski", David Rientjes, Michal Hocko, Andrea Arcangeli,
 Andrew Morton, Davidlohr Bueso, Thomas Gleixner, Ingo Molnar,
 Joel Savitz, Darren Hart, stable@kernel.org
Subject: Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing the robust_list_head
Message-ID: <20220408081549.GM2731@worktop.programming.kicks-ass.net>
References: <20220408032809.3696798-1-npache@redhat.com>
In-Reply-To: <20220408032809.3696798-1-npache@redhat.com>

On Thu, Apr 07, 2022 at 11:28:09PM -0400, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper.
> This mapping is used to store the futex
> robust list head; the kernel does not keep a copy of the robust list and
> instead references a userspace address to maintain the robustness during
> a process death. A race can occur between exit_mm and the oom reaper that
> allows the oom reaper to free the memory of the futex robust list before
> the exit path has handled the futex death:
>
>     CPU1                               CPU2
> ------------------------------------------------------------------------
>     page_fault
>     do_exit "signal"
>     wake_oom_reaper
>                                         oom_reaper
>                                         oom_reap_task_mm (invalidates mm)
>     exit_mm
>     exit_mm_release
>     futex_exit_release
>     futex_cleanup
>     exit_robust_list
>     get_user (EFAULT- can't access memory)
>
> If the get_user EFAULT's, the kernel will be unable to recover the
> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
>
> Use the robust_list address stored in the kernel to skip the VMA that holds
> it, allowing a successful futex_cleanup.
>
> Theoretically a failure can still occur if there are locks mapped as
> PRIVATE|ANON; however, the robust futexes are a best-effort approach.
> This patch only strengthens that best-effort.
>
> The following case can still fail:
> robust head (skipped) -> private lock (reaped) -> shared lock (skipped)

This is still all sorts of confused.. it's a list head, the entries can
be in any random other VMA. You must not remove *any* user memory before
doing the robust thing. Not removing the VMA that contains the head is
pointless in the extreme.

Did you not read the previous discussion?
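
For illustration only, here is a minimal userspace sketch of the point about
the list head. It is not kernel code and not part of the patch. The struct
names (rl_node, rl_head, demo_lock) and helpers are hypothetical; the first
two structs are local mirrors of the uapi robust-list layout, and the lock
layout is an assumption. The head and each lock get their own anonymous
private mapping, and the walk counts entries that fall outside the mapping
holding the head.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Local mirrors of the uapi robust-list layout; the names are hypothetical. */
struct rl_node {
	struct rl_node *next;
};

struct rl_head {
	struct rl_node list;		/* circular: last entry points back here */
	long futex_offset;		/* entry address + offset = futex word */
	struct rl_node *list_op_pending;
};

/* A lock as a userspace library might lay it out (assumed layout). */
struct demo_lock {
	struct rl_node node;
	uint32_t futex_word;
};

/* Create one private anonymous mapping, or return NULL on failure. */
static void *map_anon(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

static int in_range(const void *p, const void *start, size_t len)
{
	uintptr_t a = (uintptr_t)p, s = (uintptr_t)start;

	return a >= s && a - s < len;
}

/* Count list entries that do NOT live in the mapping holding the head. */
static int entries_outside_head_mapping(const struct rl_head *head,
					const void *head_map, size_t len)
{
	const struct rl_node *n;
	int outside = 0;

	assert(head != NULL);
	for (n = head->list.next; n != &head->list; n = n->next)
		if (!in_range(n, head_map, len))
			outside++;
	return outside;
}

/*
 * Build head -> lock A -> lock B -> head, each object in its own mapping.
 * Returns the number of entries outside the head's mapping, -1 on error.
 */
static int demo_entries_outside_head_mapping(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	struct rl_head *head = map_anon(len);
	struct demo_lock *a = map_anon(len);
	struct demo_lock *b = map_anon(len);
	int outside;

	if (!head || !a || !b)
		return -1;

	head->futex_offset = (long)(offsetof(struct demo_lock, futex_word) -
				    offsetof(struct demo_lock, node));
	head->list_op_pending = NULL;
	head->list.next = &a->node;
	a->node.next = &b->node;
	b->node.next = &head->list;

	outside = entries_outside_head_mapping(head, head, len);

	munmap(b, len);
	munmap(a, len);
	munmap(head, len);
	return outside;
}
```

Calling demo_entries_outside_head_mapping() from a main() shows that every
entry in this setup sits outside the head's mapping, so preserving only that
mapping would leave the walk dereferencing memory that could already be gone.
In a real process the kernel may merge adjacent anonymous mappings into one
VMA, and locks more typically live in heap, stack, or shared mappings; the
sketch only shows that the head's address does not constrain where the
entries are.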