From: Joel Savitz
Date: Fri, 14 Jan 2022 09:39:55 -0500
Subject: Re: [PATCH] mm/oom_kill: wake futex waiters before annihilating victim shared mutex
To: Michal Hocko
Cc: Andrew Morton, linux-kernel, Waiman Long, linux-mm@kvack.org, Nico Pache, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso, André Almeida
References: <20211207214902.772614-1-jsavitz@redhat.com> <20211207154759.3f3fe272349c77e0c4aca36f@linux-foundation.org>

> What has happened to the oom victim and why it has never exited?

What appears to happen is that the oom victim is sent SIGKILL by the process that triggers the oom, while also being marked as an oom victim.

As you mention in your patchset introducing the oom reaper in commit aac4536355496 ("mm, oom: introduce oom reaper"), the purpose of the oom reaper is to free memory more quickly than the exiting task would on its own, on the assumption that its anonymous or swapped-out pages won't be needed in the exit path because the owner is already dying. However, the futex_cleanup() path violates this assumption: exit_robust_list() calls fetch_robust_entry(), which needs to read the victim's userspace memory.

trace_printk()s along this failure path reveal an apparent race between the oom reaper thread reaping the victim's mm and the futex_cleanup() path. This race may manifest in other ways too, but this is the one we have been able to trace most consistently.

Since the core assumption of the oom reaper is violated when the oom victim uses robust futexes, we propose to solve this problem by having wake_oom_reaper() either cancel or delay waking the oom reaper thread when tsk->robust_list (or, on compat, tsk->compat_robust_list) is non-NULL.

For example,
the bug does not reproduce with this patch (from npache@redhat.com):

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 989f35a2bbb1..b8c518fdcf4d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -665,6 +665,19 @@ static void wake_oom_reaper(struct task_struct *tsk)
 	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
 		return;
 
+#ifdef CONFIG_FUTEX
+	/*
+	 * don't wake the oom_reaper thread if we still have a robust list to handle
+	 * This will then rely on the sigkill to handle the cleanup of memory
+	 */
+	if(tsk->robust_list)
+		return;
+#ifdef CONFIG_COMPAT
+	if(tsk->compat_robust_list)
+		return;
+#endif
+#endif
+
 	get_task_struct(tsk);
 
 	spin_lock(&oom_reaper_lock);

Best,
Joel Savitz