From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2266C004D4 for ; Thu, 19 Jan 2023 21:02:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 476AB6B0071; Thu, 19 Jan 2023 16:02:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 426A06B0073; Thu, 19 Jan 2023 16:02:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2C7046B0074; Thu, 19 Jan 2023 16:02:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 1AF376B0071 for ; Thu, 19 Jan 2023 16:02:00 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id DC560C0DA0 for ; Thu, 19 Jan 2023 21:01:59 +0000 (UTC) X-FDA: 80372770758.25.052911D Received: from mail-wm1-f45.google.com (mail-wm1-f45.google.com [209.85.128.45]) by imf14.hostedemail.com (Postfix) with ESMTP id 046D0100014 for ; Thu, 19 Jan 2023 21:01:56 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=aH17Bd8b; spf=pass (imf14.hostedemail.com: domain of surenb@google.com designates 209.85.128.45 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1674162117; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=r/D5AFaWAhCR0gGYjljSteRmQs5Q9XOUZKCf6TA7Htc=; b=UM4p3KS/xeNVSjtOHB/Z9ZYTYKR1fANpUQ50iCnAcAOC6y7Mt+7iroBf9zgEkIzCJu1wkd 357hA36gbp1Vk3e5iixOOHtwJ8+wJfxr4jwdiBSOHpnhgMgG8NePmy5odl+fD4fB0jL6DG zskXE5u1l4X3ZKckSOUNz5cF3IYN1XA= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=aH17Bd8b; spf=pass (imf14.hostedemail.com: domain of surenb@google.com designates 209.85.128.45 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1674162117; a=rsa-sha256; cv=none; b=yVxKxxw97qs9CSlPxjopFgpEO7Kp9lFxKL8UyU6EQJTJs1KIUEM6qmjc6SGRiOk1Kq893Q 12/BmeR/ecarCMwKd5Tb/MtzTAVxWXfUQnl38/+s9Cf9QP+VX7DFa3ET9yn8M7Fq6scpWx FLi6TujBmK0Ty56QHwYy0VMr8R8x1Ps= Received: by mail-wm1-f45.google.com with SMTP id k16so2519118wms.2 for ; Thu, 19 Jan 2023 13:01:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=r/D5AFaWAhCR0gGYjljSteRmQs5Q9XOUZKCf6TA7Htc=; b=aH17Bd8bL7rFIslBv9Dgred+WzvkeFtC0MYX6E4BUgAyNx7g4pGkE8Oa4r22RN/DE6 EDHKLNTs6t7hdcXXs4pg/9v3Zkpsb8OELYsZyd2+IgmQCQBz1FeunTMQsbRAjv2D9PnF OekoImj0rppOW6G9NUtR+e5GdrV06hzZnffn2sRT4xjRdTY1s63hFlBQ5v9nbPXg6vmo diYQhcvUjoKY6e0HZsRDyFNnHHKPr/CJbjM1u3uNBjvzTaUcOTSmvPa3z/kZYpnC7q0j excR0qFH0HyxJBjpAzQZTPQiCoP/yxo7z5ZiJkqtBceWiMwVibpg362WHhYgCIzJtNH0 xp+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=r/D5AFaWAhCR0gGYjljSteRmQs5Q9XOUZKCf6TA7Htc=; b=zdimvGS8mX7YtB1nyqsAsWv9O7Y8WbJygsTEP40u4Aa0+YvX0CkU9/lX5jGBQxymU1 QuAgy2Qy3HtTxfsx9ztDtQAe+GJ0w4f/y26H76dm369xcbPNOkzJigeXlBEluYMOFUlT BPb5KxL2ufdEkKsAwJ7n72qz7O9DzkkbsSeDx/4GiecQZuOd8cESdUMooxhy6/1exYfV yFbFRmCQMZX4ALPgTUKeJcaaQSEpZAHV7dTQMHVbnsfLYWb/dF79aClf+CKUhggls8Sc ZtxJp7WxGGez9W7hBpDiklDeP9vTd9XyYMnag4bsPBA0n+AWX211Q+kB7KZpNiA0WIPA QjiA== X-Gm-Message-State: AFqh2krQZNBVu6IcjKLpK0tR3sE5K4hYl0lU6r2iASzncXJixa1mO3UM ogivrCNKXTnaDCQnEJR+GFDtHJ7uBoyeJ/qB7IzuGQ== X-Google-Smtp-Source: AMrXdXuu+5oCXCZTe3GScRYJj9HvrrlyBhSpeEBW9O7MNtGBeWkn0umByLsXQZHm2KSZi2VagZgEmsnC9im4jvqRhWk= X-Received: by 2002:a05:600c:3412:b0:3d0:a619:c445 with SMTP id y18-20020a05600c341200b003d0a619c445mr518591wmp.17.1674162115239; Thu, 19 Jan 2023 13:01:55 -0800 (PST) MIME-Version: 1.0 References: <20230113022555.2467724-1-kamatam@amazon.com> In-Reply-To: From: Suren Baghdasaryan Date: Thu, 19 Jan 2023 13:01:42 -0800 Message-ID: Subject: Re: another use-after-free in ep_remove_wait_queue() To: Munehisa Kamata , Tejun Heo Cc: ebiggers@kernel.org, hannes@cmpxchg.org, hdanton@sina.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mengcc@amazon.com Content-Type: text/plain; charset="UTF-8" X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 046D0100014 X-Rspam-User: X-Stat-Signature: dkpyxb8ipicijhmnfubxssbcb89u4eor X-HE-Tag: 1674162116-825866 X-HE-Meta: U2FsdGVkX18sxqXZaP0j9Ean2EFy8ZdT3+ux3fqmflfiCD6V+MQ/m6T6XI17vb3IBG9hy/OyDdw7RnmZ3EfGt6g7oJV+pL0JfReVTzIdRoKm3isZWrPrFBW1wFYHVOlXeJnL4spX3HteZ2/09X8057O9wM9DfAH1zlc9/wZxleXhqtpvvOSBIb3vuZKs/TY6qotjxeYr0wmX0aRSkexruaVKWJwRicpf9BHNeMiGBOzomQUlPmhgn26RNtbpTAeULl8uJVUl4ptO1lxETcciUE3hLniV/NH3LOoXob9+BYJEyCtJeQ+2Ti5yuIWUbMzJ6XbcVjFi9PDORtPrAqTZPLNZ+2lxVC4H7yAyojMW1zb+auRRCVcVShrrS5n6YxexFh6DqOCSez5Kq+4+Q0dCYpYBoFKDLMlaNNK7H6oQ/CO13z6qhuLL0cTkezylK0cRR/V39QyMVv7epyMNq3XZQq2W0HT1szF2XQCcPz5EGzNoxmJzcJRM97wX4nyJ8ET7qRGr0ZYT3bZxvYKedkYBSvmZyGLXCnTUXLmPd84XRFk6iida7EzV8Y7vAORUHDtiMGjGbhQGkb5G2WF3CEbjDmh8SbrOa9jzG8X4JSAJl+JTJaeNHG1FdwP8EyKmV8XWnHCaX0Yb0+D+5oFmLnG9C1rJ5zcpQTIcGbhGUhmIEirMpqSXPNFM7N1cJinkNLx2yarLzvq2wcqZw196CCtDX8KoVu97/eNZh0OeK8K7z+5zrN0z3Vzv110VWRwWzrZZgUnDVtFuui1lP6tBUVI2SLPBdU1n9I8ydaOvN+tBtRpdnX8vuMkXOp+/Aryv7fB33ZKpOWWwHTCRC5GHkXMjoASnv9DHeCONmP4QLdirPYbQPre/UG+n2IbySYLaqAwFni6E2+OJV0vcT7bX+Rz3R37hejOrgRWywXTdD5EiI5gvWDiNT+tFi0oq/z8pnQCtl1+0L1bJLSKSISgcPve Grvt2fxV VTQXe6C8Wuo9rF0/fvEslWOVdb4MPhBEjufOgcWlwiyOqpGPywulEb+jJbWRHu6f8STcXkJtQx7/Gg3bJAUTtAhkKNp8Ugw7WrbKct5euzkWYeJ45UBrz2V8Uj1PYo8jXWw6B1SXjjJT5JxOz52huCaOhZvs7mu79yV1y3Xspml/KJ/MvZpmQ5PYuvz17dKooR2GKKkCplgaz3/cQt0Aw2l/2M31lBOOFlD+G4Sc/UaNcEn3P9dwMLHjckHNa2dEKmrkfGU2QV47n1+3fk8tMR2k/L2s/rinpEu6DgTDv7rfWL232j2CRH2YmuLnKkr8bJydOmoYFdCbHFP893iafA8PrlFLUyPJ7aNg7/AYkDw3+hk2sOya3l4gAFo2KBSP/JEm2HcJROPnaMu0FFkwiMZnkXFdrnDnu1uf2RVIk5MmCQFGM9GHiQUSSMcpZ4g9jh2yRaYalclH4+Io7mJRUh04g3KtfKQBG6BqdIVmy72Nhmk4v7xaMtGG6/hnyF0CbBGJbmHyLOm1oM8DrQI55TV+PlSm2URQFmOGB X-Bogosity: Ham, tests=bogofilter, spamicity=0.000029, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Jan 18, 2023 at 7:06 PM Suren Baghdasaryan wrote: > > On Fri, Jan 13, 2023 at 9:52 AM Suren Baghdasaryan wrote: > > > > On Thu, Jan 12, 2023 at 6:26 PM Munehisa Kamata wrote: > > > > > > On Thu, 2023-01-12 22:01:24 +0000, Suren Baghdasaryan wrote: > > > > > > > > On Mon, Jan 9, 2023 at 7:06 PM Suren Baghdasaryan wrote: > > > > > > > > > > On Mon, Jan 9, 2023 at 5:33 PM Suren Baghdasaryan wrote: > > > > > > > > > > > > On Sun, Jan 8, 2023 at 3:49 PM Hillf Danton wrote: > > > > > > > > > > > > > > On 8 Jan 2023 14:25:48 -0800 PM Munehisa Kamata wrote: > > > > > > > > > > > > > > > > That patch survived the repro in my original post, however, the waker > > > > > > > > (rmdir) was getting stuck until a file descriptor of the epoll instance or > > > > > > > > the pressure file got closed. So, if the following modified repro runs > > > > > > > > with the patch, the waker never returns (unless the sleeper gets killed) > > > > > > > > while holding cgroup_mutex. This doesn't seem to be what you expected to > > > > > > > > see with the patch, does it? Even wake_up_all() does not appear to empty > > > > > > > > the queue, but wake_up_pollfree() does. > > > > > > > > > > > > > > Thanks for your testing. And the debugging completes. > > > > > > > > > > > > > > Mind sending a patch with wake_up_pollfree() folded? > > > > > > > > > > > > I finally had some time to look into this issue. I don't think > > > > > > delaying destruction in psi_trigger_destroy() because there are still > > > > > > users of the trigger as Hillf suggested is a good way to go. Before > > > > > > [1] correct trigger destruction was handled using a > > > > > > psi_trigger.refcount. For some reason I thought it's not needed > > > > > > anymore when we placed one-trigger-per-file restriction in that patch, > > > > > > so I removed it. Obviously that was a wrong move, so I think the > > > > > > cleanest way would be to bring back the refcounting. That way the last > > > > > > user of the trigger (either psi_trigger_poll() or psi_fop_release()) > > > > > > will free the trigger. > > > > > > I'll check once more to make sure I did not miss anything and if there > > > > > > are no objections, will post a fix. > > > > > > > > > > Uh, I recalled now why refcounting was not helpful here. I'm making > > > > > the same mistake of thinking that poll_wait() blocks until the call to > > > > > wake_up() which is not the case. Let me think if there is anything > > > > > better than wake_up_pollfree() for this case. > > > > > > > > Hi Munehisa, > > > > Sorry for the delay. I was trying to reproduce the issue but even > > > > after adding a delay before ep_remove_wait_queue() it did not happen. > > > > > > Hi Suren, > > > > > > Thank you for your help here. > > > > > > Just in case, do you have KASAN enabled in your config? If not, this may > > > just silently corrupt a certain memory location and not immediately > > > followed by obvious messages or noticeable event like oops. > > > > Yes, KASAN was enabled in my build. > > > > > > > > > One thing about wake_up_pollfree() solution that does not seem right > > > > to me is this comment at > > > > https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L253: > > > > > > > > `In the very rare cases where a ->poll() implementation uses a > > > > waitqueue whose lifetime is tied to a task rather than to the 'struct > > > > file' being polled, this function must be called before the waitqueue > > > > is freed...` > > > > > > > > In our case we free the waitqueue from cgroup_pressure_release(), > > > > which is the handler for `release` operation on cgroup psi files. The > > > > other place calling psi_trigger_destroy() is psi_fop_release(), which > > > > is also tied to the lifetime to the psi files. Therefore the lifetime > > > > of the trigger's waitqueue is tied to the lifetime of the files and > > > > IIUC, we should not be required to use wake_up_pollfree(). > > > > Could you please post your .config file? I might be missing some > > > > configuration which prevents the issue from happening on my side. > > > > > > Sure, here is my config. > > > > > > https://gist.github.com/kamatam9/a078bdd9f695e7a0767b061c60e48d50 > > > > > > I confirmed that it's reliably reproducible with v6.2-rc3 as shown below. > > > > > > https://gist.github.com/kamatam9/096a79cf59d8ed8785c4267e917b8675 > > > > Thanks! I'll try to figure out the difference. > > Sorry for the slow progress on this issue. I'm multiplexing between > several tasks ATM but I did not forget about this one. > Even though I still can't get the kasan UAF report, I clearly see the > wrong order when tracing these functions and forcing the delay before > ep_remove_wait_queue(). I don't think that should be happening, so > something in the release process I think needs fixing. Will update > once I figure out the root cause, hopefully sometime this week. Hi Folks, I spent some more time digging into the details and this is what's happening. When we call rmdir to delete the cgroup with the pressure file being epoll'ed, roughly the following call chain happens in the context of the shell process: do_rmdir cgroup_rmdir kernfs_drain_open_files cgroup_file_release cgroup_pressure_release psi_trigger_destroy Later on in the context of our reproducer, the last fput() is called causing wait queue removal: fput ep_eventpoll_release ep_free ep_remove_wait_queue remove_wait_queue By this time psi_trigger_destroy() already destroyed the trigger's waitqueue head and we hit UAF. I think the conceptual problem here (or maybe that's by design?) is that cgroup_file_release() is not really tied to the file's real lifetime (when the last fput() is issued). Otherwise fput() would call eventpoll_release() before f_op->release() and the order would be fine (we would remove the wait queue first in eventpoll_release() and then f_op->release() would cause trigger's destruction). Considering these findings, I think we can use the wake_up_pollfree() without contradicting the comment at https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L253 because indeed, cgroup_file_release() and therefore psi_trigger_destroy() are not tied to the file's lifetime. I'm CC'ing Tejun to check if this makes sense to him and cgroup_file_release() is working as expected in this case. Munehisha, if Tejun confirms this is all valid, could you please post a patch replacing wake_up_interruptible() with wake_up_pollfree()? We don't need to worry about wake_up_all() because we have a limitation of one trigger per file descriptor: https://elixir.bootlin.com/linux/latest/source/kernel/sched/psi.c#L1419, so there can be only one waiter. Thanks, Suren. > > > > Suren. > > > > > > > > > > > Regards, > > > Munehisa > > > > > > > > > > Thanks, > > > > Suren. > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/20220111232309.1786347-1-surenb@google.com/ > > > > > > > > > > > > Thanks, > > > > > > Suren. > > > > > > > > > > > > > > > > > > > > Hillf > > > > > > > >