From: Shakeel Butt
Date: Tue, 2 Mar 2021 19:59:22 -0800
Subject: Re: possible deadlock in sk_clone_lock
To: Mike Kravetz
Cc: Michal Hocko, syzbot, Andrew Morton, LKML, Linux MM, syzkaller-bugs,
 Eric Dumazet, Mina Almasry
In-Reply-To: <06edda9a-dce9-accd-11a3-97f6d5243ed1@oracle.com>
References: <7b7c4f41-b72e-840f-278a-320b9d97f887@oracle.com>
 <122e8c5d-60b8-52d9-c6a1-00cd61b2e1b6@oracle.com>
 <06edda9a-dce9-accd-11a3-97f6d5243ed1@oracle.com>

On Tue, Mar 2, 2021 at 1:19 PM Mike Kravetz wrote:
>
> On 3/2/21 6:29 AM, Michal Hocko wrote:
> > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko wrote:
> >>>
> >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> >>>> On 3/1/21 9:23 AM, Michal Hocko wrote:
> >>>>> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >>>>>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko wrote:
> >>>>> [...]
> >>>>>>> Then how come this can ever be a problem? in_task() should exclude soft
> >>>>>>> irq context unless I am mistaken.
> >>>>>>>
> >>>>>>
> >>>>>> If I take the following example of syzbot's deadlock scenario, then
> >>>>>> CPU1 is the one freeing the hugetlb pages. It is in process
> >>>>>> context but has disabled softirqs (see __tcp_close()).
> >>>>>>
> >>>>>>        CPU0                    CPU1
> >>>>>>        ----                    ----
> >>>>>>   lock(hugetlb_lock);
> >>>>>>                                local_irq_disable();
> >>>>>>                                lock(slock-AF_INET);
> >>>>>>                                lock(hugetlb_lock);
> >>>>>>   <Interrupt>
> >>>>>>     lock(slock-AF_INET);
> >>>>>>
> > [...]
> >>> Wouldn't something like this help? It is quite ugly but it would be
> >>> simple enough and backportable while we come up with a more rigorous
> >>> solution. What do you think?
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> >>>  void free_huge_page(struct page *page)
> >>>  {
> >>>         /*
> >>> -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> >>> +        * Defer freeing if in non-task context or when put_page is called
> >>> +        * with IRQs disabled (e.g. via the TCP slock dependency chain) to
> >>> +        * avoid hugetlb_lock deadlock.
> >>>          */
> >>> -       if (!in_task()) {
> >>> +       if (!in_task() || irqs_disabled()) {
> >>
> >> Does irqs_disabled() also check softirqs?
> >
> > Nope, it doesn't AFAICS. I was referring to the above lockdep splat,
> > which claims disabled irqs to be the trigger. But now that you mention
> > it, it would be better to replace in_task() along the way. We have
> > discussed that in another email thread, and I was suggesting to use
> > in_atomic(), which should also catch the bh-disabled situation. The
> > big IF is that this needs the preempt count to be enabled
> > unconditionally. There are changes in the RCU tree heading in that
> > direction.
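To make the bh-disabled case concrete: if I read include/linux/preempt.h
right, local_bh_disable() adds SOFTIRQ_DISABLE_OFFSET to preempt_count,
so in_atomic() (which is just preempt_count() != 0) is true in the
__tcp_close() path, while irqs_disabled() only inspects the arch IRQ
flag and stays false there. A minimal sketch of the difference;
context_check_demo() is a made-up helper for illustration, not anything
in tree:

#include <linux/bottom_half.h>
#include <linux/bug.h>
#include <linux/irqflags.h>
#include <linux/preempt.h>

static void context_check_demo(void)
{
	local_bh_disable();	/* as the TCP close path does around slock */

	/*
	 * Hard interrupts are still enabled here, so the proposed
	 * irqs_disabled() check would not defer the freeing...
	 */
	WARN_ON(irqs_disabled());

	/*
	 * ...but local_bh_disable() raised preempt_count, so in_atomic()
	 * does see this softirq-unsafe section.
	 */
	WARN_ON(!in_atomic());

	local_bh_enable();
}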
>
> I have not been following developments in preemption and the RCU tree.
> The comment for in_atomic() says:
>
> /*
>  * Are we running in atomic context? WARNING: this macro cannot
>  * always detect atomic context; in particular, it cannot know about
>  * held spinlocks in non-preemptible kernels. Thus it should not be
>  * used in the general case to determine whether sleeping is possible.
>  * Do not use in_atomic() in driver code.
>  */
>
> That does seem to be the case. I verified in_atomic() can detect softirq
> context even in non-preemptible kernels. But, as the comment says, it
> will not detect a held spinlock in non-preemptible kernels. So I think
> in_atomic() would be better than the current check for !in_task(). That
> would handle this syzbot issue, but we could still have issues if the
> hugetlb put_page path is called while someone is holding a spinlock with
> all interrupts enabled. It looks like there is no way to detect this
> today in non-preemptible kernels. in_atomic() does detect held spinlocks
> in preemptible kernels.
>
> I might suggest changing !in_task() to in_atomic() for now, and then
> working on a more robust solution. I'm afraid such a robust solution
> will require considerable effort. It would need to handle put_page
> being called in any context: hardirq, softirq, spinlock held ... The
> put_page/free_huge_page path would need to offload (to a workqueue or
> something else) any processing that can possibly sleep.
>
> Is it worth making the in_atomic() change now, or should we just start
> working on the more robust, complete solution?

IMHO the change to in_atomic() is beneficial because it will at least fix
this specific issue. There is no reason to keep users of TCP TX zerocopy
with hugetlb pages broken while a more comprehensive solution is worked
out.
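For concreteness, the interim change I have in mind is just the sketch
below, applied on top of the existing deferral path in mm/hugetlb.c
(untested, and with Michal's caveat that spinlock-held sections are only
visible to in_atomic() when CONFIG_PREEMPT_COUNT is enabled):

void free_huge_page(struct page *page)
{
	/*
	 * Defer freeing whenever we are in atomic context. Hardirq,
	 * softirq and bh-disabled sections (e.g. the TCP slock path)
	 * all raise preempt_count, so in_atomic() covers everything
	 * the !in_task() check did, and more.
	 */
	if (in_atomic()) {
		/*
		 * Only schedule the worker when the llist was previously
		 * empty; otherwise a work item is already pending.
		 */
		if (llist_add((struct llist_node *)&page->mapping,
			      &hpage_freelist))
			schedule_work(&free_hpage_work);
		return;
	}

	__free_huge_page(page);
}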