From: Shakeel Butt
Date: Mon, 1 Mar 2021 07:10:11 -0800
Subject: Re: possible deadlock in sk_clone_lock
To: Michal Hocko
Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM, syzkaller-bugs, Eric Dumazet, Mina Almasry
References: <000000000000f1c03b05bc43aadc@google.com> <7b7c4f41-b72e-840f-278a-320b9d97f887@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain;
 charset="UTF-8"

On Mon, Mar 1, 2021 at 4:12 AM Michal Hocko wrote:
>
> On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> > On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz wrote:
> > >
> > > Cc: Michal
> > >
> > > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > > > wrote:
> > > >
> > > >> other info that might help us debug this:
> > > >>
> > > >>  Possible interrupt unsafe locking scenario:
> > > >>
> > > >>        CPU0                    CPU1
> > > >>        ----                    ----
> > > >>   lock(hugetlb_lock);
> > > >>                                local_irq_disable();
> > > >>                                lock(slock-AF_INET);
> > > >>                                lock(hugetlb_lock);
> > > >>   <Interrupt>
> > > >>     lock(slock-AF_INET);
> > > >>
> > > >>  *** DEADLOCK ***
> > > >
> > > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > > is a reproducer as well.
> > > >
> > > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > > > wonder if we just need to make hugetlb_lock softirq-safe.
> > > >
> > > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> > >
> > > Thanks Shakeel,
> > >
> > > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> > > context") attempted to address this issue. It uses a work queue to
> > > acquire hugetlb_lock if the caller is !in_task().
> > >
> > > In another recent thread, there was the suggestion to change the
> > > !in_task to in_atomic.
> > >
> > > I need to do some research on the subtle differences between in_task,
> > > in_atomic, etc. TBH, I 'thought' !in_task would prevent the issue
> > > reported here. But, that obviously is not the case.
> >
> > I think the freeing is happening in the process context in this report
> > but it is creating the lock chain from softirq-safe slock to
> > irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> > always defer the freeing of hugetlb pages to a work queue or (2) make
> > hugetlb_lock softirq-safe.
>
> There is __do_softirq so this should be in the soft IRQ context no?
> Is this really reproducible with kernels which have c77c0a8ac4c5
> applied?

Yes, this is softirq context, and syzbot has reproduced this on
linux-next 20210224.

>
> Btw. making hugetlb lock irq safe has been already discussed and it
> seems to be much harder than expected as some heavy operations are done
> under the lock. This is really bad.

What about making it just softirq-safe, i.e. spin_[un]lock_bh()? Will it
still be that bad?

> Postponing the whole freeing
> operation into a worker context is certainly possible but I would
> consider it rather unfortunate. We would have to add some sync mechanism
> to wait for hugetlb pages in flight to prevent from external
> observability to the userspace. E.g. when shrinking the pool.

I think in practice recycling of hugetlb pages is a rare event, so we
might get away without the sync mechanism. How about we start postponing
the freeing without a sync mechanism and add it later if there are any
user reports complaining?

> --
> Michal Hocko
> SUSE Labs
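
To make the trade-off above concrete, here is a rough single-threaded
userspace model of option (1), always deferring the free, with and
without the sync step Michal describes. It is only a sketch under stated
assumptions: every name in it (model_pool, model_free_page,
model_flush_deferred, model_shrink_pool) is hypothetical and not a kernel
API, there is no real locking or work queue, and the counter free_pages
merely stands in for what userspace would observe about the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_page {
	struct model_page *next;
};

struct model_pool {
	size_t free_pages;            /* what userspace would observe */
	struct model_page *deferred;  /* pages queued for the worker */
	size_t in_flight;             /* number of queued pages */
};

/* Option (1): the free path never touches the pool, it only queues. */
static void model_free_page(struct model_pool *pool, struct model_page *page)
{
	page->next = pool->deferred;
	pool->deferred = page;
	pool->in_flight++;
}

/* Worker body; also serves as the sync point that waits for in-flight pages. */
static void model_flush_deferred(struct model_pool *pool)
{
	while (pool->deferred) {
		struct model_page *page = pool->deferred;

		pool->deferred = page->next;
		page->next = NULL;
		pool->in_flight--;
		pool->free_pages++;   /* stands in for work done under hugetlb_lock */
	}
}

/* Shrink the pool to target free pages; returns how many were released. */
static size_t model_shrink_pool(struct model_pool *pool, size_t target, bool sync)
{
	size_t released = 0;

	if (sync)
		model_flush_deferred(pool);
	while (pool->free_pages > target) {
		pool->free_pages--;
		released++;
	}
	return released;
}
```

In this model, shrinking right after two deferred frees releases nothing
unless sync is true, which is the window of external observability that
the sync mechanism would close. Whether that window matters in practice
is the open question in this thread.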