From: Suren Baghdasaryan
Date: Tue, 14 Jul 2020 10:32:20 -0700
Subject: Re: possible deadlock in shmem_fallocate (4)
To: Todd Kjos
Cc: Michal Hocko, Hridya Valsaraju, Hillf Danton, Eric Biggers, syzbot,
    Andrew Morton, Arve Hjønnevåg, Christian Brauner,
    "open list:ANDROID DRIVERS", Greg Kroah-Hartman, Hugh Dickins,
    "Joel Fernandes (Google)", LKML, Linux-MM, Martijn Coenen,
    syzkaller-bugs, Todd Kjos, Markus Elfring
References: <0000000000000b5f9d059aa2037f@google.com>
    <20200714033252.8748-1-hdanton@sina.com>
    <20200714053205.15240-1-hdanton@sina.com>
    <20200714140859.15156-1-hdanton@sina.com>
    <20200714141815.GP24642@dhcp22.suse.cz>

On Tue, Jul 14, 2020 at 9:41 AM Suren Baghdasaryan wrote:
>
> On Tue, Jul 14, 2020 at 8:47 AM Todd Kjos wrote:
> >
> > +Suren Baghdasaryan +Hridya Valsaraju who support the ashmem driver.
>
> Thanks for looping me in.
>
> >
> >
> > On Tue, Jul 14, 2020 at 7:18 AM Michal Hocko wrote:
> > >
> > > On Tue 14-07-20 22:08:59, Hillf Danton wrote:
> > > >
> > > > On Tue, 14 Jul 2020 10:26:29 +0200 Michal Hocko wrote:
> > > > > On Tue 14-07-20 13:32:05, Hillf Danton wrote:
> > > > > >
> > > > > > On Mon, 13 Jul 2020 20:41:11 -0700 Eric Biggers wrote:
> > > > > > > On Tue, Jul 14, 2020 at 11:32:52AM +0800, Hillf Danton wrote:
> > > > > > > >
> > > > > > > > Add FALLOC_FL_NOBLOCK and on the shmem side try to lock inode upon the
> > > > > > > > new flag. And the overall upside is to keep the current gfp either in
> > > > > > > > the khugepaged context or not.
> > > > > > > >
> > > > > > > > --- a/include/uapi/linux/falloc.h
> > > > > > > > +++ b/include/uapi/linux/falloc.h
> > > > > > > > @@ -77,4 +77,6 @@
> > > > > > > >   */
> > > > > > > >  #define FALLOC_FL_UNSHARE_RANGE 0x40
> > > > > > > >
> > > > > > > > +#define FALLOC_FL_NOBLOCK 0x80
> > > > > > > > +
> > > > > > >
> > > > > > > You can't add a new UAPI flag to fix a kernel-internal problem like this.
> > > > > >
> > > > > > Sounds fair, see below.
> > > > > >
> > > > > > What the report indicates is a missing PF_MEMALLOC_NOFS and it's
> > > > > > checked on the ashmem side and added as an exception before going
> > > > > > to filesystem. On shmem side, no more than a best effort is paid
> > > > > > on the intended exception.
> > > > > >
> > > > > > --- a/drivers/staging/android/ashmem.c
> > > > > > +++ b/drivers/staging/android/ashmem.c
> > > > > > @@ -437,6 +437,7 @@ static unsigned long
> > > > > >  ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >  {
> > > > > >         unsigned long freed = 0;
> > > > > > +       bool nofs;
> > > > > >
> > > > > >         /* We might recurse into filesystem code, so bail out if necessary */
> > > > > >         if (!(sc->gfp_mask & __GFP_FS))
> > > > > > @@ -445,6 +446,11 @@ ashmem_shrink_scan(struct shrinker *shri
> > > > > >         if (!mutex_trylock(&ashmem_mutex))
> > > > > >                 return -1;
> > > > > >
> > > > > > +       /* enter filesystem with caution: nonblock on locking */
> > > > > > +       nofs = current->flags & PF_MEMALLOC_NOFS;
> > > > > > +       if (!nofs)
> > > > > > +               current->flags |= PF_MEMALLOC_NOFS;
> > > > > > +
> > > > > >         while (!list_empty(&ashmem_lru_list)) {
> > > > > >                 struct ashmem_range *range =
> > > > > >                         list_first_entry(&ashmem_lru_list, typeof(*range), lru);
> > > > >
> > > > > I do not think this is an appropriate fix. First of all is this a real
> > > > > deadlock or a lockdep false positive? Is it possible that ashmem just
> > > >
> > > > The warning matters and we can do something to quiesce it.
> > >
> > > The underlying issue should be fixed rather than _something_ done to
> > > silence it.
> > >
> > > > > needs to properly annotate its shmem inodes? Or is it possible that
> > > > > the internal backing shmem file is visible to the userspace so the write
> > > > > path would be possible?
> > > > >
> > > > > If this is a real problem then the proper fix would be to set internal
> > > > > shmem mapping's gfp_mask to drop __GFP_FS.
> > > >
> > > > Thanks for the tip, see below.
> > > >
> > > > Can you expand a bit on how it helps direct reclaimers like khugepaged
> > > > in the syzbot report wrt deadlock?
> > >
> > > I do not understand your question.
> > >
> > > > TBH I have a difficult time following
> > > > up after staring at the chart below for quite a while.
> > >
> > > Yes, lockdep reports are quite hard to follow and they tend to confuse
> > > one hell out of me. But this one says that there is a reclaim dependency
> > > between the shmem inode lock and the reclaim context.
> > >
> > > > Possible unsafe locking scenario:
> > > >
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(fs_reclaim);
> > > >                                lock(&sb->s_type->i_mutex_key#15);
> > > >                                lock(fs_reclaim);
> > > >
> > > >   lock(&sb->s_type->i_mutex_key#15);
> > >
> > > Please refrain from proposing fixes until the actual problem is
> > > understood. I suspect that this might be just false positive because the
> > > lockdep cannot tell the backing shmem which is internal to ashmem(?)
> > > with any general shmem.

Actually, looking some more into this, I think you are right. Ashmem
currently does not redirect writes into the backing shmem, and the
fallocate call from ashmem_shrink_scan is always performed against
asma->file, which is the backing shmem. IOW, writes into the backing
shmem are not supported, so this concurrent locking can't happen.

I'm not sure how we can annotate the fact that the inode_lock in
generic_file_write_iter and the one in shmem_fallocate always operate
on different inodes. Ideas?

> > > But somebody really familiar with ashmem code
> > > should have a look I believe.
>
> I believe the deadlock is possible if a write to an ashmem fd coincides
> with shrinking of ashmem caches. I just developed a possible fix here
> https://android-review.googlesource.com/c/kernel/common/+/1361205 but
> wanted to test it before posting upstream. The idea is to detect such
> a race between write and cache shrinking operations and let
> ashmem_shrink_scan bail out if the race is detected instead of taking
> inode_lock. AFAIK writing ashmem files is not a usual usage for ashmem
> (standard usage is to mmap it and use it as shared memory), therefore
> this bailing out early should not affect ashmem cache maintenance
> much. Besides, ashmem_shrink_scan already bails out early if
> contention on ashmem_mutex is detected, which is a much more probable
> case (see: https://elixir.bootlin.com/linux/v5.8-rc4/source/drivers/staging/android/ashmem.c#L497).
>
> I'll test and post the patch here in a day or so if there are no early
> objections to it.
> Thanks!
>
> > >
> > > --
> > > Michal Hocko
> > > SUSE Labs
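
For readers following the "bail out instead of blocking" idea quoted
above, here is a minimal user-space sketch of the pattern. It is not
the change under review and not kernel code: struct sim_ashmem,
sim_init(), sim_shrink_scan() and SIM_SHRINK_STOP are hypothetical
names, pthread mutexes stand in for ashmem_mutex and the backing
inode lock, and a plain trylock is only an assumed stand-in for
whatever race detection the linked change actually uses.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the shrinker's "could not make progress" value. */
#define SIM_SHRINK_STOP (-1L)

/* Hypothetical model: two locks and a count of reclaimable pages. */
struct sim_ashmem {
	pthread_mutex_t list_lock;   /* plays the role of ashmem_mutex */
	pthread_mutex_t inode_lock;  /* plays the role of the backing inode lock */
	long lru_pages;              /* pages sitting on the simulated LRU list */
};

static void sim_init(struct sim_ashmem *a, long pages)
{
	assert(pages >= 0);
	pthread_mutex_init(&a->list_lock, NULL);
	pthread_mutex_init(&a->inode_lock, NULL);
	a->lru_pages = pages;
}

/*
 * Simulated shrinker scan: never sleeps on a lock. If either lock is
 * already held (for example by a writer), give up and let a later
 * reclaim pass retry.
 */
static long sim_shrink_scan(struct sim_ashmem *a)
{
	long freed;

	/* Same idea as the existing mutex_trylock(&ashmem_mutex) bail-out. */
	if (pthread_mutex_trylock(&a->list_lock) != 0)
		return SIM_SHRINK_STOP;

	/* Assumed extension: also refuse to wait for the inode lock. */
	if (pthread_mutex_trylock(&a->inode_lock) != 0) {
		pthread_mutex_unlock(&a->list_lock);
		return SIM_SHRINK_STOP;
	}

	freed = a->lru_pages;
	a->lru_pages = 0;

	pthread_mutex_unlock(&a->inode_lock);
	pthread_mutex_unlock(&a->list_lock);
	return freed;
}
```

In this model a caller that holds inode_lock while sim_shrink_scan()
runs sees SIM_SHRINK_STOP and an untouched lru_pages; once the lock is
released, the next scan frees everything. The trade-off matches the
reasoning quoted above: reclaim may occasionally skip ashmem, but it
never waits on a lock that a writer could be holding while it
allocates memory.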