From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Thu, 4 Apr 2024 17:15:37 +0800
Subject: Re: [PATCHv2 1/1] mm: fix unproperly folio_put by changing API in read_pages
To: David Hildenbrand
Cc: "zhaoyang.huang", Andrew Morton, NeilBrown, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, steve.kang@unisoc.com, Matthew Wilcox,
    Christoph Hellwig
In-Reply-To: <134b6c6d-5bcf-4394-85ad-2da2e8dec1d3@redhat.com>
References: <20240401081734.1433755-1-zhaoyang.huang@unisoc.com>
 <736b982a-57c9-441a-812c-87cdee2e096e@redhat.com>
 <2f8af9d1-8c8e-4e1c-a794-76f6bb287b08@redhat.com>
 <134b6c6d-5bcf-4394-85ad-2da2e8dec1d3@redhat.com>
Content-Type: text/plain; charset="UTF-8"
On Wed, Apr 3, 2024 at 7:47 PM David Hildenbrand wrote:
>
> On 03.04.24 13:08, Zhaoyang Huang wrote:
> > On Wed, Apr 3, 2024 at 4:01 PM David Hildenbrand wrote:
> >>
> >> On 03.04.24 07:50, Zhaoyang Huang wrote:
> >>> On Tue, Apr 2, 2024 at 8:58 PM David Hildenbrand wrote:
> >>>>
> >>>> On 01.04.24 10:17, zhaoyang.huang wrote:
> >>>>> From: Zhaoyang Huang
> >>>>>
> >>>>> A VM_BUG_ON in step 9 of [1] could happen because the refcnt is
> >>>>> dropped improperly during read_pages()->readahead_folio->folio_put.
> >>>>> This was introduced by commit 9fd472af84ab ("mm: improve cleanup
> >>>>> when ->readpages doesn't process all pages").
> >>>>>
> >>>>> Key steps of [1] in brief:
> >>>>> 2'. Thread_truncate gets the folio into its local fbatch via
> >>>>>     find_get_entry in step 2
> >>>>> 7'. The last remaining refcnt is not the expected one from
> >>>>>     alloc_pages but the one from thread_truncate's local fbatch
> >>>>>     in step 7
> >>>>> 8'. Thread_reclaim succeeds in isolating the folio because of the
> >>>>>     wrong refcnt (wrong in meaning, not in value) in step 8
> >>>>> 9'. Thread_truncate hits the VM_BUG_ON in step 9
> >>>>>
> >>>>> [1]
> >>>>> Thread_readahead:
> >>>>> 0. folio = filemap_alloc_folio(gfp_mask, 0);
> >>>>> (refcount 1: alloc_pages)
> >>>>> 1. ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> >>>>> (refcount 2: alloc_pages, page_cache)
> >>
> >> [not going into all details, just a high-level remark]
> >>
> >> page_cache_ra_unbounded() does a filemap_invalidate_lock_shared(),
> >> which is a down_read_trylock(&mapping->invalidate_lock).
> >>
> >> That is, all read_pages() calls in mm/readahead.c happen under
> >> mapping->invalidate_lock in read mode.
> >>
> >> ... and ...
> >>
> >>>>>
> >>>>> Thread_truncate:
> >>>>> 2. folio = find_get_entries(&fbatch_truncate);
> >>>>> (refcount 3: alloc_pages, page_cache, fbatch_truncate)
> >>
> >> Truncation, such as truncate_inode_pages(), must be called under
> >> mapping->invalidate_lock held in write mode. So naive me would have
> >> thought that readahead and truncate cannot race in that way.
> >>
> >> [...]
> >>
> > Thanks for the reminder. But I can't find the spot where
> > mapping->invalidate_lock is held when checking the callstack of
> > 'kill_bdev()->truncate_inode_pages()->truncate_inode_pages_range()',
> > or is the lock held further up the call chain?
>
> Well, truncate_inode_pages() documents:
>
> "Called under (and serialised by) inode->i_rwsem and
> mapping->invalidate_lock."
>
> If that's not the case then that's either (a) a BUG or (b) an
> undocumented exception to the rule, whereby other mechanisms are in
> place to prevent any further pagecache magic.
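Just to make sure I follow the rule you are describing, this is how I
read the intended pairing (a simplified sketch only, not the actual
mm/readahead.c or mm/truncate.c code; I am assuming the
filemap_invalidate_lock()/filemap_invalidate_unlock() write-side helpers
are the counterparts of the shared ones you mention):

/* Simplified sketch of the locking rule under discussion; not real code. */

/* Readahead side: mapping->invalidate_lock taken in read (shared) mode. */
static void readahead_side(struct readahead_control *ractl)
{
	struct address_space *mapping = ractl->mapping;

	filemap_invalidate_lock_shared(mapping);	/* read mode */
	read_pages(ractl);	/* readahead_folio()/folio_put() happen in here */
	filemap_invalidate_unlock_shared(mapping);
}

/* Truncation side: documented to run with invalidate_lock in write mode. */
static void truncate_side(struct address_space *mapping)
{
	filemap_invalidate_lock(mapping);	/* write mode, excludes readahead */
	truncate_inode_pages(mapping, 0);
	filemap_invalidate_unlock(mapping);
}

With that pairing the race in the commit message should indeed be
impossible, unless some truncate caller skips the write side.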
>
> I mean, kill_bdev() documents "Kill _all_ buffers and pagecache, dirty
> or not..", so *something* has to be in place to guarantee that there
> cannot be something concurrently filling the pagecache again, otherwise
> kill_bdev() could not possibly do something reasonable.
>
> For example, blkdev_flush_mapping() is called when bd_openers goes to 0,
> and my best guess is that nobody should be able to make use of that
> device at that point.
>
> Similarly, changing the blocksize sounds like something that wouldn't be
> done at arbitrary points in time ...
>
> So kill_bdev() already has a "special" smell to it and I suspect (b)
> applies, where concurrent pagecache action is not really any concern.
>
> But I'm not an expert and I looked at most of that code right now for
> the first time ...

Thanks for your help. I don't know whether this is an out-of-date
documentation issue. Regardless of the truncation path, there could be an
isolation path that enters this race: the refcnt held by the local folio
batch replaces the one from alloc_pages, which takes the page out of the
protection of folio_lock, so reclaim can race with the subsequent
isolation steps. The symptom could then be anything, not just the
VM_BUG_ON.

>
> >>
> >>>>
> >>>> Something that would help here is an actual reproducer that
> >>>> triggers this issue.
> >>>>
> >>>> To me, it's unclear at this point if we are talking about an actual
> >>>> issue or a theoretical issue?
> >>> Thanks for the feedback. The above callstack is a theoretical issue
> >>> so far, which arose from an ongoing analysis of a practical livelock
> >>> issue generated by folio_try_get_rcu that is related to an abnormal
> >>> folio refcnt state. So do you think this callstack makes sense?
> >>
> >> I'm not an expert on that code, and only spent 5 min looking into the
> >> code. So my reasoning about invalidate_lock above might be completely
> >> wrong.
> >>
> >> It would be a very rare race that was not reported so far in practice.
> >> And it certainly wouldn't be the easiest one to explain, because the
> >> call chain above is a bit elaborate and does not explain which locks
> >> are involved and how they fail to protect us from any such race.
> >>
> >> For this case in particular, I think we really need a real reproducer
> >> to convince people that the actual issue does exist and the fix
> >> actually resolves the issue.
> > Sorry, it is theoretical so far, according to my understanding.
>
> Okay, if you find a reproducer, please share it and we can investigate
> if it's a locking problem or something else. As of now, I'm not
> convinced that there is an actual issue that needs fixing.

I am hoping somebody can help confirm whether this is indeed a BUG
according to the procedure in the commit message.

>
> --
> Cheers,
>
> David / dhildenb
>
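P.S. for anybody willing to double-check: the interleaving from the
commit message, condensed into one annotated snippet (simplified, not
real code; only the steps quoted above are shown, the intermediate steps
of [1] are omitted here just as they are in the quote):

/* Condensed from the scenario in the commit message; not real code. */

/* Thread_readahead */
folio = filemap_alloc_folio(gfp_mask, 0);	/* 0. refcount 1: alloc_pages       */
ret = filemap_add_folio(mapping, folio,
			index + i, gfp_mask);	/* 1. refcount 2: + page_cache      */

/* Thread_truncate */
folio = find_get_entries(&fbatch_truncate);	/* 2. refcount 3: + fbatch_truncate */

/*
 * ... intermediate steps of [1] omitted, as in the quote above ...
 *
 * 7'. the last remaining reference is the one held by thread_truncate's
 *     local fbatch, not the one from alloc_pages
 * 8'. thread_reclaim can therefore isolate the folio
 * 9'. thread_truncate hits the VM_BUG_ON
 */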