Date: Fri, 14 Apr 2023 21:12:57 +0100
From: Matthew Wilcox
To: Tarun Sahu
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, muchun.song@linux.dev,
	mike.kravetz@oracle.com, aneesh.kumar@linux.ibm.com,
	sidhartha.kumar@oracle.com, gerald.schaefer@linux.ibm.com,
	linux-kernel@vger.kernel.org, jaypatel@linux.ibm.com
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order
In-Reply-To: <20230414194832.973194-1-tsahu@linux.ibm.com>
References: <20230414194832.973194-1-tsahu@linux.ibm.com>

On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote:
> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
> folio does not have any tail page to set order.

I think you're missing the point of how folio_set_order() is used.
When splitting a large folio, we need to zero out the folio_nr_pages
in the tail, so it does have a tail page, and that tail page needs to
be zeroed. We even assert that there is a tail page:

	if (WARN_ON_ONCE(!folio_test_large(folio)))
		return;

Or maybe you need to explain yourself better.

> folio->_folio_nr_pages is
> set to 0 for order 0 in folio_set_order. It is required because
> _folio_nr_pages overlapped with page->mapping and leaving it non zero
> caused "bad page" error while freeing gigantic hugepages. This was fixed in
> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
> pages to CMA") now explicitly clear page->mapping and hence we won't see
> the bad page error even if _folio_nr_pages remains unset. Also the order 0
> folios are not supposed to call folio_set_order, So now we can get rid of
> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.

... this is all very confusing.

> The patch also moves _folio_set_head and folio_set_order calls in
> __prep_compound_gigantic_folio() such that we avoid clearing them in the
> error path.

But don't we need those bits set while we operate on the folio to set
it up? It makes me nervous if we don't have those bits set because
we can end up with speculative references that point to a head page
while that page is not marked as a head page. It may not be a problem,
but I want to see some air-tight analysis of that.

> Testing: I have run LTP tests, which all passes. and also I have written
> the test in LTP which tests the bug caused by compound_nr and page->mapping
> overlapping.
>
> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/
>
> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
> on newer kernel and, also with this patch it passes.
>
> Signed-off-by: Tarun Sahu
> ---
>  mm/hugetlb.c  | 9 +++------
>  mm/internal.h | 8 ++------
>  2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f16b25b1a6b9..e2540269c1dc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
>  		set_page_refcounted(p);
>  	}
>
> -	folio_set_order(folio, 0);
>  	__folio_clear_head(folio);
>  }
>
> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  	struct page *p;
>
>  	__folio_clear_reserved(folio);
> -	__folio_set_head(folio);
> -	/* we rely on prep_new_hugetlb_folio to set the destructor */
> -	folio_set_order(folio, order);
>  	for (i = 0; i < nr_pages; i++) {
>  		p = folio_page(folio, i);
>
> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		if (i != 0)
>  			set_compound_head(p, &folio->page);
>  	}
> +	__folio_set_head(folio);
> +	/* we rely on prep_new_hugetlb_folio to set the destructor */
> +	folio_set_order(folio, order);
>  	atomic_set(&folio->_entire_mapcount, -1);
>  	atomic_set(&folio->_nr_pages_mapped, 0);
>  	atomic_set(&folio->_pincount, 0);
> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		p = folio_page(folio, j);
>  		__ClearPageReserved(p);
>  	}
> -	folio_set_order(folio, 0);
> -	__folio_clear_head(folio);
>  	return false;
>  }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 18cda26b8a92..0d96a3bc1d58 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
>   */
>  static inline void folio_set_order(struct folio *folio, unsigned int order)
>  {
> -	if (WARN_ON_ONCE(!folio_test_large(folio)))
> +	if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
>  		return;
>
>  	folio->_folio_order = order;
>  #ifdef CONFIG_64BIT
> -	/*
> -	 * When hugetlb dissolves a folio, we need to clear the tail
> -	 * page, rather than setting nr_pages to 1.
> -	 */
> -	folio->_folio_nr_pages = order ? 1U << order : 0;
> +	folio->_folio_nr_pages = 1U << order;
>  #endif
>  }
>
> --
> 2.31.1
>
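
For anyone following along, the behavioural difference in the mm/internal.h hunk can be sketched in userspace. This is only a rough model, not kernel code: struct model_folio, set_order_old(), set_order_new() and demo_order_zero() are hypothetical names made up for illustration, the "large" flag stands in for folio_test_large(), and a -1 return stands in for the WARN_ON_ONCE() early return. It assumes the 64-bit case where _folio_nr_pages exists.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields folio_set_order() touches. */
struct model_folio {
	int large;              /* models folio_test_large(folio) */
	unsigned char order;    /* models folio->_folio_order */
	unsigned int nr_pages;  /* models folio->_folio_nr_pages (CONFIG_64BIT) */
};

/* Pre-patch logic: order 0 is accepted and clears nr_pages to 0. */
static int set_order_old(struct model_folio *f, unsigned int order)
{
	if (!f->large)
		return -1;      /* kernel: WARN_ON_ONCE() and return */
	f->order = order;
	f->nr_pages = order ? 1U << order : 0;
	return 0;
}

/* Post-patch logic: order 0 is rejected, so nothing is cleared. */
static int set_order_new(struct model_folio *f, unsigned int order)
{
	if (!order || !f->large)
		return -1;      /* kernel: WARN_ON_ONCE() and return */
	f->order = order;
	f->nr_pages = 1U << order;
	return 0;
}

/* Walk one order-9 folio through both variants with order == 0. */
static void demo_order_zero(void)
{
	struct model_folio a = { .large = 1, .order = 9, .nr_pages = 512 };
	struct model_folio b = a;

	/* Old code: the call succeeds and the tail fields are zeroed. */
	assert(set_order_old(&a, 0) == 0);
	assert(a.order == 0 && a.nr_pages == 0);

	/* New code: the call is refused and the old values stay behind. */
	assert(set_order_new(&b, 0) == -1);
	assert(b.order == 9 && b.nr_pages == 512);
}
```

In this model, a caller that still passes order 0 after the patch leaves the stale order and nr_pages in place, so any clearing has to happen somewhere else. Whether that holds on every path in the real kernel is the analysis asked for above; the sketch does not settle it.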