From: Naoya Horiguchi
To: Oscar Salvador
CC: "mhocko@kernel.org", "mike.kravetz@oracle.com", "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
Subject: Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages
Date: Mon, 21 Oct 2019 07:45:33 +0000
Message-ID: <20191021074533.GA10507@hori.linux.bs1.fc.nec.co.jp>
References: <20191017142123.24245-1-osalvador@suse.de> <20191017142123.24245-11-osalvador@suse.de>
In-Reply-To: <20191017142123.24245-11-osalvador@suse.de>

On Thu, Oct 17, 2019 at 04:21:17PM +0200, Oscar Salvador wrote:
> When trying to soft-offline a free page, we need to first take it off
> the buddy allocator.
> Once we know is out of reach, we can safely flag it as poisoned.
>
> take_page_off_buddy will be used to take a page meant to be poisoned
> off the buddy allocator.
> take_page_off_buddy calls break_down_buddy_pages, which splits a
> higher-order page in case our page belongs to one.
>
> Once the page is under our control, we call page_set_poison to set it

I guess you mean page_handle_poison here.

> as poisoned and grab a refcount on it.
>
> Signed-off-by: Oscar Salvador
> ---
>  mm/memory-failure.c | 20 +++++++++++-----
>  mm/page_alloc.c     | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 37b230b8cfe7..1d986580522d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -78,6 +78,15 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
>  EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
>  EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
>
> +extern bool take_page_off_buddy(struct page *page);
> +
> +static void page_handle_poison(struct page *page)

hwpoison is a separate idea from page poisoning, so I think it would be
better to name this something like page_handle_hwpoison().

> +{
> +	SetPageHWPoison(page);
> +	page_ref_inc(page);
> +	num_poisoned_pages_inc();
> +}
> +
>  static int hwpoison_filter_dev(struct page *p)
>  {
>  	struct address_space *mapping;
> @@ -1830,14 +1839,13 @@ static int soft_offline_in_use_page(struct page *page)
>
>  static int soft_offline_free_page(struct page *page)
>  {
> -	int rc = dissolve_free_huge_page(page);
> +	int rc = -EBUSY;
>
> -	if (!rc) {
> -		if (set_hwpoison_free_buddy_page(page))
> -			num_poisoned_pages_inc();
> -		else
> -			rc = -EBUSY;
> +	if (!dissolve_free_huge_page(page) && take_page_off_buddy(page)) {
> +		page_handle_poison(page);
> +		rc = 0;
>  	}
> +
>  	return rc;
>  }
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cd1dd0712624..255df0c76a40 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8632,6 +8632,74 @@ bool is_free_buddy_page(struct page *page)
>
>  #ifdef CONFIG_MEMORY_FAILURE
>  /*
> + * Break down a higher-order page in sub-pages, and keep our target out of
> + * buddy allocator.
> + */
> +static void break_down_buddy_pages(struct zone *zone, struct page *page,
> +				   struct page *target, int low, int high,
> +				   struct free_area *area, int migratetype)
> +{
> +	unsigned long size = 1 << high;
> +	struct page *current_buddy, *next_page;
> +
> +	while (high > low) {
> +		area--;
> +		high--;
> +		size >>= 1;
> +
> +		if (target >= &page[size]) {
> +			next_page = page + size;
> +			current_buddy = page;
> +		} else {
> +			next_page = page;
> +			current_buddy = page + size;
> +		}
> +
> +		if (set_page_guard(zone, current_buddy, high, migratetype))
> +			continue;
> +
> +		if (current_buddy != target) {
> +			add_to_free_area(current_buddy, area, migratetype);
> +			set_page_order(current_buddy, high);
> +			page = next_page;
> +		}
> +	}
> +}
> +
> +/*
> + * Take a page that will be marked as poisoned off the buddy allocator.
> + */
> +bool take_page_off_buddy(struct page *page)
> + {
> +	struct zone *zone = page_zone(page);
> +	unsigned long pfn = page_to_pfn(page);
> +	unsigned long flags;
> +	unsigned int order;
> +	bool ret = false;
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +	for (order = 0; order < MAX_ORDER; order++) {
> +		struct page *page_head = page - (pfn & ((1 << order) - 1));
> +		int buddy_order = page_order(page_head);
> +		struct free_area *area = &(zone->free_area[buddy_order]);
> +
> +		if (PageBuddy(page_head) && buddy_order >= order) {
> +			unsigned long pfn_head = page_to_pfn(page_head);
> +			int migratetype = get_pfnblock_migratetype(page_head,
> +								   pfn_head);
> +
> +			del_page_from_free_area(page_head, area);
> +			break_down_buddy_pages(zone, page_head, page, 0,
> +						buddy_order, area, migratetype);
> +			ret = true;
> +			break;

indent with whitespace?
And you can find a few more coding style warnings with checkpatch.pl.

BTW, if we want the unpoison mechanism to keep up with the new semantics,
we will need the reverse operation of take_page_off_buddy().
Do you think that part will come as separate work?

Thanks,
Naoya Horiguchi

> +		}
> +	}
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +	return ret;
> + }
> +
> +/*
>   * Set PG_hwpoison flag if a given page is confirmed to be a free page. This
>   * test is performed under the zone lock to prevent a race against page
>   * allocation.
> --
> 2.12.3
>
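
For readers following the splitting step, here is a minimal userspace sketch
of the idea behind break_down_buddy_pages(): halve the free block repeatedly,
hand back the half that does not contain the target, and descend into the
half that does. It is a simplified model in plain C, not the kernel code.
split_around_target(), block_head() and struct free_block are hypothetical
names introduced only for illustration, and the sketch ignores the zone lock,
guard pages and migratetypes.

```c
#include <assert.h>
#include <stddef.h>

/* One block handed back to the (modelled) free lists. Hypothetical type. */
struct free_block {
	unsigned long pfn;	/* first page frame number of the block */
	unsigned int order;	/* the block spans 1 << order pages */
};

/* First pfn of the naturally aligned order-sized block that contains pfn. */
static unsigned long block_head(unsigned long pfn, unsigned int order)
{
	return pfn & ~((1UL << order) - 1);
}

/*
 * Split the free block [head, head + (1 << high)) until only `target` is
 * left. Every half that does not contain `target` is recorded in `out`,
 * largest first. Returns the number of entries written, which equals the
 * initial `high`; the caller must provide at least that many entries.
 */
static size_t split_around_target(unsigned long head, unsigned int high,
				  unsigned long target,
				  struct free_block *out)
{
	unsigned long size = 1UL << high;
	size_t n = 0;

	/* The target has to lie inside the block being split. */
	assert(target >= head && target < head + size);

	while (high > 0) {
		high--;
		size >>= 1;

		if (target >= head + size) {
			/* Target is in the upper half: give back the lower. */
			out[n].pfn = head;
			head += size;
		} else {
			/* Target is in the lower half: give back the upper. */
			out[n].pfn = head + size;
		}
		out[n].order = high;
		n++;
	}
	return n;
}
```

With a free order-3 block starting at pfn 8 and a target pfn of 13, the sketch
hands back pfn 8 at order 2, pfn 14 at order 1 and pfn 12 at order 0, so only
pfn 13 stays outside the free lists.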