From: Jiaqi Yan
Date: Tue, 5 Apr 2022 13:46:08 -0700
Subject: Re: [RFC v1 2/2] mm: khugepaged: recover from poisoned file-backed memory
To: Tong Tiangen
Cc: Yang Shi, "Luck, Tony", HORIGUCHI NAOYA(堀口 直也), "Kirill A. Shutemov", Miaohe Lin, Jue Wang, Linux MM
References: <20220323232929.3035443-1-jiaqiyan@google.com> <20220323232929.3035443-3-jiaqiyan@google.com> <484e856c-9a57-1696-8a23-75967ee7c291@huawei.com>
In-Reply-To: <484e856c-9a57-1696-8a23-75967ee7c291@huawei.com>

On Mon, Mar 28, 2022 at 9:02 PM Tong Tiangen wrote:
>
>
>
> On 2022/3/24 7:29, Jiaqi Yan wrote:
> > Make collapse_file roll back when copying pages failed.
> > More concretely:
> > * extract copy operations into a separate loop
> > * postpone the updates for nr_none until both scan and copy succeeded
> > * postpone joining small xarray entries until both scan and copy
> >   succeeded
> > * as for update operations to NR_XXX_THPS
> >     * for SHMEM file, postpone until both scan and copy succeeded
> >     * for other file, roll back if scan succeeded but copy failed
> >
> > Signed-off-by: Jiaqi Yan
> > ---
> >  include/linux/highmem.h | 18 ++++++++++
> >  mm/khugepaged.c         | 75 +++++++++++++++++++++++++++--------------
> >  2 files changed, 67 insertions(+), 26 deletions(-)
> >
> > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > index 15d0aa4d349c..fc5aa221bdb5 100644
> > --- a/include/linux/highmem.h
> > +++ b/include/linux/highmem.h
> > @@ -315,6 +315,24 @@ static inline void copy_highpage(struct page *to, struct page *from)
> >  	kunmap_local(vfrom);
> >  }
> >
> > +/*
> > + * Machine check exception handled version of copy_highpage.
> > + * Return true if copying page content failed; otherwise false.
> > + */
> > +static inline bool copy_highpage_mc(struct page *to, struct page *from)
> > +{
> > +	char *vfrom, *vto;
> > +	unsigned long ret;
> > +
> > +	vfrom = kmap_local_page(from);
> > +	vto = kmap_local_page(to);
> > +	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
> > +	kunmap_local(vto);
> > +	kunmap_local(vfrom);
> > +
> > +	return ret > 0;
> > +}
> > +
> >  #endif
> >
> >  static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 84ed177f56ff..ed2b1cd4bbc6 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1708,12 +1708,13 @@ static void collapse_file(struct mm_struct *mm,
> >  {
> >  	struct address_space *mapping = file->f_mapping;
> >  	gfp_t gfp;
> > -	struct page *new_page;
> > +	struct page *new_page, *page, *tmp;
> >  	pgoff_t index, end = start + HPAGE_PMD_NR;
> >  	LIST_HEAD(pagelist);
> >  	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
> >  	int nr_none = 0, result = SCAN_SUCCEED;
> >  	bool is_shmem = shmem_file(file);
> > +	bool copy_failed = false;
> >  	int nr;
> >
> >  	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> > @@ -1936,9 +1937,7 @@ static void collapse_file(struct mm_struct *mm,
> >  	}
> >  	nr = thp_nr_pages(new_page);
> >
> > -	if (is_shmem)
> > -		__mod_lruvec_page_state(new_page, NR_SHMEM_THPS, nr);
> > -	else {
> > +	if (!is_shmem) {
> >  		__mod_lruvec_page_state(new_page, NR_FILE_THPS, nr);
> >  		filemap_nr_thps_inc(mapping);
> >  		/*
> > @@ -1956,34 +1955,39 @@ static void collapse_file(struct mm_struct *mm,
> >  		}
> >  	}
> >
> > -	if (nr_none) {
> > -		__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
> > -		if (is_shmem)
> > -			__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> > -	}
> > -
> > -	/* Join all the small entries into a single multi-index entry */
> > -	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
> > -	xas_store(&xas, new_page);
> >  xa_locked:
> >  	xas_unlock_irq(&xas);
> >  xa_unlocked:
> >
> >  	if (result == SCAN_SUCCEED) {
> > -		struct page *page, *tmp;
> > -
> >  		/*
> >  		 * Replacing old pages with new one has succeeded, now we
> > -		 * need to copy the content and free the old pages.
> > +		 * attempt to copy the contents.
> >  		 */
> >  		index = start;
> > -		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
> > +		list_for_each_entry(page, &pagelist, lru) {
> >  			while (index < page->index) {
> >  				clear_highpage(new_page + (index % HPAGE_PMD_NR));
> >  				index++;
> >  			}
> > -			copy_highpage(new_page + (page->index % HPAGE_PMD_NR),
> > -				      page);
> > +			if (copy_highpage_mc(new_page + (page->index % HPAGE_PMD_NR), page)) {
> > +				copy_failed = true;
>
> The 1st patch here used "copy_succeed = false", It is best that the
> logic of the two positions can be unified.

copy_failed here will be eliminated once we have SCAN_COPY_MC defined in
version 2. Version 2 also renames "copy_succeeded" in collapse_huge_page()
to "copied", mimicking the "isolated" variable for
__collapse_huge_page_isolate().

>
> > +				break;
> > +			}
> > +			index++;
> > +		}
> > +		while (!copy_failed && index < end) {
> > +			clear_highpage(new_page + (page->index % HPAGE_PMD_NR));
> > +			index++;
> > +		}
> > +	}
> > +
> > +	if (result == SCAN_SUCCEED && !copy_failed) {
> > +		/*
> > +		 * Copying old pages to huge one has succeeded, now we
> > +		 * need to free the old pages.
> > +		 */
> > +		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
> >  			list_del(&page->lru);
> >  			page->mapping = NULL;
> >  			page_ref_unfreeze(page, 1);
> > @@ -1991,12 +1995,20 @@ static void collapse_file(struct mm_struct *mm,
> >  			ClearPageUnevictable(page);
> >  			unlock_page(page);
> >  			put_page(page);
> > -			index++;
> >  		}
> > -		while (index < end) {
> > -			clear_highpage(new_page + (index % HPAGE_PMD_NR));
> > -			index++;
> > +
> > +		xas_lock_irq(&xas);
> > +		if (is_shmem)
> > +			__mod_lruvec_page_state(new_page, NR_SHMEM_THPS, nr);
> > +		if (nr_none) {
> > +			__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
> > +			if (is_shmem)
> > +				__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> >  		}
> > +		/* Join all the small entries into a single multi-index entry. */
> > +		xas_set_order(&xas, start, HPAGE_PMD_ORDER);
> > +		xas_store(&xas, new_page);
> > +		xas_unlock_irq(&xas);
> >
> >  		SetPageUptodate(new_page);
> >  		page_ref_add(new_page, HPAGE_PMD_NR - 1);
> > @@ -2012,9 +2024,11 @@ static void collapse_file(struct mm_struct *mm,
> >
> >  		khugepaged_pages_collapsed++;
> >  	} else {
> > -		struct page *page;
> > -
> > -		/* Something went wrong: roll back page cache changes */
> > +		/*
> > +		 * Something went wrong:
> > +		 * either result != SCAN_SUCCEED or copy_failed,
> > +		 * roll back page cache changes
> > +		 */
> >  		xas_lock_irq(&xas);
> >  		mapping->nrpages -= nr_none;
> >
> > @@ -2047,6 +2061,15 @@ static void collapse_file(struct mm_struct *mm,
> >  			xas_lock_irq(&xas);
> >  		}
> >  		VM_BUG_ON(nr_none);
> > +		/*
> > +		 * Undo the updates of thp_nr_pages(new_page) for non-SHMEM file,
> > +		 * which is not updated yet for SHMEM file.
> > +		 * These undos are not needed if result is not SCAN_SUCCEED.
> > +		 */
> > +		if (!is_shmem && result == SCAN_SUCCEED) {
> > +			__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
> > +			filemap_nr_thps_dec(mapping);
> > +		}
> >  		xas_unlock_irq(&xas);
> >
> >  		new_page->mapping = NULL;
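
For anyone following the thread, the ordering the commit message asks for (copy everything first, commit the bookkeeping only if every copy succeeded) together with the result-code idea from the reply can be sketched in user space roughly as below. This is only an illustration under stated assumptions, not kernel code: struct small_page, copy_page_mc(), collapse(), the poisoned flag, and the two-value enum are hypothetical stand-ins; only the name SCAN_COPY_MC is borrowed from the plan for version 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_SIZE 16	/* hypothetical stand-in for PAGE_SIZE */
#define NR_SMALL   4	/* hypothetical stand-in for HPAGE_PMD_NR */

/* Hypothetical result codes; a failed copy is reported as a result, not a bool. */
enum scan_result { SCAN_SUCCEED, SCAN_COPY_MC };

struct small_page {
	char data[SMALL_SIZE];
	bool poisoned;	/* simulates an uncorrectable memory error */
	bool freed;	/* set only once the collapse is committed */
};

/* Stand-in for a machine-check-safe copy: returns true if the copy failed. */
static bool copy_page_mc(char *to, const struct small_page *from)
{
	if (from->poisoned)
		return true;
	memcpy(to, from->data, SMALL_SIZE);
	return false;
}

/*
 * Phase 1 copies every small page into the huge buffer.
 * Phase 2 releases the sources and bumps the counter, and runs only
 * when phase 1 completed without a failed copy.
 */
static enum scan_result collapse(struct small_page *pages, size_t n,
				 char *huge, int *nr_thps)
{
	size_t i;

	assert(n <= NR_SMALL);

	for (i = 0; i < n; i++) {
		if (copy_page_mc(huge + i * SMALL_SIZE, &pages[i]))
			return SCAN_COPY_MC;	/* nothing committed yet */
	}

	for (i = 0; i < n; i++)
		pages[i].freed = true;
	(*nr_thps)++;

	return SCAN_SUCCEED;
}
```

Because the counter and the source pages are touched only after the copy loop finishes, the failure path in this sketch has nothing to undo. The patch needs an explicit undo only for the non-SHMEM counters, which it updates before the copy.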