From: Yang Shi <shy828301@gmail.com>
Date: Tue, 28 Mar 2023 09:01:57 -0700
Subject: Re: [PATCH v11 3/3] mm/khugepaged: recover from poisoned file-backed memory
To: Jiaqi Yan <jiaqiyan@google.com>
Cc: kirill.shutemov@linux.intel.com, kirill@shutemov.name, tongtiangen@huawei.com, tony.luck@intel.com, naoya.horiguchi@nec.com, linmiaohe@huawei.com, linux-mm@kvack.org, akpm@linux-foundation.org, osalvador@suse.de, wangkefeng.wang@huawei.com, stevensd@chromium.org, hughd@google.com
In-Reply-To: <20230327211548.462509-4-jiaqiyan@google.com>
References: <20230327211548.462509-1-jiaqiyan@google.com> <20230327211548.462509-4-jiaqiyan@google.com>
On Mon, Mar 27, 2023 at 2:16 PM Jiaqi Yan wrote:
>
> Make collapse_file roll back when copying pages failed.
> More concretely:
> - extract copying operations into a separate loop
> - postpone the updates for nr_none until both scanning and copying
>   succeeded
> - postpone joining small xarray entries until both scanning and copying
>   succeeded
> - postpone the update operations to NR_XXX_THPS until both scanning and
>   copying succeeded
> - for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded but
>   copying failed
>
> Tested manually:
> 0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk.
> 1. Start a two-thread application. Each thread allocates a chunk of
>    non-huge memory buffer from /mnt/ramdisk.
> 2. Pick 4 random buffer addresses (2 in each thread) and inject
>    uncorrectable memory errors at the physical addresses.
> 3. Signal both threads to make their memory buffers collapsible, i.e.
>    calling madvise(MADV_HUGEPAGE).
> 4. Wait and then check kernel log: khugepaged is able to recover from
>    poisoned pages by skipping them.
> 5. Signal both threads to inspect their buffer contents and make sure no
>    data corruption.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>

A nit below:

> ---
>  mm/khugepaged.c | 86 +++++++++++++++++++++++++++++++------------------
>  1 file changed, 54 insertions(+), 32 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index bef68286345c8..38c1655ce0a9e 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1874,6 +1874,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  {
>         struct address_space *mapping = file->f_mapping;
>         struct page *hpage;
> +       struct page *page;
> +       struct page *tmp;
> +       struct folio *folio;
>         pgoff_t index = 0, end = start + HPAGE_PMD_NR;
>         LIST_HEAD(pagelist);
>         XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
> @@ -1918,8 +1921,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>
>         xas_set(&xas, start);
>         for (index = start; index < end; index++) {
> -               struct page *page = xas_next(&xas);
> -               struct folio *folio;
> +               page = xas_next(&xas);
>
>                 VM_BUG_ON(index != xas.xa_index);
>                 if (is_shmem) {
> @@ -2099,12 +2101,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                 put_page(page);
>                 goto xa_unlocked;
>         }
> -       nr = thp_nr_pages(hpage);
>
> -       if (is_shmem)
> -               __mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
> -       else {
> -               __mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
> +       if (!is_shmem) {
>                 filemap_nr_thps_inc(mapping);
>                 /*
>                  * Paired with smp_mb() in do_dentry_open() to ensure
> @@ -2115,21 +2113,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                 smp_mb();
>                 if (inode_is_open_for_write(mapping->host)) {
>                         result = SCAN_FAIL;
> -                       __mod_lruvec_page_state(hpage, NR_FILE_THPS, -nr);
>                         filemap_nr_thps_dec(mapping);
> -                       goto xa_locked;
>                 }
>         }
> -
> -       if (nr_none) {
> -               __mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
> -               /* nr_none is always 0 for non-shmem. */
> -               __mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
> -       }
> -
> -       /* Join all the small entries into a single multi-index entry */
> -       xas_set_order(&xas, start, HPAGE_PMD_ORDER);
> -       xas_store(&xas, hpage);
>  xa_locked:
>         xas_unlock_irq(&xas);
>  xa_unlocked:
> @@ -2142,21 +2128,36 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>         try_to_unmap_flush();
>
>         if (result == SCAN_SUCCEED) {
> -               struct page *page, *tmp;
> -               struct folio *folio;
> -
>                 /*
>                  * Replacing old pages with new one has succeeded, now we
> -                * need to copy the content and free the old pages.
> +                * attempt to copy the contents.
>                  */
>                 index = start;
> -               list_for_each_entry_safe(page, tmp, &pagelist, lru) {
> +               list_for_each_entry(page, &pagelist, lru) {
>                         while (index < page->index) {
>                                 clear_highpage(hpage + (index % HPAGE_PMD_NR));
>                                 index++;
>                         }
> -                       copy_highpage(hpage + (page->index % HPAGE_PMD_NR),
> -                                     page);
> +                       if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR),
> +                                            page) > 0) {
> +                               result = SCAN_COPY_MC;
> +                               break;
> +                       }
> +                       index++;
> +               }
> +               while (result == SCAN_SUCCEED && index < end) {
> +                       clear_highpage(hpage + (index % HPAGE_PMD_NR));
> +                       index++;
> +               }
> +       }
> +
> +       nr = thp_nr_pages(hpage);
> +       if (result == SCAN_SUCCEED) {
> +               /*
> +                * Copying old pages to huge one has succeeded, now we
> +                * need to free the old pages.
> +                */
> +               list_for_each_entry_safe(page, tmp, &pagelist, lru) {
>                         list_del(&page->lru);
>                         page->mapping = NULL;
>                         page_ref_unfreeze(page, 1);
> @@ -2164,12 +2165,23 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                         ClearPageUnevictable(page);
>                         unlock_page(page);
>                         put_page(page);
> -                       index++;
>                 }
> -               while (index < end) {
> -                       clear_highpage(hpage + (index % HPAGE_PMD_NR));
> -                       index++;
> +
> +               xas_lock_irq(&xas);
> +               if (is_shmem)
> +                       __mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
> +               else
> +                       __mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
> +
> +               if (nr_none) {
> +                       __mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
> +                       /* nr_none is always 0 for non-shmem. */
> +                       __mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
>                 }
> +               /* Join all the small entries into a single multi-index entry. */
> +               xas_set_order(&xas, start, HPAGE_PMD_ORDER);
> +               xas_store(&xas, hpage);
> +               xas_unlock_irq(&xas);
>
>                 folio = page_folio(hpage);
>                 folio_mark_uptodate(folio);
> @@ -2187,8 +2199,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                 unlock_page(hpage);
>                 hpage = NULL;
>         } else {
> -               struct page *page;
> -
>                 /* Something went wrong: roll back page cache changes */
>                 xas_lock_irq(&xas);
>                 if (nr_none) {
> @@ -2222,6 +2232,18 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                         xas_lock_irq(&xas);
>                 }
>                 VM_BUG_ON(nr_none);
> +               /*
> +                * Undo the updates of filemap_nr_thps_inc for non-SHMEM
> +                * file only. This undo is not needed unless failure is
> +                * due to SCAN_COPY_MC.
> +                *
> +                * Paired with smp_mb() in do_dentry_open() to ensure the
> +                * update to nr_thps is visible.
> +                */
> +               smp_mb();
> +               if (!is_shmem && result == SCAN_COPY_MC)
> +                       filemap_nr_thps_dec(mapping);

I think the memory barrier should be after the dec.

> +
>                 xas_unlock_irq(&xas);
>
>                 hpage->mapping = NULL;
> --
> 2.40.0.348.gf938b09366-goog
>