Date: Fri, 24 Mar 2023 17:39:22 -0700 (PDT)
From: Hugh Dickins
To: Jiaqi Yan
cc: Yang Shi, kirill.shutemov@linux.intel.com, kirill@shutemov.name, tongtiangen@huawei.com, tony.luck@intel.com, akpm@linux-foundation.org, naoya.horiguchi@nec.com, linmiaohe@huawei.com, linux-mm@kvack.org, osalvador@suse.de, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v10 3/3] mm/khugepaged: recover from poisoned file-backed memory
Message-ID: <3731c8e-961c-7497-f7c9-5edf8c6ea793@google.com>
References: <20230305065112.1932255-1-jiaqiyan@google.com> <20230305065112.1932255-4-jiaqiyan@google.com>

On Fri, 24 Mar 2023, Jiaqi Yan wrote:
> On Fri, Mar 24, 2023 at 2:15 PM Yang Shi wrote:
> > On Sat, Mar 4, 2023 at 10:51 PM Jiaqi Yan wrote:
> > >
> > > Make collapse_file roll back when copying pages failed. More concretely:
> > > - extract copying operations into a separate loop
> > > - postpone the updates for nr_none until both scanning and copying
> > >   succeeded
> > > - postpone joining small xarray entries until both scanning and copying
> > >   succeeded
> > > - postpone the update operations to NR_XXX_THPS until both scanning and
> > >   copying succeeded
> > > - for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded but
> > >   copying failed
> > >
> > > Tested manually:
> > > 0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk.
> > > 1. Start a two-thread application. Each thread allocates a chunk of
> > >    non-huge memory buffer from /mnt/ramdisk.
> > > 2. Pick 4 random buffer address (2 in each thread) and inject
> > >    uncorrectable memory errors at physical addresses.
> > > 3. Signal both threads to make their memory buffer collapsible, i.e.
> > >    calling madvise(MADV_HUGEPAGE).
> > > 4. Wait and then check kernel log: khugepaged is able to recover from
> > >    poisoned pages by skipping them.
> > > 5. Signal both threads to inspect their buffer contents and make sure no
> > >    data corruption.
> > >
> > > Signed-off-by: Jiaqi Yan
> >
> > Reviewed-by: Yang Shi
> >
> > Just a nit below:

Acked-by: Hugh Dickins
with a little nit from me below, if you are respinning:

> >
> > > ---
> > >  mm/khugepaged.c | 78 ++++++++++++++++++++++++++++++-------------------
> > >  1 file changed, 48 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index c3c217f6ebc6e..3ea2aa55c2c52 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -1890,6 +1890,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >  {
> > >  	struct address_space *mapping = file->f_mapping;
> > >  	struct page *hpage;
> > > +	struct page *page;
> > > +	struct page *tmp;
> > > +	struct folio *folio;
> > >  	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
> > >  	LIST_HEAD(pagelist);
> > >  	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
> > > @@ -1934,8 +1937,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >
> > >  	xas_set(&xas, start);
> > >  	for (index = start; index < end; index++) {
> > > -		struct page *page = xas_next(&xas);
> > > -		struct folio *folio;
> > > +		page = xas_next(&xas);
> > >
> > >  		VM_BUG_ON(index != xas.xa_index);
> > >  		if (is_shmem) {
> > > @@ -2117,10 +2119,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >  	}
> > >  	nr = thp_nr_pages(hpage);
> > >
> > > -	if (is_shmem)
> > > -		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
> > > -	else {
> > > -		__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
> > > +	if (!is_shmem) {
> > >  		filemap_nr_thps_inc(mapping);
> > >  		/*
> > >  		 * Paired with smp_mb() in do_dentry_open() to ensure

That "nr = thp_nr_pages(hpage);" above becomes stranded a long way away
from where "nr" is actually used for updating those statistics: please
move it down with them.

(I see "nr" is also reported in the tracepoint at the end, FWIW, so
maybe that will show "0" in more failure cases than it used to, but
that's okay - it has been decently initialized.)

Thanks,
Hugh
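
To make the ordering concrete, here is a minimal userspace sketch of the pattern under discussion: copy first, touch the statistics only once every copy has succeeded, and compute "nr" right next to the statistics update that consumes it. This is not the kernel code. Every sketch_* name, the struct layouts and the -1 error return are hypothetical stand-ins invented for illustration; the real function is collapse_file() in mm/khugepaged.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; none of these are kernel APIs. */
enum { SKETCH_NR_PAGES = 4 };

struct sketch_stats {
	long nr_thps;	/* plays the role of NR_SHMEM_THPS / NR_FILE_THPS */
};

struct sketch_page {
	bool poisoned;	/* true if copying from this page would fail */
	int data;
};

/* Copy one small page into its slot of the huge page; fail on a poisoned source. */
static int sketch_copy_page(int *dst, const struct sketch_page *src)
{
	if (src->poisoned)
		return -1;
	*dst = src->data;
	return 0;
}

/*
 * Returns 0 on success, -1 if any copy failed. The statistics are only
 * updated after every copy has succeeded, so the failure path has no
 * accounting to undo.
 */
static int sketch_collapse(struct sketch_stats *stats,
			   const struct sketch_page *pages, size_t count,
			   int *hpage)
{
	size_t i;
	long nr;

	assert(stats && pages && hpage);

	/* Phase 1: copy. Bail out before any accounting on failure. */
	for (i = 0; i < count; i++) {
		if (sketch_copy_page(&hpage[i], &pages[i]))
			return -1;
	}

	/* Phase 2: commit. "nr" is computed right beside its only use. */
	nr = (long)count;
	stats->nr_thps += nr;
	return 0;
}
```

Calling sketch_collapse() on clean pages bumps nr_thps by the page count; calling it again with one page marked poisoned returns -1 and leaves nr_thps unchanged. In this sketch "nr" lives only in the commit phase, so a reader never has to scan back across the copy loop to find where it was set.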