From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiaqi Yan
Date: Tue, 24 May 2022 09:32:13 -0700
Subject: Re: [PATCH v3 1/2] mm: khugepaged: recover from poisoned anonymous memory
To: Jue Wang
Cc: Oscar Salvador, Yang Shi, Tong Tiangen, "Luck, Tony",
 HORIGUCHI NAOYA(堀口 直也), "Kirill A. Shutemov", Miaohe Lin, Linux MM
References: <20220524025352.1381911-1-jiaqiyan@google.com>
 <20220524025352.1381911-2-jiaqiyan@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Tue, May 24, 2022 at 7:05 AM Jue Wang wrote:
>
> On Tue, May 24, 2022 at 3:48 AM Oscar Salvador wrote:
> >
> > On Mon, May 23, 2022 at 07:53:51PM -0700, Jiaqi Yan wrote:
> > > Make __collapse_huge_page_copy return whether
> > > collapsing/copying anonymous pages succeeded,
> > > and make collapse_huge_page handle the return status.
> > >
> > > Break the existing PTE scan loop into two for-loops.
> > > The first loop copies source pages into the target huge page,
> > > and can fail gracefully when running into memory errors in the
> > > source pages. Roll back the page table and page states
> > > when copying fails:
> > > 1) re-establish the PTEs-to-PMD connection.
> > > 2) release pages back to their LRU list.
> >
> > If you spell out what the first loop does, just also tell
> > what the second loop does as well; it just gets easier.

Thanks for the suggestion. I will amend the commit msg in the next
version as follows:

If copying all pages succeeds, the second loop releases and clears up
these normal pages. Otherwise, the second loop does the following to
roll back the page table and page states:
1) re-establish the original PTEs-to-PMD connection.
2) release source pages back to their LRU list.
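To make the amended wording concrete, here is a standalone userspace toy of the two-loop shape described above. All names (`collapse_copy`, `copy_page_checked`, the poison/released/rolled-back arrays) are illustrative stand-ins, not kernel code: "release" and "roll back" are modeled as flags rather than real LRU/PTE operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PAGES 4
#define PAGE_SZ  8

/* Stand-in for a machine-check-aware page copy: a "poisoned"
 * source page makes the copy report failure instead of crashing. */
static bool copy_page_checked(char *dst, const char *src, bool poisoned)
{
	if (poisoned)
		return false;
	memcpy(dst, src, PAGE_SZ);
	return true;
}

/*
 * Loop 1 copies and may stop early on poison; loop 2 either releases
 * the source pages (success) or rolls everything back (failure).
 */
static bool collapse_copy(char dst[NR_PAGES][PAGE_SZ],
			  char src[NR_PAGES][PAGE_SZ],
			  const bool poisoned[NR_PAGES],
			  bool released[NR_PAGES],
			  bool rolled_back[NR_PAGES])
{
	bool ok = true;
	int i;

	for (i = 0; i < NR_PAGES; i++) {	/* loop 1: copy */
		if (!copy_page_checked(dst[i], src[i], poisoned[i])) {
			ok = false;
			break;
		}
	}

	for (i = 0; i < NR_PAGES; i++) {	/* loop 2: release or roll back */
		if (ok)
			released[i] = true;	/* free source pages */
		else
			rolled_back[i] = true;	/* re-link PTEs, back to LRU */
	}
	return ok;
}
```

The point of the split is that loop 1 never mutates shared state it cannot undo, so loop 2 has a clean choice between the success path and the rollback path.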
> > > +static bool __collapse_huge_page_copy(pte_t *pte,
> > > +				       struct page *page,
> > > +				       pmd_t *pmd,
> > > +				       pmd_t rollback,
> > > +				       struct vm_area_struct *vma,
> > > +				       unsigned long address,
> > > +				       spinlock_t *pte_ptl,
> > > +				       struct list_head *compound_pagelist)
> > >  {
> > >  	struct page *src_page, *tmp;
> > >  	pte_t *_pte;
> > > -	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> > > -	     _pte++, page++, address += PAGE_SIZE) {
> > > -		pte_t pteval = *_pte;
> > > +	pte_t pteval;
> > > +	unsigned long _address;
> > > +	spinlock_t *pmd_ptl;
> > > +	bool copy_succeeded = true;
> > >
> > > -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > > +	/*
> > > +	 * Copying pages' contents is subject to memory poison at any iteration.
> > > +	 */
> > > +	for (_pte = pte, _address = address;
> > > +	     _pte < pte + HPAGE_PMD_NR;
> > > +	     _pte++, page++, _address += PAGE_SIZE) {
> > > +		pteval = *_pte;
> > > +
> > > +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval)))
> > >  			clear_user_highpage(page, address);
> > > -			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
> > > -			if (is_zero_pfn(pte_pfn(pteval))) {
> > > -				/*
> > > -				 * ptl mostly unnecessary.
> > > -				 */
> > > -				spin_lock(ptl);
> > > -				ptep_clear(vma->vm_mm, address, _pte);
> > > -				spin_unlock(ptl);
> > > +		else {
> > > +			src_page = pte_page(pteval);
> > > +			if (copy_highpage_mc(page, src_page)) {
> > > +				copy_succeeded = false;
> > > +				trace_mm_collapse_huge_page_copy(pte_page(*pte),
> > > +					src_page, SCAN_COPY_MC);
> >
> > You seem to assume that if there is an error, it will always happen on
> > the page we are copying from. What if the page we are copying to is the
> > faulty one? Can that happen? Can that be detected by copy_mc_to_kernel?
> > And if so, can that be differentiated?
>
> It's possible that the page being copied to has some uncorrectable
> memory errors.
>
> Yet only read transactions signal machine-check exceptions, while
> write transactions do not.
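That read/write asymmetry can be sketched as a userspace toy. `copy_mc_toy`, `NO_POISON`, and the poison offset are hypothetical stand-ins; the only property carried over from the real interface is the calling convention, since copy_mc_to_kernel() returns the number of bytes not copied (0 on full success). Only a poisoned *source* byte stops the copy here, modeling the point that reads raise machine checks while writes do not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NO_POISON ((size_t)-1)

/*
 * Toy model of a machine-check-aware copy: returns the number of
 * bytes NOT copied, 0 on success. A poisoned destination byte would
 * not stop this loop, because stores to poisoned memory do not
 * consume the bad data -- only a later read of it would trap.
 */
static size_t copy_mc_toy(char *dst, const char *src, size_t len,
			  size_t src_poison)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* Reading this byte would raise #MC: report the
		 * remainder to the caller instead of panicking. */
		if (i == src_poison)
			return len - i;
		dst[i] = src[i];
	}
	return 0;
}
```

A caller can therefore distinguish "copied everything" from "stopped at byte `len - ret`", which is what lets khugepaged fail the collapse gracefully instead of taking a fatal fault in kernel context.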
> It's the reading from a source page with uncorrectable errors that
> causes most of the system panics, since khugepaged runs in kernel
> context.
>
> Thanks,
> -Jue
>
> > --
> > Oscar Salvador
> > SUSE Labs