Date: Thu, 19 Jan 2023 18:10:01 +0300
From: kirill.shutemov@linux.intel.com
To: Jiaqi Yan
Cc: kirill@shutemov.name, shy828301@gmail.com, tongtiangen@huawei.com, tony.luck@intel.com, akpm@linux-foundation.org, wangkefeng.wang@huawei.com, naoya.horiguchi@nec.com, linmiaohe@huawei.com, linux-mm@kvack.org, osalvador@suse.de
Subject: Re: [PATCH v9 2/2] mm/khugepaged: recover from poisoned file-backed memory
Message-ID: <20230119151001.jepfildnq5vjba5q@box.shutemov.name>
References: <20221205234059.42971-1-jiaqiyan@google.com> <20221205234059.42971-3-jiaqiyan@google.com>
In-Reply-To: <20221205234059.42971-3-jiaqiyan@google.com>

On Mon, Dec 05, 2022 at 03:40:59PM -0800, Jiaqi Yan wrote:
> Make collapse_file roll back when copying pages failed. More concretely:
> - extract copying operations into a separate loop
> - postpone the updates for nr_none until both scanning and copying
>   succeeded
> - postpone joining small xarray entries until both scanning and copying
>   succeeded
> - postpone the update operations to NR_XXX_THPS until both scanning and
>   copying succeeded
> - for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded but
>   copying failed
>
> Tested manually:
> 0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk.
> 1. Start a two-thread application. Each thread allocates a chunk of
>    non-huge memory buffer from /mnt/ramdisk.
> 2. Pick 4 random buffer address (2 in each thread) and inject
>    uncorrectable memory errors at physical addresses.
> 3. Signal both threads to make their memory buffer collapsible, i.e.
>    calling madvise(MADV_HUGEPAGE).
> 4. Wait and then check kernel log: khugepaged is able to recover from
>    poisoned pages by skipping them.
> 5. Signal both threads to inspect their buffer contents and make sure no
>    data corruption.
>
> Signed-off-by: Jiaqi Yan

Okay, looks sane.

Acked-by: Kirill A. Shutemov

-- 
Kiryl Shutsemau / Kirill A. Shutemov
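
For readers following the thread, the ordering described in the quoted commit message (scan, then copy in a separate loop, then commit, with a rollback if the copy fails) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: struct fake_page, struct fake_mapping, copy_subpage(), collapse_model(), NR_SUBPAGES and SUBPAGE_SIZE are hypothetical names invented here. Only the ideas of nr_none and of rolling back filemap_nr_thps_inc come from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_SUBPAGES  8   /* hypothetical; kept small for illustration */
#define SUBPAGE_SIZE 16  /* hypothetical; not a real page size */

/* Hypothetical stand-in for one small page in the file mapping. */
struct fake_page {
	bool present;   /* false models a hole (a "none" entry) */
	bool poisoned;  /* true models an uncorrectable memory error */
	unsigned char data[SUBPAGE_SIZE];
};

/* Hypothetical stand-in for the mapping plus the counters the patch defers. */
struct fake_mapping {
	struct fake_page small[NR_SUBPAGES];
	unsigned char huge[NR_SUBPAGES * SUBPAGE_SIZE];
	bool collapsed;  /* set only at commit time */
	int nr_none;     /* updated only at commit time */
	int nr_thps;     /* models the filemap_nr_thps_inc side effect */
};

/* Models a copy that reports failure instead of consuming poisoned data. */
static bool copy_subpage(unsigned char *dst, const struct fake_page *src)
{
	if (src->poisoned)
		return false;
	memcpy(dst, src->data, SUBPAGE_SIZE);
	return true;
}

/* Returns true if the model mapping was collapsed, false if rolled back. */
bool collapse_model(struct fake_mapping *m)
{
	int pending_none = 0;
	size_t i;

	/* Phase 1: scan. Count holes locally; do not touch m->nr_none yet. */
	for (i = 0; i < NR_SUBPAGES; i++) {
		if (!m->small[i].present)
			pending_none++;
	}
	m->nr_thps++;  /* side effect taken once the scan has succeeded */

	/* Phase 2: copy, in its own loop so a failure is easy to unwind. */
	for (i = 0; i < NR_SUBPAGES; i++) {
		unsigned char *dst = m->huge + i * SUBPAGE_SIZE;

		if (!m->small[i].present) {
			memset(dst, 0, SUBPAGE_SIZE);  /* holes become zeroes */
			continue;
		}
		if (!copy_subpage(dst, &m->small[i])) {
			m->nr_thps--;  /* roll back the scan-time side effect */
			return false;  /* nr_none and collapsed stay untouched */
		}
	}

	/* Phase 3: commit. Only now publish the deferred updates. */
	assert(pending_none <= NR_SUBPAGES);
	m->nr_none += pending_none;
	m->collapsed = true;
	return true;
}
```

In this model a poisoned source page leaves the small pages, nr_none, and nr_thps exactly as they were before the attempt, which is the property the manual test in the commit message checks for on a real system.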