Date: Tue, 22 Oct 2019 12:24:57 +0200
From: Michal Hocko
To: Oscar Salvador
Cc: n-horiguchi@ah.jp.nec.com, mike.kravetz@oracle.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages
Message-ID: <20191022102457.GJ9379@dhcp22.suse.cz>
References: <20191017142123.24245-1-osalvador@suse.de>
	<20191017142123.24245-11-osalvador@suse.de>
	<20191018120615.GM5017@dhcp22.suse.cz>
	<20191021125842.GA11330@linux>
	<20191021154158.GV9379@dhcp22.suse.cz>
	<20191022074615.GA19060@linux>
	<20191022082611.GD9379@dhcp22.suse.cz>
	<20191022083505.GA19708@linux>
	<20191022092256.GH9379@dhcp22.suse.cz>
	<20191022095852.GB20429@linux>
In-Reply-To: <20191022095852.GB20429@linux>

On Tue 22-10-19 11:58:52, Oscar Salvador wrote:
> On Tue, Oct 22, 2019 at 11:22:56AM +0200, Michal Hocko wrote:
> > Hmm, that might be a misunderstanding on my end. I thought that it is
> > the MCE handler to say whether the failure is recoverable or not. If yes
> > then we can touch the content of the memory (that would imply the
> > migration). Other than that both paths should be essentially the same,
> > no? Well unrecoverable case would be essentially force migration failure
> > path.
> >
> > MADV_HWPOISON is explicitly documented to test MCE handling IIUC:
> > : This feature is intended for testing of memory error-handling
> > : code; it is available only if the kernel was configured with
> > : CONFIG_MEMORY_FAILURE.
> >
> > There is no explicit note about the type of the error that is injected
> > but I think it is reasonably safe to assume this is a recoverable one.
>
> MADV_HWPOISON stands for hard-offline.
> MADV_SOFT_OFFLINE stands for soft-offline.
>
> MADV_SOFT_OFFLINE (since Linux 2.6.33)
>        Soft offline the pages in the range specified by addr and
>        length. The memory of each page in the specified range is
>        preserved (i.e., when next accessed, the same content will be
>        visible, but in a new physical page frame), and the original
>        page is offlined (i.e., no longer used, and taken out of
>        normal memory management). The effect of the
>        MADV_SOFT_OFFLINE operation is invisible to (i.e., does not
>        change the semantics of) the calling process.
>
>        This feature is intended for testing of memory error-handling
>        code; it is available only if the kernel was configured with
>        CONFIG_MEMORY_FAILURE.

I have missed that one somehow. Thanks for pointing it out.

[...]

> AFAICS, for hard-offline case, a recovered event would be if:
>
> - the page to shut down is already free
> - the page was unmapped
>
> In some cases we need to kill the process if it holds dirty pages.

Yes, I would expect that the page table would be poisoned and the
process would receive a SIGBUS when accessing that memory.

> But we never migrate contents in hard-offline path.
> I guess it is because we cannot really trust the contents anymore.

Yes, that makes perfect sense. What I am saying is that the migration
(aka trying to recover) is the main and only difference. The soft
offline should poison page tables when not able to migrate as well IIUC.
-- 
Michal Hocko
SUSE Labs
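
For anyone following the thread who wants to see the soft-offline semantics from userspace, here is a minimal sketch of the MADV_SOFT_OFFLINE behaviour quoted above from the man page. It is not taken from the patch series. The helper name soft_offline_preserves_content() is hypothetical, and the fallback value 101 for MADV_SOFT_OFFLINE is an assumption based on the asm-generic headers, used only if libc does not define the constant. The call normally needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE; without them madvise() is expected to fail (typically EPERM or EINVAL) and the content check passes trivially.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and MADV_SOFT_OFFLINE */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101 /* assumption: asm-generic value */
#endif

/*
 * Hypothetical helper: fill one anonymous page with a pattern, ask the
 * kernel to soft-offline it, then verify the pattern is still visible.
 * Returns 0 if the content is unchanged, -1 on setup failure or mismatch.
 */
static int soft_offline_preserves_content(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int ret = 0;
	long i;

	if (page_size <= 0)
		return -1;

	buf = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Touch the page so it is backed by a physical frame. */
	memset(buf, 0xA5, (size_t)page_size);

	/*
	 * On success the kernel migrates the content to a new frame and
	 * offlines the old one. On failure we only report the error; the
	 * mapping is left as it was.
	 */
	if (madvise(buf, (size_t)page_size, MADV_SOFT_OFFLINE) != 0)
		fprintf(stderr, "madvise(MADV_SOFT_OFFLINE): %s\n",
			strerror(errno));

	/* Either way, the caller must still see the same bytes. */
	for (i = 0; i < page_size; i++) {
		if (buf[i] != 0xA5) {
			ret = -1;
			break;
		}
	}

	munmap(buf, (size_t)page_size);
	return ret;
}
```

Calling soft_offline_preserves_content() from a small main() and checking for a 0 return is enough to exercise it. The sketch deliberately avoids MADV_HWPOISON, since the hard-offline path discussed above does not migrate contents and may deliver SIGBUS to the caller.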