Date: Wed, 27 Oct 2021 10:09:03 +0800
From: Peter Xu
To: Naoya Horiguchi, Matt Mackall, Dave Hansen
Cc: David Hildenbrand, linux-mm@kvack.org, Andrew Morton, Alistair Popple,
 Mike Kravetz, Konstantin Khlebnikov, Bin Wang, Yang Shi, Naoya Horiguchi,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] mm, pagemap: expose hwpoison entry
References: <20211004115001.1544259-1-naoya.horiguchi@linux.dev>
 <258d0ddb-6c82-0c95-a15e-b085b59d2142@redhat.com>
 <20211004143228.GA1545442@u2004>
 <20211026232736.GA2704541@u2004>
In-Reply-To: <20211026232736.GA2704541@u2004>

On Wed, Oct 27, 2021 at 08:27:36AM +0900, Naoya Horiguchi wrote:
> On Mon, Oct 04, 2021 at 11:32:28PM +0900, Naoya Horiguchi wrote:
> > On Mon, Oct 04, 2021 at 01:55:30PM +0200, David Hildenbrand wrote:
> > > On 04.10.21 13:50, Naoya Horiguchi wrote:
> ...
> > > >
> > > > Hwpoison entry for hugepage is also exposed by this patch. The below
> > > > example shows how pagemap is visible in the case where a memory error
> > > > hit a hugepage mapped to a process.
> > > >
> > > > $ ./page-types --no-summary --pid $PID --raw --list --addr 0x700000000+0x400
> > > > voffset    offset  len  flags
> > > > 700000000  12fa00  1    ___U_______Ma__H_G_________________f_______1
> > > > 700000001  12fa01  1ff  ___________Ma___TG_________________f_______1
> > > > 700000200  12f800  1    __________B________X_______________f______w_
> > > > 700000201  12f801  1    ___________________X_______________f______w_   // memory failure hit this page
> > > > 700000202  12f802  1fe  __________B________X_______________f______w_
> > > >
> > > > The entries with both of "X" flag (hwpoison flag) and "w" flag (swap
> > > > flag) are considered as hwpoison entries. So all pages in 2MB range
> > > > are inaccessible from the process. We can get actual error location
> > > > by page-types in physical address mode.
> > > >
> > > > $ ./page-types --no-summary --addr 0x12f800+0x200 --raw --list
> > > > offset  len  flags
> > > > 12f800  1    __________B_________________________________
> > > > 12f801  1    ___________________X________________________
> > > > 12f802  1fe  __________B_________________________________
> > > >
> > > > Signed-off-by: Naoya Horiguchi
> > > > ---
> > > >  fs/proc/task_mmu.c      | 41 ++++++++++++++++++++++++++++++++---------
> > > >  include/linux/swapops.h | 13 +++++++++++++
> > > >  tools/vm/page-types.c   |  7 ++++++-
> > > >  3 files changed, 51 insertions(+), 10 deletions(-)
> > >
> > >
> > > Please also update the documentation located at
> > >
> > > Documentation/admin-guide/mm/pagemap.rst
> >
> > I will do this in the next post.
>
> Reading the document, I found that swap type is already exported so we
> could identify hwpoison entry with it (without new PM_HWPOISON bit).
> One problem is that the format of swap types (like SWP_HWPOISON) depends
> on a few config macros like CONFIG_DEVICE_PRIVATE and CONFIG_MIGRATION,
> so we also need to export how the swap type field is interpreted.

I had a similar question before, though it was about generic swap entries
rather than the special ones.  The thing is that I don't know how userspace
could interpret normal swap device indexes read from pagemap.  Say we have
two swap devices listed by "swapon -s": I have no idea how we would know
which device was allocated which swap type index.  That looks like the same
question as the one asked above about special swap types: the interface
seems to be incomplete, if not entirely unused.

AFAIU the information "this page is swapped out to device X at offset Y" is
not reliable either, because page-in/page-out by the kernel is transparent
to userspace and not under its control at all.  IOW, if the user reads that
swap entry, then reads the data at that offset from the disk and puts it
somewhere else, the data could already be stale if the kernel paged the page
in after userspace read the pagemap but before it read the disk.  I don't
see any way to make that right unless userspace could stop the kernel from
paging in a swap entry.

That's why I really wonder whether we should expose normal swap entries at
all, as I don't know how they could be helpful and used in a 100% correct
way.

Special swap entries seem a bit different: at least for is_pfn_swap_entry()
typed swap entries we can still expose the PFN, which might be helpful,
though I can't tell.

I once sent an email to Matt Mackall and Dave Hansen asking about the above
but didn't get a reply.  Ccing them again this time, with the list copied.
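
To make the concern concrete, here is a minimal sketch of what userspace
gets back today when it decodes a swapped pagemap entry.  It assumes the
bit layout described in Documentation/admin-guide/mm/pagemap.rst (bits 0-4
swap type, bits 5-54 swap offset, bit 62 swapped, bit 63 present); the
function names and the example entry are made up for illustration only.

```python
import os
import struct

# Bit layout as described in Documentation/admin-guide/mm/pagemap.rst.
PM_PRESENT = 1 << 63               # bit 63: page present
PM_SWAP = 1 << 62                  # bit 62: page swapped
SWAP_TYPE_BITS = 5                 # bits 0-4: swap type
SWAP_OFFSET_MASK = (1 << 50) - 1   # bits 5-54: swap offset


def decode_swap_entry(entry):
    """Return (swap_type, swap_offset) if the entry is swapped, else None."""
    if not entry & PM_SWAP:
        return None
    swap_type = entry & ((1 << SWAP_TYPE_BITS) - 1)
    swap_offset = (entry >> SWAP_TYPE_BITS) & SWAP_OFFSET_MASK
    return swap_type, swap_offset


def read_pagemap_entry(pid, vaddr):
    """Hypothetical helper: read the 64-bit pagemap entry covering vaddr."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * 8)
        return struct.unpack("=Q", f.read(8))[0]


# Made-up entry: swapped, type 25, offset 0x12f801.
example = PM_SWAP | (0x12f801 << SWAP_TYPE_BITS) | 25
swap_type, swap_offset = decode_swap_entry(example)
print(swap_type, hex(swap_offset))
```

All the caller ends up with is a bare type number and an offset.  Nothing
in the entry says which swap device (or which special entry kind) that
number refers to, and the value may be stale by the time it is acted on.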
>
> I thought of adding new interfaces for example under /sys/kernel/mm/swap/type_format/,
> which shows info like below (assuming that all CONFIG_{DEVICE_PRIVATE,MIGRATION,MEMORY_FAILURE}
> is enabled):
>
> $ ls /sys/kernel/mm/swap/type_format/
> hwpoison
> migration_read
> migration_write
> device_write
> device_read
> device_exclusive_write
> device_exclusive_read
>
> $ cat /sys/kernel/mm/swap/type_format/hwpoison
> 25
>
> $ cat /sys/kernel/mm/swap/type_format/device_write
> 28
>
> Does it make sense or any better approach?

Then I'm wondering whether we also care about the rest of the normal swap
devices in pagemap, and if so whether we need to expose some information for
them too (only if there's a real use case, though).

Or should we just not expose swap entries at all, at least the generic
ones?  In that case we can still expose things like hwpoison via
well-defined PM_* bits.

Thanks,

-- 
Peter Xu
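
For reference, a rough sketch of why the special swap type numbers move
with the kernel config.  It assumes the numbering scheme in
include/linux/swap.h as of roughly v5.15 (MAX_SWAPFILES_SHIFT of 5, special
types carved out of the top of the 5-bit range); the function name and the
dictionary keys are hypothetical and simply mirror the file names proposed
above.

```python
MAX_SWAPFILES_SHIFT = 5  # assumption: 5-bit swap type field


def special_swap_types(device_private=True, migration=True, memory_failure=True):
    """Map special swap entry names to type numbers for a given config."""
    device_num = 4 if device_private else 0      # write/read/exclusive write/read
    migration_num = 2 if migration else 0        # read/write
    hwpoison_num = 1 if memory_failure else 0

    # Types below this value are left for real swap devices.
    max_swapfiles = ((1 << MAX_SWAPFILES_SHIFT)
                     - device_num - migration_num - hwpoison_num)

    types = {}
    if memory_failure:
        types["hwpoison"] = max_swapfiles
    if migration:
        base = max_swapfiles + hwpoison_num
        types["migration_read"] = base
        types["migration_write"] = base + 1
    if device_private:
        base = max_swapfiles + hwpoison_num + migration_num
        types["device_write"] = base
        types["device_read"] = base + 1
        types["device_exclusive_write"] = base + 2
        types["device_exclusive_read"] = base + 3
    return types


all_on = special_swap_types()
no_device = special_swap_types(device_private=False)
print(all_on["hwpoison"], all_on["device_write"], no_device["hwpoison"])
```

With all three options enabled this gives hwpoison = 25 and device_write =
28, matching the values quoted above.  Turning CONFIG_DEVICE_PRIVATE off
shifts hwpoison to 29 under the same assumptions, so a raw type number read
from pagemap cannot be interpreted without knowing the config.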