Date: Tue, 19 Apr 2022 12:16:35 -0400
From: Peter Xu
To: David Hildenbrand
Cc: Alistair Popple, Miaohe Lin, akpm@linux-foundation.org,
    willy@infradead.org, vbabka@suse.cz, dhowells@redhat.com, neilb@suse.de,
    surenb@google.com, minchan@kernel.org, sfr@canb.auug.org.au,
    rcampbell@nvidia.com, naoya.horiguchi@nec.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/swapfile: unuse_pte can map random data if swap read fails
In-Reply-To: <21003e7a-01e4-c751-dd41-fce4149d424c@redhat.com>

On Tue, Apr 19, 2022 at 01:14:29PM +0200, David Hildenbrand wrote:
> On 19.04.22 10:08, Alistair Popple wrote:
> > David Hildenbrand writes:
> >
> >> On 19.04.22 09:29, Miaohe Lin wrote:
> >>> On 2022/4/19 11:51, Alistair Popple wrote:
> >>>> Miaohe Lin writes:
> >>>>
> >>>>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
> >>>>> page filled with random data is mapped into user address space. In case
> >>>>> of error, a special swap entry indicating swap read fails is set to the
> >>>>> page table. So the swapcache page can be freed and the user won't end up
> >>>>> with a permanently mounted swap because a sector is bad. And if the page
> >>>>> is accessed later, the user process will be killed so that corrupted data
> >>>>> is never consumed. On the other hand, if the page is never accessed, the
> >>>>> user won't even notice it.
> >>>>
> >>>> Hi Miaohe,
> >>>>
> >>>> It seems we're not actually using the pfn that gets stored in the special swap
> >>>> entry here. Is my understanding correct? If so I think it would be better to use
> >>>
> >>> Yes, you're right. The pfn is not used now. What we need here is a special swap entry
> >>> to do the right things. I think we can change to store some debugging information instead
> >>> of pfn if needed in the future.
> >>>
> >>>> the new PTE markers Peter introduced[1] rather than adding another swap entry
> >>>> type.
> >>>
> >>> IIUC, we should not reuse that swap entry here. From definition:
> >>>
> >>> PTE markers
> >>> `========='
> >>> ...
> >>> PTE marker is a new type of swap entry that is ony applicable to file
> >>> backed memories like shmem and hugetlbfs. It's used to persist some
> >>> pte-level information even if the original present ptes in pgtable are
> >>> zapped.
> >>>
> >>> It's designed for file backed memories while swapin error entry is for anonymous
> >>> memories. And there has some differences in processing. So it's not a good idea
> >>> to reuse pte markers. Or am I miss something?
> >>
> >> I tend to agree. As raised in my other reply, maybe we can simply reuse
> >> hwpoison entries and update the documentation of them accordingly.
> >
> > Unless I've missed something I don't think PTE markers should be restricted
> > solely to file backed memory. It's true that the only user of them at the moment
> > is UFFD-WP for file backed memory, but PTE markers are just a special swap entry
> > same as what is added here.
>
> There is a difference.
>
> What we want here is "there used to be something mapped but it's not
> readable anymore. Please fail hard when userspace tries accessing
> this.". Just like with hwpoison entries.
>
> What a pte marker expresses is that "here is nothing mapped right now
> but we have additional metadata available here. For file-backed memory,
> it translates to: If we ever touch this page, lookup the pagecache what
> to map here."
>
> In the anonymous memory world, this would map to "populate the zeropage
> or a fresh anonymous page on access." and keep the metadata around.

So far it's defined like that, but it does not necessarily need to be.

IMHO a PTE marker could work here for the anonymous use case, as Alistair
stated. Say, it's fairly simple to not go into anonymous page handling at
all if we see this pte marker with the new bit set. It's indeed tailored
for exactly this kind of use case, where we don't need to store special
data like a pfn.

A hwpoison entry looks good to me too, but as discussed we may need to
reserve pfn=0 or -1 or any other value we're sure is invalid, and then
we'll also need to go over the rest of the hwpoison-related code
(carefully, as rightfully pointed out by Miaohe regarding the different
VM_FAULT_* fields being returned) so that it does not mistakenly treat a
"swap device read error" as a general MCE.

From that POV it seems pte markers would be slightly cleaner, though we'll
need to touch up the existing pte marker code paths so that they start
accepting anonymous vmas.

No strong opinion on this.

Btw, is there an error dumped into dmesg when the read error happens
(e.g., would the block IO path trigger some warning already)?
I'm wondering whether we should report it to the user somehow, so that the
user knows about it even before the bad page is accessed and could
potentially do something useful.

-- 
Peter Xu
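
For illustration only, the pte-marker option discussed above could look
roughly like the userspace toy model below. It is not kernel code: the
names MARKER_UFFD_WP, MARKER_SWAPIN_ERROR, struct toy_pte and
handle_marker_fault() are all hypothetical and only model the idea of
checking a new marker bit before any anonymous page handling is reached.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker bits; illustrative names, not the kernel's definitions. */
#define MARKER_UFFD_WP      (1u << 0) /* existing kind of use: uffd-wp metadata */
#define MARKER_SWAPIN_ERROR (1u << 1) /* assumed new bit: swap read failed */

static_assert((MARKER_UFFD_WP & MARKER_SWAPIN_ERROR) == 0,
              "marker bits must not overlap");

/* Possible outcomes of a fault on a marker entry in this toy model. */
enum fault_result {
    FAULT_MAP_ON_ACCESS, /* populate a page, keep the metadata around */
    FAULT_SIGBUS         /* fail hard: the old contents are unreadable */
};

/* Toy stand-in for a non-present pte that carries marker bits and no pfn. */
struct toy_pte {
    uint32_t marker;
};

/*
 * Decide what a fault on a marker entry should do. The error bit is
 * checked first, so the "populate on access" path is never taken for a
 * page whose swap read failed.
 */
enum fault_result handle_marker_fault(struct toy_pte pte)
{
    if (pte.marker & MARKER_SWAPIN_ERROR)
        return FAULT_SIGBUS;

    return FAULT_MAP_ON_ACCESS;
}
```

In this model an entry with MARKER_SWAPIN_ERROR set always yields
FAULT_SIGBUS, even when other marker bits are set as well, while an entry
carrying only MARKER_UFFD_WP falls through to the populate-on-access path.
No pfn is stored anywhere, which is the property that makes the marker a
candidate here.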