From mboxrd@z Thu Jan 1 00:00:00 1970
To: Peter Xu, Tiberiu Georgescu
Cc: akpm@linux-foundation.org, catalin.marinas@arm.com, peterz@infradead.org,
 chinwen.chang@mediatek.com, linmiaohe@huawei.com, jannh@google.com,
 apopple@nvidia.com, christian.brauner@ubuntu.com, ebiederm@xmission.com,
 adobriyan@gmail.com, songmuchun@bytedance.com, axboe@kernel.dk,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, ivan.teterevkov@nutanix.com,
 florian.schmidt@nutanix.com, carl.waldspurger@nutanix.com,
 Hugh Dickins, Andrea Arcangeli
References: <20210714152426.216217-1-tiberiu.georgescu@nutanix.com>
 <20210714152426.216217-2-tiberiu.georgescu@nutanix.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [RFC PATCH 1/1] pagemap: report swap location for shared pages
Message-ID: <0e38ef52-0ac7-c15b-114b-3316973fc7dc@redhat.com>
Date: Wed, 14 Jul 2021 18:24:02 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On 14.07.21 18:08, Peter Xu wrote:
> On Wed, Jul 14, 2021 at 03:24:26PM +0000, Tiberiu Georgescu wrote:
>> When a page allocated using the MAP_SHARED flag is swapped out, its pagemap
>> entry is cleared. In many cases, there is no difference between swapped-out
>> shared pages and newly allocated, non-dirty pages in the pagemap interface.
>>
>> This patch addresses the behaviour and modifies pte_to_pagemap_entry() to
>> make use of the XArray associated with the virtual memory area struct
>> passed as an argument. The XArray contains the location of virtual pages
>> in the page cache, swap cache or on disk. If they are on either of the
>> caches, then the original implementation still works. If not, then the
>> missing information will be retrieved from the XArray.
>>
>> Co-developed-by: Florian Schmidt
>> Signed-off-by: Florian Schmidt
>> Co-developed-by: Carl Waldspurger
>> Signed-off-by: Carl Waldspurger
>> Co-developed-by: Ivan Teterevkov
>> Signed-off-by: Ivan Teterevkov
>> Signed-off-by: Tiberiu Georgescu
>> ---
>>  fs/proc/task_mmu.c | 37 +++++++++++++++++++++++++++++--------
>>  1 file changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index eb97468dfe4c..b17c8aedd32e 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -1359,12 +1359,25 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end,
>>  	return err;
>>  }
>>  
>> +static void *get_xa_entry_at_vma_addr(struct vm_area_struct *vma,
>> +		unsigned long addr)
>> +{
>> +	struct inode *inode = file_inode(vma->vm_file);
>> +	struct address_space *mapping = inode->i_mapping;
>> +	pgoff_t offset = linear_page_index(vma, addr);
>> +
>> +	return xa_load(&mapping->i_pages, offset);
>> +}
>> +
>>  static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>>  		struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>>  {
>>  	u64 frame = 0, flags = 0;
>>  	struct page *page = NULL;
>>  
>> +	if (vma->vm_flags & VM_SOFTDIRTY)
>> +		flags |= PM_SOFT_DIRTY;
>> +
>>  	if (pte_present(pte)) {
>>  		if (pm->show_pfn)
>>  			frame = pte_pfn(pte);
>> @@ -1374,13 +1387,22 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>>  			flags |= PM_SOFT_DIRTY;
>>  		if (pte_uffd_wp(pte))
>>  			flags |= PM_UFFD_WP;
>> -	} else if (is_swap_pte(pte)) {
>> +	} else if (is_swap_pte(pte) || shmem_file(vma->vm_file)) {
>>  		swp_entry_t entry;
>> -		if (pte_swp_soft_dirty(pte))
>> -			flags |= PM_SOFT_DIRTY;
>> -		if (pte_swp_uffd_wp(pte))
>> -			flags |= PM_UFFD_WP;
>> -		entry = pte_to_swp_entry(pte);
>> +		if (is_swap_pte(pte)) {
>> +			entry = pte_to_swp_entry(pte);
>> +			if (pte_swp_soft_dirty(pte))
>> +				flags |= PM_SOFT_DIRTY;
>> +			if (pte_swp_uffd_wp(pte))
>> +				flags |= PM_UFFD_WP;
>> +		} else {
>> +			void *xa_entry = get_xa_entry_at_vma_addr(vma, addr);
>> +
>> +			if (xa_is_value(xa_entry))
>> +				entry = radix_to_swp_entry(xa_entry);
>> +			else
>> +				goto out;
>> +		}
>>  		if (pm->show_pfn)
>>  			frame = swp_type(entry) |
>>  				(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>> @@ -1393,9 +1415,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>>  		flags |= PM_FILE;
>>  	if (page && page_mapcount(page) == 1)
>>  		flags |= PM_MMAP_EXCLUSIVE;
>> -	if (vma->vm_flags & VM_SOFTDIRTY)
>> -		flags |= PM_SOFT_DIRTY;
>
> IMHO moving this to the entry will only work for the initial iteration, however
> it won't really help anything, as soft-dirty should always be used in pair with
> clear_refs written with value "4" first otherwise all pages will be marked
> soft-dirty then the pagemap data is meaningless.
>
> After the "write 4" op VM_SOFTDIRTY will be cleared and I expect the test case
> to see all zeros again even with the patch.
>
> I think one way to fix this is to do something similar to uffd-wp: we leave a
> marker in pte showing that this is soft-dirtied pte even if swapped out.

How exactly does such a pte look like? Simply pte_none() with another
bit set?

> However we don't have a mechanism for that yet in current linux, and the
> uffd-wp series is the first one trying to introduce something like that.

Can you give me a pointer? I'm very interested in learning how to
identify this case.

-- 
Thanks,

David / dhildenb
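
For reference, the fields discussed in this thread can be read back from a
/proc/<pid>/pagemap entry in userspace. The following is a minimal sketch, not
part of the patch: it assumes the bit layout described in
Documentation/admin-guide/mm/pagemap.rst (bits 0-4 swap type, bits 5-54 swap
offset, bit 55 soft-dirty, bit 61 file-page or shared-anon, bit 62 swapped,
bit 63 present). The names struct pm_info and pm_decode() are hypothetical and
exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of one 64-bit pagemap entry (see pagemap.rst). */
#define PM_SWAP_TYPE_BITS 5                        /* same width as MAX_SWAPFILES_SHIFT */
#define PM_PFRAME_MASK    ((UINT64_C(1) << 55) - 1) /* bits 0-54 */
#define PM_SOFT_DIRTY     (UINT64_C(1) << 55)
#define PM_FILE           (UINT64_C(1) << 61)
#define PM_SWAP           (UINT64_C(1) << 62)
#define PM_PRESENT        (UINT64_C(1) << 63)

/* Hypothetical helper type: decoded view of one pagemap entry. */
struct pm_info {
	bool present;
	bool swapped;
	bool soft_dirty;
	bool file_or_shared;
	uint64_t pfn;         /* valid only if present (and readable by the caller) */
	unsigned swap_type;   /* valid only if swapped */
	uint64_t swap_offset; /* valid only if swapped */
};

/* Hypothetical helper: split a raw entry into its flags and frame fields. */
static struct pm_info pm_decode(uint64_t entry)
{
	struct pm_info info = {0};
	uint64_t frame = entry & PM_PFRAME_MASK;

	info.present = (entry & PM_PRESENT) != 0;
	info.swapped = (entry & PM_SWAP) != 0;
	info.soft_dirty = (entry & PM_SOFT_DIRTY) != 0;
	info.file_or_shared = (entry & PM_FILE) != 0;

	if (info.present) {
		info.pfn = frame;
	} else if (info.swapped) {
		/* Inverse of: swp_type(entry) | (swp_offset(entry) << MAX_SWAPFILES_SHIFT) */
		info.swap_type = (unsigned)(frame & ((1u << PM_SWAP_TYPE_BITS) - 1));
		info.swap_offset = frame >> PM_SWAP_TYPE_BITS;
	}
	return info;
}
```

A caller would typically write "4" to /proc/<pid>/clear_refs first, let the
workload run, then pread() 8 bytes per page from /proc/<pid>/pagemap at offset
(vaddr / page_size) * 8 and pass each value to pm_decode(). An all-zero entry
decodes to "not present, not swapped, not soft-dirty", which is the ambiguity
the patch description refers to.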