Date: Tue, 29 Mar 2022 22:55:03 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Khalid Aziz, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
 Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
 Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox,
 Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
 Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
 Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang,
 Pedro Gomes, Oded Gabbay, linux-mm@kvack.org
Subject: Re: [PATCH v2 01/15] mm/rmap: fix missing swap_free() in
 try_to_unmap() after arch_unmap_one() failed
In-Reply-To: <909cc1b6-6f4f-4c45-f418-31d5dd5acaa3@oracle.com>
References: <20220315104741.63071-1-david@redhat.com>
 <20220315104741.63071-2-david@redhat.com>
 <909cc1b6-6f4f-4c45-f418-31d5dd5acaa3@oracle.com>
On 29.03.22 22:42, Khalid Aziz wrote:
> On 3/29/22 07:59, David Hildenbrand wrote:
>> On 15.03.22 11:47, David Hildenbrand wrote:
>>> In case arch_unmap_one() fails, we already did a swap_duplicate(). Let's
>>> undo that properly via swap_free().
>>>
>>> Fixes: ca827d55ebaa ("mm, swap: Add infrastructure for saving page metadata on swap")
>>> Reviewed-by: Khalid Aziz
>>> Signed-off-by: David Hildenbrand
>>> ---
>>>  mm/rmap.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 6a1e8c7f6213..f825aeef61ca 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1625,6 +1625,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>>  			break;
>>>  		}
>>>  		if (arch_unmap_one(mm, vma, address, pteval) < 0) {
>>> +			swap_free(entry);
>>>  			set_pte_at(mm, address, pvmw.pte, pteval);
>>>  			ret = false;
>>>  			page_vma_mapped_walk_done(&pvmw);
>>
>> Hi Khalid,
>>
>> I'm a bit confused about the semantics of arch_unmap_one(); I hope you can clarify.
>>
>>
>> See patch #11 in this series, where we can fail unmapping after arch_unmap_one() succeeded. E.g.,
>>
>> @@ -1623,6 +1634,24 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>  			page_vma_mapped_walk_done(&pvmw);
>>  			break;
>>  		}
>> +		if (anon_exclusive &&
>> +		    page_try_share_anon_rmap(subpage)) {
>> +			swap_free(entry);
>> +			set_pte_at(mm, address, pvmw.pte, pteval);
>> +			ret = false;
>> +			page_vma_mapped_walk_done(&pvmw);
>> +			break;
>> +		}
>> +		/*
>> +		 * Note: We don't remember yet if the page was mapped
>> +		 * exclusively in the swap entry, so swapin code has
>> +		 * to re-determine that manually and might detect the
>> +		 * page as possibly shared, for example, if there are
>> +		 * other references on the page or if the page is under
>> +		 * writeback. We made sure that there are no GUP pins
>> +		 * on the page that would rely on it, so for GUP pins
>> +		 * this is fine.
>> +		 */
>>  		if (list_empty(&mm->mmlist)) {
>>  			spin_lock(&mmlist_lock);
>>  			if (list_empty(&mm->mmlist))
>>
>>
>> For now, I was under the impression that we don't have to undo anything after
>> arch_unmap_one() succeeded, because we seem to not do anything for the two
>> cases below. But looking into arch_unmap_one() and how it allocates stuff, I do
>> wonder what we would actually want to do here -- I'd assume we'd want to
>> trigger the del_tag_store() somehow?
>
> Hi David,
>

Hi, thanks for your fast reply.

> Currently, once arch_unmap_one() completes successfully, we are at the point of
> no return for this pte. It will be replaced by a swap pte soon thereafter.
> Patch 11 adds another case where we may return without replacing the current
> pte with a swap pte. For now, could you resolve this by moving the above code
> block in patch 11 to before the call to arch_unmap_one(). That still leaves
> open the issue of having the flexibility of undoing what arch_unmap_one() does
> for some other reason in future. That will require coming up with a properly
> architected way to do it.

I really want clearing PG_anon_exclusive to be the last action, without
eventually having to set it again and overcomplicating
PG_anon_exclusive/rmap handling.

Ideally, we'd have an "arch_remap_one()" that just reverts what
arch_unmap_one() did.

>
>>
>> arch_unmap_one() calls adi_save_tags(), which allocates memory.
>> adi_restore_tags()->del_tag_store() reverts that operation and ends up
>> freeing memory conditionally; however, it's only
>> called from arch_do_swap_page().
>>
>>
>> Here is why I have to scratch my head:
>>
>> a) arch_do_swap_page() is only called from do_swap_page(). We don't do anything similar
>> for mm/swapfile.c:unuse_pte(), aren't we missing something?
>
> My understanding of this code path may be flawed, so do correct me if this
> does not sound right. unuse_pte() is called upon the user turning off swap on
> a device.
> unuse_pte() is called by unuse_pte_range(), which swaps the page back in from
> the swap device before calling unuse_pte(). Once the page is read back in from
> swap, an access to the va for the page will ultimately result in a call to
> __handle_mm_fault(), which in turn will call handle_pte_fault() to insert a
> new pte for this mapping, and handle_pte_fault() will call
> arch_do_swap_page(), which will restore the tags.

unuse_pte() will replace a swap pte directly by a proper, present pte,
just like do_swap_page() would. You won't end up in do_swap_page()
anymore and arch_do_swap_page() won't be called, because there is no
swap PTE anymore.

>
>>
>> b) try_to_migrate_one() does the arch_unmap_one(), but who will do the
>> restore+free after migration succeeded or failed, aren't we missing something?
>
> try_to_migrate_one() replaces the current pte with a migration pte after
> calling arch_unmap_one(). This causes __handle_mm_fault() to be called when a
> reference to the va covered by the migration pte is made. This will in turn
> finally result in a call to arch_do_swap_page(), which restores the tags.

Migration PTEs are restored via mm/migrate.c:remove_migration_ptes().
arch_do_swap_page() won't be called.

What you mention is the case where someone accesses the migration PTE while
migration is active and the migration PTEs have not been removed yet. While
we'll end up in do_swap_page(), we'll do a migration_entry_wait(), followed
by an effective immediate "return 0;". arch_do_swap_page() won't get called.

So in essence, I think this doesn't work as expected yet. In the best case
we don't immediately free memory. In the worst case we lose the tags.

-- 
Thanks,

David / dhildenb