From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4F83C433FE for ; Thu, 31 Mar 2022 13:46:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E769F6B0072; Thu, 31 Mar 2022 09:46:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E26796B0073; Thu, 31 Mar 2022 09:46:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C9EDB6B0074; Thu, 31 Mar 2022 09:46:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id B888E6B0072 for ; Thu, 31 Mar 2022 09:46:26 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 61D341F79 for ; Thu, 31 Mar 2022 13:46:26 +0000 (UTC) X-FDA: 79304805972.05.0BB5029 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf30.hostedemail.com (Postfix) with ESMTP id 85FCB80016 for ; Thu, 31 Mar 2022 13:46:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1648734385; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=C4Vh0RqP+GZ21MRJn3PqH0IuI8ZeNebvuM8i4oIwPFs=; b=CjDi1BuiazqsQ0X1FSMYnV7PHQnVS0LTyI4IQxer8fzKyn1t26+b63x4T/auYWZ8FX8RMX lY9riv81JcaBo9NThs0ci5O5GLsRFftkimAOFZYHom6Q7A0ylRKnFAloHt8mzFYVtJVBZC 3/Kbo/CIUdTzoT2xRtindMa7xlrY9Ps= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-46-vzV4RhwwPsCmE7ZgK43SgQ-1; Thu, 31 Mar 2022 09:46:23 -0400 X-MC-Unique: vzV4RhwwPsCmE7ZgK43SgQ-1 Received: by mail-wm1-f70.google.com with SMTP id z16-20020a05600c0a1000b0038bebbd8548so1439607wmp.3 for ; Thu, 31 Mar 2022 06:46:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:cc:references:from:organization:subject :in-reply-to:content-transfer-encoding; bh=C4Vh0RqP+GZ21MRJn3PqH0IuI8ZeNebvuM8i4oIwPFs=; b=JMx8yRTNTs+/IepAf5F1v5z858KspZ3lbLk07wq9EWvvqHuo8NA4w+PV4moLUjCq1E QNQfCJw6LYWGaht2xkxHzyUZ5hVrQbwuRDvHMRX2ZNTwZaUmqQS4OXFbBw0pzTA/WcYB NND6RkSDE8XM5pQAhoDxgcsUVgx8yWXBGRy30Hvmt/Au74hNmodTc3KmGx/8ocsfnqn3 r1v+im4w13FQUujr/DCYDsDMGRdM364bZFYg11f4VKYL2eV+2G4XWEMfaVR8PqpfwoQf 7NG+tkP+muwAidP3tsPDZ39aZ5xAzOQ8AZNpO2VdW5aD1+cw5izA6KO0Rb7qd6QN1zmi 1ODg== X-Gm-Message-State: AOAM530o9MFk8qapTT6wTf0wdkA51K8rbmpwcWV2GgCpRUQNn+CcfT4q hISdjollID3GJNCCYfW9Rgg4s6b0VXN7YF6/aE4zmS6MssrrE7CDlctlCsPgOn6G53692YFlNYO HPTr9AHUoa2A= X-Received: by 2002:a5d:6651:0:b0:203:fc53:cf22 with SMTP id f17-20020a5d6651000000b00203fc53cf22mr4110469wrw.365.1648734382349; Thu, 31 Mar 2022 06:46:22 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx4oUI2/ThFdvSfSKYkBlLO/JOXVqYiGz/W2ANcPKnyfSgHWNfiqvsbxo79KJKL2VH7gS+41w== X-Received: by 2002:a5d:6651:0:b0:203:fc53:cf22 with SMTP id f17-20020a5d6651000000b00203fc53cf22mr4110428wrw.365.1648734382057; Thu, 31 Mar 2022 06:46:22 -0700 (PDT) Received: from ?IPV6:2003:cb:c707:ac00:381c:2e8b:3b48:488e? (p200300cbc707ac00381c2e8b3b48488e.dip0.t-ipconnect.de. [2003:cb:c707:ac00:381c:2e8b:3b48:488e]) by smtp.gmail.com with ESMTPSA id y5-20020a1c4b05000000b0038cbf571334sm7063506wma.18.2022.03.31.06.46.19 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 31 Mar 2022 06:46:21 -0700 (PDT) Message-ID: <2eaa2667-219c-1bac-8f50-0020fa42e54e@redhat.com> Date: Thu, 31 Mar 2022 15:46:19 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.6.2 To: Khalid Aziz , linux-kernel@vger.kernel.org Cc: Andrew Morton , Hugh Dickins , Linus Torvalds , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Liang Zhang , Pedro Gomes , Oded Gabbay , linux-mm@kvack.org References: <20220315104741.63071-1-david@redhat.com> <20220315104741.63071-2-david@redhat.com> <909cc1b6-6f4f-4c45-f418-31d5dd5acaa3@oracle.com> <689c199f-9954-524b-7a2d-40f186e87b34@oracle.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v2 01/15] mm/rmap: fix missing swap_free() in try_to_unmap() after arch_unmap_one() failed In-Reply-To: <689c199f-9954-524b-7a2d-40f186e87b34@oracle.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Stat-Signature: izdrys4kb97t31m8xwe1rq8d9q1mokss Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=CjDi1Bui; spf=none (imf30.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 85FCB80016 X-HE-Tag: 1648734385-985028 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > Hi David, Hi, > > arch_remap_one() sounds reasonable. Would you like to add that in your patch series, or would you like me to create a > separate patch for adding this on top of your patch series? Let's handle it separately, because I think there is still a lot to be clarified. I also feel like the terminology ("arch_unmap_one()") is a little too generic, if we really only care about anonymous pages (do we?). > >> >>> >>>> >>>> arch_unmap_one() calls adi_save_tags(), which allocates memory. >>>> adi_restore_tags()->del_tag_store() reverts that operation and ends up >>>> freeing memory conditionally; However, it's only >>>> called from arch_do_swap_page(). >>>> >>>> >>>> Here is why I have to scratch my head: >>>> >>>> a) arch_do_swap_page() is only called from do_swap_page(). We don't do anything similar >>>> for mm/swapfile.c:unuse_pte(), aren't we missing something? >>> >>> My understanding of this code path maybe flawed, so do correct me if this does not sound right. unused_pte() is called >>> upon user turning off swap on a device. unused_pte() is called by unused_pte_range() which swaps the page back in from >>> swap device before calling unuse_pte(). Once the page is read back in from swap, ultimately access to the va for the >>> page will result in call to __handle_mm_fault() which in turn will call handle_pte_fault() to insert a new pte for this >>> mapping and handle_pte_fault() will call arch_do_swap_page() which will restore the tags. >> >> unuse_pte() will replace a swap pte directly by a proper, present pte, >> just like do_swap_page() would. You won't end up in do_swap_page() >> anymore and arch_do_swap_page() won't be called, because there is no >> swap PTE anymore. >> >>> >>>> >>>> b) try_to_migrate_one() does the arch_unmap_one(), but who will do the >>>> restore+free after migration succeeded or failed, aren't we missing something? >>> >>> try_to_migrate_one() replaces the current pte with a migration pte after calling arch_unmap_one(). This causes >>> __handle_mm_fault() to be called when a reference to the va covered by migration pte is made. This will in turn finally >>> result in a call to arch_do_swap_page() which restores the tags. >> >> Migration PTEs are restore via mm/migrate.c:remove_migration_ptes(). >> arch_do_swap_page() won't be called. >> >> What you mention is if someone accesses the migration PTE while >> migration is active and the migration PTEs have not been removed yet. >> While we'll end up in do_swap_page(), we'll do a migration_entry_wait(), >> followed by an effective immediate "return 0;". arch_do_swap_page() >> won't get called. >> >> >> So in essence, I think this doesn't work as expected yet. In the best >> case we don't immediately free memory. In the worst case we lose the tags. >> > > I see what you mean. I can work on fixing these issues up. Ideally, I think we'd handle migration differently: migrate the tags directly instead of temporarily storing them I also wonder, how to handle migration of THP (which uses migration PMDs) and THP splitting (which also uses migration PTEs); maybe they don't apply on sparc? Further, I wonder if you have to handle zapping of swap/migration PTEs, and if you'd also have to cleanup+freeup any allcoated tag memory? E.g., zap_pte_range() can simply zap swap and migration entries, wouldn't Last but not least, I do wonder if mremap() -- e.g., move_pte() -- and fork() -- e.g., copy_pte_range() -- are handled properly, where we can end up moving/copying swap+migration PTEs around. -- Thanks, David / dhildenb