Subject: Re: [PATCH RFC v2 6/9] mm/khugepaged: remove reuse_swap_page() usage
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Date: Fri, 28 Jan 2022 09:41:16 +0100
To: Yang Shi
Cc: Linux Kernel Mailing List, Andrew Morton, Hugh Dickins, Linus Torvalds,
    David Rientjes, Shakeel Butt, John Hubbard, Jason Gunthorpe,
    Mike Kravetz, Mike Rapoport, Kirill A. Shutemov, Matthew Wilcox,
    Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
    Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
    Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang, Linux MM
Message-ID: <205231d0-2b4e-7d93-1028-2d501c1cbf74@redhat.com>
References: <20220126095557.32392-1-david@redhat.com> <20220126095557.32392-7-david@redhat.com>

On 27.01.22 22:23, Yang Shi wrote:
> On Wed, Jan 26, 2022 at 2:00 AM David Hildenbrand wrote:
>>
>> reuse_swap_page() currently indicates if we can write to an anon page
>> without COW. A COW is required if the page is shared by multiple
>> processes (either already mapped or via swap entries) or if there is
>> concurrent writeback that cannot tolerate concurrent page modifications.
>>
>> reuse_swap_page() doesn't check for pending references from other
>> processes that already unmapped the page; however, is_refcount_suitable()
>> essentially does the same thing in the context of khugepaged. khugepaged
>> is the last remaining user of reuse_swap_page(), and we want to remove
>> that function.
>>
>> In the context of khugepaged, we are not actually going to write to the
>> page and we don't really care about other processes mapping the page:
>> for example, without swap, we don't care about shared pages at all.
>>
>> The current logic seems to be:
>> * Writable -> Not shared, but might be in the swapcache. Nobody can
>>   fault it in from the swapcache as there are no other swap entries.
>> * Readable and not in the swapcache: Might be shared (but nobody can
>>   fault it in from the swapcache).
>> * Readable and in the swapcache: Might be shared and someone might be
>>   able to fault it in from the swapcache. Make sure we're the exclusive
>>   owner via reuse_swap_page().
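
(Side note, to make that last case concrete: the check in question sits in
__collapse_huge_page_isolate(), and is_refcount_suitable() is the
refcount-vs-mapcount comparison khugepaged already performs for every page
it tries to collapse. Roughly, as a simplified sketch that leaves out
locking, error paths and compound-page details, with exact signatures
possibly differing per tree:

	/*
	 * In __collapse_huge_page_isolate(): refuse to collapse a read-only
	 * page that sits in the swapcache unless reuse_swap_page() says we
	 * are the exclusive owner.
	 */
	if (!pte_write(pteval) && PageSwapCache(page) &&
	    !reuse_swap_page(page)) {
		result = SCAN_SWAP_CACHE_PAGE;
		goto out;
	}

	/*
	 * is_refcount_suitable(): any reference beyond the page table
	 * mappings, plus the swapcache reference if there is one, makes
	 * the page unsuitable for collapse.
	 */
	static bool is_refcount_suitable(struct page *page)
	{
		int expected_refcount = total_mapcount(page);

		if (PageSwapCache(page))
			expected_refcount += compound_nr(page);

		return page_count(page) == expected_refcount;
	}

The second hunk is the "essentially does the same thing" part mentioned
above.)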
>>
>> Having to guess due to lack of comments and documentation, the current
>> logic really only wants to make sure that a page that might be shared
>> cannot be faulted in from the swapcache while khugepaged is active.
>> It's hard to guess why that is the case and if it's really still
>> required, but let's try keeping that logic unmodified.
>
> I don't think it could be faulted in while khugepaged is active, since
> khugepaged does hold the mmap_lock in write mode IIUC. So page faults
> are serialized against khugepaged.

It could get faulted in by another process sharing the page, because we
only synchronize against the current process.

>
> My wild guess is that collapsing shared pages was not supported before
> v5.8, so we needed reuse_swap_page() to tell us whether the page in the
> swap cache is shared or not. But that is not true anymore. And khugepaged
> just allocates a THP, copies the data from the base pages to the huge
> page, and then replaces the PTEs with a PMD; it doesn't change the
> content of the page, so I fail to see a problem with collapsing a shared
> page in the swap cache. But I'm really not entirely sure, I may miss
> something...

Looking more closely at where this logic originates from, it was
introduced in:

commit 10359213d05acf804558bda7cc9b8422a828d1cd
Author: Ebru Akagunduz
Date:   Wed Feb 11 15:28:28 2015 -0800

    mm: incorporate read-only pages into transparent huge pages

    This patch aims to improve THP collapse rates, by allowing THP
    collapse in the presence of read-only ptes, like those left in place
    by do_swap_page after a read fault.

    Currently THP can collapse 4kB pages into a THP when there are up to
    khugepaged_max_ptes_none pte_none ptes in a 2MB range. This patch
    applies the same limit for read-only ptes.

The change essentially results in a read-only mapped PTE page getting
copied and mapped writable via a new PMD-mapped THP.

It mentions do_swap_page(), so I assume it just tried to do what
do_swap_page() would do when mapping a page swapped in from the swapcache
writable immediately.

But we differ from do_swap_page() in that we're not actually going to map
the original page writable: we're going to copy the page
(__collapse_huge_page_copy()) and map the copy writable.

I assume we can remove that logic.

-- 
Thanks,

David / dhildenb