Date: Wed, 25 Jan 2023 09:24:27 +0100
From: Michal Hocko
To: Mike Kravetz
Cc: linux-mm@kvack.org, Naoya Horiguchi, David Rientjes, Matthew Wilcox,
	David Hildenbrand, Peter Xu, James Houghton, Muchun Song
Subject: Re: A mapcount riddle

On Tue 24-01-23 12:56:24, Mike
Kravetz wrote:
> Q How can a page be mapped into multiple processes and have a
>   mapcount of 1?
>
> A It is a hugetlb page referenced by a shared PMD.
>
> I was looking to expose some basic information about PMD sharing via
> /proc/smaps.  After adding the code, I started a couple processes
> sharing a large hugetlb mapping that would result in the use of
> shared PMDs.  When I looked at the output of /proc/smaps, I saw
> my new metric counting the number of shared PMDs.  However, what
> stood out was that the entire mapping was listed as Private_Hugetlb.
> WTH???  It certainly was shared!

It's been quite some time since I had to look into this area, but
PMD-shared hugetlb pages have always been quite weird AFAIR.

> The routine smaps_hugetlb_range
> decides between Private_Hugetlb and Shared_Hugetlb with this code:
>
> 	if (page) {
> 		int mapcount = page_mapcount(page);
>
> 		if (mapcount >= 2)
> 			mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
> 		else
> 			mss->private_hugetlb += huge_page_size(hstate_vma(vma));
> 	}
>
> After spending some time looking for issues in the page_mapcount code,
> I came to the realization that the mapcount of hugetlb pages only
> referenced by a shared PMD would be 1 no matter how many processes had
> mapped the page.  When a page is first faulted, the mapcount is set to 1.
> When faulted in other processes, the shared PMD is added to the page
> table of the other processes.  No increase of mapcount will occur.

Yes, this is really subtle, but looking at it from the hugetlb POV, it
is the page table that is shared rather than the underlying page. Is
this distinction useful or reasonable to userspace? Not really, but PMD
sharing is quite hard to stumble over by accident, and I suspect most
users of this feature have just got used to those peculiarities.

> At first thought this seems bad.  However, I believe this has been the
> behavior since hugetlb PMD sharing was introduced in 2006 and I am
> unaware of any reported issues.
> I did a audit of code looking at
> mapcount.  In addition to the above issue with smaps, there appears
> to be an issue with 'migrate_pages' where shared pages could be migrated
> without appropriate privilege.
>
> 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> 	if (flags & (MPOL_MF_MOVE_ALL) ||
> 	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> 		if (isolate_hugetlb(page, qp->pagelist) &&
> 			(flags & MPOL_MF_STRICT))
> 			/*
> 			 * Failed to isolate page but allow migrating pages
> 			 * which have been queued.
> 			 */
> 			ret = 1;
> 	}

Could you elaborate on what is problematic about that? PMD sharing as a
whole is a cooperative thing. So if one of the processes decides to
migrate the page, why should that be a problem for the others sharing
that page via the page table? Am I missing something obvious?

> I will prepare fixes for both of these.  However, I wanted to ask if
> anyone has ideas about other potential issues with this?
>
> Since COW is mostly relevant to private mappings, shared PMDs generally
> do not apply.  Nothing stood out in a quick audit of code.

I am pretty sure there are other corner cases lurking in this area
which are really hard to spot until you stumble over them. The shared
mapping reporting is probably good to have fixed, but I am not sure why
the migration is a real problem.

Thanks!
-- 
Michal Hocko
SUSE Labs
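
The mapcount behaviour described in this thread can be sketched with a
small userspace model. This is only a toy under stated assumptions, not
kernel code: every toy_* type and function is hypothetical, the "sharers"
counter merely stands in for however many processes point at the same
PMD page, and the second classifier is one possible way to account for
sharing, not the fix that was actually proposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types are the kernel's. */
struct toy_page {
	int mapcount;		/* page table entries mapping this page */
};

struct toy_pmd_table {
	int sharers;		/* processes attached to this PMD page */
	struct toy_page *page;	/* the one huge page this toy table maps */
};

/*
 * First fault: a single entry is installed in the PMD page,
 * so the mapcount of the huge page becomes 1.
 */
void toy_first_fault(struct toy_pmd_table *pmd, struct toy_page *page)
{
	pmd->page = page;
	pmd->sharers = 1;
	page->mapcount = 1;
}

/*
 * Another process attaches to the already populated PMD page.
 * No new entry is installed, so mapcount is left untouched.
 */
void toy_share_pmd(struct toy_pmd_table *pmd)
{
	assert(pmd->page != NULL);
	pmd->sharers++;
}

/* Same test as the quoted smaps code: shared only if mapcount >= 2. */
bool toy_shared_by_mapcount(const struct toy_page *page)
{
	return page->mapcount >= 2;
}

/* Hypothetical alternative: also treat a shared PMD page as sharing. */
bool toy_shared_by_mapcount_or_pmd(const struct toy_pmd_table *pmd)
{
	return pmd->page->mapcount >= 2 || pmd->sharers >= 2;
}
```

Calling toy_first_fault() once and toy_share_pmd() twice leaves mapcount
at 1 with three sharers, so the mapcount-only check classifies the page
as private while the combined check classifies it as shared.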