Date: Thu, 16 Dec 2021 12:37:37 +0300
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Hugh Dickins, David Hildenbrand, Mike Kravetz
Subject: Re: folio mapcount
Message-ID: <20211216093737.7w2fv7p7j2rrx5r6@box.shutemov.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Wed, Dec 15, 2021 at 09:55:20PM +0000, Matthew Wilcox wrote:
> I've been trying to understand whether we can simplify the mapcount
> handling for folios from the current situation with THPs. Let me
> quote the commit message from 53f9263baba6:
>
> > mm: rework mapcount accounting to enable 4k mapping of THPs
> >
> > We're going to allow mapping of individual 4k pages of a THP
> > compound page. It means we need to track the mapcount on a per
> > small page basis.
> >
> > The straightforward approach is to use ->_mapcount in all subpages
> > to track how many times each subpage is mapped, with PMDs and PTEs
> > combined. But this is rather expensive: mapping or unmapping a THP
> > page with a PMD would require HPAGE_PMD_NR atomic operations
> > instead of the single one we have now.
> >
> > The idea is to store separately how many times the page was mapped
> > as a whole -- compound_mapcount. This frees up ->_mapcount in the
> > subpages to track the PTE mapcount.
> >
> > We use the same approach as with the compound page destructor and
> > compound order to store compound_mapcount: use space in the first
> > tail page, ->mapping this time.
> >
> > Any time we map/unmap the whole compound page (THP or hugetlb) we
> > increment/decrement compound_mapcount. When we map part of a
> > compound page with a PTE we operate on ->_mapcount of the subpage.
> >
> > page_mapcount() counts both PTE and PMD mappings of the page.
> >
> > Basically, we have the mapcount for a subpage spread over two
> > counters. It makes it tricky to detect when the last mapcount for
> > a page goes away.
> >
> > We introduced PageDoubleMap() for this. When we split a THP PMD
> > for the first time and there's another PMD mapping left, we offset
> > up ->_mapcount in all subpages by one and set PG_double_map on the
> > compound page. These additional references go away with the last
> > compound_mapcount.
> >
> > This approach provides a way to detect when the last mapcount goes
> > away on a per small page basis without introducing new overhead
> > for the most common cases.
>
> What breaks if we simply track any mapping (whether by PMD or PTE)
> as an increment to the head page's (aka folio's) mapcount?

The obvious answer is CoW: as discussed yesterday, we need the exact
mapcount to know if the page can be re-used or has to be copied.

Consider the case when you have a folio mapped with a PMD which is then
split into a PTE page table (like with mprotect()). You get a WP page
fault on a page that has mapcount == 512. How would you know whether
the 4k page can be re-used?

We also need to detect when the last mapping of a 4k page in the folio
has gone, in order to trigger the deferred_split_huge_page() logic.

> Essentially, we make the head mapcount 'the number of VMAs which
> contain a reference to any page in this folio'.

Okay, so you would have mapcount == 2 or 3 for the mprotect case above,
not 512. But it doesn't help with answering whether the page can be
re-used. You would need to do an rmap walk to get the answer.
Note also that the VMA lifecycle is different from the page lifecycle:
MADV_DONTNEED removes the mapping, but leaves the VMA intact. Who would
decrement the mapcount here?

> We can remove PageDoubleMap. The tail refcounts will all be 0. If
> it's useful, we could introduce a 'partial_mapcount' which would be
> <= mapcount (but I don't know if it's useful). Splitting a PMD would
> not change ->_mapcount. Splitting the folio already causes the folio
> to be unmapped, so page faults will naturally re-increment
> ->_mapcount of each subpage.
>
> We might need some additional logic to treat a large folio (aka
> compound page) as a single unit; that is, when we fault on one page,
> we place entries for all pages in this folio (that fit ...) into the
> page tables, so that we only account it once, even if it's not
> compatible with using a PMD.

I still don't see a way to simplify the mapcount for THP. But I'm
biased because I'm the author of the current scheme.

Please, prove me wrong. I want to be mistaken. :)

-- 
Kirill A. Shutemov