Message-ID: <237fd94f-0062-124c-6317-76fc4207ccd7@redhat.com>
Date: Fri, 14 Jul 2023 16:12:38 +0200
From: David Hildenbrand
Organization: Red Hat
To: "Yin, Fengwei", Yu Zhao, Zi Yan
Cc: Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, ryan.roberts@arm.com, shy828301@gmail.com, "Vishal Moola (Oracle)"
Subject: Re: [RFC PATCH] madvise: make madvise_cold_or_pageout_pte_range() support large folio
In-Reply-To: <9bcc8014-f5ef-9021-3ffc-032e39c32249@intel.com>
References: <20230713150558.200545-1-fengwei.yin@intel.com> <8547495c-9051-faab-a47d-1962f2e0b1da@intel.com> <2cbf457e-389e-cd45-1262-879513a4cf41@intel.com> <36cfe140-5685-bea7-d267-4a61f21aeb79@redhat.com> <9bcc8014-f5ef-9021-3ffc-032e39c32249@intel.com>

On 14.07.23 15:58, Yin, Fengwei wrote:
>
>
> On 7/14/2023 5:25 PM, David Hildenbrand wrote:
>>
>> (1) We're unmapping a single subpage, the compound_mapcount == 0
>>     and the total_mapcount > 0. If the subpage mapcount is now 0, add it
>>     to the deferred split queue.
>>
>> (2) We're unmapping a complete folio (PMD mapping / compound), the
>>     compound_mapcount is 0 and the total_mapcount > 0.
>>
>>  (a) If total mapcount < folio_nr_pages, add it
>>      to the deferred split queue.
>>
>>  (b) If total mapcount >= folio_nr_pages, we have to scan all subpage
>>      mapcounts. If any subpage mapcount == 0, add it to the deferred
>>      split queue.
>>
>>
>> (b) is a bit nasty. It would happen when we fork() with a PMD-mapped THP, the parent splits the THP due to COW, and then our child unmaps or splits the PMD-mapped THP (unmap easily happening during exec()). Fortunately, we'd only scan once when unmapping the PMD.
>>
>>
>> Getting rid of the subpage mapcount usage in (1) would mean that we have to do exactly what we do in (2). But then we'd need to handle (2) (b) differently as well.
>>
>> So, for (2) (b) we would either need some other heuristic, or we add it to the deferred split queue more frequently and let that one detect using an rmap walk "well, every subpage is still mapped, let's abort the split".
>
> Or another option for (2) (b): don't add it to the deferred split queue. We
> know the folio in the deferred list is mainly scanned when the system needs to
> reclaim memory.
>
> Maybe it's better to let page reclaim choose the large folio here because
> page reclaim will call folio_referenced(), which does rmap_walk()->pvmw().
> And we can reuse rmap_walk() in folio_referenced() to detect whether there
> are pages of the folio that are not mapped. If that's the case, we can split it then.
>
> Compared to the deferred list, the advantage is that folio_referenced() doesn't
> unmap the page. While in the deferred list, we need to add an extra rmap_walk() to
> check whether there is a page that is not mapped.

Right, I also came to the conclusion that the unmapping is undesirable.

However, one benefit of the unmap is that we know when to stop scanning (due to page_mapped()). But maybe the temporary unmapping is actually counter-productive.

>
> Just a thought.
> I could miss something here. Thanks.

Interesting idea. I also had the thought that adding folios to the deferred split queue when removing the rmap is semantically questionable.

Yes, we remove the rmap when zapping/unmapping a pte/pmd. But we also (possibly only temporarily!) unmap when splitting a THP or when installing migration entries.

Maybe, when zapping PTEs, we can flag folios as reasonable candidates, to tell the page reclaim code "this one was partially zapped, maybe there is some memory to reclaim there, now". Maybe that involves the deferred split queue.

-- 
Cheers,

David / dhildenb
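
For illustration only, the heuristic quoted above, cases (1), (2) (a) and (2) (b), could be modelled in user space roughly like this. It is a sketch under stated assumptions, not kernel code: struct folio_model and both function names are hypothetical, the mapcounts are plain integers supplied by the caller rather than the kernel's atomic counters, and "add it to the deferred split queue" is reduced to a boolean return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a large folio; NOT a kernel structure. */
struct folio_model {
	size_t nr_pages;             /* number of subpages in the folio */
	int compound_mapcount;       /* remaining PMD (whole-folio) mappings */
	int total_mapcount;          /* remaining mappings of any kind */
	const int *subpage_mapcount; /* per-subpage mapcounts, nr_pages entries */
};

/*
 * Case (1): a single subpage at index idx was just unmapped.
 * Queue the folio if that subpage is now completely unmapped.
 */
bool defer_split_after_pte_unmap(const struct folio_model *f, size_t idx)
{
	assert(idx < f->nr_pages);

	if (f->compound_mapcount != 0 || f->total_mapcount <= 0)
		return false;

	return f->subpage_mapcount[idx] == 0;
}

/*
 * Case (2): the complete folio (PMD mapping) was just unmapped.
 */
bool defer_split_after_pmd_unmap(const struct folio_model *f)
{
	size_t i;

	if (f->compound_mapcount != 0 || f->total_mapcount <= 0)
		return false;

	/* (a) Fewer mappings than subpages: some subpage must be unmapped. */
	if ((size_t)f->total_mapcount < f->nr_pages)
		return true;

	/* (b) Otherwise every subpage mapcount has to be scanned. */
	for (i = 0; i < f->nr_pages; i++) {
		if (f->subpage_mapcount[i] == 0)
			return true;
	}

	return false;
}
```

With four subpages and mapcounts {2, 2, 0, 0}, the total of 4 is not below nr_pages, so only the scan in (b) finds the unmapped subpages. That scan is the cost the thread is trying to avoid.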