From: Barry Song <21cnbao@gmail.com>
Date: Mon, 30 Dec 2024 10:12:57 +1300
Subject: All MADV_FREE mTHPs are fully subjected to deferred_split_folio()
To: Linux-MM, Lance Yang, Ryan Roberts, David Hildenbrand, Baolin Wang, Andrew Morton

Hi Lance,

Along with Ryan, David, Baolin, and anyone else who might be interested,

We’ve noticed an unexpectedly high number of deferred splits. The root
cause appears to be the changes introduced in commit dce7d10be4bbd3
("mm/madvise: optimize lazyfreeing with mTHP in madvise_free"). Since
that commit, split_folio is no longer called in mm/madvise.c.
However, we are still performing deferred_split_folio for all
MADV_FREE mTHPs, even for those that are fully aligned with mTHP.
This happens because we execute a goto discard in
try_to_unmap_one(), which eventually leads to
folio_remove_rmap_pte() adding all folios to deferred_split when we
scan the 1st pte in try_to_unmap_one().

discard:
                if (unlikely(folio_test_hugetlb(folio)))
                        hugetlb_remove_rmap(folio);
                else
                        folio_remove_rmap_pte(folio, subpage, vma);

This could lead to a race condition with the shrinker,
deferred_split_scan(). The shrinker might call folio_try_get(folio),
and while we are scanning the second PTE of this folio in
try_to_unmap_one(), the entire mTHP could be transitioned back to
swap-backed because the reference count is incremented.

                                /*
                                 * The only page refs must be one from isolation
                                 * plus the rmap(s) (dropped by discard:).
                                 */
                                if (ref_count == 1 + map_count &&
                                    (!folio_test_dirty(folio) ||
                                     ...
                                     (vma->vm_flags & VM_DROPPABLE))) {
                                        dec_mm_counter(mm, MM_ANONPAGES);
                                        goto discard;
                                }

It also significantly increases contention on
ds_queue->split_queue_lock during memory reclamation and could
potentially introduce other race conditions with the shrinker as well.

I’m curious if anyone has suggestions for resolving this issue. My
idea is to use folio_remove_rmap_ptes to drop all PTEs at once, rather
than folio_remove_rmap_pte, which processes PTEs one by one for an
mTHP. This approach would require some changes, such as checking the
dirty state of PTEs and performing a TLB flush for the entire mTHP as a
whole in try_to_unmap_one().

Please let me know if you have any objections or alternative
suggestions.

Thanks
Barry
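
To illustrate the accounting behind the idea above, here is a minimal
userspace sketch in C. It is a toy model, not kernel code: struct
toy_folio and every toy_* function are hypothetical names made up for
this example. It assumes that a folio is queued for deferred split
whenever a removal leaves it partially mapped, that each mapping holds
one reference, and that nr_pages is at least 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a large folio (mTHP); not the kernel's struct folio. */
struct toy_folio {
	int nr_pages;   /* number of base pages in the folio */
	int mapped;     /* pages still mapped by a PTE */
	int ref_count;  /* references held on the folio */
	bool dirty;     /* folio was redirtied after MADV_FREE */
	bool queued;    /* already on the toy deferred-split queue */
};

/* Counts insertions into the toy deferred-split queue. */
static int toy_queue_insertions;

static struct toy_folio toy_folio_init(int nr_pages)
{
	/* Assumption: one reference per mapping plus one for isolation. */
	struct toy_folio f = {
		.nr_pages = nr_pages,
		.mapped = nr_pages,
		.ref_count = nr_pages + 1,
		.dirty = false,
		.queued = false,
	};
	return f;
}

/* Drop nr mappings at once; queue the folio if it is left partially mapped. */
static void toy_remove_rmap_ptes(struct toy_folio *f, int nr)
{
	assert(nr > 0 && nr <= f->mapped);
	f->mapped -= nr;
	f->ref_count -= nr; /* simplification: the ref goes with the mapping */
	if (f->mapped > 0 && !f->queued) {
		f->queued = true;
		toy_queue_insertions++;
	}
}

/* Per-PTE removal: one call for each PTE of the folio. */
static bool toy_unmap_per_pte(struct toy_folio *f)
{
	while (f->mapped > 0)
		toy_remove_rmap_ptes(f, 1);
	return f->queued;
}

/* Batched removal: a single call covering every PTE of the folio. */
static bool toy_unmap_batched(struct toy_folio *f)
{
	toy_remove_rmap_ptes(f, f->mapped);
	return f->queued;
}

/* The quoted lazyfree check, reduced to its refcount and dirty parts. */
static bool toy_can_discard(const struct toy_folio *f)
{
	return f->ref_count == 1 + f->mapped && !f->dirty;
}
```

In this model a 16-page folio is queued on the very first call made by
toy_unmap_per_pte(), while toy_unmap_batched() never queues it. One
extra reference, such as a shrinker might take, is enough to make
toy_can_discard() return false.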