From: Lance Yang
Date: Mon, 26 Feb 2024 21:47:00 +0800
Subject: Re: [PATCH 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
To: David Hildenbrand
Cc: Ryan Roberts, fengwei.yin@intel.com, akpm@linux-foundation.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com,
 minchan@kernel.org, peterx@redhat.com, shy828301@gmail.com,
 songmuchun@bytedance.com, wangkefeng.wang@huawei.com, zokeefe@google.com,
 21cnbao@gmail.com
In-Reply-To: <517e4c23-11f8-4ded-a502-354c482c4e51@redhat.com>

On Mon, Feb 26, 2024 at 9:04 PM David Hildenbrand wrote:
>
> On 26.02.24 13:57, Ryan Roberts wrote:
> > On 26/02/2024 08:35, Lance Yang wrote:
> >> Hey Fengwei,
> >>
> >> Thanks for taking time to review!
> >>
> >>> On Mon, Feb 26, 2024 at 10:38 AM Yin Fengwei wrote:
> >>>> On Sun, Feb 25, 2024 at 8:32 PM Lance Yang wrote:
> >> [...]
> >>>> --- a/mm/madvise.c
> >>>> +++ b/mm/madvise.c
> >>>> @@ -676,11 +676,43 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >>>>                   */
> >>>>                  if (folio_test_large(folio)) {
> >>>>                          int err;
> >>>> +                        unsigned long next_addr, align;
> >>>>
> >>>> -                        if (folio_estimated_sharers(folio) != 1)
> >>>> -                                break;
> >>>> -                        if (!folio_trylock(folio))
> >>>> -                                break;
> >>>> +                        if (folio_estimated_sharers(folio) != 1 ||
> >>>> +                            !folio_trylock(folio))
> >>>> +                                goto skip_large_folio;
> >>>> +
> >>>> +                        align = folio_nr_pages(folio) * PAGE_SIZE;
> >>>> +                        next_addr = ALIGN_DOWN(addr + align, align);
> >>> There is a possible corner case:
> >>> If there is a cow folio associated with this folio and the cow folio
> >>> has smaller size than this folio for whatever reason, this change can't
> >>> handle it correctly.
> >>
> >> Thanks for pointing that out; it's very helpful to me!
> >> I made some changes. Could you please check if this corner case is now resolved?
> >>
> >> As a diff against this patch.
> >>
> >> diff --git a/mm/madvise.c b/mm/madvise.c
> >> index bcbf56595a2e..c7aacc9f9536 100644
> >> --- a/mm/madvise.c
> >> +++ b/mm/madvise.c
> >> @@ -686,10 +686,12 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >>                          next_addr = ALIGN_DOWN(addr + align, align);
> >>
> >>                          /*
> >> -                         * If we mark only the subpages as lazyfree,
> >> -                         * split the large folio.
> >> +                         * If we mark only the subpages as lazyfree, or
> >> +                         * if there is a cow folio associated with this folio,
> >> +                         * then split the large folio.
> >>                           */
> >> -                        if (next_addr > end || next_addr - addr != align)
> >> +                        if (next_addr > end || next_addr - addr != align ||
> >> +                            folio_total_mapcount(folio) != folio_nr_pages(folio))
> >
> > I still don't think this is correct.
> > I think you were previously assuming that
> > if you see a page from a large folio then the whole large folio should be
> > contiguously mapped? This new check doesn't validate that assumption reliably;
> > you need to iterate through every pte to generate a batch, like David does in
> > folio_pte_batch() for this to be safe.
> >
> > An example of when this check is insufficient; let's say you have a 4 page anon
> > folio mapped contiguously in a process (total_mapcount=4). The process is forked
> > (total_mapcount=8). Then each process munmaps the second 2 pages
> > (total_mapcount=4). In place of the munmapped 2 pages, 2 new pages are mapped.
> > Then call madvise. It's probably even easier to trigger for file-backed memory
> > (I think this code path is used for both file and anon?)
>
> What would work here is using folio_pte_batch() to get how many PTEs are
> mapped *here*, then comparing the batch size to folio_nr_pages(). If
> both match, we are mapping all subpages.

Thanks! I'll use folio_pte_batch() here in v2.

Best,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>
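
For reference, here is a rough userspace model of Ryan's example and of the
batch comparison David suggests. It is only a sketch: struct toy_folio,
struct toy_pte, toy_mapcount(), toy_pte_batch() and fork_munmap_example() are
hypothetical names made up for illustration, the PFN values are arbitrary, and
none of this uses the real folio_pte_batch() signature or any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; these are NOT real kernel types. */
struct toy_folio {
	unsigned long first_pfn;	/* PFN of the folio's first page */
	unsigned int nr_pages;		/* number of pages in the folio */
};

struct toy_pte {
	bool present;
	unsigned long pfn;
};

/* Does this PTE map a page that belongs to the folio? */
static bool pte_maps_folio(const struct toy_pte *pte,
			   const struct toy_folio *folio)
{
	return pte->present &&
	       pte->pfn >= folio->first_pfn &&
	       pte->pfn < folio->first_pfn + folio->nr_pages;
}

/* One page table's contribution to the folio's total mapcount. */
unsigned int toy_mapcount(const struct toy_pte *ptes, size_t nr_ptes,
			  const struct toy_folio *folio)
{
	unsigned int count = 0;

	for (size_t i = 0; i < nr_ptes; i++)
		if (pte_maps_folio(&ptes[i], folio))
			count++;
	return count;
}

/*
 * Count how many consecutive PTEs, starting at ptes[0], map consecutive
 * pages of the folio. This only models the idea of a PTE batch.
 */
unsigned int toy_pte_batch(const struct toy_pte *ptes, size_t nr_ptes,
			   const struct toy_folio *folio)
{
	unsigned int nr = 0;

	while (nr < nr_ptes && pte_maps_folio(&ptes[nr], folio) &&
	       ptes[nr].pfn == ptes[0].pfn + nr)
		nr++;
	return nr;
}

/* Fork, munmap the second 2 pages in each process, map 2 new pages. */
unsigned int fork_munmap_example(void)
{
	const struct toy_folio folio = { .first_pfn = 100, .nr_pages = 4 };
	const struct toy_pte parent[4] = {
		{ true, 100 }, { true, 101 }, { true, 900 }, { true, 901 },
	};
	const struct toy_pte child[4] = {
		{ true, 100 }, { true, 101 }, { true, 902 }, { true, 903 },
	};
	unsigned int total = toy_mapcount(parent, 4, &folio) +
			     toy_mapcount(child, 4, &folio);

	/* total mapcount equals nr_pages, so a mapcount check is fooled */
	assert(total == folio.nr_pages);

	/* only the first 2 PTEs of this process still map the folio */
	return toy_pte_batch(parent, 4, &folio);
}
```

In this model the mapcount comparison passes even though neither process maps
the whole folio, while the batch size for the parent is smaller than
folio.nr_pages, so comparing the batch size to the folio size catches the
partial mapping.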