From: Yu Zhao <yuzhao@google.com>
Date: Wed, 12 Jul 2023 00:31:54 -0600
Subject: Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio
To: Yin Fengwei
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
References: <20230712060144.3006358-1-fengwei.yin@intel.com> <20230712060144.3006358-4-fengwei.yin@intel.com>
In-Reply-To: <20230712060144.3006358-4-fengwei.yin@intel.com>
On Wed, Jul 12, 2023 at 12:02 AM Yin Fengwei wrote:
>
> The current kernel only mlocks base-size folios during the mlock
> syscall. Add large folio support with the following rules:
>
> - Only mlock a large folio when it is within a VM_LOCKED VMA range.
>
> - If there is a CoW folio, mlock the CoW folio too, as the CoW folio
>   is also in the VM_LOCKED VMA range.
>
> - munlock will apply to a large folio which is in the VMA range
>   or crosses the VMA boundary.
>
> The last rule is used to handle the case that the large folio is
> mlocked, later the VMA is split in the middle of the large folio
> and this large folio comes to cross the VMA boundary.
>
> Signed-off-by: Yin Fengwei
> ---
>  mm/mlock.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 99 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 0a0c996c5c214..f49e079066870 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -305,6 +305,95 @@ void munlock_folio(struct folio *folio)
>         local_unlock(&mlock_fbatch.lock);
>  }
>
> +static inline bool should_mlock_folio(struct folio *folio,
> +                               struct vm_area_struct *vma)
> +{
> +       if (vma->vm_flags & VM_LOCKED)
> +               return (!folio_test_large(folio) ||
> +                       folio_within_vma(folio, vma));
> +
> +       /*
> +        * For unlock, allow munlock large folio which is partially
> +        * mapped to VMA. As it's possible that large folio is
> +        * mlocked and VMA is split later.
> +        *
> +        * During memory pressure, such kind of large folio can
> +        * be split. And the pages are not in VM_LOCKed VMA
> +        * can be reclaimed.
> +        */
> +
> +       return true;

Looks good, or just

  should_mlock_folio() // or whatever name you see fit, can_mlock_folio()?
  {
          return !(vma->vm_flags & VM_LOCKED) || folio_within_vma();
  }

> +}
> +
> +static inline unsigned int get_folio_mlock_step(struct folio *folio,
> +               pte_t pte, unsigned long addr, unsigned long end)
> +{
> +       unsigned int nr;
> +
> +       nr = folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte);
> +       return min_t(unsigned int, nr, (end - addr) >> PAGE_SHIFT);
> +}
> +
> +void mlock_folio_range(struct folio *folio, struct vm_area_struct *vma,
> +               pte_t *pte, unsigned long addr, unsigned int nr)
> +{
> +       struct folio *cow_folio;
> +       unsigned int step = 1;
> +
> +       mlock_folio(folio);
> +       if (nr == 1)
> +               return;
> +
> +       for (; nr > 0; pte += step, addr += (step << PAGE_SHIFT), nr -= step) {
> +               pte_t ptent;
> +
> +               step = 1;
> +               ptent = ptep_get(pte);
> +
> +               if (!pte_present(ptent))
> +                       continue;
> +
> +               cow_folio = vm_normal_folio(vma, addr, ptent);
> +               if (!cow_folio || cow_folio == folio) {
> +                       continue;
> +               }
> +
> +               mlock_folio(cow_folio);
> +               step = get_folio_mlock_step(folio, ptent,
> +                               addr, addr + (nr << PAGE_SHIFT));
> +       }
> +}
> +
> +void munlock_folio_range(struct folio *folio, struct vm_area_struct *vma,
> +               pte_t *pte, unsigned long addr, unsigned int nr)
> +{
> +       struct folio *cow_folio;
> +       unsigned int step = 1;
> +
> +       munlock_folio(folio);
> +       if (nr == 1)
> +               return;
> +
> +       for (; nr > 0; pte += step, addr += (step << PAGE_SHIFT), nr -= step) {
> +               pte_t ptent;
> +
> +               step = 1;
> +               ptent = ptep_get(pte);
> +
> +               if (!pte_present(ptent))
> +                       continue;
> +
> +               cow_folio = vm_normal_folio(vma, addr, ptent);
> +               if (!cow_folio || cow_folio == folio) {
> +                       continue;
> +               }
> +
> +               munlock_folio(cow_folio);
> +               step = get_folio_mlock_step(folio, ptent,
> +                               addr, addr + (nr << PAGE_SHIFT));
> +       }
> +}

I'll finish the above later.
>  static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>                            unsigned long end, struct mm_walk *walk)
>
> @@ -314,6 +403,7 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>         pte_t *start_pte, *pte;
>         pte_t ptent;
>         struct folio *folio;
> +       unsigned int step = 1;
>
>         ptl = pmd_trans_huge_lock(pmd, vma);
>         if (ptl) {
> @@ -329,24 +419,28 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>                 goto out;
>         }
>
> -       start_pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> +       pte = start_pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>         if (!start_pte) {
>                 walk->action = ACTION_AGAIN;
>                 return 0;
>         }
> -       for (pte = start_pte; addr != end; pte++, addr += PAGE_SIZE) {
> +
> +       for (; addr != end; pte += step, addr += (step << PAGE_SHIFT)) {
> +               step = 1;
>                 ptent = ptep_get(pte);
>                 if (!pte_present(ptent))
>                         continue;
>                 folio = vm_normal_folio(vma, addr, ptent);
>                 if (!folio || folio_is_zone_device(folio))
>                         continue;
> -               if (folio_test_large(folio))
> +               if (!should_mlock_folio(folio, vma))
>                         continue;
> +
> +               step = get_folio_mlock_step(folio, ptent, addr, end);
>                 if (vma->vm_flags & VM_LOCKED)
> -                       mlock_folio(folio);
> +                       mlock_folio_range(folio, vma, pte, addr, step);
>                 else
> -                       munlock_folio(folio);
> +                       munlock_folio_range(folio, vma, pte, addr, step);
>         }
>         pte_unmap(start_pte);
> out:

Looks good.