From: Michał Mirosław
Date: Mon, 24 Jul 2023 18:10:49 +0200
Subject: Re: [v2] fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
To: Muhammad Usama Anjum
Cc: Michał Mirosław, Andrei Vagin, Danylo Mocherniuk, Alex Sierra, Alexander Viro, Andrew Morton, Axel Rasmussen, Christian Brauner, Cyrill Gorcunov, Dan Williams, David Hildenbrand, Greg KH, "Gustavo A. R. Silva", "Liam R. Howlett", Matthew Wilcox, Mike Rapoport, Nadav Amit, Pasha Tatashin, Paul Gofman, Peter Xu, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, Yang Shi, Yun Zhou, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, kernel@collabora.com
In-Reply-To: <44eddc7d-fd68-1595-7e4f-e196abe37311@collabora.com>
References: <20230713101415.108875-6-usama.anjum@collabora.com> <7eedf953-7cf6-c342-8fa8-b7626d69ab63@collabora.com> <382f4435-2088-08ce-20e9-bc1a15050861@collabora.com> <44eddc7d-fd68-1595-7e4f-e196abe37311@collabora.com>

On Mon, 24 Jul 2023 at 17:22, Muhammad Usama Anjum wrote:
>
> On 7/24/23 7:38 PM, Michał Mirosław wrote:
> > On Mon, 24 Jul 2023 at 16:04, Muhammad Usama Anjum wrote:
> >>
> >> Fixed found bugs. Testing it further.
> >>
> >> - Split and back off in the buffer-full case as well
> >> - Fix the wrong breaking of loop if page isn't interesting, skip instead
> >> - Untag the addresses and save them into struct
> >> - Round off the end address to next page
> >>
> >> Signed-off-by: Muhammad Usama Anjum
> >> ---
> >> fs/proc/task_mmu.c | 54 ++++++++++++++++++++++++++--------------------
> >> 1 file changed, 31 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >> index add21fdf3c9a..64b326d0ec6d 100644
> >> --- a/fs/proc/task_mmu.c
> >> +++ b/fs/proc/task_mmu.c
> >> @@ -2044,7 +2050,7 @@ static int pagemap_scan_thp_entry(pmd_t *pmd,
> >> unsigned long start,
> >> * Break huge page into small pages if the WP operation
> >> * need to be performed is on a portion of the huge page.
> >> */
> >> - if (end != start + HPAGE_SIZE) {
> >> + if (end != start + HPAGE_SIZE || ret == -ENOSPC) {
> >
> > Why is it needed? If `end == start + HPAGE_SIZE` then we're handling a
> > full hugepage anyway.
> If we weren't able to add the complete THP to the output buffer and we need
> to perform WP on the entire page, we should split and roll back. Otherwise
> we'll WP the entire THP and we'll lose the state on the remaining part of
> the THP which wasn't added to output.
>
> Let's say max=100
> only 100 pages would be added to output
> we need to split and roll back, otherwise the other 412 pages would get WP

In this case *end will be truncated by output() to match the number of
pages that fit.
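
To make the arithmetic above concrete, here is a minimal userspace sketch
of the truncation idea. It is not the kernel code: the helper names, the
`EX_` constants (4 KiB base pages, 512 pages per huge page) and the
"pages left" budget are assumptions made for illustration only. It shows
that once the end of the range is clamped to what fits in the output, the
existing `end != start + HPAGE_SIZE` test already reports a partial huge
page, so no extra -ENOSPC condition would be needed under these
assumptions.

```c
/* Illustrative constants, assumed: 4 KiB pages, 2 MiB huge pages. */
#define EX_PAGE_SIZE   4096UL
#define EX_HPAGE_SIZE  (512UL * EX_PAGE_SIZE)

/*
 * Hypothetical stand-in for the output step: given the range
 * [start, *end) and the number of pages the output buffer can still
 * take, shrink *end so the range covers only the pages that fit.
 * Returns the number of pages actually reported.
 */
static unsigned long ex_clamp_range(unsigned long start, unsigned long *end,
                                    unsigned long pages_left)
{
    unsigned long n_pages = (*end - start) / EX_PAGE_SIZE;

    if (n_pages > pages_left) {
        n_pages = pages_left;
        *end = start + n_pages * EX_PAGE_SIZE;
    }
    return n_pages;
}

/* The check from the patch context: a partial range requires a split. */
static int ex_needs_split(unsigned long start, unsigned long end)
{
    return end != start + EX_HPAGE_SIZE;
}
```

With a budget of 100 pages and a full 512-page huge page, the clamped end
is start + 100 pages and the split check fires, leaving the other 412
pages untouched.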

> >> @@ -2066,8 +2072,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd,
> >> unsigned long start,
> >> {
> >> struct pagemap_scan_private *p = walk->private;
> >> struct vm_area_struct *vma = walk->vma;
> >> + unsigned long addr, categories, next;
> >> pte_t *pte, *start_pte;
> >> - unsigned long addr;
> >> bool flush = false;
> >> spinlock_t *ptl;
> >> int ret;
> >> @@ -2088,12 +2094,14 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd,
> >> unsigned long start,
> >> }
> >>
> >> for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
> >> - unsigned long categories = p->cur_vma_category |
> >> - pagemap_page_category(vma, addr, ptep_get(pte));
> >> - unsigned long next = addr + PAGE_SIZE;
> >> + categories = p->cur_vma_category |
> >> + pagemap_page_category(vma, addr, ptep_get(pte));
> >> + next = addr + PAGE_SIZE;
> >
> > Why move the variable declarations out of the loop?
> Saving space inside the loop. What are the pros of declaring a variable
> in the loop?

Informing the reader that the variables have scope limited to the loop body.

[...]

> >> @@ -2219,22 +2225,24 @@ static int pagemap_scan_get_args(struct pm_scan_arg
> >> *arg,
> >> arg->category_anyof_mask | arg->return_mask) & ~PM_SCAN_CATEGORIES)
> >> return -EINVAL;
> >>
> >> - start = untagged_addr((unsigned long)arg->start);
> >> - end = untagged_addr((unsigned long)arg->end);
> >> - vec = untagged_addr((unsigned long)arg->vec);
> >> + arg->start = untagged_addr((unsigned long)arg->start);
> >> + arg->end = untagged_addr((unsigned long)arg->end);
> >> + arg->vec = untagged_addr((unsigned long)arg->vec);
> >
> > BTW, we should keep the tag in args writeback().
> Sorry what?
> After this function, the start, end and vec would be used. We need to make
> sure that the addresses are untagged before that.

We do write back the address the walk ended at to arg->start in userspace.
I think this pointer needs the tag reconstructed so that retrying the
ioctl() will be possible.

Best Regards
Michał Mirosław
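
As an illustration of the tag-reconstruction point, the following is a
rough userspace sketch, not the kernel implementation. It assumes the tag
occupies the top byte of a 64-bit pointer (as with arm64 top-byte-ignore);
`ex_untag()` and `ex_retag()` are hypothetical helpers, and the simple
masking here is a simplification of what the kernel's untagged_addr()
does. The idea is to take the tag bits from the pointer userspace
originally passed in and reapply them to the untagged address where the
walk stopped, before that address is written back to arg->start.

```c
#include <stdint.h>

/* Assumption: the tag lives in bits 63:56 of the pointer. */
#define EX_TAG_SHIFT 56
#define EX_TAG_MASK  (0xffULL << EX_TAG_SHIFT)

/* Strip the tag so the address can be used for the page table walk. */
static uint64_t ex_untag(uint64_t addr)
{
    return addr & ~EX_TAG_MASK;
}

/*
 * Rebuild a tagged pointer for writeback: tag bits come from the
 * pointer userspace passed in, address bits from where the walk ended.
 */
static uint64_t ex_retag(uint64_t user_ptr, uint64_t walk_end)
{
    return (user_ptr & EX_TAG_MASK) | (walk_end & ~EX_TAG_MASK);
}
```

For example, if userspace passed a start pointer with tag 0x5a and the
walk stopped three pages later, the written-back value would carry the
same 0x5a tag, so passing it straight back in a retried ioctl() would keep
referring to the same tagged allocation.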