From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0AEE5EB64D9 for ; Thu, 15 Jun 2023 20:07:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7ED0F6B0074; Thu, 15 Jun 2023 16:07:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 79CC08E0003; Thu, 15 Jun 2023 16:07:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 63DEF8E0002; Thu, 15 Jun 2023 16:07:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 5686C6B0074 for ; Thu, 15 Jun 2023 16:07:38 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 27DD080A84 for ; Thu, 15 Jun 2023 20:07:38 +0000 (UTC) X-FDA: 80906067396.15.8A0C493 Received: from mail-ed1-f41.google.com (mail-ed1-f41.google.com [209.85.208.41]) by imf30.hostedemail.com (Postfix) with ESMTP id 09A7F80019 for ; Thu, 15 Jun 2023 20:07:35 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=grNeFjMF; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf30.hostedemail.com: domain of emmir@google.com designates 209.85.208.41 as permitted sender) smtp.mailfrom=emmir@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686859656; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=TcXKgHkINlZtkH6FcAJl31LqrUpiNimJraZYTIfEacY=; b=kB0X8Ek+yFGj1uADC+PExv3vw5ybzEq+5O3eWm/fd5xbweEwWy5UmWLPhB54UZtwpJzD8D hC/GHIIhSRbu/6cskvhvYA++fBLo6uQ4udtcONJkKbd1w9CDGzo+D2LFG7i+cQvDAvFIF7 ur/f5p9Daof/TXabIifik0CLfIFEWio= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=grNeFjMF; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf30.hostedemail.com: domain of emmir@google.com designates 209.85.208.41 as permitted sender) smtp.mailfrom=emmir@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686859656; a=rsa-sha256; cv=none; b=xo9o0z3qE2nX8NSR5k8dYdFANruHP7GzuxDj1u31xfsTKnO3zOHEhVJThEDC3CHkdeVAW0 u9bLA859Cf9j16jBGd0+/f9k2BSljexllFMwwuui577Dg5mBOhaz8EZbx+2dxKyqgt5g7h jNa38ulI5jt8WACtz8dPlh8CGPieOP8= Received: by mail-ed1-f41.google.com with SMTP id 4fb4d7f45d1cf-513ea2990b8so2813a12.0 for ; Thu, 15 Jun 2023 13:07:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686859654; x=1689451654; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=TcXKgHkINlZtkH6FcAJl31LqrUpiNimJraZYTIfEacY=; b=grNeFjMFUEplWpGORmy77WwGAa6tuYg9L3WkV5mckN5vDk/zbhDik+lHM/UNgpP4pd hOOm+WnJ7a4iQyAmGfMIEHtYCyEjiOB4fybvLRjQ953HrpODT7pybSJLC9oyrmvc1uAa Q5t0iw3S7Cu+ZA/WqgFLv0n3V0z2Iv+BsFTHoBDgXWNu+X4LTbCrMOKZdBHMF2+Gz+9E C8mezgnBZ0C8B3l28HKwJtAiDRd/qOx2CFcehRN62EOe8r6f8Izc6DyVxQ48aAB/jfzZ Fbunklgz4dlOAoPmDadl7tLsJBvnhxposHcHyBr7ITs8tsH4/taorio4KkE5YTJeEn/y +CYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686859654; x=1689451654; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TcXKgHkINlZtkH6FcAJl31LqrUpiNimJraZYTIfEacY=; b=F7P7jAjC/yvrKvnD37bodSewY9pdHLFd1Y3m88dRQyOYkpdETlzCTOgQGX/ta11YI2 olAOQ+FqQ1G+33Kd6v/J8qCJYtzFEcQ0NjPG11j7y3Y7WLosW+MoJJ7hT8xSxXjpYC6f 2opXESPhCdZ9iz09DH6EywHaGgFbr4u3gMzLh4kw18atwhLGAHTA8t2iISuJwmUQ/Jp/ knNCAPXmPg6C//GH7cN+bsjeEz/JL3R39QfV35vjfxSVCGhY39ZDqs6BJgvLApPMzJQi S03ELRuwMvJYE2oKnbWN1iKR+2uKjob4ibVAU5qAY2K6EKMZQezB7fuymPcC8J5N3m6D 1Pbg== X-Gm-Message-State: AC+VfDw5SWkH7gXCofjn1Iobf404dfXNJ/rsoAEZRvMk/A/oWJGcUR9o yTD40ifyN5tjM/qS/xHaLAlh49N6mjWImhRR2Z+IUQ== X-Google-Smtp-Source: ACHHUZ4X+9erQPhmydL6aaFuWi9P+BEB8cvKp3i4+5UPNoYBQtJLxjzSx6pSUJJ8b5JUmvgtrKsHYgvI4lYLFVRURZE= X-Received: by 2002:a50:a6d3:0:b0:51a:1ffd:10e with SMTP id f19-20020a50a6d3000000b0051a1ffd010emr133474edc.3.1686859654409; Thu, 15 Jun 2023 13:07:34 -0700 (PDT) MIME-Version: 1.0 References: <20230613102905.2808371-1-usama.anjum@collabora.com> <20230613102905.2808371-3-usama.anjum@collabora.com> <0db01d90-09d6-08a4-bbb8-70670d3baa94@collabora.com> <34203acf-7270-7ade-a60e-ae0f729dcf70@collabora.com> <96b7cc00-d213-ad7d-1b48-b27f75b04d22@collabora.com> <39bc8212-9ee8-dbc1-d468-f6be438b683b@collabora.com> In-Reply-To: <39bc8212-9ee8-dbc1-d468-f6be438b683b@collabora.com> From: =?UTF-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= Date: Thu, 15 Jun 2023 22:07:22 +0200 Message-ID: Subject: Re: [PATCH v18 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs To: Muhammad Usama Anjum Cc: Peter Xu , David Hildenbrand , Andrew Morton , Andrei Vagin , Danylo Mocherniuk , Paul Gofman , Cyrill Gorcunov , Mike Rapoport , Nadav Amit , Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Matthew Wilcox , Pasha Tatashin , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 09A7F80019 X-Stat-Signature: 1ad8m7iq1ke83b4qpzdzj3o1bwn6kmoe X-Rspam-User: X-HE-Tag: 1686859655-675522 X-HE-Meta: U2FsdGVkX18LYhNteREJYZR2Ak9vo6H6w9JfLoW/Iwh9dyvE68JcW8I4bRpVv5Ndz3+IbI/XuHO8e1tLOkEvhrvV5ufLHYOevmaAYdroy3kZBYG53DnoaR4gVnynwi4o2h5RWbOSPa7myf/az8qi1LkGeMcAbGPq/Lo4hcJNDvgBeBlJaRimPqkk2OurF1YE4hM6oKv7yjZ5k7UAN+ZLH2cXKA1aJT8uIngo1Hpnz2/dO01/fOouO1UpNyruYQ2aQPm+o3+Qp5JisNLVWpln5m90MFnCosQLYQ1hzoHmpUhBrU2Qdj5ef9SqZjYYAho55VLdcMyPUysLMXNLki8AeocRK8epIOASnrA1kHn6T1XhVDyd3l8wpOXuVdBl7RTzREe5gFhgBT+Merfy63Ofp7RdLncPdzZdzc0Du+VEepxq0AyaoQiL4vt9GTpV7gUkJLK6JamnqlJPgqgC/Ahttze9aE09H+wHUVtGdJxShSyx13EsCMayQL1uddruxJ8pveqcWaGLeFrr3l/DzyzmDY2uQPBCagZQJWKh13m8+dsz8J8ijuxtPD+8ZtZT1PMltZjK/jJGfhm4Jq9xqUjhbYO6dglZ8T6jv7UyjQOgO0wVtS1NTpIZ4tJiye4RtVTGJODXETU/23DNIuj6BLvcn5i/XL56Cr8nnScp27sK+YdSDbiOBBbV4HWVnFF6A2ZAgJJ0w7WY4Au43gP+LB4XmHlFtm9l3aHSDwtoREu+L5PscZt1dDvgkfhfMeHZzCntzs4KvJqtqRu9W9ixfvLHylRz5v3/gY2gtzCRyvnxu0OtbzSK/KAkICJ22Q68N/RFrwXTb6YZqYvYyhC5CJzOovg0bTKlxufdUHBhPh2ptzqCPn/QiQat6ADk6Wk/4+xGUmGy9afPLF+xxRO0TWQt140Q56M9bjVWkcO4q7yu6EEZ0fk5bmAASvASIgdlaiKzYWZaAElu1i1yBoIGW5L pOvxzzj2 188YYX2/Jh4+58ABSQyZu1LD62xaGMwtQyLEiSG7p0UCduishg6g78WIdu3+UCBSDi3pecPHZiOXIHM1+RTWsdVM1+I1GYwFi0gbfNoSS/LBjd/qUtTzvpJHAQ+ipcOG8A9cJt4+yIWqTnPC1SfNllSORNaQOgRb4sDUpP571MJfvyV8ZqnHDjLRRTZM3q9nwUcotv6xgXRUXPJxJJ3RH72o+Gia+ulew5Q+PdAu6OJQxSOp8OLLq3D7peQddY+dFdmeTB8fRfCLMqCCJe807Jt1B0nDfZVdFlBlp/sVeh+ME6n6AoebbSjetPcpp4sffNBRCubuP8GZoECD3If2gODcQjPFjlvsjtOv/L/ttJiASB+tGhQ0gZ/s8P2z+1g1SdjUD9IspyT0nsQKrwd3POmtHzgrYxJbdClUW/Yn7TehuBpQpLzLrx7dFjn9JbY+MJl7P X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, 15 Jun 2023 at 17:11, Muhammad Usama Anjum wrote: > On 6/15/23 7:52=E2=80=AFPM, Micha=C5=82 Miros=C5=82aw wrote: > > On Thu, 15 Jun 2023 at 15:58, Muhammad Usama Anjum > > wrote: > >> I'll send next revision now. > >> On 6/14/23 11:00=E2=80=AFPM, Micha=C5=82 Miros=C5=82aw wrote: > >>> (A quick reply to answer open questions in case they help the next ve= rsion.) > >>> > >>> On Wed, 14 Jun 2023 at 19:10, Muhammad Usama Anjum > >>> wrote: > >>>> On 6/14/23 8:14=E2=80=AFPM, Micha=C5=82 Miros=C5=82aw wrote: > >>>>> On Wed, 14 Jun 2023 at 15:46, Muhammad Usama Anjum > >>>>> wrote: > >>>>>> > >>>>>> On 6/14/23 3:36=E2=80=AFAM, Micha=C5=82 Miros=C5=82aw wrote: > >>>>>>> On Tue, 13 Jun 2023 at 12:29, Muhammad Usama Anjum > >>>>>>> wrote: > >>> [...] > >>>>>>>> + if (cur_buf->bitmap =3D=3D bitmap && > >>>>>>>> + cur_buf->start + cur_buf->len * PAGE_SIZE =3D=3D add= r) { > >>>>>>>> + cur_buf->len +=3D n_pages; > >>>>>>>> + p->found_pages +=3D n_pages; > >>>>>>>> + } else { > >>>>>>>> + if (cur_buf->len && p->vec_buf_index >=3D p->vec= _buf_len) > >>>>>>>> + return -ENOMEM; > >>>>>>> > >>>>>>> Shouldn't this be -ENOSPC? -ENOMEM usually signifies that the ker= nel > >>>>>>> ran out of memory when allocating, not that there is no space in = a > >>>>>>> user-provided buffer. > >>>>>> There are 3 kinds of return values here: > >>>>>> * PM_SCAN_FOUND_MAX_PAGES (1) ---> max_pages have been found. Abor= t the > >>>>>> page walk from next entry > >>>>>> * 0 ---> continue the page walk > >>>>>> * -ENOMEM --> Abort the page walk from current entry, user buffer = is full > >>>>>> which is not error, but only a stop signal. This -ENOMEM is just > >>>>>> differentiater from (1). This -ENOMEM is for internal use and isn'= t > >>>>>> returned to user. > >>>>> > >>>>> But why ENOSPC is not good here? I was used before, I think. > >>>> -ENOSPC is being returned in form of true error from > >>>> pagemap_scan_hugetlb_entry(). So I'd to remove -ENOSPC from here as = it > >>>> wasn't true error here, it was only a way to abort the walk immediat= ely. > >>>> I'm liking the following erturn code from here now: > >>>> > >>>> #define PM_SCAN_BUFFER_FULL (-256) > >>> > >>> I guess this will be reworked anyway, but I'd prefer this didn't need > >>> custom errors etc. If we agree to decoupling the selection and GET > >>> output, it could be: > >>> > >>> bool is_interesting_page(p, flags); // this one does the > >>> required/anyof/excluded match > >>> size_t output_range(p, start, len, flags); // this one fills the > >>> output vector and returns how many pages were fit > >>> > >>> In this setup, `is_interesting_page() && (n_out =3D output_range()) < > >>> n_pages` means this is the final range, no more will fit. And if > >>> `n_out =3D=3D 0` then no pages fit and no WP is needed (no other spec= ial > >>> cases). > >> Right now, pagemap_scan_output() performs the work of both of these tw= o > >> functions. The part can be broken into is_interesting_pages() and we c= an > >> leave the remaining part as it is. > >> > >> Saying that n_out < n_pages tells us the buffer is full covers one cas= e. > >> But there is case of maximum pages have been found and walk needs to b= e > >> aborted. > > > > This case is exactly what `n_out < n_pages` will cover (if scan_output > > uses max_pages properly to limit n_out). > > Isn't it that when the buffer is full we want to abort the scan always > > (with WP if `n_out > 0`)? > Wouldn't it be duplication of condition if buffer is full inside > pagemap_scan_output() and just outside it. Inside pagemap_scan_output() w= e > check if we have space before putting data inside it. I'm using this same > condition to indicate that buffer is full. I'm not sure what do you mean? The buffer-full conditions would be checked in ..scan_output() and communicated to the caller by returning N less than `n_pages` passed in. This is exactly how e.g. read() works: if you get less than requested you've hit the end of the file. If the file happens to have size that is equal to the provided buffer length, the next read() will return 0. > >>>>> While here, I wonder if we really need to fail the call if there ar= e > >>>>> unknown bits in those masks set: if this bit set is expanded with > >>>>> another category flags, a newer userspace run on older kernel would > >>>>> get EINVAL even if the "treat unknown as 0" be what it requires. > >>>>> There is no simple way in the API to discover what bits the kernel > >>>>> supports. We could allow a no-op (no WP nor GET) call to help with > >>>>> that and then rejecting unknown bits would make sense. > >>>> I've not seen any examples of this. But I've seen examples of return= ing > >>>> error if kernel doesn't support a feature. Each new feature comes wi= th a > >>>> kernel version, greater than this version support this feature. If u= ser is > >>>> trying to use advanced feature which isn't present in a kernel, we s= hould > >>>> return error and not proceed to confuse the user/kernel. In fact if = we look > >>>> at userfaultfd_api(), we return error immediately if feature has som= e bit > >>>> set which kernel doesn't support. > >>> > >>> I think we should have a way of detecting the supported flags if we > >>> don't want a forward compatibility policy for flags here. Maybe it > >>> would be enough to allow all the no-op combinations for this purpose? > >> Again I don't think UFFD is doing anything like this. > > > > If it's cheap and easy to provide a user with a way to detect the > > supported features - why not do it? > I'm sorry. But it would bring up something new and iterations will be > needed to just play around. I like the UFFD way. Let's then first agree on what would have to be changed. I guess we could leverage that `scan_len =3D 0` doesn't make much sense otherwise and let it be used to check the other fields for support. Best Regards Micha=C5=82 Miros=C5=82aw