From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD3B0C7619A for ; Tue, 11 Apr 2023 09:30:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 01D24900004; Tue, 11 Apr 2023 05:30:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EE7A5900002; Tue, 11 Apr 2023 05:30:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D60BE900004; Tue, 11 Apr 2023 05:30:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id C135F900002 for ; Tue, 11 Apr 2023 05:30:00 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 14F671407C2 for ; Tue, 11 Apr 2023 09:30:00 +0000 (UTC) X-FDA: 80668588560.10.608CF26 Received: from mail-ej1-f52.google.com (mail-ej1-f52.google.com [209.85.218.52]) by imf28.hostedemail.com (Postfix) with ESMTP id 3EB1BC000B for ; Tue, 11 Apr 2023 09:29:57 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=FhUN32+q; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf28.hostedemail.com: domain of emmir@google.com designates 209.85.218.52 as permitted sender) smtp.mailfrom=emmir@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681205398; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=BJF2YBEivbAPJS4QQn75FqV7euOHKuYSxMpUJHAYYIA=; b=BZLi6N74EpL29wBMUCNjgXPGqtERW24X3R/qFXFevhpXCrHlGRnZA1ljRX/JTQBhD2gmQ2 LYqNiplDm7PDHvr6K6UBQOJG/Yc4JRBxeKZl//vhiByUh5ofmBl9Nr8UygnbmemNrqEIQM CyOFvRp6t/rEFCx57fr2Ucxie+pISIM= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=FhUN32+q; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf28.hostedemail.com: domain of emmir@google.com designates 209.85.218.52 as permitted sender) smtp.mailfrom=emmir@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681205398; a=rsa-sha256; cv=none; b=xCMI3Gh0H+bal6S3/5DVLpSbz6SYUq2TW5/KQNcUBvG95oFzS3HTpeScgM7Yeyd3snyWu3 hJol5TR1b9Hau7zHLgNvV6hTub58C1z92nO7FQ0Coqy0JRdVVlJ/6PDobaQl/TH6ZVsibL thWvxSxMA2aTo5IqNLw26X4ziS1EKCw= Received: by mail-ej1-f52.google.com with SMTP id kt17so6475483ejb.11 for ; Tue, 11 Apr 2023 02:29:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1681205397; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=BJF2YBEivbAPJS4QQn75FqV7euOHKuYSxMpUJHAYYIA=; b=FhUN32+q9jPdgf97VAaqXvf8IScUuRysz38fQkweVq+QLxWO/S2VPxcB+Qra0cqTXQ DtKxdwM0c4aDN8Vjr37sLs5OIJ750plHhvw4RIlibWv36S8feJJ6E0gw7j3/K8m2SdBR lXRWVDa8fKPoNrLAfpuAgYZO4Kv1GqlYQ3orj3DhgB/2xC0Vl+Ip0J44oJYPsRguLxu8 o1zGRE8Q6fBpRs44zqc9poxIovdoonRREuuNqA9OC8Yu37zQBxA6eV9rpf2Dvke8Yh1x Pa6Mx+lhdT46XqV9rT8hZtXXed0gSq3C4Pp2U2lqKXWSGu2AtU4k91ilBtZ+CVa8PMp4 cK4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1681205397; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BJF2YBEivbAPJS4QQn75FqV7euOHKuYSxMpUJHAYYIA=; b=Zje39beDOXpcWcxBokbO6yCVRn4j0b/T7jTgunRvBUnRDCsX41YC4IMS+Q2jf1ujjZ E8bH7KSKg1ke5BdXsex77jmaVDx647EXzqIAcmxoweANXrhrkUnFK+u76BAoB2WC4jhs ULplRb3a9hMLgIdDcZV1NusUWqlg3vfe1TCqqEQcxfXo2eXlo2qSxxbaaEzUftjSXM9e ONUlMC1Sfq44nt8WBVPAiVi7mCxafVfWBwy6NLq8Vxl281Ls6O+3PtL3qN3Q7xN00LZU 2hvaLvwSZ6Ygvi1i+Gio/HGBVMsBphyKTuV7NjH11JhyckMfczehvW4nxSMWTYc0MCHk cXrA== X-Gm-Message-State: AAQBX9eugoo/m8XBMO/gktaG5ZaeHqi+Edh8VVF8svF+D9BWHRr3VhvO Fwf7ruCJUNJIlBpcvC7blzByW6i8dlPhSrGvsC+xdw== X-Google-Smtp-Source: AKy350YciC056geHubwxcdJsmaoBPi16cp8aurI4qOglRyDHG/cvxK/Ql9YEmFxW0jx+zwWEqAPGI0ZBL6z80oGpEw0= X-Received: by 2002:a17:906:11d8:b0:94e:fdd:9319 with SMTP id o24-20020a17090611d800b0094e0fdd9319mr1132630eja.15.1681205396442; Tue, 11 Apr 2023 02:29:56 -0700 (PDT) MIME-Version: 1.0 References: <20230406074005.1784728-1-usama.anjum@collabora.com> <20230406074005.1784728-3-usama.anjum@collabora.com> <0351b563-5193-6431-aa9c-c5bf5741b791@collabora.com> <8a837998-604f-a871-729e-aa274a621481@collabora.com> <05e14540-7092-5dd2-d503-473b673af716@collabora.com> In-Reply-To: <05e14540-7092-5dd2-d503-473b673af716@collabora.com> From: =?UTF-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= Date: Tue, 11 Apr 2023 11:29:44 +0200 Message-ID: Subject: Re: [PATCH v12 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs To: Muhammad Usama Anjum Cc: Mike Rapoport , Peter Xu , David Hildenbrand , Andrew Morton , Andrei Vagin , Danylo Mocherniuk , Paul Gofman , Cyrill Gorcunov , Nadav Amit , Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Matthew Wilcox , Pasha Tatashin , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 3EB1BC000B X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: et4pkfezm163rqxizwkt1fh1amyuzh9p X-HE-Tag: 1681205397-9327 X-HE-Meta: U2FsdGVkX18Yx9upvYdHM1OGQrptoAeH4ZYbADT5fKKftzP/mDBYii60jBHPzku6Yz0/KlRXiGs4AwosIsn3lG4kH4Ak8F5QEd74tfdJqNMEnuo8W1tfsyXsAAaZ8fnIOoqKRmDPEY7BO1vDHDpBMbzfe+ss6JoOUEKG0J2/5geka2jEiumbkl8wrkF/0jXw9/loXAqgvh5iDTeTJdOUq87NN0z7+GcPdEwHkbLT0J0hnDJwRrpGopkOwcODlmGDb827Ff64bxhEtnKBrsBYXetagXhk0VdFzUcuxmDdhYhP62ZzDzuH/qPt7oGeB/riRmVgC5IbKAuEDQ12anw4F10ehzUuhDfn/NLACQnvkojzleNiE+ek8VErKEnwnnF7oTW6/9m9SeovndsjwhnKVaLVMwU6TavFWtv5oOMkNrFekBaYkcZ5DcqIjD0Tv2RAuHwCHojXWCcPGDAcZNH3TMJbBQdLICxe/VMpgmgl0gIyXXfgIFxY+NzXQbNBEou1XZlP1VUmC26d3N9Dh50mKLnCqTVRtrH9hfVj2yse9EGrKbw9FKj35v7U7RLjbHb8R6+/j3WPV1PpqkzkQmEPnQC4Ro4lmiImL673qdjChgbLJOM/W4DRoutvVFHkGc4/ZlblYx8fbe2AsTc1oZ0MKLeUQg54ApWCQIsSlHMLMZI2wiAogrfMaMMQ+tfmxNetSQhV5QZzJ/x1r64dDUG8zXcu4Xrvmb2a2wAX42agnPZCV0j7YVpf8zq1DkGNTUuzhrS/294lSVfpcPk2H862/EYjjl2KXeWrS6amU+L6Dm9YGCdEjIiok0Sti5wtMJq93LNoQyCjOOAQhrX6CntZZZyz2x0RzOhCQLbWWPfpPexs3+WjfnMztdQedlx+KFePD1wBHxrBp7wj1pwwpJdCzyI6Ra0KyIGFxQp0AKRY6i5xqoHmXzvwpt/fKasV0/my1Y9V47Qzc+aVvNfalnI q9dg51oP FxrJUNVWulK0tyo0z6Vs6ByulfKX5io0Noc5KPd416RSU/As3D778A78xpIpCB26VIyo/jACZdlkLIxBr8YBvrpBe824YTj/EStP3AaLd8qc6b9nzkPa5LauNz4MErIL21EpCBuswAX+v69oWUQ2g0fzaLTWttbbSajiA3DUX7/oLYsHsP6J+wd0+HPy//4cJTxqyospT/VU4cNFaCpEyOAT82x1EtJCjKEkAryDmBY8sp7sUJJKJeMPUlFZjMxkCrIh8ZteBg3J3pkWLAcl55oamen2Bf3ubmYKbgl1xJxC7YsBfRqQRSWSdzT7QWJ9oIuh1qz2ZhDLehQ2+/4YxYKQYLvb9tcupf/Bjv3u/112Q4uC6SNUd0ZFDgJXAQPQJgvUYM9voOZ4gaWwBdEMmjW6oIwBeVO+Q+MXj X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, 7 Apr 2023 at 13:11, Muhammad Usama Anjum wrote: > On 4/7/23 3:14=E2=80=AFPM, Micha=C5=82 Miros=C5=82aw wrote: > > On Fri, 7 Apr 2023 at 12:04, Muhammad Usama Anjum > > wrote: > >> On 4/7/23 12:34=E2=80=AFPM, Micha=C5=82 Miros=C5=82aw wrote: > >>> On Thu, 6 Apr 2023 at 23:04, Muhammad Usama Anjum > >>> wrote: > >>>> On 4/7/23 1:00=E2=80=AFAM, Micha=C5=82 Miros=C5=82aw wrote: > >>>>> On Thu, 6 Apr 2023 at 19:58, Muhammad Usama Anjum > >>>>> wrote: > > [...] > >>>>>>>> + /* > >>>>>>>> + * Allocate smaller buffer to get output from inside the= page walk > >>>>>>>> + * functions and walk page range in PAGEMAP_WALK_SIZE si= ze chunks. As > >>>>>>>> + * we want to return output to user in compact form wher= e no two > >>>>>>>> + * consecutive regions should be continuous and have the= same flags. > >>>>>>>> + * So store the latest element in p.cur between differen= t walks and > >>>>>>>> + * store the p.cur at the end of the walk to the user bu= ffer. > >>>>>>>> + */ > >>>>>>>> + p.vec =3D kmalloc_array(p.vec_len, sizeof(struct page_re= gion), > >>>>>>>> + GFP_KERNEL); > >>>>>>>> + if (!p.vec) > >>>>>>>> + return -ENOMEM; > >>>>>>>> + > >>>>>>>> + walk_start =3D walk_end =3D start; > >>>>>>>> + while (walk_end < end && !ret) { > >>>>>>> > >>>>>>> The loop will stop if a previous iteration returned ENOSPC (and t= he > >>>>>>> error will be lost) - is it intended? > >>>>>> It is intentional. -ENOSPC means that the user buffer is full even= though > >>>>>> there was more memory to walk over. We don't treat this error. So = when > >>>>>> buffer gets full, we stop walking over further as user buffer has = gotten > >>>>>> full and return as success. > >>>>> > >>>>> Thanks. What's the difference between -ENOSPC and > >>>>> PM_SCAN_FOUND_MAX_PAGES? They seem to result in the same effect (co= de > >>>>> flow). > >>>> -ENOSPC --> user buffer has been filled completely > >>>> PM_SCAN_FOUND_MAX_PAGES --> max_pages have been found, user buffer m= ay > >>>> still have more space > >>> > >>> What is the difference in code behaviour when those two cases are > >>> compared? (I'd expect none.) > >> There is difference: > >> We add data to user buffer. If it succeeds with return code 0, we enga= ge > >> the WP. If it succeeds with PM_SCAN_FOUND_MAX_PAGES, we still engage t= he > >> WP. But if we get -ENOSPC, we don't perform engage as the data wasn't = added > >> to the user buffer. > > > > Thanks! I see it now. I see a few more corner cases here: > > 1. If we did engage WP but fail to copy the vector we return -EFAULT > > but the WP is already engaged. I'm not sure this is something worth > > guarding against, but documenting that would be helpful I think. > Sure. > > > 2. If uffd_wp_range() fails, but we have already processed pages > > earlier, we should treat the error like ENOSPC and back out the failed > > range (the earier changes would be lost otherwise). > Backing out is easier to do for hugepages. But for normal pages, we'll ha= ve > to write some code to find where the current data was added (in cur or in > vec) and back out from that. I'll have to write some more code to avoid t= he > side-effects as well. If I read the code correctly, the last page should always be in `cur` and on failure only a single page is needed to be backed out. Did I miss something? > But aren't we going over-engineering here? Error occurred and we are tryi= ng > to keep the previously generated correct data and returning successfully > still to the user? I don't think we should do this. An error is error. We > should return the error simply even if the memory flags would get lost. W= e > don't know what caused the error in uffd_wp_range(). Under normal > situation, we there shouldn't have had error. In this case it means that on (intermittent) allocation error we get inconsistent or non-deterministic results. I wouldn't want to be the one debugging this later - I'd prefer either the syscall be "exception-safe" (give consistent and predictable output) or kill the process. Best Regards Micha=C5=82 Miros=C5=82aw