From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82935C77B72 for ; Mon, 17 Apr 2023 11:11:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E4DB7900004; Mon, 17 Apr 2023 07:11:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E24CA8E0001; Mon, 17 Apr 2023 07:11:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D13C6900004; Mon, 17 Apr 2023 07:11:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id C1A5C8E0001 for ; Mon, 17 Apr 2023 07:11:39 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 8AA7E40561 for ; Mon, 17 Apr 2023 11:11:39 +0000 (UTC) X-FDA: 80690617518.17.3CF8B6E Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by imf08.hostedemail.com (Postfix) with ESMTP id 93FE6160010 for ; Mon, 17 Apr 2023 11:11:36 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=collabora.com header.s=mail header.b=AfH+oLdt; spf=pass (imf08.hostedemail.com: domain of usama.anjum@collabora.com designates 46.235.227.172 as permitted sender) smtp.mailfrom=usama.anjum@collabora.com; dmarc=pass (policy=quarantine) header.from=collabora.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681729897; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=g7yqgcr6avi1wGXANrTFvtmMIlk5yySaJZfP4XJ+mD8=; b=YGrO0f7MS/9sB6w2lkJJLHwcRHfy2qdh2xDN5izfCW5aAj0aCdshj3DdSeq3T7/bVnmr1w eqAKmwlnFtgXZiMUPnJuTZq3ExIFYnZFEo8fHTyQY4XuJ3e6B0o8Z4jd7IpRWAeGJTr5qZ U7k/Aoaj3IV0ME49Ewx4omIAmbqljcE= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=collabora.com header.s=mail header.b=AfH+oLdt; spf=pass (imf08.hostedemail.com: domain of usama.anjum@collabora.com designates 46.235.227.172 as permitted sender) smtp.mailfrom=usama.anjum@collabora.com; dmarc=pass (policy=quarantine) header.from=collabora.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681729897; a=rsa-sha256; cv=none; b=6t85UvmYoBL6FYpg3N//Gm5BXgKsrQOLU1rROExcJwYgZUSJkQAdbOSWURQDMOfiEqNWAQ Rr9EDYUrEpJbpcgdslCVFHDBo1GCHko8Ri/nHYNK3uxTLUbmZHRXwzI1QYH5XVKLN7gK2D eiU3GPurXnqRat6mS07HAHlEsYJMhXI= Received: from [192.168.10.39] (unknown [39.37.187.173]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id AC26F66031ED; Mon, 17 Apr 2023 12:11:25 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1681729894; bh=7J+vWx7jiKBTtSoBYYCb9cOinhDAAq5QIxXvstP9zw8=; h=Date:Cc:Subject:To:References:From:In-Reply-To:From; b=AfH+oLdtCJAJ1LK0B+VX9BAP/KEgoAVxTZH0q7JH+W1rZq9/vigMBnJSHPa0AOGlj vBWpM6GzaTfmFaRV5OorymDmhv/ULEH2iDzSL2F9dfvJZLaueg8prORLaZNVd0+nGt flSJ4If9+zqiFj3CdgT53b15lhzkT6csHd+YHgXRRNe9/mcTbKWRMpmaHNN/T1+TEn r33Eu0HMcyr/fa5k/upxcwQVT2Wg6623CHaxCjKxScPcqZVIL2RJHe8uXjqTjy5XlL i7ScpAQb+Ftuxy6iJcGNuPhHgzfkUt621uDrPpqX3sOU5ysIMq2Hdn5DzxaXEJMHIR I7qSGRSr/yVCQ== Message-ID: <938a065b-ecae-5905-b394-6cbed0bfe946@collabora.com> Date: Mon, 17 Apr 2023 16:11:18 +0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.0 Cc: Muhammad Usama Anjum , Mike Rapoport , Peter Xu , David Hildenbrand , Andrew Morton , Andrei Vagin , Danylo Mocherniuk , Paul Gofman , Cyrill Gorcunov , Nadav Amit , Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Matthew Wilcox , Pasha Tatashin , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Subject: Re: [PATCH v12 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs To: =?UTF-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= References: <20230406074005.1784728-1-usama.anjum@collabora.com> <20230406074005.1784728-3-usama.anjum@collabora.com> <0351b563-5193-6431-aa9c-c5bf5741b791@collabora.com> <8a837998-604f-a871-729e-aa274a621481@collabora.com> <05e14540-7092-5dd2-d503-473b673af716@collabora.com> Content-Language: en-US From: Muhammad Usama Anjum In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 93FE6160010 X-Stat-Signature: jcfih1xu3dw9aobrp7h6hz46zyfzosou X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1681729896-576577 X-HE-Meta: U2FsdGVkX1/2veGXoJZvnIKnx4ENt6ghqCjb+n5/64CgB51ov4WzWzRjliRRRZ3bgY6aKmzwRoawSRxcHeO1LZe8/rubnt0wnetZiF3nKqfsBIBBMO/ohYb/A2Un9C/hTLMH1Pu/0IjNXSYxdr4UyRX8pq7H7NzFwcjLGWVak8siWjCgo5vZNwi6gmCsNYOrW9oObsyjTCC51yf1hgER1Q30m50eGLXM6QRDiuQYsL6Y0gpJdX1n41fasRbgXfruex5pPRAQtBx3sWJ5Yq4Clg1rl2aCWXTLi+wuFgtW4nnUW0V4LAaOYniYnbDVeG8CRZXm1npAb7iyE4oPGF1IluoOxt/Li7rQXQRUqFuuvLtTM4H51F0gZetMHP4oAjTEW677i/pFX//emf+7vMQSsbHhB+xhmpDMRRc4ZuFS4iGoey20VNZB78siK8Gcqvmn28zRq/JiUuRfXX3Pg5cvGVZsC02808BpQiHkv/0wnu5NZbYHAfAoJPC4IGepn/SsI+vV/PZuRSZWMYnRPgRy0w/Bk3AnkLeGijM4/QXN0p/JaRdBiAldc2EGPkbiG+flMeTpLdjyJCdeozySGBEJrEO1yxv9+zdddmgJxmJtJ5PGiLtLIjDrIrsxbP/1lCpS4yDRzoUvpnd74SUpVq8i9NPqoQzrI55oD8i0bOXZbpwxvVyJOjdY6hXRBPyrzcfskLsDQw0efx3hFolmi7V+SGzXUVOK2Rkb6RnQwVrO8DtH7yyEtl/VXF82rdOK3pM8/10wAzZM3Ghae2JrE1nfySzIvwd5MWjsOWJ1tE70m+0/3IwX9fj4FLMY8b8Mp9hVxa1vTXCIBq30F3NlYEdKdv3EgmbUD5zAqNV6Bbgr4k6WGtl1ABS3FpRSYpboFe3tiwys2U2wuXvkQ3w4f/icRFHXbJW2WIWWYhRRkjFciljYWgVd9aRrs0FDRFjvTiMtlKev0JVXjJ0c5Q9KDMv nik88a0M atH3kfe+URzfX5bX+AyFFjVS5WA0DqkmD9yYfrMm1EnicEAhEPvGmJyDPVspqpuSUP6ZXMXkqXf3/DmEMNUDVGk+m/EaNu7duAgmQrSNAeaUTAmhz1OO9XoKCjSJQ4ZsYjSYuz3VoHVEiImfwHouRvXc3cYDzDxykMAPCi2Pww63v2XjUa+zoQ4IxRJ+GiGXO/gmipOQPJHpUhik0lUFxrYZMZkCPCLPZslJfsS+LgNdVR6F1psw5t3pYS+VCLFvt65V4fbTJwCQwMjVsp1PakcnKFqYdB4tmQ1sq0aroE0otdtsBX91vjqqm9fYfhocYiKXr/i8ZiDpzc3c= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 4/11/23 2:29 PM, Michał Mirosław wrote: > On Fri, 7 Apr 2023 at 13:11, Muhammad Usama Anjum > wrote: >> On 4/7/23 3:14 PM, Michał Mirosław wrote: >>> On Fri, 7 Apr 2023 at 12:04, Muhammad Usama Anjum >>> wrote: >>>> On 4/7/23 12:34 PM, Michał Mirosław wrote: >>>>> On Thu, 6 Apr 2023 at 23:04, Muhammad Usama Anjum >>>>> wrote: >>>>>> On 4/7/23 1:00 AM, Michał Mirosław wrote: >>>>>>> On Thu, 6 Apr 2023 at 19:58, Muhammad Usama Anjum >>>>>>> wrote: >>> [...] >>>>>>>>>> + /* >>>>>>>>>> + * Allocate smaller buffer to get output from inside the page walk >>>>>>>>>> + * functions and walk page range in PAGEMAP_WALK_SIZE size chunks. As >>>>>>>>>> + * we want to return output to user in compact form where no two >>>>>>>>>> + * consecutive regions should be continuous and have the same flags. >>>>>>>>>> + * So store the latest element in p.cur between different walks and >>>>>>>>>> + * store the p.cur at the end of the walk to the user buffer. >>>>>>>>>> + */ >>>>>>>>>> + p.vec = kmalloc_array(p.vec_len, sizeof(struct page_region), >>>>>>>>>> + GFP_KERNEL); >>>>>>>>>> + if (!p.vec) >>>>>>>>>> + return -ENOMEM; >>>>>>>>>> + >>>>>>>>>> + walk_start = walk_end = start; >>>>>>>>>> + while (walk_end < end && !ret) { >>>>>>>>> >>>>>>>>> The loop will stop if a previous iteration returned ENOSPC (and the >>>>>>>>> error will be lost) - is it intended? >>>>>>>> It is intentional. -ENOSPC means that the user buffer is full even though >>>>>>>> there was more memory to walk over. We don't treat this error. So when >>>>>>>> buffer gets full, we stop walking over further as user buffer has gotten >>>>>>>> full and return as success. >>>>>>> >>>>>>> Thanks. What's the difference between -ENOSPC and >>>>>>> PM_SCAN_FOUND_MAX_PAGES? They seem to result in the same effect (code >>>>>>> flow). >>>>>> -ENOSPC --> user buffer has been filled completely >>>>>> PM_SCAN_FOUND_MAX_PAGES --> max_pages have been found, user buffer may >>>>>> still have more space >>>>> >>>>> What is the difference in code behaviour when those two cases are >>>>> compared? (I'd expect none.) >>>> There is difference: >>>> We add data to user buffer. If it succeeds with return code 0, we engage >>>> the WP. If it succeeds with PM_SCAN_FOUND_MAX_PAGES, we still engage the >>>> WP. But if we get -ENOSPC, we don't perform engage as the data wasn't added >>>> to the user buffer. >>> >>> Thanks! I see it now. I see a few more corner cases here: >>> 1. If we did engage WP but fail to copy the vector we return -EFAULT >>> but the WP is already engaged. I'm not sure this is something worth >>> guarding against, but documenting that would be helpful I think. >> Sure. >> >>> 2. If uffd_wp_range() fails, but we have already processed pages >>> earlier, we should treat the error like ENOSPC and back out the failed >>> range (the earier changes would be lost otherwise). >> Backing out is easier to do for hugepages. But for normal pages, we'll have >> to write some code to find where the current data was added (in cur or in >> vec) and back out from that. I'll have to write some more code to avoid the >> side-effects as well. > > If I read the code correctly, the last page should always be in `cur` > and on failure only a single page is needed to be backed out. Did I > miss something? I'm leaving using uffd_wp_range() in next revision as it is costing performance. This will not be needed in next revision. > >> But aren't we going over-engineering here? Error occurred and we are trying >> to keep the previously generated correct data and returning successfully >> still to the user? I don't think we should do this. An error is error. We >> should return the error simply even if the memory flags would get lost. We >> don't know what caused the error in uffd_wp_range(). Under normal >> situation, we there shouldn't have had error. > > In this case it means that on (intermittent) allocation error we get > inconsistent or non-deterministic results. I wouldn't want to be the > one debugging this later - I'd prefer either the syscall be > "exception-safe" (give consistent and predictable output) or kill the > process. > > Best Regards > Michał Mirosław -- BR, Muhammad Usama Anjum