From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F7AFC433F5 for ; Wed, 9 Mar 2022 18:50:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 83CB08D0002; Wed, 9 Mar 2022 13:50:28 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7E93B8D0001; Wed, 9 Mar 2022 13:50:28 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 662ED8D0002; Wed, 9 Mar 2022 13:50:28 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0197.hostedemail.com [216.40.44.197]) by kanga.kvack.org (Postfix) with ESMTP id 571458D0001 for ; Wed, 9 Mar 2022 13:50:28 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 06346181DD42A for ; Wed, 9 Mar 2022 18:50:28 +0000 (UTC) X-FDA: 79225738536.23.DF5F897 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) by imf14.hostedemail.com (Postfix) with ESMTP id 6E6D810000A for ; Wed, 9 Mar 2022 18:50:27 +0000 (UTC) Received: by mail-pg1-f172.google.com with SMTP id o8so2713375pgf.9 for ; Wed, 09 Mar 2022 10:50:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=1uHgklcqMNOIpY0vZIN5vRSW5yidN8+O7IFJXUXhgeU=; b=FEXzS8ZZoUqIR13rDKSsQLSXbU/Lkf34jRd5q3ZvVnDaqKdUcDGiGT4FmYEhOLt635 9QDP3RvLPcSYamh7a0Oe+R02xJqQglNXOG3AOAn+9shE9VhWhj7ndmQvMErZhqkcejTC GswaZ5HEd/zi6i9lmaxhpi7mgdRJ/Z0kSNg2fQAh61wQq5ag/FF3tUnDW13th25H/QBA JWfa2t2iXN5SFtGwKc3Wvc6FtX0SjObNG/7P+WN2tHmv6fZ3jT1DWSoT+NxylnjvWwIo 4o2stXISFGFEDECLJHHsEX5WOw4z6YGDqO0qj5/ND6irHtqYdWsC6wEGAFa1ijipYu/l pvvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=1uHgklcqMNOIpY0vZIN5vRSW5yidN8+O7IFJXUXhgeU=; b=apSwwonfkFODgsqtkQMoIREmgbmcKmTQ7aiYAIMSjLrpF5AbpdNLoUFDQAXBfmcFWz 0VNMCH9/obCrGX1SC5s4yUyDoqTNw8M7Flu/FvMFqTmVjThGVetqJ0EUTRgEqLDpYZUz OoWDNk1mMHuRqEFNpjC5G2ekDF91SwTWdtAMqnbUOU5qzclOlGATHhFXACtLqsI40so/ c8Zykupj8T02ia777obEKDTfkFRPo6TctWsi2EpdrgWWNk/7SI5aj2KiEKl3mwgn7CAS 3YXDwnMAHkZGyzitGBjO6YTCCy6yoXgCDL0kZNWAkxLhNFFsdjFHa9MLdLYX9OZxwnBG osXQ== X-Gm-Message-State: AOAM530YSae5Gpp+xZA8DJon5v9u+RKBS/QIVIDnobFCfc+AEhthA2Tp NSGt/odZh0T8qwAamoe392c= X-Google-Smtp-Source: ABdhPJzfpb1gtz0+VW7kvlgYoJZtJZbnppzPm8GnEpR0hV4UCPxrwg5mpALr5o7TKkAHuKUWD7L46w== X-Received: by 2002:a63:fc0e:0:b0:365:39dd:2cd0 with SMTP id j14-20020a63fc0e000000b0036539dd2cd0mr908304pgi.173.1646851826094; Wed, 09 Mar 2022 10:50:26 -0800 (PST) Received: from smtpclient.apple (c-24-6-216-183.hsd1.ca.comcast.net. [24.6.216.183]) by smtp.gmail.com with ESMTPSA id p28-20020a056a000a1c00b004f6519e61b7sm4339457pfh.21.2022.03.09.10.50.24 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 09 Mar 2022 10:50:25 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.60.0.1.1\)) Subject: Re: [PATCH] mm: madvise: return correct bytes advised with process_madvise From: Nadav Amit In-Reply-To: Date: Wed, 9 Mar 2022 10:50:24 -0800 Cc: Andrew Morton , yuehaibing@huawei.com, Stephen Rothwell , David Rientjes , edgararriaga@google.com, Michal Hocko , Linux-MM , Linux Kernel Mailing List Content-Transfer-Encoding: quoted-printable Message-Id: References: <1646803679-11433-1-git-send-email-quic_charante@quicinc.com> To: Charan Teja Kalla , Minchan Kim X-Mailer: Apple Mail (2.3693.60.0.1.1) X-Rspamd-Queue-Id: 6E6D810000A X-Stat-Signature: stg73rz5c8fqf46p16fmpyqsrgb9btdy X-Rspam-User: Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=FEXzS8ZZ; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf14.hostedemail.com: domain of nadav.amit@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=nadav.amit@gmail.com X-Rspamd-Server: rspam03 X-HE-Tag: 1646851827-187261 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > On Mar 9, 2022, at 8:47 AM, Minchan Kim wrote: >=20 > On Wed, Mar 09, 2022 at 10:57:59AM +0530, Charan Teja Kalla wrote: >> The process_madvise() system call returns error even after processing >> some VMA's passed in the 'struct iovec' vector list which leaves the >> user confused to know where to restart the advise next. It is also >> against this syscall man page[1] documentation where it mentions that >> "return value may be less than the total number of requested bytes, = if >> an error occurred after some iovec elements were already processed.". >>=20 >> Consider a user passed 10 VMA's in the 'struct iovec' vector list of >> which 9 are processed but one. Then it just returns the error caused = on >> that failed VMA despite the first 9 VMA's processed, leaving the user >> confused about on which VMA it is failed. Returning the number of = bytes >> processed here can help the user to know which VMA it is failed on = and >> thus can retry/skip the advise on that VMA. >>=20 >> [1]https://man7.org/linux/man-pages/man2/process_madvise.2.html. >>=20 >> Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: = an external memory hinting API" >> Signed-off-by: Charan Teja Kalla >> --- >> mm/madvise.c | 12 +++++++++--- >> 1 file changed, 9 insertions(+), 3 deletions(-) >>=20 >> diff --git a/mm/madvise.c b/mm/madvise.c >> index 38d0f51..d3b49b3 100644 >> --- a/mm/madvise.c >> +++ b/mm/madvise.c >> @@ -1426,15 +1426,21 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, = const struct iovec __user *, vec, >>=20 >> while (iov_iter_count(&iter)) { >> iovec =3D iov_iter_iovec(&iter); >> + /* >> + * Even when [start, end) passed to do_madvise covers >> + * some unmapped addresses, it continues processing with >> + * returning ENOMEM at the end. Thus consider the range >> + * as processed when do_madvise() returns ENOMEM. >> + * This makes process_madvise() never returns ENOMEM. >> + */ >=20 > Looks like that this patch has two things. first, returns processed > bytes instead of error in case of error. Second, keep working on > rest vmas on -ENOMEM due to unmapped hole. >=20 > First thing totally makes sense to me(that's exactly I wanted to > do but somehow missed) so it should go stable tree. However, > second stuff might be arguble so it would be great if you split > the patch. I fully understand and relate to the basic motivation of this patch. The ENOMEM that this patch checks for, IIUC, is the ENOMEM that is returned on unmapped holes. Such ENOMEM does not appear, according to the man page, to be a valid reason to return ENOMEM to userspace. Presumably process_madvise() is expected to skip unmapped holes and not to fail because of them. Having said that, I do not think that the check that the patch does is clean or clearly documented. In addition, this patch (and some work on process_madvise()) raise in my mind a couple of questions: 1. There are other errors that process_madvise might encounter and can be propagated back to userspace, but are not documented. For instance if can_madv_lru_vma() fails on MADV_COLD, userspace will get EINVAL. EINVAL is not documented as a valid error-code for such case in either madvise() and process_madvise() man pages. 2. If an advice fails due to other reason, specifically split_huge_page(), there might be partial success within a VMA. I guess for now it is fine to ignore such failures since there is no guarantee on the OS behavior following most advices (excluding MADV_DONTNEED). Regards, Nadav=