Date: Fri, 18 Mar 2022 08:37:50 -0700
From: Minchan Kim
To: Charan Teja Kalla
Cc: Nadav Amit, Suren Baghdasaryan, Andrew Morton, Vlastimil Babka,
 David Rientjes, Stephen Rothwell, Edgar Arriaga García, Michal Hocko,
 linux-mm, LKML, "# 5 . 10+"
Subject: Re: [PATCH V2,2/2] mm: madvise: skip unmapped vma holes passed to process_madvise
In-Reply-To: <74852e90-003b-84b8-9836-72258e3c5057@quicinc.com>

On Fri, Mar 18, 2022 at 07:35:41PM +0530, Charan Teja Kalla wrote:
> Thank you for valuable inputs.
>
> On 3/18/2022 2:08 AM, Nadav Amit wrote:
> >>>>>> IMO, it's worth to note in man page.
> >>>>>>
> >>>>> Or the current patch for just ENOMEM is sufficient here and we just have
> >>>>> to update the man page?
> >>>> I think the "On success, process_madvise() returns the number of bytes
> >>>> advised" behaviour sounds useful.  But madvise() doesn't do that.
> >>>>
> >>>> RETURN VALUE
> >>>>        On success, madvise() returns zero.  On error, it returns -1 and errno
> >>>>        is set to indicate the error.
> >>>>
> >>>> So why is it desirable in the case of process_madvise()?
> >>> Since process_madvise deals with multiple ranges and could fail at one of
> >>> them in the middle of processing, people could decide where the call
> >>> failed and then make a strategy whether they will abort at the point or
> >>> continue to hint next addresses. Here, problem of the strategy is API
> >>> doesn't return any error value if it has processed any bytes so they
> >>> would have limitation to decide a policy. That's the limitation for
> >>> every vectored I/O syscall, unfortunately.
> >>>
> >>>>
> >>>>
> >>>> And why was process_madvise() designed this way?  Or was it
> >>>> always simply an error in the manpage?
> >> Taking a closer look, indeed manpage seems to be wrong.
> >> https://elixir.bootlin.com/linux/v5.17-rc8/source/mm/madvise.c#L1154
> >> indicates that in the presence of unmapped holes madvise will skip
> >> them but will return ENOMEM and that's what process_madvise is
> >> ultimately returning in this case. So, the manpage claim of "This
> >> return value may be less than the total number of requested bytes, if
> >> an error occurred after some iovec elements were already processed."
> >> does not reflect the reality in our case because the return value will
> >> be -ENOMEM. After the desired behavior is finalized I'll modify the
> >> manpage accordingly.
> > Since process_madvise() might be used in sort of non-cooperative mode,
> > I think that the caller cannot guarantee that it knows exactly the
> > memory layout of the process whose memory it madvise’s. I know that
> > MADV_DONTNEED for instance is not supported (at least today) by
> > process_madvise(), but if it were, the caller may want to know which
> > exact memory was madvise'd even if the target process ran some other
> > memory layout changing syscalls (e.g., munmap()).
> >
> > IOW, skipping holes and just returning the total number of madvise’d
> > bytes might not be enough.
>
> Then is it a correct design for the advised byte range to include
> holes by default?
> Say the [start, len) range passed in the iovec by the user contains the
> layout like, vma1 -- hole -- vma2 -- hole -- vma3.
>
> Under the ideal case, where all vmas are eligible for advise, the total
> bytes processed returned should be vma3->end - vma1->start. This is
> the success case.
>
> Now, say that vma1 succeeded but vma2 (say VM_LOCKED) failed at
> advise. In such a case processed bytes will be
> vma2->start - vma1->start (still consider hole as bytes processed), so that
> user may restart/skip at vma2, then continue. This return value will be
> the partially processed bytes.
>
> If the system doesn't find any VMA in the range passed by the user, it
> returns ENOMEM as not a single advisable vma is found in the range.

As I mentioned in the other reply, let's not make any exception (i.e.,
skipping holes) for a vectored memory syscall, but return the exact
bytes processed on the exact ranges.
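
For reference, the accounting in the quoted proposal (not the behaviour
requested in the reply above) can be written down as a small userspace
model. This is only a sketch, not kernel code: struct model_vma and
model_advised_bytes() are hypothetical names made up for illustration,
the vma array is assumed to be sorted and non-overlapping, and the
requested range is assumed to start at vma1->start and end at vma3->end
as in the quoted example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VMA; not the kernel's struct. */
struct model_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	bool advisable;		/* false models e.g. a VM_LOCKED vma */
};

/*
 * Bytes reported for [start, start + len) under the quoted proposal:
 * holes count as processed, the walk stops at the first vma that
 * cannot be advised, and -ENOMEM is returned when no vma overlaps
 * the range at all.
 */
long model_advised_bytes(const struct model_vma *vmas, size_t n,
			 unsigned long start, unsigned long len)
{
	unsigned long end = start + len;
	bool found = false;

	assert(n == 0 || vmas != NULL);

	for (size_t i = 0; i < n; i++) {
		if (vmas[i].end <= start || vmas[i].start >= end)
			continue;	/* outside the requested range */
		found = true;
		if (!vmas[i].advisable) {
			/* Partial result: everything before the failing vma. */
			unsigned long stop = vmas[i].start > start ?
					     vmas[i].start : start;
			return (long)(stop - start);
		}
	}
	return found ? (long)len : -ENOMEM;
}
```

With vma1 = [0x1000, 0x2000), vma2 = [0x3000, 0x4000) and
vma3 = [0x5000, 0x6000), a request for [0x1000, 0x6000) reports the
full 0x5000 bytes when every vma is advisable, and
vma2->start - vma1->start = 0x2000 when vma2 is not.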