linux-mm.kvack.org archive mirror
From: Jason Baron <jbaron@akamai.com>
To: Nicholas Piggin <npiggin@gmail.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: 'Vlastimil Babka' <vbabka@suse.cz>,
	'Alexander Viro' <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, 'Michal Hocko' <mhocko@kernel.org>,
	netdev@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH] fs/select: add vmalloc fallback for select(2)
Date: Fri, 23 Sep 2016 12:47:55 -0400
Message-ID: <57E55CBB.5060309@akamai.com>
In-Reply-To: <20160923172434.7ad8f2e0@roar.ozlabs.ibm.com>

Hi,

On 09/23/2016 03:24 AM, Nicholas Piggin wrote:
> On Fri, 23 Sep 2016 14:42:53 +0800
> "Hillf Danton" <hillf.zj@alibaba-inc.com> wrote:
>
>>>
>>> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
>>> with the number of fds passed. We had a customer report page allocation
>>> failures of order-4 for this allocation. This is a costly order, so it might
>>> easily fail, as the VM expects such allocation to have a lower-order fallback.
>>>
>>> Such trivial fallback is vmalloc(), as the memory doesn't have to be
>>> physically contiguous. Also the allocation is temporary for the duration of the
>>> syscall, so it's unlikely to stress vmalloc too much.
>>>
>>> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
>>> it doesn't need this kind of fallback.
>
> How about something like this? (untested)
>
> Eric isn't wrong about vmalloc sucking :)
>
> Thanks,
> Nick
>
>
> ---
>   fs/select.c | 57 +++++++++++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 43 insertions(+), 14 deletions(-)
>
> diff --git a/fs/select.c b/fs/select.c
> index 8ed9da5..3b4834c 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -555,6 +555,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>   	void *bits;
>   	int ret, max_fds;
>   	unsigned int size;
> +	size_t nr_bytes;
>   	struct fdtable *fdt;
>   	/* Allocate small arguments on the stack to save memory and be faster */
>   	long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
> @@ -576,21 +577,39 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>   	 * since we used fdset we need to allocate memory in units of
>   	 * long-words.
>   	 */
> -	size = FDS_BYTES(n);
> +	ret = -ENOMEM;
>   	bits = stack_fds;
> -	if (size > sizeof(stack_fds) / 6) {
> -		/* Not enough space in on-stack array; must use kmalloc */
> +	size = FDS_BYTES(n);
> +	nr_bytes = 6 * size;
> +
> +	if (unlikely(nr_bytes > PAGE_SIZE)) {
> +		/* Avoid multi-page allocation if possible */
>   		ret = -ENOMEM;
> -		bits = kmalloc(6 * size, GFP_KERNEL);
> -		if (!bits)
> -			goto out_nofds;
> +		fds.in = kmalloc(size, GFP_KERNEL);
> +		fds.out = kmalloc(size, GFP_KERNEL);
> +		fds.ex = kmalloc(size, GFP_KERNEL);
> +		fds.res_in = kmalloc(size, GFP_KERNEL);
> +		fds.res_out = kmalloc(size, GFP_KERNEL);
> +		fds.res_ex = kmalloc(size, GFP_KERNEL);
> +
> +		if (!(fds.in && fds.out && fds.ex &&
> +				fds.res_in && fds.res_out && fds.res_ex))
> +			goto out;
> +	} else {
> +		if (nr_bytes > sizeof(stack_fds)) {
> +			/* Not enough space in on-stack array */
> +			if (nr_bytes > PAGE_SIZE * 2)

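The 'if' looks extraneous? As far as I can tell, we only get here when
nr_bytes <= PAGE_SIZE (this is the else branch of the nr_bytes > PAGE_SIZE
check above), so nr_bytes > PAGE_SIZE * 2 can never be true.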

Also, I wonder if we can avoid some of the allocations altogether by
checking whether the user fd_set pointers are NULL? Fewer allocations
means fewer chances to fail :)
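
To make the NULL-pointer idea a bit more concrete, here is a rough
userspace sketch (not kernel code, and not tested against fs/select.c).
The struct and the helpers fdset_bufs_alloc() and fdset_bufs_free() are
hypothetical stand-ins I made up for illustration, with calloc() playing
the role of kmalloc(). It assumes the rest of the select path could be
taught to tolerate NULL bitmaps for sets the caller did not pass, which
is not shown here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the six per-call bitmaps. */
struct fdset_bufs {
	unsigned long *in, *out, *ex;
	unsigned long *res_in, *res_out, *res_ex;
};

/* Free whatever was allocated; free(NULL) is a no-op. */
static void fdset_bufs_free(struct fdset_bufs *b)
{
	free(b->in);
	free(b->out);
	free(b->ex);
	free(b->res_in);
	free(b->res_out);
	free(b->res_ex);
	memset(b, 0, sizeof(*b));
}

/* Allocate one input bitmap and its matching result bitmap. */
static int alloc_pair(unsigned long **in, unsigned long **res, size_t size)
{
	*in = calloc(1, size);
	*res = calloc(1, size);
	return (*in && *res) ? 0 : -1;
}

/*
 * Allocate an input/result pair only for the sets the caller passed.
 * in_set, out_set and ex_set model the user fd_set pointers; only their
 * NULL-ness is inspected. size must be non-zero.
 * Returns the number of buffers allocated, or -1 on allocation failure.
 */
static int fdset_bufs_alloc(struct fdset_bufs *b, size_t size,
			    const void *in_set, const void *out_set,
			    const void *ex_set)
{
	int n = 0;

	memset(b, 0, sizeof(*b));
	if (in_set) {
		if (alloc_pair(&b->in, &b->res_in, size))
			goto fail;
		n += 2;
	}
	if (out_set) {
		if (alloc_pair(&b->out, &b->res_out, size))
			goto fail;
		n += 2;
	}
	if (ex_set) {
		if (alloc_pair(&b->ex, &b->res_ex, size))
			goto fail;
		n += 2;
	}
	return n;
fail:
	fdset_bufs_free(b);
	return -1;
}
```

With something like this, a caller that only passes a read set would need
two buffers instead of six, so there are fewer allocations that can fail
in the first place.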

Thanks,

-Jason
