From: Tariq Toukan <tariqt@mellanox.com>
To: Eric Dumazet <edumazet@google.com>,
	Tariq Toukan <tariqt@mellanox.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Martin KaFai Lau <kafai@fb.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	Willem de Bruijn <willemb@google.com>,
	Brenden Blanco <bblanco@plumgrid.com>,
	Alexei Starovoitov <ast@kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX
Date: Thu, 16 Feb 2017 15:08:00 +0200
Message-ID: <37bc04eb-71c9-0433-304d-87fcf8b06be3@mellanox.com>
In-Reply-To: <CANn89iJip45peBQB9Tn1mWVg+1QYZH+01CqkAUctd3xqwPw8Zg@mail.gmail.com>


On 15/02/2017 6:57 PM, Eric Dumazet wrote:
> On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>> Isn't it the same principle in page_frag_alloc() ?
>> It is called from __netdev_alloc_skb()/__napi_alloc_skb().
>>
>> Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?
> This is not ok.
>
> This is a very well known problem, we already mentioned that here in the past,
> but at least core networking stack uses order-0 pages on PowerPC.
You're right, we should have done this as well in mlx4 on PPC.
> mlx4 driver suffers from this problem 100% more than other drivers ;)
>
> One problem at a time Tariq. Right now, only mlx4 has this big problem
> compared to other NIC.
We _do_ agree that the series improves the driver's quality, stability,
and performance in a fragmented system.

But due to the late rc we're in, and the fact that we know what benchmarks
our customers are going to run, we cannot Ack the series and have it
merged as-is into kernel 4.11.

We would like to get your series merged along with another perf improvement
we are preparing for the next rc1. This way we will gain the desired
stability without breaking existing benchmarks.
I think this is the right thing to do at this point in time.


The idea behind the perf improvement, suggested by Jesper, is to split
the loop in mlx4_en_process_rx_cq(), which is called from napi_poll,
into two.
The first loop extracts completed CQEs and starts prefetching the data
and RX descriptors. The second loop processes the actual packets.
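
To illustrate the two-loop idea, here is a rough, self-contained userspace
sketch. It is not the mlx4 code: struct fake_cqe, struct fake_ring,
poll_rx_two_pass() and RING_SIZE are all hypothetical names made up for
this example, the "done" flag stands in for the real CQE ownership check,
and __builtin_prefetch() (a GCC/Clang builtin) stands in for the kernel's
prefetch helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical ring size, must be a power of two */

/* Simplified stand-in for a completion entry (not the mlx4 layout). */
struct fake_cqe {
	int done;            /* set to 1 once the "hardware" completed it */
	const uint8_t *data; /* packet buffer this completion refers to */
	size_t len;          /* packet length in bytes */
};

struct fake_ring {
	struct fake_cqe cqe[RING_SIZE];
	unsigned int cons;   /* free-running consumer index */
	uint64_t rx_bytes;   /* bytes handled so far */
};

/* Returns the number of packets handled, at most 'budget'. */
static int poll_rx_two_pass(struct fake_ring *ring, int budget)
{
	struct fake_cqe *batch[RING_SIZE];
	int n = 0, i;

	assert(ring != NULL);
	if (budget > RING_SIZE)
		budget = RING_SIZE; /* never walk the ring more than once */

	/* Loop 1: collect completed entries and start prefetching. */
	while (n < budget) {
		struct fake_cqe *cqe =
			&ring->cqe[(ring->cons + n) & (RING_SIZE - 1)];

		if (!cqe->done)
			break;
		__builtin_prefetch(cqe->data);
		batch[n++] = cqe;
	}

	/* Loop 2: handle the packets whose data was prefetched above. */
	for (i = 0; i < n; i++) {
		ring->rx_bytes += batch[i]->len; /* stand-in for real RX work */
		batch[i]->done = 0;              /* hand the entry back */
	}

	ring->cons += n;
	return n;
}
```

The point of the split is that by the time the second loop touches a
packet, the prefetch issued in the first loop has had some time to
complete. Whether that actually helps depends on the batch size and the
cache behaviour of the machine, so it has to be measured.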

>
> Then, if we _still_ hit major issues, we might also need to force
> napi_get_frags()
> to allocate skb->head using kmalloc() instead of a page frag.
>
> That is a very simple fix.
>
> Remember that we have skb->truesize that is an approximation, it will
> never be completely accurate,
> but we need to make it better.
>
> mlx4 driver pretends to have a frag truesize of 1536 bytes, but this
> is obviously wrong when host is under memory pressure
> (2 frags per page -> truesize should be 2048)
>
>
>> By using netdev/napi_alloc_skb, you'll get that the SKB's linear data is a
>> frag of a huge page,
>> and it is not going to be freed before the other non-linear frags.
>> Cannot this cause the same threats (memory pinning and so...)?
>>
>> Currently, mlx4 doesn't use this generic API, while most other drivers do.
>>
>> Similar claims are true for TX:
>> https://github.com/torvalds/linux/commit/5640f7685831e088fe6c2e1f863a6805962f8e81
> We do not have such problem on TX. GFP_KERNEL allocations do not have
> the same issues.
>
> Tasks are usually not malicious in our DC, and most serious
> applications use memcg or such memory control.

