[3.19.y-ckt stable] Linux 3.19.8-ckt22 stable review

Kamal Mostafa
This is the start of the review cycle for the Linux 3.19.8-ckt22 stable
kernel.

This version contains 40 new patches, summarized below.  The new patches
are posted as replies to this message and also available in this git branch:

https://git.launchpad.net/~canonical-kernel/linux/+git/linux-stable-ckt/log/?h=linux-3.19.y-review

git://git.launchpad.net/~canonical-kernel/linux/+git/linux-stable-ckt  linux-3.19.y-review

The review period for version 3.19.8-ckt22 will be open for the next three
days.  To report a problem, please reply to the relevant follow-up patch
message.

For more information about the Linux 3.19.y-ckt extended stable kernel
series, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable .

 -Kamal

--
 arch/arm64/net/bpf_jit_comp.c                |   1 +
 crypto/ahash.c                               |   3 +-
 drivers/base/regmap/regmap-spmi.c            |   2 +-
 drivers/gpu/drm/i915/intel_crt.c             |   8 +-
 drivers/gpu/drm/radeon/atombios_crtc.c       |  10 +++
 drivers/infiniband/hw/ipath/ipath_file_ops.c |   5 ++
 drivers/input/misc/max8997_haptic.c          |   6 +-
 drivers/net/ethernet/freescale/fec_main.c    |  10 ++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |   2 +-
 drivers/net/macvtap.c                        |   2 +-
 drivers/regulator/s2mps11.c                  |  28 +++++--
 drivers/usb/core/hub.c                       |   8 +-
 fs/isofs/rock.c                              |  13 ++-
 fs/namei.c                                   |  20 +----
 fs/ocfs2/acl.c                               |  63 ++++++++++++++
 fs/ocfs2/acl.h                               |   4 +
 fs/ocfs2/namei.c                             |  23 +----
 fs/ocfs2/refcounttree.c                      |  17 +---
 fs/ocfs2/xattr.c                             |  14 ++--
 fs/ocfs2/xattr.h                             |   4 +-
 include/linux/compiler-gcc.h                 | 120 ++++++++++++++++++++++++++-
 include/linux/compiler-gcc3.h                |  23 -----
 include/linux/compiler-gcc4.h                |  87 -------------------
 include/linux/compiler-gcc5.h                |  65 ---------------
 include/linux/mfd/samsung/s2mps11.h          |   2 +
 include/linux/mod_devicetable.h              |   3 +
 include/linux/skbuff.h                       |  17 ++++
 include/net/codel.h                          |   4 +
 include/net/sch_generic.h                    |  20 ++++-
 kernel/bpf/verifier.c                        |   1 -
 kernel/workqueue.c                           |  11 +++
 net/bridge/br_ioctl.c                        |   5 +-
 net/core/rtnetlink.c                         |  18 ++--
 net/core/skbuff.c                            |  11 +--
 net/decnet/dn_route.c                        |   9 +-
 net/ipv4/fib_frontend.c                      |   6 +-
 net/ipv4/route.c                             |  12 +++
 net/ipv4/tcp_output.c                        |   6 +-
 net/ipv6/reassembly.c                        |   6 +-
 net/llc/af_llc.c                             |   1 +
 net/netfilter/nf_conntrack_core.c            |   4 +-
 net/openvswitch/actions.c                    |   6 +-
 net/openvswitch/vport-netdev.c               |   2 +-
 net/openvswitch/vport.h                      |   7 --
 net/sched/sch_api.c                          |   8 +-
 net/sched/sch_cbq.c                          |  12 +--
 net/sched/sch_choke.c                        |   6 +-
 net/sched/sch_codel.c                        |  10 ++-
 net/sched/sch_drr.c                          |   9 +-
 net/sched/sch_dsmark.c                       |  11 +--
 net/sched/sch_fq.c                           |   4 +-
 net/sched/sch_fq_codel.c                     |  17 ++--
 net/sched/sch_hfsc.c                         |   9 +-
 net/sched/sch_hhf.c                          |  10 ++-
 net/sched/sch_htb.c                          |  24 +++---
 net/sched/sch_multiq.c                       |  16 ++--
 net/sched/sch_netem.c                        |  74 ++++++++++++++---
 net/sched/sch_pie.c                          |   5 +-
 net/sched/sch_prio.c                         |  15 ++--
 net/sched/sch_qfq.c                          |   9 +-
 net/sched/sch_red.c                          |  10 +--
 net/sched/sch_sfb.c                          |  10 +--
 net/sched/sch_sfq.c                          |  16 ++--
 net/sched/sch_tbf.c                          |  15 ++--
 net/vmw_vsock/af_vsock.c                     |  21 +----
 net/x25/x25_facilities.c                     |   1 +
 sound/pci/hda/patch_realtek.c                |  13 +++
 tools/lib/traceevent/parse-filter.c          |   6 +-
 68 files changed, 566 insertions(+), 454 deletions(-)

Al Viro (2):
      atomic_open(): fix the handling of create_error
      get_rock_ridge_filename(): handle malformed NM entries

Behan Webster (1):
      [3.19-stable] x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id"

Chris Friesen (1):
      route: do not cache fib route info on local routes with oif

Daniel Borkmann (2):
      net: use skb_postpush_rcsum instead of own implementations
      vlan: pull on __vlan_insert_tag error path and fix csum correction

Daniel Jurgens (1):
      net/mlx4_en: Fix endianness bug in IPV6 csum calculation

Daniel Vetter (1):
      drm/i915: Bail out of pipe config compute loop on LPT

David S. Miller (1):
      decnet: Do not build routes to devices without decnet private data.

Doug Ledford (1):
      [3.19-stable only] fix backport "IB/security: restrict use of the write() interface"

Eric Dumazet (2):
      macvtap: segmented packet is consumed
      tcp: refresh skb timestamp at retransmit time

Greg Kroah-Hartman (1):
      Revert "usb: hub: do not clear BOS field during reset device"

Herbert Xu (1):
      crypto: hash - Fix page length clamping in hash walk

Ian Campbell (1):
      VSOCK: do not disconnect socket when peer has shutdown SEND only

Jack Pham (1):
      regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case

Jann Horn (1):
      bpf: fix double-fdput in replace_map_fd_with_map_ptr()

Joe Perches (1):
      compiler-gcc: integrate the various compiler-gcc[345].h files

Junxiao Bi (1):
      ocfs2: fix posix_acl_create deadlock

Kaho Ng (1):
      ALSA: hda - Fix white noise on Asus UX501VW headset

Kangjie Lu (3):
      net: fix infoleak in llc
      net: fix infoleak in rtnetlink
      net: fix a kernel infoleak in x25 module

Krzysztof Kozlowski (1):
      regulator: s2mps11: Fix invalid selector mask and voltages for buck9

Linus Torvalds (1):
      nf_conntrack: avoid kernel pointer value leak in slab name

Lucas Stach (1):
      drm/radeon: fix PLL sharing on DCE6.1 (v2)

Marek Szyprowski (1):
      Input: max8997-haptic - fix NULL pointer dereference

Neil Horman (1):
      netem: Segment GSO packets on enqueue

Nikolay Aleksandrov (1):
      net: bridge: fix old ioctl unlocked net device walk

Paolo Abeni (1):
      ipv4/fib: don't warn when primary address is missing if in_dev is dead

Steven Rostedt (1):
      tools lib traceevent: Do not reassign parg after collapse_tree()

Steven Rostedt (Red Hat) (1):
      tools lib traceevent: Free filter tokens in process_filter()

Uwe Kleine-König (1):
      net: fec: only clear a queue's work bit if the queue was emptied

WANG Cong (4):
      net_sched: introduce qdisc_replace() helper
      net_sched: update hierarchical backlog too
      sch_htb: update backlog as well
      sch_dsmark: update backlog as well

Wanpeng Li (1):
      workqueue: fix rebind bound workers warning

Yura Pakhuchiy (1):
      ALSA: hda - Fix subwoofer pin on ASUS N751 and N551

Zi Shen Lim (1):
      arm64: bpf: jit JMP_JSET_{X,K}

[PATCH 3.19.y-ckt 02/40] [3.19-stable only] fix backport "IB/security: restrict use of the write() interface"

Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Doug Ledford <[hidden email]>

Upstream commit e6bd18f57aad (IB/security: Restrict use of the write()
interface) handled the cases for all drivers in the current upstream
kernel.  The ipath driver had recently been deprecated and moved to
staging, and then removed entirely.  It had the same security flaw as
the qib driver.  Fix that up with this separate patch.

Note: The ipath driver only supports hardware that ended production
over 10 years ago, so there should be none of this hardware still
present in the wild.

Signed-off-by: Doug Ledford <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 drivers/infiniband/hw/ipath/ipath_file_ops.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index 6d7f453..a0626b8 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -45,6 +45,8 @@
 #include <linux/cpu.h>
 #include <asm/pgtable.h>
 
+#include <rdma/ib.h>
+
 #include "ipath_kernel.h"
 #include "ipath_common.h"
 #include "ipath_user_sdma.h"
@@ -2240,6 +2242,9 @@ static ssize_t ipath_write(struct file *fp, const char __user *data,
  ssize_t ret = 0;
  void *dest;
 
+ if (WARN_ON_ONCE(!ib_safe_file_access(fp)))
+ return -EACCES;
+
  if (count < sizeof(cmd.type)) {
  ret = -EINVAL;
  goto bail;
--
2.7.4


[PATCH 3.19.y-ckt 10/40] get_rock_ridge_filename(): handle malformed NM entries

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Al Viro <[hidden email]>

commit 99d825822eade8d827a1817357cbf3f889a552d6 upstream.

Payloads of NM entries are not supposed to contain NUL.  When we run
into such, only the part prior to the first NUL goes into the
concatenation (i.e. the directory entry name being encoded by a bunch
of NM entries).  We do stop when the amount collected so far plus the
claimed amount in the current NM entry exceeds 254.  So far, so good,
but what we return as the total length is the sum of the *claimed*
sizes, not the actual amount collected.  And that can grow pretty
large - not unlimited, since you'd need to put CE entries in
between to be able to get more than the maximum that could be
contained in one isofs directory entry / continuation chunk, and
we stop once we've encountered 32 CEs, but you can get about 8Kb
easily.  And that's what will be passed to the readdir callback as the
name length: an 8Kb __copy_to_user() from a buffer allocated by
__get_free_page().

Signed-off-by: Al Viro <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 fs/isofs/rock.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c
index 735d752..204659a 100644
--- a/fs/isofs/rock.c
+++ b/fs/isofs/rock.c
@@ -203,6 +203,8 @@ int get_rock_ridge_filename(struct iso_directory_record *de,
  int retnamlen = 0;
  int truncate = 0;
  int ret = 0;
+ char *p;
+ int len;
 
  if (!ISOFS_SB(inode->i_sb)->s_rock)
  return 0;
@@ -267,12 +269,17 @@ repeat:
  rr->u.NM.flags);
  break;
  }
- if ((strlen(retname) + rr->len - 5) >= 254) {
+ len = rr->len - 5;
+ if (retnamlen + len >= 254) {
  truncate = 1;
  break;
  }
- strncat(retname, rr->u.NM.name, rr->len - 5);
- retnamlen += rr->len - 5;
+ p = memchr(rr->u.NM.name, '\0', len);
+ if (unlikely(p))
+ len = p - rr->u.NM.name;
+ memcpy(retname + retnamlen, rr->u.NM.name, len);
+ retnamlen += len;
+ retname[retnamlen] = '\0';
  break;
  case SIG('R', 'E'):
  kfree(rs.buffer);
--
2.7.4


[PATCH 3.19.y-ckt 19/40] nf_conntrack: avoid kernel pointer value leak in slab name

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Linus Torvalds <[hidden email]>

commit 31b0b385f69d8d5491a4bca288e25e63f1d945d0 upstream.

The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.

Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.

This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.

Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
Signed-off-by: Linus Torvalds <[hidden email]>
Acked-by: Eric Dumazet <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/netfilter/nf_conntrack_core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 46d1b26..0ab748b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1736,6 +1736,7 @@ void nf_conntrack_init_end(void)
 
 int nf_conntrack_init_net(struct net *net)
 {
+ static atomic64_t unique_id;
  int ret = -ENOMEM;
  int cpu;
 
@@ -1759,7 +1760,8 @@ int nf_conntrack_init_net(struct net *net)
  if (!net->ct.stat)
  goto err_pcpu_lists;
 
- net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
+ net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%llu",
+ (u64)atomic64_inc_return(&unique_id));
  if (!net->ct.slabname)
  goto err_slabname;
 
--
2.7.4


[PATCH 3.19.y-ckt 29/40] ipv4/fib: don't warn when primary address is missing if in_dev is dead

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Paolo Abeni <[hidden email]>

[ Upstream commit 391a20333b8393ef2e13014e6e59d192c5594471 ]

After commit fbd40ea0180a ("ipv4: Don't do expensive useless work
during inetdev destroy.") when deleting an interface,
fib_del_ifaddr() can be executed without any primary address
present on the dead interface.

The above is safe, but triggers some "bug: prim == NULL" warnings.

This commit avoids the warning if the in_dev is dead.

Signed-off-by: Paolo Abeni <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/ipv4/fib_frontend.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index d4c698c..499a777 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -799,7 +799,11 @@ void fib_del_ifaddr(struct in_ifaddr *ifa, struct in_ifaddr *iprim)
  if (ifa->ifa_flags & IFA_F_SECONDARY) {
  prim = inet_ifa_byprefix(in_dev, any, ifa->ifa_mask);
  if (prim == NULL) {
- pr_warn("%s: bug: prim == NULL\n", __func__);
+ /* if the device has been deleted, we don't perform
+ * address promotion
+ */
+ if (!in_dev->dead)
+ pr_warn("%s: bug: prim == NULL\n", __func__);
  return;
  }
  if (iprim && iprim != prim) {
--
2.7.4


[PATCH 3.19.y-ckt 36/40] net: fix infoleak in llc

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Kangjie Lu <[hidden email]>

[ Upstream commit b8670c09f37bdf2847cc44f36511a53afc6161fd ]

The stack object “info” has a total size of 12 bytes. Its last byte
is padding, which is not initialized and is leaked via “put_cmsg”.

Signed-off-by: Kangjie Lu <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/llc/af_llc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 2c0b83c..05e5c76 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -626,6 +626,7 @@ static void llc_cmsg_rcv(struct msghdr *msg, struct sk_buff *skb)
  if (llc->cmsg_flags & LLC_CMSG_PKTINFO) {
  struct llc_pktinfo info;
 
+ memset(&info, 0, sizeof(info));
  info.lpi_ifindex = llc_sk(skb->sk)->dev->ifindex;
  llc_pdu_decode_dsap(skb, &info.lpi_sap);
  llc_pdu_decode_da(skb, info.lpi_mac);
--
2.7.4


[PATCH 3.19.y-ckt 40/40] net: fix a kernel infoleak in x25 module

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Kangjie Lu <[hidden email]>

[ Upstream commit 79e48650320e6fba48369fccf13fd045315b19b8 ]

Stack object "dte_facilities" is allocated in x25_rx_call_request(),
and is supposed to be initialized in x25_negotiate_facilities().
However, 5 fields (8 bytes in total) are not initialized. This
object is then copied to userland via copy_to_user(), causing an
infoleak.

Signed-off-by: Kangjie Lu <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/x25/x25_facilities.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/x25/x25_facilities.c b/net/x25/x25_facilities.c
index 7ecd04c..997ff7b 100644
--- a/net/x25/x25_facilities.c
+++ b/net/x25/x25_facilities.c
@@ -277,6 +277,7 @@ int x25_negotiate_facilities(struct sk_buff *skb, struct sock *sk,
 
  memset(&theirs, 0, sizeof(theirs));
  memcpy(new, ours, sizeof(*new));
+ memset(dte, 0, sizeof(*dte));
 
  len = x25_parse_facilities(skb, &theirs, dte, &x25->vc_facil_mask);
  if (len < 0)
--
2.7.4


[PATCH 3.19.y-ckt 38/40] VSOCK: do not disconnect socket when peer has shutdown SEND only

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Ian Campbell <[hidden email]>

[ Upstream commit dedc58e067d8c379a15a8a183c5db318201295bb ]

The peer may be expecting a reply having sent a request and then done a
shutdown(SHUT_WR), so tearing down the whole socket at this point seems
wrong and breaks for me with a client which does a SHUT_WR.

Looking at other socket families' stream_recvmsg callbacks, doing a
shutdown here does not seem to be the norm, and removing it does not
seem to have had any adverse effects that I can see.

I'm using Stefan's RFC virtio transport patches, I'm unsure of the impact
on the vmci transport.

Signed-off-by: Ian Campbell <[hidden email]>
Cc: "David S. Miller" <[hidden email]>
Cc: Stefan Hajnoczi <[hidden email]>
Cc: Claudio Imbrenda <[hidden email]>
Cc: Andy King <[hidden email]>
Cc: Dmitry Torokhov <[hidden email]>
Cc: Jorgen Hansen <[hidden email]>
Cc: Adit Ranadive <[hidden email]>
Cc: [hidden email]
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/vmw_vsock/af_vsock.c | 21 +--------------------
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 1d0e39c..316e856 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1796,27 +1796,8 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
  else if (sk->sk_shutdown & RCV_SHUTDOWN)
  err = 0;
 
- if (copied > 0) {
- /* We only do these additional bookkeeping/notification steps
- * if we actually copied something out of the queue pair
- * instead of just peeking ahead.
- */
-
- if (!(flags & MSG_PEEK)) {
- /* If the other side has shutdown for sending and there
- * is nothing more to read, then modify the socket
- * state.
- */
- if (vsk->peer_shutdown & SEND_SHUTDOWN) {
- if (vsock_stream_has_data(vsk) <= 0) {
- sk->sk_state = SS_UNCONNECTED;
- sock_set_flag(sk, SOCK_DONE);
- sk->sk_state_change(sk);
- }
- }
- }
+ if (copied > 0)
  err = copied;
- }
 
 out_wait:
  finish_wait(sk_sleep(sk), &wait);
--
2.7.4


[PATCH 3.19.y-ckt 35/40] netem: Segment GSO packets on enqueue

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Neil Horman <[hidden email]>

[ Upstream commit 6071bd1aa13ed9e41824bafad845b7b7f4df5cfd ]

This was recently reported to me, and reproduced on the latest net kernel,
when attempting to run netperf from a host that had a netem qdisc attached
to the egress interface:

[  788.073771] ---------------------[ cut here ]---------------------------
[  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
------------   3.10.0-327.el7.x86_64 #1
[  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[  788.611943]  0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[  788.647241] Call Trace:
[  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
[  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
[  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
[  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
[  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
[  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
[  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).

The solution, I think, is to simply segment the skb in a similar fashion to
the way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor
changes.  When we decide to corrupt an skb, if the frame is GSO, we segment
it, corrupt the first segment, and enqueue the remaining ones.

Tested successfully by myself on the latest net kernel, to which this applies.

Signed-off-by: Neil Horman <[hidden email]>
CC: Jamal Hadi Salim <[hidden email]>
CC: "David S. Miller" <[hidden email]>
CC: [hidden email]
CC: [hidden email]
CC: [hidden email]
Acked-by: Eric Dumazet <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/sched/sch_netem.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ab3ab21..593979a 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -395,6 +395,25 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
  sch->q.qlen++;
 }
 
+/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
+ * when we statistically choose to corrupt one, we instead segment it, returning
+ * the first packet to be corrupted, and re-enqueue the remaining frames
+ */
+static struct sk_buff *netem_segment(struct sk_buff *skb, struct Qdisc *sch)
+{
+ struct sk_buff *segs;
+ netdev_features_t features = netif_skb_features(skb);
+
+ segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+ if (IS_ERR_OR_NULL(segs)) {
+ qdisc_reshape_fail(skb, sch);
+ return NULL;
+ }
+ consume_skb(skb);
+ return segs;
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -407,7 +426,11 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  /* We don't fill cb now as skb_unshare() may invalidate it */
  struct netem_skb_cb *cb;
  struct sk_buff *skb2;
+ struct sk_buff *segs = NULL;
+ unsigned int len = 0, last_len, prev_len = qdisc_pkt_len(skb);
+ int nb = 0;
  int count = 1;
+ int rc = NET_XMIT_SUCCESS;
 
  /* Random duplication */
  if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
@@ -453,10 +476,23 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  * do it now in software before we mangle it.
  */
  if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
+ if (skb_is_gso(skb)) {
+ segs = netem_segment(skb, sch);
+ if (!segs)
+ return NET_XMIT_DROP;
+ } else {
+ segs = skb;
+ }
+
+ skb = segs;
+ segs = segs->next;
+
  if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
     (skb->ip_summed == CHECKSUM_PARTIAL &&
-     skb_checksum_help(skb)))
- return qdisc_drop(skb, sch);
+     skb_checksum_help(skb))) {
+ rc = qdisc_drop(skb, sch);
+ goto finish_segs;
+ }
 
  skb->data[prandom_u32() % skb_headlen(skb)] ^=
  1<<(prandom_u32() % 8);
@@ -516,6 +552,27 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  sch->qstats.requeues++;
  }
 
+finish_segs:
+ if (segs) {
+ while (segs) {
+ skb2 = segs->next;
+ segs->next = NULL;
+ qdisc_skb_cb(segs)->pkt_len = segs->len;
+ last_len = segs->len;
+ rc = qdisc_enqueue(segs, sch);
+ if (rc != NET_XMIT_SUCCESS) {
+ if (net_xmit_drop_count(rc))
+ qdisc_qstats_drop(sch);
+ } else {
+ nb++;
+ len += last_len;
+ }
+ segs = skb2;
+ }
+ sch->q.qlen += nb;
+ if (nb > 1)
+ qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
+ }
  return NET_XMIT_SUCCESS;
 }
 
--
2.7.4


[PATCH 3.19.y-ckt 37/40] net: fix infoleak in rtnetlink

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Kangjie Lu <[hidden email]>

[ Upstream commit 5f8e44741f9f216e33736ea4ec65ca9ac03036e6 ]

The stack object “map” has a total size of 32 bytes. Its last 4
bytes are padding generated by the compiler. These padding bytes are
not initialized and are sent out via “nla_put”.

Signed-off-by: Kangjie Lu <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/core/rtnetlink.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 743ee58..193b83c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1040,14 +1040,16 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
  goto nla_put_failure;
 
  if (1) {
- struct rtnl_link_ifmap map = {
- .mem_start   = dev->mem_start,
- .mem_end     = dev->mem_end,
- .base_addr   = dev->base_addr,
- .irq         = dev->irq,
- .dma         = dev->dma,
- .port        = dev->if_port,
- };
+ struct rtnl_link_ifmap map;
+
+ memset(&map, 0, sizeof(map));
+ map.mem_start   = dev->mem_start;
+ map.mem_end     = dev->mem_end;
+ map.base_addr   = dev->base_addr;
+ map.irq         = dev->irq;
+ map.dma         = dev->dma;
+ map.port        = dev->if_port;
+
  if (nla_put(skb, IFLA_MAP, sizeof(map), &map))
  goto nla_put_failure;
  }
--
2.7.4


[PATCH 3.19.y-ckt 39/40] net: bridge: fix old ioctl unlocked net device walk

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Nikolay Aleksandrov <[hidden email]>

[ Upstream commit 31ca0458a61a502adb7ed192bf9716c6d05791a5 ]

get_bridge_ifindices() is used from the old "deviceless" bridge ioctl
calls which aren't called with rtnl held. The comment above says that it is
called with rtnl but that is not really the case.
Here's a sample output from a test ASSERT_RTNL() which I put in
get_bridge_ifindices and executed "brctl show":
[  957.422726] RTNL: assertion failed at net/bridge//br_ioctl.c (30)
[  957.422925] CPU: 0 PID: 1862 Comm: brctl Tainted: G        W  O
4.6.0-rc4+ #157
[  957.423009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.1-20150318_183358- 04/01/2014
[  957.423009]  0000000000000000 ffff880058adfdf0 ffffffff8138dec5
0000000000000400
[  957.423009]  ffffffff81ce8380 ffff880058adfe58 ffffffffa05ead32
0000000000000001
[  957.423009]  00007ffec1a444b0 0000000000000400 ffff880053c19130
0000000000008940
[  957.423009] Call Trace:
[  957.423009]  [<ffffffff8138dec5>] dump_stack+0x85/0xc0
[  957.423009]  [<ffffffffa05ead32>]
br_ioctl_deviceless_stub+0x212/0x2e0 [bridge]
[  957.423009]  [<ffffffff81515beb>] sock_ioctl+0x22b/0x290
[  957.423009]  [<ffffffff8126ba75>] do_vfs_ioctl+0x95/0x700
[  957.423009]  [<ffffffff8126c159>] SyS_ioctl+0x79/0x90
[  957.423009]  [<ffffffff8163a4c0>] entry_SYSCALL_64_fastpath+0x23/0xc1

Since it only reads bridge ifindices, we can use rcu to safely walk the net
device list. Also remove the wrong rtnl comment above.

Signed-off-by: Nikolay Aleksandrov <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/bridge/br_ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index 8d423bc..f876f70 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -21,18 +21,19 @@
 #include <asm/uaccess.h>
 #include "br_private.h"
 
-/* called with RTNL */
 static int get_bridge_ifindices(struct net *net, int *indices, int num)
 {
  struct net_device *dev;
  int i = 0;
 
- for_each_netdev(net, dev) {
+ rcu_read_lock();
+ for_each_netdev_rcu(net, dev) {
  if (i >= num)
  break;
  if (dev->priv_flags & IFF_EBRIDGE)
  indices[i++] = dev->ifindex;
  }
+ rcu_read_unlock();
 
  return i;
 }
--
2.7.4


[PATCH 3.19.y-ckt 17/40] workqueue: fix rebind bound workers warning

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Wanpeng Li <[hidden email]>

commit f7c17d26f43d5cc1b7a6b896cd2fa24a079739b9 upstream.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
Modules linked in:
CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
 0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
 0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
 ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
Call Trace:
 dump_stack+0x89/0xd4
 __warn+0xfd/0x120
 warn_slowpath_null+0x1d/0x20
 rebind_workers+0x1c0/0x1d0
 workqueue_cpu_up_callback+0xf5/0x1d0
 notifier_call_chain+0x64/0x90
 ? trace_hardirqs_on_caller+0xf2/0x220
 ? notify_prepare+0x80/0x80
 __raw_notifier_call_chain+0xe/0x10
 __cpu_notify+0x35/0x50
 notify_down_prepare+0x5e/0x80
 ? notify_prepare+0x80/0x80
 cpuhp_invoke_callback+0x73/0x330
 ? __schedule+0x33e/0x8a0
 cpuhp_down_callbacks+0x51/0xc0
 cpuhp_thread_fun+0xc1/0xf0
 smpboot_thread_fn+0x159/0x2a0
 ? smpboot_create_threads+0x80/0x80
 kthread+0xef/0x110
 ? wait_for_completion+0xf0/0x120
 ? schedule_tail+0x35/0xf0
 ret_from_fork+0x22/0x50
 ? __init_kthread_worker+0x70/0x70
---[ end trace eb12ae47d2382d8f ]---
notify_down_prepare: attempt to take down CPU 0 failed

This bug can be reproduced with the config below, with nohz_full= covering all CPUs:

CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
CONFIG_DEBUG_HOTPLUG_CPU0=y
CONFIG_NO_HZ_FULL=y

As Thomas pointed out:

| If a down prepare callback fails, then DOWN_FAILED is invoked for all
| callbacks which have successfully executed DOWN_PREPARE.
|
| But, workqueue has actually two notifiers. One which handles
| UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
|
| Now look at the priorities of those callbacks:
|
| CPU_PRI_WORKQUEUE_UP        = 5
| CPU_PRI_WORKQUEUE_DOWN      = -5
|
| So the call order on DOWN_PREPARE is:
|
| CB 1
| CB ...
| CB workqueue_up() -> Ignores DOWN_PREPARE
| CB ...
| CB X ---> Fails
|
| So we call up to CB X with DOWN_FAILED
|
| CB 1
| CB ...
| CB workqueue_up() -> Handles DOWN_FAILED
| CB ...
| CB X-1
|
| So the problem is that the workqueue stuff handles DOWN_FAILED in the up
| callback, while it should do it in the down callback. Which is not a good idea
| either because it wants to be called early on rollback...
|
| Brilliant stuff, isn't it? The hotplug rework will solve this problem because
| the callbacks become symmetric, but for the existing mess, we need some
| workaround in the workqueue code.

The boot CPU handles housekeeping duty (unbound timers, workqueues,
timekeeping, ...) on behalf of full dynticks CPUs. It must remain
online when nohz full is enabled. Every notifier block is assigned a
priority:

workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down

So when the tick_nohz_cpu_down callback fails during down-prepare of
CPU 0, the notifier blocks behind tick_nohz_cpu_down are never called,
and the workers are therefore not actually unbound. The hotplug state
machine then falls back to undo the operation and bring CPU 0 online
again. Workers are rebound unconditionally even though they were never
unbound, which triggers the warning in the process.

This patch fixes it by checking for !POOL_DISASSOCIATED, to avoid
rebinding workers that are still bound.
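
The shape of the fix can be modeled outside the kernel: rebinding must be a no-op unless the pool was previously marked disassociated by a real DOWN_PREPARE. A minimal sketch (the flag name mirrors the kernel's, but the pool struct here is a simplified stand-in):

```c
#include <assert.h>

#define POOL_DISASSOCIATED 0x1	/* mirrors the role of the kernel flag */

struct fake_pool {
	unsigned int flags;
	int rebind_count;	/* times workers were actually rebound */
};

/* Model of the guarded rebind_workers(): only rebind if a DOWN_PREPARE
 * really ran and marked the pool disassociated. Otherwise DOWN_FAILED
 * arrived without a preceding DOWN_PREPARE (the weird-notifier case)
 * and rebinding must be skipped. */
static void model_rebind_workers(struct fake_pool *pool)
{
	if (!(pool->flags & POOL_DISASSOCIATED))
		return;			/* workers were never unbound */
	pool->flags &= ~POOL_DISASSOCIATED;
	pool->rebind_count++;
}
```

In the spurious-DOWN_FAILED path the function now returns early, so bound workers are never rebound a second time and the warning cannot fire.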

Cc: Tejun Heo <[hidden email]>
Cc: Lai Jiangshan <[hidden email]>
Cc: Thomas Gleixner <[hidden email]>
Cc: Peter Zijlstra <[hidden email]>
Cc: Frédéric Weisbecker <[hidden email]>
Suggested-by: Lai Jiangshan <[hidden email]>
Signed-off-by: Wanpeng Li <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 kernel/workqueue.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b838f171..d5c7ca1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4570,6 +4570,17 @@ static void rebind_workers(struct worker_pool *pool)
   pool->attrs->cpumask) < 0);
 
  spin_lock_irq(&pool->lock);
+
+ /*
+ * XXX: CPU hotplug notifiers are weird and can call DOWN_FAILED
+ * w/o preceding DOWN_PREPARE.  Work around it.  CPU hotplug is
+ * being reworked and this can go away in time.
+ */
+ if (!(pool->flags & POOL_DISASSOCIATED)) {
+ spin_unlock_irq(&pool->lock);
+ return;
+ }
+
  pool->flags &= ~POOL_DISASSOCIATED;
 
  for_each_pool_worker(worker, pool) {
--
2.7.4


[PATCH 3.19.y-ckt 25/40] decnet: Do not build routes to devices without decnet private data.

Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: "David S. Miller" <[hidden email]>

[ Upstream commit a36a0d4008488fa545c74445d69eaf56377d5d4e ]

In particular, make sure we check for the presence of DECnet private
data (dn_ptr) on loopback devices.

Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/decnet/dn_route.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index daccc4a..4047341 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1042,10 +1042,13 @@ source_ok:
  if (!fld.daddr) {
  fld.daddr = fld.saddr;
 
- err = -EADDRNOTAVAIL;
  if (dev_out)
  dev_put(dev_out);
+ err = -EINVAL;
  dev_out = init_net.loopback_dev;
+ if (!dev_out->dn_ptr)
+ goto out;
+ err = -EADDRNOTAVAIL;
  dev_hold(dev_out);
  if (!fld.daddr) {
  fld.daddr =
@@ -1118,6 +1121,8 @@ source_ok:
  if (dev_out == NULL)
  goto out;
  dn_db = rcu_dereference_raw(dev_out->dn_ptr);
+ if (!dn_db)
+ goto e_inval;
  /* Possible improvement - check all devices for local addr */
  if (dn_dev_islocal(dev_out, fld.daddr)) {
  dev_put(dev_out);
@@ -1159,6 +1164,8 @@ select_source:
  dev_put(dev_out);
  dev_out = init_net.loopback_dev;
  dev_hold(dev_out);
+ if (!dev_out->dn_ptr)
+ goto e_inval;
  fld.flowidn_oif = dev_out->ifindex;
  if (res.fi)
  dn_fib_info_put(res.fi);
--
2.7.4


[PATCH 3.19.y-ckt 34/40] sch_dsmark: update backlog as well

Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: WANG Cong <[hidden email]>

[ Upstream commit bdf17661f63a79c3cb4209b970b1cc39e34f7543 ]

Similarly, we need to update backlog too when we update qlen.

Cc: Jamal Hadi Salim <[hidden email]>
Signed-off-by: Cong Wang <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/sched/sch_dsmark.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 11f9530..eb87a2a 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -256,6 +256,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  return err;
  }
 
+ qdisc_qstats_backlog_inc(sch, skb);
  sch->q.qlen++;
 
  return NET_XMIT_SUCCESS;
@@ -278,6 +279,7 @@ static struct sk_buff *dsmark_dequeue(struct Qdisc *sch)
  return NULL;
 
  qdisc_bstats_update(sch, skb);
+ qdisc_qstats_backlog_dec(sch, skb);
  sch->q.qlen--;
 
  index = skb->tc_index & (p->indices - 1);
@@ -393,6 +395,7 @@ static void dsmark_reset(struct Qdisc *sch)
 
  pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p);
  qdisc_reset(p->q);
+ sch->qstats.backlog = 0;
  sch->q.qlen = 0;
 }
 
--
2.7.4


[PATCH 3.19.y-ckt 33/40] sch_htb: update backlog as well

Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: WANG Cong <[hidden email]>

[ Upstream commit 431e3a8e36a05a37126f34b41aa3a5a6456af04e ]

We saw qlen!=0 but backlog==0 on our production machine:

qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17
 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0)
 backlog 0b 72p requeues 0

The problem is that the HTB qdisc only accounts for qlen, not backlog.
We need to update the backlog whenever we update qlen, so that we can
at least derive the average packet length.
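
The invariant the patch restores is simply that qlen (packets) and backlog (bytes) move together on every enqueue, dequeue, and reset. A userspace sketch of that bookkeeping (field names echo the kernel's sch->q.qlen and sch->qstats.backlog; the queue itself is omitted):

```c
#include <assert.h>

struct fake_qdisc {
	unsigned int qlen;	/* packets queued, like sch->q.qlen */
	unsigned int backlog;	/* bytes queued, like sch->qstats.backlog */
};

static void model_enqueue(struct fake_qdisc *q, unsigned int pkt_len)
{
	q->backlog += pkt_len;	/* the half that HTB was missing */
	q->qlen++;
}

static void model_dequeue(struct fake_qdisc *q, unsigned int pkt_len)
{
	q->backlog -= pkt_len;
	q->qlen--;
}

static void model_reset(struct fake_qdisc *q)
{
	q->qlen = 0;
	q->backlog = 0;		/* must be zeroed alongside qlen */
}
```

With both counters maintained, backlog/qlen gives the average queued packet length, and the "qlen != 0 but backlog == 0" stats seen in production cannot occur.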

Cc: Jamal Hadi Salim <[hidden email]>
Acked-by: Jamal Hadi Salim <[hidden email]>
Signed-off-by: Cong Wang <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/sched/sch_htb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 6b118b1..ccff006 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -600,6 +600,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  htb_activate(q, cl);
  }
 
+ qdisc_qstats_backlog_inc(sch, skb);
  sch->q.qlen++;
  return NET_XMIT_SUCCESS;
 }
@@ -889,6 +890,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 ok:
  qdisc_bstats_update(sch, skb);
  qdisc_unthrottled(sch);
+ qdisc_qstats_backlog_dec(sch, skb);
  sch->q.qlen--;
  return skb;
  }
@@ -955,6 +957,7 @@ static unsigned int htb_drop(struct Qdisc *sch)
  unsigned int len;
  if (cl->un.leaf.q->ops->drop &&
     (len = cl->un.leaf.q->ops->drop(cl->un.leaf.q))) {
+ sch->qstats.backlog -= len;
  sch->q.qlen--;
  if (!cl->un.leaf.q->q.qlen)
  htb_deactivate(q, cl);
@@ -984,12 +987,12 @@ static void htb_reset(struct Qdisc *sch)
  }
  cl->prio_activity = 0;
  cl->cmode = HTB_CAN_SEND;
-
  }
  }
  qdisc_watchdog_cancel(&q->watchdog);
  __skb_queue_purge(&q->direct_queue);
  sch->q.qlen = 0;
+ sch->qstats.backlog = 0;
  memset(q->hlevel, 0, sizeof(q->hlevel));
  memset(q->row_mask, 0, sizeof(q->row_mask));
  for (i = 0; i < TC_HTB_NUMPRIO; i++)
--
2.7.4


[PATCH 3.19.y-ckt 32/40] net_sched: update hierarchical backlog too

Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: WANG Cong <[hidden email]>

[ Upstream commit 2ccccf5fb43ff62b2b96cc58d95fc0b3596516e4 ]

When the bottom qdisc decides to, for example, drop a packet, it calls
qdisc_tree_decrease_qlen() to update the queue length for all of its
ancestors. We need to update the backlog too, to keep the stats on the
root qdisc accurate.
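
The change can be pictured as walking the chain of parent qdiscs and subtracting both counters at each level. A minimal userspace model of that walk (a singly linked parent chain stands in for the real qdisc tree, and the class-lookup and locking details are elided):

```c
#include <assert.h>
#include <stddef.h>

struct fake_qdisc {
	struct fake_qdisc *parent;	/* NULL at the root */
	unsigned int qlen;
	unsigned int backlog;
};

/* Model of qdisc_tree_reduce_backlog(): after 'n' packets totalling
 * 'len' bytes were removed from 'sch', every ancestor's counters must
 * shrink by the same amounts so the root qdisc's stats stay accurate. */
static void model_tree_reduce_backlog(struct fake_qdisc *sch,
				      unsigned int n, unsigned int len)
{
	for (struct fake_qdisc *q = sch->parent; q != NULL; q = q->parent) {
		q->qlen -= n;
		q->backlog -= len;
	}
}
```

This is why every former qdisc_tree_decrease_qlen() caller in the diff now also computes the byte count of what it dropped: without the len argument, ancestor backlog counters would drift upward forever.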

Cc: Jamal Hadi Salim <[hidden email]>
Acked-by: Jamal Hadi Salim <[hidden email]>
Signed-off-by: Cong Wang <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 include/net/codel.h       |  4 ++++
 include/net/sch_generic.h |  5 +++--
 net/sched/sch_api.c       |  8 +++++---
 net/sched/sch_cbq.c       |  5 +++--
 net/sched/sch_choke.c     |  6 ++++--
 net/sched/sch_codel.c     | 10 ++++++----
 net/sched/sch_drr.c       |  3 ++-
 net/sched/sch_fq.c        |  4 +++-
 net/sched/sch_fq_codel.c  | 17 ++++++++++++-----
 net/sched/sch_hfsc.c      |  3 ++-
 net/sched/sch_hhf.c       | 10 +++++++---
 net/sched/sch_htb.c       | 10 ++++++----
 net/sched/sch_multiq.c    |  8 +++++---
 net/sched/sch_netem.c     |  3 ++-
 net/sched/sch_pie.c       |  5 +++--
 net/sched/sch_prio.c      |  7 ++++---
 net/sched/sch_qfq.c       |  3 ++-
 net/sched/sch_red.c       |  3 ++-
 net/sched/sch_sfb.c       |  3 ++-
 net/sched/sch_sfq.c       | 16 +++++++++-------
 net/sched/sch_tbf.c       |  7 +++++--
 21 files changed, 91 insertions(+), 49 deletions(-)

diff --git a/include/net/codel.h b/include/net/codel.h
index aeee280..7302a4d 100644
--- a/include/net/codel.h
+++ b/include/net/codel.h
@@ -158,11 +158,13 @@ struct codel_vars {
  * struct codel_stats - contains codel shared variables and stats
  * @maxpacket: largest packet we've seen so far
  * @drop_count: temp count of dropped packets in dequeue()
+ * @drop_len: bytes of dropped packets in dequeue()
  * ecn_mark: number of packets we ECN marked instead of dropping
  */
 struct codel_stats {
  u32 maxpacket;
  u32 drop_count;
+ u32 drop_len;
  u32 ecn_mark;
 };
 
@@ -297,6 +299,7 @@ static struct sk_buff *codel_dequeue(struct Qdisc *sch,
   vars->rec_inv_sqrt);
  goto end;
  }
+ stats->drop_len += qdisc_pkt_len(skb);
  qdisc_drop(skb, sch);
  stats->drop_count++;
  skb = dequeue_func(vars, sch);
@@ -319,6 +322,7 @@ static struct sk_buff *codel_dequeue(struct Qdisc *sch,
  if (params->ecn && INET_ECN_set_ce(skb)) {
  stats->ecn_mark++;
  } else {
+ stats->drop_len += qdisc_pkt_len(skb);
  qdisc_drop(skb, sch);
  stats->drop_count++;
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9449d4f..3535e3c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -392,7 +392,8 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
       struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_destroy(struct Qdisc *qdisc);
-void qdisc_tree_decrease_qlen(struct Qdisc *qdisc, unsigned int n);
+void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
+       unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
   const struct Qdisc_ops *ops);
 struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
@@ -697,7 +698,7 @@ static inline struct Qdisc *qdisc_replace(struct Qdisc *sch, struct Qdisc *new,
  old = *pold;
  *pold = new;
  if (old != NULL) {
- qdisc_tree_decrease_qlen(old, old->q.qlen);
+ qdisc_tree_reduce_backlog(old, old->q.qlen, old->qstats.backlog);
  qdisc_reset(old);
  }
  sch_tree_unlock(sch);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index a25fae3..a2a7a81 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -740,14 +740,15 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
  return 0;
 }
 
-void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
+void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
+       unsigned int len)
 {
  const struct Qdisc_class_ops *cops;
  unsigned long cl;
  u32 parentid;
  int drops;
 
- if (n == 0)
+ if (n == 0 && len == 0)
  return;
  drops = max_t(int, n, 0);
  while ((parentid = sch->parent)) {
@@ -766,10 +767,11 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
  cops->put(sch, cl);
  }
  sch->q.qlen -= n;
+ sch->qstats.backlog -= len;
  __qdisc_qstats_drop(sch, drops);
  }
 }
-EXPORT_SYMBOL(qdisc_tree_decrease_qlen);
+EXPORT_SYMBOL(qdisc_tree_reduce_backlog);
 
 static void notify_and_destroy(struct net *net, struct sk_buff *skb,
        struct nlmsghdr *n, u32 clid,
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 17ad79e..f6e7a60 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1909,7 +1909,7 @@ static int cbq_delete(struct Qdisc *sch, unsigned long arg)
 {
  struct cbq_sched_data *q = qdisc_priv(sch);
  struct cbq_class *cl = (struct cbq_class *)arg;
- unsigned int qlen;
+ unsigned int qlen, backlog;
 
  if (cl->filters || cl->children || cl == &q->link)
  return -EBUSY;
@@ -1917,8 +1917,9 @@ static int cbq_delete(struct Qdisc *sch, unsigned long arg)
  sch_tree_lock(sch);
 
  qlen = cl->q->q.qlen;
+ backlog = cl->q->qstats.backlog;
  qdisc_reset(cl->q);
- qdisc_tree_decrease_qlen(cl->q, qlen);
+ qdisc_tree_reduce_backlog(cl->q, qlen, backlog);
 
  if (cl->next_alive)
  cbq_deactivate_class(cl);
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index c009eb9..3f6437d 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -128,8 +128,8 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
  choke_zap_tail_holes(q);
 
  qdisc_qstats_backlog_dec(sch, skb);
+ qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(skb));
  qdisc_drop(skb, sch);
- qdisc_tree_decrease_qlen(sch, 1);
  --sch->q.qlen;
 }
 
@@ -449,6 +449,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
  old = q->tab;
  if (old) {
  unsigned int oqlen = sch->q.qlen, tail = 0;
+ unsigned dropped = 0;
 
  while (q->head != q->tail) {
  struct sk_buff *skb = q->tab[q->head];
@@ -460,11 +461,12 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
  ntab[tail++] = skb;
  continue;
  }
+ dropped += qdisc_pkt_len(skb);
  qdisc_qstats_backlog_dec(sch, skb);
  --sch->q.qlen;
  qdisc_drop(skb, sch);
  }
- qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
+ qdisc_tree_reduce_backlog(sch, oqlen - sch->q.qlen, dropped);
  q->head = 0;
  q->tail = tail;
  }
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index de28f8e..0d60ea5 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -79,12 +79,13 @@ static struct sk_buff *codel_qdisc_dequeue(struct Qdisc *sch)
 
  skb = codel_dequeue(sch, &q->params, &q->vars, &q->stats, dequeue);
 
- /* We cant call qdisc_tree_decrease_qlen() if our qlen is 0,
+ /* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
  * or HTB crashes. Defer it for next round.
  */
  if (q->stats.drop_count && sch->q.qlen) {
- qdisc_tree_decrease_qlen(sch, q->stats.drop_count);
+ qdisc_tree_reduce_backlog(sch, q->stats.drop_count, q->stats.drop_len);
  q->stats.drop_count = 0;
+ q->stats.drop_len = 0;
  }
  if (skb)
  qdisc_bstats_update(sch, skb);
@@ -115,7 +116,7 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt)
 {
  struct codel_sched_data *q = qdisc_priv(sch);
  struct nlattr *tb[TCA_CODEL_MAX + 1];
- unsigned int qlen;
+ unsigned int qlen, dropped = 0;
  int err;
 
  if (!opt)
@@ -149,10 +150,11 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt)
  while (sch->q.qlen > sch->limit) {
  struct sk_buff *skb = __skb_dequeue(&sch->q);
 
+ dropped += qdisc_pkt_len(skb);
  qdisc_qstats_backlog_dec(sch, skb);
  qdisc_drop(skb, sch);
  }
- qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
+ qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
 
  sch_tree_unlock(sch);
  return 0;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index d4b3f82..e599803 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -53,9 +53,10 @@ static struct drr_class *drr_find_class(struct Qdisc *sch, u32 classid)
 static void drr_purge_queue(struct drr_class *cl)
 {
  unsigned int len = cl->qdisc->q.qlen;
+ unsigned int backlog = cl->qdisc->qstats.backlog;
 
  qdisc_reset(cl->qdisc);
- qdisc_tree_decrease_qlen(cl->qdisc, len);
+ qdisc_tree_reduce_backlog(cl->qdisc, len, backlog);
 }
 
 static const struct nla_policy drr_policy[TCA_DRR_MAX + 1] = {
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 333cd94..daad41f 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -643,6 +643,7 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt)
  struct fq_sched_data *q = qdisc_priv(sch);
  struct nlattr *tb[TCA_FQ_MAX + 1];
  int err, drop_count = 0;
+ unsigned drop_len = 0;
  u32 fq_log;
 
  if (!opt)
@@ -714,10 +715,11 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt)
 
  if (!skb)
  break;
+ drop_len += qdisc_pkt_len(skb);
  kfree_skb(skb);
  drop_count++;
  }
- qdisc_tree_decrease_qlen(sch, drop_count);
+ qdisc_tree_reduce_backlog(sch, drop_count, drop_len);
 
  sch_tree_unlock(sch);
  return err;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 398484c..122c3a6 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -173,7 +173,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
 static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
  struct fq_codel_sched_data *q = qdisc_priv(sch);
- unsigned int idx;
+ unsigned int idx, prev_backlog;
  struct fq_codel_flow *flow;
  int uninitialized_var(ret);
 
@@ -201,6 +201,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  if (++sch->q.qlen <= sch->limit)
  return NET_XMIT_SUCCESS;
 
+ prev_backlog = sch->qstats.backlog;
  q->drop_overlimit++;
  /* Return Congestion Notification only if we dropped a packet
  * from this flow.
@@ -209,7 +210,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  return NET_XMIT_CN;
 
  /* As we dropped a packet, better let upper stack know this */
- qdisc_tree_decrease_qlen(sch, 1);
+ qdisc_tree_reduce_backlog(sch, 1, prev_backlog - sch->qstats.backlog);
  return NET_XMIT_SUCCESS;
 }
 
@@ -239,6 +240,7 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
  struct fq_codel_flow *flow;
  struct list_head *head;
  u32 prev_drop_count, prev_ecn_mark;
+ unsigned int prev_backlog;
 
 begin:
  head = &q->new_flows;
@@ -257,6 +259,7 @@ begin:
 
  prev_drop_count = q->cstats.drop_count;
  prev_ecn_mark = q->cstats.ecn_mark;
+ prev_backlog = sch->qstats.backlog;
 
  skb = codel_dequeue(sch, &q->cparams, &flow->cvars, &q->cstats,
     dequeue);
@@ -274,12 +277,14 @@ begin:
  }
  qdisc_bstats_update(sch, skb);
  flow->deficit -= qdisc_pkt_len(skb);
- /* We cant call qdisc_tree_decrease_qlen() if our qlen is 0,
+ /* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
  * or HTB crashes. Defer it for next round.
  */
  if (q->cstats.drop_count && sch->q.qlen) {
- qdisc_tree_decrease_qlen(sch, q->cstats.drop_count);
+ qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
+  q->cstats.drop_len);
  q->cstats.drop_count = 0;
+ q->cstats.drop_len = 0;
  }
  return skb;
 }
@@ -363,11 +368,13 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
  while (sch->q.qlen > sch->limit) {
  struct sk_buff *skb = fq_codel_dequeue(sch);
 
+ q->cstats.drop_len += qdisc_pkt_len(skb);
  kfree_skb(skb);
  q->cstats.drop_count++;
  }
- qdisc_tree_decrease_qlen(sch, q->cstats.drop_count);
+ qdisc_tree_reduce_backlog(sch, q->cstats.drop_count, q->cstats.drop_len);
  q->cstats.drop_count = 0;
+ q->cstats.drop_len = 0;
 
  sch_tree_unlock(sch);
  return 0;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 134f7d2..d3e21da 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -895,9 +895,10 @@ static void
 hfsc_purge_queue(struct Qdisc *sch, struct hfsc_class *cl)
 {
  unsigned int len = cl->qdisc->q.qlen;
+ unsigned int backlog = cl->qdisc->qstats.backlog;
 
  qdisc_reset(cl->qdisc);
- qdisc_tree_decrease_qlen(cl->qdisc, len);
+ qdisc_tree_reduce_backlog(cl->qdisc, len, backlog);
 }
 
 static void
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 15d3aab..792c6f3 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -390,6 +390,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  struct hhf_sched_data *q = qdisc_priv(sch);
  enum wdrr_bucket_idx idx;
  struct wdrr_bucket *bucket;
+ unsigned int prev_backlog;
 
  idx = hhf_classify(skb, sch);
 
@@ -417,6 +418,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  if (++sch->q.qlen <= sch->limit)
  return NET_XMIT_SUCCESS;
 
+ prev_backlog = sch->qstats.backlog;
  q->drop_overlimit++;
  /* Return Congestion Notification only if we dropped a packet from this
  * bucket.
@@ -425,7 +427,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  return NET_XMIT_CN;
 
  /* As we dropped a packet, better let upper stack know this. */
- qdisc_tree_decrease_qlen(sch, 1);
+ qdisc_tree_reduce_backlog(sch, 1, prev_backlog - sch->qstats.backlog);
  return NET_XMIT_SUCCESS;
 }
 
@@ -535,7 +537,7 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt)
 {
  struct hhf_sched_data *q = qdisc_priv(sch);
  struct nlattr *tb[TCA_HHF_MAX + 1];
- unsigned int qlen;
+ unsigned int qlen, prev_backlog;
  int err;
  u64 non_hh_quantum;
  u32 new_quantum = q->quantum;
@@ -585,12 +587,14 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt)
  }
 
  qlen = sch->q.qlen;
+ prev_backlog = sch->qstats.backlog;
  while (sch->q.qlen > sch->limit) {
  struct sk_buff *skb = hhf_dequeue(sch);
 
  kfree_skb(skb);
  }
- qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
+ qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen,
+  prev_backlog - sch->qstats.backlog);
 
  sch_tree_unlock(sch);
  return 0;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 520ffe9..6b118b1 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1267,7 +1267,6 @@ static int htb_delete(struct Qdisc *sch, unsigned long arg)
 {
  struct htb_sched *q = qdisc_priv(sch);
  struct htb_class *cl = (struct htb_class *)arg;
- unsigned int qlen;
  struct Qdisc *new_q = NULL;
  int last_child = 0;
 
@@ -1287,9 +1286,11 @@ static int htb_delete(struct Qdisc *sch, unsigned long arg)
  sch_tree_lock(sch);
 
  if (!cl->level) {
- qlen = cl->un.leaf.q->q.qlen;
+ unsigned int qlen = cl->un.leaf.q->q.qlen;
+ unsigned int backlog = cl->un.leaf.q->qstats.backlog;
+
  qdisc_reset(cl->un.leaf.q);
- qdisc_tree_decrease_qlen(cl->un.leaf.q, qlen);
+ qdisc_tree_reduce_backlog(cl->un.leaf.q, qlen, backlog);
  }
 
  /* delete from hash and active; remainder in destroy_class */
@@ -1423,10 +1424,11 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
  sch_tree_lock(sch);
  if (parent && !parent->level) {
  unsigned int qlen = parent->un.leaf.q->q.qlen;
+ unsigned int backlog = parent->un.leaf.q->qstats.backlog;
 
  /* turn parent into inner node */
  qdisc_reset(parent->un.leaf.q);
- qdisc_tree_decrease_qlen(parent->un.leaf.q, qlen);
+ qdisc_tree_reduce_backlog(parent->un.leaf.q, qlen, backlog);
  qdisc_destroy(parent->un.leaf.q);
  if (parent->prio_activity)
  htb_deactivate(q, parent);
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index f36ff83..23437d6 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -218,7 +218,8 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt)
  if (q->queues[i] != &noop_qdisc) {
  struct Qdisc *child = q->queues[i];
  q->queues[i] = &noop_qdisc;
- qdisc_tree_decrease_qlen(child, child->q.qlen);
+ qdisc_tree_reduce_backlog(child, child->q.qlen,
+  child->qstats.backlog);
  qdisc_destroy(child);
  }
  }
@@ -238,8 +239,9 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt)
  q->queues[i] = child;
 
  if (old != &noop_qdisc) {
- qdisc_tree_decrease_qlen(old,
- old->q.qlen);
+ qdisc_tree_reduce_backlog(old,
+  old->q.qlen,
+  old->qstats.backlog);
  qdisc_destroy(old);
  }
  sch_tree_unlock(sch);
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index dc44993..ab3ab21 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -597,7 +597,8 @@ deliver:
  if (unlikely(err != NET_XMIT_SUCCESS)) {
  if (net_xmit_drop_count(err)) {
  qdisc_qstats_drop(sch);
- qdisc_tree_decrease_qlen(sch, 1);
+ qdisc_tree_reduce_backlog(sch, 1,
+  qdisc_pkt_len(skb));
  }
  }
  goto tfifo_dequeue;
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index b783a44..71ae3b9 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -183,7 +183,7 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt)
 {
  struct pie_sched_data *q = qdisc_priv(sch);
  struct nlattr *tb[TCA_PIE_MAX + 1];
- unsigned int qlen;
+ unsigned int qlen, dropped = 0;
  int err;
 
  if (!opt)
@@ -232,10 +232,11 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt)
  while (sch->q.qlen > sch->limit) {
  struct sk_buff *skb = __skb_dequeue(&sch->q);
 
+ dropped += qdisc_pkt_len(skb);
  qdisc_qstats_backlog_dec(sch, skb);
  qdisc_drop(skb, sch);
  }
- qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
+ qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
 
  sch_tree_unlock(sch);
  return 0;
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index a677f54..e671b1a 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -191,7 +191,7 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt)
  struct Qdisc *child = q->queues[i];
  q->queues[i] = &noop_qdisc;
  if (child != &noop_qdisc) {
- qdisc_tree_decrease_qlen(child, child->q.qlen);
+ qdisc_tree_reduce_backlog(child, child->q.qlen, child->qstats.backlog);
  qdisc_destroy(child);
  }
  }
@@ -210,8 +210,9 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt)
  q->queues[i] = child;
 
  if (old != &noop_qdisc) {
- qdisc_tree_decrease_qlen(old,
- old->q.qlen);
+ qdisc_tree_reduce_backlog(old,
+  old->q.qlen,
+  old->qstats.backlog);
  qdisc_destroy(old);
  }
  sch_tree_unlock(sch);
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 6fea320..e2b8fd4 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -221,9 +221,10 @@ static struct qfq_class *qfq_find_class(struct Qdisc *sch, u32 classid)
 static void qfq_purge_queue(struct qfq_class *cl)
 {
  unsigned int len = cl->qdisc->q.qlen;
+ unsigned int backlog = cl->qdisc->qstats.backlog;
 
  qdisc_reset(cl->qdisc);
- qdisc_tree_decrease_qlen(cl->qdisc, len);
+ qdisc_tree_reduce_backlog(cl->qdisc, len, backlog);
 }
 
 static const struct nla_policy qfq_policy[TCA_QFQ_MAX + 1] = {
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index d5abcee..8c0508c 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -210,7 +210,8 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt)
  q->flags = ctl->flags;
  q->limit = ctl->limit;
  if (child) {
- qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
+ qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
+  q->qdisc->qstats.backlog);
  qdisc_destroy(q->qdisc);
  q->qdisc = child;
  }
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index fa7f0a3..e1d634e 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -518,7 +518,8 @@ static int sfb_change(struct Qdisc *sch, struct nlattr *opt)
 
  sch_tree_lock(sch);
 
- qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
+ qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
+  q->qdisc->qstats.backlog);
  qdisc_destroy(q->qdisc);
  q->qdisc = child;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 0f65ba4..ae24b05 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -369,7 +369,7 @@ static int
 sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
  struct sfq_sched_data *q = qdisc_priv(sch);
- unsigned int hash;
+ unsigned int hash, dropped;
  sfq_index x, qlen;
  struct sfq_slot *slot;
  int uninitialized_var(ret);
@@ -484,7 +484,7 @@ enqueue:
  return NET_XMIT_SUCCESS;
 
  qlen = slot->qlen;
- sfq_drop(sch);
+ dropped = sfq_drop(sch);
  /* Return Congestion Notification only if we dropped a packet
  * from this flow.
  */
@@ -492,7 +492,7 @@ enqueue:
  return NET_XMIT_CN;
 
  /* As we dropped a packet, better let upper stack know this */
- qdisc_tree_decrease_qlen(sch, 1);
+ qdisc_tree_reduce_backlog(sch, 1, dropped);
  return NET_XMIT_SUCCESS;
 }
 
@@ -560,6 +560,7 @@ static void sfq_rehash(struct Qdisc *sch)
  struct sfq_slot *slot;
  struct sk_buff_head list;
  int dropped = 0;
+ unsigned int drop_len = 0;
 
  __skb_queue_head_init(&list);
 
@@ -588,6 +589,7 @@ static void sfq_rehash(struct Qdisc *sch)
  if (x >= SFQ_MAX_FLOWS) {
 drop:
  qdisc_qstats_backlog_dec(sch, skb);
+ drop_len += qdisc_pkt_len(skb);
  kfree_skb(skb);
  dropped++;
  continue;
@@ -617,7 +619,7 @@ drop:
  }
  }
  sch->q.qlen -= dropped;
- qdisc_tree_decrease_qlen(sch, dropped);
+ qdisc_tree_reduce_backlog(sch, dropped, drop_len);
 }
 
 static void sfq_perturbation(unsigned long arg)
@@ -641,7 +643,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
  struct sfq_sched_data *q = qdisc_priv(sch);
  struct tc_sfq_qopt *ctl = nla_data(opt);
  struct tc_sfq_qopt_v1 *ctl_v1 = NULL;
- unsigned int qlen;
+ unsigned int qlen, dropped = 0;
  struct red_parms *p = NULL;
 
  if (opt->nla_len < nla_attr_size(sizeof(*ctl)))
@@ -690,8 +692,8 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
  qlen = sch->q.qlen;
  while (sch->q.qlen > q->limit)
- sfq_drop(sch);
- qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
+ dropped += sfq_drop(sch);
+ qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
 
  del_timer(&q->perturb_timer);
  if (q->perturb_period) {
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 56a1aef..c2fbde7 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -160,6 +160,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch)
  struct tbf_sched_data *q = qdisc_priv(sch);
  struct sk_buff *segs, *nskb;
  netdev_features_t features = netif_skb_features(skb);
+ unsigned int len = 0, prev_len = qdisc_pkt_len(skb);
  int ret, nb;
 
  segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
@@ -172,6 +173,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch)
  nskb = segs->next;
  segs->next = NULL;
  qdisc_skb_cb(segs)->pkt_len = segs->len;
+ len += segs->len;
  ret = qdisc_enqueue(segs, q->qdisc);
  if (ret != NET_XMIT_SUCCESS) {
  if (net_xmit_drop_count(ret))
@@ -183,7 +185,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch)
  }
  sch->q.qlen += nb;
  if (nb > 1)
- qdisc_tree_decrease_qlen(sch, 1 - nb);
+ qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
  consume_skb(skb);
  return nb > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
 }
@@ -399,7 +401,8 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 
  sch_tree_lock(sch);
  if (child) {
- qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
+ qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
+  q->qdisc->qstats.backlog);
  qdisc_destroy(q->qdisc);
  q->qdisc = child;
  }
--
2.7.4

[PATCH 3.19.y-ckt 31/40] net_sched: introduce qdisc_replace() helper

Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: WANG Cong <[hidden email]>

[ Upstream commit 86a7996cc8a078793670d82ed97d5a99bb4e8496 ]

Remove nearly duplicated code and prepare for the following patch.

Cc: Jamal Hadi Salim <[hidden email]>
Acked-by: Jamal Hadi Salim <[hidden email]>
Signed-off-by: Cong Wang <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 include/net/sch_generic.h | 17 +++++++++++++++++
 net/sched/sch_cbq.c       |  7 +------
 net/sched/sch_drr.c       |  6 +-----
 net/sched/sch_dsmark.c    |  8 +-------
 net/sched/sch_hfsc.c      |  6 +-----
 net/sched/sch_htb.c       |  9 +--------
 net/sched/sch_multiq.c    |  8 +-------
 net/sched/sch_netem.c     | 10 +---------
 net/sched/sch_prio.c      |  8 +-------
 net/sched/sch_qfq.c       |  6 +-----
 net/sched/sch_red.c       |  7 +------
 net/sched/sch_sfb.c       |  7 +------
 net/sched/sch_tbf.c       |  8 +-------
 13 files changed, 29 insertions(+), 78 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c605d30..9449d4f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -688,6 +688,23 @@ static inline void qdisc_reset_queue(struct Qdisc *sch)
  sch->qstats.backlog = 0;
 }
 
+static inline struct Qdisc *qdisc_replace(struct Qdisc *sch, struct Qdisc *new,
+  struct Qdisc **pold)
+{
+ struct Qdisc *old;
+
+ sch_tree_lock(sch);
+ old = *pold;
+ *pold = new;
+ if (old != NULL) {
+ qdisc_tree_decrease_qlen(old, old->q.qlen);
+ qdisc_reset(old);
+ }
+ sch_tree_unlock(sch);
+
+ return old;
+}
+
 static inline unsigned int __qdisc_queue_drop(struct Qdisc *sch,
       struct sk_buff_head *list)
 {
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index beeb75f..17ad79e 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1624,13 +1624,8 @@ static int cbq_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
  new->reshape_fail = cbq_reshape_fail;
 #endif
  }
- sch_tree_lock(sch);
- *old = cl->q;
- cl->q = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- sch_tree_unlock(sch);
 
+ *old = qdisc_replace(sch, new, &cl->q);
  return 0;
 }
 
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 3387060..d4b3f82 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -226,11 +226,7 @@ static int drr_graft_class(struct Qdisc *sch, unsigned long arg,
  new = &noop_qdisc;
  }
 
- sch_tree_lock(sch);
- drr_purge_queue(cl);
- *old = cl->qdisc;
- cl->qdisc = new;
- sch_tree_unlock(sch);
+ *old = qdisc_replace(sch, new, &cl->qdisc);
  return 0;
 }
 
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 227114f..11f9530 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -67,13 +67,7 @@ static int dsmark_graft(struct Qdisc *sch, unsigned long arg,
  new = &noop_qdisc;
  }
 
- sch_tree_lock(sch);
- *old = p->q;
- p->q = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- sch_tree_unlock(sch);
-
+ *old = qdisc_replace(sch, new, &p->q);
  return 0;
 }
 
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index e6c7416..134f7d2 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1215,11 +1215,7 @@ hfsc_graft_class(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
  new = &noop_qdisc;
  }
 
- sch_tree_lock(sch);
- hfsc_purge_queue(sch, cl);
- *old = cl->qdisc;
- cl->qdisc = new;
- sch_tree_unlock(sch);
+ *old = qdisc_replace(sch, new, &cl->qdisc);
  return 0;
 }
 
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index f1acb0f..520ffe9 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1165,14 +1165,7 @@ static int htb_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
      cl->common.classid)) == NULL)
  return -ENOBUFS;
 
- sch_tree_lock(sch);
- *old = cl->un.leaf.q;
- cl->un.leaf.q = new;
- if (*old != NULL) {
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- }
- sch_tree_unlock(sch);
+ *old = qdisc_replace(sch, new, &cl->un.leaf.q);
  return 0;
 }
 
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 42dd218..f36ff83 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -303,13 +303,7 @@ static int multiq_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
  if (new == NULL)
  new = &noop_qdisc;
 
- sch_tree_lock(sch);
- *old = q->queues[band];
- q->queues[band] = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- sch_tree_unlock(sch);
-
+ *old = qdisc_replace(sch, new, &q->queues[band]);
  return 0;
 }
 
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 179f1c8..dc44993 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -1036,15 +1036,7 @@ static int netem_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 {
  struct netem_sched_data *q = qdisc_priv(sch);
 
- sch_tree_lock(sch);
- *old = q->qdisc;
- q->qdisc = new;
- if (*old) {
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- }
- sch_tree_unlock(sch);
-
+ *old = qdisc_replace(sch, new, &q->qdisc);
  return 0;
 }
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 8e5cd34..a677f54 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -268,13 +268,7 @@ static int prio_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
  if (new == NULL)
  new = &noop_qdisc;
 
- sch_tree_lock(sch);
- *old = q->queues[band];
- q->queues[band] = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- sch_tree_unlock(sch);
-
+ *old = qdisc_replace(sch, new, &q->queues[band]);
  return 0;
 }
 
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 3ec7e88..6fea320 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -619,11 +619,7 @@ static int qfq_graft_class(struct Qdisc *sch, unsigned long arg,
  new = &noop_qdisc;
  }
 
- sch_tree_lock(sch);
- qfq_purge_queue(cl);
- *old = cl->qdisc;
- cl->qdisc = new;
- sch_tree_unlock(sch);
+ *old = qdisc_replace(sch, new, &cl->qdisc);
  return 0;
 }
 
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 6c0534c..d5abcee 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -313,12 +313,7 @@ static int red_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
  if (new == NULL)
  new = &noop_qdisc;
 
- sch_tree_lock(sch);
- *old = q->qdisc;
- q->qdisc = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- sch_tree_unlock(sch);
+ *old = qdisc_replace(sch, new, &q->qdisc);
  return 0;
 }
 
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 5819dd8..fa7f0a3 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -614,12 +614,7 @@ static int sfb_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
  if (new == NULL)
  new = &noop_qdisc;
 
- sch_tree_lock(sch);
- *old = q->qdisc;
- q->qdisc = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- sch_tree_unlock(sch);
+ *old = qdisc_replace(sch, new, &q->qdisc);
  return 0;
 }
 
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index a4afde1..56a1aef 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -502,13 +502,7 @@ static int tbf_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
  if (new == NULL)
  new = &noop_qdisc;
 
- sch_tree_lock(sch);
- *old = q->qdisc;
- q->qdisc = new;
- qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
- qdisc_reset(*old);
- sch_tree_unlock(sch);
-
+ *old = qdisc_replace(sch, new, &q->qdisc);
  return 0;
 }
 
--
2.7.4

[PATCH 3.19.y-ckt 26/40] route: do not cache fib route info on local routes with oif

Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Chris Friesen <[hidden email]>

[ Upstream commit d6d5e999e5df67f8ec20b6be45e2229455ee3699 ]

For local routes that require a particular output interface we do not want
to cache the result.  Caching the result causes incorrect behaviour when
there are multiple source addresses on the interface.  The end result
being that if the intended recipient is waiting on that interface for the
packet he won't receive it because it will be delivered on the loopback
interface and the IP_PKTINFO ipi_ifindex will be set to the loopback
interface as well.

This can be tested by running a program such as "dhcp_release" which
attempts to inject a packet on a particular interface so that it is
received by another program on the same board.  The receiving process
should see an IP_PKTINFO ipi_ifindex value of the source interface
(e.g., eth1) instead of the loopback interface (e.g., lo).  The packet
will still appear on the loopback interface in tcpdump but the important
aspect is that the CMSG info is correct.

Sample dhcp_release command line:

   dhcp_release eth1 192.168.204.222 02:11:33:22:44:66

Signed-off-by: Allain Legacy <[hidden email]>
Signed-off-by: Chris Friesen <[hidden email]>
Reviewed-by: Julian Anastasov <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/ipv4/route.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e7e7cd8..5de9acd 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1964,6 +1964,18 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
  */
  if (fi && res->prefixlen < 4)
  fi = NULL;
+ } else if ((type == RTN_LOCAL) && (orig_oif != 0) &&
+   (orig_oif != dev_out->ifindex)) {
+ /* For local routes that require a particular output interface
+ * we do not want to cache the result.  Caching the result
+ * causes incorrect behaviour when there are multiple source
+ * addresses on the interface, the end result being that if the
+ * intended recipient is waiting on that interface for the
+ * packet he won't receive it because it will be delivered on
+ * the loopback interface and the IP_PKTINFO ipi_ifindex will
+ * be set to the loopback interface as well.
+ */
+ fi = NULL;
  }
 
  fnhe = NULL;
--
2.7.4

[PATCH 3.19.y-ckt 30/40] bpf: fix double-fdput in replace_map_fd_with_map_ptr()

Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Jann Horn <[hidden email]>

[ Upstream commit 8358b02bf67d3a5d8a825070e1aa73f25fb2e4c7 ]

When bpf(BPF_PROG_LOAD, ...) was invoked with a BPF program whose bytecode
references a non-map file descriptor as a map file descriptor, the error
handling code called fdput() twice instead of once (in __bpf_map_get() and
in replace_map_fd_with_map_ptr()). If the file descriptor table of the
current task is shared, this causes f_count to be decremented too much,
allowing the struct file to be freed while it is still in use
(use-after-free). This can be exploited to gain root privileges by an
unprivileged user.

This bug was introduced in
commit 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn"), but is only
exploitable since
commit 1be7f75d1668 ("bpf: enable non-root eBPF programs") because
previously, CAP_SYS_ADMIN was required to reach the vulnerable code.

(posted publicly according to request by maintainer)

Signed-off-by: Jann Horn <[hidden email]>
Signed-off-by: Linus Torvalds <[hidden email]>
Acked-by: Alexei Starovoitov <[hidden email]>
Acked-by: Daniel Borkmann <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 kernel/bpf/verifier.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0d2cbc9..45fdf5c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1817,7 +1817,6 @@ static int replace_map_fd_with_map_ptr(struct verifier_env *env)
  if (IS_ERR(map)) {
  verbose("fd %d is not pointing to valid bpf_map\n",
  insn->imm);
- fdput(f);
  return PTR_ERR(map);
  }
 
--
2.7.4

[PATCH 3.19.y-ckt 28/40] vlan: pull on __vlan_insert_tag error path and fix csum correction

Kamal Mostafa
In reply to this post by Kamal Mostafa
3.19.8-ckt22 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Daniel Borkmann <[hidden email]>

[ Upstream commit 9241e2df4fbc648a92ea0752918e05c26255649e ]

When __vlan_insert_tag() fails from skb_vlan_push() path due to the
skb_cow_head(), we need to undo the __skb_push() in the error path
as well that was done earlier to move skb->data pointer to mac header.

Moreover, I noticed that when in the non-error path the __skb_pull()
is done and the original offset to mac header was non-zero, we fixup
from a wrong skb->data offset in the checksum complete processing.

So the skb_postpush_rcsum() really needs to be done before __skb_pull()
where skb->data still points to the mac header start and thus operates
under the same conditions as in __vlan_insert_tag().

Fixes: 93515d53b133 ("net: move vlan pop/push functions into common code")
Signed-off-by: Daniel Borkmann <[hidden email]>
Reviewed-by: Jiri Pirko <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
Signed-off-by: Kamal Mostafa <[hidden email]>
---
 net/core/skbuff.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b987b6c..9afbb1b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4356,13 +4356,16 @@ int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
  __skb_push(skb, offset);
  err = __vlan_insert_tag(skb, skb->vlan_proto,
  vlan_tx_tag_get(skb));
- if (err)
+ if (err) {
+ __skb_pull(skb, offset);
  return err;
+ }
+
  skb->protocol = skb->vlan_proto;
  skb->mac_len += VLAN_HLEN;
- __skb_pull(skb, offset);
 
  skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
+ __skb_pull(skb, offset);
  }
  __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
  return 0;
--
2.7.4
