[PATCH 4.4 00/74] 4.4.5-stable review


[PATCH 4.4 38/74] pata-rb532-cf: get rid of the irq_to_gpio() call

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gabor Juhos <[hidden email]>

commit 018361767a21fb2d5ebd3ac182c04baf8a8b4e08 upstream.

The RB532 platform specific irq_to_gpio() implementation has been
removed with commit 832f5dacfa0b ("MIPS: Remove all the uses of
custom gpio.h"). Now the platform uses the generic stub which causes
the following error:

  pata-rb532-cf pata-rb532-cf: no GPIO found for irq149
  pata-rb532-cf: probe of pata-rb532-cf failed with error -2

Drop the irq_to_gpio() call and get the GPIO number from platform
data instead. After this change, the driver works again:

  scsi host0: pata-rb532-cf
  ata1: PATA max PIO4 irq 149
  ata1.00: CFA: CF 1GB, 20080820, max MWDMA4
  ata1.00: 1989792 sectors, multi 0: LBA
  ata1.00: configured for PIO4
  scsi 0:0:0:0: Direct-Access     ATA      CF 1GB           0820 PQ: 0\
  ANSI: 5
  sd 0:0:0:0: [sda] 1989792 512-byte logical blocks: (1.01 GB/971 MiB)
  sd 0:0:0:0: [sda] Write Protect is off
  sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't\
  support DPO or FUA
   sda: sda1 sda2
  sd 0:0:0:0: [sda] Attached SCSI disk
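The lookup described above can be sketched in plain userspace C. This is
only an illustration, not the driver code: the struct below is a
hypothetical stand-in for the platform data declared in
<asm/mach-rc32434/rb.h> (only the gpio_pin member is taken from the patch),
and cf_gpio_from_pdata() is a made-up helper name.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the RB532 CF platform data; only gpio_pin
 * is taken from the patch, everything else is omitted. */
struct cf_device {
	int gpio_pin;
};

/*
 * Fetch the GPIO number from platform data instead of deriving it
 * from the IRQ. Returns 0 and stores the pin in *gpio, or a negative
 * errno mirroring the error paths in the patch.
 */
static int cf_gpio_from_pdata(const struct cf_device *pdata, int *gpio)
{
	if (!pdata)
		return -EINVAL;	/* "no platform data specified" */

	if (pdata->gpio_pin < 0)
		return -ENOENT;	/* board did not describe a usable GPIO */

	*gpio = pdata->gpio_pin;
	return 0;
}
```

The point of the design is that the board file, which knows the wiring,
supplies the pin, so the driver no longer depends on an IRQ-to-GPIO mapping
that the generic stub cannot provide.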

Fixes: 832f5dacfa0b ("MIPS: Remove all the uses of custom gpio.h")
Cc: Alban Bedel <[hidden email]>
Cc: Ralf Baechle <[hidden email]>
Cc: Arnd Bergmann <[hidden email]>
Signed-off-by: Gabor Juhos <[hidden email]>
Signed-off-by: Tejun Heo <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 drivers/ata/pata_rb532_cf.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/drivers/ata/pata_rb532_cf.c
+++ b/drivers/ata/pata_rb532_cf.c
@@ -32,6 +32,8 @@
 #include <linux/libata.h>
 #include <scsi/scsi_host.h>
 
+#include <asm/mach-rc32434/rb.h>
+
 #define DRV_NAME "pata-rb532-cf"
 #define DRV_VERSION "0.1.0"
 #define DRV_DESC "PATA driver for RouterBOARD 532 Compact Flash"
@@ -107,6 +109,7 @@ static int rb532_pata_driver_probe(struc
  int gpio;
  struct resource *res;
  struct ata_host *ah;
+ struct cf_device *pdata;
  struct rb532_cf_info *info;
  int ret;
 
@@ -122,7 +125,13 @@ static int rb532_pata_driver_probe(struc
  return -ENOENT;
  }
 
- gpio = irq_to_gpio(irq);
+ pdata = dev_get_platdata(&pdev->dev);
+ if (!pdata) {
+ dev_err(&pdev->dev, "no platform data specified\n");
+ return -EINVAL;
+ }
+
+ gpio = pdata->gpio_pin;
  if (gpio < 0) {
  dev_err(&pdev->dev, "no GPIO found for irq%d\n", irq);
  return -ENOENT;



[PATCH 4.4 39/74] Btrfs: fix loading of orphan roots leading to BUG_ON

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana <[hidden email]>

commit 909c3a22da3b8d2cfd3505ca5658f0176859d400 upstream.

When looking for orphan roots during mount we can end up hitting a
BUG_ON() (at root-tree.c:btrfs_find_orphan_roots()) if a log tree is
replayed and qgroups are enabled. After a log tree is replayed, a
transaction commit is made. The commit triggers qgroup extent accounting,
which does backref walking, which in turn reads and inserts all roots into
the radix tree fs_info->fs_root_radix, including orphan roots (deleted
snapshots). So after the log tree is replayed, when finding orphan roots
we hit the BUG_ON with the following trace:

[118209.182438] ------------[ cut here ]------------
[118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
[118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse
processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata
virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
[118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G        W       4.5.0-rc5-btrfs-next-24+ #1
[118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
[118209.186318] RIP: 0010:[<ffffffffa04237d7>]  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
[118209.186318] RSP: 0018:ffff8800af34faa8  EFLAGS: 00010246
[118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
[118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
[118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
[118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
[118209.186318] FS:  00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
[118209.186318] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
[118209.186318] Stack:
[118209.186318]  fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
[118209.186318]  fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
[118209.186318]  0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
[118209.186318] Call Trace:
[118209.186318]  [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
[118209.186318]  [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
[118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
[118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
[118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
[118209.186318]  [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
[118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
[118209.186318]  [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
[118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
[118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
[118209.186318]  [<ffffffff81195637>] do_mount+0x8a6/0x9e8
[118209.186318]  [<ffffffff8119598d>] SyS_mount+0x77/0x9f
[118209.186318]  [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
[118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b
4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
[118209.186318] RIP  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
[118209.186318]  RSP <ffff8800af34faa8>
[118209.230735] ---[ end trace 83938f987d85d477 ]---

So fix this by not treating the error -EEXIST, returned when attempting
to insert a root already inserted by the backref walking code, as an error.
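As a rough userspace illustration of that idea (not the btrfs code), the
sketch below uses a toy table in place of fs_info->fs_root_radix. The names
root_table, table_insert() and register_orphan_root() are hypothetical; only
the "-EEXIST means someone else already inserted it" handling mirrors the
patch.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ROOTS 16

/* Toy replacement for the radix tree: a flat table of root ids. */
struct root_table {
	unsigned long ids[MAX_ROOTS];
	size_t count;
};

/* Insert that refuses duplicates, like a radix tree insert would. */
static int table_insert(struct root_table *t, unsigned long id)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->ids[i] == id)
			return -EEXIST;

	if (t->count == MAX_ROOTS)
		return -ENOMEM;

	t->ids[t->count++] = id;
	return 0;
}

/*
 * Orphan-root path after the fix: if an earlier caller (here, the
 * stand-in for backref walking) already inserted the root, that is
 * not an error. Any other failure is still reported.
 */
static int register_orphan_root(struct root_table *t, unsigned long id)
{
	int err = table_insert(t, id);

	if (err == -EEXIST)
		err = 0;
	return err;
}
```

Treating the duplicate as success keeps the function idempotent, so it no
longer matters which of the two callers reaches the tree first.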

The following test case for xfstests reproduces the bug:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1 # failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      cd /
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_dm_target flakey
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  _run_btrfs_util_prog quota enable $SCRATCH_MNT

  # Create 2 directories with one file in one of them.
  # We use these just to trigger a transaction commit later, moving the file from
  # directory a to directory b and doing an fsync against directory a.
  mkdir $SCRATCH_MNT/a
  mkdir $SCRATCH_MNT/b
  touch $SCRATCH_MNT/a/f
  sync

  # Create our test file with 2 4K extents.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io

  # Create a snapshot and delete it. This doesn't really delete the snapshot
  # immediately, just makes it inaccessible and invisible to user space, the
  # snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
  # which is woken up at the next transaction commit.
  # A root orphan item is inserted into the tree of tree roots, so that if a
  # power failure happens before the dedicated kernel thread does the snapshot
  # deletion, the next time the filesystem is mounted it resumes the snapshot
  # deletion.
  _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
  _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap

  # Now overwrite half of the extents we wrote before. Because we made a snapshot
  # before, which isn't really deleted yet (since no transaction commit happened
  # after we did the snapshot delete request), the non overwritten extents get
  # referenced twice, once by the default subvolume and once by the snapshot.
  $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io

  # Now move file f from directory a to directory b and fsync directory a.
  # The fsync on the directory a triggers a transaction commit (because a file
  # was moved from it to another directory) and the file fsync leaves a log tree
  # with file extent items to replay.
  mv $SCRATCH_MNT/a/f $SCRATCH_MNT/b/f
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

  echo "File digest before power failure:"
  md5sum $SCRATCH_MNT/foobar | _filter_scratch

  # Now simulate a power failure and mount the filesystem to replay the log tree.
  # After the log tree was replayed, we used to hit a BUG_ON() when processing
  # the root orphan item for the deleted snapshot. This is because when processing
  # an orphan root the code expected to be the first code inserting the root into
  # the fs_info->fs_root_radix radix tree, while in reality it was the second
  # caller attempting to do it - the first caller was the transaction commit that
  # took place after replaying the log tree, when updating the qgroup counters.
  _flakey_drop_and_remount

  echo "File digest after power failure:"
  # Must match what we got before the power failure.
  md5sum $SCRATCH_MNT/foobar | _filter_scratch

  _unmount_flakey
  status=0
  exit

Fixes: 2d9e97761087 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
Signed-off-by: Filipe Manana <[hidden email]>
Reviewed-by: Qu Wenruo <[hidden email]>
Signed-off-by: Chris Mason <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 7cf8509deda7..2c849b08a91b 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -310,8 +310,16 @@ int btrfs_find_orphan_roots(struct btrfs_root *tree_root)
  set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state);
 
  err = btrfs_insert_fs_root(root->fs_info, root);
+ /*
+ * The root might have been inserted already, as before we look
+ * for orphan roots, log replay might have happened, which
+ * triggers a transaction commit and qgroup accounting, which
+ * in turn reads and inserts fs roots while doing backref
+ * walking.
+ */
+ if (err == -EEXIST)
+ err = 0;
  if (err) {
- BUG_ON(err == -EEXIST);
  btrfs_free_fs_root(root);
  break;
  }



[PATCH 4.4 19/74] iommu/amd: Apply workaround for ATS write permission check

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jay Cornwall <[hidden email]>

commit 358875fd52ab8f00f66328cbf1a1d2486f265829 upstream.

The AMD Family 15h Models 30h-3Fh (Kaveri) BIOS and Kernel Developer's
Guide omitted part of the BIOS IOMMU L2 register setup specification.
Without this setup the IOMMU L2 does not fully respect write permissions
when handling an ATS translation request.

The IOMMU L2 will set PTE dirty bit when handling an ATS translation with
write permission request, even when PTE RW bit is clear. This may occur by
direct translation (which would cause a PPR) or by prefetch request from
the ATC.

This is observed in practice when the IOMMU L2 modifies a PTE which maps a
pagecache page. The ext4 filesystem driver BUGs when asked to writeback
these (non-modified) pages.

Enable ATS write permission check in the Kaveri IOMMU L2 if BIOS has not.
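The workaround is a guarded read-modify-write, which can be sketched outside
the kernel as follows. This is an illustration under assumptions: struct
fake_iommu is a made-up model of the indirect L2 register space, and
apply_ats_write_check() is a hypothetical name. The register index 0x47,
bit 0 and the family/model range are the ones given in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up model of one IOMMU's indirect L2 register space. */
struct fake_iommu {
	uint32_t l2[256];
};

#define L2_DEBUG_3		0x47u		/* D0F2xF4_x47 */
#define ATS_IGNORE_IW_DIS	(1u << 0)	/* L2_DEBUG_3[AtsIgnoreIWDis] */

/* Family 15h, models 30h-3Fh are the affected parts. */
static bool cpu_is_affected(unsigned int family, unsigned int model)
{
	return family == 0x15 && model >= 0x30 && model <= 0x3f;
}

/*
 * Set the bit only on affected CPUs and only if the BIOS has not
 * already done so. Returns true when the register was changed.
 */
static bool apply_ats_write_check(struct fake_iommu *iommu,
				  unsigned int family, unsigned int model)
{
	uint32_t value;

	if (!cpu_is_affected(family, model))
		return false;

	value = iommu->l2[L2_DEBUG_3];
	if (value & ATS_IGNORE_IW_DIS)
		return false;	/* BIOS already enabled the check */

	/* Preserve every other bit in the register. */
	iommu->l2[L2_DEBUG_3] = value | ATS_IGNORE_IW_DIS;
	return true;
}
```

Checking the bit before writing keeps the function a no-op on systems whose
BIOS follows the corrected guide, so the informational message is only
printed when the kernel really had to intervene.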

Signed-off-by: Jay Cornwall <[hidden email]>
Signed-off-by: Joerg Roedel <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 drivers/iommu/amd_iommu_init.c |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1016,6 +1016,34 @@ static void amd_iommu_erratum_746_workar
 }
 
 /*
+ * Family15h Model 30h-3fh (IOMMU Mishandles ATS Write Permission)
+ * Workaround:
+ *     BIOS should enable ATS write permission check by setting
+ *     L2_DEBUG_3[AtsIgnoreIWDis](D0F2xF4_x47[0]) = 1b
+ */
+static void amd_iommu_ats_write_check_workaround(struct amd_iommu *iommu)
+{
+ u32 value;
+
+ if ((boot_cpu_data.x86 != 0x15) ||
+    (boot_cpu_data.x86_model < 0x30) ||
+    (boot_cpu_data.x86_model > 0x3f))
+ return;
+
+ /* Test L2_DEBUG_3[AtsIgnoreIWDis] == 1 */
+ value = iommu_read_l2(iommu, 0x47);
+
+ if (value & BIT(0))
+ return;
+
+ /* Set L2_DEBUG_3[AtsIgnoreIWDis] = 1 */
+ iommu_write_l2(iommu, 0x47, value | BIT(0));
+
+ pr_info("AMD-Vi: Applying ATS write check workaround for IOMMU at %s\n",
+ dev_name(&iommu->dev->dev));
+}
+
+/*
  * This function clues the initialization function for one IOMMU
  * together and also allocates the command buffer and programs the
  * hardware. It does NOT enable the IOMMU. This is done afterwards.
@@ -1284,6 +1312,7 @@ static int iommu_init_pci(struct amd_iom
  }
 
  amd_iommu_erratum_746_workaround(iommu);
+ amd_iommu_ats_write_check_workaround(iommu);
 
  iommu->iommu_dev = iommu_device_create(&iommu->dev->dev, iommu,
        amd_iommu_groups, "ivhd%d",



[PATCH 4.4 37/74] tracing: Do not have comm filter override event comm field

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (Red Hat) <[hidden email]>

commit e57cbaf0eb006eaa207395f3bfd7ce52c1b5539c upstream.

Commit 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and
process names" added a 'comm' filter that will filter events based on the
current task struct's 'comm'. But this now hides the ability to filter on
events that have a 'comm' field of their own. For example, the
sched_migrate_task trace event has a 'comm' field holding the name of the
task to be migrated.

 echo 'comm == "bash"' > events/sched_migrate_task/filter

will now match sched_migrate_task events in which a task named "bash"
migrates other tasks (in interrupt context), instead of those in which
"bash" itself gets migrated.

This fix requires the following changes.

1) Change the look up order for filter predicates to look at the events
   fields before looking at the generic filters.

2) Instead of basing the filter function off of the "comm" name, have the
   generic "comm" filter have its own filter_type (FILTER_COMM). Test
   against the type instead of the name to assign the filter function.

3) Add a new "COMM" filter that works just like "comm" but will filter based
   on the current task, even if the trace event contains a "comm" field.

Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
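The effect of the new lookup order can be shown with a small userspace
sketch. It is not the tracing code: struct field, find_field() and lookup()
are hypothetical names, the common-fields list is left out for brevity, and
the event field list used in the example is only illustrative.

```c
#include <stddef.h>
#include <string.h>

enum filter_type {
	FILTER_OTHER,
	FILTER_STATIC_STRING,
	FILTER_COMM,
	FILTER_CPU,
};

struct field {
	const char *name;
	enum filter_type type;
};

/* Linear search by name; returns NULL when nothing matches. */
static const struct field *find_field(const struct field *list, size_t n,
				      const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i].name, name) == 0)
			return &list[i];
	return NULL;
}

/* Generic fields: upper-case names always mean "the current task/CPU". */
static const struct field generic_fields[] = {
	{ "CPU",  FILTER_CPU  },
	{ "cpu",  FILTER_CPU  },
	{ "COMM", FILTER_COMM },
	{ "comm", FILTER_COMM },
};

/*
 * New order: the event's own fields win, the generic fields are only
 * a fallback. The predicate function is later picked from the
 * returned type, not from the field name.
 */
static const struct field *lookup(const struct field *event_fields,
				  size_t n_event, const char *name)
{
	const struct field *f = find_field(event_fields, n_event, name);

	if (f)
		return f;
	return find_field(generic_fields,
			  sizeof(generic_fields) / sizeof(generic_fields[0]),
			  name);
}
```

With an event that defines its own "comm" field, looking up "comm" yields
the event's string field, while "COMM" still resolves to the generic
FILTER_COMM entry. Events without such a field keep the old behaviour for
both spellings.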

Fixes: 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names"
Reported-by: Matt Fleming <[hidden email]>
Signed-off-by: Steven Rostedt <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 429fdfc3baf5..925730bc9fc1 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -568,6 +568,8 @@ enum {
  FILTER_DYN_STRING,
  FILTER_PTR_STRING,
  FILTER_TRACE_FN,
+ FILTER_COMM,
+ FILTER_CPU,
 };
 
 extern int trace_event_raw_init(struct trace_event_call *call);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index ab09829d3b97..05ddc0820771 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -97,16 +97,16 @@ trace_find_event_field(struct trace_event_call *call, char *name)
  struct ftrace_event_field *field;
  struct list_head *head;
 
- field = __find_event_field(&ftrace_generic_fields, name);
+ head = trace_get_fields(call);
+ field = __find_event_field(head, name);
  if (field)
  return field;
 
- field = __find_event_field(&ftrace_common_fields, name);
+ field = __find_event_field(&ftrace_generic_fields, name);
  if (field)
  return field;
 
- head = trace_get_fields(call);
- return __find_event_field(head, name);
+ return __find_event_field(&ftrace_common_fields, name);
 }
 
 static int __trace_define_field(struct list_head *head, const char *type,
@@ -171,8 +171,10 @@ static int trace_define_generic_fields(void)
 {
  int ret;
 
- __generic_field(int, cpu, FILTER_OTHER);
- __generic_field(char *, comm, FILTER_PTR_STRING);
+ __generic_field(int, CPU, FILTER_CPU);
+ __generic_field(int, cpu, FILTER_CPU);
+ __generic_field(char *, COMM, FILTER_COMM);
+ __generic_field(char *, comm, FILTER_COMM);
 
  return ret;
 }
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index f93a219b18da..6816302542b2 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1043,13 +1043,14 @@ static int init_pred(struct filter_parse_state *ps,
  return -EINVAL;
  }
 
- if (is_string_field(field)) {
+ if (field->filter_type == FILTER_COMM) {
+ filter_build_regex(pred);
+ fn = filter_pred_comm;
+ pred->regex.field_len = TASK_COMM_LEN;
+ } else if (is_string_field(field)) {
  filter_build_regex(pred);
 
- if (!strcmp(field->name, "comm")) {
- fn = filter_pred_comm;
- pred->regex.field_len = TASK_COMM_LEN;
- } else if (field->filter_type == FILTER_STATIC_STRING) {
+ if (field->filter_type == FILTER_STATIC_STRING) {
  fn = filter_pred_string;
  pred->regex.field_len = field->size;
  } else if (field->filter_type == FILTER_DYN_STRING)
@@ -1072,7 +1073,7 @@ static int init_pred(struct filter_parse_state *ps,
  }
  pred->val = val;
 
- if (!strcmp(field->name, "cpu"))
+ if (field->filter_type == FILTER_CPU)
  fn = filter_pred_cpu;
  else
  fn = select_comparison_fn(pred->op, field->size,



[PATCH 4.4 36/74] ata: ahci: dont mark HotPlugCapable Ports as external/removable

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Manuel Lauss <[hidden email]>

commit dc8b4afc4a04fac8ee55a19b59f2356a25e7e778 upstream.

The HPCP bit is set by BIOSes for on-board SATA ports either because
they think SATA is hotplug capable in general or to allow Windows
to display a "device eject" icon on ports which are routed to an
external connector bracket.

However in Red Hat Bugzilla #1310682, users report that with kernel 4.4,
where this bit test first appeared, a lot of partitions on SATA drives
are now mounted automatically.

This patch should fix Red Hat and a lot of other distros which
unconditionally automount all devices which have the "removable"
bit set.
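The change in the test can be compared side by side in a small sketch. The
bit positions below are written from memory of drivers/ata/ahci.h and should
be treated as assumptions, and the two helper names are hypothetical. Only
the boolean conditions are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; check ahci.h before relying on them. */
#define PORT_CMD_HPCP	(1u << 18)	/* HotPlug Capable Port */
#define PORT_CMD_ESP	(1u << 21)	/* External SATA Port */
#define HOST_CAP_SXS	(1u << 5)	/* HBA supports external SATA */

/* Condition before the patch: HPCP alone marks the port external. */
static bool port_is_external_old(uint32_t port_cmd, uint32_t host_cap)
{
	return (port_cmd & PORT_CMD_HPCP) ||
	       ((port_cmd & PORT_CMD_ESP) && (host_cap & HOST_CAP_SXS));
}

/* Condition after the patch: only a real eSATA port qualifies. */
static bool port_is_external(uint32_t port_cmd, uint32_t host_cap)
{
	return (port_cmd & PORT_CMD_ESP) && (host_cap & HOST_CAP_SXS);
}
```

An internal port whose BIOS sets only HPCP was flagged external by the old
condition and is not flagged by the new one, which is what stops userspace
from treating it as removable.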

Signed-off-by: Manuel Lauss <[hidden email]>
Signed-off-by: Tejun Heo <[hidden email]>
Fixes: 8a3e33cf92c7 ("ata: ahci: find eSATA ports and flag them as removable" changes userspace behavior)
Link: http://lkml.kernel.org/g/56CF35FA.1070500@...
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 drivers/ata/libahci.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -1142,8 +1142,7 @@ static void ahci_port_init(struct device
 
  /* mark esata ports */
  tmp = readl(port_mmio + PORT_CMD);
- if ((tmp & PORT_CMD_HPCP) ||
-    ((tmp & PORT_CMD_ESP) && (hpriv->cap & HOST_CAP_SXS)))
+ if ((tmp & PORT_CMD_ESP) && (hpriv->cap & HOST_CAP_SXS))
  ap->pflags |= ATA_PFLAG_EXTERNAL;
 }
 



[PATCH 4.4 34/74] arm64: vmemmap: use virtual projection of linear region

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ard Biesheuvel <[hidden email]>

commit dfd55ad85e4a7fbaa82df12467515ac3c81e8a3e upstream.

Commit dd006da21646 ("arm64: mm: increase VA range of identity map") made
some changes to the memory mapping code to allow physical memory to reside
at an offset that exceeds the size of the virtual mapping.

However, since the size of the vmemmap area is proportional to the size of
the VA area, but it is populated relative to the physical space, we may
end up with the struct page array being mapped outside of the vmemmap
region. For instance, on my Seattle A0 box, I can see the following output
in the dmesg log.

   vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
             0xffffffbfc0000000 - 0xffffffbfd0000000   (   256 MB actual)

We can fix this by deciding that the vmemmap region is not a projection of
the physical space, but of the virtual space above PAGE_OFFSET, i.e., the
linear region. This way, we are guaranteed that the vmemmap region is of
sufficient size, and we can even reduce the size by half.
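The arithmetic can be checked with a short userspace sketch. All constants
are assumptions picked for illustration: 39-bit VA, 4 KiB pages, a 64-byte
struct page and a 1 GiB PUD_SIZE. The memstart_addr of 0x8000000000 used in
the note below is likewise an assumption, chosen because it reproduces the
8 GB offset visible in the log above. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed configuration, not taken from the patch. */
#define VA_BITS			39
#define PAGE_SHIFT		12
#define STRUCT_PAGE_SIZE	64ULL
#define PUD_SIZE		(1ULL << 30)

#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Old definition: one struct page per page of the whole VA space. */
static uint64_t vmemmap_size_old(void)
{
	return ALIGN_UP((1ULL << (VA_BITS - PAGE_SHIFT)) * STRUCT_PAGE_SIZE,
			PUD_SIZE);
}

/* New definition: the linear region is half of the VA space. */
static uint64_t vmemmap_size_new(void)
{
	return ALIGN_UP((1ULL << (VA_BITS - PAGE_SHIFT - 1)) * STRUCT_PAGE_SIZE,
			PUD_SIZE);
}

/* Old scheme: byte offset of the first populated struct page. */
static uint64_t old_array_offset(uint64_t memstart_addr)
{
	return (memstart_addr >> PAGE_SHIFT) * STRUCT_PAGE_SIZE;
}

/* New scheme: array index relative to the start of memory. */
static uint64_t vmemmap_index(uint64_t phys, uint64_t memstart_addr)
{
	return (phys >> PAGE_SHIFT) - (memstart_addr >> PAGE_SHIFT);
}
```

Under these assumptions the old size is 8 GiB and the new one 4 GiB. With
memory starting at 0x8000000000, old_array_offset() is also 8 GiB, so the
populated part of the array begins exactly where the old region ends, while
vmemmap_index() of the first page is 0 and the array starts at
VMEMMAP_START.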

Signed-off-by: Ard Biesheuvel <[hidden email]>
Signed-off-by: Will Deacon <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 arch/arm64/include/asm/pgtable.h |    7 ++++---
 arch/arm64/mm/init.c             |    4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -34,13 +34,13 @@
 /*
  * VMALLOC and SPARSEMEM_VMEMMAP ranges.
  *
- * VMEMAP_SIZE: allows the whole VA space to be covered by a struct page array
+ * VMEMAP_SIZE: allows the whole linear region to be covered by a struct page array
  * (rounded up to PUD_SIZE).
  * VMALLOC_START: beginning of the kernel VA space
  * VMALLOC_END: extends to the available space below vmmemmap, PCI I/O space,
  * fixed mappings and modules
  */
-#define VMEMMAP_SIZE ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
+#define VMEMMAP_SIZE ALIGN((1UL << (VA_BITS - PAGE_SHIFT - 1)) * sizeof(struct page), PUD_SIZE)
 
 #ifndef CONFIG_KASAN
 #define VMALLOC_START (VA_START)
@@ -51,7 +51,8 @@
 
 #define VMALLOC_END (PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
 
-#define vmemmap ((struct page *)(VMALLOC_END + SZ_64K))
+#define VMEMMAP_START (VMALLOC_END + SZ_64K)
+#define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
 
 #define FIRST_USER_ADDRESS 0UL
 
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -319,8 +319,8 @@ void __init mem_init(void)
 #endif
   MLG(VMALLOC_START, VMALLOC_END),
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-  MLG((unsigned long)vmemmap,
-      (unsigned long)vmemmap + VMEMMAP_SIZE),
+  MLG(VMEMMAP_START,
+      VMEMMAP_START + VMEMMAP_SIZE),
   MLM((unsigned long)virt_to_page(PAGE_OFFSET),
       (unsigned long)virt_to_page(high_memory)),
 #endif



[PATCH 4.4 35/74] PM / sleep / x86: Fix crash on graph trace through x86 suspend

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Todd E Brandt <[hidden email]>

commit 92f9e179a702a6adbc11e2fedc76ecd6ffc9e3f7 upstream.

Pause/unpause graph tracing around do_suspend_lowlevel as it has
inconsistent call/return info after it jumps to the wakeup vector.
The graph trace buffer will otherwise become misaligned and
may eventually crash and hang on suspend.

To reproduce the issue and test the fix:
Run a function_graph trace over suspend/resume and set the graph
function to suspend_devices_and_enter. This consistently hangs the
system without this fix.
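The pause/unpause pattern can be modelled in a few lines of userspace C.
This is a toy, not ftrace: struct tracer and every function below are
hypothetical, and the pause counter is only an assumption about how such a
tracer might decide whether to record an entry.

```c
/* Toy tracer: entries are recorded only while pause_depth is zero. */
struct tracer {
	int pause_depth;
	int recorded;
};

static void pause_tracing(struct tracer *t)
{
	t->pause_depth++;
}

static void unpause_tracing(struct tracer *t)
{
	t->pause_depth--;
}

/* Hook run on function entry. */
static void trace_entry(struct tracer *t)
{
	if (t->pause_depth == 0)
		t->recorded++;
}

/*
 * Stand-in for a low-level suspend routine: it is entered here but
 * execution resumes elsewhere, so its entry would never be paired
 * with a matching return in the trace.
 */
static void lowlevel_suspend(struct tracer *t)
{
	trace_entry(t);
}

/* The fix: bracket the unbalanced call so nothing is recorded for it. */
static void suspend_wrapped(struct tracer *t)
{
	pause_tracing(t);
	lowlevel_suspend(t);
	unpause_tracing(t);
}
```

Using a depth counter rather than a flag lets such sections nest, and
tracing resumes normally once the depth drops back to zero.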

Signed-off-by: Todd Brandt <[hidden email]>
Signed-off-by: Rafael J. Wysocki <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 arch/x86/kernel/acpi/sleep.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -16,6 +16,7 @@
 #include <asm/cacheflush.h>
 #include <asm/realmode.h>
 
+#include <linux/ftrace.h>
 #include "../../realmode/rm/wakeup.h"
 #include "sleep.h"
 
@@ -107,7 +108,13 @@ int x86_acpi_suspend_lowlevel(void)
        saved_magic = 0x123456789abcdef0L;
 #endif /* CONFIG_64BIT */
 
+ /*
+ * Pause/unpause graph tracing around do_suspend_lowlevel as it has
+ * inconsistent call/return info after it jumps to the wakeup vector.
+ */
+ pause_graph_tracing();
  do_suspend_lowlevel();
+ unpause_graph_tracing();
  return 0;
 }
 



[PATCH 4.4 33/74] Adding Intel Lewisburg device IDs for SATA

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexandra Yates <[hidden email]>

commit f5bdd66c705484b4bc77eb914be15c1b7881fae7 upstream.

This patch complements the list of device IDs previously
added for Lewisburg SATA.

Signed-off-by: Alexandra Yates <[hidden email]>
Signed-off-by: Tejun Heo <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 drivers/ata/ahci.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -367,15 +367,21 @@ static const struct pci_device_id ahci_p
  { PCI_VDEVICE(INTEL, 0xa107), board_ahci }, /* Sunrise Point-H RAID */
  { PCI_VDEVICE(INTEL, 0xa10f), board_ahci }, /* Sunrise Point-H RAID */
  { PCI_VDEVICE(INTEL, 0x2822), board_ahci }, /* Lewisburg RAID*/
+ { PCI_VDEVICE(INTEL, 0x2823), board_ahci }, /* Lewisburg AHCI*/
  { PCI_VDEVICE(INTEL, 0x2826), board_ahci }, /* Lewisburg RAID*/
+ { PCI_VDEVICE(INTEL, 0x2827), board_ahci }, /* Lewisburg RAID*/
  { PCI_VDEVICE(INTEL, 0xa182), board_ahci }, /* Lewisburg AHCI*/
  { PCI_VDEVICE(INTEL, 0xa184), board_ahci }, /* Lewisburg RAID*/
  { PCI_VDEVICE(INTEL, 0xa186), board_ahci }, /* Lewisburg RAID*/
  { PCI_VDEVICE(INTEL, 0xa18e), board_ahci }, /* Lewisburg RAID*/
+ { PCI_VDEVICE(INTEL, 0xa1d2), board_ahci }, /* Lewisburg RAID*/
+ { PCI_VDEVICE(INTEL, 0xa1d6), board_ahci }, /* Lewisburg RAID*/
  { PCI_VDEVICE(INTEL, 0xa202), board_ahci }, /* Lewisburg AHCI*/
  { PCI_VDEVICE(INTEL, 0xa204), board_ahci }, /* Lewisburg RAID*/
  { PCI_VDEVICE(INTEL, 0xa206), board_ahci }, /* Lewisburg RAID*/
  { PCI_VDEVICE(INTEL, 0xa20e), board_ahci }, /* Lewisburg RAID*/
+ { PCI_VDEVICE(INTEL, 0xa252), board_ahci }, /* Lewisburg RAID*/
+ { PCI_VDEVICE(INTEL, 0xa256), board_ahci }, /* Lewisburg RAID*/
 
  /* JMicron 360/1/3/5/6, match class to avoid IDE function */
  { PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,



[PATCH 4.4 31/74] block: bio: introduce helpers to get the 1st and last bvec

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ming Lei <[hidden email]>

commit 7bcd79ac50d9d83350a835bdb91c04ac9e098412 upstream.

The bio passed to bio_will_gap() may be fast cloned from an upper
layer (dm, md, bcache, fs, ...), or may come from bio splitting in the
block core.

Unfortunately bio_will_gap() just figures out the last bvec via
'bi_io_vec[prev->bi_vcnt - 1]' directly, which is wrong for such bios:
their extent is described by bi_iter rather than by bi_vcnt.

This patch introduces two helpers for getting the first and last
bvec of one bio for fixing the issue.
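How the last bvec is derived from the iterator can be sketched in userspace
with simplified types. The structs and function names below are
hypothetical models, not the block layer's. The sketch also omits the
single-segment shortcut of the real helper, and it assumes a cloned bio
shares the parent's vector table and is bounded only by its iterator.

```c
#include <assert.h>

struct bvec {
	unsigned int len;
	unsigned int offset;
};

struct iter {
	unsigned int size;	/* bytes left in this bio */
	unsigned int idx;	/* current index into the vector table */
	unsigned int bvec_done;	/* bytes consumed in the current bvec */
};

struct bio {
	const struct bvec *vec;
	unsigned int vcnt;	/* only meaningful when not cloned */
	struct iter it;
	int cloned;
};

/* Advance the iterator by 'bytes', stepping over whole bvecs. */
static void iter_advance(const struct bvec *vec, struct iter *it,
			 unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;

	while (bytes) {
		unsigned int left = vec[it->idx].len - it->bvec_done;
		unsigned int step = bytes < left ? bytes : left;

		bytes -= step;
		it->bvec_done += step;
		if (it->bvec_done == vec[it->idx].len) {
			it->bvec_done = 0;
			it->idx++;
		}
	}
}

/* Return the last bvec that this bio actually covers. */
static struct bvec last_bvec(const struct bio *bio)
{
	struct iter it = bio->it;
	struct bvec bv;
	unsigned int idx;

	if (!bio->cloned)
		return bio->vec[bio->vcnt - 1];

	/* Walk to the end of the bio's own range. */
	iter_advance(bio->vec, &it, it.size);

	/* bvec_done == 0 means we stopped on a bvec boundary. */
	idx = it.bvec_done ? it.idx : it.idx - 1;
	bv = bio->vec[idx];

	/* Ending inside a bvec: only bvec_done bytes belong to the bio. */
	if (it.bvec_done)
		bv.len = it.bvec_done;
	return bv;
}
```

For a clone covering 6144 bytes of a table of 4096-byte bvecs, last_bvec()
yields the second entry trimmed to 2048 bytes, whereas indexing by vcnt
would look at an entry unrelated to the clone's range.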

Reported-by: Sagi Grimberg <[hidden email]>
Reviewed-by: Sagi Grimberg <[hidden email]>
Reviewed-by: Christoph Hellwig <[hidden email]>
Signed-off-by: Ming Lei <[hidden email]>
Signed-off-by: Jens Axboe <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 include/linux/bio.h |   37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -310,6 +310,43 @@ static inline void bio_clear_flag(struct
  bio->bi_flags &= ~(1U << bit);
 }
 
+static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
+{
+ *bv = bio_iovec(bio);
+}
+
+static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
+{
+ struct bvec_iter iter = bio->bi_iter;
+ int idx;
+
+ if (!bio_flagged(bio, BIO_CLONED)) {
+ *bv = bio->bi_io_vec[bio->bi_vcnt - 1];
+ return;
+ }
+
+ if (unlikely(!bio_multiple_segments(bio))) {
+ *bv = bio_iovec(bio);
+ return;
+ }
+
+ bio_advance_iter(bio, &iter, iter.bi_size);
+
+ if (!iter.bi_bvec_done)
+ idx = iter.bi_idx - 1;
+ else /* in the middle of bvec */
+ idx = iter.bi_idx;
+
+ *bv = bio->bi_io_vec[idx];
+
+ /*
+ * iter.bi_bvec_done records actual length of the last bvec
+ * if this bio ends in the middle of one io vector
+ */
+ if (iter.bi_bvec_done)
+ bv->bv_len = iter.bi_bvec_done;
+}
+
 enum bip_flags {
  BIP_BLOCK_INTEGRITY = 1 << 0, /* block layer owns integrity data */
  BIP_MAPPED_INTEGRITY = 1 << 1, /* ref tag has been remapped */



[PATCH 4.4 02/74] drivers: sh: Restore legacy clock domain on SuperH platforms

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Geert Uytterhoeven <[hidden email]>

commit 0378ba4899d5fbd8494ed6580cbc81d7b44dbac6 upstream.

CONFIG_ARCH_SHMOBILE is not only enabled for Renesas ARM platforms
(which are DT based and multi-platform), but also on a select set of
Renesas SuperH platforms (SH7722/SH7723/SH7724/SH7343/SH7366). Hence
since commit 0ba58de231066e47 ("drivers: sh: Get rid of
CONFIG_ARCH_SHMOBILE_MULTI"), the legacy clock domain is no longer
installed on these SuperH platforms, and module clocks may not be
enabled when needed, leading to driver failures.

To fix this, add an additional check for CONFIG_OF.

Fixes: 0ba58de231066e47 ("drivers: sh: Get rid of CONFIG_ARCH_SHMOBILE_MULTI").
Signed-off-by: Geert Uytterhoeven <[hidden email]>
Signed-off-by: Simon Horman <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 drivers/sh/pm_runtime.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/sh/pm_runtime.c
+++ b/drivers/sh/pm_runtime.c
@@ -34,7 +34,7 @@ static struct pm_clk_notifier_block plat
 
 static int __init sh_pm_runtime_init(void)
 {
- if (IS_ENABLED(CONFIG_ARCH_SHMOBILE)) {
+ if (IS_ENABLED(CONFIG_OF) && IS_ENABLED(CONFIG_ARCH_SHMOBILE)) {
  if (!of_find_compatible_node(NULL, NULL,
      "renesas,cpg-mstp-clocks"))
  return 0;



[PATCH 4.4 32/74] writeback: flush inode cgroup wb switches instead of pinning super_block

Greg KH-4
In reply to this post by Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <[hidden email]>

commit a1a0e23e49037c23ea84bc8cc146a03584d13577 upstream.

If cgroup writeback is in use, inodes can be scheduled for
asynchronous wb switching.  Before 5ff8eaac1636 ("writeback: keep
superblock pinned during cgroup writeback association switches"), this
could race with umount leading to super_block being destroyed while
inodes are pinned for wb switching.  5ff8eaac1636 fixed it by bumping
s_active while wb switches are in flight; however, this allowed
in-flight wb switches to make umounts asynchronous when the userland
expected synchronous behavior - e.g. fsck immediately following umount may
fail because the device is still busy.

This patch removes the problematic super_block pinning and instead
makes generic_shutdown_super() flush in-flight wb switches.  wb
switches are now executed on a dedicated isw_wq so that they can be
flushed and isw_nr_in_flight keeps track of the number of in-flight wb
switches so that flushing can be avoided in most cases.
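
The in-flight counter described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: every
demo_* name is hypothetical, C11 atomics stand in for the kernel's
atomic_t, and a plain counter stands in for the expensive
synchronize_rcu() + flush_workqueue() pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for isw_nr_in_flight. */
static atomic_int demo_nr_in_flight;
/* Counts how often the (simulated) expensive flush path ran. */
static int demo_slow_flushes;

/* Called when a wb switch is scheduled. */
static void demo_switch_begin(void)
{
	atomic_fetch_add(&demo_nr_in_flight, 1);
}

/* Called at the very end of the switch work function. */
static void demo_switch_end(void)
{
	int prev = atomic_fetch_sub(&demo_nr_in_flight, 1);

	/* A switch must have been started before it can finish. */
	assert(prev > 0);
}

/*
 * Models the umount-side check: take the slow path only when at least
 * one switch is in flight. Returns true if the slow path was taken.
 */
static bool demo_umount_flush(void)
{
	if (atomic_load(&demo_nr_in_flight) == 0)
		return false;

	/* Stands in for synchronize_rcu() followed by flush_workqueue(). */
	demo_slow_flushes++;
	return true;
}
```

With no switch in flight, demo_umount_flush() returns false and skips
the slow path, which is the common case the patch optimizes for.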

v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
    check in inode_switch_wbs() as Jan and Al suggested.

Signed-off-by: Tejun Heo <[hidden email]>
Reported-by: Tahsin Erdogan <[hidden email]>
Cc: Jan Kara <[hidden email]>
Cc: Al Viro <[hidden email]>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@...
Fixes: 5ff8eaac1636 ("writeback: keep superblock pinned during cgroup writeback association switches")
Reviewed-by: Jan Kara <[hidden email]>
Tested-by: Tahsin Erdogan <[hidden email]>
Signed-off-by: Jens Axboe <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 fs/fs-writeback.c         |   54 ++++++++++++++++++++++++++++++++++------------
 fs/super.c                |    1
 include/linux/writeback.h |    5 ++++
 3 files changed, 47 insertions(+), 13 deletions(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -223,6 +223,9 @@ static void wb_wait_for_completion(struc
 #define WB_FRN_HIST_MAX_SLOTS (WB_FRN_HIST_THR_SLOTS / 2 + 1)
  /* one round can affect upto 5 slots */
 
+static atomic_t isw_nr_in_flight = ATOMIC_INIT(0);
+static struct workqueue_struct *isw_wq;
+
 void __inode_attach_wb(struct inode *inode, struct page *page)
 {
  struct backing_dev_info *bdi = inode_to_bdi(inode);
@@ -317,7 +320,6 @@ static void inode_switch_wbs_work_fn(str
  struct inode_switch_wbs_context *isw =
  container_of(work, struct inode_switch_wbs_context, work);
  struct inode *inode = isw->inode;
- struct super_block *sb = inode->i_sb;
  struct address_space *mapping = inode->i_mapping;
  struct bdi_writeback *old_wb = inode->i_wb;
  struct bdi_writeback *new_wb = isw->new_wb;
@@ -424,8 +426,9 @@ skip_switch:
  wb_put(new_wb);
 
  iput(inode);
- deactivate_super(sb);
  kfree(isw);
+
+ atomic_dec(&isw_nr_in_flight);
 }
 
 static void inode_switch_wbs_rcu_fn(struct rcu_head *rcu_head)
@@ -435,7 +438,7 @@ static void inode_switch_wbs_rcu_fn(stru
 
  /* needs to grab bh-unsafe locks, bounce to work item */
  INIT_WORK(&isw->work, inode_switch_wbs_work_fn);
- schedule_work(&isw->work);
+ queue_work(isw_wq, &isw->work);
 }
 
 /**
@@ -471,20 +474,20 @@ static void inode_switch_wbs(struct inod
 
  /* while holding I_WB_SWITCH, no one else can update the association */
  spin_lock(&inode->i_lock);
-
- if (inode->i_state & (I_WB_SWITCH | I_FREEING) ||
-    inode_to_wb(inode) == isw->new_wb)
- goto out_unlock;
-
- if (!atomic_inc_not_zero(&inode->i_sb->s_active))
- goto out_unlock;
-
+ if (!(inode->i_sb->s_flags & MS_ACTIVE) ||
+    inode->i_state & (I_WB_SWITCH | I_FREEING) ||
+    inode_to_wb(inode) == isw->new_wb) {
+ spin_unlock(&inode->i_lock);
+ goto out_free;
+ }
  inode->i_state |= I_WB_SWITCH;
  spin_unlock(&inode->i_lock);
 
  ihold(inode);
  isw->inode = inode;
 
+ atomic_inc(&isw_nr_in_flight);
+
  /*
  * In addition to synchronizing among switchers, I_WB_SWITCH tells
  * the RCU protected stat update paths to grab the mapping's
@@ -494,8 +497,6 @@ static void inode_switch_wbs(struct inod
  call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
  return;
 
-out_unlock:
- spin_unlock(&inode->i_lock);
 out_free:
  if (isw->new_wb)
  wb_put(isw->new_wb);
@@ -849,6 +850,33 @@ restart:
  wb_put(last_wb);
 }
 
+/**
+ * cgroup_writeback_umount - flush inode wb switches for umount
+ *
+ * This function is called when a super_block is about to be destroyed and
+ * flushes in-flight inode wb switches.  An inode wb switch goes through
+ * RCU and then workqueue, so the two need to be flushed in order to ensure
+ * that all previously scheduled switches are finished.  As wb switches are
+ * rare occurrences and synchronize_rcu() can take a while, perform
+ * flushing iff wb switches are in flight.
+ */
+void cgroup_writeback_umount(void)
+{
+ if (atomic_read(&isw_nr_in_flight)) {
+ synchronize_rcu();
+ flush_workqueue(isw_wq);
+ }
+}
+
+static int __init cgroup_writeback_init(void)
+{
+ isw_wq = alloc_workqueue("inode_switch_wbs", 0, 0);
+ if (!isw_wq)
+ return -ENOMEM;
+ return 0;
+}
+fs_initcall(cgroup_writeback_init);
+
 #else /* CONFIG_CGROUP_WRITEBACK */
 
 static struct bdi_writeback *
--- a/fs/super.c
+++ b/fs/super.c
@@ -415,6 +415,7 @@ void generic_shutdown_super(struct super
  sb->s_flags &= ~MS_ACTIVE;
 
  fsnotify_unmount_inodes(sb);
+ cgroup_writeback_umount();
 
  evict_inodes(sb);
 
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -198,6 +198,7 @@ void wbc_attach_and_unlock_inode(struct
 void wbc_detach_inode(struct writeback_control *wbc);
 void wbc_account_io(struct writeback_control *wbc, struct page *page,
     size_t bytes);
+void cgroup_writeback_umount(void);
 
 /**
  * inode_attach_wb - associate an inode with its wb
@@ -301,6 +302,10 @@ static inline void wbc_account_io(struct
 {
 }
 
+static inline void cgroup_writeback_umount(void)
+{
+}
+
 #endif /* CONFIG_CGROUP_WRITEBACK */
 
 /*



[PATCH 4.4 30/74] libata: Align ata_devices id on a cacheline

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Harvey Hunt <[hidden email]>

commit 4ee34ea3a12396f35b26d90a094c75db95080baa upstream.

The id buffer in ata_device is a DMA target, but it isn't explicitly
cacheline aligned. Due to this, adjacent fields can be overwritten with
stale data from memory on non coherent architectures. As a result, the
kernel is sometimes unable to communicate with an ATA device.

Fix this by ensuring that the id buffer is cacheline aligned.
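
The effect of the alignment attribute can be checked in user space with
a layout sketch like the one below. It is an illustration only:
struct demo_device is hypothetical, the 64-byte line size is an
assumption, the array sizes are assumed to mirror ATA_ID_WORDS and
SATA_PMP_GSCR_DWORDS, and the GCC/Clang aligned attribute stands in for
____cacheline_aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE   64   /* assumed cache line size */
#define DEMO_ID_WORDS    256  /* assumed to mirror ATA_ID_WORDS */
#define DEMO_GSCR_DWORDS 128  /* assumed to mirror SATA_PMP_GSCR_DWORDS */

/* Hypothetical miniature of struct ata_device. */
struct demo_device {
	uint8_t flags;		/* small field ahead of the DMA buffer */
	union {
		uint16_t id[DEMO_ID_WORDS];
		uint32_t gscr[DEMO_GSCR_DWORDS];
	} __attribute__((aligned(DEMO_CACHELINE)));
	uint8_t devslp_timing[8];	/* field following the buffer */
};

/*
 * Verifies that the buffer starts on a cache line boundary and that the
 * following field does not share a line with it; returns the offset of
 * the buffer.
 */
static size_t demo_check_layout(void)
{
	size_t start = offsetof(struct demo_device, id);
	size_t next = offsetof(struct demo_device, devslp_timing);

	assert(start % DEMO_CACHELINE == 0);
	assert(next % DEMO_CACHELINE == 0);
	return start;
}
```

Without the attribute, the buffer would begin right after the preceding
field (subject only to the natural alignment of its members), so a
cache line could hold both buffer data and a neighbouring field.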

This issue is similar to that fixed by Commit 84bda12af31f
("libata: align ap->sector_buf").

Signed-off-by: Harvey Hunt <[hidden email]>
Cc: [hidden email]
Signed-off-by: Tejun Heo <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 include/linux/libata.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -718,7 +718,7 @@ struct ata_device {
  union {
  u16 id[ATA_ID_WORDS]; /* IDENTIFY xxx DEVICE data */
  u32 gscr[SATA_PMP_GSCR_DWORDS]; /* PMP GSCR block */
- };
+ } ____cacheline_aligned;
 
  /* DEVSLP Timing Variables from Identify Device Data Log */
  u8 devslp_timing[ATA_LOG_DEVSLP_SIZE];



[PATCH 4.4 28/74] drm/amdgpu: return from atombios_dp_get_dpcd only when error

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Arindam Nath <[hidden email]>

commit 0b39c531cfa12dad54eac238c2e303b994df1ef7 upstream.

In amdgpu_connector_hotplug(), we need to start DP link
training only after we have received DPCD. The function
amdgpu_atombios_dp_get_dpcd() returns non-zero value only
when an error condition is met, otherwise returns zero.
So in case the function encounters an error, we need to
skip rest of the code and return from amdgpu_connector_hotplug()
immediately. Only when we succeed in reading the DPCD
should we carry on with turning on the monitor.
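
The return-value convention described above (zero on success, non-zero
on error) can be sketched as follows. The demo_* functions are
hypothetical stand-ins, not the driver's code; -EINVAL is used as the
error value purely as an assumption for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the DPCD read: 0 on success, negative on error. */
static int demo_get_dpcd(bool sink_readable)
{
	return sink_readable ? 0 : -EINVAL;
}

/*
 * Models the corrected hotplug path: bail out only when the read
 * failed. Returns true if link training would be started.
 */
static bool demo_hotplug_trains_link(bool sink_readable)
{
	int err = demo_get_dpcd(sink_readable);

	/* The convention in this sketch: never a positive value. */
	assert(err <= 0);

	if (err)
		return false;	/* no DPCD, so skip link training */

	return true;		/* DPCD available, carry on */
}
```

The pre-patch code negated the check, so it returned early exactly when
the read had succeeded.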

Signed-off-by: Arindam Nath <[hidden email]>
Signed-off-by: Alex Deucher <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
@@ -77,7 +77,7 @@ void amdgpu_connector_hotplug(struct drm
  } else if (amdgpu_atombios_dp_needs_link_train(amdgpu_connector)) {
  /* Don't try to start link training before we
  * have the dpcd */
- if (!amdgpu_atombios_dp_get_dpcd(amdgpu_connector))
+ if (amdgpu_atombios_dp_get_dpcd(amdgpu_connector))
  return;
 
  /* set it to OFF so that drm_helper_connector_dpms()



[PATCH 4.4 05/74] btrfs: async-thread: Fix a use-after-free error for trace

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Qu Wenruo <[hidden email]>

commit 0a95b851370b84a4b9d92ee6d1fa0926901d0454 upstream.

The parameter of trace_btrfs_work_queued() can be freed by its workqueue,
so no one should use that pointer after queue_work().

Fix the use-after-free bug by moving the trace line before queue_work().
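
The ordering rule can be illustrated with a small user-space sketch.
All demo_* names are hypothetical; the mock queue function frees the
item immediately to model the worst case, in which the worker finishes
before the submitter's next statement runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical work item. */
struct demo_work {
	int id;
};

static int demo_last_traced_id = -1;

/* Stand-in for the tracepoint: reads from the work item. */
static void demo_trace(const struct demo_work *w)
{
	assert(w != NULL);
	demo_last_traced_id = w->id;
}

/*
 * Stand-in for queue_work(). In the worst case the worker runs and
 * frees the item before the caller continues, which is modelled here
 * by freeing it right away.
 */
static void demo_queue_work(struct demo_work *w)
{
	free(w);
}

/* Correct order: trace first, then hand the item over. */
static void demo_submit(int id)
{
	struct demo_work *w = malloc(sizeof(*w));

	if (!w)
		return;
	w->id = id;

	demo_trace(w);		/* last access by the submitter */
	demo_queue_work(w);	/* ownership passes; w is off limits now */
}
```

Swapping the last two calls in demo_submit() would make demo_trace()
read freed memory, which is the bug pattern the patch removes.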

Reported-by: Dave Jones <[hidden email]>
Signed-off-by: Qu Wenruo <[hidden email]>
Reviewed-by: David Sterba <[hidden email]>
Signed-off-by: Chris Mason <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 fs/btrfs/async-thread.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -328,8 +328,8 @@ static inline void __btrfs_queue_work(st
  list_add_tail(&work->ordered_list, &wq->ordered_list);
  spin_unlock_irqrestore(&wq->list_lock, flags);
  }
- queue_work(wq->normal_wq, &work->normal_work);
  trace_btrfs_work_queued(work);
+ queue_work(wq->normal_wq, &work->normal_work);
 }
 
 void btrfs_queue_work(struct btrfs_workqueue *wq,



[PATCH 4.4 29/74] libata: fix HDIO_GET_32BIT ioctl

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <[hidden email]>

commit 287e6611ab1eac76c2c5ebf6e345e04c80ca9c61 upstream.

As reported by Soohoon Lee, the HDIO_GET_32BIT ioctl does not
work correctly in compat mode with libata.

I have investigated the issue further and found multiple problems
that all appeared with the same commit that originally introduced
HDIO_GET_32BIT handling in libata back in linux-2.6.8 and presumably
also linux-2.4, as the code uses "copy_to_user(arg, &val, 1)" to copy
a 'long' variable containing either 0 or 1 to user space.

The problems with this are:

* On big-endian machines, this will always write a zero because it
  stores the wrong byte into user space.

* In compat mode, the upper three bytes of the variable are updated
  by the compat_hdio_ioctl() function, but they now contain
  uninitialized stack data.

* The hdparm tool calling this ioctl uses a 'static long' variable
  to store the result. This means at least the upper bytes are
  initialized to zero, but calling another ioctl like HDIO_GET_MULTCOUNT
  would fill them with data that remains stale when the low byte
  is overwritten. Fortunately libata doesn't implement any of the
  affected ioctl commands, so this would only happen when we query
  both an IDE and an ATA device in the same command such as
  "hdparm -N -c /dev/hda /dev/sda"

* The libata code for unknown reasons started using ATA_IOC_GET_IO32
  and ATA_IOC_SET_IO32 as aliases for HDIO_GET_32BIT and HDIO_SET_32BIT,
  while the ioctl commands that were added later use the normal
  HDIO_* names. This is harmless but rather confusing.

This addresses all four issues by changing the code to use put_user()
on an 'unsigned long' variable in HDIO_GET_32BIT, like the IDE subsystem
does, and by clarifying the names of the ioctl commands.
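
The difference between copying a single byte and storing the whole
value can be reproduced in user space. This is a sketch only: the
demo_* helpers are hypothetical, memcpy() stands in for copy_to_user(),
and the "stale" argument models whatever the caller's variable held
before the ioctl.

```c
#include <assert.h>
#include <string.h>

/*
 * Models the old behaviour: only the first byte (in memory order) of
 * the kernel-side value is copied into the caller's variable. The
 * remaining bytes keep their stale content, and on big-endian hosts the
 * copied byte is the most significant one.
 */
static unsigned long demo_copy_one_byte(unsigned long val, unsigned long stale)
{
	unsigned long user = stale;

	/* The demonstration needs a type wider than the copied byte. */
	assert(sizeof(user) > 1);

	memcpy(&user, &val, 1);
	return user;
}

/* Models the fixed behaviour: the whole variable is overwritten. */
static unsigned long demo_store_whole(unsigned long val, unsigned long stale)
{
	unsigned long user = stale;

	memcpy(&user, &val, sizeof(user));
	return user;
}
```

For val = 1 and a stale value with non-zero upper bytes, the one-byte
copy never yields 1 on either byte order, while the whole-value store
always does.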

Signed-off-by: Arnd Bergmann <[hidden email]>
Reported-by: Soohoon Lee <[hidden email]>
Tested-by: Soohoon Lee <[hidden email]>
Signed-off-by: Tejun Heo <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 drivers/ata/libata-scsi.c |   11 +++++------
 include/linux/ata.h       |    4 ++--
 2 files changed, 7 insertions(+), 8 deletions(-)

--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -675,19 +675,18 @@ static int ata_ioc32(struct ata_port *ap
 int ata_sas_scsi_ioctl(struct ata_port *ap, struct scsi_device *scsidev,
      int cmd, void __user *arg)
 {
- int val = -EINVAL, rc = -EINVAL;
+ unsigned long val;
+ int rc = -EINVAL;
  unsigned long flags;
 
  switch (cmd) {
- case ATA_IOC_GET_IO32:
+ case HDIO_GET_32BIT:
  spin_lock_irqsave(ap->lock, flags);
  val = ata_ioc32(ap);
  spin_unlock_irqrestore(ap->lock, flags);
- if (copy_to_user(arg, &val, 1))
- return -EFAULT;
- return 0;
+ return put_user(val, (unsigned long __user *)arg);
 
- case ATA_IOC_SET_IO32:
+ case HDIO_SET_32BIT:
  val = (unsigned long) arg;
  rc = 0;
  spin_lock_irqsave(ap->lock, flags);
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -487,8 +487,8 @@ enum ata_tf_protocols {
 };
 
 enum ata_ioctls {
- ATA_IOC_GET_IO32 = 0x309,
- ATA_IOC_SET_IO32 = 0x324,
+ ATA_IOC_GET_IO32 = 0x309, /* HDIO_GET_32BIT */
+ ATA_IOC_SET_IO32 = 0x324, /* HDIO_SET_32BIT */
 };
 
 /* core structures */



[PATCH 4.4 18/74] arm/arm64: KVM: Fix ioctl error handling

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michael S. Tsirkin <[hidden email]>

commit 4cad67fca3fc952d6f2ed9e799621f07666a560f upstream.

Calling return copy_to_user(...) in an ioctl will not
do the right thing if there's a pagefault:
copy_to_user returns the number of bytes not copied
in this case.

Fix up kvm to do
        return copy_to_user(...) ? -EFAULT : 0;

everywhere.
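
Why the bare return value is wrong can be shown with a user-space mock.
Everything named demo_* is hypothetical; the mock only imitates the one
property that matters here, namely that the copy routine returns the
number of bytes it could NOT copy rather than an error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Mock of a copy routine with copy_to_user()-like semantics: copies at
 * most 'writable' bytes and returns the number of bytes left uncopied.
 */
static unsigned long demo_copy_to_user(void *dst, const void *src,
				       size_t n, size_t writable)
{
	size_t done = n < writable ? n : writable;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, done);
	return n - done;
}

/* Old pattern: leaks a positive byte count to the ioctl caller. */
static int demo_get_reg_buggy(void *uaddr, size_t writable)
{
	uint64_t val = 42;

	return (int)demo_copy_to_user(uaddr, &val, sizeof(val), writable);
}

/* Fixed pattern: any shortfall becomes -EFAULT. */
static int demo_get_reg_fixed(void *uaddr, size_t writable)
{
	uint64_t val = 42;

	return demo_copy_to_user(uaddr, &val, sizeof(val), writable) ?
		-EFAULT : 0;
}
```

When nothing can be written, the buggy variant returns 8 (a positive
value that callers do not treat as an error), whereas the fixed variant
returns -EFAULT.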

Acked-by: Christoffer Dall <[hidden email]>
Signed-off-by: Michael S. Tsirkin <[hidden email]>
Signed-off-by: Marc Zyngier <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 arch/arm/kvm/guest.c   |    2 +-
 arch/arm64/kvm/guest.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -155,7 +155,7 @@ static int get_timer_reg(struct kvm_vcpu
  u64 val;
 
  val = kvm_arm_timer_get_reg(vcpu, reg->id);
- return copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id));
+ return copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)) ? -EFAULT : 0;
 }
 
 static unsigned long num_core_regs(void)
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -186,7 +186,7 @@ static int get_timer_reg(struct kvm_vcpu
  u64 val;
 
  val = kvm_arm_timer_get_reg(vcpu, reg->id);
- return copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id));
+ return copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)) ? -EFAULT : 0;
 }
 
 /**



[PATCH 4.4 09/74] parisc: Fix ptrace syscall number and return value modification

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Helge Deller <[hidden email]>

commit 98e8b6c9ac9d1b1e9d1122dfa6783d5d566bb8f7 upstream.

Mike Frysinger reported that his ptrace testcase showed strange
behaviour on parisc: It was not possible to avoid a syscall and the
return value of a syscall couldn't be changed.

To modify a syscall number, we failed to save the new syscall
number to gr20, which is then picked up later in assembly again.

The effect that the return value couldn't be changed is a side-effect of
another bug in the assembly code. When a process is ptraced, userspace
expects each syscall to report entrance and exit of a syscall.  If a
syscall number was given which doesn't exist, we jumped to the normal
syscall exit code instead of informing userspace that the (non-existent)
syscall exits. This unexpected behaviour confuses userspace and thus the
bug was misinterpreted as if we can't change the return value.

This patch fixes both problems and was tested on 64bit kernel with
32bit userspace.

Signed-off-by: Helge Deller <[hidden email]>
Cc: Mike Frysinger <[hidden email]>
Tested-by: Mike Frysinger <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 arch/parisc/kernel/ptrace.c  |   16 +++++++++++-----
 arch/parisc/kernel/syscall.S |    5 ++++-
 2 files changed, 15 insertions(+), 6 deletions(-)

--- a/arch/parisc/kernel/ptrace.c
+++ b/arch/parisc/kernel/ptrace.c
@@ -269,14 +269,19 @@ long compat_arch_ptrace(struct task_stru
 
 long do_syscall_trace_enter(struct pt_regs *regs)
 {
- long ret = 0;
-
  /* Do the secure computing check first. */
  secure_computing_strict(regs->gr[20]);
 
  if (test_thread_flag(TIF_SYSCALL_TRACE) &&
-    tracehook_report_syscall_entry(regs))
- ret = -1L;
+    tracehook_report_syscall_entry(regs)) {
+ /*
+ * Tracing decided this syscall should not happen or the
+ * debugger stored an invalid system call number. Skip
+ * the system call and the system call restart handling.
+ */
+ regs->gr[20] = -1UL;
+ goto out;
+ }
 
 #ifdef CONFIG_64BIT
  if (!is_compat_task())
@@ -290,7 +295,8 @@ long do_syscall_trace_enter(struct pt_re
  regs->gr[24] & 0xffffffff,
  regs->gr[23] & 0xffffffff);
 
- return ret ? : regs->gr[20];
+out:
+ return regs->gr[20];
 }
 
 void do_syscall_trace_exit(struct pt_regs *regs)
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -343,7 +343,7 @@ tracesys_next:
 #endif
 
  comiclr,>>= __NR_Linux_syscalls, %r20, %r0
- b,n .Lsyscall_nosys
+ b,n .Ltracesys_nosys
 
  LDREGX  %r20(%r19), %r19
 
@@ -359,6 +359,9 @@ tracesys_next:
  be      0(%sr7,%r19)
  ldo R%tracesys_exit(%r2),%r2
 
+.Ltracesys_nosys:
+ ldo -ENOSYS(%r0),%r28 /* set errno */
+
  /* Do *not* call this function on the gateway page, because it
  makes a direct call to syscall_trace. */
 



[PATCH 4.4 07/74] block: Initialize max_dev_sectors to 0

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Keith Busch <[hidden email]>

commit 5f009d3f8e6685fe8c6215082c1696a08b411220 upstream.

The new queue limit is not used by the majority of block drivers, and
should be initialized to 0 for the driver's requested settings to be used.
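
A rough sketch of why the default matters, assuming the device limit is
combined with the driver's requested hardware limit through a
"minimum that ignores zero" helper in the spirit of the kernel's
min_not_zero(). The demo_* names are hypothetical and the value 255 is
assumed to mirror BLK_SAFE_MAX_SECTORS.

```c
#include <assert.h>

#define DEMO_SAFE_MAX_SECTORS 255u	/* assumed to mirror BLK_SAFE_MAX_SECTORS */

/* Zero means "no limit set", so it never wins the comparison. */
static unsigned int demo_min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Effective limit for a driver that requested 'requested_hw' sectors
 * while the queue carries 'max_dev' as its device limit.
 */
static unsigned int demo_effective_sectors(unsigned int requested_hw,
					   unsigned int max_dev)
{
	/* A driver is expected to request a non-zero hardware limit. */
	assert(requested_hw != 0);

	return demo_min_not_zero(requested_hw, max_dev);
}
```

Under these assumptions, a driver asking for 1024 sectors is clamped to
255 when the device limit defaults to the safe maximum, and gets its
1024 when the device limit defaults to 0.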

Signed-off-by: Keith Busch <[hidden email]>
Acked-by: Martin K. Petersen <[hidden email]>
Reviewed-by: Sagi Grimberg <[hidden email]>
Reviewed-by: Christoph Hellwig <[hidden email]>
Signed-off-by: Jens Axboe <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 block/blk-settings.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -91,8 +91,8 @@ void blk_set_default_limits(struct queue
  lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
  lim->virt_boundary_mask = 0;
  lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
- lim->max_sectors = lim->max_dev_sectors = lim->max_hw_sectors =
- BLK_SAFE_MAX_SECTORS;
+ lim->max_sectors = lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
+ lim->max_dev_sectors = 0;
  lim->chunk_sectors = 0;
  lim->max_write_same_sectors = 0;
  lim->max_discard_sectors = 0;



[PATCH 4.4 08/74] PCI: keystone: Fix MSI code that retrieves struct pcie_port pointer

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Murali Karicheri <[hidden email]>

commit 79e3f4a853ed161cd4c06d84b50beebf961a47c6 upstream.

Commit cbce7900598c ("PCI: designware: Make driver arch-agnostic") changed
the host bridge sysdata pointer from the ARM pci_sys_data to the DesignWare
pcie_port structure, and changed pcie-designware.c to reflect that.  But it
did not change the corresponding code in pci-keystone-dw.c, so it caused
crashes on Keystone:

  Unable to handle kernel NULL pointer dereference at virtual address 00000030
  pgd = c0003000
  [00000030] *pgd=80000800004003, *pmd=00000000
  Internal error: Oops: 206 [#1] PREEMPT SMP ARM
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.2-00139-gb74f926 #2
  Hardware name: Keystone
  PC is at ks_dw_pcie_msi_irq_unmask+0x24/0x58

Change pci-keystone-dw.c to expect sysdata to be the struct pcie_port
pointer.

[bhelgaas: changelog]
Fixes: cbce7900598c ("PCI: designware: Make driver arch-agnostic")
Signed-off-by: Murali Karicheri <[hidden email]>
Signed-off-by: Bjorn Helgaas <[hidden email]>
CC: Zhou Wang <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 drivers/pci/host/pci-keystone-dw.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

--- a/drivers/pci/host/pci-keystone-dw.c
+++ b/drivers/pci/host/pci-keystone-dw.c
@@ -58,11 +58,6 @@
 
 #define to_keystone_pcie(x) container_of(x, struct keystone_pcie, pp)
 
-static inline struct pcie_port *sys_to_pcie(struct pci_sys_data *sys)
-{
- return sys->private_data;
-}
-
 static inline void update_reg_offset_bit_pos(u32 offset, u32 *reg_offset,
      u32 *bit_pos)
 {
@@ -108,7 +103,7 @@ static void ks_dw_pcie_msi_irq_ack(struc
  struct pcie_port *pp;
 
  msi = irq_data_get_msi_desc(d);
- pp = sys_to_pcie(msi_desc_to_pci_sysdata(msi));
+ pp = (struct pcie_port *) msi_desc_to_pci_sysdata(msi);
  ks_pcie = to_keystone_pcie(pp);
  offset = d->irq - irq_linear_revmap(pp->irq_domain, 0);
  update_reg_offset_bit_pos(offset, &reg_offset, &bit_pos);
@@ -146,7 +141,7 @@ static void ks_dw_pcie_msi_irq_mask(stru
  u32 offset;
 
  msi = irq_data_get_msi_desc(d);
- pp = sys_to_pcie(msi_desc_to_pci_sysdata(msi));
+ pp = (struct pcie_port *) msi_desc_to_pci_sysdata(msi);
  ks_pcie = to_keystone_pcie(pp);
  offset = d->irq - irq_linear_revmap(pp->irq_domain, 0);
 
@@ -167,7 +162,7 @@ static void ks_dw_pcie_msi_irq_unmask(st
  u32 offset;
 
  msi = irq_data_get_msi_desc(d);
- pp = sys_to_pcie(msi_desc_to_pci_sysdata(msi));
+ pp = (struct pcie_port *) msi_desc_to_pci_sysdata(msi);
  ks_pcie = to_keystone_pcie(pp);
  offset = d->irq - irq_linear_revmap(pp->irq_domain, 0);
 



[PATCH 4.4 04/74] btrfs: Fix no_space in write and rm loop

Greg KH-4
4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Zhao Lei <[hidden email]>

commit e1746e8381cd2af421f75557b5cae3604fc18b35 upstream.

I see no_space errors again in v4.4-rc1 in xfstests generic/102.
They happen randomly and only on some nodes
(one of 4 physical nodes, and a KVM guest with a non-virtio block driver).

Bisection identifies the first bad commit as:
 commit bdced438acd8 ("block: setup bi_phys_segments after splitting")
But that patch only exposed the bug by making bio operations
faster (or slower).

The root cause is in our space allocation code: we need to start
page writeback before waiting for it to complete. This patch fixes
the bug.

BTW, there is another cause of generic/102 failures, related to
the default mixed-blockgroup being disabled; I'll fix it in xfstests.

Signed-off-by: Zhao Lei <[hidden email]>
Signed-off-by: Chris Mason <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>

---
 fs/btrfs/extent-tree.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4086,8 +4086,10 @@ commit_trans:
     !atomic_read(&root->fs_info->open_ioctl_trans)) {
  need_commit--;
 
- if (need_commit > 0)
+ if (need_commit > 0) {
+ btrfs_start_delalloc_roots(fs_info, 0, -1);
  btrfs_wait_ordered_roots(fs_info, -1);
+ }
 
  trans = btrfs_join_transaction(root);
  if (IS_ERR(trans))

