steamsprocket.org.uk

Fix tmux eating Ctrl+Tab/Ctrl+Shift+Tab

I have no idea why this happens, but on one of my machines, tmux swallows Ctrl+Tab and Ctrl+Shift+Tab. I have other machines, running the same tmux version and config, where this doesn’t happen. The only significant difference I’m aware of is that I access the other machines via SSH, but I’ve tried ssh localhost and the problem persists.

I gave up trying to work out the reason, and resorted to explicitly configuring tmux to pass the relevant escape sequences through to applications:

# Pass on Ctrl+Tab and Ctrl+Shift+Tab
bind-key -n C-Tab send-keys Escape [27\;5\;9~
bind-key -n C-S-Tab send-keys Escape [27\;6\;9~
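For reference, the bytes those bindings emit can be inspected directly. These are the xterm “modifyOtherKeys”/CSI 27 encodings: 9 is Tab’s ASCII code, and the modifier values 5 and 6 mean Ctrl and Ctrl+Shift respectively (1 plus the modifier bits):

```shell
# Dump each escape sequence as hex: ESC (1b) followed by [27;<modifier>;9~
printf '\033[27;5;9~' | od -An -tx1   # Ctrl+Tab
printf '\033[27;6;9~' | od -An -tx1   # Ctrl+Shift+Tab
```

Another quick check is to run cat -v in a shell outside tmux and press the keys; if the terminal sends the sequences at all, they’ll show up in caret notation.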

Running a script after updating a TLS certificate with certbot

This proved to be blessedly simple. As per the documentation, any executable in /etc/letsencrypt/renewal-hooks/deploy is run after a certificate is successfully renewed (it may need to be owned by root). This worked first time:

#!/bin/bash
 
cp -L /etc/letsencrypt/live/<domain>/{fullchain,privkey}.pem /etc/exim4/
chown Debian-exim:Debian-exim /etc/exim4/{fullchain,privkey}.pem
systemctl restart exim4
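A slightly more defensive variant is possible: certbot exports RENEWED_LINEAGE (the certificate’s live/ directory) and RENEWED_DOMAINS to deploy hooks, so the hook can skip renewals of unrelated certificates. This is only a sketch; mail.example.org stands in for the real domain:

```shell
#!/bin/bash
# Only act when the certificate covering the mail hostname was renewed.
# RENEWED_LINEAGE and RENEWED_DOMAINS are set by certbot for deploy hooks;
# mail.example.org is a placeholder.
deploy_to_exim() {
    case " ${RENEWED_DOMAINS-} " in
        *" mail.example.org "*)
            cp -L "${RENEWED_LINEAGE}"/{fullchain,privkey}.pem /etc/exim4/
            chown Debian-exim:Debian-exim /etc/exim4/{fullchain,privkey}.pem
            systemctl restart exim4
            ;;
    esac
}
deploy_to_exim
```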

Setting IO scheduler for use with ZFS

If you’re using rotational hard drives, Linux’s default IO scheduler can interact very badly with ZFS’s IO scheduler, greatly reducing performance. This is further exacerbated if you have any SMR devices due to their pathological worst-case performance characteristics.

I’ve found that switching to “none” (this was called “noop” historically) can improve performance by a full order of magnitude or more, which can take SMR resilvers from “this will take over a month” to “this will take three days”. With purely CMR drives, it’s not such an impressive improvement: I didn’t test very extensively so the error bars could be large, but I found around a factor of two.

This scheduling problem is well-known and ZFS used to have an option to set “noop” automatically, but this was removed in 0.8.3, presumably because the developers felt it wasn’t ZFS’s place to change system settings like this.

The current recommendation is to use a udev rule if you need this. In my systems, all my rotational drives are used for bulk storage with ZFS, so this can be achieved very easily by setting a udev rule that applies to all rotational drives. Create a file with the following content, call it something like 66-io-scheduler.rules, and drop it in /etc/udev/rules.d/:

ACTION=="add|change", KERNEL=="sd[a-z]", ATTRS{queue/rotational}=="1", RUN+="/bin/sh -c 'echo none > /sys/block/%k/queue/scheduler'"

Update: A previous version of this post used simply echo as the command. This doesn’t work because udev runs the program directly rather than through a shell, so there is no PATH lookup and no output redirection; wrapping the command in /bin/sh -c fixes both.
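The rule can be applied without a reboot with udevadm control --reload-rules followed by udevadm trigger --subsystem-match=block --action=change. To confirm which scheduler each rotational disk ended up with, something like this works — the function name and the base-directory argument are my own invention (the argument defaults to /sys/block and exists only to make the function easy to test):

```shell
# Print the scheduler line for every rotational sd* disk; the kernel marks
# the active scheduler with square brackets, e.g. "[none] mq-deadline".
show_rotational_schedulers() {
    local base="${1:-/sys/block}" disk
    for disk in "$base"/sd[a-z]; do
        [ -e "$disk/queue/rotational" ] || continue
        [ "$(cat "$disk/queue/rotational")" = 1 ] || continue
        echo "$(basename "$disk"): $(cat "$disk/queue/scheduler")"
    done
}

show_rotational_schedulers
```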

Fixing emails from Nextcloud via Debian’s default exim setup

At some point, my Nextcloud quietly stopped being able to send email. This is currently Nextcloud 27.1.3, running in docker on Debian 11.8, but it might have been broken for some time.

When setting up exim on Debian, debconf asks you a few questions and generates a default config. My setup has no other deviations from the default, and the only notable config option is to set it to listen only on some specified local addresses.

Nextcloud is set to connect to one of those local addresses to send email, and this used to work. Now however, the email setting test fails with a generic error message, and causes exim to log TLS error on connection from (<hostname>) [172.21.0.2] (gnutls_handshake): Decryption has failed.

Long story short, this happens because exim uses a self-signed certificate out of the box, and Nextcloud is unhappy about it but won’t tell you that.

I don’t care about TLS at all. This traffic never leaves the local machine, and has no need to be encrypted, but that doesn’t appear to be an option. Nextcloud allows either “TLS” or “None/STARTTLS” as its encryption options; there is no “None”. In theory it should be possible to get exim to stop advertising STARTTLS support in response to EHLO; in practice after an hour or so trying every config option that looked relevant, I’m forced to conclude that it’s not actually possible – I found some references online to compiling it without TLS support, but I’m not willing to go that far.

Okay, so I need an actual certificate that will propitiate Nextcloud. Letsencrypt to the rescue? Well yes, but also erghh. First, a TLS certificate can obviously only be generated for a hostname, not an IP address, so I created a new DNS entry pointing to the 10.x.x.x local address in question, and reconfigured Nextcloud to use that hostname. Then I generated the certificate using acme-dns for verification – this might not be the best option, but I already have it set up so it’s easy for me. One way or another, creating a certificate for a private service is almost certainly going to require a DNS-01 challenge.

Debian’s exim is set up so that most configuration is done by editing /etc/exim4/update-exim4.conf.conf, then running update-exim4.conf, then restarting the exim4 service. The config lines needed are:

MAIN_TLS_ENABLE=yes
MAIN_TLS_CERTIFICATE=/path/to/fullchain.pem
MAIN_TLS_PRIVATEKEY=/path/to/privkey.pem

So far so good, but it still doesn’t work. Quite reasonably, certbot will only allow creating certificate files accessible by root. The idea is that a service needing a certificate should start up as root, read whatever privileged data it needs, then drop privileges. Sadly, exim does not do this. It runs as Debian-exim, and only attempts to read the certificate on demand. This leads to errors like TLS error on connection from (<hostname>) [172.21.0.2] (cert/key setup: cert=/path/to/fullchain.pem key=/path/to/privkey.pem): Error while reading file.

The only solution I’ve found so far is to copy the files to another location and change their owner to Debian-exim. Finally, this works, and Nextcloud can now send emails!

One remaining problem: I don’t want to have to repeat this every few months when the certificate is updated. I see that there is a promising-looking /etc/letsencrypt/renewal-hooks/deploy/, and I’m really hoping that I can just drop a shell script in there and call it a day…

Update: The hook worked first time.

Setting ‘User Cannot Change Password’ using Python

Adding users manually to [[Active_Directory|Active Directory]] can be a chore, especially if you have a lot of users to add and/or you need to remember to set several options. Fortunately AD is readily scriptable. I generally use Python for this purpose, and there are numerous examples for how to do things like add a new user or add an account to a group.

One thing I couldn’t find was a simple account of how to set the option which the ADUC ‘New User’ wizard labels ‘User Cannot Change Password’. This option is not as simple as setting a flag when creating the user object; you have to add an ACE denying the ‘Change Password’ permission. Microsoft helpfully document the process, so it’s just a matter of translating the instructions into Python.

# See http://msdn.microsoft.com/en-us/library/aa746398.aspx
import win32security        # part of pywin32
import active_directory     # Tim Golden's active_directory module

# Standard definitions
ChangePasswordGuid = '{ab721a53-1e2f-11d0-9819-00aa0040529b}'
ADS_ACETYPE_ACCESS_DENIED_OBJECT = 0x6
SID_SELF = "S-1-5-10"
SID_EVERYONE = "S-1-1-0"
 
# Get the full localised names for 'Everyone' and 'Self'
# For English language Domain Controllers this is equivalent to:
# selfName = u'NT AUTHORITY\\SELF'
# everyoneName = u'Everyone'
selfAccount = win32security.LookupAccountSid(None,
        win32security.GetBinarySid(SID_SELF))
everyoneAccount = win32security.LookupAccountSid(None,
        win32security.GetBinarySid(SID_EVERYONE))
selfName = ("%s\\%s" % (selfAccount[1], selfAccount[0])).strip('\\')
everyoneName = ("%s\\%s" % (everyoneAccount[1], everyoneAccount[0])).strip('\\')
 
user = active_directory.find_user(userName)
sd = user.ntSecurityDescriptor
dacl = sd.DiscretionaryAcl
for ace in dacl:
    if ace.ObjectType.lower() == ChangePasswordGuid.lower():
        if ace.Trustee == selfName or ace.Trustee == everyoneName:
            ace.AceType = ADS_ACETYPE_ACCESS_DENIED_OBJECT
 
sd.DiscretionaryAcl = dacl
user.Put('ntSecurityDescriptor', sd)
user.SetInfo()

Two things to note:

First, I’ve assumed that the name of the user to modify is stored in ‘userName’, and used the active_directory module to do the lookup. Alternatively the object can be retrieved using win32com directly, by doing something like the following:

location = "cn=Users,dc=example,dc=org"

or

location = "ou=Roaming Users,ou=My User OU,dc=example,dc=org"

followed by

import win32com.client
user = win32com.client.GetObject("LDAP://cn=%s,%s" % (userName, location))

In the active_directory case userName could be either the logon name (eg. ‘jsmith’) or the canonical name (eg. ‘John Smith’). In the win32com case it has to be the latter.

Second, in my case the user objects already had the relevant ACEs so it was just a case of setting them to ‘deny’ when necessary. Microsoft’s documentation describes how to add the entries if they’re absent, so presumably there must be circumstances in which that would happen. Translating that description into Python is a simple enough process that it’s left as an exercise for the reader.

Packaging

This is the new computer case I received in the post today:

Alongside the box it came in:

And the box that came in:

ZFS: One or more devices has experienced an unrecoverable error

I’m using [[ZFS]] (via ZFS-FUSE), and at one point a zpool status gave me this rather scary report:

zpool status
  pool: srv
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                                                      STATE     READ WRITE CKSUM
        srv                                                       ONLINE       0     0     0
          disk/by-id/usb-Samsung_STORY_Station_0000002CE0C43-0:0  ONLINE       0     0     1
          disk/by-id/scsi-SATA_ST3500630AS_9QG3JRW0               ONLINE       0     0     0
          disk/by-id/scsi-SATA_SAMSUNG_HD103UJS13PJDWS516679      ONLINE       0     0     0

errors: No known data errors

‘Unrecoverable error’, eh? Crap. Wait, how can applications be unaffected by an unrecoverable error? How can there be ‘no known data errors’? Also, how is there a checksum error, but no read or write errors? What other operation could there be? Better check that link…

…Well apparently ‘the device has experienced a read I/O error, write I/O error, or checksum validation error’. I guess that implies an answer to the last question: ‘READ’ and ‘WRITE’ refer specifically to disk I/O errors. Not to errors in reading or writing in general, just those where the disk itself has detected an error and reported it back. The way that information is presented is pretty appalling, as you just need to know that peculiarity in order to interpret it, but okay, let’s press on.

‘Because the device is part of a mirror or RAID-Z device, ZFS was able to recover from the error and subsequently repair the damaged data’. What? No it isn’t. This is just a striped pool: no mirroring or RAID-Z involved. And anyway, you said it was unrecoverable. Now I’m beginning to worry. The documentation claims that ZFS was able to recover from an unrecoverable error using data replication that doesn’t exist; WTF does that mean? Well obviously it means that the documentation was written by an imbecile, but what’s the message they’re clumsily trying to get across?

After some googling led me to this thread, I did eventually work this out. You get the message about an unrecoverable error if and only if (and this part’s genius) ZFS was able to recover. How very… special. If it wasn’t able to recover, you’ll instead be told that ‘a file or directory could not be read due to corrupt data’. No mention of the word ‘unrecoverable’ there.

But wait, how could it recover if there’s no data redundancy, and why does it think it’s a mirrored or RAID-Z device? The answer would appear to be that, through some happenstance, the error corrupted some metadata rather than actual file data. Since metadata always has at least one redundant copy, ZFS could repair it just as if the pool were mirrored. Phew.

So to recap: so far as ZFS is concerned, the alarming phrase ‘unrecoverable error’ means ‘error from which ZFS has successfully recovered’. Thanks for that Sun.

Facepalm

A Visual Git Reference

This page gives a good visual overview of numerous Git operations. If you understand Git in principle, but are unsure about the exact meaning of certain commands, then this may be useful; for each operation covered it gives a description and a block diagram of how it changes the state of the repository.

If you’re completely in the dark, or if you don’t understand what a [[Directed_Acyclic_Graph|DAG]] is then you might be better off with some of the introductory documentation.

Bonus: The images are all drawn using TeX and a package I’d never heard of called TikZ. The examples page is mindblowing.

DreamPie

This graphical Python shell appears to be an excellent tool for interactive Python use, possibly supplanting IPython. From the announcement:

* Has whatever you would expect from a graphical Python shell - attribute completion, tooltips which show how to call functions, highlighting of matching parentheses, etc.
* Fixes a lot of IDLE nuisances - in DreamPie interrupt always works, history recall and completion works as expected, etc.
* Results are saved in the Result History.
* Long output is automatically folded so you can focus on what's important.
* Jython and IronPython support makes DreamPie a great tool for exploring Java and .NET classes.
* You can copy any amount of code and immediately execute it, and you can also copy code you typed interactively into a new file, with the Copy Code Only command. No tabs are used!
* Free software licensed under GPL version 3.

Plus numerous other useful features highlighted on the website. My biggest gripe so far is that it uses Ctrl+Up/Down instead of simply Up/Down to go back through command history; maybe that’ll become configurable.

NB: I have no connection to this at all; I just think it looks good.

Automating Debian security updates

Thanks to ‘foom’ on LWN comes a neat recipe for automatically installing critical updates on a Debian system with minimal risk: http://lwn.net/Articles/374542/. It’s explained in some detail, but in brief it (optionally) uses a separate sources.list containing only security sources, and makes dpkg/apt choose the default responses to any questions. Fortunately Debian are good at making sure that their security updates don’t change any behaviour unless absolutely unavoidable, so there shouldn’t be any surprises.