Unhappy data retention day

This article originally appeared on Hoyden About Town.

This morning, Australia’s mandatory two-year data retention regime began. Internet activity through Australian ISPs (including mobile phone providers) is now recorded. Australians, according to Crikey, here is what is likely to be retained about your accessing this link today:

  • your name and similar identifying details on your Internet account
  • the Internet address of where you accessed Hoyden About Town from
  • the Internet address of Hoyden About Town itself
  • the date and time you accessed this site
  • how long you accessed it for (quickly, in the case of websites, no doubt, but what if you were Skyping with us?)
  • what technical services you used (HTTP over ADSL or mobile or cable or …)

If you are accessing this over a mobile device, your location is also stored, to quite a high degree of accuracy. This data is also by far the hardest to conceal using any method, since it’s revealed as a core part of your phone’s communication with cell towers.

At least the actual specific page you accessed would not (or at least need not) be retained, if I am interpreting the information at Allens and Crikey correctly.


Image credit: Surveillance by Jonathan McIntosh, Creative Commons Attribution-Sharealike.

Code release: Spam All the Links

The Geek Feminism blog’s Linkspam tradition started back in August 2009, in the very early days of the blog, and by September it had occurred to us to take submissions through bookmarking services. From shortly after that point there was a sequence of scripts that pulled links out of RSS feeds. Last year, I began cleaning up my script and turning it into the one link-hoovering script to rule them all. It sucks links out of bookmarking sites, Twitter and WordPress sites and bundles them all up into an email that is sent to the linkspamming team there for curation, pre-formatted in HTML and with a title and suggested description for each link. It even attempts to filter out links already posted in previous linkspams.

The Geek Feminism linkspammers aren’t the only link compilers in town, and it’s possible we’re not the only group who would find my script useful. I’ve therefore finished generalising it, and I’ve released it as Spam All the Links on Gitlab. It’s a Python 3 script that should run on most standard Python environments.

Spam All the Links

Spam All the Links is a command line script that fetches URL suggestions from
several sources and assembles them into one email. That email can in turn be
pasted into a blog entry or otherwise used to share the list of links.

Use case

Spam All the Links was written to assist in producing the Geek Feminism linkspam posts. It was developed to check WordPress comments, bookmarking websites such as Pinboard, and Twitter, for links tagged “geekfeminism”, assemble them into one email, and email them to an editor who could use the email as the basis for a blog post.

The script has been generalised to allow searches of RSS/Atom feeds, Twitter, and WordPress blog comments as specified by a configuration file.

Email output

The email output of the script has three components:

  1. a plain text email with the list of links
  2. a HTML email with the list of links
  3. an attachment with the HTML formatted links but no surrounding text so as to be easily copied and pasted

All three parts of the email can be templated with Jinja2.

Sources of links

Spam All the Links currently can be configured to check multiple sources of links, in these forms:

  1. RSS/Atom feeds, such as those produced by the bookmarking sites Pinboard or Diigo, where the link, title and description of the link can be derived from the equivalent fields in the RSS/Atom. (bookmarkfeed in the configuration file)
  2. RSS/Atom feeds where links can be found in the ‘body’ of a post (postfeed in the configuration file)
  3. Twitter searches (twitter in the configuration file)
  4. comments on WordPress blog entries (wpcommentsfeed in the configuration file)
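To make the first kind of source concrete, here is a minimal sketch of pulling the link, title and description out of a bookmark-style RSS feed and skipping links that were already posted. It is not the script's actual code: the function name links_from_bookmark_feed, the already_posted parameter and the sample feed are all invented for illustration, and it uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 feed standing in for a bookmarking site's feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example bookmarks</title>
    <item>
      <title>First link</title>
      <link>https://example.org/one</link>
      <description>Suggested by a reader</description>
    </item>
    <item>
      <title>Second link</title>
      <link>https://example.org/two</link>
      <description>Already posted last week</description>
    </item>
  </channel>
</rss>
"""


def links_from_bookmark_feed(feed_xml, already_posted=()):
    """Return one dict per new link in an RSS 2.0 feed.

    Hypothetical helper: each <item>'s <link>, <title> and <description>
    become the url, title and description of a suggested link. URLs listed
    in already_posted, and repeats within the feed, are skipped.
    """
    seen = set(already_posted)
    links = []
    for item in ET.fromstring(feed_xml).iter("item"):
        url = (item.findtext("link") or "").strip()
        if not url or url in seen:
            continue
        seen.add(url)
        links.append({
            "url": url,
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return links
```

Calling links_from_bookmark_feed(SAMPLE_FEED, already_posted=["https://example.org/two"]) returns only the first link. Atom feeds use different element names and XML namespaces, so a real implementation would need to handle those separately.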

More info, and the code, is available at the Spam All the Links repository at Gitlab. It is available under the MIT free software licence.

Quick links: nothing to hide

This article originally appeared on Hoyden About Town.

Data retention is coming to Australia very soon.

[Data retained] includes your name, address and other identifying information, your contract details, billing and payment information. In relation to each communication, it includes the date, start and finish times, and the identities of the other parties to the communication. And it includes the location data, such as the mobile cell towers or Wi-Fi hotspots you were accessing at the time…

But surely they’ve included special protections for communications between doctors and patients, and lawyers and clients? No. Never even discussed…

The Joint Committee recommended that the Act be amended to ensure that the metadata can’t be obtained by parties in civil litigation cases (I’ve mentioned before how excited litigation lawyers will be about all this lovely new data), and George Brandis said that would be fixed in the final amendments. But it isn’t there. The final Bill being bulldozed through Parliament right now contains no such protection. The fact remains that, under the Telecommunications Act, one of the situations in which a service provider cannot resist handing over stored data is when a court has required it by issuing a subpoena. In practice, that means that your ex-spouse, former business partners, suspicious insurance company or employer can get hold of a complete digital history of your movements and communications for the past two years, and use it against you in court.

Michael Bradley, Our privacy is about to be serially infringed, The Drum, March 19 2015

Noted elsewhere: all this data will be stored by various companies with varying degrees of security awareness, so in practice it will sometimes be available to some criminals too.


Image credit: Surveillance by Jonathan McIntosh, Creative Commons Attribution-Sharealike

Importing a large blog to WordPress.com: WXR splitting tools

I am about to import a very large WordPress blog (not this one) to WordPress.com.

There are two issues:

1. The WXR (WordPress eXtended RSS) export from the site is 105MB uncompressed and 22MB compressed (with gzip -9). This is too large to upload to WordPress.com, which only accepts uploads of 15MB at most.

2. This site has 4000 media file uploads (and 6000 posts). The original host is going away: those 4000 media files (mostly images) must also be imported into WordPress.com.

The obvious solution to #1 is to split the upload into multiple files. However, I have just tested on WordPress.com, and in order to get it to change the post contents to refer to the imported copy of the media files, rather than the original externally hosted copy which is about to go away, the media file and the post must be uploaded in the same XML file. The scripts that I’ve found that will split WXR files into multiple XML files do not attempt to put media files and the posts that refer to them in the same XML file (eg mainSplit.py doesn’t do this); they just split the contents of the export file up in the order they appear.
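For what it’s worth, the splitting logic I have in mind can be sketched in a few lines of Python. This is an untested idea rather than a working WXR tool: it assumes each export item has already been parsed into a dict with hypothetical keys id, parent, type and size (the byte size of the item’s XML), and that a media item’s parent is the post it was uploaded to. It ignores the header that every output file would need, and it won’t catch posts that refer to media attached to a different post.

```python
from collections import defaultdict


def split_keeping_media_with_posts(items, max_bytes):
    """Split export items into chunks, keeping attachments with their post.

    items: list of dicts with hypothetical keys 'id', 'parent' (0 if none),
    'type' ('attachment' for media) and 'size' (bytes of the item's XML).
    Returns a list of chunks, each a list of items. A single post whose
    group is larger than max_bytes still gets a chunk of its own.
    """
    post_ids = {item["id"] for item in items if item["type"] != "attachment"}

    # Collect attachments under the post they were uploaded to.
    attachments = defaultdict(list)
    for item in items:
        if item["type"] == "attachment" and item["parent"] in post_ids:
            attachments[item["parent"]].append(item)

    # Build groups in export order: a post plus its media, or a lone orphan.
    groups = []
    for item in items:
        if item["type"] == "attachment":
            if item["parent"] not in post_ids:
                groups.append([item])
        else:
            groups.append([item] + attachments[item["id"]])

    # Greedily fill chunks, never splitting a group across two chunks.
    chunks, current, current_size = [], [], 0
    for group in groups:
        group_size = sum(member["size"] for member in group)
        if current and current_size + group_size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.extend(group)
        current_size += group_size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be written out as its own XML file, with the export’s channel header repeated at the top of each one.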

Anyone got leads on this one?

The right to forget, or, that one terrible road stop

I predict that soon the conversation will turn from the right to be forgotten to the right to forget.

Why so? Well, Google Maps now tries to remember places I’ve been and include them in the maps it shows me. The trouble with this (ignoring any petty privacy, commercialisation, misc concerns you may be about to mention to me) is that there are some places that should be forgotten. In particular, all of Western Sydney’s commerce is now represented to me by one service station that we stopped at on a family trip because someone needed to use the loo, but couldn’t, because its loo was splattered with largely unspecified bodily fluids.

Get it together Google! This is even worse than the way my Youtube suggestions are now and forever filled with Thomas the Tank Engine videos because of an unfortunate and lengthy phase my son went through. I insist on not navigating Sydney in future primarily in terms of which horrible public toilet I am nearest.

 

It’s password management turtles all the way down

Since I mentioned password management in passing yesterday, I’ve been reminded of a question I haven’t seen answered yet: how do you manage your password management passwords?

My setup is this: as advocated by, eg Bruce Schneier and Troy Hunt (but not, apparently, by Florêncio et al 2014, although I’ve only read the abstract and some of the press) I use a password manager, which stores huge long random passwords for all the sites I use and is in turn password protected.

While I’ve been doing this for several years, a few flaws have emerged:

  1. Google passwords. You have no idea how often you need to enter a Google password on an Android phone until… you do. And you’ll be reminded for every new device and then every password change, even if you’re a Heartbleed-level-or-greater password changer. It’s very very difficult to survive setting your Google password to F]U8NScS+RP7eL5)v=gj7f*/bX~$&` or even F]U8NScS+R frankly as an Android user. (Especially since if you have two factor turned on, the way you authenticate to an Android phone involves entering your password twice.)
  2. shared passwords, often required in business in particular but also in (cough) personal households, and not handled by most password managers in a model other than “a password database for you” and “a password database for you and your boss” and so on for potentially combinatorial values of “you and [colleague]”

There are some services that attempt to solve that second point within an organisation, eg, LastPass Enterprise, but even allowing for that, let us enumerate the password manager passwords that a hypothetical individual called Mary currently has:

  1. personal password manager password
  2. work password manager password
  3. household password manager password
  4. volunteer organisation password manager password

And at the point where this hypothetical individual is remembering four separate extremely complex and secure passwords it’s beginning to look like the promised land of “the last password you’ll ever need” is, well, turtles all the way down.

It’s 2014 and the Internet is still atomising my household

Here’s some electronic things my household owns collectively:

  • our main camera
  • our television
  • our games consoles
  • our Kindle and Nexus tablet

Here’s the services I use almost daily that do not have any notion of collectively owned content or multiple publishers wanting to manage a single account:

  • Flickr
  • Google Play, or any other Google service
  • Xbox Live (to the extent I’ve explored it)

And this is epically frustrating, because here’s some use cases that these websites don’t handle well.

  • we share parenting of our children. We would like to be able to play one or both of them Frozen or Cars or whatever without both owning a copy from a streamable service or someone needing to leave a logged in Android device with a known password in the house at all times.
  • we both take photographs on our main camera. We sometimes can’t remember who took which one and in any case, it’s always me who post-processes them. We would like to be able to publish them on a photo sharing website and maybe sometimes attribute authorship (if one of us is especially proud of a shot and actually remembers taking it) and sometimes not!
  • we read the same books because I read them first and Andrew reads some subset of them on my recommendation, and we’d like to do that without both buying a copy.
  • we listen to the same music because Andrew listens to it first and I listen to some subset of it on his recommendation, and we’d like to do that without both buying a copy.

I mean, it’s disgusting really. One day we could even commit the ultimate in simple gross violation of normal and healthy relationship boundaries and want to play each other’s saved games.

Right now we do pretty much what everyone does to some degree, as far as I can tell, which is to have a shared Amazon account and a shared Flickr account and still buy movies on optical discs for now even though five minutes of unskippable sections at the start are annoying and put our music on a fileserver and awkwardly manage our photos on a USB hard drive that can get plugged into different laptops and really not stream much stuff at all. Maybe one day we’ll have some kind of dedicated device that is logged into someone’s Google account and streams movies that are always bought through that account, or something like that.

Now traditionally when I make this point, someone will show up and say “yes, my dear, but something extremely complicated is going on here, much too complex and subtle for your delicate sensibilities, called making money through an advertising revenue model requiring demographic information and the entire world will go bankrupt if we allowed multiple people to share accounts even for content they produced in any recognised way, so don’t worry your pretty little head about it and let your husband buy the clicky button things from now on.”

To which I answer: this blog is (to the best of my knowledge) not owned by any of Yahoo!, Google or Microsoft and does not especially care about their revenue models. Moreover, if your comment boils down to “please try and see this from the side of the websites” I will replace your comment with the one from the previous paragraph, sexist content and all. (Also don’t explain to me that one can share passwords in various ways. I know. I do those things.)

I will concede one point: households don’t have continuity in the way that individuals do. My household will split into at least three and perhaps four someday. This is pretty much impossible to model in the present intellectual property+licencing rights model as far as I can tell.

And all the same, I’m annoyed that the software world is really hostile to the (very normal) way I live my life and is (surprise!) set up for a world in which each of the four people in my house sits in their own room with their own TV + gaming system + speakers + phone/tablet + ereader interacting with content they purchased entirely separately, and in many cases, in duplicate (possibly) maximising your revenue since whichever unfortunate day someone came up with the idea of an “account” on a computer system.

First ecosystem to fix this gets to sell me Frozen or something.

Opt-in Creative Commons licensing plugin for WordPress?

Does anyone have a recommendation for an opt-in Creative Commons licensing plugin for WordPress? That is, one where the default state is not to CC license something, but when some action is taken, an individual post or page can be so licensed.

As background: I have no desire to write, maintain, or even debug a WordPress plugin. I want to know if there is something for this use case that Just Works.

I want opt-in, because it is too hard to remember, or to train others, to find an opt-out box when posting, and thus end up CC licensing things that weren’t intended to be, or can’t be, released under such a licence.

Some options I’ve already looked into:

WP License reloaded: was pretty much exactly what I wanted but doesn’t seem to be actively maintained and is now failing (possibly because the site in question is now hosted on SSL, I’m not sure, see above about not being interested in debugging).

Creative Commons Configurator: seems to be the most actively maintained CC plugin, but seems to be opt-out, and even that was only introduced recently.

Creative Commons Generator: opt-out.

Easy CC License: perhaps what I want, although I’d rather do this with an options dialogue of some kind than a shortcode.

Your crontab file should start with “crontab -l”!

I’ve never personally had this problem, but a number of people have told me that they’ve, often repeatedly, accidentally deleted their crontab by typing crontab -r (which silently removes a crontab) rather than crontab -l (which shows you what is in it) or crontab -e (which lets you edit it). It doesn’t help that “e” and “r” are next to each other on QWERTY keyboards.

Create a single backup of your crontab contents

Since I realised this was an issue, I’ve made the first line in my crontabs the following:

@daily crontab -l > ~/crontab.backup

If you ever accidentally use crontab -r, you can use crontab ~/crontab.backup to reinstall your crontab!

Adjust @daily to a time at which your computer is likely to be on, if it’s not always on, eg 0 10 * * * for 10am daily.

For bonus points, writing this entry reminded me that I hadn’t reinstalled my laptop’s crontab on my new machine, and meant it was easy for me to find and install!

Create timestamped backups of your crontab contents

The above is simple and suffices for me, but if you don’t have a backup routine that will grab ~/crontab.backup regularly enough for your needs, you could do something like this instead:

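@daily mkdir -p ~/crontab-backups; crontab -l > ~/crontab-backups/crontab-`date +\%Y\%m\%d-\%H\%M\%S`; find ~/crontab-backups -type f -ctime +7 -delete

Note that each % is written as \% here: inside a crontab, cron treats an unescaped % in a command as a newline, so the date format has to be escaped.

Explanation:

  1. mkdir -p ~/crontab-backups makes a directory crontab-backups in your home directory if it doesn’t already exist (and doesn’t complain if it does exist).
  2. crontab -l > ~/crontab-backups/crontab-`date +\%Y\%m\%d-\%H\%M\%S` puts your current crontab into a file named with a datestamp (eg crontab-20140711-124450) so that you can easily have more than one backup. The backslashes are only needed in the crontab itself; leave them out if you run the command in a shell.
  3. find ~/crontab-backups -type f -ctime +7 -delete finds all files (-type f) in ~/crontab-backups whose status last changed more than 7 days ago (-ctime +7), which for these write-once backups effectively means created more than 7 days ago, and deletes them (-delete)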

Warning: you don’t want to put anything else in ~/crontab-backups, because it too will be deleted after seven days.
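If you’d rather not cram all of that into one crontab line, the same idea can be sketched as a small Python script. This is only an illustration of the approach, not something I run myself: the function names read_crontab and backup_crontab are made up, and unlike the find command it decides what to delete from the timestamp in the file name rather than from -ctime, and only touches files whose names look like crontab-YYYYMMDD-HHMMSS.

```python
import re
import subprocess
from datetime import datetime, timedelta
from pathlib import Path

# Only files named like crontab-20140711-124450 are ever pruned.
BACKUP_NAME = re.compile(r"^crontab-(\d{8}-\d{6})$")


def read_crontab():
    """Return the current user's crontab as text by running `crontab -l`."""
    result = subprocess.run(
        ["crontab", "-l"], check=True, capture_output=True, text=True
    )
    return result.stdout


def backup_crontab(crontab_text, backup_dir, now=None, keep_days=7):
    """Write a timestamped backup and delete backups older than keep_days.

    Hypothetical helper: age is judged from the timestamp in the file name,
    so other files in backup_dir are left alone.
    """
    now = now or datetime.now()
    backup_dir = Path(backup_dir).expanduser()
    backup_dir.mkdir(parents=True, exist_ok=True)

    target = backup_dir / f"crontab-{now:%Y%m%d-%H%M%S}"
    target.write_text(crontab_text)

    cutoff = now - timedelta(days=keep_days)
    for path in backup_dir.iterdir():
        match = BACKUP_NAME.match(path.name)
        if not match or not path.is_file():
            continue
        if datetime.strptime(match.group(1), "%Y%m%d-%H%M%S") < cutoff:
            path.unlink()
    return target


# Example use, eg from a script that cron runs daily:
#     backup_crontab(read_crontab(), "~/crontab-backups")
```

Because pruning is restricted to names matching the backup pattern, the warning above about other files in the directory doesn’t apply to this version.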

Use python-flickrapi 1.2 even after the Flickr SSL transition

On June 27 2014, Flickr changed their API to be SSL-only. The Python flickrapi library was one of many pieces of software that used HTTP to connect to Flickr’s API, and that therefore broke for some users on June 27.

flickrapi supports HTTPS connections as of version 1.4.4, released on June 18 2014. If you are able to upgrade to a new version of flickrapi, you can get the latest flickrapi version from PyPI and ignore the rest of this post.

However, as of mid-2014, many Linux distros, including Ubuntu 14.04 (supported until 2019), still package flickrapi version 1.2, which cannot connect to Flickr’s API over HTTPS and is therefore now non-functional. Since developers may for various reasons choose to use their distro’s version of python-flickrapi, I’ve written a very very small Python class that overrides flickrapi’s FlickrAPI class to connect to Flickr over HTTPS rather than HTTP, and allows continued use of the Flickr API.

You can download my Python module that allows this: flickrapissl. See the README for usage.