wikimedia
Hackdiary: Wikihacking in Brighton
This weekend, I went down to Brighton for the MediaWiki Hackathon. I’m not a MediaWiki developer, and I only really do PHP when I need to. I do have access to the Toolserver which is a live, read-only mirror of the databases that power the Wikimedia projects. This is very useful: if I need to, I can log in via SSH, type in sql -r enwiki_p and run SQL queries against Wikipedia.
So, what did I build at the Hackathon?
Not as much as I’d like. The four regular MediaWiki developers there smashed lots of bugs. Me? I worked on a few different things.
Open Plaques
I helped Jez from Open Plaques with the MediaWiki API, specifically so that Open Plaques can start using Wikimedia Commons to host images of plaques. A while back, I started pushing at Open Plaques to have Commons as an alternative file host to Flickr. Currently, Open Plaques recommends that you take a photo, CC license it, then post it on Flickr with a machine tag, and then Open Plaques pulls it in, and Flickr links to Open Plaques.
But Open Plaques could also support Commons in three ways:
- By making it easy to add images from Commons.
- We might be able to export all the license compatible files from Flickr to Commons, but with added metadata from Open Plaques (OP data is public domain although the photos are under various CC licenses, not all of which are Commons compatible).
- Using Commons as a file hosting back-end: when people come to Open Plaques, they could upload photos directly to Open Plaques, and we’d then push them straight on to Commons and use that as a file host. They’d obviously have to agree to the relevant license and so on.
The first thing is a fairly straightforward one: being able to simply provide a Commons URL and then extract image metadata and the path to the image. That’s relatively easy, and it looks like that might be something we can add.
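As a rough sketch of that first option, assuming a standard Commons file page URL of the form https://commons.wikimedia.org/wiki/File:Name.jpg: the helper names below are made up for illustration, while action=query, prop=imageinfo and iiprop are real MediaWiki API parameters. Nothing here touches the network; it only builds the query and reads a response that has already been decoded from JSON.

```python
from urllib.parse import urlparse, unquote

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def commons_title_from_url(url):
    """Turn a Commons file page URL into a page title like 'File:Foo bar.jpg'."""
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError("not a Commons page URL: " + url)
    title = unquote(path[len(prefix):]).replace("_", " ")
    if not title.startswith("File:"):
        raise ValueError("not a file page: " + title)
    return title


def imageinfo_params(title):
    """Query parameters asking for the image path and its metadata."""
    return {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
    }


def image_details(response):
    """Pull the file URL and licence name out of a decoded JSON response."""
    page = next(iter(response["query"]["pages"].values()))
    info = page["imageinfo"][0]
    licence = info.get("extmetadata", {}).get("LicenseShortName", {}).get("value")
    return {"title": page["title"], "url": info["url"], "licence": licence}
```

In use, the parameters from imageinfo_params() would be sent to API_ENDPOINT and the decoded reply handed to image_details(); the licence field is what would let Open Plaques check that a photo is usable before linking it.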
Scala
After Jez left, I tried to work out what to do next. I decided it might be useful to have an API library in my trendier-than-thou functional programming language of choice, Scala. On the bus home, I made a few notes about the general design of such a library. After a bit of noodling with Maven, I actually got to the point where I could type mvn compile.
That I have to play these silly games every time I start a new Scala project is profoundly depressing. I basically either need to stop being a wimp and fully embrace SBT 0.10/0.11, or I need to write myself a new Maven archetype that does Scala properly and has all the libraries I want. And to wrap said archetype in a command line alias called something like “scala-new-project”. Scala is supposed to be a fun, pragmatic and functional (in the sense of not-dysfunctional) language. Choosing between SBT and Maven is basically a choice between a build system designed by hipsters and a build system designed by enterprise people. Whichever choice you make, you will regret it.
My build still has one major issue: mvn scala:console gives me errors, and JLine still buggers up my shell after I exit the damn console.
Databinder Dispatch is a breath of fresh air. It is intimidating at first to use a library where 70% of it seems to be punctuation, but the design decisions make sense. For using the MediaWiki API, it’s actually very easy: you can basically do each different thing as a series of layered objects. Firstly, you construct a Request object that points to the API endpoint (MediaWiki isn’t a RESTful API), then you use the <:< pseudo-operator to add all the relevant headers (specifically User-Agent), and then you can do it again to add in the authentication token (cookies rather than OAuth, but the principle is the same). Then, finally, you can simply provide a Map of the query you are sending, and then you supply an inline response handler.
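The same layering can be sketched in Python's standard library. This is an analogue of the idea, not Dispatch itself: the function name, the user agent string and the cookie names are all assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_api_request(query, user_agent, cookies=None):
    """Layer headers, cookies and a query map onto the API endpoint."""
    # Layer 1: the headers, of which User-Agent is the one that matters.
    headers = {"User-Agent": user_agent}
    if cookies:
        # Layer 2: session cookies stand in for the authentication token.
        headers["Cookie"] = "; ".join(
            name + "=" + value for name, value in sorted(cookies.items())
        )
    # Layer 3: the query itself, sent as a form-encoded POST body.
    body = urlencode(dict(query, format="json")).encode("utf-8")
    return Request(API_ENDPOINT, data=body, headers=headers)
```

The returned object can be passed to urllib.request.urlopen(), and decoding the JSON body of the reply plays the part of the inline response handler.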
I haven’t yet gotten anywhere particularly interesting with the Scala library, but I’ve got some good ideas. It’s currently in a closed repo. When it sucks less, I’ll release it. That might not be for some time, sadly.
VMs, vagrants and maintenance/dev/
Over the weekend, I mentioned to one of the experienced MediaWiki developers one of the things that puts me off MediaWiki hacking: dependencies. Hacking on MediaWiki generally means having a working MySQL install and a working Apache install, and so on. This can be a giant pain-in-the-ass on OS X, as the bazillions of tutorials on how to set them all up in the right way have shown. There’s a reason why people are using things like Vagrant and other VM based systems. The same sort of quasi-VM type strategy seems to be happening with RVM, the Ruby Version Manager, which the Ruby community nicked from Python’s virtualenv. Obviously, if you are in Javaland, you are running in a VM… the JVM.
But PHP is a bit more of a pain: this is the downside of being built around the very reasonable use case of “I want to be able to FTP it onto my server and have it work”. What you lose in the development stage, you more than make up for in deployment.
At this point, someone pointed me to this discussion on wikitech-l.
The current bleeding edge MediaWiki basically has a script/server, and a built-in SQLite. It’s almost like… Rails! You simply run maintenance/dev/install.php and it downloads PHP 5.4, and sets you up a development version of MediaWiki that uses PHP 5.4’s built in web server. You can then just run maintenance/dev/start.sh and it’ll boot up on port 4881.
There was one slight hiccup: the PHP 5.4 build script didn’t like the fact that I had a space in the name of one of the parent directories of the SVN checkout of MediaWiki and borked. I quickly renamed it to remove the space and it compiled fine. Now that the initial compile is done, I can boot up a new, clean MediaWiki install in a few seconds. This makes it dramatically easier to start hacking on MediaWiki.
And now.
Although I didn’t do any MediaWiki hacking over the weekend, I’ve just assigned myself an issue. Lemme see if I can get some of my code running on Wikipedia…
How to hide the fundraising banner on Wikipedia
It’s that time of the year again: Wikimedia is raising money. It’d be really great if you could donate: donate.wikimedia.org
Many people express concern about the funding banners, so I should explain purely in my personal capacity what you can do about them if you don’t like seeing them.
There are three ways you can stop seeing the fundraising banners:
- Click the ‘X’ button in the top-right hand corner of the screen. This will remove the banner on the computer you are using until the next fundraising period. Given that it takes one click to stop the banner from appearing, I do find all the people moaning on Twitter about how “Jimmy keeps staring at me” amusing but off-base. One click and you won’t see the banner again for quite a while.
- If you are logged in, you can set an option in your Wikipedia account to filter the banners. Simply go to Special:Preferences then click “Gadgets”, then find the entry on the list that says “Styling to hide interface on isolated pages for ongoing WMF fundraiser 2011 test.” Click it, then scroll down and click ‘Save’.
- Here’s the solution of last resort: install this userscript. I wrote it last year, and it removes the fundraising banner from all of the Wikimedia sites. I have it only because I now have accounts on 147 different Wikimedia project sites thanks to single-user login, and it’s a lot easier than clicking the ‘X’ on all those different sites.
As I have been answering e-mails related to the fundraiser, I have a simple compromise: I have been filtering the banners on English Wikipedia but keeping them on Simple English Wikipedia. This way, if I get a question about the banners or the fundraiser generally, I can go to Simple and see them.
If you use Wikipedia enough that the presence of the banners annoys you and you choose to filter them out, be sure to go and donate.
Opt-in image filter: enabling censorware?
I’ve been meaning to write a long and detailed assessment of the opt-in image filter debate currently raging in Wikimedia circles. There’s lots of power plays going on and not a lot of good faith.
I’ve been watching from the sidelines: I’m fairly agnostic about the whole thing. My only contributions thus far have been to challenge what I see as bad arguments. If I were pushed, I’d say I’m mildly in favour of the proposal, but if it doesn’t happen, I won’t be too crestfallen. As I said, my primary interest is in the quality of the arguments (there’s a reason I’m a philosophy graduate student…).
One argument I’ve heard over and over again runs something like this…
We shouldn’t have an image filter as the categorisation system that comes with it would enable others to filter Wikipedia.
Basically, to enable an opt-in image filter, we’d build up categories and schemata for the opt-in filter which could then be reused by others who want to prevent others from having access to Wikipedia. It’s basically a utilitarian objection: rather than objecting to the principle of the filter, it is an objection to the probable knock-on effects that creating the filter would have on others.
There’s nothing wrong with the logical structure of the argument, but I do have some very strong doubts as to whether you should accept the conclusions.
First of all, a slight philosophical objection. The argument could go for a lot of reuse. People who create photos or music or anything else and license it as public domain or as CC BY or BY SA run the risk that someone they don’t like ends up using “their” content. I wouldn’t be too pleased if I found that one of the articles I’d written for Wikinews or one of the photos I’d put on Commons turned up on websites affiliated with, say, the British National Party. But that’s a risk I run from licensing stuff freely. I willingly take that risk because the benefit of having things like Wikipedia far outweigh the downside of having politically disagreeable, well, dicks reusing the content. It’s the same risk we take with open source: what if some big weapons giant starts using your code to power their weapons systems? What if someone takes your open source wiki system and starts something as profoundly stupid and anti-intellectual as Conservapedia? The answer: well, that sucks. I don’t see why the same answer shouldn’t apply to this kind of objection.
English Wikipedia already has the “bad image list”: a list of shocking images that can only be included in the article it is listed for on the list. If you want to use it elsewhere, an admin has to update the list. It’s basically to prevent that delightful image “Autofellatio6.jpg” from being inserted into My Little Pony articles and other amusing bits of vandalism. Does the bad image list enable censorware? Yes. But it has kind of an important and useful function: preventing vandalism. Similarly, the doctrine of double effect can be called into play here: yes, we may be building up a list of categories that could be reused by censorware sellers, but that’s not our primary intention.
Anyway, the major objection is a lot more major than this charge of inconsistency. The major objection is simply that the sort of filtering is different enough for it not to matter.
The nature of an opt-in image filter is very different from a filter that isn’t opt-in.
Imagine you wanted to build Net Nanny or one of the other brands of what Wikipedia calls content-control software. The goal is simple: there’s a bunch of bad, evil, no good content out there on the Wild West Web that you want to prevent little Bobby in Missouri (or Manchester or Minsk or Matsumoto or Mangaung) from getting to. Some of it is images, some of it is particular websites, some of it is specific pages, whatever. You build up a big old list of URLs and other factors such that you can give the system a URL and for a given set of categories, it can say yes or no. If you get it right, little Bobby doesn’t have to see Tubgirl or 1man1jar… ever.
But if you get it wrong, there are problems. If you have false positives, that’s fairly bad. People laugh at you for your false positives. They make snarky blog posts saying “har har har, you can’t look up ‘Same-sex marriage’ on Wikipedia because this shitty censorware thinks the word ‘sex’ means it must be pornography”. And, yes, that’s a real example: my university has (or at least did a few years ago, things may have changed) a censorware system that blocked the article on English Wikipedia for same-sex marriage. Or you’ll get censorware that blocks websites about breast cancer or testicular cancer because, well, breasts and testicles are naughty. And if you are a government that implements it on publicly-accessible wifi hotspots in places like libraries and airports, you may get angry civil libertarian types laughing at you on BoingBoing.net… which due to the censorware used by the government, you probably won’t be able to read. So false positives are bad for your public image: a few are okay, but go too far and you end up being the corporate equivalent of the prudish philistine who tries to put some boxer shorts on Michelangelo’s David because why would that nice ninja from Teenage Mutant Ninja/Hero Turtles be spending his time making nude sculptures of the important moral figures of our Judeo-Christian heritage rather than fighting crime!?
But false negatives are much, much worse for the censorware makers. Because once a false negative sneaks through the censorware, it’s game over. If little Bobby does see Tubgirl, he need only copy it onto a USB flash drive and stow that away somewhere his parents can’t find, with a filename like “homework”. And maybe little Bobby will share said picture with his friends at school. And maybe in return one of his classmates who doesn’t have prudish parents who install software with Orwellian names like ‘Net Nanny’ will download him some better pornography and some lovely images of self-inflicted chainsaw suicides or whatever it is the kids are into this week. And, as I said, it’s game over. Once you peek behind the veil of censorship, all those things you wanted to keep little Bobby away from start finding their way in. First it’ll be sex with farm animals, and then the Communist Manifesto, then he’ll want to go to college, and then he’ll want to edit Wikipedia! Pass the smelling salts!
So, if you wanna make censorware, it’s gotta be pretty damn strict. And you’ve also got to keep the false positives down for PR purposes because otherwise snarky people will relentlessly mock you. Oh, and you’ve got to keep your lists secret because this is capitalism and competition requires secrecy. And if you leak the list, people will start poking around on those websites.
Making an image filter is a lot simpler then because the requirements are different. The goal of the proposed image filter (and a large number of different variants on the same theme one could conjure up to answer different objections) isn’t to prevent access at all. It’s to enable individuals to opt-out of displaying some images. It doesn’t need to be a comprehensive list of all things that meet the criteria for potentially controversial, nor does it need to work as hard as the censorware manufacturers to keep false positives low. If I decide that I don’t want to see bums and willies and boobies and so on (because I’m at work or on the train or in a public library or whatever), it doesn’t actually matter to me much if the filter isn’t 100% comprehensive. It isn’t trying to stop me from seeing any images of a particular class, it’s just giving me the option to view them or not. If one slips through (a false negative), it’s not game over either. It just means that the filter wasn’t as good as it could be. Whatever. If, while anti-vandalism patrolling, I get to see 90% less shocking images, great, sign me up. If I get to see 25% less shocking images, great, whatever. Anything better than zero is just fine.
And what about false positives? Okay, I wouldn’t necessarily want 100% false positives, and 50% would be pushing it a bit, but really, if all I have to do is click the image and it pops back in, the cost to a false positive is damn near negligible.
The sort of categorisation system that would flow from this is very different because the costs of inclusion or failure to include are so much different from in the Net Nanny type of case. Could the Net Nannies of the world use the lists and categories that get generated from an opt-in image filter? Sure. But why would they bother: they would still need to go over them to check for false positives and false negatives, because of the costs of both.
Back in 2008, I built something called the nsfw profile. It’s a GRDDL profile for defining certain links as not safe for work. (Don’t worry about what a GRDDL profile is.) The idea of the thing is that you could add nsfw as a class on links and then attach custom behaviour or maybe some kind of nice browser trick that would warn you about the not safe for work link. It didn’t take off because, well, for whatever reasons, but imagine if it did. The whole world started adding descriptive markup to their links so that browsers could work out what links are NSFW and so on. Could you build Net Nanny on top of this? Of course not. Again, false positives would be too high and the false negatives would be even higher.
Now, for the reasons I’ve given, I don’t think censorware firms would be very likely to use the resulting categories and lists from the image filter as part of their listings.
But there’s more. Let’s put ourselves back in the position of designing some censorware like Net Nanny. If you wanted to make sure that people could get access to Wikipedia but didn’t get to see, err, Double_penetration.svg, what would you do? Obviously, you can’t block Wikipedia. That’d be stupid. Well, if I were making some censorware, I’d probably just do a recursive category search starting at Category:Human sexuality on enwiki, then I’d hire a bunch of people to poke through each page and mark it as “porn” or “not porn”. Then I’d take the list of all the “porn” pages, scrape each one, work out what images are on there and add those to the list of naughty images. Then I’d ask the MediaWiki API to give me a list of all the inter-wiki links from all those pages to the other language versions. Then I’d scrape those to get a list of all the files they use and add any that aren’t already on the bad images list to that list. I’d pop all the pages on the list too, and then I’d set up a cron job to run once a month to find new images and new pages, run them past our minimum wage porn raters and… there you go, you’ve got a pretty damn good list of the sex stuff you need to filter from Wikipedia to protect the Flanders family from The Simpsons. If you are a repressive regime or a corporate censorware manufacturer, filtering the porn from Wikipedia is the easy bit: it gets a bit harder out there on the rest of the web where there isn’t a volunteer community dutifully sorting pictures into categories with names like Suggestive use of sticking out tongue and Cameltoes.
If you do want to build censorware that finds all the naughty on Wikipedia, the Wikimedia community has done most of the work for you already. You just need a few Python scripts and some minimum wage porn raters (I have a funny feeling that in a recession, there will be plenty of people wanting to get paid to categorise porn).
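To show how little machinery the first step needs, here is a minimal sketch of a recursive category search. The list=categorymembers parameters are real MediaWiki API parameters; the function names, the depth limit and the idea of passing in a get_members callable (so the walk can be tried without a network connection) are assumptions for illustration.

```python
from collections import deque


def categorymembers_params(category):
    """Real MediaWiki API parameters for listing one category's members."""
    return {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "max",
    }


def walk_category(root, get_members, max_depth=3):
    """Breadth-first walk of a category tree, returning the set of page titles.

    get_members(category) must return (pages, subcategories). In real use it
    would call the API with categorymembers_params() and follow continuation.
    """
    seen = {root}
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        found_pages, subcats = get_members(category)
        pages.update(found_pages)
        if depth >= max_depth:
            continue  # collect this category's pages but go no deeper
        for sub in subcats:
            if sub not in seen:  # category graphs can contain loops
                seen.add(sub)
                queue.append((sub, depth + 1))
    return pages
```

The seen set matters in practice: categories on a wiki are not a strict tree, so a walk without it can loop forever.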
If we really want to stop censorware companies from reusing a category system for images on the Wikimedia sites, the panoply of sexual image categories on Wikimedia Commons shows that it might be a bit late for that objection. As with content, we shouldn’t worry too much about how people reuse it, we should worry more about whether we are providing the best service for readers and editors (again, I don’t want to subject my fellow public transport users to some of the stuff I see while anti-vandal patrolling).
Censors may be humourless philistines, but they aren’t total morons. If they want to find the naughty stuff on Wikipedia and block it for their users, they are more than capable of doing so. Worrying about whether they would reuse our filtering categories is a complete red herring. Our non-filtering categories provide most of what they need already, and I’m betting nobody is going to call for those to be shut down. Objection overruled!
Smartphones: not for most people
The Wikipedia Editor Survey 2011 came out recently. Interesting reading. One particular tidbit:
- 84% of Wikipedia editors have a mobile phone.
- Of mobile phone owners, only 38% have a smartphone.
- 34% of editors read Wikipedia on their phone.
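Multiplying the first two figures through gives the share of all editors who own a smartphone. A quick check, assuming the survey percentages are exactly as quoted above:

```python
have_mobile = 0.84              # share of editors with a mobile phone
smartphone_given_mobile = 0.38  # share of those phone owners with a smartphone

# Share of all editors who own a smartphone.
smartphone_editors = have_mobile * smartphone_given_mobile

print(f"{smartphone_editors:.1%} of editors have a smartphone")
print(f"{1 - smartphone_editors:.1%} of editors do not")
```

That works out at a little under a third of editors with a smartphone, so roughly two in three have either a more basic phone or no mobile phone at all.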
What does this tell us? There’s a lot of dumbphones, feature phones, old phones and just-about-scrapes-by-on-the-‘net phones out there. Most people aren’t using iPhones and Android smartphones… if, in your definition of ‘people’, you include the whole planet outside of the Western metropolises.
These are editors, not readers.
And really you should be. If you’ve got something to say, put it on the web, make it readable on smartphones and dumbphones, desktops, laptops, Kindles, Googlebots, semantic web agents. Blobs of Objective-C or Java are not a replacement for nice, lightweight, standards-based HTML. There’s a wide world out there beyond latte-drinking hipsters and Tim ‘Nice But Dim’ yuppie business executives on their smartphones.
One of the things I’m proud of about Wikipedia is that the community don’t give priority to rich smartphone owners over people in the developing world with older phones and less reliable connections. Hundreds or thousands of languages and devices, millions of topics, billions of web pages, trillions of bytes every hour but One Web for everyone regardless of browser, device, nation, race, religion, language or ideology. Let’s keep it that way.
Working out where Wikimedia needs more crowds
Yesterday, at GLAMcamp London we discussed a large variety of things. One of them was doing more ‘crowdsourcing’ and ‘gamification’. I don’t like either term much but the general idea works a bit like this.
Currently, a lot of tasks on Wikipedia and other Wikimedia projects are high-intensity. Think about reviewing a featured article. It requires a lot of thinking, and a lot more typing.
Some people today used the term ‘crowdsourcing’ for what we need to work on, but that’s inaccurate: the issue isn’t crowdsourcing but splitting up high-intensity tasks into lots of small modular tasks that can be done without a great deal of prior context or time investment. ‘Crowdsourcing’ applies to non-modular tasks too: think of all the people who ‘crowdsource’ amateur videos. Producing a TV ad isn’t a modular task. The term ‘crowdsourcing’ is so general as to be useless. What we are actually talking about is tasks with a high degree of modularity (aka. low intensity/low commitment) vs. tasks with a low degree of modularity (high intensity/high commitment).
The high degree of modularity tasks also happen to be the sort of tasks it is easy to do on a mobile device: a smartphone or tablet device like an iPad, iPhone, Blackberry, Android device etc.
There will be some tasks which it will be impossible to turn into ultra-modular tasks. Writing a Good Article (GA) review will be hard to do on a smartphone. In fact, most article writing tasks will probably never get any more modular than they currently are.
Speaking personally, two things stand out as perfect examples of tasks on Wikipedia that are low intensity and highly modular:
- Reviewing Articles for Creation.
- Reviewing edits during the Pending Changes trial.
Both of these tasks are reasonably low intensity: they don’t require a lot of creativity, and can be done without typing. They are the sort of tasks I have done quite successfully on mobile devices.
If we can identify specific places where this kind of modularity can be used, we can build interfaces to help people who aren’t very active on Wikipedia to clear backlogs. Take images lacking descriptions.
Imagine this. After a busy day at work, you pull out a phone on the bus. It downloads a batch of random photos that lack descriptions (from Commons and/or English Wikipedia) and starts displaying them. You swipe to advance to the next one. At the bottom of the screen is a button that says “Identify”. You eventually run across a photo of a cat, tap ‘Identify’, choose what it is from a series of menus, and it places it into a review category with some basic information (‘Cats to check’!) and adds a description to the talk page. Later, an experienced user will check it and copy it into the description.
On English Wikipedia, images needing descriptions has a backlog of over 10,000. Imagine if an iPhone app became reasonably popular and there were a community of 1,000 people doing on average one or two image descriptions a week each. You only need a few hundred a day to be able to start really kicking the backlogs away.
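The arithmetic behind that claim is easy to check. This sketch uses the figures from the paragraph above and assumes, unrealistically, that no new undescribed images arrive while the backlog is being worked through; the function name is made up.

```python
def backlog_estimate(backlog, contributors, per_person_per_week):
    """Daily throughput and weeks needed to clear a fixed backlog."""
    weekly = contributors * per_person_per_week
    return {"per_day": weekly / 7, "weeks_to_clear": backlog / weekly}


for rate in (1, 2):  # descriptions per person per week
    estimate = backlog_estimate(10_000, 1_000, rate)
    print(
        f"{rate}/week each: about {estimate['per_day']:.0f} a day, "
        f"backlog gone in {estimate['weeks_to_clear']:.0f} weeks"
    )
```

So a thousand casual contributors come out at somewhere between roughly 140 and 290 descriptions a day, which matches the "few hundred a day" figure.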
We do some user testing, wrap this in a suitable interface: it may be very minimal – getting to see completely random images may be enough of a Pavlovian trick to get a few people to tag some images. Or it may require fancy game mechanics: RPG-style levels, or perhaps a leaderboard. Perhaps just getting out of the way and making it convenient. Perhaps a mixture of things. Whatever. You test it and find out. It’s just like on Wikipedia: some people contribute to get barnstars, some to get high edit counts, some just for the love of the subject, some for the community and so on.
Wikipedia does a great job providing work for high-intensity committed users: there’s a lot more featured articles to write. But there is so much to do that is low-intensity, low-commitment and highly modularised. If the whole community has a think about it, they can undoubtedly come up with tasks that can be done on mobile phones. The specification is clear: set a task that takes no more than three minutes, requires no more than 140 characters of text input and can be done without reading more than one mobile phone screenful of text.
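That specification is concrete enough to write down as a check. In this sketch the three limits come straight from the paragraph above, while the class, the field names and the example numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class MicroTask:
    name: str
    minutes: float          # time one unit of the task takes
    input_chars: int        # text the contributor has to type
    screens_to_read: float  # mobile screenfuls of text to read first


def fits_mobile_spec(task):
    """True if the task is within all three limits of the specification."""
    return (
        task.minutes <= 3
        and task.input_chars <= 140
        and task.screens_to_read <= 1
    )


# Made-up estimates, only to show the check in use.
describe_image = MicroTask("describe an image", 1, 60, 0)
ga_review = MicroTask("Good Article review", 90, 3000, 20)

print(fits_mobile_spec(describe_image))
print(fits_mobile_spec(ga_review))
```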
Make it easy and compelling to start doing these low intensity tasks, and it easily builds up into doing more complicated things. Just think about things experienced Wikipedians do frequently with Twinkle. Now imagine that for half the time on-wiki, you aren’t allowed to edit and can only use Twinkle: this is getting closer to the experience for the mobile user.
How does this fit with my views on gamification? Quite easily: I’m reasonably pragmatic. I still think that you will find much more enjoyment in life if you commit to work as part of an engaging community of meaning than you will through gamification. My earlier post is about how individuals act, and more importantly about how individuals should act. I think about things like this in the same way I think about something like needle exchange programmes. Yes, the world would probably suck less if there were fewer heroin addicts, but while heroin addicts exist, they should be able to get a clean needle, because having a heroin addict getting HIV from a dirty needle is like a real life serious business version of the jwz line about using regular expressions: “Now you have two problems.”
I’m not wild about gamification, but I’m even less wild about having 10,000+ images on Wikipedia lacking descriptions. If one solves the other, or many other similar tasks, I’m willing to do a deal with the devil. I don’t like the concept of social media either (for reasons I haven’t explained in a systematic fashion yet, for which I apologise) but that doesn’t mean I’m going to sit in the corner and refuse to use Twitter or Facebook to make some kind of point. I’m not that pig-headed.
The important thing is that we need not only to get over the stagnating number of active editors, but also to actively design new ways for contributors to participate in free culture projects. One of the sources of conservatism in the Wikipedia community is this idea that there is a very limited pool of Wikipedians: ten thousand or so on English Wikipedia, with a limited amount of manpower. This trope has turned up in innumerable discussions over proposals, most recently in the pending changes trials. But what is important to note is that we collectively have the power to determine how easy it is to participate in the community. Producing the sort of interactions that those goofy buzzwords “gamification” and “crowdsourcing” point very vaguely towards might help, and we should think about them.
p.s. How does this fit into GLAM? Simple. When GLAM ambassadors are thinking about how to engage the community, think about how to engage the low-intensity community as well as the sort of heroic people who churn out five GAs before breakfast. And if something like a ‘simple image description’ interface were built, it should be possible to make that available for GLAM projects.
p.p.s. User:HaeB told me on IRC about research by Luis von Ahn and also at Google. More information at Google Image Labeler and ESP game.
Wikimedia events in the UK next week
Next week there’s lots of Wikimedia-related events going on in the UK.
On Monday night, there’s going to be a meetup in Edinburgh.
On Tuesday night, there’s going to be a meetup in York, the first one in York.
On Wednesday, there’s the National Railway Museum workshop in York.
Then on Friday, in London at the British Library, it is GLAMcamp London where we discuss how Wikimedians and WMUK can work with galleries, libraries, archives and museums (or GLAMs). Hopefully, I’ll be there.
And next Saturday, there’s a meetup in Manchester.
If you are in any of those places at any of those times, please go along. If you can, try and add your name to the wiki. I’ve created Lanyrd events for most of them, so you can also add yourself on there.
Wikimedia UK are doing tons of events: there are events at the Medical Research Council, the Victoria and Albert Museum, the Institute of Physics and at TEDxBristol. And lots more which haven’t been announced yet. It’s exciting but tiring.
Malayalam Wikisource on CD
Here is an interesting thing from the Malayalam Wikimedia community I just saw: Wikisource Offline. Here is what it looks like. You’ll need Malayalam fonts installed to see how it looks. Details of how to install these are on Malayalam Wikipedia.
The Malayalam community has already put out a CD with 500 articles from Wikipedia, but this seems to be the first offline/CD distribution of Wikisource texts.
According to Santhosh Thottingal’s post, the CD contains novels, religious texts, poems and much more. I also read that the Malayalam community are working hard on translating texts from other languages including English into Malayalam. Putting them on CD is a really good way of getting these texts out to schools and the many places that still don’t have Internet access.
Creative Commons + community + decent input support + Unicode + passion = people doing interesting and awesome things in languages you probably haven’t heard of.
xkcd:
Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at “Philosophy”.
I had to request semi-protection for the XKCD article and get someone blocked for twelve hours because people kept going in and changing the first link to point to ‘philosophy’.
Also, Wikipedia knew about this before xkcd. See Wikipedia:Get to Philosophy.
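The procedure in the quote is a simple walk over a graph in which every article has exactly one outgoing edge, its first link. A minimal sketch, assuming the first links have already been extracted into a dictionary; the function name and the toy data in the note below are invented, not real Wikipedia link data.

```python
def follow_first_links(start, first_link, target="Philosophy", max_steps=100):
    """Follow first links from start.

    Returns (path, reached): the list of articles visited and whether the
    walk ended at the target rather than in a loop or a dead end.
    """
    path = [start]
    seen = {start}
    while path[-1] != target and len(path) <= max_steps:
        nxt = first_link.get(path[-1])
        if nxt is None or nxt in seen:  # dead end, or we have looped
            return path, False
        path.append(nxt)
        seen.add(nxt)
    return path, path[-1] == target
```

With a toy mapping such as {"Cat": "Mammal", "Mammal": "Animal", "Animal": "Philosophy"}, starting from "Cat" returns the four-step path and True. The loop check is also why the vandalism mentioned above "works": point any article's first link at the target and every chain passing through it terminates there.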
Custom CSS for Wikipedia on iPad
If you go onto the iOS App Store and search for Wikipedia, you’ll find a wide variety of applications designed to make Wikipedia more readable than it is in the browser. I’ve tried a few but they all have one major problem: you can’t edit. The encyclopedia anyone can edit is not editable if you happen to use software specifically designed for the iPad.
I also don’t like the concept of this software: I don’t believe that you need special software to view specific classes of website. The whole idea of site-specific browsers always seemed strange to me: why do I want to waste disk and memory space making a custom application for one particular site, even if it is one of those vaguely defined “web apps”?
Enter CSS and media queries.
Not a lot of people know this but every registered user on every Wikimedia project including Wikipedia can set up a custom stylesheet as well as a custom JavaScript file. See Help:User style for the details.
Here are a few things I’ve got in my vector.css file with annotations:
So, if I want to edit how Wikipedia looks for me, I can just add stuff to my custom stylesheet, and I can use media queries to target specific devices.
I did a little bit of poking around to find out the media query you use that picks out the iPad and then added it to my vector.css file:
The only thing I’ve added here is changing the look of textareas. For some reason, on the iPad textareas would have all the text in a very small sans-serif font (Helvetica probably). Making it bigger and sticking it in a monospace font makes it easier for a nerdy hacker type like me to edit.
Ah, but that’s okay for insufferable Apple users. What about those enlightened, freedom-loving Android tablet users? Well, you can just stack up media queries for different tablets. I found a media query for the Xoom.
As for the Galaxy Tab, that’s a bit harder. It’s pretty difficult to come up with a media query, but you can use JavaScript…
if ('ontouchstart' in document.documentElement) {
  // code here to load a custom CSS file and insert into the DOM
}
You can read more about the interesting issues the Galaxy Tab has here.
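To flesh out the "code here" part, this is a rough sketch of what the touch check plus stylesheet loading might look like in a personal JavaScript file. The function names and the stylesheet URL are placeholders of mine, not anything MediaWiki provides, and note that the check matches every touch device, not just the Galaxy Tab.

```javascript
// Returns true if the given document's root element advertises touch events.
function isTouchDevice(doc) {
  return 'ontouchstart' in doc.documentElement;
}

// Appends a <link rel="stylesheet"> pointing at `href` to the document head.
function loadStylesheet(doc, href) {
  const link = doc.createElement('link');
  link.rel = 'stylesheet';
  link.type = 'text/css';
  link.href = href;
  doc.head.appendChild(link);
  return link;
}

// Loads the stylesheet only on touch devices; returns the element or null.
function loadTouchStyles(doc, href) {
  return isTouchDevice(doc) ? loadStylesheet(doc, href) : null;
}

// In a browser this runs against the real document; elsewhere it is skipped.
if (typeof document !== 'undefined') {
  loadTouchStyles(document, '/path/to/tablet.css'); // placeholder URL
}
```

Passing the document in as an argument, rather than reaching for the global, just makes the functions easy to try out with a fake object outside a browser.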
Of course, to pre-empt all the “you don’t care about normal people!” stuff, this is definitely for the very small intersection of CSS geeks and the dorkier end of Wikipedians. But there’s an interesting possibility here for the developers at the Wikimedia Foundation and for others trying to work out how to make MediaWiki look great on tablets. You can get the community to help build your mobile version simply by getting them to submit their personal CSS and JS files, combing through them and combining the best bits together.
Update: Steven Walling asked for some screenshots. Here they are:
Here’s what it looks like to edit a page when not logged in:

After I log in, the edit form looks like this instead:

London Wikimedia Academy notes
A few rough notes I made during the London Wikimedia Academy.
“Wikipedia is large enough that it has enough to annoy everyone.”
“There is stuff about Wikipedia that needs to be fixed.”
“Wikipedia suits the general reader well enough.”
“The public service broadcasting model works.” - people giving £20 each etc.
Timothy Garton Ash: Wikipedia is a small NGO (actually, technically WMF is). WMF has to do the same sort of things other NGOs have to do: make strategy, alliance etc. This seems quite important to do now.
“There are too many acronyms”
AGF: if you had to define Wikipedia’s culture, assume good faith. Academia does not do this! In academia, “there are minimum standards”.
The first expert reaction is to “paste an article” on top. But, people are trying to build the encyclopedia.
Transition from academic first reaction to working with the community: “way of the salami”.
Turning a bad Wikipedia article into a good one is like turning lecture notes into a book. You have to treat the notes with respect.
The alternative to the expert-driven route: show your working, show things that are verified.
Isaac Todhunter:
If he does not believe the statements of his tutor—probably a clergyman of mature knowledge, recognized ability and blameless character—his suspicion is irrational, and manifests a want of the power of appreciating evidence, a want fatal to his success in that branch of science which he is supposed to be cultivating.
No arguments from authority.
The good expert on Wikipedia: “An expert is someone who knows where to look things up.”
But the person who turns up and starts talking authoritatively is not an expert for WP. When Wikipedia says “experts are unwelcome”, they mean this!
For popular culture and current affairs, Wikipedia can’t get much better: it aggregates the mainstream media. Wikipedia can still be better with academic input.
Rzepa’s talk is here
Alex Stinson
Half of students in the US always or frequently use Wikipedia.
“Students have to dig deeper and explain better” because they have to explain complex topics to a public audience.
Some examples of writing projects involving Wikipedia:
Indiana University: see User:Awadewit/TeachingEssay - Awadewit said it can be used for three things: copy editing (often on DYKs), source analysis, encyclopedia comparisons. It also has three goals: improving thesis-driven essays, basic writing skills and basic research skills.
University of Michigan, autumn 2008, over 100 students and 40 pages over 14 weeks. Many went to B, some to GA class.
This post is released under the Creative Commons Attribution 3.0 licence.
There’s more coverage here
UK government editing Wikipedia
Did you know that the UK Ministry of Defence’s network blocks the ability to edit Wikipedia:
The decision to block write access to Wikipedia via the EGS was taken on 30 November 2007 by staff of the Director General Information (DGInfo) and the Joint Security Co-ordination Centre (JSyCC). There was no compelling business reason to have the facility to update Wikipedia.
Of course there is! So that next year, Tom Scott can do a Ministry of Defence version of The 2010 “Editing Wikipedia From Inside Parliament” Awards.
Tom Scott missed some of the other edits to Wikimedia sites, including:
British Library and Wikisource: copyrights and permissions
I’m writing this post to explain as best as I can the current legal situation regarding unambiguously out-of-copyright texts in the British Library and the possibility of making public domain reproductions of them for release, say, on sites like Project Gutenberg or Wikisource.
I came to looking into this because of the difficulty of finding sources for the history of Ethical culture, a forerunner of modern day humanist or freethought movements. I had a look on the British Library Integrated Catalogue and found a lot of sources: I put the list up on the Ethical culture talk page. But as a lot of these are out of copyright, it makes sense to put them up on Wikisource.
But here’s the rub: the British Library forbid you from making scans of BL materials. In itself, that’s fine: for fragile materials, it is right that they insist on their people doing the copying. But there are problems with this. Firstly, the cost is quite high: making copies of BL materials requires payment of a fee that can be quite substantial. Secondly, the BL then derive a new copyright from the modified work and require you to pay a license to them to reuse it. Fine if you are a Hollywood producer or someone like that. But if you are trying to collect material that is out-of-copyright to include in an archive like Wikisource or Project Gutenberg, that’s not very useful.
I raised the question on the Wikisource Scriptorium about this: could we potentially make a non-image based copy. That is, someone could request an out-of-copyright item from the British Library, then go into the BL with a laptop and make a verbatim copy of the text and post it on Wikisource. Wikisource have one minor problem with this: they require that another Wikisource user does a proofread of the source. With a digital file of the original text, that’s easy: someone else looks at the DjVu file and compares it with the text copy. But where the original is on a piece of paper in a reading room, that’s not so easy.
But that’s not an issue I’m really that bothered about: that is a solvable issue. You can have two people with reader passes both go to the BL and check a source over before publication.
The copyright issue remains, as does another: the Library’s conditions of use, specifically §25:
Copies of Library collections must only be made using Library copying facilities.
I phoned the British Library today to try and resolve this issue. I asked them whether making a verbatim text copy of an out-of-copyright work onto a laptop or into a notebook would result in them claiming a new copyright existed for the copy: they answered that no, this would not be an issue.
Secondly, I asked whether or not doing so would infringe §25 of the Conditions of Use or any other restriction placed on registered Readers. They also said that this would not be an issue. They said that by ‘Copies of Library collections’ they meant photographic/reprographic type copies rather than verbatim text transcriptions.
They did note that when people make such a copy, it would be very useful if they could give attribution to the British Library just in terms of the provenance of the work. For Wikisource, this is not an issue. On the talk page for texts posted on Wikisource, one is asked to provide details of the provenance of the work. I’m in the middle of transferring Evelyn Underhill’s famous book on Mysticism to Wikisource, and it is a public domain work from the Christian Classics Ethereal Library - you can see the attribution here. I assured the guy that experienced Wikimedians are about the most anal people you can find on matters of copyright, licensing and attribution. And, really, if you drew a Venn diagram of Wikimedians, British Library Readers and people willing to spend hours typing up obscure out-of-copyright works, the intersection of the three is probably going to be so nerdy and obsessive that providing comprehensive attribution will not be an issue.
I am not a lawyer, but it doesn’t seem possible for them to assert a copyright interest in a work that has been copied from a public domain text that they give you access to as a reader. I felt it was important to get clarification of the intent of §25.
It would be useful to get clarification of this from a lawyer: does this clarification suffice? Could I – or other Wikimedians, or the Wikimedia Foundation – be liable to a legal challenge from the British Library on the grounds of breaching the terms of use of the library by making verbatim text copies of these materials and publishing them?
Given that I have now made a good-faith effort to work out what the legal situation is, unless I am advised otherwise by either the British Library or by someone with expertise in this area, I plan to go ahead in the near future and start making and releasing public domain text copies of public domain works in the British Library. I encourage others to do likewise unless we hear of good reasons not to.
This post is licensed under a Creative Commons Attribution-ShareAlike 3.0 license.
New home for my photos
I’ve started uploading my photos to Wikimedia Commons. Unlike Flickr, it’s free as in beer, and it’s also free as in freedom. The most restrictive license for work on Commons is Creative Commons Attribution-ShareAlike.
I also trust the Wikimedia Foundation much more than I trust Yahoo at this point with regards to keeping data online. Wikimedia haven’t done something like shut down GeoCities or used the word “sunsetting” to describe Delicious. Yes, there are deletionists, but Wikimedia servers are paid for by the community, and the intention of the Foundation is to “empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally”. Given the financial crisis and all the evil that turned up, I trust that far more than I trust big publicly-traded companies. No offense, Yahooligans.
Anyway, you can find some of my photos on my Commons profile. If you want, you can also subscribe to an RSS feed of my Commons uploads.
There are some photos I take which will be out of the scope of Commons. Sometime in the future, I will find a place to host these myself.