Building the semantic web one tag at a time…
This is another WraggeLabs experiment.
I’ll be posting various updates and additional thoughts on my blog at discontents – so you might want to keep an eye on it.
This site has one modest aim: to build a corner of the semantic web by mobilising an army of determined machine taggers. It developed out of a talk on Linked Data I gave at the NSW Reference and Information Services Group seminar in 2010.
The first objective in this campaign is to see how many photos in Flickr we can tag with party identifiers from the National Library of Australia.
Huh? What? Why? Hopefully all your questions will be answered below.
A machine tag is a tag with added smarts. Don’t get me wrong, tags are great – they enable you to develop fine-grained descriptive systems. But the problem with plain old tags is that they tell you nothing about the relationship between the tag and the thing being tagged. If you tag a Flickr photo with the name ‘John Smith’, are you trying to say that John Smith is in the photo, took the photo, published the photo, or...? There's just no way of telling.
Machine tags use pre-defined vocabularies to include information about these sorts of relationships. If you wanted to say that John Smith was in a photo, you could use the 'depicts' property from the FOAF (Friend Of A Friend) vocabulary, which links an image to the thing it shows. So your tag would be – foaf:depicts=‘John Smith’.
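To make the three-part structure concrete, here's a minimal Python sketch that splits a machine tag into its namespace, predicate and value, using FOAF's 'depicts' property (which links an image to the thing shown in it) as the example. The function name, the `MachineTag` type and the exact pattern are my own illustration rather than anything Flickr provides. I'm assuming that namespaces and predicates start with a letter, contain only letters, digits and underscores, and are case-insensitive.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of a machine tag: namespace:predicate=value
# (namespace and predicate start with a letter; letters, digits, underscores only)
MACHINE_TAG = re.compile(
    r"^([a-z][a-z0-9_]*):([a-z][a-z0-9_]*)=(.*)$",
    re.IGNORECASE | re.DOTALL,
)


class MachineTag(NamedTuple):
    namespace: str  # the vocabulary, e.g. "foaf"
    predicate: str  # the relationship, e.g. "depicts"
    value: str      # the thing being referred to


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return the parts of a machine tag, or None for a plain old tag."""
    match = MACHINE_TAG.match(raw.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Values containing spaces are usually typed inside double quotes; drop one pair.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    # Treat namespace and predicate as case-insensitive by lowercasing them.
    return MachineTag(namespace.lower(), predicate.lower(), value)


if __name__ == "__main__":
    print(parse_machine_tag('foaf:depicts="John Smith"'))
    print(parse_machine_tag("John Smith"))  # a plain tag carries no relationship
```

A plain tag comes back as `None`, which is really the whole point: without the namespace and predicate there's nothing for a machine to work with.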
That's pretty cool, but now the question is, which John Smith are you talking about? To answer that we have to enrich our machine tags further with the hearty goodness of identifiers.
Names are identifiers, but once we start dealing with large groups of people their limitations become clear – names are rarely unique. If we want to be able to uniquely identify people we have to use some other system.
Librarians have been doing this for many years, developing authority files that bring together the various forms of a person's name and assigning them a unique number or code. As these systems become available online, they offer all sorts of exciting possibilities for linking up information about people.
The National Library of Australia's People Australia is one such service, aggregating identity information from a growing number of sources and providing unique identifiers. Here’s the identifier for the opera singer, Nellie Melba:
Once we have this identifier we can use it to create an even smarter machine tag:
Our tag still tells us the nature of the relationship between the photo and the person, but now it goes on to tell us that it’s referring to a specific individual who can be uniquely identified via the NLA. That's quite a lot of information to pump into one little tag.
You probably don't get as excited about this stuff as I do. You’re probably wondering what the point is. You’re probably also thinking that while these tags might be smart, they don't look very friendly. Well... they are called machine tags for a reason.
Strangely enough, machine tags are aimed at machines. The whole idea is to expose structured information about things in a way that dumb old computers can understand. Exposing data in these sorts of ways helps us to move from a web of documents to a web of data – towards the semantic web.
The semantic web promises smarter ways of navigating, searching and aggregating data on the web. The technologies themselves are still quite young, but even before we're in a position to harvest fully the bounty of semantics, machine tags like these present us with some useful opportunities.
As well as identifying individuals, the People Australia links provide access to a range of related biographical sources, such as the Australian Dictionary of Biography and the Encyclopaedia of Australian Science. That one little identifier means that we have a way of moving directly between photos on Flickr and related articles in the ADB and elsewhere. Just think of the mash-ups we'll be able to create!
At the moment the Flickr Machine Tag Challenge is limited to tags using NLA party identifiers. This is mainly because this whole obsession started with the Identity Browser I built using the People Australia machine interface.
But non-Australians can still join in. People Australia provides identifiers for all sorts of people and organisations you might not expect:
Of course you might want to use identifiers from other authority services, such as VIAF, WorldCat Identities, or the UK Names Project. If you do, let me know and I’ll add them to the harvester. This is just the beginning!
The same goes for other types of identifiers – you might start using identifiers for Library of Congress subject headings, or links to GeoNames IDs. I’m more than happy to add these to the challenge if you want to keep score.
I’m hoping that the various museums, libraries and archives that are publishing their photographic collections on Flickr will get behind this challenge – promoting it amongst their users, and encouraging taggers to explore their collections.
It's certainly in their interests to do so. Once the machine tags have been added to a photo, they can be harvested back using the Flickr API and added to the original collection record. Instant biographical linkage! When we've got a few more tags on the scoreboard, I’ll develop some working examples along the lines of my Flickr Context Harvester.
I'm also planning to add RSS feeds for taggers and sources, so repositories could easily display a list of recently tagged photos on their own websites.
I’m glad you asked. It's incredibly easy to join the Flickr Machine Tag Challenge. All you need to do is add a machine tag that includes an NLA party identifier to a photo on Flickr. The FMTC harvester will find it and automatically harvest your details.
How do you add a machine tag? That's also really easy thanks to my Identity Browser – head over to the help section for full details.
This product uses the Flickr API but is not endorsed or certified by Flickr.
Details of machine tags, taggers, photos and sources are harvested via the Flickr API every 24 hours and stored in a local database. If you all go into a machine tagging frenzy or let your competitive instincts get the better of you, I can increase the rate of harvests to give a more up-to-date picture.
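For the curious, here's a rough sketch of what a harvest request could look like. It isn't the harvester's actual code. It just builds a URL for the Flickr API's `flickr.photos.search` method, using the `machine_tags` parameter to ask for photos carrying a given namespace and predicate. The function name, the default `foaf:depicts` query and the choice of `extras` are assumptions for illustration, and you'd need your own API key.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(
    api_key: str,
    namespace: str = "foaf",      # assumed vocabulary to harvest
    predicate: str = "depicts",   # assumed relationship to harvest
    page: int = 1,
) -> str:
    """Build a flickr.photos.search URL for photos with a given machine tag."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        # "namespace:predicate=" matches any value for that namespace and predicate
        "machine_tags": f"{namespace}:{predicate}=",
        # ask Flickr to include each photo's machine tags and owner name
        "extras": "machine_tags,owner_name",
        "format": "json",
        "nojsoncallback": 1,
        "page": page,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key.
    print(build_search_url("YOUR_API_KEY"))
```

A real harvester would fetch each page of results, keep only the tags whose values are NLA party identifiers, and store the photo and tagger details in the database.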
As part of the harvest process, People Australia is queried using my Python client library to return a nice human-friendly name.
The site and harvester are all built in Django. I like Django. I keep finding new features that make me very happy. Django is good.