On Tuesday I headed off for an interesting day in York and a bit of a natter with the ADS gang. The idea was to bounce around some ideas for collaboration with Antiquist, as well as get their angle on the future of archaeological data repositories and much else besides. The thoughts below are my take on some topics arising from the discussion (mainly with Stuart Jeffrey and Julian Richards, though with fleeting conversations with most of the rest of the team as well).
One of the things that most people seem agreed upon these days is that a distributed model for data sharing is a Good Thing, but that it’s not so easy to accomplish in practice. Whilst an important debate about the legal and ethical aspects of making this data available continues to unfold, another piece of the jigsaw revolves around agreeing on data standards, and what ought to be done if organisations can’t (or won’t) comply with them. A few developments seem to be bringing this closer to reality, however, and the view from ADS is that we may be closer than we think. The first factor, which I touched on in my last post, is that a number of repositories are beginning to expose their data as web services, albeit privately, and these may begin to emerge as de facto standards. That will not please everyone. Legitimate concerns about transparency need to be raised, and, since the documentation is currently lacking, it’s hard to know how viable they are for general uptake, although Stuart seems positive that they are generally based on standards advocated by bodies such as FISH. Markup languages like KML have also demonstrated that useful standards can emerge from organisations that developed them for their own needs. The real issue lies in whether those organisations will release them for open management by the community or whether they will want to retain control of them.
This relates heavily to the second aspect of the problem – for a distributed network of repositories to function, they all have to sing from the same songsheet, and the idea of using HEIRNET as a registry of web services would be predicated on that fact. In some ways these next few months could be interesting as we begin to move away from a situation in which there are no immediate solutions for providing archaeological web services to one in which there could be several potential candidates. As is frequently the case with such things, the one most likely to gain acceptance is the one which gains the most adherents early on. I, for one, will be voicing support for the most open and community-driven candidate. With that in mind, I look forward to seeing a few more specifications popping up in the weeks ahead. Let battle commence. 🙂
Stuart gave me a demo of the ArchaeoBrowser, which I liked a lot (and a nod to Stewart Waller for the nice interface) but which also provides a cautionary tale about using proprietary software solutions. Effectively it’s a browsing tool for archaeological entities across the entire UK, drawn from a large number of HEIRs. He was at pains to point out that the dataset is not yet perfect, but it contains over a million records and uses faceted classification to give ultra-quick search results, either spatially or semantically. The system has its drawbacks: the indexing has to be done on the entire aggregated dataset, so effectively it has to copy the HEIRs’ data and hold it centrally. Since the final results ultimately point the user back to a URL hosted at the original HEIR, there is potential for broken links, and there’s no live updating (a problem I’m familiar with from the VLMA). On the other hand, the ability to cross-search and retrieve heterogeneous data in a common format using a common schema is really cool. But there’s a final twist. ADUIRI, the company that created the groovy but proprietary indexing software, has ceased to exist. That means there’s some serious work to be done before any new information can be introduced to the system, if at all.
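For anyone wondering what faceted classification buys you, here’s a minimal sketch in Python. The facet names (“period”, “site_type”, “region”) and records are entirely made up for illustration – I have no idea what ArchaeoBrowser’s actual schema looks like – but the core trick is the same: build an inverted index per facet up front, so a query becomes a cheap set intersection rather than a scan over a million records.

```python
# Toy faceted search: invented facets and records, purely illustrative.
from collections import defaultdict

records = [
    {"id": 1, "period": "Roman", "site_type": "villa", "region": "Yorkshire"},
    {"id": 2, "period": "Roman", "site_type": "fort", "region": "Cumbria"},
    {"id": 3, "period": "Medieval", "site_type": "castle", "region": "Yorkshire"},
]

# Build one inverted index per facet: facet -> value -> set of record ids.
index = defaultdict(lambda: defaultdict(set))
for rec in records:
    for facet, value in rec.items():
        if facet != "id":
            index[facet][value].add(rec["id"])

def search(**facets):
    """Intersect the id sets for each requested facet value."""
    sets = [index[f][v] for f, v in facets.items()]
    return sorted(set.intersection(*sets)) if sets else []

print(search(period="Roman", region="Yorkshire"))  # -> [1]
```

The pre-built index is also why the aggregated data has to be held centrally: you can’t intersect postings you don’t have.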
York have also just received a grant from the recent AHRC-JISC-EPSRC funding round to undertake their Archaeotools project, which will look at further methods for data mining an ever-increasing mountain of material. By using Natural Language Processing to harvest data from grey literature, it could revolutionise our understanding of what’s ‘out there’. And I’m glad to hear they’re committed to using open source solutions this time round 🙂
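To give a flavour of what harvesting grey literature might involve – and this is a deliberately crude stand-in, not how Archaeotools actually works – even something as simple as matching report text against a gazetteer of period terms starts to turn free text into queryable data. The period list below is my own invention for the example.

```python
# Crude entity spotting over excavation-report text: match a small
# hand-rolled gazetteer of period terms. Real NLP pipelines go far beyond this.
import re

PERIODS = ["Neolithic", "Bronze Age", "Iron Age", "Roman", "Anglo-Saxon", "Medieval"]
pattern = re.compile("|".join(re.escape(p) for p in PERIODS))

report = ("Excavation revealed a Roman ditch cutting an earlier "
          "Iron Age enclosure, sealed by Medieval plough soil.")

mentions = pattern.findall(report)
print(mentions)  # -> ['Roman', 'Iron Age', 'Medieval']
```

Scale that idea up with proper named-entity recognition and you can see why a mountain of unindexed grey literature suddenly looks like a dataset.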
The Great Antiquist Jamboree/CodeCamp/SwapShop/Thing
One of the ideas we knocked about was something I’ve also discussed with Mark Lake, Dave Wheatley & Graeme Earl – namely to get the Antiquist CodeCamp (OK, so we do really need a new name for it) off the ground. The plan as such is to have an annual event over several days in which Master’s students from the various Arch/IT courses can attend workshops run by professional practitioners and tackle pet projects with help from their peers. As well as providing skills which would be of immediate benefit to the students, it would also be a practical exercise in collaboration and a great opportunity for them to network. There may be some work to do in getting the university beancounters to see the benefit in all this, but if the students get on board as well then the logistical problems will hopefully take care of themselves (yes, I know that’s naïve, but sometimes you just have to think positive).
Lastly, one comment from Judith Winters, the editor of Internet Archaeology, really fired my imagination. Mike Charno gave us a demo of an integrated IA article and ADS dataset which was an exemplar of the LEAP project at CAA UK earlier this year. The ability to have an article with the complete supporting data embedded within it struck me as revolutionary, so I was delighted to hear that once the practicalities of this kind of integration have been ironed out they hope to start partnering with folks outside York/ADS on similar projects (cf. StORe).
Now where did I put that Crossbones report…?