Following from my attendance at the Beyond the PDF workshop in January, I’ve been invited to Cambridge by Peter Murray-Rust. We have just under two weeks to revolutionise scholarly communications. Peter has organized a hackfest for the weekend of March 12th and 13th and the theme is “Scholarly HTML”.
[This is a draft blog post. I'm going to send it out to a couple of lists and invite people to drop by and annotate this document before I publish it to my blog. To comment you need to log in. Any old OpenID will do. If you want to try this out with your own docs, drop me a line and I will share the DropBox with you, then anything you add will show up in the web site].
What’s Scholarly HTML?
I think it’s a way of representing ‘research objects’ along with associated data and metadata, for the web, so they can be efficiently created, then reviewed, discussed, machine processed, copied and so on.
The goal, in one sentence:
Scholarly HTML is a way to encapsulate research objects in a portable, preservable and sustainable way using simple technologies so that research can be not just on the web but of the web.
What I hope to do at the hackfest is map out some guidelines for what Scholarly HTML might look like and then start working out how to author it, and how to present it. And how to do C21 scholarship on top of it. Open review, machine processing, nanopublications – these things all need an efficient low cost platform. Yes, I will be talking about word processors, and our work on desktop repositories and yes Martin Fenner will be looking at WordPress – but I think it’s really important to think about what we want in an underlying format. Something that can be preserved and exchanged and made to work in different systems.
Me, I think the format can be hard to see when you are looking at tools like WordPress, which are both authoring tool and publishing environment, but in that case it is doubly important to make sure that what we’re creating is not locked in to a particular platform. How would you archive a document which depends on eight CMS plugins to work? I’m not saying the folks working on tools now are not thinking about these issues, just emphasising that we will be thinking about them at the workshop.
In scope for the format Scholarly HTML:
-
Articles, theses, reports, lab notebooks, blog posts, which reference associated stuff such as data.
-
Structure and semantics for document contents, like headings, and being able to label parts – like “This bit is a Method” where Method is well defined.
-
Ways to represent citations in an unambiguous machine readable way so that readers can have them presented in ways of their own choosing.
-
Techniques for embedding domain semantics, and extensible methods for describing relations between and within objects.
-
Packaging so that research objects can be moved around, saved, posted, etc. This is going beyond HTML, I know, but it’s one of the great unsolved problems of the web. I’ve talked about this before – and Martin Fenner has picked up the ePub ball and taken it for a good run. More soon on this but just as a teaser, I’m now thinking of proposing that a minimal Scholarly HTML package might consist of a zip file with at least one file at the root, index.html, which can link to other pages if it wants. This is a minimum – the package could ALSO be an ePub and have a formal ORE manifest etc.
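To make the packaging idea a bit more concrete, here is a rough sketch in Python of what building and checking such a minimal package might look like. Nothing here is a spec: the file names other than index.html, the sample content and the helper names `build_package` and `is_minimal_package` are all made up for illustration, and the only rule checked is the one floated above, a zip file with index.html at its root.

```python
import io
import zipfile

# Made-up sample content; only the name "index.html" comes from the proposal.
INDEX_HTML = """<!DOCTYPE html>
<html>
  <head><title>Example research object</title></head>
  <body>
    <article>
      <h1>Example research object</h1>
      <p>See the <a href="data/results.csv">associated data</a>.</p>
    </article>
  </body>
</html>
"""


def build_package(files):
    """Zip a mapping of {path: text} into an in-memory package (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for path, text in files.items():
            package.writestr(path, text)
    return buffer.getvalue()


def is_minimal_package(data):
    """True if data is a zip archive with index.html at its root."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            return "index.html" in package.namelist()
    except zipfile.BadZipFile:
        return False


package_bytes = build_package({
    "index.html": INDEX_HTML,
    "data/results.csv": "sample,value\na,1\nb,2\n",
})
print(is_minimal_package(package_bytes))
```

A package that is also an ePub or carries an ORE manifest would still pass this check, since it only asks for the minimum.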
Related to the format, but not actually part of Scholarly HTML:
-
Annotations on the above, where annotation is taken in the broadest possible sense. Scholarly HTML needs to worry about anchors for annotation. Annotation can be used for peer and open review, inline discussion at all stages of a research object life cycle, adding formal semantics and lots of other scholarly processes.
-
Tools, tools, tools. Word, WordPress and anything else that starts with W, maybe even some things that don’t. I want to be able to work on papers in OpenOffice and post them to collaboration spaces – including WordPress, so I think a lot of the hackfest should be about interop.
-
Authoring tricks and techniques. Stuff like KCite where you can add citations in any old editor using text like [cite source='pubmed']17237047[/cite], and my approach of using links to embed document semantics.
-
Browser-side plugins to make declarative Scholarly HTML come alive. There are some problems with browser-side code, particularly when things get in each other’s way, but one of the big pluses is that it can work across more than one system really easily, so you don’t just get a WordPress plugin, you get something that can be added to any system or turned into an app for offline use.
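One nice thing about the KCite-style shortcode mentioned in the list above is how little machinery a tool needs to pick it out of plain text. The sketch below is not KCite's own code; it is a hypothetical Python reader that only understands the exact form quoted above (a source attribute and an identifier), and the draft sentence is invented.

```python
import re

# Matches only the shortcode form quoted in the post; any other attributes or
# quoting styles KCite may accept are deliberately not handled here.
CITE_PATTERN = re.compile(
    r"\[cite\s+source=['\"](?P<source>[^'\"]+)['\"]\](?P<identifier>[^\[\]]+)\[/cite\]"
)


def extract_citations(text):
    """Return (source, identifier) pairs for each [cite] shortcode in text."""
    return [
        (match.group("source"), match.group("identifier").strip())
        for match in CITE_PATTERN.finditer(text)
    ]


# Invented example sentence using the identifier from the post.
draft = "Earlier work [cite source='pubmed']17237047[/cite] covers this."
print(extract_citations(draft))
```

Once the citation is a (source, identifier) pair rather than formatted text, how it gets displayed can be left to whatever system renders the document.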
Important properties of Scholarly HTML:
-
It’s declarative. That is if we are linking a document to, say, a map then the document will contain:
-
At least, a link to the map, which must be in a standard format. The link will be labelled in some way to say “I am a link to a map”, maybe with further attached semantics.
-
Optionally a static placeholder image for the map so that any old web browser can be used to at least show something.
Systems for rendering Scholarly HTML will know how to interpret the semantics, that there is a map or a molecule on the end of the link, and do something useful. A web page containing JavaScript tied directly to Google Maps doesn’t qualify – it’s not portable.
As Phillip Lord pointed out in email, if the basic declarative stuff is there then it should be OK to have the script as well. This makes sense – the version of a Scholarly HTML document that people see will often have been enlivened by various server-side or browser-side changes.
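Here is a small sketch of what the declarative map case might look like to a rendering system, written in Python with the standard library HTML parser. The markup is an assumption: rel="map" is not an agreed convention, just a stand-in for "a link labelled as pointing to a map", and the file names and the `MapLinkFinder` class are made up.

```python
from html.parser import HTMLParser

# Hypothetical markup: rel="map" is NOT an agreed label, only a placeholder
# for whatever labelling Scholarly HTML ends up using.
SAMPLE = """
<p>Sites visited:
  <a rel="map" href="sites.kml"><img src="sites.png" alt="Map of sites"></a>
</p>
<p>See also <a href="notes.html">the field notes</a>.</p>
"""


class MapLinkFinder(HTMLParser):
    """Collect links labelled as maps, with any static placeholder image."""

    def __init__(self):
        super().__init__()
        self.maps = []        # one dict per labelled link
        self._current = None  # the map link we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "map" in (attrs.get("rel") or "").split():
            self._current = {"href": attrs.get("href"), "placeholder": None}
            self.maps.append(self._current)
        elif tag == "img" and self._current is not None:
            self._current["placeholder"] = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


finder = MapLinkFinder()
finder.feed(SAMPLE)
finder.close()
print(finder.maps)
```

A plain browser shows the placeholder image and a working link; a system that knows about maps can find the same link and swap in something interactive, which is the portability the Google Maps script lacks.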
Another example of where a declarative spec is important is citations. Scholarly HTML will have a way to represent citations – there’s a really useful discussion going on of the issues around this on the wordpress-for-scientists@googlegroups.com list.
-
It has a simple structural backbone, using HTML 5 elements to give the basic hierarchy with optionally much more detail. Using the HTML article element, it should be possible to identify the bounds of a work, and separate it from navigation and branding.
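As a rough illustration of using the article element to find the bounds of a work, the Python sketch below keeps only the text inside article and drops the navigation and branding around it. The sample page and the `ArticleText` class are invented for the example, and real pages would need more careful handling than this.

```python
from html.parser import HTMLParser

# Invented sample page: navigation and branding sit outside <article>.
PAGE = """
<html><body>
  <nav><a href="/">Home</a></nav>
  <article>
    <h1>A short report</h1>
    <section><h2>Method</h2><p>We measured things.</p></section>
  </article>
  <footer>Site branding</footer>
</body></html>
"""


class ArticleText(HTMLParser):
    """Keep only the text found inside <article> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth of open <article> elements
        self.chunks = []  # non-empty text runs found inside an article

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


extractor = ArticleText()
extractor.feed(PAGE)
extractor.close()
print(extractor.chunks)
```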
Copyright
Peter Sefton, 2011. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>
This post was written in OpenOffice.org, using templates and tools provided by the Integrated Content Environment project and published to WordPress using The Fascinator.