The thoughts of a Code Gorilla

September 9, 2009

Moving host…

Filed under: Uncategorized — codegorilla @ 2:17 pm

No, not the blog… a service I run.

It throws up some questions though:

  • do I alter the layout of the installation?
  • do I change names of things?
  • do I take the opportunity to update some of the supporting applications?

Choices, choices…. I’ll let you know how I get on!

September 4, 2009

It’s been a while….

Filed under: Uncategorized — codegorilla @ 2:58 pm

So what’s been happening?

Well, apart from motorsports taking over my personal life, I’ve been part of Repository Fringe; built a demonstrator for the HILT project; and started work on Open Access Repository Junction.

(and I’ve a passel of things on the back-burner to deal with… including moving an entire service to a new host….)

April 18, 2009

Humans need humans to be human?

Filed under: Uncategorized — codegorilla @ 5:32 am

There was a story, back on Tuesday, about research which shows that people who socialise on-line are less (“lazy, self-deluding thickies”, “a confused your moral compass”)

The interesting part here is not the story, but a parallel comment I’ve found:

Individuals aren’t naturally paid-up members of the human race, except biologically. They need to be bounced around by the Brownian motion of society, which is a mechanism by which human beings constantly remind one another that they are … well … human beings.

Where does this wonderful quote come from?

Terry Pratchett, in his book “Men at Arms”. Published in 1993 – so before the Internet, well before “Social Networks”.

March 24, 2009

Quick code, slow code

Filed under: Uncategorized — codegorilla @ 11:47 am

Some work is very quick: a rapid evolution of development and testing. Bugs are quickly found, and systems can be tweaked to refine & extend the code to produce a more polished result.

On the other hand, some work is really slow: the coding is quick, but it needs to be tested against a large dataset… and that takes ages to run.

Guess what I’m doing now? :(

January 29, 2009

Who’s the Daddy?

Filed under: Uncategorized — codegorilla @ 9:33 am

My auto-update worked!

… Not that I expected it not to, but last night was the first time I’d actually had it run, on its own, in a live service!

  1. It made the FTP connection & found the new file
  2. It munged the downloaded data into an update file
  3. It copied the current database and updated the new version
  4. It switched the service to the new database
  5. It updated the news ticker on the login page
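
For the curious, the copy-update-switch part (steps 3 and 4) can be sketched roughly like this. It is not the live code: the SQLite database, the `records` table, the `service_vN.db` file names and the `current.txt` pointer file are all assumptions made up for illustration.

```python
import os
import shutil
import sqlite3
import tempfile


def apply_update(live_dir, rows):
    """Copy the current database, update the copy, then switch over to it.

    Hypothetical layout: live_dir/current.txt holds the file name of the
    database the service should read; `rows` is the munged update data as
    (id, title) pairs.
    """
    pointer = os.path.join(live_dir, "current.txt")
    with open(pointer) as fh:
        current_name = fh.read().strip()

    # Step 3a: work on a copy, so the live database is never half-updated.
    version = int(current_name.split("_v")[1].split(".")[0]) + 1
    new_name = f"service_v{version}.db"
    new_path = os.path.join(live_dir, new_name)
    shutil.copyfile(os.path.join(live_dir, current_name), new_path)

    # Step 3b: apply the update to the copy only.
    conn = sqlite3.connect(new_path)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO records (id, title) VALUES (?, ?)", rows
        )
    conn.close()

    # Step 4: switch the service by atomically replacing the pointer file.
    fd, tmp_path = tempfile.mkstemp(dir=live_dir)
    with os.fdopen(fd, "w") as fh:
        fh.write(new_name)
    os.replace(tmp_path, pointer)
    return new_name
```

The point of doing it this way round is that the old database is left untouched, so if the update goes wrong the service simply carries on with the previous version.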

<Does the happy dance />

November 28, 2008

The good old 80/20 rule

Filed under: Uncategorized — codegorilla @ 12:38 pm

You gotta love it…. it turns up everywhere:

The last 20% of getting a service just right for launch will take 80% of the allocated time.

Yes folks – I’m in the tidy-up / fettle / tweak phase, and they are all nasty wee fiddly bits: something doesn’t work quite right in IE; a button needs to be moved over just so; words used in help pages; etc…

September 11, 2008

A Deposit tool, my thoughts

Filed under: Uncategorized — codegorilla @ 8:46 am

At the Understanding Organisational Cultures workshop, I got talking to Andrew McGregor (JISC Programme Manager), and it came out that he has funded someone to look at building an iGoogle deposit tool.

I too have been thinking about this idea, and I have the deposit side mapped out, but the “social” side of it is still flakey… what ideas I do have are dependent on technologies/services that don’t exist yet.

Facebook OpenSocial deposit application

The basic premiss for this idea is that there is a recognised problem with getting deposits.
Yes, there is a problem with getting metadata; Yes there is a problem with sharing, duplication, and so forth…. but there is also a fundamental problem in getting authors to deposit.
Roughly speaking, less than 20% of research output makes its way into an IR; and less than 20% of that is author-deposited. This means that, at most, just 4% (20% of 20%) of the total research output is author-deposited. A problem indeed!

There are a number of ways that this can be approached, but one very strong contender is to “bring the deposit process into the user’s workflow”. There are many ways to do this. This idea is to use the realm of social networks to make the task of depositing easier.

Technology.

I have opted for OpenSocial over Facebook, as OpenSocial applications plug into over 75 different web environments (including iGoogle & Bebo), whereas Facebook apps work in…. just Facebook!
(Having said that, the changes needed to get something written for the OpenSocial API to work with the Facebook API are, apparently, minimal)

The basic build-process for any application within the OpenSocial framework is twofold:

  1. There is a small app that displays a summary of “social information” gathered locally (eg from Bebo users in a friends-list)
  2. There is a main application that runs on the provider’s servers, and renders onto the OpenSocial canvas space.

The scope of this suggestion is not to determine how an application can be used socially, but simply to provide an application that will enable people to deposit from a familiar place (ie Bebo, iGoogle, or Orkut).

To mis-quote Rufus Pollock: “The coolest thing to do with your code will be thought of by someone else”.

Create the OpenSocial Deposit Tool, knowing that someone else will come up with other ways to use it.

Socially
I know that (data-)librarians have a bee in their bonnet about metadata, but when it comes down to it, user-supplied metadata is actually not a huge issue:

  1. Can we trust the academic to get the metadata right? (Especially when it is recognised that professional cataloguers are not 100%.)
  2. Pretty much every IR out there is mediated – deposits are reviewed, amended, catalogued, and then finally approved.

A rough drawing of the OpenSocial deposit app

Rough version

This is the basic version, one up from a proof-of-concept. It is simply a sideways step from the EM-LOADER project.
In this version, the target repository is hard-wired to the Depot, and the user’s name and password for SWORD depositing are held in the user’s preferences.

This will prove that it can be done, but does not help real Institutional Repositories.
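
To give a feel for what the rough version has to put together, here is a minimal sketch of the HTTP headers for a SWORD-style deposit, built from the stored user name and password. The deposit URL is a made-up placeholder (not the real Depot address), the packaging value is just one identifier used with SWORD 1.3, and `sword_headers` is a hypothetical helper of my own naming, not part of any library.

```python
import base64

# Assumed for illustration: a placeholder, not the real Depot deposit URL.
DEPOSIT_URL = "https://repository.example.org/sword/deposit/collection"

# One packaging identifier used with SWORD 1.3; the right value depends on
# what the target repository actually accepts.
PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def sword_headers(username, password, filename):
    """Build the HTTP headers for a SWORD-style POST of a zip package."""
    credentials = f"{username}:{password}".encode("utf-8")
    return {
        # HTTP Basic authentication, from the user's saved preferences.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": PACKAGING,
    }
```

The headers, plus the zip file as the request body, would then go into an ordinary HTTP POST to the deposit URL, for instance with `urllib.request.Request(DEPOSIT_URL, data=payload, headers=headers, method="POST")`.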

Neater version

Here we make the tool actually useful:

  1. Users can specify a preferred repository (good for when they are away from their institution)
  2. We use Repository junction to provide a list of local repositories (good as it makes the app almost self-configuring)
  3. We allow multiple deposits (notice checkboxes, not radio buttons)

Really really cool version

The really really cool version will do things like query PubMed (& co) for metadata; use tools such as content parsing to extract metadata; use Intute to see if the record already exists in Repository Land; use the Author Authority service for author names; and possibly handle updating records, not just depositing records.

The Social side

I’ve not really thought this one through much – one could pull in an RSS feed from your preferred repository, or maybe merge the feeds from the repositories found by Repository Junction.
Can you somehow link OpenSocial accounts with repository user IDs? (I don’t think so: too many people devolving authentication to their Institutional Authentication system)

September 10, 2008

Understanding Organisational Cultures, the workshop

Filed under: Uncategorized — codegorilla @ 3:17 pm

This turned out to be a really interesting event, and well worth the trip down!

We started with two talks: one from Dr Colin Macduff (Robert Gordon Uni.) talking about his experience in submitting his eThesis, and how it changed the way he approached the whole thesis; and a second from Dr Bruce Jefferson (Cranfield Uni) taking the role of the sceptic, and pointing out all the things that he wants, and why repositories are not helping him.

We also had a quick overview of the national picture from Neil Jacobs, and three talks on ways to make changes, or recognising the opportunities for change to happen: Michael White (Stirling Uni); John Harrington (Cranfield); William Nixon (Uni of Glasgow).

What was particularly gratifying to see was that three of the six speakers were from Scottish Universities – are we ahead of the curve in this field?

Colin Macduff – The Convert

This was the story of his PhD. He began it after many years as a member of staff, and set out to produce a classic piece of work: lots of text; several diagrams; plenty of references; all designed to be bound into that classic black hardback book, where it sits on the shelf and gathers dust.

With the new Robert Gordon eThesis program, he switched to producing an ETD document, which allowed him to include an additional errata and a feedback form, amongst other options.

Statistically, his eThesis has been visited over 2,000 times, and downloaded over 1,400 times. Compare this with the RCN’s library of theses, which gets fewer than 300 requests IN TOTAL in an entire year.

Never, he pointed out to chuckles from the audience, underestimate the vanity of academics: if their work is read more, it makes them feel more important.

Bruce Jefferson – The Sceptic

Bruce was quick to point out, at the outset, that he was playing a role: he was not necessarily taking his personal stance, but caricaturing himself.

So, he asked, what does an academic need to progress his/her chosen career?

  • Papers
  • Grants
  • Students

Papers are the indication of work output; Grants are the funds to get paid; and students supply the shortfall between the two for the institution to stay in business. All universities work to this: some are research led; others are teaching led; all are a balance between the two.

What do academics want?

  • Power
  • Cash
  • Fame

Power is wielded in Senate courts, on advisory panels, and in commercial boardrooms. Cash gives the comfortable lifestyle, the bigger/better/fancier toys to play… er, research with, the sharper suits, the ability to attend more conferences. Fame is when people pay you to go to conferences, or pay you for consultation work.

The diktat at Cranfield, and probably accepted as an unwritten guide the world over, is to produce “the best possible article, published in the best possible journal”. Bruce did, however, also admit that less than 30% of the published output of the School of Management at Cranfield was Journal Article work.

Bruce then went on to talk about the REF, and how that defined the “worth” of a researcher: REF is (currently) the count of papers in Web of Science, divided by some citation factor (‘cos some subject areas are not as prolific as others) [c/cf]. This means that:

  • If it’s not a journal paper, it doesn’t count
  • If it’s in a journal that Web of Science does not collate, it doesn’t count

This means that he is not interested in anything other than peer-reviewed articles from a specific set of journals.

Then there are other, less tangible, issues that relate specifically to these Institutional Repositories:

  • “All we do is fill in databases.” They usually have the same data as some other databases, so why am I doing all this duplicate work?
  • A researcher will do anything to get published: sign over copyright, rework the text, sell their first-born (ok, so maybe not the last one) – if the publisher even so much as hints at “No pre-publication”, it ain’t going into a repository
  • Where is the actual, physical, statistical, evidence that an IR improves actual citations?

Basically, he points out, researchers have no problems with things that raise their profile (this improves “Fame”); the question, however, is one of work outlay for the return (in terms of “cash” and/or “power”).

Also, he admitted somewhat ruefully, academics are slow to change: they will keep to their known, and familiar, processes for as long as possible, which means the “Google Generation” will be discouraged for as long as possible.

Bruce closed with an experiment he is proposing: He is going to take all of the REF-applicable articles, and split them into two groups, chosen to be of equal “weight” for Author(s) and Journal. One half will be deposited into the Cranfield Repository and the others will remain a control set. What he wants to determine is whether the IR actually improves the citations.
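
Bruce didn’t say how he would do the split, so the following is only my sketch of one way to get two groups of roughly equal “weight”: rank the articles, pair up neighbours, and toss a coin within each pair. The `weight` function is hypothetical; how author and journal weight would really be measured wasn’t defined.

```python
import random


def matched_split(articles, weight, seed=None):
    """Split articles into (deposit, control) groups of similar weight.

    `weight` is a hypothetical scoring function standing in for the
    author/journal "weight"; how that is measured is not defined here.
    """
    rng = random.Random(seed)
    ranked = sorted(articles, key=weight, reverse=True)
    deposit, control = [], []
    # Walk the ranked list in adjacent pairs; a coin toss decides which of
    # each pair is deposited, so neither group systematically gets the
    # heavier article.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        deposit.append(pair[0])
        control.append(pair[1])
    # With an odd number of articles the last one is left out of both groups.
    return deposit, control
```

Something like `matched_split(papers, weight=lambda p: p["score"], seed=1)` would then give the deposit set and the control set, with the seed making the coin tosses repeatable.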

Neil Jacobs – the national picture

Neil gave us a quick overview of the national picture, acknowledging the existence of supra-repositories (arXiv, UK PubMed Central, etc), the broad spectrum of Institutional Repositories (almost 80, with the Depot as a back-stop), and a range of middle-ware services (OpenDOAR, ROAR, Intute Search, etc).

JISC has been plowing a lot of money into “mashup” projects, spending money on the throw-away projects that may produce that silver bullet.

Like all managers, JISC needs to know if the money they are putting into this area is going to give a return – they need evidence to show that IRs are working – not hyperbole, not evangelism. Evidence.

Breakout session 1

The morning breakout session for the group I was in had two questions: “Can you identify a prevailing culture at your institution?” and “To what extent does this determine the behaviour of your users and the development of the repository?”

We had quite a wide-ranging discussion, but some of the highlights for me came down to

  • If the repository is seen as a library tool, the academics don’t see the benefit of using it
  • If the ideas of Open Access are not promoted, then academics won’t see the benefit
  • If the repository is not marketed, then the academics won’t know about it
  • If the repository is not resourced, then it will die away as other work takes precedence

Lunch

Excellent lunch. Meetings and the like should be ranked by the lunch: this one scores highly!

Michael White

Michael White talked about the repository at Stirling: where it came from; what the official position is; and how they process items deposited.

Stirling started their repository in 2002, but it took the Scottish Declaration on Open Access (‘We believe that the interests of Scotland will be best served by the rapid adoption of open access to scientific and research literature.’) to make the senior managers of the University sit up and take an active interest in an Institutional Repository.

Stirling have mandated that all theses have to be deposited electronically as well as in paper form (students cannot submit the paper copy until the electronic copy has been accepted!), and this has applied to all students from September 2006.

Stirling also have an ePrint mandate, from September 2008 (but backdated to 20 January 2007), which requires the author’s Final Accepted Draft. The academic deposits the binary, and a minimal set of data, and then the repository staff process the form: checking RoMEO; correcting the names for Stirling authors; adding DOIs and LCSH classification; etc. In terms of numbers, the Stirling repository had just 20 items author-deposited in the two years before mandate (with the rest being located and added by the library) whereas in the three months after mandate, they have had 200 author-deposits. Clearly the mandate, and the enthusiasm from the top of the university, has had an effect.

Michael estimates they have ~730 papers per year, so need 56 days (FTE) to process that – but things may change with the REF.
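
As a back-of-envelope check on what that means per paper, here is the arithmetic. The 7.5-hour working day is my assumption; Michael only gave the paper count and the FTE days.

```python
PAPERS_PER_YEAR = 730    # Michael's estimate
FTE_DAYS_PER_YEAR = 56   # Michael's estimate
HOURS_PER_DAY = 7.5      # assumption: not stated in the talk


def minutes_per_paper(papers, fte_days, hours_per_day):
    """Average staff minutes available to process each paper."""
    return fte_days * hours_per_day * 60 / papers


print(round(minutes_per_paper(PAPERS_PER_YEAR, FTE_DAYS_PER_YEAR, HOURS_PER_DAY)))
# prints 35
```

So, on that assumption, the estimate works out at roughly 35 minutes of staff time per paper.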

John Harrington

John spoke about the review they did of CERES (the Cranfield repository), where they looked at why the takeup of their repository was so low.

There were a number of factors:

  • There was a low awareness of the repository in general (there had been no Launch, for example),
  • A lack of dissemination,
  • Confusion over copyright and ownership issues,
  • Simply being too busy to devote the time to depositing, and, as Bruce pointed out,
  • The journal is king.

They decided to have a three-fold approach to their advocacy:

  • A top-down re-branding and re-launch, with a formal presentation and dignitaries on hand.
  • A bottom-up exercise in building community support and finding champions or evangelists
  • A way of keeping CERES in the media – doing more than just emailing people.

There were a number of lessons that were brought sharply into focus:

  • The message must be clear, distinct, and succinct
  • The advocacy must be sustainable, and sustained
  • You must deliver on your promises, and your procedures must be backed up by effective systems
  • NEVER underestimate the personal touch (leave the comfort of your office and go visit people)

William Nixon

In many ways, William’s story is very similar to Michael’s: Glasgow set up their repository in 2002; they have strong Senior Management support; they have a strong advocacy team; and they have the Scottish Open Access Declaration.

Where they differ is in the smaller details: Glasgow have positioned their repository as their publications database; they have capitalised on their RAE work and got support across the university; and (as one of the 22 pilot institutions) they have tied their REF assessment into their repository.

Glasgow have a very strong support team with their repository, and have staff time dedicated to the service. They offer a service (and most of their work is based on it) in which the researcher’s file is emailed to the library and the repository staff do all the work. Pre-mandate, the library processed about 20 such requests over a 3-year period, but post-mandate that number has risen tenfold in a single year.

Breakout session 2

“Change is inevitable”

  • What are the possible implications for institutions faced with having to manage the inevitable
  • What are the threats and the opportunities
  • How would you position the repository to derive optimum benefit from these changes

Again, we had a free-flowing discussion around this area, and flagged several issues.

My personal stance here is one that was reflected by Repository Fringe ‘08: the current technology is a library-led solution that focuses on a single aspect of scholarly output, and is an isolated silo of information.

We need to share information; we need a common Authoritative Authors database; we need to be able to query Intute for existing titles, to ensure duplicates are duplicates; we need to accept there will be duplicates, and share the data; we need to reduce the overheads for getting a deposit into the “repository”.

Why can’t the repository be just the public information in a publications database? Why can’t the publications database tie into the university’s M.I.S.? Grant numbers, Principal Investigators, associated researchers?

Why can’t we collect the relevant metadata at the time it’s known, not several years after the research was done?

We had to go through the current IR process to learn what works and what doesn’t work. We know what we want to do, but haven’t been able to clearly articulate or define what is really quite a complex, interwoven problem, and JISC was right to support those who are trying to tackle the problem and to frame a coherent definition.

There is probably no simple solution for such a complex problem, but if research was easy everyone would do it.

Footnote

I am heartened: Stirling and Glasgow universities have said that in the couple of years between launch and mandate, they had about 20 author-deposits into their repositories, with the rest coming via library staff finding and adding them themselves. The Depot has 21 items in it, with basically zero advocacy.

September 1, 2008

Collectives…

Filed under: Uncategorized — codegorilla @ 8:40 am

We all know that collectives of things have names: A herd of cattle; a flock of sheep; a murder of crows; a troop of monkeys.

What is a collection of programmers called?

A string? a word? a class?

We reckon it should be a skive of programmers

Anyone got a better suggestion?
