Query harvester

Watch our harvester take the drudgery out of research!

We all dip into the Trove Australian newspapers database with a sense of wonder and gratitude. There are such riches to be found. But what if you want to do more than merely dip and pick?

Our query harvester allows you to push beyond the browser, to mine deeper, to extract more. Harvest the details of hundreds, or thousands, of newspaper articles and save them for detailed analysis at your leisure.

Depending on the options you choose, just feed a Trove newspaper query to our harvester and watch in awe as it creates:

  • a CSV file containing all the article metadata, ready to be opened as a spreadsheet, or imported into the database of your choice
  • a zip file conveniently packaging the text contents of all the harvested articles, ready to be fed to a text analysis program
  • a zip file containing PDF versions of all the articles, saved for future reference

Warning: depending on the query, your zip files could get very large!
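For those wondering what "detailed analysis at your leisure" might look like, here is a minimal sketch in Python that counts harvested articles per newspaper. The column names (id, title, newspaper, date) and the sample rows are assumptions made up for illustration, not the harvester's documented output, so check the header row of your own CSV file before borrowing it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a harvested CSV file. The column
# names are assumptions, not the harvester's documented output.
SAMPLE_CSV = """id,title,newspaper,date
1,Floods in the north,The Argus,1916-01-05
2,Weather report,The Argus,1916-01-06
3,Local news,The Mercury,1916-01-06
"""


def count_by_newspaper(csv_file):
    """Count rows per value of the (assumed) 'newspaper' column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["newspaper"] for row in reader)


# io.StringIO lets the sample behave like an open file.
counts = count_by_newspaper(io.StringIO(SAMPLE_CSV))
for newspaper, total in counts.most_common():
    print(newspaper, total)
```

To run it against a real harvest, replace the io.StringIO call with open("your_file.csv", newline="", encoding="utf-8"), where the file name is whatever you saved your harvest as.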




The new API-powered future

The long-awaited Application Programming Interface (API) to Trove is now available. Huzzah! The frustrations of broken screen-scrapers will soon be a thing of the past.

But this glorious new future will not come without cost. WraggeLabs will henceforth be embarking on a process of code consolidation and renewal as we overhaul all our Trove products to take advantage of this machine-readable goodness.

The first fruits of this process can be enjoyed in the new fully-web-driven version of QueryPic. Instant graphs! No downloads!

More such wonders are to come, but bear with us, dear client, as it may take some weeks or months to fully update our wares. In the meantime, it is unlikely that any development of pre-API versions will be undertaken.

We should also announce the imminent retirement of our Unofficial API. It has served us well, but its time has now come. Farewell, old friend.


More Trove scraper drama

Once again a minor change to the Trove newspapers code (from <i> and <b> to <em> and <strong>) broke my scraper and the tools that depend on it. Fortunately @erochest was quickly on the job and has submitted a fix. If you’re having problems, please update scrape.py from the repository.
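For the curious, here is a rough sketch of one way a scraper can be made less sensitive to this sort of change: treat the old and new tags as interchangeable. This is not the fix that was submitted to scrape.py. It uses only Python's standard html.parser, and the class name, function name and sample markup are all invented for illustration.

```python
from html.parser import HTMLParser

# Tags treated as interchangeable; extend the set if the markup changes again.
EMPHASIS_TAGS = {"b", "strong", "i", "em"}


class EmphasisCollector(HTMLParser):
    """Collect the text found inside any of the EMPHASIS_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many emphasis tags are currently open
        self.found = []  # one string per outermost emphasis element

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.depth += 1
            if self.depth == 1:
                self.found.append("")

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.found[-1] += data


def emphasised_text(html):
    """Return the text of each emphasised element, in document order."""
    parser = EmphasisCollector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.found]


# Invented markup in the old and new styles.
old_markup = "<p><b>The Argus</b> <i>5 January 1916</i></p>"
new_markup = "<p><strong>The Argus</strong> <em>5 January 1916</em></p>"
print(emphasised_text(old_markup) == emphasised_text(new_markup))
```

A scraper written this way would shrug off a swap between these particular tags, though it remains at the mercy of any change it did not anticipate.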

I will update the unofficial API shortly.


Trove Tools calamity

I am sad to report that due to a minor change in the Trove website (a <strong> tag was changed to an <h1>!) most of my Trove tools are experiencing difficulties.

The good news is that the fix required is small and I’ve updated the scraper that powers most of the tools.

If you’re using the command line version of the Query Harvester you can download the latest code and be on your way. The Search Summariser wasn’t affected, but there’s a new version available for your perusal including some extra features (these will be documented anon).

The Zotero translator for the newspapers database has been updated and submitted to the Zotero repository. Once it has been approved the translator will be upgraded automatically.

I believe that the Unofficial Trove API is unaffected. If you believe otherwise, please let me know.

And now the bad news…

The Query Harvester GUI will need to be rebuilt. I will try and do this as soon as possible, but I can’t make any promises. Bribery and flattery might help.

My apologies for any discomfort or anxiety. I always say that screen scrapers are inherently fragile. It’s part of the game. But it’s still extremely annoying when something like this happens. :(

Let’s just look forward to an official Trove API which should ease our pain considerably.


Sneak preview – Trove Newspaper Harvester for Windows

Marvel at the wonders of our GUI!

The WraggeLabs foundry has been working long and late to fashion custom-tailored, GUI-enabled binaries of our Trove Newspaper Harvester.

We are still testing and tinkering, but could wait no longer to share our excitement with our loyal followers.

Those of the Windows persuasion can now download a beta version. Please poke, prod and put through its paces.

No more grappling with Python or submitting to the tyranny of the command line. WraggeLabs is fighting hard for your right to double-click.
