not so quiet …

Posted December 26th, 2011 by Administrator and filed in Announcements

This site has been quiet, but there’s a lot going on behind the scenes. Items of note:

  • Parallel R has landed! Many thanks to all who made it possible. Happy reading.
  • I’m planning some software updates, and forqlift is in the top slot.
  • There’s another fun project brewing … more details soon.

new book on the way: Parallel R

Posted July 5th, 2011 by Administrator and filed in Announcements

As promised, I have an announcement:

It’s a book!

Well, more like a book-to-be. I’ve signed on with the fine folks at O’Reilly to publish Parallel R. It’s all about giving R, everyone’s preferred open-source data analysis tool, a parallel boost. If you’re doing large-scale work with R, you’ll likely want to read this book, especially if you’d like to blend R and Hadoop.

This will not be a solo venture: my partner in crime will be none other than Stephen Weston. Even if you don’t know him by name (and really, you should), there’s a good chance you know his work: he wrote the R packages nws, foreach, doSNOW, and doMC.

Look forward to more announcements over time.

news next week

Posted June 30th, 2011 by Administrator and filed in Announcements

I have some pretty cool news to announce. Drop by early next week for the full story.

a couple of new R packages: factualR and Segue

Posted January 15th, 2011 by Administrator and filed in Announcements

Two new R packages of note:

First, I have recently released factualR, which makes it easier for R researchers to work with data from Factual.com. If you want to pull Factual data sets into an ever-familiar data.frame, then you’re probably interested in factualR.

Second, I have joined the Segue project as a contributor. Segue is described as, “Parallel R In the Cloud, Two lines of code!” That means you get to use Amazon’s Elastic MapReduce as a parallel backend for lapply()-like operations.

(Note that Segue is best for jobs that are computation-bound, not necessarily data-bound. Running a scary Monte Carlo simulation? Testing fifty parameter variations on your wicked timeseries analysis? Give Segue a try!)
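
To make the “lapply()-like” shape concrete, here is a minimal sketch in plain base R. It does not call Segue at all, and estimate_pi and tasks are names made up for illustration. The point is the pattern: a list of small inputs, one CPU-heavy function, and nothing shared between runs, which is the kind of job the note above calls computation-bound.

```r
# Hypothetical example task: estimate pi by Monte Carlo sampling.
# Each call is independent and CPU-heavy, but its input is a single number.
estimate_pi <- function(n_draws) {
  x <- runif(n_draws)
  y <- runif(n_draws)
  # Fraction of points inside the quarter circle, scaled up to pi
  4 * mean(x^2 + y^2 <= 1)
}

set.seed(42)

# Fifty independent runs of 100,000 draws each
tasks <- as.list(rep(100000, 50))

# Serial version: apply the function to every element of the list
results <- lapply(tasks, estimate_pi)

# Combine the fifty estimates into one
mean(unlist(results))
```

Here lapply() runs the fifty tasks one after another on the local machine. A parallel backend with an lapply()-style interface would presumably take the same list and the same function and farm the elements out to workers, so a job that already looks like this is a natural candidate.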

Enjoy.

New year, new toy: forqlift

Posted January 1st, 2011 by Administrator and filed in Announcements, forqlift

I have a new project.

It’s called forqlift.

This one should be of use to people who crunch data with Hadoop or Mahout.

Here’s a bit of a blurb from forqlift’s page to get you started:

SequenceFiles are nice, but they can be unwieldy at times. I wrote forqlift to make it easier to manage SequenceFiles.

forqlift is a command-line tool that lets you:

  • create SequenceFiles from files on your local filesystem (just like creating an archive with tar or zip)
  • set compression (none, bzip2, gzip) and value types (text or binary)
  • extract the contents of a SequenceFile back to the filesystem
  • convert popular archive formats — tar (including tar.bz2 and tar.gz) and zip — to and from SequenceFile format

Head over to the forqlift page for more info!