a couple of new R packages: factualR and Segue

Posted January 15th, 2011 by Administrator and filed in Announcements

Two new R packages of note:

First, I have recently released factualR, which makes it easier for R researchers to work with data from Factual.com. If you want to pull Factual data sets into an ever-familiar data.frame, then you’re probably interested in factualR.

Second, I have joined the Segue project as a contributor. Segue is described as, “Parallel R In the Cloud, Two lines of code!” That means you get to use Amazon’s Elastic MapReduce as a parallel backend for lapply()-like operations.

(Note that Segue is best for jobs that are computation-bound, not necessarily data-bound. Running a scary Monte Carlo simulation? Testing fifty parameter variations on your wicked time-series analysis? Give Segue a try!)

Enjoy.

New year, new toy: forqlift

Posted January 1st, 2011 by Administrator and filed in Announcements, forqlift

I have a new project.

It’s called forqlift.

This one should be of use to people who crunch data with Hadoop or Mahout.

Here’s a bit of a blurb from forqlift’s page to get you started:

SequenceFiles are nice, but they can be unwieldy at times. I wrote forqlift to make it easier to manage SequenceFiles.

forqlift is a command-line tool that lets you:

  • create SequenceFiles from files on your local filesystem (just like creating an archive with tar or zip)
  • set compression (none, bzip2, gzip) and value types (text or binary)
  • extract the contents of a SequenceFile back to the filesystem
  • convert popular archive formats — tar (including tar.bz2 and tar.gz) and zip — to and from SequenceFile format

Head over to the forqlift page for more info!

novi 2.1.1 – Fedora 13 fixes

Posted September 27th, 2010 by Administrator and filed in novi (HOBbI)

novi version 2.1.1 is now available in the downloads area.

This is a bugfix release (hence the small rev bump) that should help novi build on Fedora 13, which is bundled with a newer version of the RPM API.

novi v2.1.0 release

Posted January 12th, 2010 by Administrator and filed in Announcements, novi (HOBbI)

There’s a new novi release in the downloads area, v2.1.0.

This version uses the “primary” repo metadata file specified in repodata/repomd.xml. Previous versions assumed the primary data was in repodata/primary.xml.gz, which is not always the case. For example, some repos include the primary metadata file’s SHA1 hash in the filename.
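
For the curious, the lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not novi's actual code: the function name, the sample document, and the hash-like filename are all made up for the example, and it assumes the usual yum-style repomd.xml layout (a data element with type="primary" containing a location element with an href attribute).

```python
import xml.etree.ElementTree as ET

# XML namespace used by yum-style repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Hypothetical sample; the hash-like prefix in the filename is invented.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="filelists">
    <location href="repodata/filelists.xml.gz"/>
  </data>
  <data type="primary">
    <location href="repodata/0123abcd-primary.xml.gz"/>
  </data>
</repomd>
"""


def find_primary_href(repomd_xml: str) -> str:
    """Return the path of the primary metadata file named in repomd.xml.

    Illustrative helper (not part of novi): rather than assuming
    repodata/primary.xml.gz, read the location from the index itself.
    """
    root = ET.fromstring(repomd_xml)
    for data in root.findall("repo:data", REPO_NS):
        if data.get("type") == "primary":
            location = data.find("repo:location", REPO_NS)
            if location is not None and location.get("href"):
                return location.get("href")
    raise ValueError("no primary metadata entry found in repomd.xml")


if __name__ == "__main__":
    print(find_primary_href(SAMPLE_REPOMD))
```

The point of the design is simply that repomd.xml acts as the index: whatever name the repo gives its primary metadata file, the index says where to find it.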

(This update only impacts those who use the repo: designation for their repos.)

systems architecture update

Posted June 19th, 2009 by Administrator and filed in systems infrastructure

I have updated a page in the systems infrastructure area:

getting started: the what and the why

It explains the concept of a systems architecture and why you really want your shop to implement one. I have added a couple more examples and cleaned up some of the wording.

Enjoy.