a couple of new R packages: factualR and Segue
Two new R packages of note:
First, I have recently released factualR, which makes it easier for R researchers to work with data from Factual.com. If you want to pull Factual data sets into an ever-familiar data.frame, then you’re probably interested in factualR.
Second, I have joined the Segue project as a contributor. Segue is described as, “Parallel R In the Cloud, Two lines of code!” That means you get to use Amazon’s Elastic MapReduce as a parallel backend for lapply()-like operations.
(Note that Segue is best for jobs that are computation-bound, not necessarily data-bound. Running a scary Monte Carlo simulation? Testing fifty parameter variations on your wicked timeseries analysis? Give Segue a try!)
Enjoy.
New year, new toy: forqlift
I have a new project.
It’s called forqlift.
This one should be of use to people who crunch data with Hadoop or Mahout.
Here’s a bit of a blurb from forqlift’s page to get you started:
SequenceFiles are nice, but they can be unwieldy at times. I wrote forqlift to make it easier to manage SequenceFiles.
forqlift is a command-line tool that lets you:
- create SequenceFiles from files on your local filesystem (just like creating an archive with tar or zip)
- set compression (none, bzip2, gzip) and value types (text or binary)
- extract the contents of a SequenceFile back to the filesystem
- convert popular archive formats — tar (including tar.bz2 and tar.gz) and zip — to and from SequenceFile format
Head over to the forqlift page for more info!
novi 2.1.1 – Fedora 13 fixes
novi version 2.1.1 is now available in the downloads area.
This is a bugfix release (hence the small rev bump) that should help novi build on Fedora 13, which is bundled with a newer version of the RPM API.
novi v2.1.0 release
There’s a new novi release in the downloads area, v2.1.0.
This version uses the “primary” repo metadata file specified in repodata/repomd.xml. Previous versions assumed the primary data was in repodata/primary.xml.gz, which is not always the case. For example, some repos include the primary metadata file’s SHA1 hash in the filename.
(This update only impacts those who use the repo: designation for their repos.)
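To illustrate the lookup described above, here is a minimal Python sketch of finding the primary metadata file through repomd.xml instead of assuming a fixed filename. It is not novi's own code. The function name `primary_href` and the sample document (including its hash-style filename) are made up for illustration, and the XML namespace is the one commonly seen in createrepo-generated repos.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by createrepo-generated repomd.xml files
# (assumed here; check your own repo's repomd.xml).
REPO_NS = {"repo": "http://linux.duckdns.org/metadata/repo"}


def primary_href(repomd_xml: str) -> str:
    """Return the href of the 'primary' metadata file listed in repomd.xml."""
    root = ET.fromstring(repomd_xml)
    for data in root.findall("repo:data", REPO_NS):
        if data.get("type") == "primary":
            location = data.find("repo:location", REPO_NS)
            if location is not None and location.get("href"):
                return location.get("href")
    raise ValueError("no primary metadata entry found in repomd.xml")


# Hypothetical sample: the primary file carries a hash-like prefix,
# so a hard-coded repodata/primary.xml.gz would miss it.
SAMPLE = """<repomd xmlns="http://linux.duckdns.org/metadata/repo">
  <data type="primary">
    <location href="repodata/0123abcd-primary.xml.gz"/>
  </data>
  <data type="filelists">
    <location href="repodata/filelists.xml.gz"/>
  </data>
</repomd>
"""

if __name__ == "__main__":
    print(primary_href(SAMPLE))
```

The point of the design is that the repo, not the client, decides what the primary file is called. The client only needs to know where repomd.xml lives.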
systems architecture update
I have updated a page in the systems infrastructure area:
getting started: the what and the why
It explains the concept of a systems architecture and why you really want your shop to implement one. I have added a couple more examples and cleaned up some of the wording.
Enjoy.