Q Ethan McCallum
NOTE: forqlift is distributed as a standalone JAR. In most cases, you'll need only a Java Runtime Environment (JRE) 1.6 to run it.

NOTE FOR WINDOWS: If you're running forqlift under Windows, please make sure you have Cygwin installed (specifically, the chmod command). forqlift is otherwise a standalone product.

version 0.9.0, released 2012/12/15

Notes: forqlift 0.9.0 adds the following features:

  • inspect file: determine the types of data and the number of records in a SequenceFile
  • support for external data types: out of the box, forqlift supports base Hadoop Writable types, such as Text and BytesWritable. You can now add JARs to the forqlift install to support reading external Writable implementations, such as those from Hive or Mahout. (NOTE: forqlift will read and extract these data types, but it will not write them.)
  • clean up file names on extract: if your record keys are, say, URLs or file paths, forqlift will replace characters unsuitable for filesystems (such as & and /) with underscores (_)
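The cleanup rule can be sketched in a few lines of Python. This is an illustration of the idea only, not forqlift's actual code; the full set of characters replaced here is my assumption, beyond the & and / mentioned above:

```python
import re

def sanitize_key(key: str) -> str:
    """Replace characters that are unsafe in file names with underscores.

    Illustrative only: the & and / come from the release notes; the rest
    of the character set is a guess at what a filesystem would reject.
    """
    return re.sub(r'[&/\\:?*"<>|]', "_", key)

print(sanitize_key("http://example.com/a?x=1&y=2"))
# http___example.com_a_x=1_y=2
```

So a record keyed by a URL extracts to a single flat file name rather than an accidental directory tree.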

Please review the EXAMPLES.txt in the distribution for more details.

Downloads:

version 0.8.0, released 2011/05/31

Warning: This release is considered alpha-quality. Please use it only if you wish to try out experimental features and provide feedback. The previous release appears more stable.

Notes: forqlift 0.8.0 includes an experimental new feature to interact directly with HDFS. This means that, instead of writing a SequenceFile to a local disk and then pushing it out to your Hadoop cluster, you can write that file to (and read from) HDFS without that intermediate step.

If you're using forqlift on a local Hadoop cluster, this will save you some time and disk space. (If you're shuttling data to and from a remote Hadoop cluster, such as something on Amazon's EC2 or Elastic MapReduce, this feature is likely of little interest. Your best bet is to build the SequenceFile locally and upload it as usual.)

How it works

To enable forqlift's HDFS access, pass the --hadoopconfig flag and point it to a file that defines the fs.default.name property, typically core-site.xml. For example:

forqlift --hadoopconfig=conf/core-site.xml  --file=hdfs:///tmp/foo.seq 
forqlift --hadoopconfig=conf/core-site.xml  hdfs:///tmp/foo.seq 
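For reference, a minimal core-site.xml of the kind --hadoopconfig expects might look like the following. The NameNode hostname and port are placeholders; substitute your own cluster's values:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- placeholder host/port: point this at your cluster's NameNode -->
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```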

I admit, this is still a little raw. Over time I hope to polish this up and make it easier to use. Before I do that, though, I'd like to confirm that the core functionality of reading/writing directly to HDFS works. Please drop me a line to say whether it works for you: forqlift-questions at this domain. Many thanks.

Downloads:

version 0.7.1, released 2011/03/21

Notes: This release includes some mild UI enhancements, as well as some backend tweaks.

Downloads:

version 0.7.0, released 2011/01/30

Notes: This release includes a significant performance improvement in the fromarchive and toarchive commands, which convert a SequenceFile from and to the more common archive formats (tar, tar+gz, tar+bz2, zip), respectively. If these file conversions were very slow for you in the previous release, please try this one and let me know what you think.
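For a rough sense of usage, invocations might look like the lines below. The argument order here is my assumption, not confirmed syntax (only the --file flag appears elsewhere on this page); EXAMPLES.txt in the distribution has the authoritative forms:

```shell
# hypothetical syntax: convert an archive into a SequenceFile
forqlift fromarchive --file=data.seq data.tar.gz

# hypothetical syntax: convert a SequenceFile into an archive
forqlift toarchive --file=data.seq data.zip
```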

There's also a new --version flag that shows, among other things, the version of Hadoop used to build forqlift.

Downloads:

version 0.6, released 2011/01/25

Notes: This release addresses several small code issues. If you ran into a problem using the previous version of forqlift, please give this one a try and let me know how it works out for you.

There are also several adjustments that will be invisible to the end-user, but will pave the way for future plans. Finally, the EXAMPLES.txt file is included.

Downloads:

version 0.5, released 2011/01/01 (initial release)

Notes: The EXAMPLES.txt file didn't make it into this release. Please refer to the forqlift examples page on the website, instead.

Downloads: