forqlift examples

Posted January 1st, 2011 by Administrator
(all examples assume the forqlift command is in your path)
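
Since every example depends on that assumption, a quick check like the following may help before you start. This is only a sketch: check_in_path is a hypothetical helper name, not something forqlift provides.

```shell
# Report whether a command can be found on the current PATH.
# check_in_path is a hypothetical helper, not part of forqlift.
check_in_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}

check_in_path forqlift || echo "add forqlift's install directory to PATH first"
```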

forqlift’s syntax is similar to that of svn and other tools: you specify some action, followed by that action’s flags. For example:

forqlift create [... options for "create" ...]

get help / see options

To see help for all actions, run:

forqlift --help

To see help for a specific action:

forqlift [action] --help

(e.g., forqlift create --help)

create a SequenceFile

Inside the SequenceFile, each record will use the filename for the key and the file’s contents (as a Hadoop BytesWritable type) for the value.

forqlift create --file=/some/file.seq file1 file2 file3 /path/to/file4

create a SequenceFile, text data

This time, the value will be a Hadoop Text type, which means your Mapper and Reducer code can just fetch the contents as a big String. (If the value were still BytesWritable, you would have to first convert the raw bytes to text.)

forqlift create --file=/some/file.seq --data-type=text file1.txt file2.xml file3.txt /path/to/file4.xml

create a SequenceFile, compressed text data

Text tends to compress well. This can lead to big savings on bandwidth and storage, both of which are especially important if you’re on a slow line and/or you use cloud services, such as Amazon’s S3 or Elastic MapReduce.
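
To get a rough feel for how well text compresses, you can try the standalone gzip tool on a sample file. This sketch does not involve forqlift at all; the sample content and file names are made up for illustration, and real ratios depend on your data.

```shell
# Build a sample text file with repetitive, XML-like content.
workdir=$(mktemp -d)
sample="$workdir/sample.xml"
i=0
while [ "$i" -lt 2000 ]; do
    echo "<record id=\"$i\"><name>example</name><value>some repeated text</value></record>"
    i=$((i + 1))
done > "$sample"

# Compress a copy with gzip and compare byte counts.
gzip -c "$sample" > "$sample.gz"
raw_size=$(($(wc -c < "$sample")))
gz_size=$(($(wc -c < "$sample.gz")))
echo "raw: $raw_size bytes, gzip: $gz_size bytes"

# Remove the temporary files.
rm -rf "$workdir"
```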

As in the previous example, --data-type=text stores each value as a Hadoop Text type, so your Mapper and Reducer code can fetch the contents as a String.

forqlift create --file=/some/file.seq --data-type=text --compress=bzip2 /path/to/*.xml /another/path/*.txt

(NOTE: As of this writing, Elastic MapReduce supports bzip2 input files but does not support bzip2 compression on SequenceFiles. If you are targeting Elastic MapReduce, please use gzip compression instead.)
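
If you build SequenceFiles for more than one destination, the codec choice can be kept in one place. The sketch below assumes that gzip is spelled --compress=gzip (inferred from the note above, not confirmed here; check forqlift create --help). The choose_codec helper and the "emr" label are hypothetical.

```shell
# Pick a codec name for forqlift's --compress option.
# choose_codec and the "emr" target label are hypothetical, for illustration only.
# "gzip" as an accepted --compress value is an assumption based on the note above.
choose_codec() {
    case "$1" in
        emr) echo "gzip" ;;   # Elastic MapReduce: avoid bzip2 SequenceFiles
        *)   echo "bzip2" ;;  # elsewhere, bzip2 as in the example above
    esac
}

codec=$(choose_codec emr)
# Print the command instead of running it, so it can be reviewed first.
echo "forqlift create --file=/some/file.seq --data-type=text --compress=$codec /path/to/*.xml"
```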

list the contents of a SequenceFile

forqlift list /some/file.seq

extract the contents of a SequenceFile

Extract to current directory:
forqlift extract --file=/some/file.seq

Extract to another directory (paths will be created as needed):
forqlift extract --file=/some/file.seq --dir=/another/directory
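
The create, list, and extract actions can be combined into a quick round trip to see what ends up where. This is a sketch under assumptions: roundtrip_demo is a hypothetical helper, the sample files are made up, and where the extracted files land under the target directory depends on how forqlift stores the keys, so the last step simply lists whatever was written.

```shell
# roundtrip_demo is a hypothetical helper; it uses only the create, list,
# and extract invocations shown above.
roundtrip_demo() {
    if ! command -v forqlift >/dev/null 2>&1; then
        echo "forqlift not found in PATH; skipping" >&2
        return 2
    fi
    work=$(mktemp -d) || return 1
    mkdir "$work/out"
    printf 'hello\n' > "$work/a.txt"
    printf 'world\n' > "$work/b.txt"
    # Pack two small files, list the records, then unpack into a new directory.
    forqlift create --file="$work/demo.seq" "$work/a.txt" "$work/b.txt" &&
    forqlift list "$work/demo.seq" &&
    forqlift extract --file="$work/demo.seq" --dir="$work/out" &&
    find "$work/out" -type f   # show whatever extract wrote
}
```

Run roundtrip_demo by hand once forqlift is installed; it leaves its temporary directory in place so you can inspect the results.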

convert a zip or tar(.bz2, .gz) file to a SequenceFile

(NOTE: This is an experimental feature!)

Note that you can also use the --data-type and --compress options, if need be.

forqlift fromarchive --file=/some/file.seq somefile.tar

You can also squeeze multiple zip or tar files into a single SequenceFile:

forqlift fromarchive --file=/some/file.seq file1.zip file2.tar.bz2 file3.tar.gz file4.tar

convert a SequenceFile into zip or tar format

(NOTE: This is an experimental feature!)

forqlift toarchive --file=/some/file.tar.bz2 file1.seq

Or, create one archive from several SequenceFiles:

forqlift toarchive --file=/some/file.tar.bz2 file1.seq file2.seq file3.seq

pass flags to forqlift’s jvm (set memory, etc)

Use the FORQLIFT_OPTS environment variable. Its value is passed to the JVM that runs forqlift.

For example, to set forqlift’s JVM memory (heap size) to 512MB:

export FORQLIFT_OPTS="-Xmx512m"
forqlift .....
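
If you would rather not export the variable for the whole session, standard shell behavior lets you set it for a single command by placing the assignment in front of it. The sketch below uses sh -c as a stand-in for forqlift so that the effect is visible without forqlift installed.

```shell
# A variable assignment placed before a command applies to that command only.
# "sh -c" stands in for forqlift here so the effect is visible.
unset FORQLIFT_OPTS
seen_by_child=$(FORQLIFT_OPTS="-Xmx512m" sh -c 'echo "$FORQLIFT_OPTS"')
echo "child process saw: $seen_by_child"
echo "parent shell still has: ${FORQLIFT_OPTS:-<unset>}"
```

With forqlift itself, the same form would look like: FORQLIFT_OPTS="-Xmx512m" forqlift --version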

get information about forqlift’s version

Pass the --version flag to forqlift to see the project version and the version of Hadoop used to build forqlift.

forqlift --version
