forqlift examples
(All examples assume the forqlift command is on your PATH.)
forqlift’s syntax is similar to that of svn and other tools: you specify some action, followed by that action’s flags. For example:
forqlift create [... options for "create" ...]
get help / see options
To see help for all actions, run:
forqlift --help
To see help for a specific action:
forqlift [action] --help
(e.g., forqlift create --help)
create a SequenceFile
Inside the SequenceFile, each record will use the filename for the key and the file’s contents (as a Hadoop BytesWritable type) for the value.
forqlift create --file=/some/file.seq file1 file2 file3 /path/to/file4
create a SequenceFile, text data
This time, the value will be a Hadoop Text type, which means your Mapper and Reducer code can just fetch the contents as a big String. (If the value were still BytesWritable, you would have to first convert the raw bytes to text.)
forqlift create --file=/some/file.seq --data-type=text file1.txt file2.xml file3.txt /path/to/file4.xml
create a SequenceFile, compressed text data
Text tends to compress well. This can lead to big savings on bandwidth and storage, both of which are especially important if you’re on a slow line and/or you use cloud services, such as Amazon’s S3 or Elastic MapReduce.
As in the previous example, --data-type=text stores each value as a Hadoop Text type.
forqlift create --file=/some/file.seq --data-type=text --compress=bzip2 /path/to/*.xml /another/path/*.txt
(NOTE: As of this writing, even though Elastic MapReduce supports bzip2 input files, it does not support bzip2 compression on SequenceFiles. In that case, please use gzip compression.)
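The note above can be folded into a small wrapper script. The following is only a sketch: the function name pick_codec and the target labels "emr" and "other" are hypothetical, and gzip as a value for --compress is assumed from the note rather than confirmed against forqlift create --help. The script prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: choose a --compress value based on where the
# SequenceFile will be read. The labels "emr" and "other" are illustrative.
pick_codec() {
    case "$1" in
        emr) echo gzip ;;   # assumed value; the note recommends gzip for Elastic MapReduce
        *)   echo bzip2 ;;  # the value used in the example above
    esac
}

target="emr"
codec=$(pick_codec "$target")

# Print the command rather than running it, so it can be reviewed first.
echo forqlift create --file=/some/file.seq --data-type=text --compress="$codec" /path/to/file.xml
```

Check forqlift create --help for the compression values your build actually accepts before relying on this.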
list the contents of a SequenceFile
forqlift list /some/file.seq
extract the contents of a SequenceFile
Extract to current directory:
forqlift extract --file=/some/file.seq
Extract to another directory (paths will be created as needed):
forqlift extract --file=/some/file.seq --dir=/another/directory
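When several SequenceFiles need extracting, the command above can be wrapped in a loop. A minimal sketch, assuming each file should land in its own subdirectory named after the file; the helper name extract_cmd, the /data paths, and that output layout are hypothetical. It prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the extract command for one SequenceFile,
# sending its contents to <out_root>/<file name without .seq>.
extract_cmd() {
    seq_file=$1
    out_root=$2
    name=$(basename "$seq_file" .seq)
    echo "forqlift extract --file=$seq_file --dir=$out_root/$name"
}

# Illustrative input paths; substitute your own SequenceFiles.
for f in /data/part1.seq /data/part2.seq; do
    extract_cmd "$f" /another/directory
done
```

Paths containing spaces would need quoting before the printed commands are pasted into a shell.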
convert a zip or tar(.bz2, .gz) file to a SequenceFile
(NOTE: This is an experimental feature!)
Note that you can also use the --data-type and --compress options, if need be.
forqlift fromarchive --file=/some/file.seq somefile.tar
You can also squeeze multiple zip or tar files into a single SequenceFile:
forqlift fromarchive --file=/some/file.seq file1.zip file2.tar.bz2 file3.tar.bz file4.tar
convert a SequenceFile into zip or tar format
(NOTE: This is an experimental feature!)
forqlift toarchive --file=/some/file.tar.bz2 file1.seq
Or create one archive from several SequenceFiles:
forqlift toarchive --file=/some/file.tar.bz2 file1.seq file2.seq file3.seq
pass flags to forqlift’s jvm (set memory, etc)
Use the FORQLIFT_OPTS environment variable; its value is passed to the JVM.
For example, to set forqlift’s JVM memory (heap size) to 512MB:
export FORQLIFT_OPTS="-Xmx512m"
forqlift .....
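The shell also lets you set a variable for a single command only, instead of exporting it for the rest of the session. The sketch below illustrates the difference using sh -c as a stand-in for forqlift, so it runs even where forqlift is not installed. The stand-in and the variable names child, one_shot, after_one_shot, and after_export are illustrative only.

```shell
#!/bin/sh
# Stand-in for forqlift: a child process that reports the FORQLIFT_OPTS it inherits.
child='echo "child sees: ${FORQLIFT_OPTS:-unset}"'

unset FORQLIFT_OPTS

# Per-command form: the variable applies to this one invocation only.
one_shot=$(FORQLIFT_OPTS="-Xmx512m" sh -c "$child")
after_one_shot=$(sh -c "$child")

# Exported form: every later command in this shell session inherits it.
export FORQLIFT_OPTS="-Xmx512m"
after_export=$(sh -c "$child")

echo "$one_shot"        # child sees: -Xmx512m
echo "$after_one_shot"  # child sees: unset
echo "$after_export"    # child sees: -Xmx512m
```

The per-command form is handy for a one-off large job; the exported form suits a session in which every forqlift run needs the same JVM settings.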
get information about forqlift’s version
Pass the --version flag to forqlift to see the project version and the version of Hadoop used to build forqlift.
forqlift --version