
Q Ethan McCallum


charcuterie: a collection of oddments for Apache Pig


Welcome to charcuterie! This is a collection of tools for Apache Pig.

Right now it's my collection of UDFs (user-defined functions) for low-level text handling: stemming, tokenizing, computing term frequency, and extracting fields from HTML, XML, and email. As I build tools for whatever I'm working on at the time, though, it could grow to include pretty much any type of Pig helper.
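To give a feel for the kind of text handling described above, here is a minimal plain-Java sketch of tokenizing text and counting raw term frequency. It is not charcuterie's code: the class and method names (TermFrequencySketch, tokenize, termFrequency) are hypothetical, the tokenization rule (split on anything that isn't a letter or digit, then lowercase) is an assumption, and Pig is left out entirely. A real Pig UDF would wrap logic like this in a class that extends Pig's EvalFunc and implements its exec(Tuple) method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of tokenizing + term frequency; not charcuterie's implementation.
public class TermFrequencySketch {

    // Lowercase the text and split it on runs of characters that are
    // neither Unicode letters nor decimal digits (assumed tokenization rule).
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            // split() yields a leading empty string if the text starts with a separator
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Raw term frequency: how many times each token occurs.
    // A TreeMap keeps the terms sorted so the output is easy to read.
    public static Map<String, Integer> termFrequency(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "The pig, the PIG, and the pork.";
        // prints {and=1, pig=2, pork=1, the=3}
        System.out.println(termFrequency(tokenize(text)));
    }
}
```

Raw counts are the simplest form of term frequency; a UDF might instead normalize each count by the total number of tokens, which is a one-line change in termFrequency.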

charcuterie is released under the Apache License, Version 2.0.

downloads

source bundle (build with Maven)

binary (pre-built JAR)

How do I use it? / What are the included UDFs?

charcuterie isn't an application; it's a collection of Pig UDFs. You can invoke any (or all) of the UDFs from your Pig scripts or during an interactive session in the Grunt shell.

The project ships as an executable JAR, which you can run to get the list of included UDFs and see some examples:

# see list of UDFs
java -jar charcuterie.jar --list

# get an example for a particular UDF
java -jar charcuterie.jar --example [UDF name]

# see all examples
java -jar charcuterie.jar --all-examples

Where do I get it?

You can download charcuterie from my website:

http://www.qethanm.cc/go/charcuterie

I'd like to talk to you about charcuterie / I have more questions! / I've found a bug

Sure. Please send your message to:

charcuterie-questions -at- exmachinatech.net

Who wrote it?

charcuterie is a project by Q Ethan McCallum (@qethanm).

Why the name "charcuterie"?

The term "charcuterie" holds a few different meanings, depending on the context, but they all relate to foods derived from the pig.

Since this is my collection of Pig utilities, I wanted a name related to pig products.

I can't take credit for the name, though ... I didn't have any good ideas of my own, so I turned to Scott Robbin (@srobbin) and Ryan Briones (@ryanbriones) for help. We tossed around some pretty cool names, and Scott came up with the winner.