
Q Ethan McCallum



Free paper: Business Models for the Data Economy

Upcoming book: Making Analytics Work

Material that hasn’t yet been formally published lives on a separate page.

The rest of this page includes select material from formal print and electronic publications, plus certain guest posts I’ve written on others’ sites.

On Leadership

(September 2015)


Many people move from hands-on technical roles into leadership positions without formal management training. They often learn the hard way that becoming an engineering manager or analytics lead is not a natural progression of their technical skill set; instead, it requires them to develop a very different kind of muscle. In this O’Reilly Radar piece, a colleague and I offer guidance for newly minted leaders and leaders-to-be.

Business Models for the Data Economy

(October 2013)



The recent surge in data collection and analysis opens up a number of business models, though only a couple of them get much attention. Business Models for the Data Economy explores eight ways to add value and generate revenue in the world of data.

The O’Reilly Radar blog post “Building a Business on Data” describes the paper in greater detail. You can also download the paper for free through the O’Reilly catalog.

Steering the ship that is data science

(May 2013 - O’Reilly Radar)


This was the second in a set of O’Reilly Radar posts I co-authored with Mike Loukides (@mikeloukides). We explored some parallels between today’s data science boom and the late-1990s tech boom. In particular, we asked: how can data science reap the rewards of being the Hot New Thing while avoiding its pitfalls?

Leading Indicators

(April 2013 - O’Reilly Radar)


This was the first in a set of O’Reilly Radar posts I co-authored with Mike Loukides (@mikeloukides). We considered how someone on the outside, such as a prospective job candidate, might size up an organization’s data science efforts.

Bad Data Handbook: Mapping the World of Data Problems

(November 2012)



A road map of data problems and solutions. This book describes a variety of real-world data problems, from hands-on technical grunt work to high-level strategic issues.

I was the book’s editor, which means I was responsible for developing the concept and leading the project. I supported and coordinated the efforts of nineteen contributing authors. I also co-wrote a chapter, “Data Quality Analysis Demystified: Knowing When Your Data Is Good Enough”.

Parallel R: Data Analysis in the Distributed World

(October 2011)



Parallel R describes strategies for getting R to work in the Big Data era. In other words, Stephen Weston and I explain how to work past R’s limitations (being memory-bound and single-threaded) and let R run in a parallel, distributed manner suited to modern datasets.

The book covers well-known R packages for parallelism (Snow, Multicore, Parallel) as well as newer, Hadoop-related tools (RHIPE, Segue, Hadoop streaming). Much of my contribution explores how to mix R and Hadoop.

Managing RPM-Based Systems with Kickstart and Yum

(March 2007)


An exploration of automated builds and systems management using the Red Hat Kickstart and yum tools.

APR Networking & the Reactor Pattern

(2006/10/03 - Dr. Dobb’s Journal)

Introduction to Apache Portable Runtime (APR) networking. I use the classic Reactor pattern as an example.

What Is Jetty

(2006/06/14 - OnJava (O’Reilly Network))


A page from the O’Reilly “What Is” series, this article describes the Jetty servlet container and its underlying API. Jetty is designed with embedding in mind; that is, you can add webapp (servlet, JSP, web services) functionality to a Java application without having to repackage it as a formal WAR.

GNU Autoconf

(2005/12/09 - C/C++ Users Journal)

Use GNU Autoconf to simplify cross-platform builds of your native-code apps. Familiar with the standard ./configure; make; make install routine? Autoconf generates the configure script behind the ./configure step.

App-Managed DataSources with commons-dbcp

(2005/11/17 - Java.net)


I’m all for standards, such as J2EE’s container-managed database connection pooling. Sometimes, though, you have to take a different path. This article explains how to create a database connection pool inside your application using two Jakarta libraries, commons-pool and commons-dbcp.

Processing XML with Xerces and SAX

(2005/11/10 - OnLAMP (O’Reilly Network))


Second in a two-part series, this article explains how to use the SAX side of the (Apache) Xerces C++ library to process XML documents.

The Perl-Compatible Regular Expressions Library

(2005/09/28 - C/C++ Users Journal)

Want the power of Perl’s regular expressions (regexps) in your C and C++ apps? Use the Perl-Compatible Regular Expressions Library, or PCRE.

Processing XML with Xerces and the DOM

(2005/09/08 - OnLAMP (O’Reilly Network))


First in a two-part series, this article explains how to use the DOM side of the (Apache) Xerces C++ library to process XML documents.

Simplify Network Programming with libCURL

(2005/05/05 - Linux DevCenter)


The curl command-line tool is a Swiss Army knife of URL handling and downloading. Use its back-end library, libCURL, to add file-transfer power to your native-code applications.

Pre-Patched Kickstart Installs

(2005/02/17 - Linux DevCenter)


Third in a series, this article explains how to create a pre-patched Kickstart tree (that is, one with the updates already applied) and add some change control to your yum cron jobs.

Custom Containers & Iterators for STL-Friendly Code

(2005/02/15 - C/C++ Users Journal)

Many C++ STL containers look and act alike, but they don’t share a parent class. Learn how to extend existing containers or create new ones using STL “concepts,” a kind of loosely enforced polymorphism.

The Watchful Eye of FAM

(2004/12/16 - Linux DevCenter)


Watching for changes in a file or directory? Repeatedly polling it can be expensive. Let the File Alteration Monitor, or FAM, watch for you and report changes to your code.

Advanced Linux Installations and Upgrades with Kickstart

(2004/11/04 - Linux DevCenter)


Second in a series, this article shows how to customize your Kickstart process and leverage Kickstart for OS upgrades.

Migrating to Page Controllers

(2004/10/14 - OnLAMP)


Use the Page Controller pattern in your PHP web applications to separate business logic from the HTML.

Hands-Off Fedora Installs with Kickstart

(2004/08/19 - Linux DevCenter)


First in a series, this article is an introduction to the Kickstart automated OS-install tool for Linux. Why click through the installer a few (hundred) times? For Red Hat, Fedora, CentOS, and other RPM-based Linux distros, let Kickstart do the work so you can hang out at the pub.

Building a PHP Front Controller

(2004/07/08 - OnLAMP)


Apply the Front Controller design pattern to your PHP apps, and in return you’ll get a single entry point through which to apply common services (such as security or page templating).

Programming Linux 2.6

(2004/06/15 - Linux Magazine)

A review of the developer-oriented features in Linux kernel 2.6.

Changing a Program’s Identity

(2004/04/15 - Linux Magazine)

Learn how to safely use the setuid() and setgid() system calls to make your app change its identity at runtime.

Writing a Trace System

(2004/03/15 - Linux Magazine)

You can’t always use a debugger in production! Add a configurable trace (logging) system to your app so you can track down problems at runtime.

Software Packaging with RPM

(2004/02/15 - Linux Magazine)

The RPM is the unit of measurement for Red Hat Linux and its derivatives (Fedora, CentOS, and so on). Learn how to package your software as an RPM so you can take advantage of the OS’s package management system.