works in progress
The following titles are still underway, and publication dates are often TBA.
Published works are listed in the next section.
(title and topic soon to be revealed)
(publication topic and date TBA: currently in stealth mode)
Making Analytics Work: Case by Case
(publication date TBA)
This book will be a source of practical advice for CxOs, managers, and team leads who have been tasked with building an internal analytics practice.
Building on interviews with company leaders, we will explain how to build the analytics team, how to align data analysis with the company mission, how to choose the toolset, and so on.
Brett and I are still interviewing people, so please contact us if you’d like to be included in the book.
(untitled book on time series analysis)
(publication date TBA)
Project details to be announced.
(untitled work on systems architecture)
(publication date TBA)
My ramblings on how to lay out a shop’s technology infrastructure (work in progress)
This list includes select material from formal print and electronic publications, plus certain guest posts I’ve written on others’ sites.
Business Models for the Data Economy
The recent surge in data collection and analysis opens up a number of business models, though only a couple of them get much attention. Business Models for the Data Economy explores eight ways to add value and generate revenue in the world of data.
This blog post on O’Reilly Radar, “Building a Business on Data,” describes the paper in greater detail. You can also download it for free through the O’Reilly catalog.
Steering the ship that is data science
(May 2013 - O’Reilly Radar)
This was the second in a set of O’Reilly Radar posts I co-authored with Mike Loukides (@mikeloukides). We explored some parallels between today’s data science boom and the late-1990s tech boom. In particular, we asked: how can data science reap the rewards of being the Hot New Thing while avoiding its pitfalls?
(April 2013 - O’Reilly Radar)
This is the first in a set of O’Reilly Radar posts I co-authored with Mike Loukides (@mikeloukides). We pondered how to size up an organization’s data science efforts from the outside, perhaps as an interview candidate.
Bad Data Handbook: Mapping the World of Data Problems
A road map of data problems and solutions. This book describes various real-world data problems, from the hands-on technical grunt work to the high-level strategic issues.
I was the book’s editor, which means I was responsible for developing the concept and leading the project.
I supported and coordinated the efforts of nineteen contributing authors. I also co-wrote a chapter, “Data Quality Analysis Demystified: Knowing When Your Data Is Good Enough”.
Parallel R: Data Analysis in the Distributed World
Parallel R describes strategies for getting R to work in the Big-Data era. In other words, Stephen Weston and I explain how to work past R’s limitations – being memory-bound and single-threaded – and let R work in a parallel, distributed manner suited to modern datasets.
The book covers well-known R packages for parallelism (Snow, Multicore, Parallel) as well as newer, Hadoop-related tools (RHIPE, Segue, Hadoop streaming). Much of my contribution explores how to mix R and Hadoop.
Managing RPM-Based Systems with Kickstart and Yum
An exploration of automated builds and systems management, using the Red Hat Kickstart and yum tools.
APR Networking & the Reactor Pattern
(2006/10/03 - Dr. Dobb’s Journal)
Introduction to Apache Portable Runtime (APR) networking. I use the classic Reactor pattern as an example.
What Is Jetty
(2006/06/14 - OnJava (O’Reilly Network))
A page from the O’Reilly “What Is” series, this article describes the Jetty servlet container and its underlying API. Jetty is designed with embedding in mind; that is, you can add webapp (servlet, JSP, web services) functionality to a Java application without having to repackage it as a formal WAR.
(2005/12/09 - C/C++ Users Journal)
Use GNU Autoconf to simplify cross-platform builds of your native-code apps. Familiar with the standard ./configure; make; make install routine? Autoconf is what drives the
App-Managed DataSources with commons-dbcp
(2005/11/17 - Java.net)
I’m all for standards, such as J2EE’s container-managed database connection pooling. Sometimes, though, you have to take a different path. This article explains how to create a database connection pool inside your application using two Jakarta libraries, commons-pool and commons-dbcp.
Processing XML with Xerces and SAX
(2005/11/10 - OnLAMP (O’Reilly Network))
Second in a two-part series, this article explains how to use the SAX side of the (Apache) Xerces C++ library to process XML documents.
The Perl-Compatible Regular Expressions Library
(2005/09/28 - C/C++ Users Journal)
Want the power of Perl’s regular expressions (regexps) in your C and C++ apps? Use the Perl-Compatible Regular Expressions Library, or PCRE.
Processing XML with Xerces and the DOM
(2005/09/08 - OnLAMP (O’Reilly Network))
First in a two-part series, this article explains how to use the DOM side of the (Apache) Xerces C++ library to process XML documents.
Simplify Network Programming with libCURL
(2005/05/05 - Linux DevCenter)
The curl command-line tool is a Swiss Army knife of URL handling and downloading. Use its backend library, libCURL, to add file-transfer power to your native-code applications.
Pre-Patched Kickstart Installs
(2005/02/17 - Linux DevCenter)
Third in a series, this article explains how to create a pre-patched Kickstart tree (that is, one with the updates already applied) and add some change control to your
Custom Containers & Iterators for STL-Friendly Code
(2005/02/15 - C/C++ Users Journal)
Many C++ STL container objects look and act alike, but they don’t share a parent class. Learn how to extend existing containers or create new ones using STL’s “concepts,” a kind of loosely enforced polymorphism.
The Watchful Eye of FAM
(2004/12/16 - Linux DevCenter)
Watching for changes in a file or directory? Calling poll() can be expensive. Let the File Alteration Monitor, or FAM, watch for you and report results to your code.
Advanced Linux Installations and Upgrades with Kickstart
(2004/11/04 - Linux DevCenter)
Second in a series, this article shows how to customize your Kickstart process and leverage Kickstart for OS upgrades.
Migrating to Page Controllers
(2004/10/14 - OnLAMP)
Use the Page Controller pattern in your PHP web applications to separate business logic from the HTML.
Hands-Off Fedora Installs with Kickstart
(2004/08/19 - Linux DevCenter)
First in a series, this article is an introduction to the Kickstart automated OS-install tool for Linux. Why click through the installer a few (hundred) times? For Red Hat, Fedora, CentOS, and other RPM-based Linux distros, let Kickstart do the work so you can hang out at the pub.
Building a PHP Front Controller
(2004/07/08 - OnLAMP)
Apply the Front Controller design pattern to your PHP apps, and in return you’ll get a single entry point through which to apply common services (such as security or page templating).
Programming Linux 2.6
(2004/06/15 - Linux Magazine)
A review of the developer-oriented features in Linux kernel 2.6.
Changing a Program’s Identity
(2004/04/15 - Linux Magazine)
Learn how to safely use the setgid() system calls to make your app change its identity at runtime.
Writing a Trace System
(2004/03/15 - Linux Magazine)
You can’t always use a debugger in production! Add a configurable trace (logging) system to your app so you can track down problems at runtime.
Software Packaging with RPM
(2004/02/15 - Linux Magazine)
The RPM is the unit of measurement for Red Hat Linux and its derivatives (Fedora, CentOS, and so on). Learn how to package your software as an RPM, so you can take advantage of the OS’s package management system.