I’ve had some great talks with Mike Loukides (@mikeloukides). After our last chat, we decided to publish some of what we had discussed:
Last week, Mike took the first part and wrote up Leading Indicators, a look at how to size up an organization’s data science efforts from the outside, for example as a prospective interview candidate.
Today, in Steering the ship that is data science, I ask what today’s data science boom can learn from the late-1990s tech boom. In particular, how can data science reap the rewards of being the Hot New Thing while avoiding its pitfalls?
Earlier this year, I previewed some upcoming projects. Today, I can add one to the list (and scratch out one of the “TBD” placeholders on the publications page while I’m at it):
I’m very pleased to have paired up with Brett Goldstein to write a new book. The working title is Making Analytics Work: Case by Case. The book will be a source of practical advice for CxOs, managers, and team leads who have been tasked with building an internal analytics practice.
To do this, we’re going to interview people in key roles, to learn from their experiences. Maybe you’d like to contribute, as well?
Our post on today’s Strata Blog has more details on the book, and how to contact us.
I’ve just released a collection of UDFs (user-defined functions) for Apache Pig.
Please check out charcuterie for details.
Bad Data Handbook co-author Adam Laiacano recently posted a recap of his 2012 and preview of 2013. I thought this was a great way to round up several projects and details, and it made me realize that I’d done a poor job of announcing things here. So, Adam, allow me to borrow your idea and present my own recap-and-preview.
Truth be told, I rarely place any significance on the new year — if something needs to change, better to do it now than wait for some arbitrary date — but several projects wrapped late last year, leaving me some time at year-end to reflect and plot my next course.
2012 was quite a busy year! Here’s a quick roundup of things that were announced elsewhere, but not here:
- new book: shortly after Parallel R landed in late 2011, I started working on a new title. Bad Data Handbook landed in late 2012.
- text-mining fun with @ChicagoCDO and a team of civic-minded data folk. We even paired up on a Strata talk (“Text-mining Your City”) to share what we’d learned.
- speaking engagements, at local meetups and larger conferences
2013 looks like it will be even more fun (and busy):
- (another) new book: shortly after Bad Data Handbook landed (are you seeing a theme here?), I laid the foundation for a book on time series analysis. Joining me in this adventure is none other than noted R expert and xts author Jeff Ryan.
- more writing: I plan to release a set of short papers that have been sitting in the pipeline. Some of them will pair me up with Ken Gleason, with whom I co-wrote a chapter in Bad Data Handbook.
- software: continuing my themes of text mining and writing tools for data analysis, I plan to release some utilities for Apache Pig in the near future. Stay tuned for this and other project updates.
- speaking engagements: I’m already lining up some travel for the coming months. Perhaps I’ll visit your town? Time will tell …
- research & collaboration: I’m exploring some new subjects and avenues. Details forthcoming.
What’s most exciting about the future are the things I haven’t mentioned here, because I don’t yet know about them! Drop a line if there’s something I should know about. I’d be especially interested to hear about new opportunities to collaborate, new projects, and talks.
This new version of forqlift supports reading and extracting files that contain external Writable types, that is, implementations from Mahout, Hive, and other products that extend the base Hadoop
Details are available on the forqlift download page.