Communicating with Data: The Art of Writing for Data Science

A new book by Deborah Nolan and Sara Stoudt

David Aldous
May 23, 2021
The Voyager Golden Record (Wikipedia)

This is an unsolicited, enthusiastic recommendation for the new book by two Berkeley colleagues. Data science is perhaps unique in the following sense. A physicist writes technical papers to be read by other physicists, and the same holds for chemists, mathematicians, and so on. In contrast, a data scientist needs to communicate the results of analyses in a particular domain to readers who are familiar with that domain but lack detailed knowledge of data science techniques and their limitations.
So effective “communicating with data” is more important and challenging in data science than in other academic disciplines.

You might fear that a 300-page textbook on “technical writing” would be rather dry and sterile. But no! It offers a wide scope, from conceptual overviews of organization, to the presentation of graphics and pseudo-code, to detailed analyses of individual sentences and words. A recurrent theme is that one should learn how to write by critically reading what others have written. The book provides explicit coaching on how to do this, with many examples.

One noteworthy chapter illustrates the device of starting with a storyboard, which the authors describe as “a visual outline that informs a formal written outline”. They write:

The main goal of a visual outline is to identify the narrative. What problem exists? What did we do to solve that problem? Why does it matter that we have solved this problem? After we have identified the story and experimented with the order of details by rearranging panels of plots and text summaries, we can build a formal outline to tell the story we have identified.

A few of their “sentences and words” examples are:

Use consistent terminology:

we typically use the word “about” to refer to imprecise quantities that are naturally measured in whole units, and we reserve “approximately” for measurements that are given in fractions.

Don’t overuse cliches, like

We fit the distribution through the lens of a quantile-quantile plot.

Don’t mismatch words, like

Self-selection of respondents enhances bias.

Though mostly focused on writing for technical journals, the book includes one chapter on writing for the “broader public” (blogs or press releases), and Medium readers will find it interesting to compare the book’s advice with that from Medium itself.

Anyone in data science who engages with technical writing will surely benefit by having this book within easy reach.

David Aldous

After a research career at U.C. Berkeley, now focussed on articulating critically what mathematical probability says about the real world.