
An Imaginary Sabermetrics for Publishing

 

Empty Set by Verónica Gerber Bicecci, translated from the Spanish by Christina MacSweeney (Coffee House)

Although five books is most definitely a small sample size of throwaway proportions, out of the books that I’ve written about for this weekly “column,” Empty Set by Verónica Gerber Bicecci and translated from the Spanish by Christina MacSweeney is my favorite. I don’t know where it will stack up by the end of the year—there are a number of titles coming out this summer that I’m looking forward to, and as a gesture toward impartiality, I should really leave Fox, The Bottom of the Sky, The Endless Summer, and other Open Letter titles out of these evaluations—but for now I’d put it ahead of The Perfect Nanny, In Black and White, Frankenstein in Baghdad, and Theory of Shadows. (And that is how I would rank them, one to five.)

As you can probably predict, I’m not going to write a full, well-thought-out review for this book. If that’s what you want, I’d highly recommend checking out Lisa Fetchko’s review over at the Los Angeles Review of Books. She breaks the book down really well, and even gets into a particular translation issue about the use of _ in place of Yo(Y), which is also discussed in an afterword that will be of particular interest to translators or those interested in the translation—or editing of translations—process.

I’m going to use this book as an opportunity to write about something entirely different, but before I do that, I have two or three quick points.

1) I like the use of the charts in this book. I’ll come back to this in a few different ways down below, but drawings such as this one—which is preceded by, “Here’s where this story ends,” a statement that means more once you have reached the end—are what make this book unique.

 

And obviously, all the Venn Diagram charts are why I initially chose to read this book. Who doesn’t like a Venn Diagram?! This is one statement about math and statistics that everyone can agree on.

2) In a way, this is The Perfect Nanny for an entirely different set of readers. Written to be a blockbuster, The Perfect Nanny includes a lot of techniques and tropes and literary moments designed to make a certain set of readers feel comfortably stimulated. The set of readers (R-1) who prefer linear plots, heavy character development, detailed settings, psychological tension.

Empty Set generates an equal amount of reading comfort in a different set of readers (R-2) who feel more at ease in a text of evocative fragments, acrostics, plots like puzzles, and characters whom you don’t feel obligated to relate to.

For both R-1 and R-2 these books are equally successful in their approaches. And R-1 probably doesn’t care for Empty Set (“too confusing!” “I couldn’t relate to anyone!”), and vice-versa (“I’d rather see the movie”).

You could, I don’t know, draw a Venn Diagram of these two subsets of readers . . .

3) Not to take anything away from this novel, but wow have January and February been slow months for international literature. There doesn’t seem to have been anything buzzing on Book Twitter or Book Marks or in the blogosphere (does anyone say that anymore?) or at Winter Institute. I’ve written about the drop in translations in both of the past two months, but that was just focused on pure numbers, not quality or sales or impact or anything else. But looking back at what I have read, and forward to what’s on my docket, it feels like a pretty quiet year so far.

Although I’m personally hoping this New York Times review of Madame Nielsen’s The Endless Summer changes that, this still feels a lot like the current situation in Major League Baseball—the slowest in all of history—in which no free agents are being signed and nothing at all is happening. There are so many interesting explanations for this situation in which several of the game’s best players are currently unemployed: it could be collusion, it could be that clubs have more advanced understanding of the value available in the free agent market, it could be due to the fact that 1/3 of the teams are tanking in 2018 and another 1/2 aren’t really in a position to do anything but tread water, it could be because of the new collective bargaining agreement and traditional big spenders (LA Dodgers, NY Yankees) trying to reset their competitive balance assessments by getting under the spending threshold for one year, or it could have something to do with yachts. God bless Scott Boras!1

Anyway, this combination of thinking about baseball (how to best build a team, player valuations, etc.) + reading a novel centered around set theory2 + a stray comment I made in an earlier post —> an idea to try and create some core concepts for a sabermetric approach to the book industry.

*
     Sales(S)

This is an obvious building block. People usually value books based on how many copies they sold. “We sold 10,000 copies!” Or, “It was a best-seller in Mexico!”

(Not to be confused with “Print Run(PR),” which is a number based in hope that signifies nothing more than the publisher’s wish to sneakily manipulate the bookseller market. Print Run(PR) is equivalent to Scott Boras’s bullshit stats packages for players like Eric Hosmer who are hoping to receive contracts that are far larger than the value they’ll generate for their team. Print Runs(PR) are generally lies.)

Are sales really all that useful of a statistic though?

First off, the latter statement up there—repeated way too frequently in meetings with foreign agents—is crap. It’s descriptive, not objective, and lacks any and all context. How many books did this title beat out to become a best-seller? For how long was it a best-seller? How predictive is the Mexican best-seller list for a book entering other markets? Are the coefficients mapping it onto the French and U.S. markets radically different?

Another criticism: Sales in a vacuum takes into account none of the expenses involved with generating those sales. A book with a million dollar marketing budget that sells 100,000 copies is vastly different from a book that sells 100,000 based on a viral video that cost $.49 to make.

It also doesn’t take into account the list price of the book itself. It’s obviously way easier to sell 10,000 ebooks at $.99 than 10,000 hardcovers of a scholarly investigation into the sexual life of mollusks that lists for $149.

Sales is like batting average. A nice metric the average citizen can understand, but really not all that valuable.

Actually, that’s kind of a lie. Batting Average has values that most people can recognize as “good” (.280), “amazing” (.320), and “hall of fame” (.340+). What are the equivalents for books? If I tell the people sitting next to me at the bar that we sold 3,000 copies of a book, will they think that’s great? Or pathetic? Without a commonly accepted baseline—among the larger audience, not just book nerds—this doesn’t mean a whole lot.

And it doesn’t take into account the idea that a book is more than its purchases. Thought experiment: Which is better? A book that sells 10,000 copies, 2,000 of which are read, with 10 readers capable of recalling the book one year later, or a book that sells 1,500 copies, 1,000 of which are read, with 200 readers taking this to the grave? (A: If you’re Big Five it’s the former, if you’re nonprofit the latter. There is no unified theory of sales.)
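The thought experiment is small enough to run as a toy comparison. This is only a sketch: the figures come straight from the paragraph above, while the labels `blockbuster` and `slow_burn`, the function names, and the decision to lump "recalled a year later" and "taken to the grave" into one `remembered` column are assumptions made for illustration.

```python
# Figures from the thought experiment; the book labels are invented.
BOOKS = {
    "blockbuster": {"sold": 10_000, "read": 2_000, "remembered": 10},
    "slow_burn": {"sold": 1_500, "read": 1_000, "remembered": 200},
}


def best_by(metric: str) -> str:
    """Return the label of the book with the highest value for one metric."""
    return max(BOOKS, key=lambda label: BOOKS[label][metric])


def share_remembered(label: str) -> float:
    """Fraction of sold copies that ended up with a reader who remembers the book."""
    book = BOOKS[label]
    return book["remembered"] / book["sold"]


if __name__ == "__main__":
    # The "winner" changes depending on which column you sort by.
    for metric in ("sold", "read", "remembered"):
        print(metric, "->", best_by(metric))
```

Ranking by copies sold or copies read favors the first book; ranking by lasting readers favors the second, which is the whole point about there being no unified theory of sales.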

(Sales(S) x List Price(P)) x Readership(R) – Fixed Operating Expenses(FOE) – Printing(PR) – Author Payment(AP) – Translator Payment(TP) – Marketing Costs(MC) = True Profit(RP)

OK, so this is two steps in one: I’ve added in all the variables mentioned above (costs, list price), but then thrown in the idea of “Readership(R)” to try and point at the fact that the overall impact of a single printed book isn’t a one-to-one ratio with copies sold. On the most basic level, there are used copies. How many students a year buy used copies of The Great Gatsby for class? Or check it out from a library? A book’s true value, or “Profit” (capitalist term, I know), is always and forever greater than the number of printed copies.
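For anyone who wants to plug numbers into the True Profit equation, here is a minimal sketch. It assumes the readership term works as a readers-per-copy multiplier (1.0 means every copy has exactly one reader), which is one reading of the formula and not something spelled out above. Every dollar figure in the example is invented.

```python
def true_profit(sales, list_price, readership,
                fixed_operating, printing, author_payment,
                translator_payment, marketing):
    """Sales x list price x readership, minus each cost term in the equation."""
    gross = sales * list_price * readership
    costs = (fixed_operating + printing + author_payment
             + translator_payment + marketing)
    return gross - costs


# Entirely invented numbers, used only to show how the readership
# multiplier moves the result while the cash costs stay fixed.
example_costs = dict(fixed_operating=5_000, printing=6_000, author_payment=2_000,
                     translator_payment=4_000, marketing=3_000)
sold_only = true_profit(3_000, 16, 1.0, **example_costs)
with_used_and_library = true_profit(3_000, 16, 1.5, **example_costs)
print(sold_only, with_used_and_library)
```

Note that the multiplier inflates a dollar figure, so the result is no longer money in the bank. It is closer to "value generated," which is probably why cultural value keeps slipping out of the equation.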

We’re still missing a few things though: What about people who know about a book, yet don’t buy it? And what about the longevity of readership? It’s one thing to read Gone Girl and then keep on living, another to read Ulysses and have your life perspective changed. That Cultural Value(CV) isn’t captured here, and I’m not sure it ever can be quantified in this way. So let’s change tactics a bit.

((Expected Sales(ES) x List Price(P)) – ((Publishing Interest(PI) + Agent Status(AS)) – Total Expenses(TE))) = Cash Profit(CP) + Cultural Capital(CC)

If we really want to create a sabermetric approach to books, we have to look for exploitable inefficiencies in the marketplace. And my first inclination is that these inefficiencies come in two flavors: leveraging reputations against author advances and finding a way to decrease artist payments.

That’s not quite right though. Let me back up a bit and math this out.

In the early 2000s, there were no translations3 and there was a major gap between the best/most expensive translators (Margaret Jull Costa, Edith Grossman, Richard Howard, Gregory Rabassa) and everyone else. Without a middle class—and without competition—certain publishers saw an exploitable inefficiency. How much can you make when you pay $1,000 as an author advance, $1,000 to a grad student translator (“Hey, yo, we’re gonna like, launch your career!”), and can get $3,000+ from foreign agencies desperate for American publishers to acknowledge that their literature even existed? In that situation, you can flip 2,500 sales into a decent amount of money. That is the dirty truth of translation publishing in the early part of this century.
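The arithmetic behind that "decent amount of money" can be sketched in a few lines. The $1,000 advance, $1,000 translator fee, $3,000 subsidy (a floor, since the figure above is "$3,000+"), and 2,500 copies come from the paragraph above. The `net_per_copy` value is a pure assumption, and printing, distribution, and overhead are ignored entirely.

```python
def deal_margin(copies_sold, net_per_copy,
                author_advance=1_000, translator_fee=1_000,
                foreign_grant=3_000):
    """Rough margin on one subsidized translation.

    net_per_copy is a hypothetical figure for what the press keeps per
    copy sold; printing and overhead are deliberately left out.
    """
    surplus_before_sales = foreign_grant - author_advance - translator_fee
    return surplus_before_sales + copies_sold * net_per_copy


# Before a single copy sells, the subsidy already covers both payments.
print(deal_margin(0, 0))
# 2,500 copies at an assumed $4.00 net per copy.
print(deal_margin(2_500, 4.0))
```

The first call is the interesting one: with these inputs the book is in the black by $1,000 on day one, so every sale after that is margin.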

Then things changed! International lit got more popular. Translators got organized. Now, the idea of going overseas to find the best books that no one knows or cares about is complicated by the two dozen new presses trying to beat you there. And the combination of ethical obligations in relation to translator payments and agent involvement in raising author advances (good in the short term, maybe, and probably not in the long term, but that’s its own metric) has raised Total Expenses(TE) in an astronomical fashion, as well as altering the Agent Status(AS) (“I have the next Ferrante on my list . . .”) and the Publishing Interest(PI) (“We’re starting a new press and want in on the hot trends, so which book is the one that’s going to get us critical attention AND be most readable by the (R-1) readers of The Perfect Nanny?”). Increase the second half of the equation above while not changing the overall sales, and you’re going to kill your margins.

That doesn’t mean that publishers will stop pursuing books that are unlikely to earn back expenses. Look at Penguin paying a million dollars for a Knausgaard novel. There’s basically no way that he’ll earn that back in straight sales. Same with Knopf and Javier Marías. PRH can definitely expand the audiences for these authors, but there’s a ceiling. Even knowing that, they’re willing to go ahead because there’s a value just to having these names on your list. Reputation, cultural capital, whatever you want to call it, it’s part of this equation as well.

Expected Sales(ES) = Author Fans(AF) x Purchasing Coefficient(PC)

If someone were able to come up with an algorithm that was even 90% accurate in predicting sales, they would be in a position to basically print money. Longtime readers—or anyone involved in the book world—know that publishers don’t really do any market research. Unlike movies, there are no pre-release tracking figures for blockbuster titles. Sure, you can “have a pretty good sense” about how well a book is or isn’t going to sell, but outside of Harry Potter, James Patterson, and a handful of other brands, the error bars on predicted sales are really wide.

Past performance by the author and publisher is a major indicator of how a particular title will sell, so maybe this is something that could be calculated . . . Throw in a few sensible metrics about the author—Twitter Followers(TF), Reviewing Connections(RC), etc.—along with some sort of figures about the publisher—Sales Reps(REP), Average Reach(REA), Influencer Access(IA), etc.—and maybe you can come up with some sort of prediction.

(Pace of Reading(PAC) x Length(LEN)) x (Character Connections(CC) x Plot Points(PP)) x Buzz(BUZZ) = Reading Desirability(DES)

Amazon’s metrics about how fast people read various books, where they tend to stop, which titles are most/least likely to be read in their entirety, etc., totally freak literary people out. There are a ton of Silicon Valley people who would love to create a program that would use some complex algorithm to churn out best-selling book after best-selling book without any author’s involvement whatsoever. They would flood the market with exactly what most people want, all more or less for free, by utilizing some sort of textual analysis that combines all the typical plot elements of popular books (hero’s quest, typical plot structure of rising action, climax, denouement) with other quantifiable elements (language level, sentence and chapter length, number of chapters) that have been found to keep readers engaged and flipping pages.

Take all that, mix in some BUZZ (readers want to feel like they have to read a book so as to not be left out) and you can figure out how likely a book is to appeal to a wide audience.

Turnover(TO) x Cash Profit(CP) x Hipster Quotient(HQ) = Indie Stock(IND)

Bookstores actually have the ability to come up with a ton of different measurements, depending on what they want to track or evaluate. Sales per linear foot in given sections. How fast different subjects turn over. Average amount spent by a customer. Frequency of returning customers. There’s tons of data sitting right there that could be analyzed in a totally straightforward fashion.

But indie stores aren’t necessarily about efficiency in the way Barnes & Noble or Amazon would like to be. Part of their reason for being is tied to having the books that you don’t always find at the big box stores, to pushing a sort of aesthetic agenda that sets them apart. If, as a store owner, you could always know which books will both increase your coolness factor with your clientele and sell with the necessary velocity to keep you paying your rent, you’d be in the best spot possible. This might seem intuitive, but I think it can be a bit more complicated depending on how you value your reputation. For example, you may not want to carry Fifty Shades of Grey because you have standards, but that means you’re leaving a lot of money on the table. And carrying too many different titles that sell one time a year, yet make you seem like the smartest bookstore around, is a recipe for closure. Figuring out that balance—and which books maximize Cash Profit(CP) and Reputation(REP)—would be ideal.
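One way to see that balancing act is to score a shelf with the Indie Stock product. This is a sketch under loud assumptions: turnover is counted as sell-throughs per year, cash profit as dollars per copy, and the hipster quotient as a subjective 1 to 10 score. None of those scales is defined above, and the three titles and all their numbers are invented.

```python
from typing import NamedTuple


class Title(NamedTuple):
    name: str
    turnover: int          # assumed: sell-throughs per year (TO)
    cash_profit: int       # assumed: dollars of profit per copy (CP)
    hipster_quotient: int  # assumed: subjective 1-10 coolness score (HQ)


def indie_stock(title: Title) -> int:
    """Turnover x Cash Profit x Hipster Quotient."""
    return title.turnover * title.cash_profit * title.hipster_quotient


# Invented shelf: a fast seller with no cachet, a prestige title that
# sells once a year, and something in between.
shelf = [
    Title("mass-market hit", 20, 5, 1),
    Title("obscure masterpiece", 1, 5, 10),
    Title("buzzy translation", 6, 5, 8),
]

ranked = sorted(shelf, key=indie_stock, reverse=True)
for title in ranked:
    print(title.name, indie_stock(title))
```

Because the terms are multiplied, a title that is strong on only one axis scores poorly, and the in-between book comes out on top. Whether a product is the right way to combine cash and cool is its own argument.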

*
There are tons and tons of different types of equations one could come up with in hopes of finding exploitable inefficiencies. And that could be kind of fun! But so is ignoring data completely and publishing/reading/stocking a book just because it feels right.

Besides, a lot of this calculus is already done on a daily basis by most everyone. Even though it’s not quantified in a sortable, sharable way, people are constantly making these sorts of decisions. They may not think about them quite as honestly as they should though, and maybe something like a set of publishing sabermetric ideas could help publishers and stores be all that they could be. It’s fun to come up with various calculations, mostly because it makes you think about what you’re actually trying to measure, and why the measurements you might already have fall short. It can help define your mission, and by working in various intangible benefits, you can better justify various investments or decisions.

 

– – – – – – – – – – – – –

1 For anyone not willing to click through (and good on you!), here’s the amazing quote from super-agent Scott Boras:

The off-season is like the America’s Cup. We have 30 boats in the water. They take off and eventually they get to the free-agent docks. Normally, there are trade winds, and there are economic investments in the capacity of the boat, which allow those boats to get to the appropriate free-agent docks.

This year, there was a detour to Japan, where there was a $250 million asset available for $3 million (Ohtani). All boats went to Japan. Then they sailed back a good distance. They came to Florida and found a sinking ship and all of its cargo was in the water (Dee Gordon, Giancarlo Stanton, Marcell Ozuna, Christian Yelich). All teams tried to load it on their boats.

That took additional time. Then, as they moved forward to the free-agent docks, they found other ships dumping cargo—Pittsburgh and Tampa Bay and a few others—which then slowed their arrivals to the free-agent docks. So, trade winds, Japan, shipwreck in Florida, more cargo-spewing, all those things artificially delayed the arrivals to the free-agent docks.

 

Sorry, I have no idea—but I love it! More literary agents need to go off the rails when making random comments about the books they’re trying to auction. That would liven up book journalism!

2 Representative bit from Bicecci and MacSweeney’s Empty Set:

There isn’t much documented evidence of this, but during the military dictatorship in Argentina, teaching basic set theory was prohibited in schools. We know, for example, that a tomato belongs to the tomato(TO) set and not to onion(ON) or chilies(CH) or coriander(CO). Where’s the threat in reasoning like that? In set theory, tomatoes, onions, and chilies might realize they are different foodstuffs, but also that they have things in common, like the fact that they can all belong to the fresh hot salsa(FHS) set and, at the same time, to the Universe(U) of cultivated plants(CP), and might perhaps unite against some other set or Universe(U); for example, that of canned hot salsa(CAHS). In short, a community of vegetables. Venn diagrams are tools of the logic of sets. And from the perspective of sets, dictatorship makes no sense, because its aim is, for the most part, dispersal: separation, scattering, disunity, disappearance.

 

3 My sabermetric principles apply to BOOKS in general, not just translations, but I want to focus on exploiting this market since it might explain what’s going on in 2018 with the weird decrease in translation publications.

Although! Let me promise the four of you reading this that next month I’ll run some three- and five-year rolling average stats to avoid comparing 2018 to the Best Year Ever. I’ve been statistically irresponsible and I know it. Sorry.



One response to “An Imaginary Sabermetrics for Publishing”

  1. […] —Chad W. Post, “An Imaginary Sabermetrics for Publishing“ […]
