Month: April 2025

Assorted Links for Friday

Book Deals for Friday

NOTE: These are NOT recommendations UNLESS clearly indicated. For more information see About Book Deals.

Book Deals for Shakespeare


Book Deals for Thursday

Assorted Links for Wednesday

Book Deals for Wednesday

The 2025 AEJ Best Paper Awards

The 2025 AEJ Best Paper Awards have been announced. These are the papers that won. Not all of them are available through the AEJ to non-members; when I could not access a paper through the AEJ, I found a working-paper version from a different source.

Assorted Links for Tuesday

Book Deals for Tuesday

Ramblings on the Value of Books to an LLM

Vanity Fair has a fascinating article about Meta AI staffers claiming that the individual books used to train the company's LLM have no economic value. I won't rehash the article here; please read the whole thing.

I just want to question two of the claims made by Meta. First, one of Meta's expert witnesses claimed that the books had no value because a single book in LLM pretraining "adjusted its performance by less than 0.06% on industry standard benchmarks, a meaningless change no different from noise." Second, Meta claims that "for there to be a market, there must be something of value to exchange, but none of Plaintiffs' works has economic value, individually, as training data."

The Value of a Single Book

Is a single book that adjusted the performance of an LLM by less than 0.06% on industry standard benchmarks really of no value? This is Meta's own expert, so we will assume that impact on the standard benchmarks is a legitimate basis for value at some level (just not at 0.06%). Since the expert said only "less than 0.06%" rather than citing a lower figure, let's also assume the calculated number was close to that ceiling, say 0.05%. If a single book contributed 0.05%, wouldn't that mean that a mere 2,000 books would have a significant impact (~100%) on the standard benchmarks? That is not noise.
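
To make the back-of-the-envelope arithmetic concrete, here is a minimal Python sketch. It assumes, exactly as the paragraph does, that every book adds the same, independent benchmark gain and that those gains simply add up; real training effects are almost certainly not linear. The function name `books_to_reach` is hypothetical, and `Fraction` is used only to keep the percentages exact.

```python
from fractions import Fraction
import math


def books_to_reach(target_pct: str, per_book_pct: str) -> int:
    """Smallest whole number of books whose summed gains reach target_pct.

    Naive assumption (for illustration only): every book contributes the
    same gain, and gains add linearly with no overlap or diminishing returns.
    """
    per_book = Fraction(per_book_pct)
    if per_book <= 0:
        raise ValueError("per-book gain must be positive")
    # Divide the target by the per-book gain and round up to whole books.
    return math.ceil(Fraction(target_pct) / per_book)


if __name__ == "__main__":
    for gain in ("0.06", "0.05"):
        print(f"{gain}% per book -> {books_to_reach('100', gain)} books to reach ~100%")
```

Run as a script, this prints 1667 books at the 0.06% ceiling and 2000 books at the assumed 0.05%.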

I know this is a gross oversimplification, but it is directionally correct and shows that some number of books probably has a sizeable impact on the model. This means that, in the absence of fair use, the improvements the books made to the model would represent the authors' value to the model, and perhaps their cut of the model's value.

For Meta to say that a book has no value while still using millions of books (according to the article) is akin to a building contractor going to his supplier and getting 5,000 bricks for free because no single brick, on its own, has value to the project. Clearly the books have value, or Meta would not have used them. Or could Meta scrap all its models, rebuild them using no books, and still end up with models as good as the ones trained on books?

The Lack of a Market

Meta and the other AI companies are correct in saying that no market currently exists to efficiently license large numbers of individual books. But is there no market because the books have no value, or merely because books never needed to be valued and sold this way before? No individual or library has ever needed so many books for anything other than reading them or lending them out to be read. Of course, there would be knock-on effects for readers, and new writers would be influenced by these books. But no one could read millions of books, or even thousands, over a lifetime and monetize the results the way these models potentially can.

Creating a new type of market for an established product that is currently sold by licensing individual copies is challenging, and there are no easy answers. It's not like the radio and streaming industry, where the tragedy of the commons was handled, however inexactly, through deals with associations and other industry organizations. Their model, however it works, is based on songs being replayed and listened to again and again.

With AI models, training is potentially a one-time use of a book to help create something (or several things) that will continue to be profitable for some time to come. Additionally, models built from more books than any individual could read in a lifetime will not only be valuable for many other purposes; they could also be used to write stories that compete with human authors. I imagine that an organizational model akin to the music industry's may be required, but I really don't know how that would develop.

As in many things, I'm probably wrong, but I welcome your comments!