At what point do information artifacts cease to be copyrighted works of art and become patentable inventions? The distinction between aesthetics and utility would seem to clarify this, but in an information-based economy the two become less and less distinguishable. Code is data, data is code, and certain innocent prime numbers become illegal.

**Copyright** is the set of exclusive rights granted to the author or creator of an original work, including the right to copy, distribute and adapt the work. Copyright lasts for a certain time period after which the work is said to enter the public domain. Copyright applies to a wide range of works that are substantive and fixed in a medium.

In the United States, a **design patent** is a patent granted on the ornamental design of a functional item. Design patents are a type of industrial design right. Ornamental designs of jewelry, furniture, beverage containers (see Fig. 1) and computer icons are examples of objects that are covered by design patents.

Note that copyright versus design patent has nothing to do with the medium. A mass-produced copy of a marble sculpture can be copyrighted, and a mass-produced copy of a marble coffee cup would be design-patented. The critical feature, it would seem, is whether the art is attached to something useful or is valued purely for aesthetic purposes. However, this distinction is ill-defined. Modern art created objects of minimalist, utilitarian design that were also clearly works of art (see here). Furthermore, information can itself be a technology. The distinction between code (patentable) and data (copyrighted) has been dissolved by modern programming paradigms. Music is arguably a technology for manipulating emotional and cognitive state. Whether a coffee cup is a useful object or an art object changes depending on how many you sell and what your clients decide to do with them. Modern music seems to be made for production, to the point where some of it is legally in the same class as disposable coffee cups.

For every partition between the set of things covered by copyright and the set of things covered by design patents, I can construct an object that lies directly on this partition. This indicates that the two sets are connected and must share at least one object. Existing law acknowledges the intersection (e.g. the Statue of Liberty).

The distinction boils down to the number of bits representing the idea and the probability of collision between artifacts. There are still objects that clearly fall under copyright, and objects that clearly fall under design patent, but the digital economy has created ever more exceptions to this classification.

Design patents cover features with low information content and a high probability of collision. Alternatively (since we may be working with physical objects with an infinite amount of state), the similarity threshold for design patents is higher than that for copyrighted works. Generally speaking, design patents will cover the "gist" of the decorative appearance of objects, and will therefore contain far fewer bits than might be required to exactly reproduce the design.

Copyright tends to cover objects with high information content where collisions are improbable. Almost all written and printed works contain sufficient entropy that the probability of collision between two original works is near 0. The lyrics to the song "Happy Birthday" have exceptionally low information content as far as copyright goes, arguably somewhere close to 50 bits, using 1.0 bits per letter and assuming that the repetitive structure reduces essentially to the following expression, which is itself about 50 characters long:

*"eval('happybirthdaytoyou'\*4;s/toyou/dearname line3)"*

This gives p(collision) as 2^-50 for any single utterance. However, if each of the last 7 billion people to live uttered a mere 100,000 sentences of equivalent entropy over the course of their lifespan, you'd get roughly a 50% chance of at least one collision: the probability of no collision is (1-(1/2^50))^(7000000000*100000)=0.537, so the probability of a collision is about 0.46. Although this estimate is off, since not everyone on earth speaks English, most works are quite a bit longer than Happy Birthday and therefore will not collide in practice.
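The arithmetic above can be checked numerically. This is a minimal sketch that takes the 50-bit estimate and the population figures from the text at face value and treats every utterance as an independent, uniform draw (a simplifying assumption). The function name `p_no_collision` is made up for illustration. Note that the quantity raised to the power is the probability that no collision occurs; the collision probability is one minus that.

```python
import math

def p_no_collision(bits: int, trials: float) -> float:
    """Chance that none of `trials` independent uniform draws hits one fixed `bits`-bit target."""
    p = 2.0 ** -bits
    # log1p(-p) stays accurate when p is tiny, unlike log(1 - p).
    return math.exp(trials * math.log1p(-p))

people = 7_000_000_000          # figure from the text
sentences_per_person = 100_000  # figure from the text

none = p_no_collision(50, people * sentences_per_person)
print(f"P(no collision)        = {none:.3f}")
print(f"P(at least one collision) = {1 - none:.3f}")
```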

Incidentally, this gives a heuristic for the expiration of copyright: the length of time a copyright is granted can be proportional to the raw number of novel bits contained within the work. Perhaps an article could remain copyrighted until you reach a 50% probability of another human producing the same work by random chance. This creates the unfortunate incentive to simply append a lot of unrelated works together. Perhaps copyrighted material could be sectioned into substrings which provide no mutual information about each other, although this enters into a weirdness of information theory about which I am not equipped to reason.
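As a rough sketch of that heuristic, one can solve for the number of independent draws needed to reach a 50% collision chance and divide by a yearly rate of utterances. Everything here is an assumption for illustration: the function names, the uniform-draw model, and especially the 70-year lifespan used to turn "100,000 sentences per lifetime" into a yearly rate. The calculation works in log2 space and uses the approximation (1 - p)^n ≈ exp(-n·p), which is good when p = 2^-bits is tiny.

```python
import math

def log2_draws_for_collision(bits: float, target: float = 0.5) -> float:
    """log2 of the number of draws n at which P(collision) reaches `target`.

    From (1 - p)^n ~ exp(-n * p) with p = 2^-bits:  n = -ln(1 - target) * 2^bits.
    """
    return bits + math.log2(-math.log(1.0 - target))

def term_in_years(bits: float, draws_per_year: float, target: float = 0.5) -> float:
    """Years until the collision probability reaches `target` at a constant draw rate."""
    # For very long works this exponent overflows a float: the term is effectively "forever".
    return 2.0 ** (log2_draws_for_collision(bits, target) - math.log2(draws_per_year))

# Hypothetical rate: 7 billion people, 100,000 sentences each, spread over an assumed 70 years.
draws_per_year = 7_000_000_000 * 100_000 / 70

print(f"50-bit work: {term_in_years(50, draws_per_year):.0f} years")
print(f"60-bit work: {term_in_years(60, draws_per_year):.0f} years")
```

Each extra bit doubles the term under this model, so the term is exponential rather than proportional in the bit count; a proportional rule would need the logarithm of this quantity instead.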

The only sense I can make of the distinction is that design patents must be verified before production, and copyright violations are settled in expensive lawsuits after production. It makes economic sense to thoroughly check objects with a high probability of collision before mass production, but to be lazy with objects of a low probability of collision. Of course, for physical artifacts the number of effective bits is subjective and depends on how discerning the consumer is. Have you ever heard two people arguing for hours over the relative merits of two cars/shoes/operating-systems that appear functionally indistinguishable to you? In light of this, it almost makes more sense to let the author (yes, 3D objects with utility are works of authorship) decide which risk to take: an expensive, fixed-cost patent search, or a possible, prohibitively expensive lawsuit? I bet we can write a whole set of optimization equations to determine what is economically most favorable based on the probability of collision and the cost of resolving these collisions, but I need to go to sleep.
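The simplest version of that optimization is a one-line expected-cost comparison. The sketch below assumes a risk-neutral author, a search that removes all collision risk, and a single possible lawsuit; the function names and the dollar figures are hypothetical and only illustrate the shape of the trade-off.

```python
def cheaper_strategy(p_collision: float, search_cost: float, lawsuit_cost: float) -> str:
    """Pick the option with the lower expected cost under the stated assumptions."""
    expected_lawsuit_cost = p_collision * lawsuit_cost
    if search_cost < expected_lawsuit_cost:
        return "search first"
    return "publish and risk it"

def break_even_probability(search_cost: float, lawsuit_cost: float) -> float:
    """Collision probability at which both strategies cost the same in expectation."""
    return search_cost / lawsuit_cost

# Hypothetical costs: a 10,000 search versus a 1,000,000 lawsuit.
print(break_even_probability(10_000, 1_000_000))     # collision odds above this favor searching
print(cheaper_strategy(0.5, 10_000, 1_000_000))      # design-patent-like: collisions are likely
print(cheaper_strategy(2.0 ** -50, 10_000, 1_000_000))  # copyright-like: collisions are rare
```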

How does one tell if two files are "equal" up to the equivalence class of "copyright"? Computer files can be encoded in a multitude of ways, producing file strings of seemingly different information content. The key is that a file string, in union with a decoding method, makes up the information artifact. The file string itself is useless without a consensus decoding scheme. Large prime numbers are not illegal, but a specific large prime together with the decoding scheme "this is a zipped C source for a DVD decryption algorithm" becomes a copyright violation on the DVD decoding algorithm. This is much the same way that possessing a sequence of numbers, e.g. 4-8-15-16-23-42, is not in itself illegal, unless you also have the information "This is the combination to the safe at the bank downtown". For a working definition, let us say that two tuples (file, decoder) are equivalent if they must, at some point, be translated into the same string of bits to become usable, or if they ultimately compute the same function. Note that the general problem of determining if two algorithms compute the same function is undecidable, but most specific instances can be evaluated in finite time.
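The working definition can be illustrated with a small sketch: three different file strings, each paired with its own decoder, that all decode to the same bits. The byte string standing in for the work, the helper `int_to_bytes`, and the function `equivalent` are made up for this example; the third artifact is a plain integer (not necessarily prime) whose decoder reads it as a zlib stream, in the spirit of the illegal-prime trick.

```python
import base64
import zlib

# Hypothetical stand-in for a copyrighted work; any byte string would do.
work = b"happy birthday to you " * 4

def int_to_bytes(n: int) -> bytes:
    """Write a non-negative integer as big-endian bytes."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

# Three (file, decoder) tuples. The file strings look nothing alike.
artifacts = [
    (zlib.compress(work), zlib.decompress),
    (base64.b64encode(work), base64.b64decode),
    (int.from_bytes(zlib.compress(work), "big"),
     lambda n: zlib.decompress(int_to_bytes(n))),
]

def equivalent(a, b) -> bool:
    """Two (file, decoder) tuples are equivalent if they decode to the same bits."""
    (file_a, decode_a), (file_b, decode_b) = a, b
    return decode_a(file_a) == decode_b(file_b)

print(all(equivalent(artifacts[0], other) for other in artifacts))
print(artifacts[0][0] == artifacts[1][0])  # the raw file strings themselves differ
```

This only covers the "same string of bits" half of the definition; the "compute the same function" half cannot be checked by a general procedure, as the text notes.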

This problem is complicated by the advent of lossy compression algorithms. If I compress an audio book into a low bit rate Ogg Vorbis format, I will lose most of the original file information and cannot reconstruct it. We might be able to get away with "equivalent information content as appropriate for the medium", which for books might reasonably be defined as the string of text representing the work. This case would also cover transcribing the audio book to text using speech recognition. However, this definition is still vague. For instance, the quality of the narration in an audio book is arguably part of the product and would be lost with heavy compression schemes.

We might be able to get away with another definition: a lossy compression of a copyrighted work is a violation if the tuple (file, decoder) could not, with high probability, have been arrived at without access to the original file in some form. This can be formalized by saying that the compressed file contains a very specific subset of the information content of the original, and is effectively a copyright violation on this subset of information, but not on the whole file.

So, how many bits do you need for a copyright violation? How many bits do you need for a patent violation on an algorithm?

Another scenario, dealing with algorithm patents. An algorithm may be patented because it is exceptionally fast, or has some other desirable properties. So, say Alice has patented a super good face recognition algorithm with excellent accuracy and time constant, and Bob then develops a wholly different algorithm with the same speed and accuracy. Alice's algorithm was phrased in terms of digital signal processing, and Bob's algorithm used a lot of group theory and graph data structures. The patent office decides that these are clearly pretty different algorithms, and grants both patents. Later on, a mathematician comes along and says "Aha! By the following isomorphism, Alice's and Bob's face algorithms are mathematically equivalent", and furthermore "All possible algorithms with this time constant are isomorphic to Alice's and Bob's algorithms", and even worse "No one can do better, theoretically, than Alice or Bob". So, not only are Alice's and Bob's inventions provably identical, but all future inventions attempting to solve the problem will also be provably identical. How do you resolve this? Do you revoke the patent of whoever filed second? This seems pretty clearly unfair. It could get even worse: Alice's and Bob's algorithms could be, theoretically, identical in speed and accuracy, but the equivalence between them could be uncomputable. Is this really, legally, a different situation from the case where the equivalence is computable (provable)?