We live in an age of exponential growth in knowledge, and it is increasingly futile to teach only polished theorems and proofs. We must abandon the guided tour through the art gallery of mathematics, and instead teach how to create the mathematics we need. In my opinion, there is no long-term practical alternative.
—Richard Hamming
Like many electrical engineers, I grew up in a world shaped by Richard Hamming. My graduate thesis was based on coding theory; my early career on digital filters and numerical methods. I loved these subjects, took them for granted, and had no idea how they came to be.
I got my first hint upon encountering “You and Your Research,” Hamming’s electrifying sermon on why some scientists do great work, why most don’t, why he did, and why you should too. Most thrilling was Hamming’s vivid image of greatness, and its unapologetic pursuit. Not only was it allowable to aim for greatness, it was cowardly not to. It was the most inspirational thing I’d ever read.
The Art of Doing Science and Engineering is the full, beautiful expression of what “You and Your Research” sketched in outline. In this delightfully earnest parody of a textbook, chapters on “Digital Filters” and “Error-Correcting Codes” do not, in fact, teach those things at all, but rather exist to teach the style of thinking by which these great ideas were conceived.
This is a book about thinking. One cannot talk about thinking in the abstract, at least not usefully. But one can talk about thinking about digital filters, and by studying how great scientists thought about digital filters, one learns, however gradually, to think like a great scientist.
Among the most remarkable attributes of Hamming’s style of thinking is its sweeping range of scale. At the micro level, we see the close, deliberate examination of everyone and everything, from the choice of complex exponentials as basis functions to the number of jokes in an after-dinner speech. Nothing is taken for granted or left unquestioned, but is picked up and turned over curiously, intently, searching for insight.
And then, even in the same sentence, a zoom out to the wide shot, the subject contextualized against the broad scientific landscape, a link in the long chain of history. Just as nothing is taken for granted, nothing is taken in isolation.
For one accustomed to the myopia of day-to-day work in a field, so jammed against the swaggering parade of passing trends that one can hardly see beyond them or beneath them, such shifts in viewpoint are exhilarating—a reminder that information may be abundant but wisdom is rare.
After all, where today can the serious student of scientific creativity observe the master at work, short of apprenticeship? Histories are written for spectators; textbooks teach tools, not craft. We can gather small hints from the loose reflections of Hadamard, the lofty theories of Koestler, the tactical heuristics of Pólya and Altshuller.
But Hamming stands alone. Certainly for the thoroughness with which he presents his style of thinking, and more so for explicitly identifying style as a thing that exists and can be learned in the first place. But most of all, for his expectation—his insistence—that the reader is destined to join him in extending the arc of history, to become a great person who does great work. In this tour of scientific greatness, the reader is not a passenger, but a driver in training.
This book is filled with great people doing great work, and to fully appreciate Hamming’s expectations, it is valuable to consider exactly what he meant by “great people,” and by “great work.”
* * *
In Hamming’s world, great people do and the rest do not. Hamming’s heroes assume almost mythical stature, swashbuckling across the scientific frontier, generating history in their wake. And yet, it is well-known that science is a fundamentally social enterprise, advanced by the intermingling ideas and cumulative contributions of an untold mass of names which, if not lost to the ages, are at least absent from Wikipedia.
Hamming’s own career reflects this contradiction. He was employed essentially as a kind of internal mathematical consultant; he spent his days helping other people with their problems, often problems of a practical and mundane nature. Rather than begrudging this work, he saw it as the “interaction with harsh reality” necessary to keep his head out of the clouds, and at best, the continuous production of “micro-Nobel Prizes.” And most critically, all of his “great” work, his many celebrated inventions, grew directly out of these problems he was solving for other people.
Throughout, Hamming insisted on an open door, lunched with anyone he could learn from or argue with, stormed in and out of colleagues’ offices, and otherwise made indisputable the social dimensions of advancing a field.
Hamming’s conviction—indeed, obsession—was the opposite: that this greatness was less a matter of genius (or divinity), and more a kind of virtuosity. He saw these undeniably great figures as human beings that had learned how to do something, and by studying them, he could learn it too.
And he did, or so his narrative goes: against a background of colleagues more talented, more knowledgeable, more supported, better equipped along every axis, Hamming was the one to invent the inventions, found the fields, win the awards, and generally transcend his times.
And if he could, then so could anyone. Hamming was always as much a teacher as a scientist, and having spent a lifetime forming and confirming a theory of great people, he felt he could prepare the next generation for even greater greatness. That’s the premise and promise of this book.
Hamming-greatness is thus more a practice than a trait. This book is full of great people performing mighty deeds, but they are not here to be admired. They are to be aspired to, learned from, and surpassed.
* * *
Hamming leaves the definition of “great work” open, encouraging the reader to “pick your goals of excellence” and “strive for greatness as you see it.” Despite this apparent generosity, the sort of work that Hamming considered “great” had a distinct shape, and tracing that shape reveals the deepest message of the book.
There are many commonalities we can admire in these endeavors: the dazzling leap of imagination, the broad scope of applicability, the founding of a new paradigm. But let’s focus here on their form of distribution. These are all things that are taught. To “use” them means to learn them, understand them, internalize them, perform them with one’s own hands. They are free to any open mind.
In Hamming’s world, great achievements are gifts of knowledge to humanity.
Looking at this work in retrospect, it can be hard to imagine it having taken any other form. It’s possible to see these past gifts as inevitable, without realizing how much the present-day winds have shifted. Many of the bright young students who, in Hamming’s day, would have pursued a doctorate and a research career, today find themselves pursued by venture capital. Steve Jobs has replaced Einstein as cultural icon, and the brass ring is no longer the otherworldly Nobel Prize, but the billion-dollar acquisition. Universities are dominated by “technology transfer” activities, and engineering professors—even some in the sciences—are looked at askance if they aren’t running a couple startups on the side.
One can imagine the encouraged path, the inevitable-seeming path, for a present-day inventor of error-correcting codes: error-correcting.com, encoding as a service, all ingenuity hidden behind an api. Bait to be swallowed by some hulking multinational. Hamming wrote books.
It’s crucial to emphasize that the “great work” Hamming extols had no entrepreneurial component whatsoever. Gifts to humanity—to be learned by minds, not downloaded to phones.
As we read Hamming’s reminiscences of lunching with Shockley, Brattain, and Bardeen—lauded today as “inventors of the transistor”—let us keep in mind how they viewed themselves: “discoverers” of “transistor action.” A phenomenon, not a product; to be described, not owned and rented out. Their Nobel Prize reads “discovery of the transistor effect.” Hamming’s account lingers on the prize and is silent on the patents.
This book is adapted from a course that Hamming taught at the U.S. Naval Postgraduate School, to cohorts of “carefully selected navy, marine, army, air force, and coast guard students with very few civilians.” In a recording of the introductory lecture, we hear Hamming tell his students:
The Navy is paying a large sum of money to have you here, and it wants its money back by your later performance.
The United States Navy does not want its money back with an ipo. Regardless of one’s orientation toward military engineering, there is no question that these students are expected and expecting to serve the public interest.
Hamming-greatness is tied, inseparably, with the conception of science and engineering as public service. This school of thought is not extinct today, but it is rare, and doing such work is not impossible, but fights a nearly overwhelming current.
Yet, to Hamming, bad conditions are no excuse for bad work. You will find in these pages ample motivation to use your one life to do great work, work that transcends its times, regardless of the conditions you find yourself in.
As you proceed, I invite you to study not just Hamming’s techniques for achieving greatness, but the specific kind of greatness he achieved. I invite you to be inspired not just by Hamming’s success, but by his gifts to humanity, among the highest of which is surely this book itself.
Bret Victor
Cambridge, Massachusetts
December 2018
The Art of Doing Science and Engineering: Learning to Learn
© 2020 Stripe Press
CRC Press edition published 1996
Stripe Press edition published 2020
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording or any other information storage and retrieval system, without prior permission in writing from the publisher.
Mathematics proofreading for this edition by Dave Hinterman
Published in the United States of America by Stripe Press / Stripe Matter Inc.
Stripe Press
Ideas for progress
San Francisco, California
Printed by Hemlock in Canada
ebook design by Bright Wing Media
ISBN: 978-1-7322651-7-2
Fourth Edition
After many years of pressure and encouragement from friends, I decided to write up the graduate course in engineering I teach at the U.S. Naval Postgraduate School in Monterey, California. At first I concentrated on all the details I thought should be tightened up, rather than leave the material as a series of somewhat disconnected lectures. In class the lectures often followed the interests of the students, and many of the later lectures were suggested topics in which they expressed an interest. Also, the lectures changed from year to year as various areas developed. Since engineering depends so heavily these days on the corresponding sciences, I often use the terms interchangeably.
After more thought I decided that since I was trying to teach “style” of thinking in science and engineering, and since “style” is an art which can hardly be put into words, it is probably best taught the way the other arts are taught: through many examples, approached from many sides, once the fundamentals have been learned.
Since my classes are almost all carefully selected Navy, Marine, Army, Air Force, and Coast Guard students with very few civilians, and, interestingly enough, about 15 percent very highly selected foreign military, the students face a highly technical future—hence the importance of preparing them for their future and not just our past.
This course is mainly personal experiences I have had and digested, at least to some extent. Naturally one tends to remember one’s successes and forget lesser events, but I recount a number of my spectacular failures as clear examples of what to avoid. I have found that the personal story is far, far more effective than the impersonal one; hence there is necessarily an aura of “bragging” in the book that is unavoidable.
Let me repeat what I earlier indicated. Apparently an “art”—which almost by definition cannot be put into words—is probably best communicated by approaching it from many sides and doing so repeatedly, hoping thereby students will finally master enough of the art, or if you wish, style, to significantly increase their future contributions to society. A totally different description of the course is: it covers all kinds of things that could not find their proper place in the standard curriculum.
The casual reader should not be put off by the mathematics; it is only “window dressing” used to illustrate and connect up with earlier learned material. Usually the underlying ideas can be grasped from the words alone.
It is customary to thank various people and institutions for help in producing a book. Thanks obviously go to AT&T Bell Laboratories, Murray Hill, New Jersey, and to the U.S. Naval Postgraduate School, especially the Department of Electrical and Computer Engineering, for making this book possible.
This book is concerned more with the future and less with the past of science and engineering. Of course future predictions are uncertain and usually based on the past; but the past is also much more uncertain—or even falsely reported—than is usually recognized. Thus we are forced to imagine what the future will probably be. This course has been called “Hamming on Hamming” since it draws heavily on my own past experiences, observations, and wide reading.
There is a great deal of mathematics in the early part because almost surely the future of science and engineering will be more mathematical than the past, and also I need to establish the nature of the foundations of our beliefs and their uncertainties. Only then can I show the weaknesses of our current beliefs and indicate future directions to be considered.
If you find the mathematics difficult, skip those early parts. Later sections will be understandable provided you are willing to forgo the deep insights mathematics gives into the weaknesses of our current beliefs. General results are always stated in words, so the content will still be there but in a slightly diluted form.
The course is concerned with “style,” and almost by definition style cannot be taught in the normal manner by using words. I can only approach the topic through particular examples, which I hope are well within your grasp, though the examples come mainly from my 30 years in the mathematics department of the Research Division of Bell Telephone Laboratories (before it was broken up). They also come from years of study of the work of others.
For this course to be as effective as I have found it to be, I must use mainly firsthand knowledge, which implies I break a standard taboo and talk about myself in the first person, instead of the traditional impersonal way of science. You must forgive me in this matter, as there seems to be no other approach which will be as effective. If I do not use direct experience, then the material will probably sound to you like merely pious words and have little impact on your minds, and it is your minds I must change if I am to be effective.
This talking about first-person experiences will give a flavor of “bragging,” though I include a number of my serious errors to partially balance things. Vicarious learning from the experiences of others saves making errors yourself, but I regard the study of successes as being basically more important than the study of failures. As I will several times say, there are so many ways of being wrong and so few of being right that studying successes is more efficient, and furthermore, when your turn comes you will know how to succeed rather than how to fail!
I am, as it were, only a coach. I cannot run the mile for you; at best I can discuss styles and criticize yours. You know you must run the mile if the athletics course is to be of benefit to you—hence you must think carefully about what you hear or read in this book if it is to be effective in changing you—which must obviously be the purpose of any course. Again, you will get out of this course only as much as you put in, and if you put in little effort beyond sitting in the class or reading the book, then it is simply a waste of your time. You must also mull things over, compare what I say with your own experiences, talk with others, and make some of the points part of your way of doing things.
Since the subject matter is “style,” I will use the comparison with teaching painting. Having learned the fundamentals of painting, you then study under a master you accept as being a great painter; but you know you must forge your own style out of the elements of various earlier painters plus your native abilities. You must also adapt your style to fit the future, since merely copying the past will not be enough if you aspire to future greatness—a matter I assume, and will talk about often in the book. I will show you my style as best I can, but, again, you must take those elements of it which seem to fit you, and you must finally create your own style. Either you will be a leader or a follower, and my goal is for you to be a leader. You cannot adopt every trait I discuss in what I have observed in myself and others; you must select and adapt, and make them your own if the course is to be effective.
Even more difficult than what to select is that what is a successful style in one age may not be appropriate to the next age! My predecessors at Bell Telephone Laboratories used one style; four of us who came in all at about the same time, and had about the same chronological age, found our own styles, and as a result we rather completely transformed the overall style of the mathematics department, as well as many parts of the whole Laboratories. We privately called ourselves “the four young Turks,” and many years later I found top management had called us the same!
I return to the topic of education. You all recognize there is a significant difference between education and training.
Education is what, when, and why to do things.
Training is how to do it.
It is often claimed that knowledge now doubles about every 17 years, and that 90% of the scientists who ever lived are now alive. To see how far these two claims are compatible, suppose the number of scientists at any time is proportional to the amount of knowledge, and the amount of knowledge produced annually has a constant k of proportionality to the number of scientists alive. Assuming we begin at minus infinity in time (the error is small and you can adjust it to Newton’s time if you wish), we have the formula

$$y(t) = ae^{bt}, \qquad \text{with} \qquad e^{17b} = 2,$$

hence we know b. Now to the other statement. If we allow the lifetime of a scientist to be 55 years (it seems likely that the statement meant living and not practicing, but excluding childhood) then we have

$$\frac{\int_{T-55}^{T} e^{bt}\,dt}{\int_{-\infty}^{T} e^{bt}\,dt} = 1 - e^{-55b} = 1 - 2^{-55/17} = 0.894\ldots,$$

which is very close to 90%.
Typically the first back-of-the-envelope calculations use, as we did, definite numbers where one has a feel for things, and then we repeat the calculations with parameters so you can adjust things to fit the data better and understand the general case. Let the doubling period be D and the lifetime of a scientist be L. The first equation now becomes

$$e^{bD} = 2, \qquad \text{that is,} \qquad b = \frac{\ln 2}{D},$$

and the second becomes

$$1 - e^{-bL} = 1 - 2^{-L/D} = 0.9, \qquad \text{hence} \qquad \frac{L}{D} = \log_2 10 = 3.3219\ldots
$$
With D = 17 years we have 17 × 3.3219 = 56.47… years for the lifetime of a scientist, which is close to the 55 we assumed. We can play with the ratio of L/D until we find a slightly closer fit to the data (which was approximate, though I believe more in the 17 years for doubling than I do in the 90%). Back-of-the-envelope computing indicates the two remarks are reasonably compatible. Notice the relationship applies for all time so long as the assumed simple relationships hold.
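For readers who want to repeat the back-of-the-envelope arithmetic, a minimal sketch in Python follows; the 17-year doubling and 55-year lifetime are the figures assumed above, and the code itself is illustrative, not part of the original lectures.

```python
import math

D = 17.0   # assumed doubling period of knowledge, in years
L = 55.0   # assumed working lifetime of a scientist, in years

b = math.log(2) / D                    # growth rate, from e^(b*D) = 2
fraction_alive = 1 - math.exp(-b * L)  # share of all scientists who ever lived who are alive now
print(f"fraction now alive: {fraction_alive:.3f}")    # about 0.894, close to 90%

# Conversely, the lifetime that would make the fraction exactly 90%:
ratio = math.log(10) / math.log(2)     # L/D = log2(10) = 3.3219...
print(f"L/D = {ratio:.4f}, so L = {ratio * D:.2f} years")   # about 56.47 years
```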
Added to the problem of the growth of new knowledge is the obsolescence of old knowledge. It is claimed by many the half-life of the technical knowledge you just learned in school is about 15 years—in 15 years half of it will be obsolete (either we will have gone in other directions or will have replaced it with new material). For example, having taught myself a bit about vacuum tubes (because at Bell Telephone Laboratories they were at that time obviously important) I soon found myself helping, in the form of computing, the development of transistors—which obsoleted my just-learned knowledge!
To bring the meaning of this doubling down to your own life, suppose you have a child when you are x years old. That child will face, when it is in college, about y times the amount you faced.
| y (factor of increase) | x (years) |
|---|---|
| 2 | 17 |
| 3 | 27 |
| 4 | 34 |
| 5 | 39 |
| 6 | 44 |
| 7 | 48 |
| 8 | 51 |
This doubling is not just in theorems of mathematics and technical results, but in musical recordings of Beethoven’s Ninth, of where to go skiing, of tv programs to watch or not to watch. If you were at times awed by the mass of knowledge you faced when you went to college, or even now, think of your children’s troubles when they are there! The technical knowledge involved in your life will quadruple in 34 years, and many of you will then be near the high point of your career. Pick your estimated years to retirement and then look in the left-hand column for the probable factor of increase over the present current knowledge when you finally quit!
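The table is simply x = D log₂ y with D = 17 years; a few illustrative lines of Python (not part of the original text) reproduce it:

```python
import math

D = 17  # assumed doubling period of knowledge, in years
for y in range(2, 9):            # factor of increase your child will face
    x = D * math.log2(y)         # age at which you have the child (years between your college days and theirs)
    print(f"factor {y}: about {x:.0f} years")
```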
I need to discuss science vs. engineering. Put glibly:
In science, if you know what you are doing, you should not be doing it.
In engineering, if you do not know what you are doing, you should not be doing it.
Of course, you seldom, if ever, see either pure state. All of engineering involves some creativity to cover the parts not known, and almost all of science includes some practical engineering to translate the abstractions into practice. Much of present science rests on engineering tools, and as time goes on, engineering seems to involve more and more of the science part. Many of the large scientific projects involve very serious engineering problems—the two fields are growing together! Among other reasons for this situation is almost surely that we are going forward at an accelerated pace, and now there is not time to allow us the leisure which comes from separating the two fields. Furthermore, both the science and the engineering you will need for your future will more and more often be created after you have left school. Sorry! But you will simply have to actively master on your own the many new emerging fields as they arise, without having the luxury of being passively taught.
It should be noted that engineering is not just applied science, which is a distinct third field (though it is not often recognized as such) which lies between science and engineering.
I read somewhere there are 76 different methods of predicting the future—but the very number suggests there is no reliable method which is widely accepted. The most trivial method is to predict tomorrow will be exactly the same as today—which at times is a good bet. The next level of sophistication is to use the current rates of change and to suppose they will stay the same—linear prediction in the variable used. Which variable you use can, of course, strongly affect the prediction made! Neither method is much good for long-term predictions, however.
The past was once the future and the future will become the past.
In any case, I will often use history as a background for the extrapolations I make. I believe the best predictions are based on understanding the fundamental forces involved, and this is what I depend on mainly. Often it is not physical limitations which control but rather it is human-made laws, habits, and organizational rules, regulations, personal egos, and inertia which dominate the evolution to the future. You have not been trained along these lines as much as I believe you should have been, and hence I must be careful to include them whenever the topics arise.
There is a saying, “Short-term predictions are always optimistic and long-term predictions are always pessimistic.” The reason, so it is claimed, the second part is true is that for most people the geometric growth due to the compounding of knowledge is hard to grasp. For example, for money, a mere 6% annual growth doubles the money in about 12 years! In 48 years the growth is a factor of 16. An example of the truth of this claim that most long-term predictions are low is the growth of the computer field in speed, in density of components, in drop in price, etc., as well as the spread of computers into the many corners of life. But the field of artificial intelligence (ai) provides a very good counterexample. Almost all the leaders in the field made long-term predictions which have almost never come true, and are not likely to do so within your lifetime, though many will in the fullness of time.
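The compound-interest arithmetic is easy to check; an illustrative two-line sketch:

```python
rate = 0.06                  # 6% annual growth
print((1 + rate) ** 12)      # ~2.01: the money roughly doubles in 12 years
print((1 + rate) ** 48)      # ~16.4: four doublings in 48 years
```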
Reading some historians you get the impression the past was determined by big trends, but you also have the feeling the future has great possibilities. You can handle this apparent contradiction in at least four ways:
It is probable the future will be more limited by the slow evolution of the human animal and the corresponding human laws, social institutions, and organizations than it will be by the rapid evolution of technology.
In spite of the difficulty of predicting the future and that
unforeseen technological inventions can completely upset the most careful predictions,
you must try to foresee the future you will face. To illustrate the importance of this point of trying to foresee the future I often use a standard story.
The story is of the drunken sailor who staggers this way and that at random: after n independent steps he will be, on average, only about √n steps from the origin. But if there is a pretty girl in one direction, then his steps will tend to go in that direction and he will go a distance proportional to n. In a lifetime of many, many independent choices, small and large, a career with a vision will get you a distance proportional to n, while no vision will get you only the distance √n. In a sense, the main difference between those who go far and those who do not is some people have a vision and the others do not and therefore can only react to the current events as they happen. No vision, not much of a future.
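A quick simulation makes the √n-versus-n contrast concrete. The sketch below is only an illustration of the scaling: an ordinary one-dimensional random walk, and the same walk with a small bias standing in for “vision”; the step count, bias, and trial count are arbitrary choices.

```python
import random

def average_distance(n, bias=0.0, trials=500):
    """Average distance from the origin after n unit steps;
    bias is the extra probability of stepping in the chosen direction."""
    total = 0.0
    for _ in range(trials):
        position = 0
        for _ in range(n):
            position += 1 if random.random() < 0.5 + bias else -1
        total += abs(position)
    return total / trials

n = 10_000
print("no vision  :", average_distance(n))             # grows like sqrt(n); about 80 here
print("with vision:", average_distance(n, bias=0.05))  # grows like n; about 1,000 here
print("sqrt(n)    =", n ** 0.5)
```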
To what extent history does or does not repeat itself is a moot question. But it is one of the few guides you have, hence history will often play a large role in my discussions—I am trying to provide you with some perspective as a possible guide to create your vision of your future. The other main tool I have used is an active imagination in trying to see what will happen. For many years I devoted about 10% of my time (Friday afternoons) to trying to understand what would happen in the future of computing, both as a scientific tool and as a shaper of the social world of work and play. In forming your plan for your future you need to distinguish three different questions:
What is possible?
What is likely to happen?
What is desirable to have happen?
In a sense the first is science—what is possible. The second is engineering—what are the human factors which choose the one future that does happen from the ensemble of all possible futures. The third is ethics, morals, or whatever other word you wish to apply to value judgments. It is important to examine all three questions, and insofar as the second differs from the third, you will probably have an idea of how to alter things to make the more desirable future occur, rather than let the inevitable happen and suffer the consequences. Again, you can see why having a vision is what tends to separate the leaders from the followers.
The standard process of organizing knowledge by departments, and sub-departments, and further breaking it up into separate courses, tends to conceal the homogeneity of knowledge, and at the same time to omit much which falls between the courses. The optimization of the individual courses in turn means a lot of important things in engineering practice are skipped since they do not appear to be essential to any one course. One of the functions of this book is to mention and illustrate many of these missed topics which are important in the practice of science and engineering. Another goal of the course is to show the essential unity of all knowledge rather than the fragments which appear as the individual topics are taught. In your future anything and everything you know might be useful, but if you believe the problem is in one area you are not apt to use information that is relevant but which occurred in another course.
| Area | Computers compared with humans |
|---|---|
| Economics | Far cheaper, and getting more so |
| Speed | Far, far faster |
| Accuracy | Far more accurate (precise) |
| Reliability | Far ahead (many have error correction built into them) |
| Rapidity of control | Many current airplanes are unstable and require rapid computer control to make them practical |
| Freedom from boredom | An overwhelming advantage |
| Bandwidth in and out | Again overwhelming |
| Ease of retraining | Change programs, not unlearn and then learn the new thing, consuming hours and hours of human time and effort |
| Hostile environments | Outer space, underwater, high-radiation fields, warfare, manufacturing situations that are unhealthful, etc. |
| Personnel problems | They tend to dominate management of humans but not of machines; with machines there are no pensions, personal squabbles, unions, personal leave, egos, deaths of relatives, recreation, etc. |
I need not list the advantages of humans over computers—almost every one of you has already objected to this list and has in your mind started to cite the advantages on the other side.
The unexamined life is not worth living.
We are approaching the end of the revolution of going from signaling with continuous signals to signaling with discrete pulses, and we are now probably moving from using pulses to using
Why has this revolution happened?
Compare this to discrete signaling. At each stage we do not amplify the signal, but rather we use the incoming pulse to gate, or not, a standard source of pulses; we actually use repeaters, not amplifiers. Noise introduced at one spot, if not too much to make the pulse detection wrong at the next repeater, is automatically removed. Thus with remarkable fidelity we can transmit a voice signal if we use digital signaling, and furthermore the equipment need not be built extremely accurately. We can use, if necessary, error-detecting and error-correcting codes to further defeat the noise. We will examine these codes later, Chapters 10–12. Along with this we have developed the area of digital filters, which are often much more versatile, compact, and cheaper than are analog filters, Chapters 14–17. We should note here transmission through space (typically signaling) is the same as transmission through time (storage).
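A toy calculation (illustrative only; the noise level, number of sections, and decision threshold are invented) shows the essential difference: an analog line carries its accumulated noise through every amplifier, while a digital repeater throws the noise away at each stage so long as the noise never pushes a pulse across the decision threshold.

```python
import random

random.seed(1)
SECTIONS = 50    # repeater/amplifier sections along the line (illustrative)
NOISE = 0.05     # noise added in each section (illustrative)

# Analog: each amplifier restores the signal level, but the noise already
# picked up is amplified and carried along with it, section after section.
accumulated_noise = 0.0
for _ in range(SECTIONS):
    accumulated_noise += random.uniform(-NOISE, NOISE)
print(f"analog: noise after {SECTIONS} sections is about {abs(accumulated_noise):.3f}")

# Digital: each repeater decides 0 or 1 and emits a fresh standard pulse,
# so the noise is discarded, unless it is ever large enough to flip a decision.
bit, errors = 1, 0
for _ in range(SECTIONS):
    received = bit + random.uniform(-NOISE, NOISE)
    bit = 1 if received > 0.5 else 0        # regenerate a standard pulse
    errors += (bit != 1)
print(f"digital: decision errors after {SECTIONS} repeaters = {errors}")
```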
Digital computers can take advantage of these features and carry out very deep and accurate computations which are beyond the reach of analog computation. Analog computers have probably passed their peak of importance, but should not be dismissed lightly. They have some features which, so long as great accuracy or deep computations are not required, make them ideal in some situations.
The invention and development of transistors and integrated circuits, ics, has greatly helped the digital revolution. Before ics the problem of soldered joints dominated the building of a large computer, and ics did away with most of this problem, though soldered joints are still troublesome. Furthermore, the high density of components in an ic means lower cost and higher speeds of computing (the parts must be close to each other since otherwise the time of transmission of signals will significantly slow down the speed of computation). The steady decrease of both the voltage and current levels has contributed to the partial solving of heat dissipation.
| Level of interconnection | Approximate cost |
|---|---|
| Interconnection on the chip | $10⁻⁵ = 0.001 cent |
| Interchip | $10⁻² = 1 cent |
| Interboard | $10⁻¹ = 10 cents |
| Interframe | $10⁰ = 100 cents |
Society is steadily moving from a material goods society to an information service society. At the time of the American Revolution, say 1780 or so, over 90% of the people were essentially farmers—now farmers are a very small percentage of workers. Similarly, before wwii most workers were in factories—now less than half are there. In 1993, there were more people in government (excluding the military) than there were in manufacturing! What will the situation be in 2020? As a guess I would say less than 25% of the people in the civilian workforce will be handling things; the rest will be handling information in some form or other. In making a movie or a tv program you are making not so much a thing, though of course it does have a material form, as you are organizing information. Information is, of course, stored in a material form, say a book (the essence of a book is information), but information is not a material good to be consumed like food, a house, clothes, an automobile, or an airplane ride for transportation.
The information revolution arises from the above three items plus their synergistic interaction, though the following items also contribute.
This last point needs careful emphasis.
When we first passed from hand accounting to machine accounting we found it necessary, for economical reasons if no other, to somewhat alter the accounting system. Similarly, when we passed from strict hand fabrication to machine fabrication we passed from mainly screws and bolts to rivets and welding.
It has rarely proved practical to produce exactly the same product by machines as we produced by hand.
Indeed, one of the major items in the conversion from hand to machine production is the imaginative redesign of an equivalent product. Thus in thinking of mechanizing a large organization, it won’t work if you try to keep things in detail exactly the same; rather, there must be a larger give and take if there is to be a significant success. You must get the essentials of the job in mind and then design the mechanization to do that job rather than trying to mechanize the current version—if you want a significant success in the long run.
I need to stress this point: mechanization requires you produce an equivalent product, not identically the same one. Furthermore, in any design it is now essential to consider field maintenance since in the long run it often dominates all other costs. The more complex the designed system, the more field maintenance must be central to the final design. Only when field maintenance is part of the original design can it be safely controlled; it is not wise to try to graft it on later. This applies to both mechanical things and to human organizations.
From that one experience, on thinking it over carefully and considering what it meant, I realized computers would allow the simulation of many different kinds of experiments. I put that vision into practice at Bell Telephone Laboratories for many years. Somewhere in the mid to late 1950s, in an address to the President and vps of Bell Telephone Laboratories, I said, “At present we are doing one out of ten experiments on the computers and nine in the labs, but before I leave it will be nine out of ten on the machines.” They did not believe me then, as they were sure real observations were the key to experiments and I was just a wild theoretician from the mathematics department, but you all realize by now we do somewhere between 90% and 99% of our experiments on the machines and the rest in the labs. And this trend will go on! It is so much cheaper to do simulations than real experiments, so much more flexible in testing, and we can even do things which cannot be done in any lab, so it is inevitable the trend will continue for some time. Again, the product was changed!
Computers have also greatly affected engineering. Not only can we design and build far more complex things than we could by hand, we can explore many more alternate designs. We also now use computers to control situations, such as on the modern high-speed airplane, where we build unstable designs and then use high-speed detection and computers to stabilize them since the unaided pilot simply cannot fly them directly. Similarly, we can now do unstable experiments in the laboratories using a fast computer to control the instability. The result will be that the experiment will measure something very accurately right on the edge of stability.
As noted above, engineering is coming closer to science, and hence the role of simulation in unexplored situations is rapidly increasing in engineering as well as science. It is also true computers are now often an essential component of a good design.
In the past engineering has been dominated to a great extent by “what can we do,” but now “what do we want to do” looms greater since we now have the power to design almost anything we want. More than ever before, engineering is a matter of choice and balance rather than just doing what can be done. And more and more it is the human factors which will determine good design—a topic which needs your serious attention at all times.
The effects on society are also large. The most obvious illustration is that computers have given top management the power to micromanage their organization, and top management has shown little or no ability to resist using this power. You can regularly read in the papers some big corporation is decentralizing, but when you follow it for several years you see they merely intended to do so, but did not.
Furthermore, central planning has been repeatedly shown to give poor results (consider the Russian experiment, for example, or our own bureaucracy). The persons on the spot usually have better knowledge than those at the top and hence can often (not always) make better decisions if things are not micromanaged. The people at the bottom do not have the larger, global view, but at the top they do not have the local view of all the details, many of which can often be very important, so either extreme gets poor results.
Next, an idea which arises in the field, based on the direct experience of the people doing the job, cannot get going in a centrally controlled system since the managers did not think of it themselves. The not invented here (nih) syndrome is one of the major curses of our society, and computers, with their ability to encourage micromanagement, are a significant factor.
There is slowly, but apparently definitely, coming a counter trend to micromanagement. Loose connections between small, somewhat independent organizations are gradually arising. Thus in the brokerage business one company has set itself up to sell its services to other small subscribers, for example computer and legal services. This leaves the brokerage decisions of their customers to the local management people who are close to the front line of activity. Similarly, in the pharmaceutical area, some loosely related companies carry out their work and trade among themselves as they see fit. I believe you can expect to see much more of this loose association between small organizations as a defense against micromanagement from the top, which occurs so often in big organizations. There has always been some independence of subdivisions in organizations, but the power to micromanage from the top has apparently destroyed the conventional lines and autonomy of decision making—and I doubt the ability of most top managements to resist for long the power to micromanage. I also doubt many large companies will be able to give up micromanagement; most will probably be replaced in the long run by smaller organizations without the cost (overhead) and errors of top management. Thus computers are affecting the very structure of how society does its business, and for the moment apparently for the worse in this area.
Computers have already invaded the entertainment field. An informal survey indicates the average American spends far more time watching tv than eating—again an information field is taking precedence over the vital material field of eating! Many commercials and some programs are now either partially or completely computer produced.
How far machines will go in changing society is a matter of speculation—which opens doors to topics that would cause trouble if discussed openly! Hence I must leave it to your imaginations as to what, using computers on chips, can be done in such areas as sex, marriage, sports, games, “travel in the comforts of home via virtual realities,” and other human activities.
Computers began mainly in the field of number crunching but passed rapidly on to information retrieval (say airline reservation systems); word processing, which is spreading everywhere; symbol manipulation, as is done by many programs, such as those which can do analytic integration in the calculus far better and cheaper than can the students; and in logical and decision areas, where many companies use such programs to control their operations from moment to moment. The future computer invasion of traditional fields remains to be seen and will be discussed later under the heading of Artificial Intelligence (ai), Chapters 6–8.
The simplest model of growth assumes the rate of growth is proportional to the current size—something like compound interest, unrestrained bacterial and human population growth, as well as many other examples. The corresponding differential equation is

$$\frac{dy}{dt} = ky,$$

whose solution is, of course,

$$y(t) = y(0)\,e^{kt}.$$
But this growth is unlimited and all things must have limits, even knowledge itself, since it must be recorded in some form and we are (currently) told the universe is finite! Hence we must include a limiting factor in the differential equation. Let L be the upper limit. Then the next simplest growth equation seems to be

$$\frac{dy}{dt} = ky\,(L - y).$$

At this point we of course reduce it to a standard form that eliminates the constants. Set y = Lz and x = kLt; then we have

$$\frac{dz}{dx} = z(1 - z)$$

as the reduced form for the growth problem, where the saturation level is now 1. Separation of variables plus partial fractions yields

$$\ln\frac{z}{1 - z} = x + c, \qquad \text{that is,} \qquad z = \frac{Ae^{x}}{1 + Ae^{x}}.$$
A is, of course, determined by the initial conditions, where you put t (or x) = 0. You see immediately the S shape of the curve: at t = −∞, z = 0; at t = 0, z = A/(A + 1); and at t = + ∞, z = 1.
A more flexible model for the growth is (in the reduced variables)

$$\frac{dz}{dx} = z^{a}(1 - z)^{b}.$$

This is again a separable equation, and also yields to numerical integration if you wish. We can analytically find the steepest slope by differentiating the right-hand side and equating to 0, getting

$$az^{a-1}(1 - z)^{b} - bz^{a}(1 - z)^{b-1} = 0.$$

Hence at the place

$$z = \frac{a}{a + b}$$

we have the maximum slope

$$\frac{dz}{dx} = \frac{a^{a}\,b^{b}}{(a + b)^{a+b}}.$$

In the special case of a = b we have maximum slope $2^{-2a}$. The curve will in this case be odd symmetric about the point where z = 1/2. In the further special case of a = b = 1/2 we get the solution

$$z = \sin^{2}\!\left(\frac{x + c}{2}\right).$$
Here we see the solution curve has a finite range. For larger exponents a and b we have clearly an infinite range.
Again we see how a simple model, while not very exact in detail, suggests the nature of the situation. Whether parallel processing fits into this picture or is an independent curve is not clear at this moment. Often a new innovation will set the growth of a field onto a new S curve which takes off from around the saturation level of the old one, Figure 2.2. You may want to explore models which do not have a hard upper saturation limit but rather finally grow logarithmically; they are sometimes more appropriate.
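For readers who want to explore these growth models numerically, here is a minimal sketch using simple Euler integration of the reduced equation dz/dx = z^a(1 − z)^b; the starting value, step size, and exponents are arbitrary choices, not taken from the text.

```python
def s_curve(a=1.0, b=1.0, z0=0.01, dx=0.01, steps=2000):
    """Euler integration of the reduced growth equation dz/dx = z^a * (1 - z)^b."""
    z, zs = z0, [z0]
    for _ in range(steps):
        z = min(z + dx * (z ** a) * ((1 - z) ** b), 1.0)  # clamp at the saturation level
        zs.append(z)
    return zs

curves = {"a = b = 1": s_curve(1, 1), "a = b = 1/2": s_curve(0.5, 0.5)}
for name, zs in curves.items():
    i = next(i for i, z in enumerate(zs) if z >= 0.99)    # first time the curve nears saturation
    print(f"{name}: z reaches 0.99 near x = {i * 0.01:.2f}")
```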
It is evident electrical engineering in the future is going to be, to a large extent, a matter of (1) selecting chips off the shelf or from a catalog, (2) putting the chips together in a suitable manner to get what you want, and (3) writing the corresponding programs. Awareness of the chips and circuit boards which are currently available will be an essential part of engineering, much as the Vacuum Tube Catalog was in the old days.
Hence beware of special-purpose chips! Though many times they are essential.
The history of computing probably began with primitive man using pebbles to compute the sum of two amounts. Marshack (of Harvard) found that what had been believed to be mere scratches on old bones from caveman days were in fact carefully scribed lines apparently connected with the moon’s phases. The famous
The sand pan and the abacus are instruments more closely connected with computing, and the arrival of the Arabic numerals from India meant a great step forward in the area of pure computing. Great resistance to the adoption of the Arabic numerals (not in their original Arabic form) was encountered from officialdom, even to the extent of making them illegal, but in time (the 1400s) the practicalities and economic advantages triumphed over the more clumsy Roman (and earlier Greek) use of letters of the alphabet as symbols for the numbers.
The invention of logarithms by Napier (1550–1617) was the next great step. From it came the slide rule, which has the numbers on the parts as lengths proportional to the logs of the numbers, hence adding two lengths means multiplying the two numbers. This analog device, the slide rule, was another significant step forward, but in the area of analog, not digital, computers. I once used a very elaborate slide rule in the form of a 6–8-inch diameter cylinder about two feet long, with many, many suitable scales on both the outer and inner cylinders, and equipped with a magnifying glass to make the reading of the scales more accurate.
Slide rules in the 1930s and 1940s were standard equipment of the engineer, usually carried in a leather case fastened to the belt as a badge of one’s group on the campus. The standard engineer’s slide rule was a “ten-inch log log decitrig slide rule,” meaning the scales were ten inches long, included log log scales, square and cubing scales, as well as numerous trigonometric scales in decimal parts of the degree. They are no longer manufactured!
During wwii the electronic analog computers came into military field use. They used condensers as integrators in place of the earlier mechanical wheels and balls (hence they could only integrate with respect to time). They meant a large, practical step forward, and I used one such machine at Bell Telephone Laboratories for many years. It was constructed from parts of some old m9 gun directors. Indeed, we used parts of some later condemned m9s to build a second computer to be used either independently or with the first one to expand its capacity to do larger problems.
Returning to digital computing, Napier also designed “Napier’s bones,” which were typically ivory rods with numbers which enabled one to multiply numbers easily; these are digital and not to be confused with the analog slide rule.
The next major practical stage was the comptometer, which was merely an adding device, but by repeated additions, along with shifting, this is equivalent to multiplication, and was very widely used for many, many years.
From this came a sequence of more modern desk calculators, the Millionaire, then the Marchant, the Friden, and the Monroe. At first they were hand controlled and hand powered, but gradually some of the control was built in, mainly by mechanical levers. Beginning around 1937 they gradually acquired electric motors to do much of the power part of the computing. Before 1944 at least one had the operation of square root incorporated into the machine (still mechanical levers intricately organized). Such hand machines were the basis of computing groups, with people running them to provide computing power. For example, when I came to the Bell Telephone Laboratories in 1946 there were four such groups in the Labs, typically about six to ten girls in a group: a small group in the mathematics department, a larger one in the network department, one in switching, and one in quality control.
Let me turn to some comparisons:
| Machine | Speed |
|---|---|
| Hand calculators | 1/20 ops. per sec. |
| Relay machines | 1 op. per sec. typically |
| Magnetic drum machines | 15–1,000, depending somewhat on fixed or floating point |
| 701 type | 1,000 ops. per sec. |
| Current (1990) | 10⁹ (around the fastest of the von Neumann type) |
The changes in speed, and corresponding storage capacities, that I have had to live through should give you some idea as to what you will have to endure in your careers. Even for von Neumann-type machines there is probably another factor of speed of around 100 before reaching the saturation speed.
Since such numbers are actually beyond most human experience, I need to introduce a human dimension to the speeds you will hear about. First, notation (the parentheses contain the standard symbol):
| milli (m) | 10⁻³ | kilo (k) | 10³ |
| micro (µ) | 10⁻⁶ | mega (M) | 10⁶ |
| nano (n) | 10⁻⁹ | giga (G) | 10⁹ |
| pico (p) | 10⁻¹² | tera (T) | 10¹² |
| femto (f) | 10⁻¹⁵ | | |
| atto (a) | 10⁻¹⁸ | | |
Now to the human dimensions. In one day there are 60 × 60 × 24 = 86,400 seconds. In one year there are close to 3.15 × 10⁷ seconds, and in 100 years, probably greater than your lifetime, there are about 3.15 × 10⁹ seconds. Thus in three seconds a machine doing 10⁹ floating point operations per second (flops) will do more operations than there are seconds in your whole lifetime, and almost certainly get them all correct!
For another approach to human dimensions, the velocity of light in a vacuum is about 3 × 10¹⁰ cm/sec (along a wire it is about 7/10 as fast). Thus in a nanosecond light goes 30 cm, about one foot. At a picosecond the distance is, of course, about 1/100 of an inch. These represent the distances a signal can go (at best) in an ic. Thus at some of the pulse rates we now use, the parts must be very close to each other—close in human dimensions—or else much of the potential speed will be lost in going between parts. Also, we can no longer use lumped circuit analysis.
How about natural dimensions of length instead of human dimensions? Well, atoms come in various sizes, running generally around 1 to 3 angstroms (an angstrom is 10⁻⁸ cm) and in a crystal are spaced around 10 angstroms apart, typically, though there are exceptions. In 1 femtosecond light can go across about 300 atoms. Therefore the parts in a very fast computer must be small and very close together!
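The arithmetic of these distances is easily checked; an illustrative sketch:

```python
c = 3e10               # speed of light in vacuum, cm per second
atom_spacing = 10e-8   # roughly 10 angstroms between atoms in a crystal, in cm

for name, t in [("nanosecond", 1e-9), ("picosecond", 1e-12), ("femtosecond", 1e-15)]:
    d = c * t          # distance light covers in that time
    print(f"1 {name}: {d:g} cm, about {d / atom_spacing:,.0f} atom spacings")
```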
If you think of a transistor using impurities, and the impurities run around one in a million typically, then you would probably not believe a transistor with one impure atom, but maybe, if you lower the temperature to reduce background noise, 1,000 impurities is within your imagination—thus making the solid-state device of at least around 1,000 atoms on a side. With interconnections at times running at least ten device distances, you see why you feel getting below 100,000 atoms’ distance between some interconnected devices is really pushing things (3 picoseconds).
Then there is heat dissipation. While there has been talk of thermodynamically reversible computers, so far it has only been talk and published papers, and heat still matters. The more parts per unit area, and the faster the rate of state change, the more the heat generated in a small area, which must be gotten rid of before things melt. To partially compensate we have been going to lower and lower voltages, and are now going to 2 1/2 or 3 volts for operating the ic. The possibility that the base of the chip might have a diamond layer is currently being examined, since diamond is a very good heat conductor, much better than copper. There is now a reasonable possibility of a similar crystal structure, possibly less expensive than diamond, with very good heat conduction properties.
To speed up computers we have gone to two, four, and even more arithmetic units in the same computer, and have also devised pipelines and cache memories. These are all small steps towards highly parallel computers.
Thus you see the handwriting on the wall for the single-processor machine—we are approaching saturation. Hence the fascination with highly parallel machines. Unfortunately there is as yet no single general structure for them, but rather many, many competing designs, all generally requiring different strategies to exploit their potential speeds and having different advantages and disadvantages. It is not likely a single design will emerge for a standard parallel computer architecture, hence there will be trouble and dissipation in efforts to pursue the various promising directions.
Here in the history of the growth of computers you see a realization of the S-type growth curve: the very slow start, the rapid rise, the long stretch of almost linear growth in the rate, and then the facing of the inevitable saturation.
Again, to reduce things to human size. When I first got digital computing really going inside Bell Telephone Laboratories I began by renting computers outside for so many hours the head of the mathematics department figured out for himself it would be cheaper to get me one inside—a deliberate plot on my part to avoid arguing with him, as I thought it useless and would only produce more resistance on his part to digital computers. Once a boss says “no!” it is very hard to get a different decision, so don’t let them say “no!” to a proposal. I found in my early years I was doubling the number of computations per year about every 15 months. Some years later I was reduced to doubling the amount about every 18 months. The department head kept telling me I could not go on at that rate forever, and my polite reply was always, “You are right, of course, but you just watch me double the amount of computing every 18–20 months!” Because the machines available kept up the corresponding rate, this enabled me, and my successors, for many years to double the amount of computing done. We lived on the almost straight line part of the S curve all those years.
However, let me observe, in all fairness to the department head, it was remarks by him which made me realize it was not the number of operations done that mattered; it was, as it were, the number of micro-Nobel prizes I computed that mattered. Thus the motto of a book I published in 1961:
The purpose of computing is insight, not numbers.
A good friend of mine revised it to:
The purpose of computing numbers is not yet in sight.
It is necessary now to turn to some of the details of how for many years computers were constructed. The smallest parts we will examine are two-state devices for storing bits of information, and for gates which either let a signal go through or block it. Both are binary devices, and in the current state of knowledge they provide the easiest, fastest methods of computing we know.
From such parts we construct combinations which enable us to store longer arrays of bits; these arrays are often called number registers. The logical control is just a combination of storage units including gates. We build an adder out of such devices, as well as every larger unit of a computer.
Going to the still larger units we have the machine consisting of: (1) a storage device, (2) a central control, (3) an alu unit, meaning arithmetic and logic unit. There is in the central control a single register, which we will call the Current Address Register (car). It holds the address of where the next instruction is to be found, Figure 3.1.
The cycle of the computer is: (1) take the address in the car and fetch the instruction stored at that address; (2) decode and obey that instruction; (3) add 1 to the address in the car, and start in again.
We see the machine does not know where it has been, nor where it is going to go; it has at best only a myopic view of simply repeating the same cycle endlessly. Below this level the individual gates and two-state storage devices do not know any meaning—they simply react to what they are supposed to do. They too have no global knowledge of what is going on, nor any meaning to attach to any bit, whether storage or gating.
There are some instructions which, depending on some state of the machine, put the address part of their instruction into the car (and 1 is not added in such cases), and then the machine, in starting its cycle, simply finds an address which is not the immediate successor in storage of the previous instruction, but the location inserted into the car.
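A toy sketch may make the cycle concrete. The three-instruction repertoire below (add a constant, transfer on nonzero, halt) is invented purely for illustration; only the fetch-obey-advance cycle itself follows the description above.

```python
def run(storage):
    acc = 0    # a single accumulator
    car = 0    # Current Address Register: where the next instruction is to be found
    while True:
        op, arg = storage[car]            # fetch whatever the CAR points at
        if op == "HALT":
            return acc
        if op == "ADD":                   # ordinary instruction: obey it, then add 1 to the CAR
            acc += arg
            car += 1
        elif op == "JUMPNONZERO":         # transfer: put an address into the CAR instead
            car = arg if acc != 0 else car + 1

program = [("ADD", 5), ("ADD", -1), ("JUMPNONZERO", 1), ("HALT", 0)]
print(run(program))                       # counts 5 down to 0, then halts: prints 0
```

The machine has no idea it is “counting down”; it merely repeats the same cycle on whatever bits it is pointed at, which is exactly the point made above.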
I am reviewing this so you will be clear the machine processes bits of information according to other bits, and as far as the machine is concerned there is no meaning to anything which happens—it is we who attach meaning to the bits. The machine is a “machine” in the classical sense; it does what it does and nothing else (unless it malfunctions). There are, of course, real-time interrupts, and other ways new bits get into the machine, but to the machine they are only bits.
How different are we in practice from the machines? We would all like to think we are different from machines, but are we essentially? It is a touchy point for most people, and the emotional and religious aspects tend to dominate most arguments. We will return to this point in Chapters 6–8 on ai when we have more background to discuss it reasonably.
As I indicated in the last chapter, in the early days of computing the control part was all done by hand. The slow desk computers were at first controlled by hand, for example multiplication was done by repeated additions, with column shifting after each digit of the multiplier. Division was similarly done by repeated subtractions. In time electric motors were applied, both for power and later for more automatic control over multiplication and division. The punch card machines were controlled by plug board wiring to tell the machine where to find the information, what to do with it, and where to put the answers on the cards (or on the printed sheet of a tabulator), but some of the control might also come from the cards themselves, typically X and Y punches (other digits could, at times, control what happened). A plug board was specially wired for each job to be done, and in an accounting office the wired boards were usually saved and used again each week, or month, as they were needed in the cycle of accounting.
When we came to the relay machines, after
An interesting story about soap: a copy of the program, call it program A, was both loaded into the machine as the program to run and used as the data it processed. The output of this was program B. Then B was loaded into the 650 and A was again run as data to produce a new program B. The difference between the two running times to produce program B showed how much the optimization of the soap program (by soap itself) gained. An early example of self-compiling, as it were.
In the beginning we programmed in absolute binary, meaning we wrote the actual address where things were in binary, and wrote the instruction part also in binary! There were two trends to escape this: octal, where you simply group the binary digits in sets of three, and hexadecimal, where you take four digits at a time and use A, B, C, D, E, and F to represent the digits beyond 9 (and you had, of course, to learn the multiplication and addition tables up to 15).
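As a small illustration of the grouping, here is the 8-bit pattern 01100101 (the same sample pattern used a little further on) written out in the three notations; a sketch only, the value itself has no significance:

```python
n = 0b01100101        # a sample 8-bit pattern (binary 01100101 = decimal 101)
print(f"{n:08b}")     # binary:      01100101
print(f"{n:o}")       # octal:  145 -- the bits grouped in threes: 01 100 101
print(f"{n:X}")       # hex:     65 -- the bits grouped in fours:  0110 0101
```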
If, in fixing up an error, you wanted to insert some omitted instructions, then you took the immediately preceding instruction and replaced it by a transfer to some empty space. There you put in the instruction you just wrote over, added the instructions you wanted to insert, followed by a transfer back to the main program. Thus the program soon became a sequence of jumps of the control to strange places. When, as almost always happens, there were errors in the corrections, you then used the same trick again, using some other available space. As a result the control path of the program through storage soon took on the appearance of a can of spaghetti. Why not simply insert them in the run of instructions? Because then you would have to go over the entire program and change all the addresses which referred to any of the moved instructions! Anything but that!
We very soon got the idea of reusable software, as it is now called. Indeed, Babbage had the idea. We wrote mathematical libraries to reuse blocks of code. But an absolute address library meant each time the library routine was used it had to occupy the same locations in storage. When the complete library became too large we had to go to relocatable programs. The necessary programming tricks were in the von Neumann reports, which were never formally published.
Someone got the idea a short piece of program could be written which would read in the symbolic names of the operations (like add) and translate them at input time to the binary representations used inside the machine (say 01100101). This was soon followed by the idea of using symbolic addresses—a real heresy for the old time programmers. You do not now see much of the old heroic absolute programming (unless you fool with a handheld programmable computer and try to get it to do more than the designer and builder ever intended).
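The following is a minimal two-pass sketch, in Python, of those two ideas—symbolic operation names and symbolic addresses translated at input time into the binary used inside the machine. The mnemonics, opcodes, and word layout are all invented for the illustration and do not describe any historical assembler.

```python
OPCODES = {"load": "0001", "add": "0010", "store": "0011", "halt": "1111"}

def assemble(source):
    # Pass 1: each statement occupies one storage address; record label -> address.
    symbols = {label: addr for addr, (label, _, _) in enumerate(source) if label}
    # Pass 2: emit the opcode bits plus the resolved 12-bit address field.
    words = []
    for _, op, operand in source:
        address = symbols[operand] if operand in symbols else int(operand or 0)
        words.append(OPCODES[op] + format(address, "012b"))
    return words

program = [
    (None,  "load",  "x"),    # symbolic address: the assembler, not you, finds x
    (None,  "add",   "y"),
    (None,  "store", "sum"),
    (None,  "halt",  None),
    ("x",   "load",  "0"),    # data cells, crudely written as labeled statements
    ("y",   "load",  "0"),
    ("sum", "load",  "0"),
]
for word in assemble(program):
    print(word)
```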
I once spent a full year, with the help of a lady programmer from Bell Telephone Laboratories, on one big problem coding in absolute binary for the ibm 701, which used all the 32K registers then available. After that experience I vowed never again would I ask anyone to do such labor. Having heard about a symbolic system from Poughkeepsie, ibm, I asked her to send for it and to use it on the next problem, which she did. As I expected, she reported it was much easier. So we told everyone about the new method, meaning about 100 people, who were also eating at the ibm cafeteria near where the machine was. About half were ibm people and half were, like us, outsiders renting time. To my knowledge only one person—yes, only one—of all the 100 showed any interest!
Finally, a more complete, and more useful, Symbolic Assembly Program (sap) was devised—after more years than you are apt to believe, during which time most programmers continued their heroic absolute binary programming. At the time sap first appeared I would guess about 1% of the older programmers were interested in it—using sap was “sissy stuff,” and a real programmer would not stoop to wasting machine capacity to do the assembly. Yes! Programmers wanted no part of it, though when pressed they had to admit their old methods used more machine time in locating and fixing up errors than the sap program ever used. One of the main complaints was when using a symbolic system you didn’t know where anything was in storage—though in the early days we supplied a mapping of symbolic to actual storage, and believe it or not they later lovingly pored over such sheets rather than realize they did not need to know that information if they stuck to operating within the system—no! When correcting errors they preferred to do it in absolute binary addresses.
Physically, the management of the ibm 701, at ibm Headquarters in New York City where we rented time, was terrible. It was a sheer waste of machine time (at that time $300 per hour was a lot) as well as human time. As a result I refused later to order a big machine until I had figured out how to have a monitor system—which someone else finally built for our first ibm 709, and later modified for the ibm 7094.
Again, monitors, often called “the system” these days, like all the earlier steps I have mentioned, should be obvious to anyone who is involved in using the machines from day to day; but most users seem too busy to think or observe how bad things are and how much the computer could do to make things significantly easier and cheaper. To see the obvious it often takes an outsider, or else someone like me who is thoughtful and wonders what he is doing and why it is all necessary. Even when told, the old timers will persist in the ways they learned, probably out of pride for their past and an unwillingness to admit there are better ways than those they were using for so long.
One way of describing what happened in the history of software is that we were slowly going from absolute to virtual machines. First, we got rid of the actual code instructions, then the actual addresses, then in fortran the necessity of learning a lot of the insides of these complicated machines and how they worked. We were buffering the user from the machine itself. Fairly early at Bell Telephone Laboratories we built some devices to make the tape units virtual, machine independent. When, and only when, you have a totally virtual machine will you have the ability to transfer software from one machine to another without almost endless trouble and errors.
fortran was successful far beyond anyone’s expectations because of the psychological fact it was just what its name implied—formula translation of the things one had always done in school; it did not require learning a new set of ways of thinking.
Algol, around 1958–1960, was backed by many worldwide computer organizations, including the acm. It was an attempt by the theoreticians to greatly improve fortran. But being logicians, they produced a logical, not a humane, psychological language, and of course, as you know, it failed in the long run. It was, among other things, stated in a Boolean logical form which is not comprehensible to mere mortals (and often not even to the logicians themselves!). Many other logically designed languages which were supposed to replace the pedestrian fortran have come and gone, while fortran (somewhat modified to be sure) remains a widely used language, indicating clearly the power of psychologically designed languages over logically designed languages.
This was the beginning of a great hope for special languages, pols they were called, meaning problem-oriented languages. There is some merit in this idea, but the great enthusiasm faded because too many problems involved more than one special field, and the languages were usually incompatible. Furthermore, in the long run, they were too costly in the learning phase for humans to master all of the various ones they might need.
The last is something which need not bother you, as in those days we made a distinction between “open” and “closed” subroutines, which is hard to explain now!
I made the two-address fixed point decimal machine look like a three-address floating point machine—that was my goal—A op. B = C. I used the ten decimal digits of the machine (it was a decimal machine so far as the user was concerned) in this form:
| A address | Op. | B address | C address |
|---|---|---|---|
| xxx | x | xxx | xxx |
The software system I built was placed in the storage registers 1,000 to 1,999. Thus any program in the synthetic language, having only three decimal digits in each address field, could refer only to addresses 000 to 999, and could not refer to, and alter, any register in the software and thus ruin it: designed-in security protection of the software system from the user.
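A minimal sketch of how such a synthetic three-address word might be interpreted follows; the operation codes, the stop code, and the use of ordinary floating point are assumptions for the illustration, not the original system. The point to notice is that a three-digit address field simply cannot name the registers 1,000 to 1,999 where the interpreting software itself sits.

```python
# Ten decimal digits split as  xxx x xxx xxx  (A address, operation, B address,
# C address), with user addresses limited to 000-999.

OPS = {1: lambda a, b: a + b, 2: lambda a, b: a - b,
       3: lambda a, b: a * b, 4: lambda a, b: a / b}

def interpret(words, user_store):
    """words: ten-digit instruction strings; user_store: the 1,000 user cells."""
    for word in words:
        a, op, b, c = int(word[0:3]), int(word[3]), int(word[4:7]), int(word[7:10])
        if op == 0:                          # assumed stop code
            break
        user_store[c] = OPS[op](user_store[a], user_store[b])   # A op B -> C
    return user_store

store = [0.0] * 1000
store[10], store[11] = 2.5, 4.0
interpret(["0101011012",                     # 010 1 011 012 : cell 12 <- cell 10 + cell 11
           "0000000000"], store)             # stop
print(store[12])                             # 6.5
```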
The human animal is not reliable, as I keep insisting, so low redundancy means lots of undetected errors, while high redundancy tends to catch the errors. The spoken language goes over an acoustic channel with all its noise and must be caught on the fly as it is spoken; the written language is printed, and you can pause, backscan, and do other things to uncover the author’s meaning. Notice in English more often different words have the same sounds (“there” and “their,” for example) than words have the same spelling but different sounds (“record” as a noun or a verb, and “tear” as in tear in the eye vs. tear in a dress). Thus you should judge a language by how well it fits the human animal as it is—and remember I include how they are trained in school, or else you must be prepared to do a lot of training to handle the new type of language you are going to use. That a language is easy for the computer expert does not mean it is necessarily easy for the non-expert, and it is likely non-experts will do the bulk of the programming (coding, if you wish) in the near future.
What is wanted in the long run, of course, is that the man with the problem does the actual writing of the code with no human interface, as we all too often have these days, between the person who knows the problem and the person who knows the programming language. This date is unfortunately too far off to do much good immediately, but I would think by the year 2020 it would be fairly universal practice for the expert in the field of application to do the actual program preparation rather than have experts in computers (and ignorant of the field of application) do the program preparation.
Until we better understand languages of communication involving humans as they are (or can be easily trained), it is unlikely many of our software problems will vanish.
You read constantly about “engineering the production of software,” both for the efficiency of production and for the reliability of the product. But you do not expect novelists to “engineer the production of novels.” The question arises: “Is programming closer to novel writing than it is to classical engineering?” I suggest yes! Given the problem of getting a man into outer space, both the Russians and the Americans did it pretty much the same way, all things considered, and allowing for some espionage. They were both limited by the same firm laws of physics. But give two novelists the problem of writing on “the greatness and misery of man,” and you will probably get two very different novels (without saying just how to measure this). Give the same complex problem to two modern programmers and you will, I claim, get two rather different programs. Hence my belief that current programming practice is closer to novel writing than it is to engineering. The novelists are bound only by their imaginations, which is somewhat as the programmers are when they are writing software. Both activities have a large creative component, and while you would like to make programming resemble engineering, it will take a lot of time to get there—and maybe you really, in the long run, do not want to do it! Maybe it just sounds good. You will have to think about it many times in the coming years; you might as well start now and discount propaganda you hear, as well as all the wishful thinking which goes on in the area! The software of the utility programs of computers has been done often enough, and is so limited in scope, that it might reasonably be expected to become “engineered,” but the general software preparation is not likely to be under “engineering control” for many, many years.
There are many proposals on how to improve the productivity of the individual programmer, as well as groups of programmers. I have already mentioned top-down and bottom-up; there are others, such as head programmer, lead programmer, proving the program is correct in a mathematical sense, and the waterfall model of programming, to name but a few. While each has some merit I have faith in only one, which is almost never mentioned—think before you write the program, it might be called. Before you start, think carefully about the whole thing, including what will be your acceptance test that it is right, as well as how later field maintenance will be done. Getting it right the first time is much better than fixing it up later!
One trouble with much of programming is simply that often there is not a well-defined job to be done; rather, the programming process itself will gradually discover what the problem is! The desire that you be given a well-defined problem before you start programming often does not match reality, and hence a lot of the current proposals to “solve the programming problem” will fall to the ground if adopted rigorously.
The use of higher-level languages has meant a lot of progress. The following table shows one estimate of the improvement in 30 years.
| Type of improvement | Ratio | Cumulative improvement factor |
|---|---|---|
| Assembler:machine code | 2:1 | ×2 |
| C language:assembler | 3:1 | ×6 |
| Timeshare:batch | 1.5:1 | ×9 |
| UNIX:monitor | 4:3 | ×12 |
| System QA:debugging | 2:1 | ×24 |
| Prototyping:top-down | 1.3:1 | ×30 |
| C++:C | 2:1 | ×60 |
| Reuse:redo | 1.5:1 | ×90 |
We apparently have made a factor of about 90 in the total productivity of programmers in 30 years (a mere 16% rate of improvement!). This is one person’s guess, and it is at least plausible. But compared with the speed-up of machines it is like nothing at all! People wish humans could be similarly sped up, but the fundamental bottleneck is the human animal as it is, and not as we wish it were.
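The 16% figure is just the 30th root of the cumulative factor:

$$90^{1/30} = e^{(\ln 90)/30} \approx e^{0.150} \approx 1.16,$$

that is, roughly a 16% improvement compounded per year.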
Many studies have shown programmers differ in productivity, from worst to best, by much more than a factor of ten. From this I long ago concluded the best policy is to pay your good programmers very well but regularly fire the poorer ones—if you can get away with it! One way is, of course, to hire them on contract rather than as regularly employed people, but that is increasingly against the law, which seems to want to guarantee even the worst have some employment. In practice you may actually be better off to pay the worst to stay home and not get in the way of the more capable (and I am serious)!
Another view of neural nets is they represent a fairly general class of stable feedback systems. You pick the kind and amount of feedback you think is appropriate, and then the neural net’s feedback system converges to the desired solution. Again, it avoids a lot of detailed programming since, at least in a simulated neural net on a computer, by once writing out a very general piece of program you then have available a broad class of problems already programmed, and the programmer hardly does more than give a calling sequence.
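As a minimal sketch of that “write the general piece once, then supply only a calling sequence” idea, the following Python uses plain gradient-style feedback rather than a neural net proper; the particular problem (a straight-line fit) and all the names are invented for the illustration.

```python
# One very general iterative feedback routine, written once; each new problem
# is handled by a short "calling sequence" that merely supplies an error signal.

def settle(weights, feedback, rate=0.1, steps=1000):
    # Repeatedly nudge the parameters in the direction the feedback signal asks for.
    for _ in range(steps):
        weights = [w - rate * g for w, g in zip(weights, feedback(weights))]
    return weights

# Calling sequence for one particular problem: fit y = a*x + b to three points.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

def fit_line_feedback(w):
    a, b = w
    ga = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
    gb = sum(2 * (a * x + b - y) for x, y in data) / len(data)
    return [ga, gb]

print(settle([0.0, 0.0], fit_line_feedback))   # converges near a=2, b=1
```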
What other very general pieces of programming can be similarly done is not now known—you can think about it as one possible solution to the “programming problem.”
In the chapter on hardware I carefully discussed some of the limits—the size of molecules, the velocity of light, and the removal of heat. I should summarize correspondingly the less firm limits of software.
I made the comparison of writing software with the act of literary writing; both seem to depend fundamentally on clear thinking. Can good programming be taught? If we look at the corresponding teaching of “creative writing” courses we find most students of such courses do not become great writers, and most great writers in the past did not take creative writing courses! Hence it is dubious that great programmers can be trained easily.
Does experience help? Do bureaucrats after years of writing reports and instructions get better? I have no real data, but I suspect with time they get worse! The habitual use of “governmentese” over the years probably seeps into their writing style and makes them worse. I suspect the same for programmers! Neither years of experience nor the number of languages used is any reason for thinking the programmer is getting better from these experiences. An examination of books on programming suggests most of the authors are not good programmers!
The results I picture are not nice, but all you have to oppose them with is wishful thinking—I have evidence of years and years of programming on my side!
As you have probably noticed, I am using the technical material to hang together a number of anecdotes, hence I shall begin this time with a story of how this, and the two preceding chapters, came about. By the 1950s I had found I was frightened when giving public talks to large audiences, this in spite of having taught classes in college for many years. On thinking this over very seriously, I came to the conclusion I could not afford to be crippled that way and still become a great scientist; the duty of a scientist is not only to find new things, but to communicate them successfully in at least three forms:
1. in writing (papers and books),
2. in prepared public talks,
3. in impromptu discussions.
Lacking any one of these would be a serious drag on my career. How to learn to give public talks without being so afraid was my problem. The answer was obviously by practice, and while other things might help, practice was a necessary thing to do.
Shortly after I had realized this it happened I was asked to give an evening talk to a group of computer people who were ibm customers learning some aspect of the use of ibm machines. As a user I had been through such a course myself and knew that typically the training period was for a week during working hours. To supply entertainment in the evenings ibm usually arranged a social get-together the first evening, a theater party on some other evening, and a general talk about computers on still another evening—and it was obvious to me I was being asked to do the last of these.
I immediately accepted the offer because here was a chance to practice giving talks, as I had just told myself I must do. I soon decided I should give a talk which was so good I would be asked to give other talks and hence get more practice. At first I thought I would give a talk on a topic dear to my heart, but I soon realized if I wanted to be invited back I had best give a talk the audience wanted to hear, which is often a very, very different thing. What would they want to hear, especially as I did not know exactly the course they were taking and hence the abilities of the people? I hit on the general interest topic The History of Computing to the Year 2000—this at around 1960. Even I was interested in the topic, and wondered what I would say! Furthermore, and this is important, in preparing the talk I would be preparing myself for the future. In saying, “What do they want to hear?” I am not speaking as a politician but as a scientist who should tell the truth as they see it. A scientist should not give talks merely to entertain, since the object of the talk is usually scientific information transmission from the speaker to the audience. That does not imply the talk must be dull. There is a fine, but definite, line between scientific communication and entertainment, and the scientist should always stay on the right side of that line.
My first talk concentrated on the hardware, and I dealt with the limitations of it, including, as I mentioned in Chapter 3, the three relevant laws of nature: the size of molecules, the speed of light, and the problem of heat dissipation. I included lovely colored Vugraphs with overlays of the quantum mechanical limitations, including the
The talk also kept me up to date, made me keep an eye out for trends in computing, and generally paid off to me in intellectual ways, as well as getting me to be a more polished speaker. It was not all just luck—I made a lot of it by trying to understand, below the surface level, what was going on. I began, at any lecture I attended anywhere, to pay attention not only to what was said, but to the style in which it was said, and whether it was an effective or a non-effective talk. Those talks which were merely funny I tended to ignore, though I studied the style of joke telling closely. An after-dinner speech requires, generally, three good jokes: one at the beginning, one in the middle, and a closing one so that they will at least remember one joke; all jokes of course told well. I had to find my own style of joke telling, and I practiced it by telling jokes to secretaries.
After giving the talk a few times I realized, of course, it was not just the hardware but also the software which would limit the evolution of computing as we approached the year 2000—Chapter 4 I just gave you. Finally, after a long time, I began to realize it was the economics, the applications, which probably would dominate the evolution of computers. Much, but by no means all, of what would happen had to be economically sound. Hence this chapter.
In the early years of modern computing, say around the 1940s and 1950s, “number crunching” dominated the scene since people who wanted hard, firm numbers were the only ones with enough money to afford the price (in those days) of computing. As computing costs came down the kinds of things we could do economically on computers broadened to include many other things than number crunching. We had realized all along these other activities were possible, it was just they were uneconomical at that time.
This is typical of many situations. It is first necessary to prove beyond any doubt the new thing, device, method, or whatever it is, can cope with heroic tasks before it can get into the system to do the more routine, and, in the long run, more useful tasks. Any innovation is always against such a barrier, so do not get discouraged when you find your new idea is stoutly, and perhaps foolishly, resisted. By realizing the magnitude of the actual task you can then decide if it is worth your efforts to continue, or if you should go do something else you can accomplish and not fritter away your efforts needlessly against the forces of inertia and stupidity.
In the early evolution of computers I soon turned to the problem of doing many small problems on a big machine. I realized, in a very real sense, I was in the mass production of a variable product—I should organize things so I could cope with most of the problems which would arise in the next year, while at the same time not knowing what, in detail, they would be. It was then I realized that computers have opened the door much more generally to the mass production of a variable product, regardless of what it is: numbers, words, word processing, making furniture, weaving, or what have you. They enable us to deal with variety without excessive standardization, and hence we can evolve more rapidly to a desired future! You see it at the moment applied to computers themselves! Computers, with some guidance from humans, design their own chips, and computers are assembled, more or less, automatically from standard parts; you say what things you want in your computer and the particular computer is then made. Some computer manufacturers are now using almost total machine assembly of the parts with almost no human intervention.
Let me discuss the applications of computers in a more quantitative way. Naturally, since I was in the Research Division of Bell Telephone Laboratories, initially the problems were mainly scientific, but being in Bell Telephone Laboratories we soon got to engineering problems. First, Figure 5.1, following only the growth of the purely scientific problems, you get a curve which rises exponentially (note the vertical log scale), but you soon see the upper part of the S-curve, the flattening off to more moderate growth rates. After all, given the kind of problem I was solving for them at that time, and the total number of scientists employed in Bell Telephone Laboratories, there had to be a limit to what they could propose and consume. As you know, they began much more slowly to propose far larger problems, so scientific computing is still a large component of the use of computers, but not the major one in most installations.
The engineering computing soon came along, and it rose along much the same shape, but was larger and was added on top of the earlier scientific curve. Then, at least at Bell Telephone Laboratories, I found an even larger military workload, and finally, as we shifted to symbol manipulations in the form of word processing, compiling time for the higher-level languages, and other things, there was a similar increase. Thus while each kind of workload seemed to slowly approach saturation in its turn, the net effect of all of them was to maintain a rather constant growth rate.
What will come along to sustain this straight line logarithmic growth curve and prevent the inevitable flattening out of the S-curve of applications? The next big area is, I believe, pattern recognition. I doubt our ability to cope with the most general problem of pattern recognition, because for one thing it implies too much, but in areas like speech recognition, radar pattern recognition, picture analysis and redrawing, workload scheduling in factories and offices, analysis of data for statisticians, creation of virtual images, and such, we can consume a very large amount of computer power. Virtual reality computing will become a large consumer of computing power, and its obvious economic value assures us this will happen, both in the practical needs and in amusement areas. Beyond these is, I believe, artificial intelligence, which will finally get to the point where the delivery of what they have to offer will justify the price in computing effort, and will hence be another source of problem solving.
This one experience led us at Bell Telephone Laboratories to start putting small computers into laboratories, at first merely to gather, reduce, and display the data, but soon to drive the experiment. It is often easier to let the machine program the shape of the electrical driving voltages to the experiment, via a standard digital-to-analog converter, than it is to build special circuits to do it. This enormously increased the range of possible experiments, and introduced the practicality of having interactive experiments. Again, we got the machine in under one pretext, but its presence in the long run changed both the problem and what the computer was actually used for. When you successfully use a computer you usually do an equivalent job, not the same old one. Again you see the presence of the computer, in the long run, changed the nature of many of the experiments we did.
Boeing (in Seattle) later had a somewhat similar idea, namely they would keep the current status of a proposed plane design on a tape and everyone would use that tape, hence in the design of any particular plane all the parts of the vast company would be attuned to each other’s work. It did not work out as the bosses thought it would, and as they probably thought it did! I know, because I was doing a high-level, two-week snooping job for the Boeing top brass under the guise of doing a routine inspection of the computer center for a lower-level group!
The reason it did not work as planned is simple. If the current status of the design is on the tape (currently discs), and if you use the data during a study of, say, wing area, shape, and profile, then when you make a change in your parameters and you find an improvement, it might have been due to a change someone else inserted into the common design and not to the change you made—which might have actually made things worse! Hence what happened in practice was each group, when making an optimization study, made a copy of the current tape, and used it without any updates from any other area. Only when they finally decided on their new design did they insert the changes—and of course they had to verify their new design meshed with the new designs of the others. You simply cannot use a constantly changing database for an optimization study.
Company managers always seem to have the idea that if only they knew the current state of the company in every detail, then they could manage things better. So nothing will do but they must have a database of all the company’s activities, always up to the moment. This has its difficulties, as indicated above. But another thing: suppose you and I are both vps of a company, and for a Monday morning meeting we want exactly the same figures. You get yours from a program run on Friday afternoon, while I, being wiser, and knowing over the weekend much information comes in from the outlying branches, wait until Sunday night and prepare mine. Clearly there could be significant differences in our two reports, even though we both used the same program to prepare them! That is simply intolerable in practice. Furthermore, most important reports and decisions should not be time-sensitive to up-to-the-minute data!
How about a scientific database? For example, whose measurement gets in? There is prestige in getting yours in, of course, so there will be hot, expensive, irritating conflicts of interest in that area. How will such conflicts be resolved? Only at high costs! Again, when you are making optimization studies you have the above problem; was it a change made in some physical constant you did not know happened which made the new model better than the old model? How will you keep the state of changes available to all the users? It is not sufficient to do it so the users must read all your publications every time they use the machine, and since they will not keep up to date, errors will be made. Blaming the users will not undo the errors!
I began mainly talking about general-purpose computers, but I gradually took up discussing the use of a general-purpose computer as a special-purpose device to control things, such as the cyclotron and laboratory equipment. One of the main steps happened when someone in the business of making integrated circuits for people noted that instead of making a special chip for each of several customers, he could make a four-bit general-purpose computer and then program it for each special job (intel 4004). He replaced a complex manufacturing job with a programming job, though of course the chip still had to be made, but now it would be a large run of the same four-bit chips. Again this is the trend I noted earlier, going from hardware to software to gain the mass production of a variable product—always using the same general purpose computer. The four-bit chip was soon expanded to eight-bit chips, then 16, etc., so now some chips have 64-bit computers on them!
You tend not to realize the number of computers you interact with in the course of a day. Stop and go lights, elevators, washing machines, telephones—which now have a lot of computers in them, as opposed to my youth, when there was always a cheerful operator at the end of every line waiting to be helpful and get the phone number you wanted—answering machines, and automobiles controlled by computers under the hood are all examples of their expanding range of application; you have only to watch and note the universality of computers in your life. Of course, they will further increase as time goes on—the same simple general-purpose computer can do so many special-purpose jobs, it is seldom that a special-purpose chip is wanted.
If you have a general-purpose chip, then all the users will tend to contribute, not only in finding flaws but in making the manufacturer very willing to correct them; otherwise you will have to produce your own manuals, diagnostics, etc., and at the same time what others learn about their chips will seldom help you with your special one. Furthermore, with a general-purpose chip, upgrades of the chip—which you can expect to be taken care of mainly by others—will be available to you with little effort on your part. There will inevitably be a need for you to upgrade yours, because you will soon want to do more than the original plan called for. In meeting this new need a general-purpose chip with some excess capacity for the inevitable future expansion is much easier to handle.
I need not give you a list of the applications of computers in your business. You should know better than I do your rapidly increasing use of computers, not only in the field but throughout your whole organization, from top to bottom, from far behind the actual manufacturing up to the actual production front. You should also be well aware of the steadily increasing rate of changes, upgrades, and the flexibility a general-purpose symbol manipulating device gives to the whole organization to meet the constantly changing demands of the operating environment. The range of possible applications has only begun, and many new applications need to be done—perhaps by you. I have no objections to 10% improvements of established things, but from you I also look for the great new things which make so much difference to your organization that history remembers them for at least a few years.
As you go on in your careers you should examine the applications which succeed and those which fail; try to learn how to distinguish between them; try to understand the situations which produce successes and those which almost guarantee failure. Realize, as a general rule, it is not the same job you should do with a machine, but rather an equivalent one, and do it so that future, flexible expansion can be easily added (if you do succeed). And always also remember to give serious thought to the field maintenance as it will actually be done in the field—which is generally not as you wish it would be done!
The use of computers in society has not reached its end, and there is room for many new, important applications. They are easier to find than most people think!
In the two previous chapters I ended with some remarks on the possible limitations of their topics, hardware and software. Hence I need to discuss some possible limitations of applications. This I will do in the next few chapters under the general title of Artificial Intelligence, ai.
Having examined the history of computer applications we are naturally attracted to an examination of their future limits, not in computing capacity but rather in what kinds of things computers can and perhaps cannot do. Before we get too far I need to remind you computers manipulate symbols, not information; we are simply unable to say, let alone write a program for, what we mean by the word “information.”
In some areas rule-based logic has had spectacular successes, and in some apparently similar areas there were plain failures, which indicates success depends on a large element of luck; we still do not have a firm basic understanding of when the method of rule-based logic will or will not work, nor how well it will work.
In Chapter 1, I already brought up the topic that perhaps everything we “know” cannot be put into words (instructions)—“cannot” in the sense of impossible, and not in the sense we are stupid or ignorant. Some of the features of expert systems we have found certainly strengthen this opinion.
After quite a few years the field of the limits of intellectual performance by machines acquired the dubious title of artificial intelligence (ai), which does not have a single meaning. First, it is a variant on the question:
Can machines think?
While the problem of ai can be viewed as, “Which of all the things humans do can machines also do?,” I would prefer to ask the question in another form: “Of all of life’s burdens, which are those machines can relieve, or significantly ease, for us?” Note that while you tend to automatically think of the material side of life, pacemakers are machines connected directly to the human nervous system and help keep many people alive. People who say they do not want their life to depend on a machine seem quite conveniently to forget this. It seems to me in the long run it is on the intellectual side of life that machines can most contribute to the quality of life.
Why is the topic of artificial intelligence important? Let me take a specific example of the need for ai. Without defining things more sharply (and without defining either thinking or what a machine is there can be no real proof one way or the other), I believe very likely in the future we will have vehicles exploring the surface of Mars. The distance between Earth and Mars at times may be so large the signaling time round-trip could be 20 or more minutes. In the exploration process the vehicle must, therefore, have a fair degree of local control. When the vehicle, having passed between two rocks and turned a bit, finds the ground under the front wheels falling away, you will want prompt, “sensible” action on its part. Simple, obvious things like backing up will be inadequate to save it from destruction, and there is no time to get advice from Earth; hence some degree of “intelligence” should be programmed into the machine.
This is not an isolated situation; it is increasingly typical as we use computer-driven machines to do more and more things at higher and higher speeds. You cannot have a human backup—often because of the boredom factor which humans suffer from. They say piloting a plane is hours of boredom and seconds of sheer panic—not something humans were designed to cope with, though they manage to a reasonable degree. Speed of response is often essential. To repeat an example, our current fastest planes are basically unstable and have computers to stabilize them, millisecond by millisecond, which no human pilot could handle; the human can only supply the strategy in the large and leave the details in the small to the machine.
I earlier remarked on the need to get at least some understanding of what we mean by “a machine” and by “thinking.” We were discussing these things at Bell Telephone Laboratories in the late 1940s and someone said a machine could not have organic parts, upon which I said the definition excluded any wooden parts! The first definition was retracted, but to be nasty I suggested in time we might learn how to remove a large part of a frog’s nervous system and keep it alive. If we found how to use it for a storage mechanism, would it be a machine or not? If we used it as content-addressable storage, how would you feel about it being a “machine”?
In the same discussion, on the thinking side, a Jesuit-trained engineer gave the definition, “Thinking is what humans can do and machines cannot do.” Well, that solves the problem once and for all, apparently. But do you like the definition? Is it really fair? As we pointed out to him then, if we start with some obvious difference at present, then with improved machines and better programming we may be able to reduce the difference, and it is not clear in the long run there would be any difference left.
Religion unfortunately enters into discussions of the problem of machine thinking, and hence we have both vitalistic and nonvitalistic theories of “machines vs. humans.” For the Christian religions, their Bible says, “God made Man in His image.” If we can in turn create machines in our image, then we are in some sense the equal of God, and this is a bit embarrassing! Most religions, one way or the other, make man into more than a collection of molecules; indeed, man is often distinguished from the rest of the animal world by such things as a soul or some other property. As to the soul, in the late Middle Ages some people, wanting to know when the soul departed from the dead body, put a dying man on a scale and watched for the sudden change in weight, but all they saw was a slow loss as the body decayed—apparently the soul, which they were sure the man had, did not have material weight.
Even if you believe in evolution, still there can be a moment when God, or the gods, stepped in and gave man special properties which distinguish him from the rest of living things. This belief in an essential difference between man and the rest of the world is what makes many people believe machines can never, unless we ourselves become like the gods, be the same as a human in such details as thinking, for example. Such people are forced, like the above mentioned Jesuit-trained engineer, to make the definition of thinking to be what machines cannot do. Usually it is not so honestly stated as he did, rather it is disguised somehow behind a facade of words, but the intention is the same!
We are at a stalemate at this point in the discussion of ai; we can each assert as much as we please, but it proves nothing at all to most people. So let us turn to the record of ai successes and failures.
I must again digress, this time to point out why game playing has such a prominent role in ai research. The rules of a game are clear beyond argument, and success or failure is equally clear—in short, the problem is well defined in any reasonable sense. It is not that we particularly want machines to play games, but they provide a very good testing ground for our ideas on how to get started in ai.
As you examine the 4 × 4 × 4 cube there are 64 cells, and 76 straight lines through them. Any one line is a win if you can get all four of the positions filled with your pieces. You next note the eight corner locations, and the eight center locations, all have more lines through them than the others; indeed there is an inversion of the cube such that the center points go to the corners and the corners go to the center while preserving all straight lines—hence a duality which can be exploited if you wish.
For a program to play 4 × 4 × 4 tic-tac-toe it is first necessary to pick legal moves. Then in the opening moves you tend to place your pieces on these “hot” spots, and you use a random strategy since otherwise, if you play a standard game, the opponent can slowly explore it until a weakness is uncovered which can be systematically exploited. This use of randomness, when there are essentially indifferent moves, is a central part of all game-playing programs.
We next formulate some definite rules to be applied sequentially:
1. If you have three pieces on a line whose fourth cell is empty, play that cell and win.
2. If the opponent has three pieces on a line, block the empty cell.
After this there are apparently no definite rules to follow in making your next move. Hence you begin to look for “forcing moves,” ones which will get you to some place where you have a winning combination. Thus two pieces on an “open” line means you can place a third and the opponent will be forced to block the line (but you must be careful that the blocking move does not produce three in a line for the opponent and force you to go on the defensive). In the process of making several forcing moves you may be able to create a fork, and then you have a win! But these rules are vague. Forcing moves which are on “hot” places and where the opponent’s defense must be on “cool” places seems to favor you, but it does not guarantee a win. In starting a sequence of forcing moves, if you lose the initiative, then almost certainly the opponent can start a sequence of forcing moves on you and gain a win. Thus when to go on the attack is a touchy matter; too soon and you lose the initiative, too late and the opponent starts and wins. It is not possible, so far as I know, to give an exact rule of when to do so.
This is the standard structure of a program to play a game on a computer. A program must first check that a proposed move is legal before any other step, but this is a minor detail. Then there is usually a set of more or less formal rules to be obeyed, followed by some much vaguer rules. Thus a game program has a lot of heuristics in it (heuristic—to invent or discover), moves which are plausible and likely to lead you to a win but are not guaranteed to do so.
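Here is a rough Python sketch of that structure for the 4 × 4 × 4 game—enumerate the 76 lines once, take a win if one is available, block an opponent line of three, and otherwise prefer the “hot” cells, breaking ties at random. The deeper search for forcing moves and forks is deliberately left out, and every detail of the coding is my own illustration rather than any historical program.

```python
import random
from itertools import product

CELLS = list(product(range(4), repeat=3))
CELL_SET = set(CELLS)
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]  # 13 canonical

LINES = []
for start in CELLS:
    for d in DIRECTIONS:
        line = [tuple(start[i] + k * d[i] for i in range(3)) for k in range(4)]
        before = tuple(start[i] - d[i] for i in range(3))
        # Keep the line only when seen from its extreme end, so each is counted once.
        if all(c in CELL_SET for c in line) and before not in CELL_SET:
            LINES.append(line)

HEAT = {c: sum(c in line for line in LINES) for c in CELLS}  # corners and centers score 7

def choose_move(board, me, opponent):
    """board maps occupied cells to 'X' or 'O'; returns the cell to play."""
    empty = [c for c in CELLS if c not in board]
    for who in (me, opponent):              # first my wins, then blocks of the opponent
        for line in LINES:
            marks = [board.get(c) for c in line]
            if marks.count(who) == 3 and marks.count(None) == 1:
                return line[marks.index(None)]
    hottest = max(HEAT[c] for c in empty)   # otherwise a random "hot" cell
    return random.choice([c for c in empty if HEAT[c] == hottest])

print(len(LINES))                 # 76
print(choose_move({}, "X", "O"))  # one of the 8 corners or 8 central cells
```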
Is it not fair to say, “The program learned from experience”? Your immediate objection is that there was a program telling the machine how to learn. But when you take a course in Euclidean geometry, is not the teacher putting a similar learning program into you? Poorly, to be sure, but is that not, in a real sense, what a course in geometry is all about? You enter the course and cannot do problems; the teacher puts into you a program and at the end of the course you can solve such problems. Think it over carefully. If you deny the machine learns from experience because you claim the program was told (by the human programmer) how to improve its performance, then is not the situation much the same with you, except you are born with a somewhat larger initial program compared to the machine when it leaves the manufacturer’s hands? Are you sure you are not merely “programmed” in life by what chance events happen to you?
We are beginning to find that not only is intelligence not adequately defined so arguments can be settled scientifically, but a lot of other associated words like computer, learning, information, ideas, decisions (hardly a mere branching of a program, though branch points are often called decision points to make the programmers feel more important), expert behavior—all are a bit fuzzy in our minds when we get down to the level of testing them via a program in a computer. Science has traditionally appealed to experimental evidence and not idle words, and so far science seems to have been more effective than philosophy in improving our way of life. The future can, of course, be different.
In this book we are more concerned with the aid computers can give us in the intellectual areas than in the more mechanical areas, for example manufacturing. In the mechanical area, computers have enabled us to make better, preferable, and cheaper products, and in some areas they have been essential, such as space flights to the moon, which could hardly be done without the aid of computers. ai can be viewed as complementary to robotics—it is mainly concerned with the intellectual side of the human rather than the physical side, though obviously both are closely connected in most projects.
Let us start again and return to the elements of machines and humans. Both are built out of atoms and molecules. Both have organized basic parts; the machine has, among other things, two-state devices both for storage and for gates, while humans are built of cells. Both have larger structures, arithmetic units, storage, control, and I/O for machines, and humans have bones, muscles, organs, blood vessels, a nervous system, etc.
But let us note some things carefully. From large organizations new effects can arise. For example, we believe there is no friction between molecules, but most large structures show this effect—it is an effect which arises from the organization of smaller parts which do not show the effect.
We should also note that often when we engineer some device to do the same as nature does, we do it differently. For example, we have airplanes which generally use fixed wings (or rotors), while birds mainly flap their wings. But we also do a different thing—we fly much higher and certainly much faster than birds can. Nature never invented the wheel, though we use wheels in many, many ways. Our nervous system is comparatively slow and signals with a velocity of around a few hundred meters per second, while computers signal at essentially the speed of light, about 300,000 kilometers (186,000 miles) per second.
A third thing to note, before continuing with what ai has accomplished, is that the human brain has many, many components in the form of interconnected nerves. We want to have the definition of “thinking” be something the human brain can do. With past failures to program a machine to think, the excuse is often given that the machine was not big enough, fast enough, etc. Some people conclude from this that if we build a big enough machine, then automatically it will be able to think! Remember, it seems to be more a problem of writing the program than it is building a machine, unless you believe, as with friction, that enough small parts will produce a new effect—thinking from non-thinking parts. Perhaps that is all thinking really is! Perhaps it is not a separate thing, it is just an artifact of largeness. One cannot flatly deny this, as we have to admit we do not know what thinking really is.
Returning again to the past accomplishments of ai. There was a routine which proved theorems in classical school geometry, much as you did when you took such a course. The famous theorem “if two sides of a triangle are equal, then the base angles are also equal” was given to the program, Figure 7.1. You would probably bisect the top angle, and go on to prove the two parts are congruent triangles, hence the corresponding angles are equal. A few of you might bisect the third side, and draw the line to the opposite angle, again getting two congruent triangles. The proof the machine produced used no constructions but compared triangle abc with triangle cba, and then proved the self-congruence, hence equal angles.
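For reference, here is the self-congruence argument in outline—the classical proof attributed to Pappus, which is essentially what the program found. Take the triangle $ABC$ with the two equal sides $AB = CB$, and compare $\triangle ABC$ with $\triangle CBA$:

$$AB = CB, \qquad \angle ABC = \angle CBA, \qquad BC = BA,$$

so the two triangles are congruent by side–angle–side, and the corresponding angles give $\angle BAC = \angle BCA$: the base angles are equal, with no auxiliary construction at all.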
Anyone looking at that proof will admit it is elegant, correct, and surprising. Indeed, the people who wrote the program did not know it, nor was it widely known, though it is a footnote in my copy of Euclid. One is inclined to say the program showed “
A bit of thinking will show the programmers gave the instructions in the program to first try to prove the given theorem, and then, when stuck, try drawing auxiliary lines. If that had been the way you were taught to do geometry, then more of you would have found the above elegant proof. So, in a sense, it was programmed in. But, as I said before, what was the course in geometry you were taught except trying to load a program into you? Inefficiently, to be sure. That is the way with humans, but with machines it is clean: you just put the program in once and for all, and you do not need to endlessly repeat and repeat, and still have things forgotten!
The hard ai people claim man is only a machine and nothing else, and hence anything humans can do in the intellectual area can be copied by a machine. As noted above, most readers, when shown some result from a machine, automatically believe it cannot be the human trait that was claimed. Two questions immediately arise. One, is this fair? Two, how sure are you that you are not just a collection of molecules in a radiant energy field, and hence the whole world is merely molecule bouncing against molecule? If you believe in other (unnamed, mysterious) forces, how do they affect the motion of the molecules, and if they cannot affect the motion, then how can they affect the real world? Is physics complete in its description of the universe, or are there unknown (to them) forces? It is a hard choice to have to make. (Aside: at the moment [1994] it is believed that 90% to 99% of the universe is so-called dark matter, of which physics knows nothing except its gravitational attraction.)
But why supply the notes? Why not have the computer also “compose”? There are, after all, many “rules of composition.” And so they did, using the rules, and when there were choices they used random numbers to decide the next notes. At present we have both computer-composed and computer-played music; you hear a lot of it in commercials over radio and tv. It is cheaper, more controlled, and can make sounds which no musical instrument at present can make. Indeed, any sound which can appear on a sound track can be produced by a computer.
Thus in a sense computers are the ultimate in music. Except for the trivial details (of sampling rate and number of levels of quantization, which could be increased if you wanted to pay the price), the composers now have available any sound which can exist, at any rates, in any combinations, tempos, and intensities they please. Indeed, at present the “highest quality recording of music” is digital. There can be no future significant technical improvements. It is now clearly a matter of what sounds are worth producing, not what can be done. Many people now have digitally recorded music players, and they are regarded as being far better than the older analog machines.
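As a minimal sketch of those two “trivial details,” here is a pure 440 Hz tone reduced to a stream of quantized samples; the rate and bit depth used (44,100 samples per second, 16 bits) are simply the common compact-disc values, chosen only for the illustration.

```python
import math

RATE, BITS, FREQ, SECONDS = 44_100, 16, 440.0, 0.01
samples = [
    round(math.sin(2 * math.pi * FREQ * n / RATE) * (2 ** (BITS - 1) - 1))
    for n in range(int(RATE * SECONDS))
]
print(samples[:8])   # the first few quantized sample values of the tone
```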
The machine also provides the composer with more immediate feedback to hear what was composed. Before this, the composer had often to wait years and years until fame reached out and the music composed earlier was first heard in real life rather than only in the imagination. Hence the composer can now develop a style at a much more rapid pace. From reading an issue of a journal devoted to computer music I get the impression a fairly elaborate computer setup is common equipment for today’s composers of music, there are many languages for them to use, and they are using a wide variety of approaches to creating music in a combined human-machine effort.
The conductor of music now also has much more control. In the past the conductor, when making a recording, tried to get the best from the musicians, and often several takes were spliced together to get the best recording they could, including “mixing” of the various microphone recordings. Now the conductor can get exactly what is wanted, down to the millisecond timing, fraction of a tone, and other qualities of the individual instruments being simulated. All the all-too-human musicians do not have to be perfect at the same time during a passage.
Here you see again the effects of computers and how they are pushing us from the world of things into the world of ideas, and how they are supplementing and extending what humans can do.
Computers have both displaced so many people from jobs, and also made so many new jobs, it is hopeless to try to answer which is the larger number. But it is clear that on the average it is the lower-level jobs which are disappearing and the higher-level jobs which are appearing. Again, one would like to believe most people can be trained in the future to do the higher-level jobs—but that is a hope without any real evidence.
Besides games, geometry, and music, we have programs for algebraic manipulation; they tend to be more “directed” programs than “self-standing” programs, that is they depend on humans for guidance at various stages of the manipulation. At first it is curious we could build a self-standing geometry program but apparently cannot do the same easily for algebra. Simplification is one of the troubles. You may not have noticed when you took an algebra course and were told to “simplify an expression” that you were probably not given an explicit rule for “simplification”—and if you were, then the rule was obviously ridiculous. For example, at least one version of the “new math” said
it is not simplified but
is simplified!
We constantly use the word “simplify,” but its meaning depends on what you are going to do next, and there is no uniform definition. Thus, if in the calculus you are going to integrate next, you break things up into small pieces, but at other times you try to combine the parts into nice product or quotient expressions.
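To take a standard instance: the identity

$$\frac{1}{x(x+1)} = \frac{1}{x} - \frac{1}{x+1}$$

leaves the right-hand side the “simpler” form if you are about to integrate term by term, and the left-hand side the simpler one if you are about to multiply by further factors or evaluate numerically—the same expression, two different notions of “simplified.”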
A similar “guidance by human” interacting program has been developed for the synthesis of chemical compounds. It has been quite useful as it gives: (1) the possible routes to the synthesis, (2) the costs, (3) the times of the reactions along the way, and (4) the effective yields. Thus the programmer using it can explore many different ways of synthesizing a new compound, or re-explore old ones to find new methods now that the costs of the materials and processes have changed from what they were some years ago.
We know doctors are human and hence unreliable, and often in the case of rare diseases the doctor may never have seen a case before, but a machine does not forget and can be loaded with all the relevant diseases. Hence from the symptoms the program can either diagnose or call for further tests to establish the probable disease. With probabilities programmed in (which can adjust rapidly for current epidemics), machines can probably do better in the long run than can the average or even the better-than-the-average doctor—and it is the average doctors who must be the ones to treat most people! The very best doctors can personally treat (unaided by machines) only very few of the whole population.
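In the simplest view—this is a gloss, not a description of any particular diagnostic program—the “probabilities programmed in” amount to Bayes’ rule,

$$P(\text{disease}\mid\text{symptoms}) = \frac{P(\text{symptoms}\mid\text{disease})\,P(\text{disease})}{P(\text{symptoms})},$$

where the prior $P(\text{disease})$ is exactly the term which can be adjusted rapidly for a current epidemic.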
One major trouble is, among others, the legal problem. With human doctors, so long as they show “due prudence” (in the legal language), then if they make a mistake the law forgives them—they are after all only human (to err is human). But with a machine error, whom do you sue? The machine? The programmer? The experts who were used to get the rules? Those who formulated the rules in more detail? Those who organized them into some order? Or those who programmed these rules? With a machine you can prove by detailed analysis of the program, as you cannot prove with the human doctor, that there was a mistake, a wrong diagnosis. Hence my prediction is you will find a lot of computer-assisted diagnosis made by doctors, but for a long time there will be a human doctor at the end between you and the machine. We will slowly get personal programs which will let you know a lot more about how to diagnose yourself, but there will be legal troubles with such programs. For example, I doubt you will have the authority to prescribe the needed drugs without a human doctor to sign the order. You, perhaps, have already noted that all the computer programs you buy explicitly absolve the sellers from any, and I mean any, responsibility for the product they sell! Often the legal problems of new applications are the main difficulty, not the engineering!
In many hospitals computers monitor patients in the emergency ward, and sometimes in other places when necessary. The machines are free from boredom, rapid in response, and will alert a local nurse to do something promptly. Unaided by computers it is doubtful full-time nurses could equal the combination of computer and nurse.
In mathematics, one of the earliest programs (1953) which did symbol manipulation was a formal differentiation program to find higher derivatives. It was written so they could find the first 20 terms of a power series of a complicated function. As you ought to know from the calculus, differentiation is a simple formal process with comparatively few rules. At the time you took the course it must have seemed to be much more than that, but you were probably confusing the differentiation with the later necessary simplification and other manipulations of the derivatives. Another very early abstract symbol manipulation program was coordinate changing—needed for guided missiles, radars, etc. There is an extra degree of freedom in all radars so the target cannot fly over the end of an axis of rotation and force the radar to slew 180° to track it. Hence coordinate transformations can be a bit messier than you might think.
An obvious observation for the Navy, for example: if on a ship you are going to have mobile robots (and you need not have all of your robots mobile), then running on rails from the ceiling will mean things which fall to the deck will not necessarily give trouble when both the robot and the ship are in violent motion. That is another example of what I have been repeatedly saying: when you go to machines you do an equivalent job, not the same one. Things are bound to be on the deck where they are not supposed to be, having fallen there by accident, by carelessness, by battle damage, etc., and having to step over or around them is not as easy for a robot as for a human.
Another obvious area for mobile robots is in damage control. Robots can stand a much more hostile environment, such as a fire, than can humans, even when humans are clothed in asbestos suits. If in doing the job rapidly some of the robots are destroyed it is not the same as dead humans. The Navy now has remote-controlled mine sweepers because when you lose a ship you do not lose a human crew. We regularly use robot control when doing deep-sea diving, and we have unmanned bombers these days.
In other games machines have been more successful. For example, I am told a backgammon-playing program beat all the winners of a contest held recently in Italy. But some games which are simple only in their rules, like Go, remain hard to program so that a machine plays a first-class game.
To summarize, in many games and related activities machines have been programmed to play very well, in some few games only poorly. But often the way the machine plays may be said to “solve the problem by volume of computations” rather than by insight—whatever “insight” means! We started to play games on computers to study the human thought processes and not to win the game; the goal has been perverted to win, and never mind the insight into the human mind and how it works.
Let me repeat myself: artificial intelligence is not a subject you can afford to ignore; your attitude will put you in the front or the rear of the applications of machines in your field, but also may lead you into a really great fiasco!
Do not be fooled into thinking that psychological novelty is trivial. Once the postulates, definitions, and the logic are given, then all the rest of mathematics is merely psychologically novel—at that level there is in all of mathematics technically no logical novelty!
There is a common belief that if we appeal to a random source of making decisions then we escape the vicious circle of molecule banging against molecule, but from whence comes this external random source except the material world of molecules?
There is also the standard claim a truly random source contains all knowledge. This is based on a variant of the monkeys and the typewriters story. Ideally you have a group of monkeys sitting at typewriters and at random times they hit random keys. It is claimed in time one of them will type all the books in the British Museum in the order in which they are on the shelves! This is based on the argument that sooner or later a monkey will hit the right first key; indeed in infinite time this will happen infinitely often. Among these infinite number of times there will be some (an infinite number) in which the next key is hit correctly. And so it goes; in the fullness of infinite time the exact sequence of keystrokes will occur.
There is an old claim that “free will” is a myth; in a given circumstance, you being you as you are at the moment, you can only do as you do. The argument sounds cogent, though it flies in the face of your belief you have free will. To settle the question, what experiment would you do? There seems to be no satisfactory experiment which can be done. The truth is we constantly alternate between the two positions in our behavior. A teacher has to believe that if only the right words were said then the student would have to understand. And you behave similarly when raising a child. Yet the feeling of having free will is deep in us and we are reluctant to give it up for ourselves—but we are often willing to deny it to others!
As another example of the tacit belief in the lack of free will in others, consider that when there is a high rate of crime in some neighborhood of a city, many people believe the way to cure it is to change the environment—hence the people will have to change and the crime rate will go down!
These are merely more examples to get you involved with the question, “Can machines think?”
Finally, perhaps thinking should be measured not by what you do but how you do it. When I watch a child learning how to multiply two, say, three-digit numbers, then I have the feeling the child is thinking; when I do the same multiplication I feel I am more doing “conditioned responses”; when a computer does the same multiplication I do not feel the machine is thinking at all. In the words of the old song, “It ain’t what you do, it’s the way that you do it.” In the area of thinking, maybe we have confused what is done with the way it is done, and this may be the source of much of our confusion in AI.
The hard AI people will accept only what is done as a measure of success, and this has carried over into many other people’s minds without carefully examining the facts. This belief, “the results are the measure of thinking,” allows many people to believe they can “think” and machines cannot, since machines have not as yet produced the required results.
The situation with respect to computers and thought is awkward. We would like to believe, and at the same time not believe, machines can “think.” We want to believe because machines could then help us so much in our mental world; we want to not believe to preserve our feeling of self-importance. The machines can defeat us in so many ways—speed, accuracy, reliability, cost, rapidity of control, freedom from boredom, bandwidth in and out, ease of forgetting old and learning new things, hostile environments, and personnel problems—that we would like to feel superior in some way to them; they are, after all, our own creations! For example, if machine programs could do a significantly better job than the current crop of doctors, where would that leave them? And by extension where would we be left?
In the two previous chapters I closed with estimates of the limits of both hardware and software, but in these two chapters on AI I can do very little. We simply do not know what we are talking about; the very words are not defined, nor do they seem definable in the near future. We have also had to use language to talk about language processing by computers, and the recursiveness of this makes things more difficult and less sure. Thus the limits of applications, which I have taken to be the general topic of AI, remain an open question, but one which is important for your future career. Thus AI requires your careful thought and should not be dismissed lightly just because many experts make obviously false claims.
I suggest you pause and have two discussions with yourself on the topic “Can machines think?”
You could begin your discussion with my observation that whichever position you adopt there is the other side, and I do not care what you believe so long as you have good reasons and can explain them clearly. That is my task, to make you think on this awkward topic, and not to give any answers.
Year after year such discussions are generally quite hostile to machines, though it is getting less so every year. They often start with remarks such as, “I would not want to have my life depend on a machine,” to which the reply is, “You are opposed to using pacemakers to keep people alive?” Modern pilots cannot control their airplanes but must depend on machines to stabilize them. In the emergency ward of modern hospitals you are automatically connected to a computer which monitors your vital signs and under many circumstances will call a nurse long before any human could note and do anything. The plain fact is your life is often controlled by machines, and sometimes they are essential to your life—you just do not like to be reminded of it.
“I do not want machines to control my life.” You do not want stop and go lights at intersections! See above for some other answers. Often humans can cooperate with a machine far better than with other humans!
“Machines can never do things humans can do.” I observe in return machines can do things no human can do. And in any case, how sure are you that, for any clearly prespecified thing machines (programs) apparently cannot now do, they could not in time do it better than humans can? (Perhaps “clearly specified” means you can write a program!) And in any case, how relevant are these supposed differences to your career?
The people are generally sure they are more than a machine, but usually can give no real argument as to why there is a difference, unless they appeal to their religion, and with foreign students of very different faiths around they are reluctant to do so—though obviously most (though not all) religions share the belief that man is different, in one way or another, from the rest of life on Earth.
A second useful discussion is on the topic of future applications of computers to their area of expertise.
All too often people report on past and present applications, which is good, but not on the topic whose purpose is to sensitize you to future possibilities you might exploit. It is hard to get people to aggressively think about how things in their own area might be done differently. I have sometimes wondered whether it might be better if I asked people to apply computers to other areas of application than their own narrow speciality; perhaps they would be less inhibited there!
Since the purpose, as stated above, is to get the reader to think more carefully on the awkward topics of machines “thinking” and their vision of their personal future, you the reader should take your own opinions and try first to express them clearly, and then examine them with counterarguments, back and forth, until you are fairly clear as to what you believe and why you believe it. It is none of the author’s business in this matter what you believe, but it is the author’s business to get you to think and articulate your position clearly. For readers of the book I suggest instead of reading the next pages you stop and discuss with yourself, or possibly friends, these nasty problems; the surer you are of one side, the more you should probably argue the other side!
When I became a professor, after 30 years of active research at Bell Telephone Laboratories, mainly in the Mathematics Research Department, I recalled professors are supposed to think and digest past experiences. So I put my feet up on the desk and began to consider my past. In the early years I had been mainly in computing, so naturally I was involved in many large projects which required computing. Thinking about how things worked out on several of the large engineering systems I was partially involved in, I began, now that I had some distance from them, to see they had some common elements. Slowly I began to realize the design problems all took place in a space of n dimensions, where n is the number of independent parameters. Yes, we build three-dimensional objects, but their design is in a high-dimensional space, one dimension for each design parameter.
I also need high-dimensional spaces so later proofs will become intuitively obvious to you without filling in the details rigorously. Hence we will discuss n-dimensional space now.
You think you live in three dimensions, but in many respects you live in a two-dimensional space. For example, in the random walk of life, if you meet a person you then have a reasonable chance of meeting that person again. But in a world of three dimensions you do not! Consider the fish in the sea who potentially live in three dimensions. They go along the surface, or on the bottom, reducing things to two dimensions, or they go in schools, or they assemble at one place at the same time, such as a river mouth, a beach, the Sargasso Sea, etc. They cannot expect to find a mate if they wander the open ocean in three dimensions. Again, if you want airplanes to hit each other, you assemble them near an airport, put them in two-dimensional levels of flight, or send them in a group; truly random flight would have fewer accidents than we now have!
n-dimensional space is a mathematical construct which we must investigate if we are to understand what happens to us when we wander there during a design problem. In two dimensions we have Pythagoras’s theorem that for a right triangle, the square of the hypotenuse equals the sum of the squares of the other two sides. In three dimensions we ask for the length of the diagonal of a rectangular block, Figure 9.1. To find it we first draw a diagonal on one face, apply Pythagoras’s theorem, and then take it as one side with the other side the third dimension, which is at right angles, and again from the Pythagorean theorem we get the square of the diagonal is the sum of the squares of the three perpendicular sides. It is obvious from this proof, and the necessary symmetry of the formula, that as you go to higher and higher dimensions you will still have the square of the diagonal as the sum of the squares of the individual mutually perpendicular sides
D^2 = x1^2 + x2^2 + ⋯ + xn^2
where the xi are the lengths of the sides of the rectangular block in n dimensions.
Continuing with the geometric approach, planes in the space will be simply linear combinations of the xi, and a sphere about a point will be all points which are at the fixed distance (the radius) from the given point.
We need the volume of the n-dimensional sphere to get an idea of the size of a piece of restricted space. But first we need Stirling’s approximation for n!.
A product like n! is hard to handle, so we take the log of n! to get
ln(n!) = ln 1 + ln 2 + ln 3 + ⋯ + ln n
where, of course, the ln is the logarithm to the base e. Sums remind us that they are related to integrals, so we start with the integral
∫_1^n ln x dx
We apply integration by parts (since we recognize the ln x arose from integrating an algebraic function, and hence it will be removed in the next step). Letting u = ln x and dv = dx yields
∫_1^n ln x dx = [x ln x − x]_1^n = n ln n − n + 1
On the other hand, if we apply the trapezoid rule, Figure 9.2, to the integral of ln x, we get
∫_1^n ln x dx ≈ (1/2) ln 1 + ln 2 + ln 3 + ⋯ + ln(n − 1) + (1/2) ln n
Since ln 1 = 0, the right-hand side is ln(n!) − (1/2) ln n; adding (1/2) ln n to both sides we get, finally:
ln(n!) ≈ n ln n − n + 1 + (1/2) ln n
Undoing the logs by taking the exponential of each side gives
n! ≈ C n^n √n e^(−n)
where C is some constant (not far from e) independent of n, since we are approximating an integral by the trapezoid rule and the error in the trapezoid approximation increases more and more slowly as n grows larger and larger, and C is the limiting value. This is the first form of Stirling’s formula. We will not waste time deriving the limiting, at infinity, value of the constant C, which turns out to be √(2π) = 2.5066… (e = 2.71828…). Thus we finally have the usual Stirling’s formula for the factorial:
n! ≈ √(2πn) (n/e)^n
The following table shows the quality of the Stirling approximation to n!.
| n | Stirling | True | Stirling/True |
|---|---|---|---|
| 1 | 0.92214 | 1 | 0.92214 |
| 2 | 1.91900 | 2 | 0.95950 |
| 3 | 5.83621 | 6 | 0.97270 |
| 4 | 23.50618 | 24 | 0.97942 |
| 5 | 118.01917 | 120 | 0.98349 |
| 6 | 710.07818 | 720 | 0.98622 |
| 7 | 4,980.3958 | 5,040 | 0.98817 |
| 8 | 39,902.3955 | 40,320 | 0.98964 |
| 9 | 359,536.87 | 362,880 | 0.99079 |
| 10 | 3,598,695.6 | 3,628,800 | 0.99170 |
Note as the numbers get larger and larger the ratio approaches 1 but the differences get greater and greater!
If you consider the two functions
then the limit of the ratio f(n)/g(n), as n approaches infinity, is 1, but as in the table the difference
grows larger and larger as n increases.
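A few lines of Python make both points concrete; the sketch below (the function name is mine, and nothing is assumed beyond Stirling’s formula as given above) prints the ratio and the difference side by side.

```python
import math

def stirling(n):
    """Stirling's approximation: sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 2, 5, 10, 20, 50):
    s, t = stirling(n), math.factorial(n)
    # The ratio creeps toward 1 while the absolute difference keeps growing.
    print(f"n={n:3d}  ratio={s / t:.5f}  difference={t - s:.4g}")
```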
Next we need the gamma function, which generalizes the factorial. It is defined by the integral
Γ(n) = ∫_0^∞ x^(n−1) e^(−x) dx
which converges for all n > 0. For n > 1 we again integrate by parts, this time using dv = e^(−x) dx and u = x^(n−1). At the two limits the integrated part is zero, and we have the reduction formula
Γ(n) = (n − 1) Γ(n − 1)
with Γ(1) = 1.
Thus the gamma function takes on the values (n – 1)! at the positive integers n, and it provides a natural way of extending the factorial to all positive numbers, since the integral exists whenever n > 0.
We will need the value of Γ(1/2).
Set x = t², hence dx = 2t dt, and we have (using symmetry in the last step):
Γ(1/2) = ∫_0^∞ x^(−1/2) e^(−x) dx = 2 ∫_0^∞ e^(−t²) dt = ∫_(−∞)^∞ e^(−t²) dt
We now use a standard trick to evaluate this integral. Multiply the integral by itself, once with respect to x and once with respect to y:
[Γ(1/2)]² = ∫_(−∞)^∞ e^(−x²) dx ∫_(−∞)^∞ e^(−y²) dy = ∫∫ e^(−(x² + y²)) dx dy
The x² + y² suggests polar coordinates, so we convert to get
[Γ(1/2)]² = ∫_0^(2π) ∫_0^∞ e^(−r²) r dr dθ
The angle integration is easy, the exponential is now also easy, and we get, finally, that
[Γ(1/2)]² = 2π × (1/2) = π, hence Γ(1/2) = √π.
We now turn to the volume of an n-dimensional sphere (or hypersphere, if you wish). Clearly the volume of a cube in n dimensions and of side x is xn. A little reflection and you will believe the formula for the volume of an n-dimensional sphere must have the form
where Cn is a suitable constant. In the case n = 2 the constant is π; in the case n = 1, it is 2 (when you think about it). In three dimensions we have C3 = 4π/3.
To get the constant Cn we use the result just obtained. Integrating e^(−(x1² + x2² + ⋯ + xn²)) over the whole n-dimensional space gives, as a product of n identical one-dimensional integrals, exactly π^(n/2). Doing the same integral in spherical shells, the derivative of the volume Cn r^n with respect to r is the surface area, and hence the elements of volume are
dV = n Cn r^(n−1) dr
We have, therefore, on setting r² = t,
π^(n/2) = ∫_0^∞ e^(−r²) n Cn r^(n−1) dr = (n/2) Cn ∫_0^∞ e^(−t) t^(n/2 − 1) dt = Cn (n/2) Γ(n/2) = Cn Γ(n/2 + 1)
from which we get
Cn = π^(n/2) / Γ(n/2 + 1)
It is easy to see from the reduction formula for the gamma function that
C(n+2) = 2π Cn / (n + 2)
and we can compute the following table.
and we can compute the following table.
| Dimension n | Coefficient Cn | Decimal value |
|---|---|---|
| 1 | 2 | 2.00000… |
| 2 | π | 3.14159… |
| 3 | 4π/3 | 4.18879… |
| 4 | π^2/2 | 4.93480… |
| 5 | 8π^2/15 | 5.26379… |
| 6 | π^3/6 | 5.16771… |
| 7 | 16π^3/105 | 4.72477… |
| 8 | π^4/24 | 4.05871… |
| 9 | 32π^4/945 | 3.29851… |
| 10 | π^5/120 | 2.55016… |
| 2k | π^k/k! | → 0 |
Thus we see the coefficient Cn increases up to n = 5 and then decreases towards 0. For spheres of unit radius this means the volume of the sphere approaches 0 as n increases. If the radius is r, then we have for the volume, using n = 2k for convenience (since the actual numbers vary smoothly as n increases and the odd dimensional spaces are messier to compute),
V(r) = π^k r^(2k) / k!
No matter how large the radius, r, increasing the number of dimensions, n, will ultimately produce a sphere of arbitrarily small volume.
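The whole table, and the eventual collapse of the volume even for a large radius, can be reproduced from the formula Cn = π^(n/2)/Γ(n/2 + 1); here is a short Python sketch (illustrative only) that does so.

```python
import math

def sphere_coefficient(n):
    """C_n = pi**(n/2) / Gamma(n/2 + 1); the n-sphere volume is C_n * r**n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

for n in range(1, 11):
    print(f"n={n:2d}  C_n={sphere_coefficient(n):.5f}")

# Even for a radius as large as r = 2 the volume eventually goes to zero.
for n in (10, 50, 100, 200):
    print(f"n={n:3d}  volume of radius-2 sphere = {sphere_coefficient(n) * 2**n:.3g}")
```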
Next we look at the relative amount of the volume close to the surface of an n-dimensional sphere. Let the radius of the sphere be r and the inner radius of the shell be r(1 − ε). Then the relative volume of the shell is
[Cn r^n − Cn r^n (1 − ε)^n] / (Cn r^n) = 1 − (1 − ε)^n
For large n, no matter how thin the shell is (relative to the radius), almost all the volume is in the shell and there is almost nothing inside. As we say, the volume is almost all on the surface. Even in three dimensions the unit sphere has 7/8 of its volume within 1/2 of the surface. In n dimensions 1 − 1/2^n of the volume is within 1/2 of the radius from the surface.
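The shell fraction 1 − (1 − ε)^n is easy to tabulate; the sketch below (a simple illustration, with a shell thickness of 1 percent of the radius chosen arbitrarily) shows how fast the interior empties out.

```python
# Fraction of an n-dimensional sphere's volume lying within a shell of
# relative thickness eps at the surface: 1 - (1 - eps)**n.
eps = 0.01
for n in (3, 10, 100, 1000, 10000):
    print(f"n={n:5d}  fraction in outer 1% shell = {1 - (1 - eps) ** n:.4f}")
```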
This has importance in design; it means almost surely the optimal design will be on the surface and will not be inside, as you might think from taking the calculus and doing optimizations in that course. The calculus methods are usually inappropriate for finding the optimum in high-dimensional spaces. This is not strange at all; generally speaking, the best design is pushing one or more of the parameters to their extreme—obviously you are on the surface of the feasible region of design!
Next we turn to looking at the diagonal of an n-dimensional cube, say the vector from the origin to the point (1, 1, …, 1). The cosine of the angle between this line and any axis is given by definition as the ratio of the component along the axis, which is clearly 1, to the length of the line, which is √n. Hence
cos θ = 1/√n
Therefore, for large n the diagonal is almost perpendicular to every coordinate axis!
If we use the points with coordinates (±1, ±1, …, ±1), then there are 2^n such diagonal lines which are all almost perpendicular to the coordinate axes. For n = 10, for example, this amounts to 1,024 such almost perpendicular lines.
I need the angle between two lines, and while you may remember it is the vector dot product, I propose to derive it again to bring more understanding about what is going on. (Aside: I have found it very valuable in important situations to review all the basic derivations involved so I have a firm feeling for what is going on.) Take two points x and y with their corresponding coordinates xi and yi, Figure 9.3. Then, applying the law of cosines in the plane of the three points, x, y, and the origin, we have
C² = X² + Y² − 2XY cos θ
where X and Y are the lengths of the lines to the points x and y, and C is the length of the third side joining x and y. But that C comes from using the differences of the coordinates in each direction:
C² = Σ (xi − yi)² = Σ xi² + Σ yi² − 2 Σ xi yi = X² + Y² − 2 Σ xi yi
Comparing the two expressions we see that
cos θ = (Σ xi yi) / (XY)
We now apply this formula to two lines drawn from the origin to random points of the form
(±1, ±1, …, ±1)
The dot product of these two vectors, with the signs taken at random, is again a sum of random +1s and −1s, added n times, while the length of each vector is again √n, hence (note the n in the denominator)
cos θ = (Σ ±1) / n
and by the weak law of large numbers this approaches 0 for increasing n, almost surely. But there are 2^n different such random vectors, and given any one fixed vector, any other of these 2^n random vectors is almost surely almost perpendicular to it! n dimensions is indeed vast!
In linear algebra and other courses you learned to find the set of perpendicular axes and then represent everything in terms of these coordinates, but you see in n dimensions there are, after you find the n mutually perpendicular coordinate directions, 2^n other directions which are almost perpendicular to those you have found! The theory and practice of linear algebra are quite different!
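A small simulation (my own, and only an illustration of the argument above) shows the effect directly: the cosine between two random ±1 vectors is their dot product divided by n, and its typical size falls like 1/√n.

```python
import random

def random_pm1(n):
    """A random vector with coordinates chosen independently from {+1, -1}."""
    return [random.choice((-1, 1)) for _ in range(n)]

def cosine(u, v):
    # Both vectors have length sqrt(n), so the cosine is the dot product over n.
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) / n

for n in (10, 100, 1000, 10000):
    trials = [abs(cosine(random_pm1(n), random_pm1(n))) for _ in range(200)]
    print(f"n={n:5d}  average |cos(angle)| = {sum(trials) / len(trials):.4f}")
```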
In two dimensions picture a 4×4 square divided into four 2×2 squares, each containing a circle of unit radius; the circle centered at the middle of the square which touches all four has radius √2 − 1. Now in three dimensions you will have a 4×4×4 cube, and eight spheres of unit radius. The inner sphere will touch each outer sphere along the line to their center and will have a radius of
√3 − 1
Think of why this must be larger than for two dimensions.
Going to n dimensions, you have a 4 × 4 × … × 4 cube, and 2^n spheres, one in each of the corners, and with each touching its n adjacent neighbors. The inner sphere, touching on the inside all of the spheres, will have a radius of
√n − 1
Examine this carefully! Are you sure of it? If not, why not? Where will you object to the reasoning?
Once satisfied it is correct we apply it to the case of n = 10 dimensions. You have for the radius of the inner sphere
√10 − 1 = 2.162…
which is greater than 2, half the side of the surrounding cube,
and in ten dimensions the inner sphere reaches outside the surrounding cube! Yes, the sphere is convex, yes it touches each of the 1,024 packed spheres on the inside, yet it reaches outside the cube!
So much for your raw intuition about n-dimensional space, but remember the n-dimensional space is where the design of complex objects generally takes place. You had better get an improved feeling for n-dimensional space by thinking about the things just presented, until you begin to see how they can be true, indeed why they must be true. Else you will be in trouble the next time you get into a complex design problem. Perhaps you should calculate the radii of the various dimensions, as well as go back to the angles between the diagonals and the axes, and see how it can happen.
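Taking up that suggestion, here is a sketch (mine) which tabulates the inner-sphere radius √n − 1; once it passes 2, half the side of the 4 × 4 × … × 4 cube, the sphere is outside.

```python
import math

# Radius of the sphere centered in a 4 x 4 x ... x 4 cube that touches the
# 2**n unit spheres packed into the corners: sqrt(n) - 1.
for n in range(1, 13):
    r = math.sqrt(n) - 1
    note = "outside the cube!" if r > 2 else ""
    print(f"n={n:2d}  inner radius = {r:.3f}  {note}")
```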
It is now necessary to note carefully, I have done all this in the classical Euclidean space using the Pythagorean distance, where the sum of squares of the differences of the coordinates is the distance between the points squared. Mathematicians call this distance L2.
The space L1 uses not the sum of the squares, but rather the sum of the distances, much as you must do in traveling in a city with a rectangular grid of streets. It is the sum of the differences between the two locations that tells you how far you must go. In the computing field this is often called the “Hamming distance” for reasons which will appear in a later chapter. In this space a circle in two dimensions looks like a square standing on a point, Figure 9.5. In three dimensions it is like a cube standing on a point, etc. There is also the L∞ (Chebyshev) metric, which takes as the distance the largest single coordinate difference; its “circle” in two dimensions is a square with sides parallel to the axes. Now you can better see how it is in the circle paradox above the inner sphere can get outside the cube.
These are all examples of a metric, a measure of distance. The conventional conditions on a metric D(x, y) between two points x and y are: (1) D(x, y) ≥ 0 (non-negative); (2) D(x, y) = 0 if and only if x = y (identity); (3) D(x, y) = D(y, x) (symmetry); and (4) D(x, y) + D(y, z) ≥ D(x, z) (the triangle inequality).
It is left to you to verify that the three metrics, L∞, L2, and L1 (Chebyshev, Pythagoras, and Hamming) all satisfy these conditions.
The truth is, in complex design, for various coordinates we may use any of the three metrics, all mixed up together, so the design space is not as portrayed above, but is a mess of bits and pieces. The L2 metric is connected with least squares, obviously, and the other two, L∞ and L1, are more like comparisons. In making comparisons in real life, you generally use either the maximum difference, L∞, in any one trait as sufficient to distinguish two things, or sometimes, as in strings of bits, it is the number of differences which matters, and the sum of the squares does not enter, hence the L1 distance is used. This is increasingly true, for example, in pattern identification in AI.
Unfortunately, the above is all too true, and it is seldom pointed out to you. They never told me a thing about it! I will need many of the results in later chapters, but in general, after this exposure, you should be better prepared than you were for complex design and for carefully examining the space in which the design occurs, as I have tried to do here. Messy as it is, fundamentally it is where the design occurs and where you must search for an acceptable design.
Since L1 and L∞ are not familiar, let me expand the remarks on the three metrics. L2 is the natural distance function to use in physical and geometric situations, including the data reduction from physical measurements. Thus you find least squares, L2, throughout physics. But when the subject matter is intellectual judgments, then the other two distance functions are generally preferable. This is slowly coming into use, though we still find the chi-squared test, which is obviously a measure for L2, used widely when some other suitable test should be used.
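For definiteness, the three distance functions can be written out in a few lines of Python (the function names are mine):

```python
def l1(x, y):
    """Hamming / city-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2(x, y):
    """Pythagorean distance: square root of the sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def linf(x, y):
    """Chebyshev distance: the largest single coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

p, q = (1, 2, 3), (4, 6, 3)
print(l1(p, q), l2(p, q), linf(p, q))   # 7, 5.0, 4
```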
Having looked at computers and how they operate, we now turn to the problem of the representation of information.
To simplify the problem of the representation of information we will, at present, examine only the problem of the transmission of information from here to there. This is exactly the same as transmission from now to then, storage. Transmission through time or through space is the same problem. The standard model of the system is given in Figure 10.1.
Starting on the left-hand side of Figure 10.1, we have a source of information. We do not discuss what the source is. It may be a string of alphabetical symbols, numbers, mathematical formulas, musical notes of a score, the symbols now used to represent dance movements—whatever the source is and whatever “meaning” is associated with the symbols is not part of the theory. We postulate only a source of information, and by doing only that, and no more, we have a powerful, general theory which can be widely applicable. It is the abstraction from details that gives the breadth of application.
Next, going to the right in Figure 10.1, the channel is supposed to have “random noise added.” All the noise in the system is incorporated here. It is assumed the encoder can uniquely recognize the incoming symbols without any error, and it will be assumed the decoder similarly functions without error. These are idealizations, but for many practical purposes they are close to reality.
Next, the decoding is done in two stages, channel to standard, and then standard to the source code. Finally it is sent on to the sink, to its destination. Again, we do not ask what the sink does with it.
As stated before, the system resembles transmission, for example a telephone message from me to you, radio, or TV programs, and other things such as a number in a register of a computer being sent to another place. Recall, again, sending through space is the same as sending through time, namely storage. If you have information and want it later, you encode it for storage and store it. Later, when you want it, it is decoded. Among encoding systems is the identity, no change in the representation.
We will, for convenience only, assume we are using the binary form for the representation in the system. Other forms can be similarly handled, but the generality is not worth the extra notation.
The first obvious property we want is the ability to uniquely decode a message if there is no noise added—at least it seems to be a desirable property, though in some situations it could be ignored to a small extent. What is sent is a stream of symbols which looks to the receiver like a string of 0s and 1s. We call two adjacent symbols a second extension, three a third extension, and in general if we send n symbols the receiver sees the nth extension of the basic code symbols. Not knowing n, you the receiver must break the stream up into units which can be translated, and you want, as we said above, to be able at the receiving end, meaning you again, to make this decomposition of the stream uniquely in order to recover the original message I, at the sending end, sent to you.
Let us examine one special code of four symbols, s1, s2, s3, s4:
If you receive
On the other hand the code
can be broken up into the symbols
by merely following the decoding tree using the rule:
Each time you come to a branch point (node) you read the next symbol, and when you come to a leaf of the tree you emit the corresponding symbol and return to the start.
The reason why this tree can exist is that no symbol is the prefix of any other, so you always know when you have come to the end of the current symbol.
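The decoding rule is easily mechanized. The sketch below uses an assumed prefix code of four symbols (the actual code of the figure is not reproduced in this text); the essential point is only that no codeword is a prefix of another.

```python
# A sketch of prefix-code decoding. The code below is an assumed example
# (not the one in the figure): no codeword is a prefix of any other.
code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
decode_table = {bits: symbol for symbol, bits in code.items()}

def decode(stream):
    """Walk the stream; emit a symbol the moment the buffer matches a codeword."""
    symbols, buffer = [], ""
    for bit in stream:
        buffer += bit
        if buffer in decode_table:        # reached a leaf of the decoding tree
            symbols.append(decode_table[buffer])
            buffer = ""                   # return to the root
    return symbols

print(decode("0110100"))   # ['s1', 's3', 's2', 's1']
```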
The next topic is instantaneously decodable codes. To see what this is, consider the above code with the digits reversed end for end.
We now turn to two examples of encoding the same symbols, si. The first encoding is
which will have the decoding tree shown in Figure 10.3.
The second encoding is the same source, but we have
with the tree shown in Figure 10.4.
The most obvious measure of “goodness” of a code is its average length for some ensemble of messages. For this we need to compute the code length li of each symbol multiplied by its corresponding probability pi of occurring, and then add these products over the whole code. Thus the formula for the average code length L is, for an alphabet of q symbols,
where the pi are the probabilities of the symbols si and the li are the corresponding lengths of the encoded symbols. For an efficient code this number L should be as small as possible. If p1 = 1/2, p2 = 1/4, p3 = 1/8, p4 = 1/16, and p5 = 1/16, then for code #1 we get
and for code #2
and hence the given probabilities will favor the first code.
If most of the code words are of the same probability of occurring, then the second encoding will have a smaller average code length than the first encoding. Let pi = 1/5 for all i. Then code #1 has
while code #2 has
thus favoring the second code. Clearly the designing of a “good” code must depend on the frequencies of the symbols occurring.
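The arithmetic behind these comparisons is just the weighted sum defined above. In the sketch below the two sets of code lengths are assumptions chosen for illustration (the encodings themselves are not reproduced in this text); with them the ranking flips between the skewed and the uniform probabilities, as described.

```python
def average_length(lengths, probs):
    """L = sum of p_i * l_i over the alphabet."""
    return sum(l * p for l, p in zip(lengths, probs))

code1 = [1, 2, 3, 4, 4]    # assumed lengths for code #1
code2 = [2, 2, 2, 3, 3]    # assumed lengths for code #2

skewed  = [1/2, 1/4, 1/8, 1/16, 1/16]
uniform = [1/5] * 5

print(average_length(code1, skewed), average_length(code2, skewed))    # 1.875  2.125
print(average_length(code1, uniform), average_length(code2, uniform))  # 2.8    2.4
```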
There is a restriction on the lengths of the symbols of an instantaneously decodable code, the Kraft inequality: if the q symbols have lengths l1, l2, …, lq, then
K = Σ 2^(−li) ≤ 1
When examined closely, this inequality says there cannot be too many short symbols or else the sum will be too large.
To prove the Kraft inequality for any instantaneously uniquely decodable code we simply draw the decoding tree, which of course exists, and apply mathematical induction. If the tree has one or two leaves, as shown in Figure 10.5, then there is no doubt the inequality is true. Next, if there are more than two leaves we decompose the trees of length m (for the induction step) into two trees, and by the induction suppose the inequality applies to each branch of length m – 1 or less. By induction the inequality applies to each branch, giving K' and K" for their sums. Now when we join the two trees each length increases by 1, hence each term in the sum gets another factor of 2 in the denominator, and we have
K = K'/2 + K"/2 ≤ 1
and the theorem is proved.
The Kraft inequality also applies to non-instantaneous codes, provided they are uniquely decodable; this is McMillan’s theorem. The proof examines the nth power of the Kraft sum,
K^n = (Σ 2^(−li))^n = Σ Nk 2^(−k)
where Nk is the number of symbols of length k, and the sum starts from the minimum length of the nth extension of the symbols, which is n, and ends with the maximum length nl, where l is the maximum length of any single code symbol. But from the unique decodability it must be that Nk ≤ 2^k. The sum becomes
K^n ≤ Σ 2^k 2^(−k) = nl − n + 1
If K were greater than 1, then K^n would grow exponentially with n while nl − n + 1 grows only linearly, so we could find an n so large the inequality would be false; hence we see K ≤ 1, and McMillan’s theorem is proved.
Since we now see, as we said we would show, that instantaneous decodability costs us nothing, we will stick to instantaneously decodable codes and ignore merely uniquely decodable codes—their generality buys us nothing.
Let us take a few examples to illustrate the Kraft inequality. Can there exist a uniquely decodable code with lengths 1, 3, 3, 3? Yes, since
1/2 + 1/8 + 1/8 + 1/8 = 7/8 ≤ 1
How about lengths 1, 2, 2, 3? We have
1/2 + 1/4 + 1/4 + 1/8 = 9/8 > 1
hence no! There are too many short lengths.
Comma codes are codes where each symbol is a string of 1s followed by a 0, except the last symbol which is all 1s. As a special case, take five symbols:
0, 10, 110, 1110, 1111
We have the Kraft sum
1/2 + 1/4 + 1/8 + 1/16 + 1/16 = 1
and we have exactly met the condition. It is easy to see the general comma code meets the Kraft inequality with exact equality.
If the Kraft sum is less than 1, then there is excess signaling capacity, since another symbol could be included, or some existing one shortened, and thus the average code length would be less.
Note if the Kraft inequality is met, that does not mean the code is uniquely decodable, only there exists a code with those symbol lengths which is uniquely decodable. If you assign binary numbers in numerical order, each having the right length li in bits, then you will find a uniquely decodable code. For example, given the lengths 2, 2, 3, 3, 4, 4, 4, 4, we have for Kraft’s inequality
K = 1/4 + 1/4 + 1/8 + 1/8 + 1/16 + 1/16 + 1/16 + 1/16 = 1
hence an instantaneously decodable code can exist. We pick the symbols in increasing order of numerical size, with the binary point imagined on the left, as follows, and watch carefully the corresponding lengths li:
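Both steps, checking the Kraft sum and assigning binary numbers in numerical order with the required lengths, are mechanical; the following sketch (mine) carries them out for the lengths 2, 2, 3, 3, 4, 4, 4, 4.

```python
def kraft_sum(lengths):
    """K = sum of 2**(-l_i); K <= 1 is necessary for unique decodability."""
    return sum(2.0 ** -l for l in lengths)

def canonical_code(lengths):
    """Assign codewords in numerical order, one per length (lengths ascending)."""
    codewords, value, prev_len = [], 0, 0
    for l in sorted(lengths):
        value <<= (l - prev_len)          # shift left when the length grows
        codewords.append(format(value, f"0{l}b"))
        value += 1
        prev_len = l
    return codewords

lengths = [2, 2, 3, 3, 4, 4, 4, 4]
print(kraft_sum(lengths))        # 1.0
print(canonical_code(lengths))   # ['00', '01', '100', '101', '1100', '1101', '1110', '1111']
```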
We have learned to “tune” the words we use to fit the person on the receiving end; we to some extent select according to what we think is the channel noise, though clearly this does not match the model I am using above, since there is significant noise in the decoding process, shall we say. This inability of the receiver to “hear what is said” by a person in a higher management position but to hear only what they expect to hear is, of course, a serious problem in every large organization, and is something you should be keenly aware of as you rise towards the top of the organization. Thus the representation of information in the formal theory we have given is mirrored only partly in life as we live it, but it does show a fair degree of relevance outside the formal bounds of computer usage where it is highly applicable.
Two things should be clear from the previous chapter. First, we want the average length L of the message sent to be as small as we can make it (to save the use of facilities). Second, it must be a statistical theory, since we cannot know the messages which are to be sent, but we can know some of the statistics by using past messages plus the assumption the future will probably be like the past. For the simplest theory, which is all we can discuss here, we will need the probabilities of the individual symbols occurring in a message. How to get these is not part of the theory, but can be obtained by inspection of past experience, or imaginative guessing about the future use of the proposed system you are designing.
Thus we want an instantaneous, uniquely decodable code of minimum average length L for a given set of symbols si with given probabilities pi.
Huffman first showed the following running inequalities must be true for a minimum length code. If the pi are in descending order, then the li must be in ascending order:
p1 ≥ p2 ≥ ⋯ ≥ pq implies l1 ≤ l2 ≤ ⋯ ≤ lq
For suppose the pi are in this order but at least one pair of the li are not, say lj > lk for some j < k. Consider the effect of interchanging the symbols attached to the two which are not in order. Before the interchange the two terms contributed to the average code length L an amount
pj lj + pk lk
and after the interchange the terms would contribute
pj lk + pk lj
All the other terms in the sum L will be the same. The difference can be written as
(pj lk + pk lj) − (pj lj + pk lk) = (pj − pk)(lk − lj)
One of these two factors was assumed to be negative, hence upon interchanging the two symbols we would observe a decrease in the average code length L. Thus for a minimum length code we must have the two running inequalities.
Next Huffman observed an instantaneously decodable code has a decision tree, and every decision node should have two exits, or else it is wasted effort, hence there are two longest symbols which have the same length.
The forward Huffman process is to combine the two symbols of lowest probability into a single new symbol whose probability is their sum, place it in its proper position in the list, and repeat until only two symbols remain, which can be assigned 0 and 1. Now, in going backwards to undo the merging steps, we would need at each stage to split the symbol which arose from the combining of two symbols, keeping the same leading bits but adding to one symbol a 0 and to the other a 1. In this way he would arrive at a minimum L code; see again Figure 11.1. For if there were another code with smaller length L', then doing the forward steps, which change the average code length by a fixed amount at each stage, he would arrive finally at two symbols with an average code length of less than 1—which is impossible. Hence the Huffman encoding gives a code with minimum length. See Figure 11.2 for the corresponding decoding tree.
The code is not unique. In the first place, at each step of the backing-up process it is arbitrary which symbol is assigned the 0 and which the 1. Second, if at any stage there are two symbols of the same probability, then it is indifferent which is put above the other. This can result, sometimes, in codes that appear very different but have the same average code length.
If we put the combined terms as high as possible we get Figure 11.3 with the corresponding decoding tree Figure 11.4. The average length of the two codes is the same, but the codes and the decoding trees are different; the first is “long” and the second is “bushy,” and the second will have less variability than the first one.
We now do a second example so you will be sure how Huffman encoding works, since it is natural to want to use the shortest average code length you can when designing an encoding system. For example, you may have a lot of data to put into a backup store, and encoding it into the appropriate Huffman code has been known at times to save more than half the expected storage space! Let p(s1) = 1/3, p(s2) = 1/5, p(s3) = 1/6, p(s4) = 1/10, p(s5) = 1/12, p(s6) = 1/20, p(s7) = 1/30, and p(s8) = 1/30. First we check that the total probability is 1. The common denominator of the fractions is 60. Hence we have the total probability
20/60 + 12/60 + 10/60 + 6/60 + 5/60 + 3/60 + 2/60 + 2/60 = 60/60 = 1
This second example is illustrated in Figure 11.5, where we have dropped the 60 in the denominators of the probabilities since only the relative sizes matter. What is the average code length per symbol? With the code lengths 2, 2, 3, 3, 3, 4, 5, 5 which the Huffman process produces for these weights, we compute:
L = (20·2 + 12·2 + 10·3 + 6·3 + 5·3 + 3·4 + 2·5 + 2·5)/60 = 159/60 = 2.65 bits per symbol
Note how mechanical the process is for a machine to do. Each forward stage for a Huffman code is a repetition of the same process: combine the two lowest probabilities, place the new sum in its proper place in the array, and mark it. In the backward process, take the marked symbol and split it. These are simple programs to write for a computer, hence a computer program can find the Huffman code once it is given the si and their probabilities pi. Recall in practice you want to assign an escape symbol of very small probability so you can get out of the decoding process at the end of the message. Indeed, you can write a program which will sample the data to be stored and find estimates of the probabilities (small errors make only small changes in L), find the Huffman code, do the encoding, and send first the decoding algorithm (tree) and then the encoded data, all without human interference or thought! At the decoding end you already have received the decoding tree. Thus, once written as a library program, you can use it whenever you think it will be useful.
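To show just how mechanical it is, here is a complete Huffman coder in a few lines of Python (my own sketch, using the language’s standard heap library), run on the eight weights of the example, measured in sixtieths.

```python
import heapq
import itertools

def huffman(weights):
    """Return a codeword for each symbol by repeatedly merging the two
    lowest-weight entries and then walking back down the merge tree."""
    counter = itertools.count()            # tie-breaker so tuples always compare
    heap = [(w, next(counter), {s: ""}) for s, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)    # two lowest probabilities
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(counter), merged))
    return heap[0][2]

weights = {"s1": 20, "s2": 12, "s3": 10, "s4": 6,
           "s5": 5, "s6": 3, "s7": 2, "s8": 2}     # the example, in sixtieths
code = huffman(weights)
total = sum(weights.values())
L = sum(weights[s] * len(bits) for s, bits in code.items()) / total
print(code)
print(f"average length L = {L:.3f} bits per symbol")   # 2.650
```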
Huffman codes have even been used in some computers on the instruction part of instructions, since instructions have very different probabilities of being used. We need, therefore, to look at the gain in average code length L we can expect from Huffman encoding over simple block encoding, which uses symbols all of the same length.
If all the probabilities are the same and there are exactly 2^k symbols, then an examination of the Huffman process will show you will get a standard block code with each symbol of the same length. If you do not have exactly 2^k symbols, then some symbols will be shortened, but it is difficult to say whether many will be shortened by one bit; some may be shortened by two or more bits. In any case, the value of L will be about the same as, and not much less than, that for the corresponding block code.
Rule: Huffman coding pays off when the probabilities of the symbols are very different, and does not pay off much when they are all rather equal.
When two equal probabilities arise in the Huffman process they can be put in any order, and hence the codes may be very different, though the average code length in both cases will be the same L. It is natural to ask which order you should choose when two probabilities are equal. A sensible criterion is to minimize the variance of the code so that messages of the same length in the original symbols will have pretty much the same lengths in the encoded message; you do not want a short original message to be encoded into a very long encoded message by chance. The simple rule is to put any new probability, when inserting it into the table, as high as it can go. Indeed, if you put it above a symbol with a slightly higher probability, you usually greatly reduce the variance, and at the same time only slightly increase L; thus it is a good strategy to use.
Detection of a single error is easy. To a block of n – 1 bits we attach an nth bit, which is set so that the total n bits has an even number of 1s (an odd number if you prefer, but we will stick to an even number in the theory). It is called an even (odd) parity check, or more simply a parity check.
Thus if all the messages I send to you have this property, then at the receiving end you can check to see if the condition is met. If the parity check is not met then you know at least one error has happened; indeed, you know an odd number of errors has occurred. If the parity does check, then either the message is correct, or else there are an even number of errors. Since it is prudent to use systems where the probability of an error in any position is low, then the probability of multiple errors must be much lower.
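In code the even-parity rule is a single line: the appended bit is the sum of the message bits modulo 2. A minimal sketch (mine):

```python
def add_even_parity(bits):
    """Append a bit so the whole block has an even number of 1s."""
    return bits + [sum(bits) % 2]

def parity_ok(block):
    """True when the block contains an even number of 1s."""
    return sum(block) % 2 == 0

block = add_even_parity([1, 0, 1, 1, 0, 0, 1])
print(block, parity_ok(block))          # parity holds
block[3] ^= 1                           # flip one bit: an odd number of errors
print(block, parity_ok(block))          # parity check fails
```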
For mathematical tractability we make the assumption the channel has white noise, meaning: (1) each position in the block of n bits has the same probability p of an error as any other position, and (2) the errors in various positions are uncorrelated, meaning independent. Under these hypotheses the probabilities of errors are: P(no error) = (1 − p)^n, P(one error) = n p (1 − p)^(n−1), P(two errors) = C(n, 2) p² (1 − p)^(n−2), and so on down the binomial distribution.
When you find a single error you can ask for a retransmission and expect to get it right the second time, and if not then on the third time, etc. However, if the message in storage is wrong, then you will call for retransmissions until another error occurs, and you will probably have two errors which will pass undetected in this scheme of single error detection. Hence the use of repeated retransmission should depend on the expected nature of the error.
Such codes have been widely used, even in the relay days. The telephone company, in its central offices and in many of the early relay computers, used a 2-out-of-5 code, meaning two and only two out of the five relays were to be “up.” This code was used to represent a decimal digit, since C(5, 2) = 10. If not exactly two relays were up then it was an error, and a repeat was used. There was also a 3-out-of-7 code in use, obviously an odd parity check code.
I first met these 2-out-of-5 codes while using the Model 5 relay computer at Bell Labs, and I was impressed: not only did they help to get the right answer, but more important, in my opinion, they enabled the maintenance people to maintain the machine. Any error was caught by the machine almost in the act of its being committed, and hence pointed the maintenance people correctly rather than having them fool around with this and that part, mis-adjusting the good parts in their effort to find the failing part.
To get such a weighted sum of the symbols (actually their values), you can avoid multiplication and use only addition and subtraction if you wish. Put the numbers in order in a column and compute the running sum, then compute the running sum of the running sum modulo 37, and then complement this with respect to 37, and you have the check symbol. The following table shows an illustration using w, x, y, and z.
| Symbols | Sum | Sum of sums |
|---|---|---|
| w | w | w |
| x | w + x | 2w + x |
| y | w + x + y | 3w + 2x + y |
| z | w + x + y + z | 4w + 3x + 2y + z = weighted check sum |
At the receiving end you subtract the modulus repeatedly until you get either a 0 (correct symbol) or a negative number (wrong symbol).
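Here is a sketch of the scheme in Python. The numeric values assigned to the symbols, and the convention that the check symbol is simply added to the weighted sum at the receiving end, are my assumptions for illustration; the weights n, n − 1, …, 1 follow the running-sum-of-sums table above.

```python
# A sketch of the mod-37 weighted check sum. The mapping of symbols to the
# values 0..36 and the exact placement of the check symbol are assumptions;
# the weights (n, n-1, ..., 1) follow the running-sum-of-sums table above.
MOD = 37

def weighted_sum(values):
    running = total = 0
    for v in values:
        running += v            # running sum
        total += running        # running sum of the running sum
    return total % MOD          # equals n*v1 + (n-1)*v2 + ... + 1*vn (mod 37)

def check_symbol(values):
    """Complement of the weighted sum with respect to 37."""
    return (MOD - weighted_sum(values)) % MOD

def verify(values, check):
    """At the receiving end the weighted sum plus the check must be 0 mod 37."""
    return (weighted_sum(values) + check) % MOD == 0

message = [23, 5, 11, 30]           # assumed numeric values of four symbols
c = check_symbol(message)
print(c, verify(message, c))        # the check symbol, then True
print(verify([23, 5, 12, 30], c))   # a single corrupted symbol -> False
```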
If you were to use this encoding, for example, for inventory parts names, then the first time a wrong part name came to a computer, say at transmission time, if not before (perhaps at order preparation time), the error will be caught; you will not have to wait until the order gets to supply headquarters to be later told that there is no such part, or else they have sent the wrong part! Before it leaves your location it will be caught and hence is quite easily corrected at that time. Trivial? Yes! Effective against human errors (as contrasted with the earlier white noise)? Yes!
I have repeatedly indicated I believe the future will be increasingly concerned with information in the form of symbols and less concerned with material things, hence the theory of encoding (representing) information in convenient codes is a nontrivial topic. The above material gave a simple error-detecting code for machine-like situations, as well as a weighted code for human use. They are but two examples of what coding theory can contribute to an organization in places where machine and human errors can occur.
There are two subject matters in this chapter: the first is the ostensible topic, error-correcting codes, and the other is how the process of discovery sometimes goes—you all know I am the official discoverer of the Hamming error-correcting codes. Thus I am presumably in a position to describe how they were found. But you should beware of any reports of this kind. It is true at that time I was already very interested in the process of discovery, believing in many cases the method of discovery is more important than what is discovered. I knew enough not to think about the process when doing research, just as athletes do not think about style when they engage in sports, but they practice the style until it is more or less automatic. I had thus established the habit, after something of great or small importance was discovered, of going back and trying to trace the steps by which it apparently happened. But do not be deceived; at best I can give the conscious part, and a bit of the upper subconscious part, but we simply do not know how the unconscious works its magic.
I was using the Model 5 relay computer in New York City in preparation for delivering it to Aberdeen Proving Grounds, along with some required software programs (mainly mathematical routines). When an error was detected by the 2-out-of-5 block codes the machine would, when unattended, repeat the step up to three times before dropping it and picking up the next problem in the hope the defective equipment would not be involved in the new problem. Being at that time low man on the totem pole, as they say, I got free machine time only over the weekends—meaning from Friday at around 5:00 p.m. to Monday morning around 8:00 a.m., which is a lot of time! Thus I would load up the input tape with a large number of problems and promise my friends back at Murray Hill, New Jersey, where the research department was located, that I would deliver them the answers on Tuesday. Well, one weekend, just after we left on a Friday night, the machine failed completely and I got essentially nothing on Monday. I had to apologize to my friends and promised them the answers on the next Tuesday. Alas! The same thing happened again! I was angry, to say the least, and said, “If the machine can detect there is an error, why can it not locate where it is, and then fix it by simply changing the bit to the opposite state?” (The actual language used was perhaps a bit stronger!)
Notice first this essential step happened only because there was a great deal of emotional stress on me at the moment, and this is characteristic of most great discoveries. Working calmly will let you elaborate and extend things, but the breakthroughs generally come only after great frustration and emotional involvement. The calm, cool, uninvolved researcher seldom makes really great new steps.
Back to the story. I knew from previous discussions that of course you could build three copies of a machine, include comparing circuits, and use the majority vote—hence error-correcting machines could exist. But at what cost! Surely there were better methods. I also knew, as discussed in the last chapter, a great deal about parity checks; I had examined their fundamentals very carefully.
Another aside.
My first design had been a rectangular code: arrange the message bits in an m × n rectangle and attach a parity check to each row and to each column, so a single error is located by the intersection of the failing row and the failing column. It is obvious to anyone who ever took the calculus that the closer the rectangle is to a square, the lower the redundancy for the same amount of message. And of course big m’s and n’s would be better than small ones, but then the risk of a double error might be too great—again an engineering judgment. Note that if two errors occurred, then (1) if they were not in the same column and not in the same row, then just two failing rows and two failing columns would occur, and you could not know which diagonal pair caused them; and (2) if two were in the same row (or column), then you would have only the columns (or rows) but not the rows (columns).
We now move to some weeks later. To get to New York City I would go a bit early to the Murray Hill, New Jersey, location where I worked and get a ride on the company mail delivery car. Well, riding through north Jersey in the early morning is not a great sight, so I was, as I had the habit of doing, reviewing successes so I would have the style in hand automatically; in particular I was reviewing in my mind the rectangular codes. Suddenly, and I can give no reason for it, I realized if I took a triangle and put the parity checks along the diagonal, with each parity check checking both the row and column it was in, then I would have a more favorable redundancy, Figure 12.2.
My smugness vanished immediately! Did I have the best code this time? After a few miles of thought on the matter (remember, there were no distractions in the north Jersey scenery), I realized a cube of information bits, with parity checks across the entire planes and the parity check bit on the axes, for all three axes, would give me the three coordinates of the error at the cost of 3n – 2 parity checks for the whole n3 encoded message. Better! But was it best? No! Being a mathematician I promptly realized a four-dimensional cube (I did not have to arrange them that way, only interwire them that way) would be better. So an even higher-dimensional cube would be still better. It was soon obvious (say five miles) a 2 × 2 × 2 × … × 2 cube with n + 1 parity checks would be the best—apparently!
But having burnt my fingers once, I was not about to settle for what looked good—I had made that mistake before! Could I prove it was best? How to make a proof? One obvious approach was to try a counting argument. I had n + 1 parity checks, whose result was a string of n + 1 bits, a binary number of length n + 1 bits, and this could represent any of 2^(n+1) things. But I needed only 2^n + 1 things, the 2^n points in the cube plus the one result that the message was correct. I was off by almost a factor of two. Alas! I arrived at the door of the company and had to sign in and go to a conference, so I had to let the idea rest.
When I got back to the idea after some days of distractions (after all, I was supposed to be contributing to the team effort of the company), I finally decided a good approach would be to use the syndrome of the error as a binary number which named the place of the error, with, of course, all 0s being the correct answer (an easier test than for all 1s on most computers). Notice familiarity with the binary system, which was not common then (1947–1948), repeatedly played a prominent role in my thinking. It pays to know more than just what is needed at the moment!
How do you design this particular case of an error-correcting code? Easy! Write out the positions in the binary code:
| Position | Binary representation |
|---|---|
| 1 | 1 |
| 2 | 10 |
| 3 | 11 |
| 4 | 100 |
| 5 | 101 |
| 6 | 110 |
| 7 | 111 |
| 8 | 1000 |
| 9 | 1001 |
| … | … |
It is now obvious the parity check on the right-hand side of the syndrome must involve all positions which have a 1 in the right-hand column; the second digit from the right must involve the numbers which have a 1 in the second column, etc. Therefore you have:
| Parity check | Positions checked |
|---|---|
| Parity check #1 | 1, 3, 5, 7, 9, 11, 13, 15, … |
| Parity check #2 | 2, 3, 6, 7, 10, 11, 14, 15, … |
| Parity check #3 | 4, 5, 6, 7, 12, 13, 14, 15, … |
| Parity check #4 | 8, 9, 10, 11, 12, 13, 14, 15, … |
Thus if any error occurs in some position, those parity checks, and only those, will fail and give 1s in the syndrome, and this will produce exactly the binary representation of the position of the error. It is that simple!
To see the code in operation, suppose we confine ourselves to four message and three check positions. These numbers satisfy the condition
2^m ≥ m + k + 1, where m is the number of check positions and k the number of message positions (here 2^3 = 8 = 3 + 4 + 1),
which is clearly a necessary condition, and the equality is sufficient. We pick as the positions for the checking bits (so the setting of the parity check will be easy) the check positions 1, 2, and 4. The message positions are therefore 3, 5, 6, and 7. Let the message be 1001.
We (1) write the message on the top line, (2) encode on the next line, (3) insert an error at position 6 on the next line, and (4) on the next three lines compute the three parity checks.
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | position |
|---|---|---|---|---|---|---|---|
| | | 1 | | 0 | 0 | 1 | message |
| 0 | 0 | 1 | 1 | 0 | 0 | 1 | encoded message |
| 0 | 0 | 1 | 1 | 0 | 1 | 1 | message with error |
You apply the parity checks to the received message: check #1 (positions 1, 3, 5, 7) covers 0 + 1 + 0 + 1, an even count, so it passes and contributes 0; check #2 (positions 2, 3, 6, 7) covers 0 + 1 + 1 + 1, odd, so it fails and contributes 1; check #3 (positions 4, 5, 6, 7) covers 1 + 0 + 1 + 1, odd, so it fails and contributes 1. Written with check #3 first, the syndrome is 110.
The binary number 110 is equivalent to the decimal number 6, so change the digit in position 6. Then drop the check positions 1, 2, and 4. You have the original message, 1001.
If it seems magical, then think of the all-0 message, for which all the checks will be 0. Then think of a single digit changing and you will see that as the position of the error is moved around, the syndrome binary number will change correspondingly and will always exactly match the position of the error. Next, note the sum of any two correct messages is still a correct message (the parity checks are additive modulo 2, hence the proper messages form an additive group modulo 2). A correct message will give all 0s, and hence the sum of a correct message plus an error in one position will give the position of the error regardless of the message being sent. The parity checks concentrate on the error and ignore the message.
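The whole cycle of encoding, inserting an error, reading the syndrome, and correcting fits in a short sketch (mine), using the same check positions 1, 2, 4 and message positions 3, 5, 6, 7 as above.

```python
def encode(m3, m5, m6, m7):
    """Place message bits in positions 3,5,6,7 and set even-parity checks
    in positions 1,2,4 (position groups as in the table above)."""
    c1 = (m3 + m5 + m7) % 2          # checks positions 1,3,5,7
    c2 = (m3 + m6 + m7) % 2          # checks positions 2,3,6,7
    c4 = (m5 + m6 + m7) % 2          # checks positions 4,5,6,7
    return [c1, c2, m3, c4, m5, m6, m7]   # list indices 0..6 are positions 1..7

def syndrome(block):
    """Each failing parity check contributes its weight; the total is the
    position of a single error (0 means no error detected)."""
    b = {i + 1: bit for i, bit in enumerate(block)}
    s1 = (b[1] + b[3] + b[5] + b[7]) % 2
    s2 = (b[2] + b[3] + b[6] + b[7]) % 2
    s4 = (b[4] + b[5] + b[6] + b[7]) % 2
    return s1 * 1 + s2 * 2 + s4 * 4

block = encode(1, 0, 0, 1)
print(block)                  # [0, 0, 1, 1, 0, 0, 1] -- the encoded message
block[5] ^= 1                 # insert an error at position 6
pos = syndrome(block)
print(pos)                    # 6 -- the syndrome names the erroneous position
block[pos - 1] ^= 1           # correct it
print([block[i] for i in (2, 4, 5, 6)])   # recover the message 1, 0, 0, 1
```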
How about a double error? If we want to catch (but not be able to correct) a double error, we simply add a single new parity check over the whole message we are sending. Let us see what will then happen at your end.
| old syndrome | new overall parity check | meaning |
|---|---|---|
| 000 | 0 | right answer |
| 000 | 1 | the error is in the new check bit itself |
| xxx | 1 | a single error; the old syndrome xxx locates it |
| xxx | 0 | must be a double error: detected, but not correctable |
A single error correcting plus double error detecting code is often a good balance. Of course, the redundancy in the short message of four bits, with now four bits of check, is bad, but the number of parity bits rises roughly like the log of the message length. Too long a message and you risk a double uncorrectable error (which in a single error correcting code you will “correct” into a third error), too short a message and the cost in redundancy is too high. Again, an engineering judgment depending on the particular situation.
From analytic geometry you learned the value of using the alternate algebraic and geometric views. A natural representation of a string of bits is an n-dimensional cube, each string being a vertex of the cube. Given this picture, and finally noting that any single error in the message moves the message point along one edge, two errors along two edges, etc., I slowly realized I was to operate in the space of L1. The distance between symbols is the number of positions in which they differ. Thus we have a metric in the space, and it satisfies the four standard conditions for a distance (see Chapter 10, where it is identified as the standard L1 distance): D(x, y) ≥ 0; D(x, y) = 0 if and only if x = y; D(x, y) = D(y, x); and D(x, y) + D(y, z) ≥ D(x, z), the triangle inequality.
With a distance, we can define a sphere as all points (vertices, as that is all there is in the space of vertices) at a fixed distance from the center. For example, in the three-dimensional cube which can be easily sketched, Figure 12.3, the points (0,0,1), (0,1,0), and (1,0,0) are all unit distance from (0,0,0), while the points (1,1,0), (1,0,1), and (0,1,1) are all two units away, and finally the point (1,1,1) is three units away from the origin.
We now go to n dimensions, and draw a sphere of unit radius about each point and suppose that the spheres do not overlap. It is obvious that if the centers of these spheres are code points, and only these points, then at the receiving end any single error in a message will result in a non-code point and you can recognize where the error came from; it will be in the sphere about the point I sent to you, or equivalently in a sphere of radius 1 about the point you received. Hence we have an error-correcting code. The minimum distance between code points is 3. If we use non-overlapping spheres of radius 2, then a double error can be corrected, because the received point will be nearer to the original code point than any other point; double error correction, minimum distance of 5. The following table gives the equivalence of the minimum distance between code points and the correctability of errors.
| Min. distance | Meaning |
|---|---|
| 1 | unique decoding |
| 2 | single error detecting |
| 3 | single error correcting |
| 4 | single error correcting and double error detecting |
| 5 | double error correcting |
| 2k+1 | k error correction |
| 2k+2 | k error correction and k + 1 error detection |
Thus finding an error-correcting code is the same as finding a set of code points in the n-dimensional space which has the required minimum distance between legal messages, since the above conditions are both necessary and sufficient. It should also be clear that some error correction can be exchanged for more detection; give up one error correction and you get two more in error detection.
I earlier showed how to design codes to meet the conditions in the cases where the minimum distance is 1, 2, 3, or 4. Codes for higher minimum distances are not so easily found, and we will not go farther in that direction. It is easy to give an upper bound on how large the higher-distance codes can be. It is obvious the number of points in a sphere of radius k is 1 + C(n,1) + C(n,2) + ⋯ + C(n,k), where C(n,k) is a binomial coefficient.
Hence if we divide the volume of the whole space, 2^n, by the volume of a sphere, then the quotient is an upper bound on the number of non-overlapping spheres, hence of code points, in the corresponding space. To get an extra error detection we simply, as before, add an overall parity check, thus increasing the minimum distance from 2k + 1 to 2k + 2 (since any two points at the minimum distance will have the overall parity check set differently, thus increasing the minimum distance by 1).
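The bound is easy to compute; a short sketch (function names mine):

```python
from math import comb

def sphere_volume(n, k):
    """Number of n-bit words within Hamming distance k of a given word."""
    return sum(comb(n, i) for i in range(k + 1))

def hamming_bound(n, k):
    """Upper bound on the number of code words correcting k errors in n bits."""
    return 2**n // sphere_volume(n, k)

print(hamming_bound(7, 1))    # 16  -> the (7,4) code meets the bound exactly
print(hamming_bound(15, 1))   # 2048 -> the (15,11) code also meets it
```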
Let us summarize where we are. We see by proper code design we can build a system from unreliable parts and get a much more reliable machine, and we see just how much we must pay in equipment, though we have not examined the cost in speed of computing if we make a computer with that level of error correction built into it. But I have previously stressed the other gain, namely field maintenance, and I want to mention it again. The more elaborate the equipment is, and we are obviously going in that direction, the more vital field maintenance becomes, and error-correcting codes not only mean the equipment will (probably) give the right answers in the field, but also that it can be maintained successfully by less expert field personnel.
The use of error-detecting and error-correcting codes is rising steadily in our society. In sending messages from the space vehicles we sent to the outer planets, we often have a mere 20 watts or less of power (possibly as low as 5 watts), and have to use codes which correct hundreds of errors in a single block of message—the correction being done here on Earth, of course. When you cannot simply overpower the noise, as in the above situation, or in cases of “deliberate jamming,” such codes are the only known answer to the situation.
In the late summer of 1961 I was driving across the country from my sabbatical at Stanford, California, to Bell Telephone Laboratories in New Jersey, and I made an appointment to stop at Morris, Illinois, where the telephone company was installing the first electronic central office which was not an experimental one. I knew it used Hamming codes extensively, and I was, of course, welcomed. They told me they had never had a field installation go in so easily as this one did. I said to myself, “Of course, that is what I have been preaching for the past ten years.” Once, during initial installation, any unit was set up and running properly (and you more or less knew it was, because of the self-checking and correcting properties), you could turn your back on it to get the next part going; if the unit you were neglecting developed a flaw, it told you so! The ease of initial installation, as well as later maintenance, was being verified right before their eyes! I cannot say too loudly, error correction not only gets the right answer when running, it can by proper design also contribute significantly to field installation and field maintenance; and the more elaborate the equipment, the more essential these two things are.
I now challenge you. What I wrote in a few pages was done in the course of a total of about three to six months, mainly working at odd moments while carrying on my main duties to the company. (Patent rights delayed the publication for more than a year.) Does anyone dare to say they, in my position, could not have done it? Yes, you are just as capable as I was to have done it—if you had been there and you had prepared yourself as well!
Of course, as you go through life you do not know what you are preparing yourself for—only you want to do significant things and not spend the whole of your life being a “janitor of science,” or whatever your profession is. Of course luck plays a prominent role. But so far as I can see, life presents you with many, many opportunities for doing great things (define them as you will) and the prepared person usually hits one or more successes, and the unprepared person will miss almost every time.
The above opinion is not based on this one experience, or merely on my own experiences. It is the result of studying the lives of many great scientists. I wanted to be a scientist, hence I studied them, and I looked into discoveries which happened where I was and asked questions of those who did them. This opinion is also based on common sense. You establish in yourself the style of doing great things, and then when opportunity comes you almost automatically respond with greatness in your actions. You have trained yourself to think and act in the proper ways.
There is one nasty thing to be mentioned, however: what it takes to be great in one age is not what is required in the next one. Thus, in preparing yourself for future greatness (and the possibility of greatness is more common and easier to achieve than you think, since it is not common to recognize greatness when it happens under one’s nose), you have to think of the nature of the future you will live in. The past is a partial guide, and about the only one you have besides history is the constant use of your own imagination. Again, a random walk of random decisions will not get you anywhere near as far as those taken with your own vision of what your future should be.
I have both told and shown you how to be great; now you have no excuse for not doing so!
First, what is “information”? Shannon identified information with surprise. He chose the negative of the log of the probability of an event as the amount of information you get when the event of probability p happens. For example, if I tell you it is smoggy in Los Angeles, then p is near 1 and that is not much information, but if I tell you it is raining in Monterey in June, then that is surprising and represents more information. Because log 1 = 0, the certain event contains no information.
In more detail, Shannon believed the measure of the amount of information should be a continuous function of the probability p of the event, and for independent events it should be additive—what you learn from each independent event when added together should be the amount you learn from the combined event. As an example, the outcome of the roll of a die and the toss of a coin are generally regarded as independent events. In mathematical symbols, if I(p) is the amount of information you have for an event of probability p, then for event x of probability p1 and for the independent event y of probability p2, you will get for the event of both x and y that I(p1 p2) = I(p1) + I(p2).
This is a functional equation, true for all p1 and p2.
To solve this functional equation, suppose p1 = p2 = p. This then gives I(p^2) = 2I(p). If p1 = p^2 and p2 = p, then I(p^3) = I(p^2) + I(p) = 3I(p), etc. Extending this process you can show, via the standard method used for exponents, that for all rational numbers m/n, I(p^(m/n)) = (m/n)I(p). From the assumed continuity of the information measure, it follows that the log is the only continuous solution to this functional equation.
In information theory it is customary to take the base of the log system as 2, so a binary choice is exactly one bit of information. Hence information is measured by the formula I(p) = log2(1/p) = –log2 p.
This is a point which needs to be examined whenever any definition is offered. How far does the proposed definition, for example Shannon’s definition of information, agree with the original concepts you had, and how far does it differ? Almost no definition is exactly congruent with your earlier intuitive concept, but in the long run it is the definition which determines the meaning of the concept—hence the formalization of something via sharp definitions always produces some distortion.
Given an alphabet of q symbols with probabilities pi, the average amount of information (the expected value) in the system is H = Σ pi log(1/pi).
The proof that this entropy is largest when the symbols are equally likely rests on Gibbs’ inequality, Σ pi log(qi/pi) ≤ 0, which in turn rests on the obvious picture, Figure 13.1, that ln x ≤ x – 1, with equality only at x = 1. Applying this inequality to the sum gives the result. If there are q symbols in the signaling system, then picking the qi = 1/q we get from Gibbs’ inequality, by transposing the q terms, Σ pi ln(1/pi) ≤ ln q. This says the entropy of a probability distribution over q symbols is at most ln q, and the maximum, exactly ln q, is attained only when all the symbols have equal probability 1/q.
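A quick numerical check of the claim, using natural logs to match the ln q bound (the example probabilities are arbitrary):

```python
from math import log

def entropy(p):
    """H = sum of p_i * ln(1/p_i); terms with p_i = 0 contribute nothing."""
    return sum(pi * log(1.0 / pi) for pi in p if pi > 0)

q = 4
print(entropy([0.7, 0.1, 0.1, 0.1]))   # about 0.940, below the bound
print(entropy([1.0 / q] * q))          # ln 4, about 1.386 -- the maximum
print(log(q))                          # the bound ln q itself
```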
Now if we define the pseudo-probabilities
where of course Σ[Qi] = 1, it follows from Gibbs’ inequality that
Then after some algebra (remember that K ≤ 1, so we can drop the log term and perhaps strengthen the inequality further):
We now turn to the main theorem on the bounds for signaling systems which encode a stream of independent bits and send them symbol by symbol in the presence of noise, meaning there is a probability P > 1/2 that a bit is received correctly, and a corresponding probability Q = 1 – P that it is altered when transmitted. For convenience, assume the errors are independent and the same for each bit sent, which is called “white noise.”
We will encode a long stream of n bits into one encoded message, the nth extension of a one-bit code, where the n is to be determined as the theory progresses. We regard the message of n bits as a point in an n-dimensional space. Since we have an nth extension, for simplicity we will assume each message has the same probability of occurring, and we will assume there are M messages (M also to be determined later), hence the probability of each initial message is 1/M.
We next examine the idea of the channel capacity. Without going into details, the channel capacity is defined as the maximum amount of information which can be sent through the channel reliably, maximized over all possible encodings; hence no more information can be sent reliably than the channel capacity permits. It can be proved for the binary symmetric channel (which we are using) that the capacity C, per bit sent, is given by C = 1 + P log2 P + Q log2 Q, where, as before, P is the probability of no error in any bit sent. For the n independent bits sent we will have the channel capacity nC.
If we are to be near channel capacity, then we must send almost that amount of information for each of the symbols ai, i = 1, …, M, all of probability 1/M, and we must have log2 M close to nC when we send any one of the M equally likely messages ai. We have, therefore, M = 2^(nC), approximately.
With n bits we expect to have nQ errors. In practice we will have, for a given message of n bits sent, approximately nQ errors in the received message. For large n the relative spread (spread = width, which is the square root of the variance) of the distribution of the number of errors will be increasingly narrow as n increases.
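A small sketch, assuming the standard formula for the binary symmetric channel given above and simulating the concentration of the error count around nQ (the numbers P = 0.9 and n = 10,000 are arbitrary choices):

```python
import random
from math import log2

def capacity(P):
    """Capacity per bit of the binary symmetric channel, C = 1 + P log P + Q log Q."""
    Q = 1.0 - P
    return 1.0 + P * log2(P) + Q * log2(Q)

P = 0.9
print(capacity(P))              # about 0.531 bits per bit sent

# With n bits we expect about nQ errors, and the count stays close to nQ.
random.seed(1)
n, Q = 10_000, 1.0 - P
errors = sum(random.random() < Q for _ in range(n))
print(errors, n * Q)            # the simulated count is near nQ = 1000
```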
From the sender’s point of view I take the message ai to be sent and draw a sphere about it of radius n(Q + e2), which is slightly larger, by ne2, than the expected number of errors, nQ, Figure 13.2. If n is large enough, then there is an arbitrarily small probability that the received message point bj falls outside this sphere. Sketching the situation as seen by me, the sender: along any radius from the chosen signal ai to the received message bj, the distribution of the number of errors is (almost) a normal distribution peaking at nQ, and for any given e2 there is an n so large that the probability of the received point bj falling outside my sphere is as small as you please.
Now, looking at it from your end, Figure 13.3, as the receiver, there is a sphere S(r) of the same radius r about the received point, bj, in the space, such that if the received message, bj, is inside my sphere, then the original message ai sent by me is inside your sphere.
How can an error arise? An error can occur according to the following table.
| Case | ai in S(r) | Another code point in S(r) | Meaning |
|---|---|---|---|
| 1 | yes | yes | error |
| 2 | yes | no | no error |
| 3 | no | yes | error |
| 4 | no | no | error |
Here we see that if there is at least one other original message point in the sphere about your received point then it is an error, since you cannot decide which one it is. The sent message is correct only if the sent point is in the sphere and there is no other code point in it.
We have, therefore, the mathematical equation for a probability PE of an error, if the message sent is ai:
We can drop the first factor in the second term by setting it equal to 1, thus making an inequality:
But using the obvious fact
hence
applied repeatedly to the last term on the right gives
By making n large enough the first term can be made as small as we please, say less than some number d. We have, therefore:
The decisive step is that Shannon averaged over all possible code books to find the average error! We will use the symbol Av[·] to mean average over the set of all possible random code books. Averaging over the constant d of course gives the constant, and we have, since for the average each term is the same as any other term in the sum
which can be increased (M – 1 goes to M):
For any particular message, when we average over all code books, the encoding runs through all possible values, hence the average probability that a point is in the sphere is the ratio of the volume of the sphere to the total volume of the space. The volume of the sphere is
where s = Q + e2 < 1/2, and ns is supposed to be an integer.
Note how the entropy H(s) has appeared in a binomial identity.
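A rough numerical illustration of that identity; the parameters n = 200 and s = 0.3 are arbitrary, and the code only checks that the exponent of the sphere volume is close to, and never above, nH(s):

```python
from math import comb, log2

def H(s):
    """Binary entropy function in bits."""
    return -s * log2(s) - (1 - s) * log2(1 - s)

n, s = 200, 0.3
volume = sum(comb(n, i) for i in range(int(n * s) + 1))   # points within radius ns
print(log2(volume) / n, H(s))   # roughly 0.87 vs 0.88; the first never exceeds the second
```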
We have now to assemble the parts; note the Taylor series expansion of H(s) = H(Q + e2) gives a bound when you take only the first derivative term and neglect all others, to get the final expression
where
All we have to do now is pick an e2 so e3 < e1, and the last term will get as small as you please with sufficiently large n. Hence the average value of PE can be made as small as you please while still being as close to channel capacity C as you please.
Let us critique the result. Again and again we said, “For sufficiently large n.” How large is this n? Very, very large indeed if you want to be both close to channel capacity and reasonably sure you are right! So large, in fact, you would probably have to wait a very long time to accumulate a message of that many bits before encoding it, let alone the size of the random code books (which, being random, cannot be represented in a significantly shorter form than the complete listing of all Mn bits, both M and n being very large).
Error-correcting codes escape this waiting for a very long message and then encoding it via a very large encoding book, along with the corresponding large decoding book, because they avoid code books and adopt regular (computable) methods. In the simple theory they tend to lose the ability to come very near to the channel capacity and still keep an arbitrarily low error rate, but when a large number of errors are corrected by the code they can do well. Put into other words, if you provide a capacity for some level of error correction, then for efficiency you must use this ability most of the time or else you are wasting capacity, and this implies a high number of errors corrected in each message sent.
But the theorem is not useless! It does show, insofar as it is relevant, efficient encoding schemes must have very elaborate encodings of very long strings of bits of information. We see this accomplished in the satellites which passed the outer planets; they corrected more and more errors per block as they got farther and farther from both the Earth and the Sun (which for some satellites supplied the solar power of about 5 watts at most; others used atomic power sources of about the same power). They had to use high error-correcting codes to be effective, given the low power of the source, their small dish size, the limited size of the receiving dishes on Earth as seen from their position in space, and the enormous distances the signal had to travel.
Information theory does not tell you much about how to design, but it does point the way towards efficient designs. It is a valuable tool for engineering communication systems between machinelike things, but as noted before it is not really relevant to human communication of information. The extent to which biological inheritance, for example, is machine-like, and hence you can apply information theory to the genes, and to what extent it is not, and hence the application is irrelevant, is simply not known at present. So we have to try, and the success will show the machine-like character, while the failure will point towards other aspects of information which are important.
We now abstract what we have learned. We have seen that all initial definitions should, to a greater or lesser extent, capture the essence of our prior beliefs, but they always have some degree of distortion and hence some non-applicability to things as we thought they were. It is traditional to accept, in the long run, that the definition we use actually defines the thing defined; but of course it only tells us how to handle things, and in no way actually supplies any meaning. The postulational approach, so strongly favored in mathematical circles, leaves much to be desired in practice.
Thus one purpose of this presentation of information theory, besides its usefulness, is to sensitize you to this danger, or, if you prefer, to how to use it to get what you want! It has long been recognized the initial definitions determine what you find, much more than most people care to believe. The initial definitions need your careful attention in any new situation, and they are worth reviewing in fields in which you have long worked so you can understand the extent to which the results are a tautology and not real results at all.
Now that we have examined computers and how they represent information, let us turn to how computers process information. We can, of course, examine only a very few of the things they do, and will concentrate on the basics, as usual.
Much of what computers process consists of signals from various sources, and we have already discussed why these are often in the form of a stream of numbers from an equally spaced sampling system. Linear processing, which is the only kind I have time for in this book, implies digital filters. To illustrate
What to do? In the first place I thought I knew very little about digital filters, and, furthermore, I was not really interested in them. But does one wisely ignore one’s vp, plus the cogency of one’s own observations? No! The implied social waste was too high for me to contemplate comfortably.
As time went on I was getting a good education from him, and I got my first part of the book going, but he was still writing nothing. So one day I said, “If you don’t write more we will end up calling it Hamming and Kaiser”—and he agreed. Still later when I had about completed all the writing and he had still written nothing, I said I could thank him in the preface, but it should be called Hamming, and he agreed—and we are still good friends! That is how the book on digital filters I wrote came to be, and I saw it ultimately through three editions, always with good advice from Kaiser.
The book also took me many places which were interesting, since I gave short, one-week courses on it for many years. The short courses began while I was still writing it because I needed feedback and had suggested to ucla Extension Division that I give it as a short course, to which they agreed. That led to years of giving it at ucla, once each in Paris, London, and Cambridge, England, as well as many other places in the U.S. and at least twice in Canada. Doing what needed to be done, though I did not want to do it, paid off handsomely in the long run.
Being a mathematician I knew, as all of you do, that any complete set of functions will do about as well as any other set at representing arbitrary functions. Why, then, the exclusive use of the Fourier series? I asked various electrical engineers and got no satisfactory answers. One engineer said alternating currents were sinusoidal, hence we used sinusoids, to which I replied it made no sense to me. So much for the usual residual education of the typical electrical engineer after they have left school!
So I had to think of basics, just as I told you I had done when using an error-detecting computer. What is really going on? I suppose many of you know what we want is a time-invariant representation of signals, since there is usually no natural origin of time. Hence we are led to the trigonometric functions (the eigenfunctions of translation), in the form of both Fourier series and Fourier integrals, as the tool for representing things.
Second, linear systems, which is what we want at this stage, also have the same eigenfunctions—the complex exponentials which are equivalent to the real trigonometric functions. Hence a simple rule: if you have either a time-invariant system or a linear system, then you should use the complex exponentials.
Thus there are three good reasons for the Fourier functions: (1) time invariance, (2) linearity, and (3) the reconstruction of the original function from the equally spaced samples is simple and easy to understand.
Therefore we are going to analyze the signals in terms of the Fourier functions, and I need not discuss with electrical engineers why we usually use the complex exponentials as the frequencies instead of the real trigonometric functions. We have a linear operation, and when we put a signal (a stream of numbers) into the filter, then out comes another stream of numbers. It is natural, if not from your linear algebra course then from other things such as a course in differential equations, to ask what functions go in and come out exactly the same except for scale. Well, as noted above, they are the complex exponentials; they are the eigenfunctions of linear, time-invariant, equally spaced sampled systems.
We begin our discussion with, “What is a signal?” Nature supplies many signals which are continuous, and which we therefore sample at equal spacing and further digitize (quantize). Usually the signals are a function of time, but any experiment in a lab which uses equally spaced voltages, for example, and records the corresponding responses is also a digital signal. A digital signal is, therefore, an equally spaced sequence of measurements in the form of numbers, and we get out of the digital filter another equally spaced set of numbers. One can, and at times must, process non-equally spaced data, but I shall ignore them here.
The quantization of the signal into one of several levels of output often has surprisingly small effect. You have all seen pictures quantized to two, four, eight, and more levels, and even the two-level picture is usually recognizable. I will ignore quantization here as it is usually a small effect, though at times it is very important.
where a is the positive remainder after removing the integer number of rotations, k (we always use rotations in discussing results, and use radians while applying the calculus, just as we use base-10 logs and base-e logs), and n is the step number. If a > 1/2, then we can write the above as
The aliased band, therefore, is less than half a rotation, plus or minus. If we use the two real trigonometric functions, sine and cosine, we have a pair of eigenfunctions for each frequency, and the band is from 0 to 1/2 rotation, but when we use the complex exponential notation then we have one eigenfunction for each frequency, but now the band reaches from –1/2 to 1/2 rotations. This avoidance of the multiple eigenvalues is part of the reason the complex frequencies are so much easier to handle than are the real sine and cosine functions. The maximum sampling rate for which aliasing does not occur is two samples in the cycle, and is called the Nyquist rate. From the samples the original signal cannot be determined to within the aliased frequencies; only the basic frequencies that fall in the fundamental interval of unaliased frequencies (–1/2 to 1/2) can be determined uniquely. The signals from the various aliased frequencies go to a single frequency in the band and are algebraically added; that is what we see once the sampling has been done. Hence addition or cancellation may occur during the aliasing, and we cannot know from the aliased signal what we originally had. At the maximum sampling rate one cannot tell the result from 1, hence the unaliased frequencies must be within the band.
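A two-line check of aliasing itself, with one sample per unit time (the particular frequencies are arbitrary):

```python
import numpy as np

# A frequency of 0.9 rotations per sample is indistinguishable from -0.1:
# the samples are identical, so 0.9 has aliased into the band (-1/2, 1/2).
n = np.arange(8)
high = np.cos(2 * np.pi * 0.9 * n)
low = np.cos(2 * np.pi * (-0.1) * n)
print(np.allclose(high, low))   # True
```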
We shall stretch (compress) time so we can take the sampling rate to be one per unit time, because this makes things much easier and brings experiences from the milli- and microsecond range to those which may take days or even years between samples. It is always wise to adopt a standard notation and framework of thinking of diverse things—one field of application may suggest things to do in the other. I have found it of great value to do so whenever possible—remove the extraneous scale factors and get to the basic expressions. (But then I was originally trained as a mathematician.)
Aliasing is the fundamental effect of sampling and has nothing to do with how the signals are processed. I have found it convenient to think that once the samples have been taken, then all the frequencies are in the Nyquist band, and hence we do not need to draw periodic extensions of anything, since the other frequencies no longer exist in the signal—once the sampling has occurred, the higher frequencies have been aliased into the lower band, and do not exist up there anymore. A significant savings in thinking! The act of sampling produces the aliased signal we must use.
I now turn to three stories which use only the ideas of sampling and aliasing. In the first story I was trying to compute the numerical solution to a system of 28 ordinary differential equations and I had to know the sampling rate to use (the step size of the solution is the sampling rate you are using), since if it were half as large as expected then the computing bill would be about twice as much. For the most popular and practical methods of numerical solution the mathematical theory bases the step size on the fifth derivative. Who could know the bound? No one! But viewed as sampling, then the aliasing begins at two samples for the highest frequency present, provided you have data from minus to plus infinity. Having only a short range of at most five points of data, I intuitively figured I would need about twice the rate, or four samples per cycle. And finally, having only data on one side, perhaps another factor of two; in all eight samples per cycle.
I next did two things: (1) developed the theory, and (2) ran numerical tests on the simple differential equation
They both showed at around seven samples per cycle you are on the edge of accuracy (per step) and at ten you are very safe. So I explained the situation to them and asked them for the highest frequencies in the expected solution. They saw the justice of my request, and after some days they said I had to worry about the frequencies up to 10 cycles per second, and they would worry about those above. They were right, and the answers were satisfactory. The sampling theorem in action!
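The text does not reproduce the test equation or the integration method, so the following is only a sketch in the same spirit: the classical Runge-Kutta method applied to y'' + y = 0 over one full cycle, showing how the error falls as the samples per cycle increase.

```python
import numpy as np

def rk4_cycle(samples_per_cycle):
    """Integrate y'' + y = 0 over one period and return the error in y at the end."""
    h = 2 * np.pi / samples_per_cycle
    y = np.array([1.0, 0.0])                       # y(0) = 1, y'(0) = 0; exact y = cos t
    f = lambda y: np.array([y[1], -y[0]])
    for _ in range(samples_per_cycle):
        k1 = f(y); k2 = f(y + h/2 * k1); k3 = f(y + h/2 * k2); k4 = f(y + h * k3)
        y = y + h / 6 * (k1 + 2*k2 + 2*k3 + k4)
    return abs(y[0] - 1.0)

for spc in (4, 7, 10, 20):
    print(spc, rk4_cycle(spc))      # the error drops sharply as samples per cycle grow
```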
Sampling is fundamental to the way we currently process data on digital computers. And now that we understand what a signal is, and what sampling does to a signal, we can safely turn to more of the details of processing signals.
We will first discuss nonrecursive filters, whose purpose is to pass some frequencies and stop others. The problem first arose in the telephone company when they had the idea that if one voice message had all its frequencies moved up (modulated) to beyond the range of another, then the two signals could be added and sent over the same wires, and at the other end filtered out and separated, and the higher one reduced (demodulated) back to its original frequencies. This shifting is simply multiplying by a sinusoidal function, and selecting one band (single-sideband modulation) of the two frequencies which emerge according to the following trigonometric identity (this time we use real functions): 2 cos A cos B = cos(A + B) + cos(A – B).
There is nothing mysterious about the frequency shifting (modulation) of a signal; it is at most a variant of a trigonometric identity.
The nonrecursive filters we will consider first are mainly of the smoothing type, where the input is the values u(t) = u(n) = un and the output is yn = Σ cj u(n + j), the sum running over j = –N, …, N,
with cj = c–j (the coefficients are symmetric about the middle value c0).
I need to remind you about least squares as it plays a fundamental role in what we are going to do, hence I will design a smoothing filter to show you how filters can arise. Suppose we have a signal with “noise” added and want to smooth it, remove the noise. We will assume it seems reasonable to fit a straight line to five consecutive points of the data in a least-squares sense, and then take the middle value on the line as the “smoothed value of the function” at that point.
For mathematical convenience we pick the five points at t = –2, –1, 0, 1, 2 and fit the straight line, Figure 14.1, u(t) = a + bt.
Least squares says we should minimize the sum of the squares of the differences between the data and the points on the line, that is, minimize Σ [u(k) – a – bk]^2, the sum running over k = –2, …, 2.
What are the parameters to use in the differentiation to find the minimum? They are the a and the b, not the t (now the discrete variable k), and not the u. The line depends on the parameters a and b, and this is often a stumbling block for the student; the parameters of the equation are the variables for the minimization! Hence on differentiating with respect to a and b, and equating the derivatives to zero to get the minimum, we have Σ u(k) = a Σ 1 + b Σ k and Σ k u(k) = a Σ k + b Σ k^2.
In this case we need only a, the value of the line at the midpoint, hence using the sums (some of which are for later use) Σ 1 = 5, Σ k = 0, Σ k^2 = 10, Σ k^3 = 0, Σ k^4 = 34,
from the top equation we have a = (1/5) Σ u(k) = [u(–2) + u(–1) + u(0) + u(1) + u(2)]/5,
which is simply the average of the five adjacent values. When you think about how to carry out the computation for a, the smoothed value, think of the data in a vertical column, Figure 14.2, with the coefficients each 1/5, as a running weighting of the data; then you can think of it as a window through which you look at the data, with the “shape” of the window being the coefficients of the filter, this case of smoothing being uniform in size.
Had we used 2k + 1 symmetrically placed points, we would still have obtained a running average of the data points as the smoothed value which is supposed to eliminate the noise.
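A one-line confirmation with NumPy that the least-squares line through five points, read at the middle point, is just their average (the random data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(-2, 3)                 # -2, -1, 0, 1, 2
u = rng.normal(size=5)
b, a = np.polyfit(t, u, 1)           # degree-1 fit returns slope, intercept
print(a, u.mean())                   # the value at t = 0 equals the mean of the data
```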
Suppose instead of a straight line we had smoothed by fitting a quadratic, Figure 14.3: u(t) = a + bt + ct^2.
Setting up the sum of the squares of the differences and differentiating this time with respect to a, b, and c, we get Σ u(k) = a Σ 1 + b Σ k + c Σ k^2, Σ k u(k) = a Σ k + b Σ k^2 + c Σ k^3, and Σ k^2 u(k) = a Σ k^2 + b Σ k^3 + c Σ k^4.
Again we need only a. Rewriting the first and third equations (the middle one does not involve a) and inserting the known sums from above, we have 5a + 10c = Σ u(k) and 10a + 34c = Σ k^2 u(k).
To eliminate c, which we do not need, we multiply the top equation by 17 and the lower equation by –5, and add to get 35a = 17 Σ u(k) – 5 Σ k^2 u(k), that is, a = [–3u(–2) + 12u(–1) + 17u(0) + 12u(1) – 3u(2)]/35,
and this time our “smoothing window” does not have uniform coefficients, but has some with negative values. Do not let that worry you, as we were speaking of a window in a metaphorical way and hence negative transmission is possible.
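The same check for the parabolic case, recovering the weights by fitting a quadratic to each unit vector (a small trick of my own, not from the text):

```python
import numpy as np

# The smoothed value at t = 0 is a linear combination of the five data values;
# feeding in the unit vectors one at a time reads off the weights.
t = np.arange(-2, 3)
weights = []
for j in range(5):
    u = np.zeros(5); u[j] = 1.0
    c2, c1, c0 = np.polyfit(t, u, 2)      # value of the quadratic fit at t = 0 is c0
    weights.append(c0)
print(np.round(np.array(weights) * 35))   # [-3. 12. 17. 12. -3.]
```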
If we now shift these two least-squares-derived smoothing formulas to their proper places about the point n, we would have yn = [u(n–2) + u(n–1) + u(n) + u(n+1) + u(n+2)]/5 and yn = [–3u(n–2) + 12u(n–1) + 17u(n) + 12u(n+1) – 3u(n+2)]/35.
Hence for the straight-line smoothing the eigenvalue at the frequency ω (the transfer function) is, by elementary trigonometry, H(ω) = (1 + 2 cos ω + 2 cos 2ω)/5.
In the parabolic smoothing case we will get H(ω) = (17 + 24 cos ω – 6 cos 2ω)/35.
These are easily sketched along with the 2k + 1 smoothing by straight line curves, Figures 14.4 and 14.5.
Smoothing formulas have central (even) symmetry in their coefficients, while differentiating formulas have odd symmetry. From the obvious formula c(k) = [c(k) + c(–k)]/2 + [c(k) – c(–k)]/2
we see any formula is the sum of an odd and an even function, hence any nonrecursive digital filter is the sum of a smoothing filter and a differentiating filter. When we have mastered these two special cases we have the general case in hand.
For smoothing formulas we see the eigenvalue curve (the transfer function) is a Fourier expansion in cosines, while for the differentiation formula it will be an expansion in sines. Thus we are led, given a transfer function you want to achieve, to the problem of Fourier expansions of a given function.
Now to a brief recapitulation of Fourier series. If we assume that the arbitrary function f(t) is represented by f(t) = a0/2 + Σ (ak cos kt + bk sin kt),
we use the orthogonality conditions (they can be found by elementary trigonometry and simple integrations)
to get ak = (1/π) ∫ f(t) cos kt dt and bk = (1/π) ∫ f(t) sin kt dt, the integrals taken over one full period of length 2π,
and because we used an a0/2 for the first coefficient, the same formula for ak holds for the case k = 0. In the complex notation it is, of course, much simpler.
Next we need to prove the fit of any orthogonal set of functions gives the least-squares fit. Let the set of orthogonal functions be {fk(t)} with weight function w(t) ≥ 0. Orthogonality means ∫ w(t) fj(t) fk(t) dt = 0 whenever j ≠ k.
As above, the formal expansion f(t) = Σ ck fk(t) will give the coefficients ck = (1/λk) ∫ w(t) f(t) fk(t) dt,
where λk = ∫ w(t) fk^2(t) dt,
when the functions are real; in the case of complex functions we multiply through by the complex conjugate function.
Now consider the least-squares fit of a complete set of orthogonal functions using the (capitalized) coefficients Ck. We have the integral ∫ w(t) [f(t) – Σ Ck fk(t)]^2 dt
to minimize. Differentiate with respect to Cm. You get –2 ∫ w(t) fm(t) [f(t) – Σ Ck fk(t)] dt = 0,
and we see from a rearrangement, using the orthogonality, that Cm λm = ∫ w(t) f(t) fm(t) dt, hence Ck = ck. Hence all orthogonal-function fits are least-squares fits, regardless of the set of orthogonal functions used.
If we keep track of the inequality we find we will have, in the general case, Bessel’s inequality, Σ λk ck^2 ≤ ∫ w(t) f^2(t) dt,
no matter how many coefficients are taken in the sum, and this provides a running test for when you have taken enough terms in a finite approximation. In practice this has proven to be a very useful guide to how many terms to take in a Fourier expansion.
When digital filters first arose they were viewed merely as a variant of the classical analog filters; people did not see them as essentially new and different. This is exactly the same mistake which was made endlessly by people in the early days of computers. I was told repeatedly, until I was sick of hearing it, that computers were nothing more than large, fast desk calculators. “Anything you can do by a machine you can do by hand,” so they said. This simply ignores the speed, accuracy, reliability, and lower costs of the machines vs. humans. Typically a single order of magnitude change (a factor of ten) produces fundamentally new effects, and computers are many, many times faster than hand computations. Those who claimed there was no essential difference never made any significant contributions to the development of computers. Those who did make significant contributions viewed computers as something new, to be studied on their own merits, and not as merely more of the same old desk calculators, perhaps souped up a bit.
This is a common, endlessly made mistake; people always want to think that something new is just like the past—they like to be comfortable in their minds as well as their bodies—and hence they prevent themselves from making any significant contribution to the new field being created under their noses. Not everything which is said to be new really is new, and it is hard to decide in some cases when something is new, yet the all too common reaction of “it’s nothing new” is stupid. When something is claimed to be new, do not be too hasty to think it is just the past slightly improved—it may be a great opportunity for you to do significant things. But then again, it may be nothing new.
The earliest digital filter I used, in the early days of primitive computers, was one which smoothed first by threes and then by fives. Looking at the formula for smoothing, the smoothing by threes has the transfer function sin(3ω/2)/(3 sin(ω/2)),
which is easy to draw. The smoothing by fives is the same, except that the 3/2 becomes a 5/2 (and the 3 a 5), and is again easy to draw, Figure 15.1. One filter followed by the other is obviously their product (each multiplies the input eigenfunction by the transfer function at that frequency), and you see there will be three zeros in the interval, and the terminal value will be 1/15 in magnitude. An examination will show the upper half of the frequencies were fairly well removed by this very simple program for computing a running sum of three numbers, followed by a running sum of five—as is common in computing practice, the divisors were left to the very end, where they were allowed for by one multiplication by 1/15.
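A sketch of the two transfer functions and their product, assuming the forms just given; the sample frequencies are arbitrary:

```python
import numpy as np

f = np.linspace(1e-6, 0.5, 6)                        # rotations per sample
H3 = np.sin(3 * np.pi * f) / (3 * np.sin(np.pi * f)) # running sum of three, divided by 3
H5 = np.sin(5 * np.pi * f) / (5 * np.sin(np.pi * f)) # running sum of five, divided by 5
for fi, h in zip(f, H3 * H5):
    print(round(fi, 2), round(h, 4))
# The product has zeros at f = 1/5, 1/3, 2/5, and the value at f = 1/2 is -1/15.
```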
Now you may wonder how, in all its detail, a digital filter removes frequencies from a stream of numbers—and even students who have had courses in digital filters may not be at all clear how the miracle happens. Hence I propose, before going further, to design a very simple digital filter and show you the inner working on actual numbers.
I propose to design a simple filter with just two coefficients, and hence I can meet exactly two conditions on the transfer function. When doing theory we use the angular frequency ω, but in practice we use rotations f, and the relationship is ω = 2πf.
Let the first condition on the digital filter be that at f = 1/6 the transfer function is exactly 1 (this frequency is to get through the filter unaltered), and the second condition is that at f = 1/3 it is to be zero (this frequency is to be stopped completely). My simple filter has the form, with the two coefficients a and b, yn = a u(n – 1) + b u(n) + a u(n + 1).
Substituting in the eigenfunction e^(2πifn) we will get the transfer function, and using n = 0 for convenience, the two conditions become b + 2a cos(2π/6) = b + a = 1 and b + 2a cos(2π/3) = b – a = 0.
The solution is a = b = 1/2,
and the smoothing filter is simply yn = [u(n – 1) + u(n) + u(n + 1)]/2.
In words, the output of the filter is the sum of three consecutive inputs divided by two, and the output is opposite the middle input value. (It is the earlier smoothing by threes, except for the coefficient 1/2.)
Now to produce some sample data for the input to the filter. At the frequency f = 1/6 we use a cosine at that frequency, taking the values of the cosine at the equally spaced points n = 0, 1, …, while in the second column of data we use the second frequency f = 1/3, and finally the third column is the sum of the two other columns and is a signal composed of the two frequencies in equal amounts.
| n | f = 1/6 | f = 1/3 | Sum |
|---|---|---|---|
| 0 | 1 | 1 | 2 |
| 1 | 1/2 | –1/2 | 0 |
| 2 | –1/2 | –1/2 | –1 |
| 3 | –1 | 1 | 0 |
| 4 | –1/2 | –1/2 | –1 |
| 5 | 1/2 | –1/2 | 0 |
| 6 | 1 | 1 | 2 |
| 7 | 1/2 | –1/2 | 0 |
| 8 | –1/2 | –1/2 | –1 |
| … | … | … | … |
Let us run the data through the filter. We compute, according to the filter formula, the sum of three consecutive numbers in a column and then divide their sum by two. Doing this on the first column you will see that each time the filter is shifted down one line it reproduces the input function (with a multiplier of 1). Try the filter on the second column and you will find every output is exactly 0, the input function multiplied by its eigenvalue 0. The third column, which is the sum of the first two columns, should pass the first and stop the second frequency, and you get out exactly the first column. You can try the 0 frequency input and you should get exactly 3/2 for every value; if you try f = 1/4 you should get the input multiplied by 1/2 (the value of the transfer function at f = 1/4).
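The same experiment in a few lines of Python, generating the three columns and pushing them through the filter (the helper name filt is mine):

```python
import numpy as np

def filt(u):
    """Output opposite the middle input: (u[n-1] + u[n] + u[n+1]) / 2."""
    return [(u[n-1] + u[n] + u[n+1]) / 2 for n in range(1, len(u) - 1)]

n = np.arange(9)
col1 = np.cos(2 * np.pi * n / 6)          # f = 1/6
col2 = np.cos(2 * np.pi * n / 3)          # f = 1/3
both = col1 + col2

print(np.round(filt(col1), 3))            # reproduces column 1 (eigenvalue 1)
print(np.round(filt(col2), 3))            # all zeros (eigenvalue 0)
print(np.round(filt(both), 3))            # column 1 again: f = 1/3 is stopped
```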
If we simply expand the desired transfer function into a sum of cosines and then truncate it, we will get a least-squares approximation to the transfer function. But at a discontinuity the least-squares fit is not what you probably think it is.
Another story. Michelson, of Michelson-Morley fame, built an analog machine to find the coefficients of a Fourier series out to 75 terms. The machine could also, because of the duality of the function and the coefficients, go from the coefficients back to the function. When Michelson did this he observed an overshoot and asked the local mathematicians why it happened. They all said it was his equipment—and yet he was well known as a very careful experimenter. Only Gibbs, of Yale, listened and looked into the matter.
The simplest direct approach is to expand a standard discontinuity, say the function
I remarked it was rediscovered. Yes. In the 1850s Cauchy’s textbooks (1) stated that a convergent series of continuous functions converged to a continuous function and (2) demonstrated the Fourier expansion of a discontinuous function. These flatly contradicted each other. Some people looked into the matter and found they needed the concept of uniform convergence. Yes, the overshoot of the Gibbs phenomenon occurs for any series of continuous functions, not just to the Fourier series, and was known to some people, but it had not diffused into common usage. For the general set of orthogonal functions, the amount of overshoot depends upon where in the interval the discontinuity occurs. This differs from the Fourier functions, where the amount of the overshoot is independent of where the discontinuity occurs.
We need to remind you of another feature of the Fourier series. If the function exists (for practical purposes), then the coefficients fall off like 1/n. If the function is continuous, Figure 15.3 (the two extreme end values must be the same), and the derivative exists, then the coefficients fall off like 1/n^2; if the first derivative is continuous and the second derivative exists, then they fall off like 1/n^3; if the second derivative is continuous and the third derivative exists, then 1/n^4, etc. Thus the rate of convergence is directly observable from the function along the real line—which is not true for the Taylor series, whose convergence is controlled by singularities which may lie in the complex plane.
Now we return to our design of a smoothing digital filter using the Fourier series to get the leading terms. We see the least-squares fit has trouble at any discontinuity—there is a nasty overshoot in the transfer function for any finite number of terms, no matter how far out we go.
To remove this overshoot we first examine Lanczos’s window, also called a “box car” or a “rectangular” window. Lanczos reasoned that if he averaged the output function over an interval of the length of a period of the highest frequency present, then this averaging would greatly reduce the ripples. To see this in detail we take the Fourier series expansion truncated at the Nth harmonic, and integrate about a point t in a symmetric interval of length 1/N of the whole interval. Set up the integral for the averaging,
We now integrate to get
Substituting the limits and then using some trigonometry and algebra yields the averaged series.
Thus you come out with the original coefficients multiplied by the so-called sigma factors, σk = sin(kπ/N)/(kπ/N).
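A small illustration of the sigma factors in action; the square-wave example is my own choice, not from the text:

```python
import numpy as np

N = 20
t = np.linspace(0.01, np.pi - 0.01, 2000)
k = np.arange(1, N + 1, 2)                          # odd harmonics of a square wave
terms = (4 / np.pi) * np.sin(np.outer(t, k)) / k    # truncated Fourier series terms
sigma = np.sinc(k / N)                              # sin(k*pi/N) / (k*pi/N)
print(np.max(terms.sum(axis=1)))                    # about 1.18 -- the Gibbs overshoot
print(np.max((terms * sigma).sum(axis=1)))          # about 1.01 -- the ripple nearly gone
```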
But back to my adventures in the matter. I knew, as you do, that at the discontinuity the truncated Fourier expansion takes on the mid-value of the two limits, one from each side. Thinking about the finite, discrete case, I reasoned that instead of taking all 1 values in the pass band and 0 values in the stop band, I should take 1/2 at the transition value. Lo and behold, the transfer function becomes
and now has an extra factor (back in the rotational notation)
and another function be
The sum and difference of g(x) and h(x) are clearly the corresponding series with the sum or difference of the coefficients.
The product is another matter. Evidently we will have again a sum of exponentials, and setting n = k + m we will have the coefficients as indicated: if g(x) has the coefficients ck and h(x) the coefficients dm, then g(x)h(x) = Σn [Σk ck d(n – k)] e^(inx).
The coefficient of e^(inx), which is a sum of terms, is called the convolution of the original arrays of coefficients.
In the case where there are only a few nonzero coefficients in the ck coefficient array, for example, say symmetrically placed about 0, we will have for the coefficient
and this we recognize as the original definition of a digital filter! Thus a filter is the convolution of one array by another, which in turn is merely the multiplication of the corresponding functions! Multiplication on one side is convolution on the other side of the equation.
As an example of the use of this observation, suppose, as often occurs, there is potentially an infinite array of data, but we can record only a finite number of them (for example, turning on or off a telescope while looking at the stars). This function un is being looked at through the rectangular window of all 0s outside a range of (2N + 1) 1s—the value 1 where we observe, and the value 0 where we do not observe. When we try to compute the Fourier expansion of the original array from the observed data, we must convolve the original coefficients by the coefficients of the window array:
Generally we want a window of unit area, so we need, finally, to divide by (2N + 1). The array is a geometric progression with the starting value e^(–iNx) and constant ratio e^(ix), and it sums to sin[(2N + 1)x/2] / [(2N + 1) sin(x/2)].
At x = 0 this takes on the value 1, and otherwise oscillates rapidly due to the sine function in the numerator, and decays slowly due to the increase of the sine in the denominator (the range in x is from –π to π). Thus we have the typical diffraction pattern of optics.
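A quick check that the direct sum of exponentials and the closed form just described agree (N = 10 and the sample points are arbitrary):

```python
import numpy as np

N = 10
x = np.linspace(-3, 3, 8)                          # sample points, avoiding x = 0
k = np.arange(-N, N + 1)
direct = np.exp(1j * np.outer(x, k)).sum(axis=1).real / (2 * N + 1)
closed = np.sin((2 * N + 1) * x / 2) / ((2 * N + 1) * np.sin(x / 2))
print(np.allclose(direct, closed))                 # True
```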
In the continuous case, before sampling, the situation is much the same, but the rectangular window we look through has a transform of the general form (ignoring all details) (sin x)/x.
Some rather difficult trigonometric manipulation will directly convince you that whether we sample the function and then limit the range of observations, or limit the range and then sample, we will end up with the same result; theory will tell you the same thing.
The simple modification of the discrete Lanczos window by changing only the outer two coefficients from 1 to 1/2 produced a much better window. The Lanczos window with its sigma factors modified all the coefficients, but its shape had a corner at the ends, and this means, due to periodicity, there are two discontinuities in the first derivative of the window shape—hence slow convergence. If we reason using weights on the coefficients of the raw Fourier series of the form of a raised cosine
then we will have something like the Lanczos window, but now there will be greater smoothness, hence more rapid convergence.
Writing this out in the exponential form we find the weights on the exponentials are 1/4, 1/2, 1/4; this gives the von Hann window. If part of the weight is instead put on a constant platform, then the Hamming window is a “raised cosine on a platform” with weights 0.54 + 0.46 cos(πk/N), that is, 0.23, 0.54, 0.23 in the exponential form
(Figure 15.4). Actually the weights depend on N, the length of data, but so slightly these constants are regularly used for all cases. The Hamming window has a mysterious, hence popular, aura about it with its peculiar coefficients, but it was designed to do a particular job and is not a universal solution to all problems. Most of the time the von Hann window is preferable. There are in the literature possibly 100 various windows, each having some special merit, and none having all the advantages you may want.
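For comparison, a sketch of the two windows over coefficients k = –N, …, N, using the usual constants 0.54 and 0.46 for the Hamming window:

```python
import numpy as np

N = 10
k = np.arange(-N, N + 1)
hann = 0.50 + 0.50 * np.cos(np.pi * k / N)   # raised cosine: ends at exactly 0
hamm = 0.54 + 0.46 * np.cos(np.pi * k / N)   # raised cosine on a platform: ends at 0.08
print(hann[[0, N, 2 * N]])                   # [0.  1.  0. ]
print(hamm[[0, N, 2 * N]])                   # [0.08 1.   0.08]
```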
It must be your friends, in some sense, who make you famous by quoting and citing you, and it pays, so I claim, to be helpful to others as they try to do their work. They may in time give you credit for the work, which is better than trying to claim it yourself. Cooperation is essential in these days of complex projects; the day of the individual worker is dying fast. Teamwork is more and more essential, and hence learning to work in a team, indeed possibly seeking out places where you can help others, is a good idea. In any case the fun of working with good people on important problems is more pleasure than the resulting fame. And the choice of important problems means generally management will be willing to supply all the assistance you need.
In my many years of doing computing at Bell Telephone Laboratories I was very careful never to write up a result which involved any of the physics of the situation, lest I get a reputation for “stealing others’ ideas.” Instead I let them write up the results, and if they wanted me to be a coauthor, fine! Teamwork implies a very careful consideration for others and their contributions, and they may see their contributions in a different light than you do!
We are now ready to consider the systematic design of nonrecursive filters. The design method is based on Figure 16.1, which has six parts. On the upper left is a sketch of the ideal filter you wish to have. It can be a low-pass, a high-pass, a band-pass, a band-stop, a notch filter, or even a differentiator. For anything other than differentiator filters you usually want either 0 or 1 as the height in the various intervals, while for the differentiator you want iω, since the derivative of the eigenfunction e^(iωt) is iωe^(iωt),
and hence the desired transfer function is iω.
In the method sketched above, you must choose both the N, the number of terms to be kept, and the particular window shape, and if what you get does not suit you, then you must make new choices. It is a “trial and error” design method.
For a band-pass filter, with fp as the band-pass and fs as the band-stop frequencies, the sequence of design formulas is
If N is too big, you stop and reconsider your design. Otherwise you go ahead and compute in turn
(this is plotted in Figure 16.4). The original Fourier coefficients for a band-pass filter are given by
These coefficients are to be multiplied by the corresponding weights wk of the window
where
I0(x) is the Bessel function of order 0 for a purely imaginary argument (the modified Bessel function). For computing it you will need comparatively few terms, as there is an (n!)^2 in the denominator and hence the series converges rapidly.
I0(x) is best computed recursively; for a given x, the successive terms of the series are given by term(n) = term(n – 1) · (x/2)^2 / n^2, starting from term(0) = 1.
For a low-pass or a high-pass, one of the two frequencies fp or fs has the limit possible for it. For a band-stop filter there are slight changes in the formulas for the coefficients ck.
Let us examine Kaiser’s window coefficients, the wk = I0(α√(1 – (k/N)^2)) / I0(α):
As we examine these numbers we see they have, for α > 0, something like the shape of a raised cosine
How did Kaiser find the formulas? To some extent by trial and error. He first assumed he had a single discontinuity, and he ran a large number of cases on a computer to see both the rise time ΔF and the ripple height δ. With a fair amount of thinking, plus a touch of genius, he noted that as A increases we pass from a rectangular (Lanczos) window (A < 21) to a tapered window sitting on a platform of height 1/I0(α). Ideally he wanted a prolate spheroidal function, but he noted they are accurately approximated, for his values, by the I0(x). He plotted the results and approximated the functions. I asked him how he got the exponent 0.4. He replied he tried 0.5 and it was too large, and 0.4, being the next natural choice, seemed to fit very well. It is a good example of using what one knows plus the computer as an experimental tool, even in theoretical research, to get very useful results.
Kaiser’s method will fail you once in a while because there will be more than one edge (indeed, there is the symmetric image of an edge on the negative part of the frequency line) and the ripples from different edges may by chance combine and make the filter ripples go beyond the designated amount. In this case, which seldom arises, you simply repeat the design with a smaller tolerance. The whole program is easily accommodated on a primitive handheld programmable computer like the ti–59, let alone on a modern pc.
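The design formulas themselves are not reproduced above, so the following sketch uses the Kaiser design formulas as they are usually published (my transcription, so treat the constants as assumptions), together with the recursive series for I0(x) just described:

```python
import numpy as np

def I0(x, terms=25):
    """Modified Bessel function of order 0 via its power series; each term is
    the previous term times (x/2)^2 / m^2."""
    total, term = 1.0, 1.0
    for m in range(1, terms):
        term *= (x / 2.0) ** 2 / m ** 2
        total += term
    return total

def kaiser_lowpass(fp, fs, delta):
    """Sketch of a Kaiser-window low-pass design: passband edge fp, stopband
    edge fs (in rotations), ripple delta in both bands."""
    A = -20.0 * np.log10(delta)                 # attenuation in dB
    dF = fs - fp                                # transition width
    order = (A - 7.95) / (14.36 * dF)           # roughly the number of taps minus one
    N = int(np.ceil(order / 2))                 # use 2N+1 symmetric coefficients
    if A <= 21:
        alpha = 0.0                             # rectangular window region
    elif A <= 50:
        alpha = 0.5842 * (A - 21) ** 0.4 + 0.07886 * (A - 21)
    else:
        alpha = 0.1102 * (A - 8.7)
    k = np.arange(-N, N + 1)
    w = np.array([I0(alpha * np.sqrt(1 - (i / N) ** 2)) for i in k]) / I0(alpha)
    fc = (fp + fs) / 2.0                        # put the ideal cutoff mid-transition
    c = 2 * fc * np.sinc(2 * fc * k)            # ideal low-pass Fourier coefficients
    return c * w                                # windowed coefficients

h = kaiser_lowpass(fp=0.20, fs=0.25, delta=0.01)
print(len(h), round(h.sum(), 3))                # sum of coefficients ~ response at f = 0
```

Note the exponent 0.4 in the middle branch, the one Kaiser arrived at by trial, as described above.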
We next turn to the finite Fourier series. It is a remarkable fact the Fourier functions are orthogonal, not only over a line segment, but for any discrete set of equally spaced points. Hence the theory will go much the same, except there can be only as many coefficients determined in the Fourier series as there are points. In the case of 2N points, the common case, there is one term of the highest frequency only, the cosine term (the sine term would be identically zero at the sample points). The coefficients are determined as sums of the data points multiplied by the appropriate Fourier functions. The resulting representation will, within roundoff, reproduce the original data.
To compute an expansion it would look like 2N terms each with 2N multiplications and additions, hence something like (2N)^2 operations of multiplication and addition. But by both (1) doing the addition and subtraction of terms with the same multiplier before doing the multiplications and (2) producing higher frequencies by multiplying lower ones, the fast Fourier transform (fft) has emerged requiring about N log N operations. This reduction in computing effort has greatly transformed whole areas of science and engineering—what was once impossible in both time and cost is now routinely done.
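A short check that the fft gives exactly the coefficients the direct O(N^2) sum would give (the data are random):

```python
import numpy as np

x = np.random.default_rng(0).normal(size=512)
n = np.arange(512)
direct = np.array([np.sum(x * np.exp(-2j * np.pi * k * n / 512)) for k in range(512)])
print(np.allclose(direct, np.fft.fft(x)))   # True -- same coefficients, far less work
```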
Moral: when you know something cannot be done, also remember the essential reason why, so later, when the circumstances have changed, you will not say, “It can’t be done.” Think of my error! How much more stupid can anyone be? Fortunately for my ego, it is a common mistake (and I have done it more than once), but due to my goof on the fft I am very sensitive to it now. I also note when others do it—which is all too often! Please remember the story of how stupid I was and what I missed, and do not make that mistake yourself. When you decide something is not possible, don’t say at a later date it is still impossible without first reviewing all the details of why you originally were right in saying it couldn’t be done.
I must now turn to the delicate topic of power spectra, which is the sum of the squares of the two coefficients of a given frequency in the real domain, or the square of the absolute value in the complex notation. An examination of it will convince you this quantity does not depend on the origin of the time, but only on the signal itself, contrary to the dependence of the coefficients on the location of the origin. The spectrum has played a very important role in the history of science and engineering. It was the spectral lines which opened the black box of the atom and allowed Bohr to see inside. The newer quantum mechanics, starting around 1925, modified things slightly, to be sure, but the spectrum was still the key. We also regularly analyze black boxes by examining the spectrum of the input and the spectrum of the output, along with correlations, to get an understanding of the insides—not that there are always unique insides, but generally we get enough clues to formulate a new theory.
Let us analyze carefully what we do and its implications, because what we do to a great extent controls what we can see. There is usually, in our imaginations at least, a continuous signal. This is usually endless, and we take a sample in time of length 2L. This is the same as multiplying the signal by a Lanczos window, a box car if you prefer. This means the original signal is convolved with the corresponding function of the form (sin x)/x, Figure 16.5—the longer the signal, the narrower the (sin x)/x loops are. Each pure spectral line is smeared out into its (sin x)/x shape.
Next we sample at equally spaced points in time, and all the higher frequencies are aliased into lower frequencies. It is obvious that interchanging these two operations—sampling first and then limiting the range—will give the same results; and, as I said earlier, I once carefully worked out all the algebraic details to convince myself that what I thought had to be true from theory was indeed true in practice.
Then we use the fft, which is only a cute, accurate way of getting the coefficients of a finite Fourier series. But when we assume the finite Fourier series representation we are making the function periodic—and the period is exactly the sampling interval size times the number of samples we take! This period has generally nothing to do with the periods in the original signal. We force all non-harmonic frequencies into harmonic ones—we force a continuous spectrum to be a line spectrum! This forcing is not a local effect, but as you can easily compute, a non-harmonic frequency goes into all the other frequencies, most strongly into the adjacent ones of course, but nontrivially into more remote frequencies.
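The leakage of a non-harmonic frequency into all the other lines is easy to see numerically. In this sketch the record length and the two frequencies are arbitrary illustrative choices.

```python
# A frequency that fits the record exactly gives one spectral line; one that
# does not fit leaks into every line, most strongly into its neighbors.
import numpy as np

n = 128
t = np.arange(n)
harmonic     = np.cos(2 * np.pi * 10.0 * t / n)   # exactly 10 cycles in the record
non_harmonic = np.cos(2 * np.pi * 10.5 * t / n)   # 10.5 cycles: not periodic in the record

for name, x in [("harmonic", harmonic), ("non-harmonic", non_harmonic)]:
    mag = np.abs(np.fft.fft(x))[: n // 2]
    big = np.flatnonzero(mag > 0.01 * mag.max())  # lines holding noticeable energy
    print(f"{name:>12}: {len(big)} spectral lines above 1% of the peak")
```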
Let us turn to theory. Every spectrum of real noise falls off reasonably rapidly as you go to infinite frequencies, or else it would have infinite energy, Figure 16.6. But the sampling process aliases the higher frequencies into lower ones, and the folding, as indicated, tends to produce a flat spectrum—remember, the aliased frequencies are added algebraically. Hence we tend to see a flat spectrum for noise, and if it is flat we call it white noise. The signal, usually, is mainly in the lower frequencies. This is true for several reasons, including “over-sampling” (sampling more often than the Nyquist theorem requires), which means we can get some averaging to reduce the instrumental errors. Thus the typical spectrum will look as shown in Figure 16.6. Hence the prevalence of low-pass filters to remove the noise. No linear method can separate the signal from the noise at the same frequencies, but the noise beyond the signal band can be removed by a low-pass filter. Therefore, when we “over-sample” we have a chance to remove more of the noise with a low-pass filter.
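A rough numerical illustration of the typical picture: a low-frequency signal, white-ish noise across the whole Nyquist band, and a crude ideal low-pass filter removing the noise beyond an assumed band edge. All the numbers here are invented for the demonstration.

```python
# Low-frequency signal plus broadband noise; zeroing the bins beyond an
# assumed band edge removes most of the noise power.
import numpy as np

rng = np.random.default_rng(1)
n = 1024
t = np.arange(n)
signal = np.sin(2 * np.pi * 3 * t / n) + 0.7 * np.sin(2 * np.pi * 7 * t / n)
noisy = signal + 0.5 * rng.standard_normal(n)

spectrum = np.fft.fft(noisy)
cutoff = 20                                 # assumed band edge, well above the signal
spectrum[cutoff : n - cutoff] = 0.0         # crude ideal low-pass filter
smoothed = np.fft.ifft(spectrum).real

print("rms error before filtering:", np.sqrt(np.mean((noisy - signal) ** 2)))
print("rms error after  filtering:", np.sqrt(np.mean((smoothed - signal) ** 2)))
```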
I carefully said in the opening talk on digital filters I thought at that time I knew nothing about them. What I did not know was that because I was then ignorant of recursive digital filter design, I had effectively created it when I examined closely the theory of predictor-corrector methods of numerically solving ordinary differential equations. The corrector is practically a recursive digital filter!
While doing the study on how to integrate a system of ordinary differential equations numerically I was unhampered by any preconceived ideas about digital filters, and I soon realized a bounded input, in the words of the filter experts, could produce, if you were integrating, an unbounded output—which they said was unstable, but clearly it is just what you must have if you are to integrate; even a constant will produce a linear growth in the output. Indeed, when later I faced integrating trajectories down to the surface of the Moon, where there is no air, hence no drag, hence no first derivatives explicitly in the equations, and wanted to take advantage of this by using a suitable formula for numerical integration, I found I had to have a quadratic error growth; a small roundoff error in the computation of the acceleration would not be corrected and would lead to a quadratic error in position—an error in the acceleration produces a quadratic growth in position. That is the nature of the problem, unlike on Earth, where the air drag provides some feedback correction to the wrong value of the acceleration, and hence some correction to the error in the position.
Thus I have to this day the attitude that stability in digital filters means “not exponential growth” from bounded inputs, but allows polynomial growth, and this is not the standard stability criterion derived from classical analog filters, where if it were not bounded you would melt things down—and anyway, they had never really thought hard about integration as a filter process.
We will take up this important topic of recursive filters, which are necessary for integration, in the next chapter.
We now turn to recursive filters, which have the form

yₙ = a₀uₙ + a₁uₙ₋₁ + ⋯ + aₖuₙ₋ₖ + b₁yₙ₋₁ + b₂yₙ₋₂ + ⋯ + bₘyₙ₋ₘ
From this formula it will be seen we have values on only one side of the current value n, and we use both old and current signal values, uₙ, and old values of the outputs, yₙ. This is classical, and arises because we are often processing a signal in real time and do not have access to future values of the signal.
But, considering basics, we see that if we did have "future values," then a two-sided prediction would probably be much more accurate. We would then, in computing the yₙ values, face a system of simultaneous linear equations—nothing to be feared in these days of cheap computing. We will set aside this observation, noting only that often these days we record the signal on a tape or other media, and later process it in the lab—and therefore we have the future available now. Again, in picture processing, a recursive digital filter which used only data from one side of the point being processed would be foolish, since it would not use some of the available, relevant information.
The next thing we see is that the use of old output as new input means that we have feedback: the outputs we compute circulate back into the later computations.
Being a linear system, we see that whatever pure frequency we put into the filter when in the steady state, only that frequency can emerge, though it may be phase shifted. The transients, however, can have other frequencies, which arise from the solution of the homogeneous difference equation. The fact is we are solving a difference equation with constant coefficients with the uₙ terms forming the “forcing function”—that is exactly what a recursive filter is, and nothing else.
We therefore assume for the steady state (which ignores the transients) that the output is the same pure frequency as the input, merely changed in amplitude and shifted in phase.
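Here is a minimal sketch of such a filter written out as the difference equation it is. The coefficients are an arbitrary stable choice, not a designed filter, and the steady-state gain is computed from the usual transfer-function expression for this simple case.

```python
# y_n = a*u_n + b*y_(n-1): one old output fed back, |b| < 1 for stability.
# Feed it a pure frequency and, once the transient dies away, only that
# frequency comes out, merely scaled and phase shifted.
import numpy as np

def recursive_filter(u, a=0.5, b=0.4):
    y = np.zeros(len(u))
    for n in range(len(u)):
        y[n] = a * u[n] + (b * y[n - 1] if n > 0 else 0.0)
    return y

n = np.arange(2000)
w = 2 * np.pi * 0.03                       # an arbitrary input frequency
u = np.cos(w * n)
y = recursive_filter(u)

# Steady-state gain |H(w)| = a / |1 - b*e^(-iw)| for this particular filter.
gain = 0.5 / abs(1 - 0.4 * np.exp(-1j * w))
print("predicted steady-state amplitude:", gain)
print("observed amplitude after transient:", np.max(np.abs(y[1000:])))
```

The two printed amplitudes should agree to a few decimal places once the transient has died away.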
I will not go on to the design of recursive digital filters here, only note that I had effectively developed the theory myself in coping with corrector formulas for numerically solving ordinary differential equations. The form of the corrector in a predictor-corrector method is
We see the uⱼ of the recursive filter are now the derivatives y′ₙ of the output and come from the differential equation. In the standard nonrecursive filter there are no feedback paths—the yₙ that are computed do not appear later in the right-hand side. In the differential equation formula they appear in this feedback path, and also through the derivative terms they form another, usually nonlinear, feedback path. Hence stability is a more difficult topic for differential equations than it is for recursive filters.
These recursive filters are often called “infinite impulse response filters” (iir) because a single disturbance will echo around the feedback loop, which even if the filter is stable will die out only like a geometric progression. Being me, of course I asked myself if all recursive filters had to have this property, and soon found a counterexample. True, it is not the kind of filter you would normally design, but it showed their claim was superficial. If you will only ask yourself, “Is what I am being told really true?,” it is amazing how much you can find is, or borders on, being false, even in a well-developed field!
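The text does not say what the counterexample was; one classical example with the same flavor is the running average written recursively. Feedback is present, yet the impulse response dies out exactly after N steps rather than trailing off like a geometric progression. The window length below is arbitrary.

```python
# Recursive form of a running average: y_n = y_(n-1) + (u_n - u_(n-N))/N.
# Despite the feedback, a single impulse produces a strictly finite response.
import numpy as np

N = 8
length = 40
u = np.zeros(length)
u[0] = 1.0                                  # a single unit impulse

y = np.zeros(length)
for n in range(length):
    old = u[n - N] if n >= N else 0.0
    prev = y[n - 1] if n > 0 else 0.0
    y[n] = prev + (u[n] - old) / N

print(y[:N])        # 1/N for the first N outputs ...
print(y[N:N + 5])   # ... then exactly zero forever after (no infinite echo)
```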
In Chapter 26 I will take up the problem of dealing with the expert. Here you see a simple example of what happens all too often. The experts were told something in class when they were students first learning things, and at the time they did not question it. It becomes an accepted fact, which they repeat and never really examine to see if what they are saying is true or not, especially in their current situation.
Let me now turn to another story. A lady in the mathematics department at Bell Telephone Laboratories was square dancing with a physicist one weekend at a party, and on Monday morning in the hallway she casually mentioned to me a problem he had. He was measuring the number of counts in a radioactive experiment at each of, as I remember, 256 energy levels. It is called “the spectrum of the process.” His problem was he needed the derivative of the data.
Well, you know: (a) the number of nuclear counts at a given energy is bound to produce a ragged curve, and (b) differentiating this to get the local slope is going to be a very difficult thing to do. The more I thought about her casual remark, the more I felt he needed real guidance—meaning me! I looked him up in the Bell Telephone Laboratories phone book and explained my interest and how I got it. He immediately wanted to come up to my office, but I was obdurate and insisted on meeting in his laboratory. He tried using his office, and I stuck to the lab. Why? Because I wanted to size up his abilities and decide if I thought his problem was worth my time and effort, since it promised to be a tough nut to crack. He passed the lab test with flying colors—he was clearly a very competent experimenter. He was at about the limit of what he could do—a week’s run to get the data and a lot of shielding was around the radioactive source, hence not much we could do to get better data. Furthermore, I was soon convinced, although I knew little about the details, that his experiment was important to physics as well as to Bell Telephone Laboratories. So I took on the problem. Moral: to the extent you can choose, work on problems you think will be important.
Obviously it was a smoothing problem, and Kaiser was just teaching me the facts, so what better to do than take the experimentalist to Kaiser and get Kaiser to design the appropriate differentiating filter? Trouble immediately! Kaiser had always thought of a signal as a function of time, and the square of the area under the curve as the energy, but here the energy was the independent variable! I had repeated trouble with Kaiser over this point until I bluntly said, “All right, his energy is time and the measurements, the counts, are the voltage.” Only then could Kaiser do it. The curse of the expert, with their limited view of what they can do. I remind you Kaiser is a very able man, yet his expertise, as so often happens to the expert, limited his view. Will you in your turn do better? I am hoping such stories as this one will help you avoid that pitfall.
As I earlier observed, it is usually the signal which is in the lower part of the Nyquist interval of the spectrum, and the noise is pretty well across the whole of the Nyquist interval, so we needed to find the cutoff edge between the meaningful physicist’s signal and the flat white noise. How to find it? First, I extracted from the physicist the theoretical model he had in his mind, which was a lot of narrow spectral lines of Gaussian shape on top of a broad Gaussian shape (I suspected Cauchy shapes, but did not argue with him as the difference would be too small, given the kind of data we had). So we modeled it, and he created some synthetic data from the model. A quick spectral analysis, via an fft, gave the signal confined to the lowest 1/20 of the Nyquist interval. Second, we processed a run of his experimental data and found the same location for the edge! What luck! (Perhaps the luck should be attributed to the excellence of the experimenter.) For once theory and practice agreed! We would be able to remove about 95% of the noise.
Kaiser finally wrote a program for him which, given the cutoff edge position wherever the experimenter chose to put it, (1) designed the corresponding differentiating filter, (2) wrote the program to compute the smoothed output, and then (3) processed the data through this filter without any interference from the physicist.
I later caught the physicist adjusting the cutoff edge for different parts of the energy data on the same run, and had to remind him there was such a thing as “degrees of freedom,” and what he was doing was not honest data processing. I had much more trouble, once things were going well, persuading him that to get the most out of his expensive data he should actually work in the square roots of the counts, as they have equal variances. But he finally saw the light and did so. He and Kaiser wrote a classic paper in the area, as it opened the door on a new range of things which could be done.
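The statistical point about square roots of counts can be checked in a few lines: for Poisson counts the variance grows with the mean, but the variance of the square roots stays near 1/4 whatever the mean. The means and sample sizes below are arbitrary.

```python
# Variance-stabilizing effect of the square root on Poisson counts.
import numpy as np

rng = np.random.default_rng(2)
for mean in (10, 100, 1000, 10000):
    counts = rng.poisson(mean, size=200_000)
    print(f"mean {mean:>6}: var(counts) = {counts.var():>9.1f}, "
          f"var(sqrt(counts)) = {np.sqrt(counts).var():.3f}")
```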
My contribution? Mainly, first identifying the problem, next getting the right people together, then monitoring Kaiser to keep him straight on the fact that filtering need not have exclusively to do with time signals, and finally, reminding them of what they knew from statistics (or should have known and probably did not).
Most signal processing is indeed done on time signals. But most digital filters will probably be designed for small, special-purpose studies, not necessarily signals in time. This is where I ask for your future attention. Suppose that when you are in charge of things at the top, you are interested in some data which shows past records of relative expenses of manpower to equipment. It is bound to be noisy data, but you would like to understand, in a basic sense, what is going on in the organization—what long-term trends are happening so slowly that people hardly sense them as they happen, but which nevertheless are fundamental to understand if you are to manage well. You will need a digital filter to smooth the data to get a glimpse of the trend, if it exists. You do not want to find a trend when it does not exist, but if it does you want to know pretty much what it has been, so you can project what it is likely to be in the near future. Indeed, you might want to observe, if the data will support it, any change in the slope of the trend. Some signals, such as the ratio of firepower to tonnage of the corresponding ship, need not involve time at all, but will tell you something about the current state of the Navy. You can, of course, also study the relationship as a function of time.
I suggest strongly that at the top of your career you will be able to use a lot of low-level digital filtering of signals, whether in time or not, so you will be better able to manage things. Hence, I claim, you will probably design many more filters for such odd jobs than you will for radar data reduction and such standard things. It is usually in the new applications of knowledge where you can expect to find the greatest gains.
Let me supply some warnings against the misuse of intellectual tools, and I will talk in Chapter 27 on topics closer to statistics than I have time for now. Fourier analysis implies linearity of the underlying model. You can use it in slightly nonlinear situations, but often elaborate Fourier analyses have failed because the underlying phenomenon was too nonlinear. I have seen millions of dollars go down that drain when it was fairly obvious to the outsider the nonlinearities would vitiate all the linear analysis they could do using the Fourier function approach. When this was pointed out to them, their reply seemed to be they did not know what else to do, so they persisted in doing the wrong thing! I am not exaggerating here.
I repeat, Fourier analysis is linear, and there exist many nonlinear filters, but the theory is not well developed beyond the running median. Kalman filters are another example of the use of partially nonlinear filters, the nonlinear part being in the “adapting” itself to the signal.
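For reference, here is the running median in its most minimal form; the window length and test signal are arbitrary. Unlike any linear filter, it removes an isolated spike completely while leaving a step edge untouched.

```python
# A three-point running median: nonlinear smoothing that kills spikes
# without rounding off step edges.
import numpy as np

def running_median(x, window=3):
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(x))])

signal = np.array([0, 0, 0, 9, 0, 0, 1, 1, 1, 1], dtype=float)  # a spike, then a step
print(running_median(signal))
```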
A major use of computers these days, after writing and text editing, graphics, program compilation, etc., is simulation.
A simulation is the answer to the question, “What if…?”
What if we do this? What if this is what happened?
More than nine out of ten experiments are done on computers these days. I have already mentioned my serious worries that we are depending on simulation more and more, and are looking at reality less and less, and hence seem to be approaching the old scholastic attitude that what is in the textbooks is reality and does not need constant experimental checks. I will not dwell on this point further now.
We use computers to do simulations because they are: (1) cheaper, (2) faster, (3) often more accurate and able to cover wider ranges of conditions, and (4) able to do things no experiment can do.
On points 1 and 2, as expensive and slow as programming is, with all its errors and other faults, it is generally much cheaper and faster than getting laboratory equipment to work. Furthermore, in recent years expensive, top-quality laboratory equipment has been purchased, only for it to be found in less than ten years it must be scrapped as being obsolete. All of the above remarks do not apply when a situation is constantly recurring and the lab testing equipment is in constant use. But let lab equipment lie idle for some time, and suddenly it will not work properly! This is called “shelf life,” but it is sometimes the shelf life of the skills in using it rather than the shelf life of the equipment itself! I have seen it all too often in my direct experience. Intellectual shelf life is often more insidious than is physical shelf life.
On point 3, very often we can get more accurate readings from a simulation than we can get from a direct measurement in the real world. Field measurements, or even laboratory measurements, are often hard to get accurately in dynamic situations. Furthermore, in a simulation we can often run over much wider ranges of the independent variables than we can do with any one lab setup.
On point 4, perhaps most important of all, a simulation can do what no experiment can do.
I will illustrate these points with specific stories using simulations I have been involved in so you can understand what simulations can do for you. I will also indicate some of the details so those who have had only a little experience with simulations will have a better feeling for how you go about doing one—it is not feasible to actually carry out a big simulation in class as they often take years to complete.
The first large computation I was involved with was at Los Alamos during the war.
Without going into classified details, you will recall one of the two designs was spherically symmetric and was based on implosion, Figure 18.1. They divided the material and space into many concentric shells. They then wrote the equations for the forces on each shell (both sides of it), as well as the equation of state which gives, among other things, the density of the material from the pressures on it. Next they broke time up into intervals of 10⁻⁸ seconds (shake, from a shake of a lamb’s tail, I suppose). Then for each time interval we calculated, using the computers, where each shell would go and what it would do during that time, subject to the forces on it. There was, of course, a special treatment for the shock wave front from the outer explosive material as it went through the region. But the rules were all, in principle, well known to experts in the corresponding fields. The pressures were such that there had to be a lot of guessing that things would be much the same outside the realms of past testing, but a little physics theory gave some assurances.
This already illustrates a main point I want to make. It is necessary to have a great deal of special knowledge in the field of application. Indeed, I tend to regard many of the courses you have taken, and will take, as merely supplying the corresponding expert knowledge. I want to emphasize this obvious necessity for expert knowledge—all too often I have seen experts in simulation ignore this elementary fact and think they could safely do simulations on their own. Only an expert in the field of application can know if what you have failed to include is vital to the accuracy of the simulation, or if it can safely be ignored.
Another main point is that in most simulations there has to be a highly repetitive part, done again and again from the same piece of programming, or else you cannot afford to do the initial programming! The same computations were done for each shell and then for each time interval—a great deal of repetition! In many situations, the power of the machine itself so far exceeds our powers to program that it is wise to look early and constantly for the repetitive parts of a proposed simulation, and when possible cast the simulation in the corresponding form.
However, there is a significant difference between the two problems, the bomb and the weather prediction. For the bomb small differences in what happened along the way did not greatly affect the overall performance, but as you know the weather is quite sensitive to small changes. Indeed, it is claimed that whether or not a butterfly flaps its wings in Japan can determine whether or not a storm will hit this country and how severe it will be.
This is a fundamental theme I must dwell on. When the simulation has a great deal of stability, meaning resistance to small changes in its overall behavior, then a simulation is quite feasible; but when small changes in some details can produce greatly different outcomes, then a simulation is a difficult thing to carry out accurately. Of course, there is long-term stability in the weather; the seasons follow their appointed rounds regardless of small details. Thus there are both short-term (day-to-day) instabilities in the weather, and longer-term (year-to-year) stabilities as well. But the ice ages show there are also very long-term instabilities in the weather, with apparently even longer stabilities!
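A tiny illustration of that sensitivity, using the standard logistic map rather than anything from the text: two starting values differing by one part in a billion part company within a few dozen steps.

```python
# Iterate x -> 4x(1-x), a standard chaotic example, from two nearly equal
# starting points and watch the difference grow until it saturates.
x_a, x_b = 0.3, 0.3 + 1e-9
for step in range(1, 61):
    x_a, x_b = 4 * x_a * (1 - x_a), 4 * x_b * (1 - x_b)
    if step % 15 == 0:
        print(f"step {step:2d}: difference = {abs(x_a - x_b):.3g}")
```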
I have met a large number of this last kind of problem. It is often very hard to determine in advance whether one or the other, stability or instability, will dominate a problem, and hence the possibility or impossibility of getting the desired answers. When you undertake a simulation, look closely at this aspect of the problem before you get too involved and then find, after a lot of work, money, and time, you cannot get suitable answers to the problem. Thus there are situations which are easy to simulate, ones which you cannot in a practical sense handle at all, and most of the others which fall between the two extremes. Be prudent in what you promise you can do via simulations!
They had a slant launch in the original design, along with variational equations which would give me information to enable me to make sensible adjustments to the various components, such as wing size. I should point out, I suppose, the solution time for one trajectory was about half an hour, and about halfway through one trajectory I had to commit myself to the next trial shot. Thus I had lots of time to observe and to think hard as to why things went as they did. After a few days I gradually got a “feeling” for the missile behavior, why it did as it did under the different guidance rules I had to supply. As time went on I gradually realized a vertical launch was best in all cases; getting out of the dense lower air and into the thin air above was better than any other strategy—I could well afford the later induced drag when I had to give guidance orders to bend the trajectory over. In doing so, I found I was greatly reducing the size of the wings, and realized, at least fairly well, the equations and constants I had been given for estimating the changes in the effects due to changes in the structure of the missile could hardly be accurate over so large a range of perturbations (though they had never told me the source of the equations, I inferred it). So I phoned down for advice and found I was right—I had better come home and get new equations.
With some delay due to other users wanting their time on the rda #2, I was soon back and running again, but with a lot more wisdom and experience. Again, I developed a feeling for the behavior of the missile—I got to “feel” the forces on it as various programs of trajectory shaping were tried. Hanging over the output plotters as the solution slowly appeared gave me the time to absorb what was happening. I have often wondered what would have happened if I had had a modern, high-speed computer. Would I ever have acquired the feeling for the missile, upon which so much depended in the final design? I often doubt hundreds more trajectories would have taught me as much—I simply do not know. But that is why I am suspicious, to this day, of getting too many solutions and not doing enough very careful thinking about what you have seen. Volume output seems to me to be a poor substitute for acquiring an intimate feeling for the situation being simulated.
The results of these first simulations were that we went to a vertical launch (which saved a lot of ground equipment in the form of a circular rail and other complications), made many other parts simpler, and seemed to have shrunk the wings to about one-third of the size I was initially given. I had found bigger wings, while giving greater maneuverability in principle, produced in practice so much drag in the early stages of the trajectory that the later slower velocity in fact gave less maneuverability in the “end game” of closing in on the target.
Of course, these early simulations used a simple atmosphere of exponential decrease in density as you go up, and other simplifications, which in simulations done years later were all modified. This brings up another belief of mine—doing simple simulations at the early stages lets you get insights into the whole system which would be disguised in any full-scale simulation. I strongly advise, when possible, to start with the simple simulation and evolve it to a more complete, more accurate, simulation later so the insights can arise early. Of course, at the end, as you freeze the final design, you must put in all the small effects which could matter in the final performance. But (1) start as simply as you can, provided you include all the main effects, (2) get the insights, and then (3) evolve the simulation to the fully detailed one.
Guided missiles were some of the earliest explorations of supersonic flight, and there was another great unknown in the problem. The data from the only two supersonic wind tunnels we had access to flatly contradicted each other!
Guided missiles led naturally to spaceflight, where I played a less basic part in the simulations, more as an outside source of advice and initial planning of the mission profile, as it is called.
I also had the nasty idea, since I had found the equations were really local linearizations of more complex nonlinear equations, that I could, at about every twentieth to fiftieth step, estimate the nonlinear component. I found to their amazement on some designs the estimated nonlinear component was larger than the computed linear component—thus vitiating the approximation and stopping the useless computations.
Why tell the story? Because it illustrates another point I want to make—an active mind can contribute to a simulation even when you are dealing with experts in a field where you are a strict amateur. You, with your hands on all the small details, have a chance to see what they have not seen, and to make significant contributions, as well as save machine time! Again, all too often I have seen things missed during the simulation by those running it, and hence were not likely to get to the users of the results.
During the long years of caveman evolution, apparently people lived in groups of around 25 to 100 in size. People from outside the group were generally not welcome, though we think there was a lot of wife stealing going on. When the long years of caveman living are compared with the few of civilization (less than 10,000 years), we see we have been mainly selected by evolution to resent outsiders, and one of the ways of doing this is the use of special jargon languages. The thieves’ argot, group slang, the husband and wife’s private language of words, gestures, and even a lift of an eyebrow are all examples of this common use of a private language to exclude the outsider. Hence this instinctive use of jargon when an outsider comes around should be consciously resisted at all times—we now work in much larger units than those of cavemen and we must try continually to overwrite this earlier design feature in us.
Mathematics is not always the unique language you wish it were. To illustrate this point, recall I earlier mentioned some Navy intercept simulations involving the equivalent of 28 simultaneous first-order differential equations. I need to develop a story. Ignoring all but the essential part of the story, consider the problem of solving one differential equation, shown in Figure 18.3.
Keep this equation in mind as I talk about the real problem. I programmed the real problem of 28 simultaneous differential equations to get the solution and then limited certain values to 1, as if it were voltage limiting. Over the objections of the proposer, a friend of mine, I insisted he go through the raw, absolute binary coding of the problem with me, as I explained to him what was going on at each stage. I refused to compute until he did this—so he had no real choice! We got to the limiting stage in the program and he said, “Dick, that is fin limiting, not voltage limiting,” meaning the limited value should be put in at each step and not at the end. It is as good an example as I know of to illustrate the fact that both of us understood exactly what the mathematical symbols meant—neither of us had any doubts—but there was no agreement in our interpretations of them! Had we not caught the error I doubt any real, live experiments involving airplanes would have revealed the decrease in maneuverability which resulted from my interpretation. That is why, to this day, I insist a person with the intimate understanding of what is to be simulated must be involved in the detailed programming. If this is not done, then you may face similar situations where both the proposer and the programmer know exactly what is meant, but their interpretations can be significantly different, giving rise to quite different results!
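The following toy integrator (not the 28-equation problem, just an invented first-order system) shows how much the two readings of "limit this quantity to 1" can matter: clamping the command inside every integration step versus clamping only the recorded output.

```python
# Two interpretations of "limit to 1" applied to identical equations give
# visibly different trajectories.  All constants are invented.
import numpy as np

def run(limit_each_step, steps=400, dt=0.05, gain=4.0, target=1.0):
    y = 0.0
    history = []
    for _ in range(steps):
        command = gain * (target - y)
        if limit_each_step:
            command = max(-1.0, min(1.0, command))   # clamp before integrating
        y += dt * command
        history.append(y)
    y_path = np.array(history)
    if not limit_each_step:
        y_path = np.clip(y_path, -1.0, 1.0)          # clamp only the recorded output
    return y_path

fin = run(limit_each_step=True)
voltage = run(limit_each_step=False)
print("largest difference between the two interpretations:",
      np.max(np.abs(fin - voltage)))
```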
You should not get the idea simulations are always of time-dependent functions. One problem I was given to run on the differential analyzer we had built out of old m9 gun director parts was to compute the probability distributions of blocking in the central office. Never mind that they gave me an infinite system of interconnected linear differential equations, each one giving the probability distribution of that many calls on the central office as a function of the total load. Of course on a finite machine something must be done, and I had only 12 integrators, as I remember. I viewed it as an impedance line, and using the difference of the last two computed probabilities I assumed they were proportional to the difference of the next two (I used a reasonable constant of proportionality derived from the difference from the two earlier functions), thus the term from the next equation beyond what I was computing was reasonably supplied. The answers were quite popular with the switching department, and made an impression, I believe, on my boss, who still had a low opinion of computing machines.
There were underwater simulations, especially of an acoustic array put down in the Bahamas by a friend of mine, where, of course, in winter he often had to go to inspect things and take further measurements. There were numerous simulations of transistor design and behavior. There were simulations of the microwave “jump-jump” relay stations with their receiver horns, and the overall stability arising from a single blip at one end going through all the separate relay stations. It is perfectly possible that while each station recovers promptly from the blip, nevertheless the size of the blip could grow as it crossed the continent. At each relay station there was stability in the sense that the pulse died out in time, but there was also the question of the stability in space—did a random pulse grow indefinitely as it crossed the continent? For colorful reasons I named the problem “space stabilization.” We had to know the circumstances in which this could and could not happen—hence a simulation was necessary because, among other things, the shape of the blip changed as it went across the continent.
I hope you see that almost any situation you can describe by some sort of mathematical description can be simulated in principle. In practice you have to be very careful when simulating an unstable situation—though I will tell you in Chapter 20 about an extreme case I had to solve because it was important to the Laboratories, and that meant, at least to me, I had to get the solution, no matter what excuses I gave myself it could not be done. There are always answers of some sort for important problems if you are determined to get them. They may not be perfect, but in desperation something is better than nothing—provided it is reliable!
We now take up the question of the reliability of a simulation. I can do no better than quote from the Summer Computer Simulation Conference of 1975:
Computer-based simulation is now in widespread use to analyze system models and evaluate theoretical solutions to observed problems. Since important decisions must rely on simulation, it is essential that its validity be tested, and that its advocates be able to describe the level of authentic representation which they achieved.
It is an unfortunate fact that when you raise the question of the reliability of many simulations you are often told about how much manpower went into it, how large and fast the computer is, how important the problem is, and such things, which are completely irrelevant to the question that was asked.
I would put the problem slightly differently:
Why should anyone believe the simulation is relevant?
Do not begin any simulation until you have given this question a great deal of thought and found appropriate answers. Often there are all kinds of reasons given as to why you should postpone trying to answer the question, but unless it is answered satisfactorily, then all that you do will be a waste of effort or, even worse, misleading or plainly erroneous. The question covers both the accuracy of the modeling and the accuracy of the computations.
Let me inject another true story. It happened one evening after a technical meeting in Pasadena, California: we all went to dinner together and I happened to sit next to a man who had talked about, and was responsible for, the early spaceflight simulation reliability. This was at the time when there had been about eight space shots. He said they never launched a flight until they had a more than 99 point something percent reliability, say 99.44% reliability. Being me, I observed that there had been something like eight space shots; one live simulation had killed the astronauts on the ground, and we had had one clear failure, so how could the reliability be that high? He claimed all sorts of things, but fortunately for me the man on his other side joined in the chase and we forced a reluctant admission from him that what he calculated was not the reliability of the flight, but only the reliability of the simulation. He further claimed everyone understood that. Me: “Including the director who finally approves of the flight?” His refusal to reply, under repeated requests, was a clear admission that my point went home; he himself knew the director did not understand this difference but thought the report was the reliability of the actual shot.
He later tried to excuse what he had done with things like “what else could he do,” but I promptly pointed out a lot of things he could do to connect his simulation with reality much closer than he had. That was a Saturday night, and I am sure by Monday morning he was back to his old habits of identifying the simulation with reality and making little or no independent checks, which were well within his grasp. That is what you can expect from simulation experts—they are concerned with the simulation and have little or no regard for reality, or even “observed reality.”
Consider the extensive business simulations and war gaming which goes on these days. Are all the essentials incorporated correctly into the model, or are we training the people to do the wrong things? How relevant to reality are these gaming models? And many other models?
We have long had airplane pilot trainers, which in many senses give much more useful training than can be given in real life. In the trainer we can subject the pilot to emergency situations we would not dare to do in reality, nor could we ever hope to produce the rich variety the trainer can. Clearly these trainers are very valuable assets. They are comparatively cheap, efficient in the use of the pilot’s time, and are very flexible. In the current
But as time goes on, and planes of other types are developed, will the people then be as careful as they should be to get all the new interactions into the model, or will some small but vital interactions of the new plane be omitted by oversight, thus preparing the pilot to fail in these situations?
Here you can see the problem clearly. It is not that simulations are not essential these days, and will be in the near future, but rather it is necessary for the current crop of people, who have had very little experience with reality, to realize they need to know enough so the simulations include all essential details. How will you convince yourself you have not made a mistake somewhere in the vast amount of detail? Remember how many computer programs, even after some years of field use, still have serious errors in them! In many situations such errors can mean the difference between life or death for one or more people, let alone the loss of valuable equipment, money, and time.
The relevant accuracy and reliability of simulations are a serious problem. There is, unfortunately, no silver bullet, no magic incantation you can perform, no panacea for this problem. All you have is yourself.
Using classical mechanics I set up the equations, incorporated the elastic bounce, and set up the machine to play one baseline, with the human player on the other; both the angle of the racket and the hardness with which you hit the ball were set by two conveniently placed dials. Remember, in those days (1955) there were no game-playing machines in many public places, hence the exhibit was a bit novel to the visitors. I then invited a smart physicist friend, who was also an avid tennis player, to inspect and tune up the constants for the bounce (asphalt court) and the air drag. When he was satisfied, behind his back I asked another physicist to give me a similar opinion without letting him alter the constants. Thus I got a reasonable simulation of tennis without “spin” on the ball.
Had it been other than a public amusement I would have done a lot more. I could have hung a tennis ball on a string in front of a variable strength fan and noted carefully the angle at which it hung for different wind velocities, thus getting at the drag, and included those for variously worn tennis balls. I could have dropped the balls and noted the rebound for different heights to test the linearity of the elastic constants. If it had been an important problem I could have filmed some games and tested that I could reproduce the shots which had no spin on them. I did not do any of these things! It was not worth the cost. Hence it was my sloppiest simulation.
The major part of the story, however, is what happened! As the groups came by they were told what was going on by some assistants, and shown the display of the game as it developed on the plotting board outputs. Then we let them play the game against the machine, and I had programmed the simulation so the machine could lose. Watching the entire process from the background, human and machine, I noticed, after a while, not one adult ever got the idea of what was going on enough to play successfully, and almost every child did! Think that over! It speaks volumes about the elasticity of young minds and the rigidity of older minds! It is currently believed most old people cannot run vcrs, but children can!
Remember this fact—older minds have more trouble adjusting to new ideas than do younger minds—since you will be showing new ideas, and even making formal presentations, to older people throughout much of your career. That your children could understand what you are showing is of little relevance to whether or not the audience to whom you are running the exhibition can. It was a terrible lesson I had to learn, and I have tried not to make that mistake again. Old people are not very quick to grasp new ideas—it is not that they are dumb, stupid, or anything else like that, it is simply that older minds are usually slow to adjust to radically new ideas.
I have emphasized the necessity of having the underlying laws of whatever field you are simulating well under control. But there are no such laws of economics! The only law of economics that I believe in is Hamming’s law: “You cannot consume what is not produced.” There is not another single reliable law in all of economics I know of which is not either a tautology in mathematics or else sometimes false. Hence when you do simulations in economics you have not the reliability you have in the hard sciences.
Let me inject another story. Some years ago the following happened at uc Berkeley. About equal numbers of males and females applied to graduate school, but many more men were accepted than women. There was no reason to assume the men were better prepared on the average than were the women. Hence there was obvious discrimination in terms of the ideal model of fairness. The president of the university demanded to know which departments were guilty. A close examination showed no department was guilty! How could that be? Easy! Various departments have varying numbers of openings for those entering graduate school, and various ratios of men to women applying for them. Those with both many openings and many men applying are the hard sciences, including mathematics, and those with the low ratios of acceptance and many women applying are the soft ones like literature, history, drama, social sciences, etc. Thus the discrimination, if you can say it occurred, was because the men, at a younger age, were made to take mathematics, which is the preparation for the hard sciences, and the women could or could not take mathematics as they chose. Those who avoided mathematics, physics, chemistry, engineering, and such were simply not eligible to apply where the openings were readily available, but had to apply where there was a high probability of rejection. People have trouble adapting to such situations these days!
Here you see a not widely recognized phenomenon, but one which has been extensively examined in many of its appearances by statisticians: the combining of data can create effects which are not there in the details. You are used to the idea that combining data can obscure things, but that it can also create effects is less well known. You need to be careful in your future that this does not happen to you—that you are not accused, from amalgamated data, of something you are not guilty of. Simpson's paradox is the famous example: each subsample can favor A over B, yet the combined data favors B over A.
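A small invented admissions table in the spirit of the story makes the paradox explicit: each department, taken alone, admits women at the higher rate, yet the combined totals favor the men, because the sexes apply to departments with very different acceptance rates.

```python
# Simpson's paradox with made-up numbers: per-department rates favor women,
# combined rates favor men.
applicants = {
    #               (men applied, men admitted, women applied, women admitted)
    "hard science": (800, 480, 100, 70),    # 60% of men, 70% of women admitted
    "humanities":   (200, 20, 900, 180),    # 10% of men, 20% of women admitted
}

men_app = men_adm = women_app = women_adm = 0
for dept, (ma, mad, wa, wad) in applicants.items():
    print(f"{dept:>12}: men {mad/ma:.0%}, women {wad/wa:.0%}")
    men_app += ma; men_adm += mad; women_app += wa; women_adm += wad

print(f"    combined: men {men_adm/men_app:.0%}, women {women_adm/women_app:.0%}")
```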
Now, you may say in the spaceflight simulations we combined data and at times made the whole vehicle into a point. Yes, we did, but we knew the laws of mechanics and knew when we could and could not do it. Thus in midcourse corrections you get the vehicle pointed in exactly the right direction and then fire the retro or other rockets to get the corrections, and during such times you do not allow the people to move around in the vehicle, as that can produce rotations and hence spoil the careful directing of the rockets. We thought we knew enough of the background theory, and we had had years of experience in the matter, so the combining of all the details into one point mass still gave reliable simulation results.
In many proposed areas of simulation there are neither such known experiences nor theory. Thus when I was occasionally asked to do some ecological simulation I quietly asked for the mathematically expressed rules for every possible interaction, for example given the amount of rain what growth of the trees would occur, what exactly were the constants, and also where I could get some real live data to compare some test runs. They soon got the idea and went elsewhere to get someone more willing to run very questionable simulations which would give the results they wanted and could use for their propaganda. I suggest you keep your integrity and do not allow yourself to be used for other people’s propaganda; you need to be wary when agreeing to do a simulation!
If these soft science situations are hard to simulate with much reliability, think of those in which humans by their knowledge of the simulation can alter their behavior and thus vitiate the simulation. In the insurance business the company is betting you will live a long time and you are betting you will die young. For an annuity the sides are reversed, in case you had not thought about that point. While in principle you can fool the insurance companies and commit suicide, it is not common, and the insurance companies are indeed careful about this point.
Thus beware of any simulation of a situation which allows the human to use the output to alter their behavior patterns for their own benefit, since they will do so whenever they can.
But all is not lost. We have devised the method of scenarios to cope with many difficult situations. In this method we do not attempt to predict what will actually happen, we merely give a number of possible projections. This is exactly what Spock did in his baby-raising book. From the observations of many children in the past he assumed the future (early) behavior of children would not differ radically from these observations, and he predicted not what your specific child would do but only gave typical patterns with ranges of behavior, on such things as when babies begin to crawl, talk, say “no” to everything, etc. Spock predicted mainly the biological behavior and avoided as much as he could the cultural behavior of the child. In some simulations the method of scenarios is the best we can do. Indeed, that is what I am doing in this set of chapters; the future I predict cannot be known in detail, but only in some kinds of scenarios of what is likely to happen, in my opinion. More on this topic in the next chapter.
I have not so far mentioned what at first will appear to be a trivial point: do the marks on the paper which describe the problem get into the machine accurately? Programming errors are known to be all too common.
I did not think the chemists could get the various sets of these equations into the machine correctly every time, so I said we would first write a program which would go from the punched cards—one card describing each particular reaction, with all its relevant constants of interaction—to the equations themselves, thus ensuring all the terms were there and that the same coefficient was never entered differently for the same reaction as it appears in different equations. In hindsight it is an obvious thing to do; at the time it was a surprise to them, but it paid off in effort on their part. They had only to select from the file those cards they wanted to include in the particular simulation they were going to run, and the machine automated all the rest, including the spacing of the steps in the integration. My main idea, besides the ease and accuracy, was to keep their minds focused on what they were best able to do—chemistry—and not have them fussing with the machine, with which they were not experts. They were, moreover, in charge of the actual computing. I made it easy to do the bookkeeping and the mechanics of the computer, but I refused to relieve them of the thinking part.
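In modern dress the card-to-equations idea might be sketched like this; the species, reactions, and rate constants are invented. Each reaction is one record, and every equation's terms are generated from those records, so the same constant can never be typed inconsistently in different equations.

```python
# Generate the right-hand sides of the rate equations from one record per
# reaction, rather than typing each equation by hand.
import numpy as np

# one record per reaction: (reactant, product, rate constant) -- invented values
reactions = [("A", "B", 0.7), ("B", "C", 0.3), ("A", "C", 0.1)]
species = ["A", "B", "C"]
index = {s: i for i, s in enumerate(species)}

def derivatives(y):
    """Assemble dy/dt for every species from the reaction records."""
    dydt = np.zeros(len(species))
    for reactant, product, k in reactions:
        rate = k * y[index[reactant]]
        dydt[index[reactant]] -= rate      # the same k, used consistently
        dydt[index[product]] += rate
    return dydt

# crude fixed-step integration, just to show the generated equations in use
y = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    y += 0.01 * derivatives(y)
print(dict(zip(species, np.round(y, 4))))
```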
In summary, the reliability of a simulation, of which you will see many in your career, since it is becoming increasingly common, is of vital importance. It is not something you can take for granted just because a big machine gives out nicely printed sheets or displays nice, colorful pictures. You are responsible for your decisions, and you cannot blame them on those who do the simulations, much as you wish you could. Reliability is a central question with no easy answers.
The basic fact is the Nyquist sampling theorem says it takes two samples per cycle of the highest frequency present in the signal (with equally spaced samples on the entire real line) to reproduce (within roundoff) the original signal. In practice most signals have a fairly sharp cutoff in the frequency band; with no cutoff there would be infinite energy in the signal!
In practice we use only a comparatively few samples in the digital solution, and hence something like twice the number Nyquist requires is needed. Furthermore, usually we have samples on only one side, and this produces another factor of two. Hence something like seven to ten samples per cycle of the highest frequency is needed. And there is still a little aliasing of the higher frequencies into the band which is being treated (but this is seldom where the information in the signal lies). This can be checked both theoretically and experimentally.
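The experimental check of aliasing itself takes only a few lines; the two frequencies below are arbitrary, chosen to be mirror images about the Nyquist frequency.

```python
# A sine above the Nyquist frequency, sampled at unit spacing, is
# indistinguishable from its low-frequency alias.
import numpy as np

n = np.arange(50)                       # samples taken at unit spacing
high = np.sin(2 * np.pi * 0.9 * n)      # 0.9 cycles per sample: above Nyquist
low = np.sin(2 * np.pi * 0.1 * n)       # 0.1 cycles per sample: its alias
print(np.allclose(high, -low))          # identical samples (up to sign)
```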
Sometimes the mathematician can accurately estimate the frequency content of the signal (possibly from the answer being computed), but usually you have to go to the designers and get their best estimates. A competent designer should be able to deliver such estimates, and if they cannot then you need to do a lot of exploring of the solutions to estimate this critical number, the sampling rate of the digital solution. The step-by-step solution of a problem is actually sampling the function, and you can use adaptive methods of step-by-step solutions if you wish. You have much theory and some practice on your side.
For accuracy the digital machine can carry many digits, while analog machines are rarely better than one part in 10,000 per component, if that much. Thus analog machines cannot give very accurate answers, nor carry out “deep computations.” But often the situation you are simulating has uncertainties of a similar size, and with care you can handle the accuracy problem.
In spite of their relatively low accuracy analog computers are still valuable at times, especially when you can incorporate a part of the proposed device into the circuits so you do not have to find the proper mathematical description of it. Some of the faster analog computers can react to the change of a parameter, either in the initial conditions or in the equations themselves, and you can see on the screen the effect immediately. Thus you can get a “feel” for the problem easier than for the digital machines, which generally take more time per solution and must have a full mathematical description. Analog machines are generally ignored these days, so I feel I need to remind you they have a place in the arsenal of tools in the kit of the scientist and engineer.
I will continue the general trend of the last chapter, but center on the old expression “garbage in, garbage out.”
Because many simulations still involve differential equations, we begin by considering the simplest first-order differential equations, of the form y′ = f(x, y).
One such equation, whose slope depends only on x² + y², has the indicated direction field, Figure 20.2. On each of the concentric circles x² + y² = k the slope is always the same, the slope depending only on the value of k. These curves of constant slope are called isoclines.
Look at the following picture, Figure 20.3, showing the direction field of another differential equation. On the left you see a diverging direction field, which means small changes in the initial starting values, or small errors in the computing, will soon produce large differences in the values in the middle of the trajectory. But on the right-hand side the direction field is converging, meaning large differences in the middle will lead to small differences at the right end. In this single example you see both that small errors can become large ones and that large ones can become small ones; furthermore, small errors can become large and then again become small. Hence the accuracy of the solution depends on where along the trajectory you are talking about it, not on any absolute accuracy overall. The function behind all this is
whose differential equation is, upon differentiating,
How do we numerically solve a differential equation? Starting with only one first-order linear ordinary differential equation, we imagine the direction field. Our problem is, from the initial value we are given, to get to the next nearby point. If we take the local slope from the differential equation and move a small step forward along the tangent line, then we will make only a small error, Figure 20.4. Using that point we go to the next point, but as you see from the figure we gradually depart from the true curve because we are always using “the slope that was,” and not a typical slope in the interval. To avoid this we “predict” a value, use that value to evaluate the slope there (using the differential equation), and then use the average slope of the two ends as the slope for the interval, Figure 20.5. Then, using this average slope, we move the step forward again, this time using a “corrector” formula. If the predicted and corrected values are “close,” then we assume we are accurate enough; if they are far apart, then we must shorten the step size, and if the difference is very small, then we may increase the step size. Thus the traditional predictor-corrector methods have built into them an automatic mechanism for checking the step-by-step error—but this step-by-step error is, of course, not the whole accumulated error by any means! The accumulated error clearly depends on the convergence or divergence of the direction field.
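A bare-bones version of that predict, evaluate, correct loop, with the step-size test, might look as follows; the test equation y′ = −2y is an arbitrary choice with a known answer, and the tolerances are invented.

```python
# Predict with the local slope, correct with the average slope, and use the
# predictor-corrector disagreement to shrink or grow the step size.
import math

def f(x, y):
    return -2.0 * y                      # the direction field: y' = f(x, y)

def predictor_corrector(x, y, x_end, h=0.1, tol=1e-5):
    while x_end - x > 1e-12:
        h = min(h, x_end - x)
        predicted = y + h * f(x, y)                                # the slope that was
        corrected = y + h * 0.5 * (f(x, y) + f(x + h, predicted))  # average slope
        gap = abs(corrected - predicted)
        if gap > tol:
            h *= 0.5                     # too far apart: shorten the step and retry
            continue
        x, y = x + h, corrected
        if gap < tol / 50:
            h *= 2.0                     # difference very small: lengthen the step
    return y

approx = predictor_corrector(0.0, 1.0, 1.0)
print(approx, math.exp(-2.0))            # compare with the exact solution e^(-2x)
```

The corrector line is the recursive-filter style feedback discussed earlier: the new output is built from an old output and the slopes.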
We used simple straight lines for both predicting and correcting. It is much more economical, and accurate, to use higher-degree polynomials, and typically this means about fourth-degree polynomials (Milne, Adams-Bashforth, Hamming, etc.). Thus we must use several old values of the function and derivative to predict the next value, and then using this in the differential equation we get an estimated new slope, and with this slope and old values of the function and slope, we correct the value. A moment’s thought and you see the corrector is just a recursive digital filter, where the input data are the derivatives and the output values are the positions. Stability and all we discussed there are relevant. As mentioned before, there is the extra feedback through the differential equation’s predicted value, which goes into the corrected slope. But both are simply solving a difference equation—recursive digital filters are simply this formula and nothing more. They are not just transfer functions, as your course in digital filters might have made you think; plainly and simply, you are computing numbers coming from a difference equation. There is a difference, however. In the filter you are strictly processing by a linear formula, but because in the differential equation there is the nonlinearity which arises from the evaluation of the derivative terms, it is not exactly the same as a digital filter.
Now let me note a significant difference between the two approaches, numerical analysis and filter theory. The classical methods of numerical analysis, still about the only ones you will find in the accepted texts, use polynomials to approximate functions, but the recursive filter uses frequencies as the basis for evaluating the formula. This is a different thing entirely!
To see this difference, suppose we are to build a simulator for humans landing on Mars. The classical formulas will concentrate on the trajectory shape in terms of local polynomials, and the path will have small discontinuities in the acceleration as we move from interval to interval. In the frequency approach we will concentrate on getting the frequencies right and let the actual positions be what happen. Ideally the trajectories are the same; practically they can be quite different.
Which solution do you want? The more you think about it, the more you realize the pilot in the trainer will want to get the “feel” of the landing vehicle, and this seems to mean the frequency response of the simulator should feel right to the pilot. If the position is a bit off, then the feedback control during landing can compensate for this, but if it feels wrong in the actual flight, then the pilot is going to be bothered by the new experience which was not in the simulator. It has always seemed to me the simulator should prepare the pilots for the actual experience as best we can (we cannot fake out for long the lower gravity of Mars) so they will feel comfortable when the real event occurs, having experienced it many times in the trainer. Alas, we know far too little of what the pilot “feels” (senses). Does the pilot feel only the Fourier real frequencies, or maybe they also feel the decaying Laplace complex frequencies (or should we use wavelets?). Do different pilots feel the same kinds of things? We need to know more than we apparently now do about this important design criterion.
The above is the standard conflict between the mathematician’s and the engineer’s approaches. Each has a different aim in solving the differential equations (and in many other problems), and hence they get different results out of their calculations. If you are involved in a simulation, then you see there can be highly concealed matters which are important in practice but which the mathematicians are unaware of, and they will deny the effects matter. But looking at the two trajectories I have crudely drawn, Figure 20.6, the top curve is accurate in position but the corners will give a very different “feel” than reality will, and the second curve will be more wrong in position but more right in “feel.” Again, you see why I believe the person with the insight into the problem must get deep inside the solution methods and not accept traditional methods of solution.
I found my friend back at the Labs wandering around the halls looking quite unhappy. Why? Because the first two of some six test shots had broken up in mid-flight and no one knew why. The delay meant the data to be gathered to enable us to go to the next stage of design were not available, and hence the whole project was in serious trouble. I observed to him that if he would give me the differential equations describing the flight I would put a girl on the job of hand calculating the solution (big computers were not readily available in the late 1940s). In about a week they delivered seven first-order equations, and the girl was ready to start. But what are the starting conditions just before the trouble arose? (I did not in those days have the computing capacity to do the whole trajectory rapidly.) They did not know! The telemetered data was not clear just before the failure. I was not surprised, and it did not bother me much. So we used the guessed altitude, slope, velocity, angle of attack, etc., one for each of the seven variables of the trajectory; one condition for each equation. Thus I had garbage in. But I had earlier realized the nature of the field trials being simulated was such that small deviations from the proposed trajectory would be corrected automatically by the guidance system! I was dealing with a strongly convergent direction field.
We found both pitch and yaw were stable, but as each one settled down it threw more energy into the other; thus there were not only the traditional stability oscillations in pitch and yaw, but due to the rotation of the missile about its long axis there was also a periodic transfer of increasing energy between them. Once the computer curves for even a short length of the trajectory were shown, everyone realized immediately that they had forgotten the cross-connection stability, and they knew how to correct it. Now that we had the solution they could then also read the hashed-up telemetered data from the trials and check that the period of the transfer of energy was just about correct—meaning they had supplied the correct differential equations to be computed. I had little to do except to keep the girl on the desk calculator honest and on the job. My real contribution was: (1) the realization that we could simulate what had happened, which is now routine in all accidents but was novel then, and (2) the recognition that there was a convergent direction field so the initial conditions need not be known accurately.
At the end of the war I stayed on at Los Alamos an extra six months, and one of the reasons was I wanted to know how it was that such inaccurate data could have led to such accurate predictions for the final design. With, at last, time to think for long periods, I found the answer. In the middle of the computations we were using effectively second differences; the first differences gave the forces on each shell on one side, and the differences from the adjacent shells on the two sides gave the resultant force moving the shell. We had to take thin shells, hence we were differencing numbers which were very close to each other, and hence the need for many digits in the numbers. But further examination showed that as the “gadget” goes off, any one shell went up the curve and possibly at least partly down again, so any local error in the equation of state was approximately averaged out over its history. What was important to get from the equation of state was the curvature, and as already noted even it had only to be on the average correct. Hence garbage in, but accurate results out nevertheless!
These examples show what was loosely stated before: if there is feedback in the problem for the numbers used, then they need not necessarily be accurately known. Just as in H.S. Black’s great insight of how to build feedback amplifiers, Figure 20.8, so long as the gain is very high, only the one resistor in the feedback loop need be accurately chosen; all the other parts could be of low accuracy. From Figure 20.8 you have the equation
We see almost all the uncertainty is in the one resistor of size 1/10, and the gain of the amplifier (−10⁹) need not be accurate. Thus the feedback of H.S. Black allows us to accurately build things out of mostly inaccurate parts.
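Since the equation from Figure 20.8 is not reproduced here, a minimal sketch of Black's feedback relation, assuming an ideal loop with open-loop gain of magnitude A = 10⁹ and feedback fraction β = 1/10 as quoted above, shows why only the feedback resistor needs to be accurate:

\[
G \;=\; \frac{A}{1 + A\beta} \;\approx\; \frac{1}{\beta} \;=\; 10,
\qquad
\frac{dG}{G} \;=\; \frac{1}{1 + A\beta}\,\frac{dA}{A} \;\approx\; 10^{-8}\,\frac{dA}{A}.
\]

Under these assumptions a 10% error in the amplifier gain changes the closed-loop gain by about one part in a billion, while a 10% error in the feedback resistor changes it by very nearly 10%.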
You see now why I cannot give you a nice, neat formula for all situations: it must depend on how the particular quantities go through the whole of the computation; the computation must be understood as a whole. Do the inaccurate numbers go through a feedback situation where their errors will be compensated for, or are they vitally out in the open with no feedback protection? I say “vitally” because, if they are not in some feedback position, then it is vital to the computation to get them accurate.
Now this fact, once understood, impacts design! Good design protects you from the need for too many highly accurate components in the system. But such design principles are still, to this date, ill understood and need to be researched extensively. Not that good designers do not understand this intuitively, merely it is not easily incorporated into the design methods you were taught in school. Good minds are still needed in spite of all the computing tools we have developed. But the best mind will be the one who gets the principle into the design methods taught so it will be automatically available for lesser minds!
I now look at another example, and the principle which enabled me to get a solution to an important problem. I was given the differential equation
You see immediately the condition at infinity is really the right-hand side of the differential equation equated to 0, Figure 20.9.
But consider the stability. If the y at any fairly far-out point x gets a bit too large, then sinh y is much too large, the second derivative is then very positive, and the curve shoots off to plus infinity. Similarly, if the y is too small the curve shoots off to minus infinity. And it does not matter which way you go, left to right or right to left. In the past I had used the obvious trick when facing a divergent direction field of simply integrating in the opposite direction and getting an accurate solution. But in the above problem you are, as it were, walking the crest of a sand dune, and once both feet are on one side of the crest you are bound to slip down.
You can probably believe that while I could find a decent power series expansion, and an even better non-power series approximate expansion around the origin, still I would be in trouble as I got fairly well along the solution curve, especially for large k. All the analysis I, or my friends, could produce was inadequate. So I went to the proposers and first objected to the condition at infinity, but it turned out the distance was being measured in molecular layers, and (in those days) any realistic transistor would have effectively an infinite number of layers. I objected then to the equation itself; how could it represent reality? They won again, so I had to retreat to my office and think.
It was an important problem in the design and understanding of the transistors then being developed. I had always claimed if the problem was important and properly posed, then I could get some kind of a solution. Therefore, I must find the solution; I had no escape if I were to hold onto my pride.
It took some days of mulling it over before I realized the very instability was the clue to the method to use. I would track a piece of the solution, using the differential analyzer I had at the time, and if the solution shot up then I was a bit too high in my guess at the corresponding slope, and if it shot down I was a bit too low. Thus piece by piece I walked the crest of the dune, and each time the solution slipped on one side or the other I knew what to do to get back on the track. Yes, having some pride in your ability to deliver what is needed is a great help in getting important results under difficult conditions. It would have been so easy to dismiss the problem as insoluble, wrongly posed, or any other excuse you wanted to tell yourself, but I still believe important problems properly posed can be used to extract some useful knowledge which is needed. A number of space charge problems I have computed showed the same difficult instability in either direction.
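Here is a minimal Python sketch of the “walking the crest” idea. Since the differential equation itself is not reproduced above, I assume a representative form y″ = k·sinh(y), consistent with the surrounding discussion, with y given at the origin and required to die away at infinity; the step size, cutoff, and bracketing slopes are all illustrative. Each trial integration eventually slips off to plus or minus infinity, and the sign of the slip tells us how to adjust the guessed initial slope.

```python
import math

def shoot(k, y0, slope, x_max=20.0, h=1e-3, blow_up=50.0):
    """Integrate y'' = k*sinh(y) from x = 0 with y(0) = y0, y'(0) = slope.
    Return +1 if the solution blows up toward +infinity, -1 toward
    -infinity, and 0 if it survives to x_max (stays on the crest)."""
    y, yp, x = y0, slope, 0.0
    while x < x_max:
        yp += h * k * math.sinh(y)    # y'' = k sinh(y)
        y += h * yp
        x += h
        if y > blow_up:
            return +1
        if y < -blow_up:
            return -1
    return 0

def walk_the_crest(k=1.0, y0=1.0, lo=-10.0, hi=0.0, iters=60):
    """Bisect on the initial slope: shooting up means the guess was a bit
    too high, shooting down means it was a bit too low."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        s = shoot(k, y0, mid)
        if s > 0:
            hi = mid      # guessed slope too high
        elif s < 0:
            lo = mid      # guessed slope too low
        else:
            return mid    # stayed on the crest as far as we integrated
    return 0.5 * (lo + hi)

print(walk_the_crest())
```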
Now to the next story. A psychologist friend at Bell Telephone Laboratories once built a machine with about 12 switches and a red and a green light. You set the switches, pushed a button, and either you got a red or a green light. After the first person tried it 20 times they wrote a theory of how to make the green light come on. The theory was given to the next victim and they had their 20 tries and wrote their theory, and so on endlessly. The stated purpose of the test was to study how theories evolved.
But my friend, being the kind of person he was, had connected the lights to a random source! One day he observed to me that no person in all the tests (and they were all high-class Bell Telephone Laboratories scientists) ever said there was no message. I promptly observed to him that not one of them was either a statistician or an information theorist, the two classes of people who are intimately familiar with randomness. A check revealed I was right!
This is a sad commentary on your education. You are lovingly taught how one theory was displaced by another, but you are seldom taught to replace a nice theory with nothing but randomness! And this is what was needed: the ability to say the theory you just read is no good and that there is no definite pattern in the data, only randomness.
I must dwell on this point. Statisticians regularly ask themselves, “Is what I am seeing really there, or is it merely random noise?” They have tests to try to answer these questions. Their answer is not a simple yes or no, but a yes or no held with some stated confidence. A 90% confidence limit means typically in ten tries you will make the wrong decision about once, if all the other hypotheses are correct! Either you will reject something that is true (type 1 error) or you will fail to reject something that is false (type 2 error). Much more data is needed to get to the 95% confidence limit, and these days data can often be very expensive to gather. Getting more data is also time consuming, so the decision is further delayed—a favorite trick of people in charge who do not want to bear the responsibility of their position. “Get more data,” they say.
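As a small illustration of the statistician's question, here is a sketch of the sort of test one could apply to the red and green light data of the story above: how likely is the observed excess of green lights if the machine is purely random? The counts are invented for illustration.

```python
from math import comb

def p_value_at_least(k, n, p=0.5):
    """Probability of seeing k or more 'green' outcomes in n independent
    trials if the lights are purely random (success probability p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Invented data: 14 green lights in 20 tries.
greens, tries = 14, 20
p = p_value_at_least(greens, tries)
print(f"P(>= {greens} greens out of {tries} | pure chance) = {p:.3f}")
# p is about 0.058: not significant at the 5% level, and with only 20
# tries per subject it is easy to "see" a pattern that is really noise.
```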
How is the outsider to distinguish this from a Rorschach test? Did he merely find what he wanted to find, or did he get at “reality”? Regrettably, many, many simulations have a large element of this, adjusting things to get what is wanted. It is so easy a path to follow. It is for this reason traditional science has a large number of safeguards, which these days are often simply ignored.
Do you think you can do things safely, that you know better? Consider the famous double-blind experiments which are usual in medical practice. The doctors first found that if the patients thought they were getting the new treatment they responded with better health, while those who thought they were part of the control group felt they were not getting it and did not improve. The doctors then randomized the treatment and gave some patients a placebo, so the patients could not tell which they were getting and fool the doctors in this way. But to their horror it turned out that the doctors, knowing who got the treatment and who did not, also found improvement where they expected to and not where they did not. As a last resort, the doctors have widely accepted the double-blind experiment—until all the data are in, neither the patients nor the doctors know who gets the treatment and who does not. Then the statistician opens the sealed envelope and the analysis is carried out. The doctors, wanting to be honest, found they could not be! Are you so much better in doing a simulation that you can be trusted not to find what you want to find? Self-delusion is a very common trait of humans.
I started Chapter 19 with the problem of why anyone should believe in a simulation which has been done. You now see the problem more clearly. It is not easy to answer unless you have taken a lot more precautions than are usually done. Remember also that you are probably going to be on the receiving end of many simulations to decide many questions which will arise in your highly technical future; there is no other way than simulations to answer the question, “What if…?” In Chapter 18 I observed decisions must be made and not postponed forever if the organization is not to flounder and drift endlessly—and I am supposing you are going to be among those who must make the choices. Simulation is essential to answer the “what if…?,” but it is full of dangers and is not to be trusted just because a large machine and much time has been used to get the nicely printed pages or colorful pictures on the oscilloscope. If you are the one to make the final decision, then in a real sense you are responsible. Committee decisions, which tend to diffuse responsibility, are seldom the best in practice—most of the time they represent a compromise which has none of the virtues of any path and tends to end in mediocrity. Experience has taught me that generally a decisive boss is better than a waffling one—you know where you stand and can get on with the work which needs to be done!
The “what if…?” will arise often in your futures, hence the need for you to master the concepts and possibilities of simulations, and be ready to question the results and to dig into the details when necessary.
One of the reasons for taking up the topic of fiber optics is that its significant history occurred within my scientific lifetime, and I can therefore give you a report of how the topic looked to me at the time it was occurring. Thus it provides an illustration of the style I adopted when facing a newly developing field of great potential importance. The field of fiber optics is also, of course, important in its own right. Finally, it is a topic you will have to deal with as it further evolves during your lifetime.
When I first heard of a seminar on the topic of fiber optics at Bell Telephone Laboratories, I considered whether I should attend or not—after all, one must try to do one’s own work and not spend all one’s time in lectures. First, I reflected that optical frequencies were very much higher than the electrical ones in use at the time, and hence the fiber optics would have much greater bandwidth—and bandwidth is the effective rate (bits per second) of transmission, and is the name of the game for the telephone company, my employers at the time. Second, I recalled that Alexander Graham Bell had, long before, shown that speech could be carried on a beam of light, so the idea itself was far from new.
During the early part of the talk the speaker remarked, “God loved sand, He made so much of it.” I reflected to myself that we were already having to exploit lower-grade copper mines, and could only expect an increasing cost for good copper as the years went by, but the material for glass is widely available and is not likely ever to be in short supply.
Either at the lecture or soon afterwards I heard the observation, “The telephone wire ducts in Manhattan are running out of space, and if the city continues to grow, as it has of late, then we will have to lay a lot more ducts, and this means digging up streets and sidewalks, but if we use glass fibers with their smaller diameters, then we can pull out the copper wires and put the glass fibers in their place.” This told me for that reason alone the Labs would have to do everything they could to develop glass fibers rapidly, that it was going to be an ongoing source of computation problems, and hence I had better keep myself abreast of developments.
Long before this, once I had decided to stay at the Labs and realized my poverty in the knowledge of practical electronics, I bought a couple of Heathkits and assembled them just for the experience, though the resulting objects were also useful. I knew, therefore, the amount of soldering of wires that went on, and immediately identified a difficult point to watch for—how did they propose to splice these fine, hair-sized, glass fibers and still have good transmission? You could not simply fuse them together and expect to get decent transmission.
Why such small diameters as they were proposing? It is obvious once you look at a picture of how a glass fiber works, Figure 21.2. The thinner the fiber, the more it can bend without letting the light get out. That is one good reason for the smaller and smaller proposed diameters, and it is neither the cost of the material nor the extra weight of larger-diameter fibers. Also, for many forms of transmission, a smaller-diameter fiber will clearly have less distortion in the signal when going a given distance.
A trouble which soon arose, and I had anticipated it, was that the outer sheathing put on the fine hair-sized fibers might alter the local index of refraction ratios and let some of the light escape. Of course, putting a mirrored surface on the fiber would solve it. They soon had the idea of putting a lower-index glass sleeve around the higher-index core, at human sizes, where it is easily done, and then drawing out the resulting shape into the very thin fibers they needed.
I did not try to follow all the arguments for the multi-mode vs. the single-mode methods of signaling—and while I did a number of simulations via computers for the two sides of the debate, I sort of backed the single mode on the same grounds that we had backed the binary against any higher-base number systems in computers. It is a technical detail anyway, including the details of detectors and emitters, and not a fundamental feature of the optical signaling.
Along the way I was constantly watching to see how they were going to splice the fibers. With the passage of time there were a number of quite clever ways proposed and tested, and the very number of alternates made me decide that probably that feature which first attracted my attention would be handled fairly easily—at least the problem would not prove to be fatal in the field, where it has to be done by technicians, and not in the labs, where things can be done by experts under controlled conditions. I well knew the difference by watching various projects (mostly in other companies) come to grief on the miserable fact that what can be done reliably in the lab by experts is not always the same as what can be done in the field by technicians who are in a hurry and are often operating under adverse conditions, to say the least.
As I recall they first field tested fiber optics by connecting a pair of central offices in Atlanta, Georgia. It was a success (the trial required some years to complete). Furthermore, outsiders from the glass business began to make glasses which were remarkably clear at the frequencies we wanted to use—meaning the frequencies at which we had reliable lasers. They said if the ocean waters were as clear as some of the glasses, then you could see to the bottom of the Pacific Ocean!
All the practical parts seemed to be coming together remarkably well, and as you know we now use fiber optics widely. I have told you as best I can how I approached a new technology, what I looked for, what I watched for, what I ignored, what I kept abreast of, and what I pondered. I had no desire to become an expert in the field; I had my hands full with computers and their rapid development, both hardware and software, as well as the expanding range of applications. Every new field which arises in your future will present you with similar questions, and you will effectively answer by your later actions.
The present applications of fiber optics are very widespread. I had long realized as time went on the satellite business was in for trouble. Stationary satellites for communication must be parked above the equator; there is no other place for them. A number of the countries along the equator have, from the earliest days, claimed we were invading their airspace and should be paying for the use of it. So far they have not been able to enforce their claims, as the advanced countries have simply continued to use the space without paying for it. I leave to you the justice of the situation: (1) the blatant ignoring of their claims, (2) whether or not they have a legitimate point, and (3) whether, because they are unable to use the space themselves, everyone else must wait until they can—if ever! It is not a trivial question of international relations, and there is some merit on all sides.
The satellites are now parked at about every 4° or so, and while we could park them closer, say 2°, we would have to use much more accurate (larger-diameter?) dishes on Earth to beam signals up to them without one signal slopping into the adjacent satellites. To a fair extent we can widen the bandwidth of the signaling and thus for a time extend the amount of traffic they can carry, but there are limits due to the atmosphere the signals must traverse. On the other hand, fiber optics can be laid down on Earth with any density you wish; cables of fibers can be easily made, and the total possible bandwidth boggles the mind. The use of satellites means broadcasting the signal—cables give a degree of privacy and the ability to make the user pay rather than get a free ride. Both satellites and cables have their advantages and disadvantages. At present satellites are frequently being used for what are essentially private communications and not broadcast situations. Time will probably readjust the matter so each is used in its best way.
Where are we now? We have already seen transoceanic cables with fibers instead of coaxial waveguides at a great deal less cost and a great deal more bandwidth. We are at the moment (1993) haggling over whether to use the most recently developed soliton signaling system or the classical pulse system of communicating across the Pacific Ocean to Japan. It is, I think, a matter of engineering development—in the long run I believe solitons will be the dominant method, and not pulses. I advise you to watch to see if there is a significant change in the technology—certainly if the transmission of information via solitons wins out over the current pulse signaling method, then this should produce basically new methods of signal analysis in the future, and you had best keep abreast of it if it happens, or else you, like so many other people, will be left behind.
I read that in the Navy, as well as in the obvious Air Force and commercial aviation applications, the decreased weight means great savings, which can be used for other things. On a tour of the carrier Enterprise some 14 years ago, being even then well aware of the trend to optical fibers, I looked especially at the duct wiring and decided fibers will replace all those wires insofar as they are information-handling wires. For the distribution of power it is another matter entirely. But then, will centralized power distribution remain the main method, or will, due to battle conditions, a decentralized power system aboard a ship become the preferred method? It would better blend in with the obviously redundant fiber-optic systems which will undoubtedly be installed as a matter of safety practice. And battleships are not very different from World Trade-type skyscraper office buildings!
We now have fiber-optic cables which are sufficiently armored that trucks can run over them safely, and fibers so light that missiles are fired with an unreeling fiber attached throughout the flight—and this means two-way communication, both to direct the missile to the target and to get back what the missile can see as it flies.
Being in computers, I naturally asked myself how this could and would impact the design of computers. You probably know we now (1993) often interconnect the larger units of a computer with fiber optics. It seems only a matter of time before major parts of internal wiring will go optical. Cannot one make, in time, “motherboards” by which the integrated circuit chips are interconnected using fiber optics? It does not seem to be unreasonable in this day of the material sciences. How soon will fiber-optic techniques get down to the chips? After all, the bandwidth of optics means, inferentially, higher pulse rates! Can we not in time make optical chips, and have a general light source falling on a photocell on the chip (like some handheld calculators) to power the chip, and avoid all the wiring of power distribution to the chips (Figure 21.4)?
Can we replace chip wiring with light beams? Light beams can pass through one another without interference (provided the intensity is not too high), which is more than you can do with wires, Figure 21.5.
This brings up switching. Can crossbar switches be made to be optical and not electronic? Would not the Bell Telephone Laboratories and others have to work on it intensively? If they succeed, then will it not be true that switching, which has traditionally been one of the most expensive parts of a computer, will become perhaps one of the cheapest? At first memory was the expensive part of computers, but with magnetic cores, and now with electronic storage at fantastically cheap prices, the design and use of computers has significantly changed. If a major drop in switching costs came about, how would you design a computer? Would the von Neumann basic design survive at all? What would be the appropriate computer designs with this new cost structure? You can try, as I indicated above, to keep reasonably abreast by actively anticipating the way things and ideas might go, and then seeing what actually happens. Your anticipation means you are far, far better prepared to absorb the new things when they arise than if you sit passively by and merely follow progress. “Luck favors the prepared mind.”
That is the reason for this talk—to show you how someone tried to anticipate and be prepared for rapid changes in technologies which would impact their research and work. You cannot lead everywhere in this highly technological society, but you need not be left behind by every new development—as many people are in practice.
I have said again and again in this book, my duty as a professor is to increase the probability that you will be a significant contributor to our society, and I can think of no better way than establishing in you the habit of anticipating things and leading rather than passively following. It seems to me I must, to accomplish my duty to you and to the institution, move as many of you as I can from a passive to a more active, anticipating role.
In today’s chapter you see I claim to have made no significant contribution, but at least I was prepared to help others who were more deeply involved by supplying the right kinds of computing rather than slightly misconceived computations, which are so often done. I believe I often supplied that kind of service at Bell Telephone Laboratories during the 30 years I spent there before my retirement. In the area of fiber optics I have told you some of the details of what I did and how I did it.
Let me now turn to predictions of the immediate future. It is fairly clear that in time “drop lines” from the street to the house (they may actually be buried, but will probably still be called “drop lines”!) will be fiber optics. Once a fiber-optic line is installed, then potentially you have available almost all the information you could possibly want, including TV and radio, and possibly newspaper articles selected according to your interest profile (you pay for the printing, which occurs in your own house). There would be no need for separate information channels most of the time. At your end of the fiber there are one or more digital filters. Which channel you want, the phone, radio, or TV, can be selected by you much as you do now, and the channel is determined by the numbers put into the digital filter—thus the same filter can be multipurpose, if you wish. You will need one filter for each channel you wish to use at the same time (though it is possible a single time-sharing filter would be available), and each filter would be of the same standard design. Alternately, the filters may come with the particular equipment you buy.
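As a sketch of what “the channel is determined by the numbers put into the digital filter” could look like, here is a simple two-pole recursive resonator in Python whose center frequency is fixed entirely by its two coefficients; the same standard structure is pointed at a different channel just by loading different numbers. The sampling rate, pole radius, and channel frequencies are illustrative values of mine.

```python
import math

def resonator_coefficients(center_hz, sample_rate_hz, r=0.99):
    """Coefficients of a two-pole recursive (IIR) resonator
    y[n] = x[n] + a1*y[n-1] + a2*y[n-2].  The pole angle sets the
    center frequency; only these numbers distinguish one 'channel'
    from another."""
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    return 2.0 * r * math.cos(w0), -r * r

def run_filter(x, a1, a2):
    """Run the difference equation over the input samples x."""
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = sample + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# The same standard filter structure tuned to two different "channels":
fs = 48_000.0
a1_lo, a2_lo = resonator_coefficients(1_000.0, fs)   # pick out a 1 kHz channel
a1_hi, a2_hi = resonator_coefficients(5_000.0, fs)   # or a 5 kHz channel

# Filtering a unit impulse shows the ringing at the selected frequency.
impulse = [1.0] + [0.0] * 63
ring_lo = run_filter(impulse, a1_lo, a2_lo)
```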
But will this happen? It is necessary to examine political, economic, and social conditions before saying what is technologically possible will in fact happen. Is it likely the government will want to have so much information distribution in the hands of a single company? Would the present cable companies be willing to share with the telephone company and possibly lose some profit thereby, and certainly come under more government regulation? Indeed, do we as a society want it to happen?
One of the recurring themes in this book is that frequently what is technologically feasible, and is even economically better, is restrained by legal, social, and economic conditions. Just because it can be done economically does not mean it should be done. If you do not get a firm grasp on these aspects, then as a practicing seer of what is going to happen in your area of specialization you will make a lot of false predictions you will have to explain as best you can when they turn out to be wrong.
Because computers were early installed in many universities, it was natural the question of computer-aided instruction (CAI) would arise and be explored in some depth. Before we get to the modern claims, it is wise to get some perspective on the matter.
There is a story from ancient Greek times of a mathematician telling a ruler there were royal roads for him to walk on, and royal messengers to carry his mail, but there was no royal road to geometry. Similarly, you will recognize that money and coaching will do only a little for you if you want to run a four-minute mile. There is no easy way for you to do it. The four-minute mile is much the same for everyone.
There is a long history of people wanting an easy path to learning. Aldous Huxley, in Brave New World, imagined children learning painlessly from recordings played to them while they slept.
Hence all of past history, with its many, many claims of easy learning, speaks eloquently against the current rash of promises, but it cannot, of course, prove some new gimmick will not succeed. You need to take a large grain of salt with every such proposal—but there could be new things the past did not know, and new tools like the cheap computers now available, which were not available then, which could make the difference. Regularly I read or hear that I am supposed to believe that the new gimmick, typically these days the computer, will make a significant difference, in spite of all past promises which have apparently failed miserably. Beware of the power of wishful thinking on your part—you would like it to be true, so you assume it is true!
Let me turn to some of the past history of the use of computers to greatly assist in learning. I recall in 1960, while I was at Stanford on a sabbatical, there was a “grader program.” Any problem the professor wanted to assign to the class in a programming course required the professor to give a correct running program to solve it, the names of the input variables, the ranges in which the input numbers could occur, and also a limit for the roundoff of the output numbers to be acceptable. When the students felt their program was ready for submission, they called the grader, gave their identification, and the machine generated some random admissible input, ran both their and the professor’s program, and compared the results. Each output number was right or wrong. Such a grader can easily incorporate the time of compiling and the time of running, which are mere numbers, and still be required to make no judgment on style.
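Here is a hypothetical reconstruction, in Python, of the kind of grader just described. The way a “program” is represented, the function names, and the sample assignment are stand-ins of mine; the original ran on a 1960 Stanford system whose details are not given.

```python
import random

def grade(student_program, reference_program, input_ranges, tolerance, trials=10):
    """Compare a student's program against the professor's reference on
    randomly generated admissible inputs.  Both 'programs' are modeled
    here as Python callables taking a list of numbers and returning a
    list of numbers; each output is simply right or wrong within the
    allowed roundoff, with no judgment of style."""
    results = []
    for _ in range(trials):
        inputs = [random.uniform(lo, hi) for lo, hi in input_ranges]
        expected = reference_program(inputs)
        got = student_program(inputs)
        ok = len(got) == len(expected) and all(
            abs(g - e) <= tolerance for g, e in zip(got, expected))
        results.append((inputs, ok))
    return results

# Illustrative assignment: compute the mean and range of three numbers.
def reference(v):
    return [sum(v) / len(v), max(v) - min(v)]

def student(v):                      # a student's attempt
    return [sum(v) / 3.0, max(v) - min(v)]

report = grade(student, reference, [(0.0, 100.0)] * 3, tolerance=1e-6)
print(sum(ok for _, ok in report), "of", len(report), "random trials passed")
```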
The method is flexible, easily adapted to changes in the course and in the specific exercises assigned from year to year. The program keeps a record in a private database for the professor and on demand gives him the raw facts, leaving any evaluation to the professor. Of course class averages, variances, distributions of grades, etc. can all be supplied to the professor from his database, if wanted.
When I visited Stanford a couple of years later I asked about the grader program. I found it was not in use. Why? Because, so they said, the first professor who had got it going left, and a change had been made in the monitor system that would require a few changes in the program! Diligent watching and asking shows this is very typical on many campuses. The machine is programmed to greatly assist, apparently, the professor, but the program is soon forgotten.
Let me turn to the project PLATO done by a friend of mine at the University of Illinois. I regularly met him at various meetings, and once on a long airplane ride, and every time he told me how wonderful PLATO was. For example, once he said that at the same time PLATO had a pupil from Scotland, one from Canada, and one from Kentucky. I said I knew the telephone company could do that, and what he was saying was totally irrelevant to whether or not PLATO was doing a better job than humans did. He never, to my knowledge, produced any serious evidence that PLATO did improve teaching in a significant fashion—above what you would expect from the Hawthorne effect.
One claim made was that the student was advanced about 10% along the education path over those who did not use the system. When I inquired as to whether this meant it was the same 10% shift all through the educational system, or whether he meant 10% on each course, compound interest as it were, he did not know! What had he done about the Hawthorne effect? Nothing! So I do not know what was or was not accomplished after spending the millions and millions of dollars of federal money.
Once when I was the chief editor of ACM Publications, a programmed book on computing was submitted for publication. A programmed book regularly asks questions of the reader and then, depending on the response, the reader is sent to one of several branch points (pages). In principle the errors are caught and explained again, and correct answers send the reader on to new material. Sounds good! Each student goes at their own pace. But consider that there can be no backtracking to find something you read a few pages ago and are now a bit fuzzy about, and no tracing of where you came from or how you got here. There can be no organized browsing through the text. It really is not a book, though from the outside it looks like one. Another terrible fact is that careful watching of students in practice has shown a good student often picks what they know is the wrong answer, simply out of boredom or amusement, to see what the book will say. Hence it does not always work out as it was thought it would; the better students do not necessarily progress significantly faster than the poorer ones!
I did not want to reject programmed books on my own opinion, so I went to the Bell Telephone Laboratories’ psychology department and found the local expert. Among other things he said was that there was to be a large conference on programmed books the following week, and why not go? So I did. On the opening day we sat next to each other. He nudged me and said, “Notice no one will ever produce any concrete evidence, they will only make claims that programmed texts are better.” He was exactly right—no speaker had anything to offer in the form of hard, experimental evidence, only their opinions. I rejected the book, and in hindsight I think I did the right thing. We now have computer discs which claim to do the same thing, but I have little reason to suspect the disc format makes a significant difference, though they could backtrack through the path you used to get there.
I have just given some of the negative side of CAI. Now to the positive side. I have little doubt that in teaching dull arithmetic, say the addition and multiplication tables, a machine can do a better job than a teacher, once you incorporate the simplest program to note the errors and generate more examples covering that point, such as multiplying by seven, until the point is mastered. For such rote learning I doubt any of you would differ from my opinion. Unfortunately, in the future we can expect corporations and other large organizations will have removed much of the need for just such rote learning (computers can often do it better and cheaper), and employment will usually require judgment on your part.
We now turn to airplane pilot training in the current trainers. They again do a better job, by far, than can any real-life experience, and generally the pilots have fairly little other human interactive training during the course. Flying, to a fair extent, I point out, is a conditioned response being trained into the human. It is not much thinking, though at times thinking is necessary; it is more training to react rapidly and correctly, both mentally and physically, to unforeseen emergencies.
It seems to me that for this sort of training, where there is a conditioned response to be learned, machines can do a very good job. It happens that as a child I learned fencing. In a duel there is no time for local thinking; you must make a rapid conditioned response. There is indeed large-scale planning of a duel as a whole, but moment to moment it must be a response which does not involve the delay of thinking.
When I first came to the Naval Postgraduate School in 1976 there was a nice dean of the extension division concerned with education. In some hot discussions on education we differed. One day I came into his office and said I was teaching a weightlifting class (which he knew I was not). I went on to say that graduation was lifting 250 pounds, and I had found many students got discouraged and dropped out, some repeated the course, and a very few graduated. I went on to say that, thinking this over the night before, I had decided the problem could be easily cured by simply cutting the weights in half—the student, in order to graduate, would lift 125 pounds, set them down, and then lift the other 125 pounds, thus lifting the 250 pounds.
I waited a moment while he smiled (as you probably have), and I then observed that when I found a simpler proof for a theorem in mathematics and used it in class, was I or was I not cutting the weights in half? What is your answer? Is there not an element of truth in the observation that the easier we make the learning for the student, the more we are cutting the weights in half? Do not jump to the conclusion that I am saying poor chapters should be given so that the students must work harder. But a lot of evidence on what enabled people to make big contributions points to the conclusion that often the famous professor was a terrible lecturer, and the students had to work hard to learn the material for themselves! I again suggest a rule:
What you learn from others you can use to follow;
What you learn for yourself you can use to lead.
To get closer to the problem, to what extent is it proper to compare physical muscles with “mental muscles”? Probably they are not exactly equivalent, but how far is it a reasonable analogy? I leave it to you to think over.
Another argument I had with this same dean was about his belief that the students should be allowed to take the extension courses which were under his wing at their own pace; I argued that the speed in learning was a significant matter to organizations—rapid learners were much more valuable than were slow learners (other things being the same); it was part of our job to increase the speed of learning and mark for society those who were the better ones. Again, this is opinion, but surely you do not want very slow learners to be in charge of you. Speed in learning new things is not everything, to be sure, but it seems to me it is an important element.
The fundamental trouble in assessing the value of CAI is we are not prepared to say what the educated person is, nor how we now accomplish it (if we do!). We can say what we do, but that is not the same as what we should be doing. Hence I can only give more anecdotes.
Let me tell you another story about the transfer of training, as it is called—the use of ideas from one place to another. During the very early part of WWII I was teaching a calculus course at an engineering school in Louisville. The students were having trouble in a course in thermodynamics taught by the dean of engineering, who was an ex-submarine commander and who scared the students. With the dean’s permission I visited a class to see what was happening. He put on the board, at one point,
and asked what it was, and no student knew. The very next hour in my class across the hall I wrote
and they all knew immediately it was ln x plus a constant. When I wrote
they again knew. “Why,” said I, “did you not respond with that in the dean’s class last hour?” The fact is, what they knew in one class at one hour with one professor did not transfer to another hour in a room across the hall with another professor. Sounds strange, but that is what is known as the transfer of training—the ability to use the same ideas in a new situation. Transfer of training was a large part of my contribution to Bell Telephone Laboratories—I did it quite often, though of course I do not know how many chances I missed!
Let me turn to the calculus course I have often taught at the Naval Postgraduate School, though I had formed this opinion years before. Students are remarkably able to memorize their way through many math classes, and many do so. But when I get to analytic integration (I give the students a function and ask for its indefinite integral), there is no way they can memorize their way through the course the way I teach it. They must learn to recognize
in an almost infinite number of disguises. For the first time in their career they are forced to learn to recognize forms independent of the particular representation—which is a basic feature of mathematics and general intelligence. To take analytic integration out of the course, or transfer it to routines in computers, is to defeat the purpose of a stage of learning something that is essential, in my opinion, unless something of equivalent difficulty is put in. The students must master abstract pattern recognition if they are to progress and use mathematics later in their careers.
In summary, as best I can give it: in low-level conditioned-response situations, typically associated with training, I believe computers can greatly aid the learning process; but at the other end, high-level thinking, education, I am very skeptical. Skeptical mainly because we ourselves do not understand either what we want to do or what we are presently doing! We simply do not know what we mean by “the educated person,” let alone what it will mean in the year 2020. Without that knowledge, how am I to judge the success of any proposal which is tried? Between low-level training and high-level education there is a large area to be explored and exploited by organizations outside the universities, as well as inside. I will discuss at great length in Chapter 26 the point that rarely do the experts in a field make the significant steps forward; great progress generally comes from the outside. The role of CAI in organizations with large training programs will increase in the future, as progress constantly makes old tools obsolete and introduces new ones into the organization that are generally more complex technically to use.
Consider the programs on computers which are supposed to teach such things as business management, or, even more seriously, war games. The machines can take care of the sea of minor details in the simulation, indeed should buffer the player from them, and expect good, high-level decisions. There may be some elements of low-level training which must be included, as well as the higher-level thinking. We must ask to what extent it is training and to what extent it is education. Of course, as mentioned in the three chapters on simulation, we also need to ask if the simulation is relevant to the future for which the training is being given. Will the presence of the gaming programs, if at all widespread, perhaps vitiate the training? You can be sure, however, that even if the proposers cannot answer these questions, they will still produce and advertise the corresponding programs. You may be a victim of being trained for the wrong situations!
As you live your life your attention is generally on the foreground things, and the background is usually taken for granted. We take for granted, most of the time, air, water, and many other things, such as language and mathematics. When you have worked in an organization for a long time its structure, its methods, its “ethos,” if you wish, are usually taken for granted.
It is worthwhile, now and then, to examine these background things which have never held your close attention before, since great steps forward often arise from such actions, and seldom otherwise. It is for this reason we will examine mathematics, though a similar examination of language would also prove fruitful. We have been using mathematics without ever discussing what it is—most of you have never really thought about it, you just did the mathematics—but mathematics plays a central role in science and engineering.
Perhaps the favorite definition of mathematics given by mathematicians is:
Mathematics is what is done by mathematicians, and mathematicians are those who do mathematics.
Coming from a mathematician its circularity is a source of humor, but it is also a clear admission that they do not think mathematics can be defined adequately. There is a famous book, What Is Mathematics?, whose authors exhibit mathematics but do not attempt to define it.
Once at a cocktail party, a Bell Telephone Laboratories mathematics department head said three times to a young lady,
Mathematics is nothing but clear thinking.
I doubt she agreed, but she finally changed the subject; it made an impression on me. You might also say:
Mathematics is the language of clear thinking.
This is not to say mathematics is perfect—not at all—but nothing better seems to be available. You have only to look at the legal system and the income tax people and their use of the natural language to express what they mean, to see how inadequate the English language is for clear thinking. The simple statement “I am lying” contradicts itself!
There are many natural languages on the face of the earth, but there is essentially only one language of mathematics. True, the Romans wrote VII, the Arabic notation is 7 (of course, the 7 is in the Latin form and not the Arabic), and the binary notation is 111, but they are all the same idea behind the surface notation. A 7 is a 7 is a 7, and in every notation it is a prime number. The number 7 is not to be confused with its representation.
Most people who have given the matter serious thought have agreed that if we are ever in communication with a civilization around some distant sun, then they will have essentially the same mathematics as we do. Remember, the hypothesis is we are in communication with them, which seems to imply they have developed to the state where they have mastered the equivalent of Maxwell’s equations. I should note that some philosophers have doubted that even their communication system, let alone any details of it, would resemble ours in any way at all. But people who have their heads in the clouds all the time can imagine anything at all and are very seldom close to correct (witness some of the speculation that the surface of the Moon would have meters of dust into which the space vehicle would sink and suffocate the people).
The words “essentially equivalent” are necessary because, for example, their Euclidean geometry may include orientation, and thus for the aliens two triangles which we call congruent may not be, since one can be the mirror image of the other.
Over the many years, five main schools of thought on what mathematics is have developed, and not one has proved to be satisfactory.
The trouble with Platonism is that it fails to be very believable, and certainly cannot account for how mathematics evolves, as distinct from expanding and elaborating; the basic ideas and definitions of mathematics have gradually changed over the centuries, and this does not fit well with the idea of the immutable Platonic ideas. Euler’s (1707–1783) idea of continuity is quite different from the one you were taught. You can, of course, claim the changes arise from our “seeing the ideas more clearly” with the passage of time. But when one considers non-Euclidean geometry, which arose from tampering with only the parallel postulate, and then thinks of the many other potential geometries which must exist in this Platonic space, every possible mathematical idea and all the possible logical consequences from them must all exist in Plato’s realm of ideas for all eternity! They were all there when the Big Bang happened!
There was, probably by the late Middle Ages (though I have never found just when it was first discovered), a well-known proof, using classical Euclidean geometry, that every triangle is isosceles. You start with a triangle ABC, Figure 23.3. You then bisect the angle at B and also erect the perpendicular bisector of the opposite side at its midpoint D. These two lines meet at the point E. Working around the point E you establish small triangles whose corresponding sides or angles are equal, and finally prove the two sides adjacent to the bisected angle are equal in length! Obviously the theorem is false, yet the proof follows the style used by classical Euclidean geometers, so there is clearly something basically wrong. (Notice that only by using metamathematical reasoning did we decide mathematical reasoning this time came to a wrong conclusion!)
To show where the false reasoning of this result arose (and also other possible false results), Hilbert examined what Euclid had omitted to talk about, both betweenness and intersections. Thus Hilbert could show the indicated intersection of the two bisectors lies outside the triangle, not inside as the drawing indicated. In doing this he added many more postulates than Euclid had originally given!
I was a graduate student in mathematics when this fact came to my attention. I read up on it a bit, and then thought a great deal. There are, I am told, some 467 theorems in Euclid, but not one of these theorems turned out to be false after Hilbert added his postulates! Yet every theorem which needed one of these new postulates could not have been rigorously “proved” by Euclid! Every theorem which followed, and rested on such a theorem, was also not “proved” by Euclid. Yet the results in the improved system were still the same as those Euclid regarded as being true. How could this be? How could it be that Euclid, though he had not actually proved the bulk of his theorems, never made a mistake? Luck? Hardly!
It soon became evident to me that one of the reasons no theorem was false was that Hilbert “knew” the Euclidean theorems were “correct,” and he had picked his added postulates so this would be true. But then I soon realized Euclid had been in the same position; Euclid knew the “truth” of the Pythagorean theorem, and many other theorems, and had to find a system of postulates which would let him get the results he knew in advance. Euclid did not lay down postulates and make deductions as it is commonly taught; he felt his way back from “known” results to the postulates he needed!
As Bertrand Russell put it:
Pure mathematics consists entirely of assertions to the effect that, if such and such a proposition is true of anything, then such and such another proposition is true of that thing. It is essential not to discuss whether the first proposition is really true, and not to mention what the thing is, of which it is supposed to be true.
Here you see a blend of the logical and formalist schools, and the sterility of their views. The logicians failed to convince people their approach was other than an idle exercise in logic. Indeed, I will strongly suggest that what is usually called the foundations of mathematics is only the penthouse. A simple illustration of this is that for years I have been saying if you come into my office and show me Cauchy’s theorem is false, meaning it cannot be derived from the usual assumptions, then I will certainly be interested, but in the long run I will tell you to go back and get new assumptions—I know Cauchy’s theorem is “true.” Thus, for me at least, mathematics does not exclusively follow from the assumptions, but rather very often the assumptions follow from the theorems we “believe are true.” I tend, as do many others, to group the formalists and logicians together.
Clearly, mathematics is not the laying down of postulates and then making rigorous deduction from them, as the formalists pretend. Indeed, almost every graduate student in mathematics has the experience that they have to “patch up” the proofs of earlier great mathematicians; and yet somehow the theorems do not change much, though obviously the great mathematician had not really “proved” the theorem which was being patched up. It is true (though seldom mentioned) that definitions in mathematics tend to “slide” and alter a bit with the passage of time, so previous proofs no longer apply to the same statement of a theorem now that we understand the words slightly differently.
The nature of our language tends to force us into “yes/no”; something is or is not, you either have a proof or you do not. But once we admit there is a changing standard of rigor, we have to accept that some proofs are more convincing than other proofs. If you view proofs on a scale much like probability, running from 0 to 1, then all proofs lie in the range and very likely never reach the upper limit of 1, certainty.
Indeed, some numerical analysts tend to believe the “real number system” is the bit patterns in the computer—they are the true reality, so they say, and the mathematician’s imagined number system is exactly that, imagined. Most users of mathematics simply use it as a tool, and give little or no attention to their basic philosophy.
There is a group of people in software who believe we should “prove programs are correct,” much as we prove theorems in mathematics are correct. The two fallacies they commit are: first, as we have just seen, mathematical proofs are not the absolute certainties they are popularly supposed to be, since the standards of rigor and even the definitions drift with time; and second, a proof can at best show the program meets its specifications, and the specifications themselves cannot be proved to be what was actually wanted.
This does not mean there is nothing of value to their approach of proving programs are correct, only, as so often happens, that their claims are much inflated.
As you know from your courses in mathematics, what you are actually doing, when viewed at the philosophical level, is almost never mentioned. The professors are too busy doing the details of mathematics to ever discuss what they are actually doing—a typical technician’s behavior!
However, as you all know, mathematics is remarkably useful in this world, and we have been using it without much thought. Hence we need more discussion on this background material you have used without the benefit of thought.
The ancient Greeks believed mathematics was “truth.” There was little or no doubt on this matter in their minds. What is more sure than 1 + 1 = 2? But recall when we discussed error-correcting codes we said 1 + 1 = 0. This multiple use of the same symbols (you can claim the 1s in the two statements are not the same things if you wish) contradicts logical usage. It was probably when the first non-Euclidean geometries arose that mathematicians came face to face with the fact that there could be different systems of mathematics. They use the same words, it is true, such as points, lines, and planes, but apparently the meanings to be attached to the words differ. This is not new to you; when you came to the topic of forces in mechanics and to the addition of forces, you had to recognize scalar addition was not appropriate for vector addition. And the word “work” in physics does not mean the same thing as it generally does in real life.
It would appear that the mathematics you choose to use must come from the field of application; mathematics is not universal and “true.” How, then, are we to pick the right mathematics for various applications? What meanings do the symbols of mathematics have in themselves? Careful analysis suggests the “meaning” of a symbol only arises from how it is used and not from the definitions as Euclid—and you—thought when he defined points, lines, and planes. We now realize his definitions are both circular and do not uniquely define anything; the meaning must come from the relationships between the symbols. It is just as in the interpretive language I sketched out in Chapter 4: the meaning of the instruction was contained in the subroutine it called—how the symbols were processed—and not in the name itself! In themselves the marks are just strings of bits in the machine and can have no meaning except by how they are used.
By now it should become clear that symbols mean what we choose them to mean. You are all familiar with different natural languages where different words (labels) are apparently assigned to the same idea. Coming back to Plato: What is a chair? Is it always the same idea, or does it depend on context? At a picnic a rock can be a chair, but you do not expect the use of a rock in someone’s living room as a chair. You also realize any dictionary must be circular; the first word you look up must be defined in terms of other words—there can be no first definition which does not use words.
You may, therefore, wonder how a child learns a language. It is one thing to learn a second language once you know a first language, but to learn the first language is another matter—there is no first place to appeal for meaning. You can do a bit with gestures for nouns and verbs, but apparently many words are not so indicatable. When I point to a horse and say the word “horse,” am I indicating the name of the particular horse, the general name of horses, of quadrupeds, of mammals, of living things, or the color of the horse? How is the other person to know which meaning is meant in a particular situation? Indeed, how does a child learn to distinguish between the specific, concrete horse and the more abstract class of horses?
Apparently, as I said above, meaning arises from the use made of the word, and is not otherwise defined. Some years back a famous dictionary came out and admitted they could not prescribe usage, they could only say how words were used; they had to be “descriptive” and not “prescriptive.” That there is apparently no absolute, proper meaning for every word made many people quite angry. For example, both the New Yorker book reviewer and the fictional detective Nero Wolfe were very irate over the dictionary.
We now see that all this “truth” which is supposed to reside in mathematics is a mirage. It is all arbitrary, human conventions.
But we then face the unreasonable effectiveness of mathematics. Having claimed there was neither “truth” nor “meaning” in the mathematical symbols, I am now stuck with explaining the simple fact that mathematics is used and is an increasingly central part of our society, especially in science and engineering. We have passed from absolute certain truth in mathematics to the state where we see there is no meaning at all in the symbols—but we still use them! We put the meaning into the symbols as we convert the assumptions of the problem into mathematical symbols, and again when we interpret the results. Hence we can use the same formula in many different situations—mathematics is sort of a universal mental tool for clear thinking.
Supposing for the moment the above remark of Einstein is true, then the problem of applying mathematics is simply to recognize an analogy between the formal mathematical structure and the corresponding part of “reality.” For example, for the error-correcting codes, I had to see that for symbols of the code, if I were to use 0 and 1 for the basic symbols, and use a 1 for the position of an error (the error was simply a string of 0s with one 1 where the error occurred), then I could “add” the strings if and only if I chose 1 + 1 = 0 as my basic arithmetic. Two successive errors in the same position is the same as no error. I had to see an analogy between parts of the problem and a mathematical structure which at the start I barely understood.
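For readers who want to see the arithmetic, here is a small sketch of addition with 1 + 1 = 0 (the exclusive-or); the code word and the error position are invented purely for illustration:

```python
# Addition of error patterns with 1 + 1 = 0 (exclusive-or), as described above.
# The code word and error position here are made up for illustration.

def add_mod2(a, b):
    """Add two equal-length bit strings position by position, with 1 + 1 = 0."""
    return [(x + y) % 2 for x, y in zip(a, b)]

codeword = [1, 0, 1, 1, 0, 0, 1]
error    = [0, 0, 0, 1, 0, 0, 0]   # a single error in position 4

received = add_mod2(codeword, error)
print(received)                                 # codeword with bit 4 flipped

# Applying the same error twice is the same as no error at all:
print(add_mod2(received, error) == codeword)    # True
```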
Thus part of the effectiveness of mathematics arises from the recognition of the analogy, and only insofar as the analogy is extensive and accurate can we use mathematics to predict what will happen in the real world from the manipulation of the symbols at our desks.
You have been taught a large number of these identifications between mathematical models and pieces of reality. But I doubt these will cover all future developments. Rather, as we want, more and more, to do new things which are now possible due to technical advancements of one kind or another, including understanding ourselves better, we will need many other mathematical models.
I suggest, with absolutely no proof, that in the past we have found the easy applications of mathematics, the situations where there is a close correspondence between the mathematical structure and the part being modeled, and in the future you will have to be satisfied with poorer analogies between the two parts. We will, in time, I believe, want mathematical models in which the whole is not the sum of the parts, but the whole may be much more, due to the “synergism” between the parts. You are all familiar with the fact that the organization you are in is often more than the total of the individuals—there is morale, means of control, habits, customs, past history, etc., which are indefinably separate from the particular individuals in the organization. But if mathematics is clear thinking, as I said at the start of this chapter, then mathematics will have to come to the rescue for these kinds of problems in the future. Or, to put it differently, whatever clear thinking you do, especially if you use symbols, then that is mathematics!
Similarly, the three things of Classical Greece, truth, beauty, and justice, though you all think you know what they mean, cannot (apparently) be put into words. From the time of Hammurabi, the attempt to put justice into words has produced the law, and often the law is not your conception of justice. There is the famous question in the Bible, “What is truth?” And who but a beauty judge would dare to judge “beauty”?
Thus I have gone beyond the limitations of Gödel’s theorem, which loosely states that if you have a reasonably rich system of discrete symbols (the theorem does not refer to mathematics in spite of the way it is usually presented), then there will be statements whose truth or falsity cannot be proved within the system. It follows that if you add new assumptions to settle these theorems, there will be new theorems which you cannot settle within the new enlarged system. This indicates a clear limitation on what discrete symbol systems can do.
I think in the past we have done the easy problems, and in the future we will more and more face problems which are left over and require new ways of thinking and new approaches. The problems will not go away—hence you will be expected to cope with them—and I am suggesting at times you may have to invent new mathematics to handle them. Your future should be exciting for you if you will respond to the challenges in correspondingly new ways. Obviously there is more for the future to discover than we have discovered in all the past!
Most physicists currently believe they have the basic description of the universe (though they admit 90% to 99% of the universe is in the form of “dark matter,” of which they know nothing except it has gravitational attraction). You should realize that in all of science there are only descriptions of how things happen and nothing about why they happen.
The reasons for discussing quantum mechanics, or qm, are: (1) it is basic physics, (2) it has many intellectual repercussions, and (3) it provides a number of models for how to do things.
At the end of the 1800s and in the early 1900s physics was faced with a number of troubles. Among them was black-body radiation: the measured spectrum of the radiation from a hot body simply could not be accounted for by classical theory.
Before going on, let me discuss how this piece of history has affected my behavior in science. Clearly Planck was led to create the theory because the approximating curve fit so well, and had the proper form. I reasoned, therefore, if I were to help anyone do a similar thing, I had better represent things in terms of functions they believed would be proper for their field rather than in the standard polynomials. I therefore abandoned the standard polynomial approach to approximation, which numerical analysts and statisticians among others use most of the time, for the harder approach of finding which class of functions I should use. I generally find the class of functions to use by asking the person with the problem, and then use the facts they feel are relevant—all in the hopes I will thereby, someday, produce a significant insight on their part. Well, I never helped find so large a contribution as qm, but often by fitting the problem to their beliefs I did produce, on their part, smaller pieces of insight.
Moral: there need not be a unique form of a theory to account for a body of observations; instead, two rather different-looking theories can agree on all the predicted details. You cannot go from a body of data to a unique theory! I noted this in the last chapter.
Another story will illustrate this point clearly. Some years ago, when I took over a PhD thesis from another professor, I soon found they were using random input signals and measuring the corresponding outputs. I also found it was “well-known”—meaning it was known, but almost never mentioned—that quite different internal structures of the black boxes they were studying could give exactly the same outputs, given the same inputs, of course. There was no way, using the types of measurements they were using, to distinguish between the two quite different structures. Again, you cannot get a unique theory from a set of data.
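A toy version of the situation, my own invention and not the thesis problem itself: two black boxes with different internal structures whose outputs agree on every input sequence, so that no input-output measurement can separate them.

```python
# Sketch: two different internal structures of a "black box" that give
# identical outputs for every input sequence.

def box_a(x):
    """Direct form: y[n] = x[n] - x[n-2]."""
    out, x1, x2 = [], 0.0, 0.0
    for xn in x:
        out.append(xn - x2)
        x2, x1 = x1, xn
    return out

def box_b(x):
    """Cascade of two first-order stages whose combined effect is the same."""
    out, x1, w1 = [], 0.0, 0.0
    for xn in x:
        w = xn - x1          # first stage: difference with previous input
        out.append(w + w1)   # second stage: sum with previous stage output
        x1, w1 = xn, w
    return out

signal = [1.0, 2.0, -1.0, 0.5, 4.0, 0.0]
print(box_a(signal) == box_b(signal))   # True: same behavior, different insides
```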
The new qm dates from about 1925 and has had great success. It supposes that energy, and many other things in physics, come in discrete chunks, but the chunks are so small that we, who are relatively large objects with respect to the chunks, simply cannot perceive them other than with delicate experiments or in peculiar situations.
The situation was, therefore, that classical Newtonian mechanics, which had been very well verified in so many ways and had even successfully predicted the positions of unknown planets, was being replaced by two theories: relativity at high speeds, large masses, and high energies, and qm at small sizes. Both theories were at first found to be non-intuitive, but as time passed they came to be accepted widely, the special theory of relativity being the more so. You may recall in Newton’s time gravity (action at a distance) was not felt to be reasonable.
Again I stop and remark to you the obvious lessons to learn from this wave-particle duality. With almost 70 years and no decent explanation of the duality, one has to ask, “Is it possible this is one of those things we cannot think?” Or possibly it is only that it cannot be put into words. There are smells you cannot smell, wavelengths of light you cannot see, sounds you cannot hear, all based on the limits of your sense organs, so why do you object to the observation that given the wiring of the brain you have, there can be thoughts you cannot think? qm offers a possible example. In almost 70 years and with all the clever people who have taught qm, no one has found a widely accepted explanation of the fundamental fact of qm, the wave-particle duality. You simply have to get used to it, so they claim.
Man is not a rational animal, he is a rationalizing animal.
Hence you will find that often what you believe is what you want to believe, rather than being the result of careful thinking.
Einstein did not like the idea of non-local effects, and he produced the famous Einstein-Podolsky-Rosen paper (epr), which argued that if there were no non-local effects then quantum mechanics must be an incomplete description of nature. Bell sharpened this up into the famous “Bell inequalities,” limits which the correlations between apparently independent measurements must obey in any local theory, and the Aspect experiments found those limits violated, just as qm predicts. Non-local effects seem to mean something can happen instantaneously without requiring time to get from cause to effect—such as the correlated states of polarization of the two separated particles in the Aspect experiments.
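Hamming does not write the inequality down; in its common CHSH form (my choice of version, not his) it reads

\[
\lvert E(a,b) + E(a,b') + E(a',b) - E(a',b') \rvert \;\le\; 2,
\]

where $E(x,y)$ is the measured correlation between the outcomes at the two detectors for settings $x$ and $y$. Any local theory must respect the bound of 2; quantum mechanics allows values up to $2\sqrt{2}$, and the Aspect experiments observed violations of the bound, in agreement with qm.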
So once more qm has flatly contradicted our beliefs and instincts, which are, of course, based on the human scale and not on the microscopic scale of atoms. qm is stranger than we ever believed, and seems to get stranger the longer we study it.
It is important to notice that while I have indicated that maybe we can never understand qm in the classical sense of “understand,” we have nevertheless created a formal mathematical structure which we can use very effectively. Thus as we go into the future and perhaps meet many more things we cannot “understand,” still we may be able to create formal mathematical structures which will enable us to cope with the fields. Unsatisfactory? Yes! But it is amazing how you get used to qm after you work with it long enough. It is much the same story as your handling complex numbers—all the professor’s words about complex arithmetic, being equivalent to ordered pairs of real numbers with a peculiar rule for multiplication, meant little to you; your faith in the “reality” of complex numbers came from using them for a long time and seeing they often gave reasonable, useful predictions. Faith in Newton’s gravitation (action at a distance) came the same way.
I do not pretend to know in any detail what the future will reveal, but I believe that since at every stage of advance we tend to attack the easier problems, the future will include more and more things our brains, being wired as they are, cannot “understand” in the classical sense of “understand.” Still the future is not hopeless. I suspect we will need many different mathematical models to help us, and I do not think this is only a prejudice of a mathematician. Thus the future should be full of interesting opportunities for those who have the intellectual courage to think hard and use mathematical models as a basis for “understanding” nature. Creating and using new and different kinds of mathematics seems to me to be one of the things you can expect to have to do if you are to get the “understanding” you would like to have. The mathematics of the past was designed to fit the obvious situations, and as just mentioned we have tended to examine them first. As we explore new areas we can expect to need new kinds of mathematics—and even to merely follow the frontier you will have to learn them as they arise!
This brings me to another theme of this book: progress is making us face ourselves in many ways, and computers are very central in this process. Not only do they ask us questions never asked before, but they also give us new ways of answering them. Not just in giving numerical answers, but in providing a tool to create models, simulations if you prefer, to help us cope with the future. We are not at the end of the computer revolution; we are at the start, or possibly near the middle of it.
I can only speculate that this deeper experimental probing of our theories will, in the long run, produce fundamentally new things to be adapted for human use, though the experiments themselves involve only the tiniest of particles. Certainly past history suggests this, so you cannot afford to remain totally ignorant of this exciting frontier of human knowledge.
Creativity, originality, novelty, and other such words are regarded as “good things,” and we often fail to distinguish between them—indeed, we find them hard to define. Surely we do not need three words with exactly the same meaning. Hence we should try to differentiate somewhat between them as we try to define them. The importance of definitions has been stressed before, and we will use this occasion to illustrate an approach to defining things, not that we will succeed perfectly or even well.
It should be remarked that in primitive societies creativity, originality, and novelty are not appreciated; doing as one’s ancestors did is the proper thing to do. This is also true in many large organizations today: the elders are sure they know how the future should be handled and the younger members of the tribe, when they do things differently, are not appreciated.
Long ago a friend of mine in computing once remarked that he would like to do something original with a computer, something no one else had ever done. I promptly replied, “Take a random ten-decimal-digit number and multiply it by another random ten-digit number, and it will almost certainly be something no one else has ever done.” There are, using ten-digit factors, about 10^20 possible products, so the result would almost certainly be new; yet no one would call it creative.
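The arithmetic behind the quip, sketched in a few lines (the count, not the joke, is the point):

```python
# With ten-decimal-digit factors there are about 10**20 possible products,
# so any particular multiplication is almost certainly one nobody has done.
import random

possible_pairs = (10**10) ** 2
print(possible_pairs)                    # 100000000000000000000

a = random.randrange(10**9, 10**10)      # a random ten-digit number
b = random.randrange(10**9, 10**10)
print(a, "x", b, "=", a * b)             # original, but hardly creative
```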
The art world, especially painting, has had a great deal of trouble with the distinction between creativity and originality for most of this century. Modern artists, and museum directors, offer to the public things which are certainly novel and new, but which many of the potential paying public often do not like. For many people the shock value of various forms of art has finally worn off, and the average person no longer responds to the current “modern art.” After all, I could paint a picture and it would be new and novel, but I would hardly consider it as a “creative work of art”—whatever that means.
Evidently we want the word “creative” to include the concept of value—but value to whom? A new theorem in some branch of mathematics may be a creative act, but the number of people who can appreciate it may be very few indeed, so we must be careful not to insist the created thing be widely appreciated. We also have the fact that many of the current highly valued works of art were not appreciated during the artist’s lifetime—indeed, the phenomenon is so common as to be discouraging. By a kind of inverted logic it does allow many people to believe that because they are unappreciated they must be a great artist!
I hope the above has disentangled some of the confusion between creativity, novelty, and originality, but I am not able to say just what this word “creativity,” which we value so much in our society, actually means. In women’s fashions it seems to mean “different,” but not too different!
It should be evident from the fact that I am using a whole chapter on the topic that I think creativity in an individual can probably be improved. Indeed, it has been a topic in much of the course, though I have often called it “style.” I believe the future will have even greater need for new, creative ideas than the past, hence I must do what I can to increase the probability you will form your own effective style and have “great ideas.” But except for discussing the topic, making you aware of it, and indicating what we think we know about it, I have no real suggestions (I can put into concrete words) on how to make you, magically, more creative in your careers. The topic is too important to ignore, even if I do not understand the creative act very well. Better I should try to do it, a person you know who has experienced it many times, than you get it from some people who themselves have never done a significant creative act. I often suspect creativity is like sex: a young lad can read all the books you have on the topic, but without direct experience he will have little chance of understanding what sex is—but even with experience he may still not understand what is going on! So we must continue, even if we are not at all sure we know what we are talking about.
Introspection, and an examination of history and of reports of those who have done great work, all seem to show that typically the pattern of creativity is as follows. There is first the recognition of the problem in some dim sense. This is followed by a longer or shorter period of refinement of the problem. Do not be too hasty at this stage, as you are likely to put the problem in the conventional form and find only the conventional solution. This stage, moreover, requires your emotional involvement, your commitment to finding a solution, since without a deep emotional involvement you are not likely to find a really fundamental, novel solution.
A long gestation period of intense thinking about the problem may result in a solution, or else the temporary abandonment of the problem. This temporary abandonment is a common feature of many great creative acts. The monomaniacal pursuit often does not work; the temporary dropping of the idea sometimes seems to be essential to let the subconscious find a new approach.
Then comes the moment of “insight,” creativity, or whatever you want to call it—you see the solution. Of course, it often happens that you are wrong; a closer examination of the problem shows the solution is faulty, but might be saved by some suitable revision. But maybe the problem needs to be altered to fit the solution! That has happened! More usually it is back to the drawing board, as they say, more mulling things over.
The false starts and false solutions often sharpen the next approach you try. You now know how not to do it! You have a smaller number of approaches left to explore. You have a better idea of what will not work and possibly why it will not work.
When stuck I often ask myself, “If I had a solution, what would it look like?” This tends to sharpen up the approach, and may reveal new ways of looking at the problem you had subconsciously ignored but you now see should not be excluded. What must the solution involve? Are there conservation laws which must apply? Is there some symmetry? How does each assumption enter into the solution, and is each one really necessary? Have you recognized all the relevant factors?
Out of it all, sometimes, comes the solution. So far as anyone understands the process it arises from the subconscious, it is suddenly there! There is often a lot of further work to be done on the idea, the logical cleaning up, the organizing so others can see it, the public presentation to others, which may require new ways of looking at the problem and your solution, not just your idiosyncratic way which gave you the first solution. This revision of the solution often brings clarity to you in the long run!
We reason mainly by analogy. But it is curious that a valuable analogy need not be close—it need only be suggestive of what to do next. A dream by Kekule about snakes biting their own tails suggested to him, when he awoke, the ring structure of carbon compounds! Many a poor analogy has proved useful in the hands of experts. This implies the analogy you use is only partial, and you need to be able to abandon it when it is pressed too far; analogies are seldom so perfect that every detail in one situation exactly matches those of the other. We find the analogies when something reminds us of something else—is it only a matter of the “hooks” we have in our minds?
Many books are written these days on the topic of creativity; we often talk about it, and we even have whole conferences devoted to it, yet we can say so little! There is much talk about having the right surrounding atmosphere—as if that mattered much! I have seen the creative act done under the most trying circumstances. Indeed, I often suspect, as I will later discuss more fully, that what the individual regards as ideal conditions for creativity is not what is needed, but rather the constant impinging of reality is often a great help.
In the past I have deliberately managed myself in this matter by promising a result by a given date, and then, like a cornered rat, having at the last minute to find something! I have been surprised at how often this simple trick of managing myself has worked for me. Of course it depends on having a great deal of pride and self-confidence. Without self-confidence you are not likely to create great new things. There is a thin line between having enough self-confidence and being overconfident. I suppose the difference is whether you succeed or fail; when you win you are strong-willed, and when you lose you are stubborn!
Back to the topic of whether we can teach creativity or not. From the above you should get the idea that I believe it can be taught. It cannot be done with simple tricks and easy methods; what must be done is you must change yourself to be more creative. As I have thought about it in the past, I realize how often I have tried to change myself so I was more as I wished I were and less as I had been. (Often I did not succeed!) Changing oneself is not easy, as anyone who has gone on a diet to lose weight can testify; but that you can indeed change yourself is also evident from the few who do succeed in dieting, quitting smoking, and other changes in habits. We are, in a very real sense, the sum total of our habits, and nothing more; hence by changing our habits, once we understand which ones we should change and in what directions, and understand our limitations in changing ourselves, then we are on the path along which we want to go.
In planning to change yourself clearly, the old Greek saying applies: “Know thyself.” And do not try heroic reformations which are almost certain to fail. Practice on small ones until you gradually build up your ability to change yourself in the larger things. You must learn to walk before you run in this matter of being creative, but I believe it can be done. Furthermore, if you are to succeed (to the extent you secretly wish to), you must become creative in the face of the rapidly changing technology which will dominate your career. Society will not stand still for you; it will evolve more and more rapidly as technology plays an increasing role at all levels of the organization. My job is to make you one of the leaders in this changing world, not a follower, and I am trying my best to alter you, especially in getting you to take charge of yourself and not to depend on others, such as me, to help. The many small stories I have told you about myself are partly to convince you that you can be creative when your turn comes for guiding our society to its possible future. The stories have also been included to show you some possible models of how to do things.
As remarked in an earlier chapter, as our knowledge grows exponentially we cope with the growth mainly by specialization. It is increasingly true:
An expert is one who knows everything about nothing; a generalist knows nothing about everything.
In an argument between a specialist and a generalist, the expert usually wins by simply (1) using unintelligible
Occasionally, usually because of the contradictions most of the people in the field choose to ignore or simply forget, there will arise a sudden change in the paradigm, and as a result a new pattern of beliefs comes into dominance, along with the ability to ask new kinds of questions and get new kinds of answers to older problems. These changes in the dominant paradigm of a science usually represent the great steps forward. For example, both special relativity and qm represent such changes in the field of physics.
There is another source for continental drift, namely the distribution of forms of life over the aeons of history. The mutually common forms of life found in widely separated places necessitated the creation of “land bridges,” which were supposed to have risen and sunk again—and the number of these, plus their various placements, seemed unbelievable to me as a child, particularly as there were no observations of their traces in the depths of the oceans to justify them. The biologists studying the past, in trying to account for what they saw, had also postulated both a Pangaea and Gondwanaland as successive arrangements of the continents, not apparently caring for the “land bridges” which seemed necessary otherwise, yet the geologists still resisted. The concept of continental drift was accepted by the oceanographers only after wwii, when, by studying the ocean bottom, they found, by magnetic methods, the actual cracks and the spreading of the land on the ocean floor.
Of course geologists now claim they had always sort of believed in it (the textbooks they used to the contrary), and it was only necessary to exhibit the actual mechanism in detail before they would accept the continental drift theory, which is now “the truth.” This is the typical pattern of a change in the paradigm of a field. It is resisted for a shorter or longer time (and I do not know how many theories were permanently lost—how could I?) before being accepted as being right, and those concerned then say they had not actively opposed the change. You have probably heard many past examples, such as the aviation expert saying, just before the Wright brothers flew, that heavier-than-air flying was impossible; the old claim that if you went too fast in an automobile or train you would lose your breath and die; that faster-than-sound flight (supersonic flight) was impossible, etc. The record of the experts saying something is impossible just before it is done is amazing. One of my favorite ones was that you cannot lift water more than 33 feet. But when the patent office rejected a patent claiming the inventor's method could lift water higher than that, the man demonstrated it by lifting water to the roof of their building, which was much more than 33 feet above the ground. How? He used, as shown in Figure 26.1, a method of standing waves which they had not thought about: when the low-pressure part of the standing wave appeared at the bottom, water was admitted into the column through a valve, and when the high-pressure part appeared at the top, water was let out through another. All the patent office experts knew was that the textbooks said it could not be done, and they never looked to see on what basis this was stated.
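The 33-foot figure itself rests on an unstated assumption: that the water is raised by suction against atmospheric pressure. The standard calculation behind it (not spelled out in the text) is

\[
h_{\max} \;=\; \frac{P_{\text{atm}}}{\rho g} \;\approx\; \frac{101{,}325\ \text{Pa}}{(1000\ \text{kg/m}^3)(9.81\ \text{m/s}^2)} \;\approx\; 10.3\ \text{m} \;\approx\; 34\ \text{ft},
\]

and the standing-wave pump of Figure 26.1 does not use suction at all, so the bound simply does not apply.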
All impossibility proofs must rest on a number of assumptions which may or may not apply in the particular situation.
Experts, in looking at something new, always bring their expertise with them, as well as their particular way of looking at things. Whatever does not fit into their frame of reference is dismissed, not seen, or forced to fit into their beliefs. Thus really new ideas seldom arise from the experts in the field. You cannot blame them too much, since it is more economical to try the old, successful ways before trying to find new ways of looking and thinking.
All things which are proved to be impossible must obviously rest on some assumptions, and when one or more of these assumptions are not true the impossibility proof fails—but the expert seldom remembers to carefully inspect the assumptions before making their “impossible” statements. There is an old statement which covers this aspect of the expert. It goes as follows:
If an expert says something can be done he is probably correct, but if he says it is impossible then consider getting another opinion.
Kuhn, and the historians of science, have concentrated on the large changes in the paradigms of science; it seems to me that much the same applies to smaller changes. For example, working for Bell Telephone Laboratories it was fairly natural I should meet the frequency approach to numerical analysis, and hence apply it to the numerical methods I used on the various problems I was asked to solve. Using the kinds of functions the clients are familiar with means insight can arise from the solution, details which suggest other things to do than what they had originally thought. I found the frequency approach very useful, but some of my close friends, not at Bell Telephone Laboratories, regularly twitted me about the frequency approach every time they met me for all the years we had been meeting at various places. They simply kept the polynomial approach, though under questioning they could give no real reason for doing so—simply that was the way things had been done, hence that was the right way to do things.
It is not just for the pleasure of poking fun at the experts I bring this up. There are at least four other reasons for doing so.
First, as you go on you will have to deal with experts many times, and you should understand their characteristics.
Second, in time many of you will be experts, and I am hoping to at least modify the behavior of some of you so that you will, in your turn, not be such a block on progress as many experts have been in the past.
Third, it appears to me that the rate of progress, the rate of innovation and change of the dominant paradigm, is increasing, and hence you will have to endure more changes than I did.
Fourth, if only I knew the right things to say to you, then when a paradigm change occurs fewer of you would be left behind in your careers than usually happens to the experts.
Thus the expert faces the following dilemma. Outside the field there are a large number of genuine crackpots with crazy ideas, but among them may also be the crackpot with the new, innovative idea which is going to triumph. What is a rational strategy for the expert to adopt? Most decide they will ignore, as best they can, all crackpots, thus ensuring they will not be part of the new paradigm, if and when it comes.
Those experts who do look for the possible innovative crackpot are likely to spend their lives in the futile pursuit of the elusive, rare crackpot with the right idea, the only idea which really matters in the long run. Obviously the strategy for you to adopt depends on how much you are willing to be merely one of those who served to advance things, vs. the desire to be one of the few who in the long run really matter. I cannot tell you which you should choose; that is your choice. But I do say you should be conscious of making the choice as you pursue your career. Do not just drift along; think of what you want to be and how to get there. Do not automatically reject every crazy idea the moment you hear of it, especially when it comes from outside the official circle of the insiders—it may be the great new approach which will change the paradigm of the field! But also, you cannot afford to pursue every “crackpot” idea you hear about. I have been talking about paradigms of science, but so far as I know the same applies to most fields of human thought, though I have not investigated them closely. And it probably happens for about the same reasons; the insiders are too sure of themselves, have too much invested in the accepted approaches, and are plain mentally lazy. Think of the history of modern technology you know!
I have covered the two main problems of dealing with the experts. They are: (1) the expert is certain they are right, and (2) they do not consider the basis for their beliefs and the extent to which they apply to new situations. I told you about the fft and why it is not the Tukey-Hamming algorithm. That was not the only time I made such a mistake, forgetting there had been a technological change which invalidated my earlier reasoning, as well as the many other cases where I have observed it happen. To my embarrassment I told the story in order to get the point vividly across to you. I made the mistake; how are you going to avoid it when your turn comes? No one ever told me about the problem, while I have told you about it, so maybe you will not be as foolish as I have been at times.
With the rapid increase in the use of technology this type of error is going to occur more often, so far as I can see. The experts live in their closed world of theory, certain they are right, and are intolerant of other opinions. In some respects the expert is the curse of our society, with their assurance they know everything, and without the decent humility to consider they might be wrong. Where the question looms so important, I suggested to you long ago to use in an argument, “What would you accept as evidence you are wrong?” Ask yourself regularly, “Why do I believe whatever I do?” Especially in the areas where you are so sure you know, the area of the paradigms of your field.
The opposition of the expert is often not as direct as indicated above. Consider my experience at Bell Telephone Laboratories during the earliest years of the coming of digital computers. My immediate bosses all had succeeded in the mathematical areas by using analytical methods, and during their heyday computing had been relegated to some high-school graduate girls with desk calculators. The bosses knew the right way to do mathematics. It was useless to argue their basic assumptions with them—they might even have denied they held them—since they, based on their own experiences, knew they were right! They saw, every one of them, the computer as being inferior, beneath the consideration of a real mathematician, and in the final analysis possibly in direct competition with them—this later giving rise to fear and hatred. It was not a discussable topic with them. I had to do computing in spite of all their (usually unstated) opposition, in spite of all the times they said they had done something I could not do with the machines I had available at the time, and in spite of all my polite replies that I was not concerned with direct competition, rather I was solely interested in doing what they could not do, I was concerned with what the team of man and machine could do together. I hesitate to guess the number of times I gave that reply to a not direct but covert attack on computers in the early days. And this in a highly enlightened place like Bell Telephone Laboratories.
The second point I want to make is that many of you, in your turn, will become experts, and I am hoping to modify in you the worst aspects of the know-it-all expert. About all I can do is to beg you to watch and see for yourself how often the above descriptions occur in your career, and hope thereby you will not be the drag on progress the expert so often is. In my own case, I vowed when I rose to near the top that I would be careful, and as a result I have refused to take part in any decision processes involving current choices of computers. I will give my opinion when asked, but I do not want to be the kind of drag on the next generation I had to put up with from the past generation. Modesty? No, pride!
To put the situation in the form of a picture, we draw a line in n-dimensional space to represent, symbolically, the path of progress in time, Figure 26.2, which is drawn, of course, in two dimensions. At the start of the picture, say 1935 and earlier, the direction was as indicated by the tangent arrow, and those who sensed what to do and how to do it (then) were the successful people, and were, therefore, my bosses. Then computers came in, and at the later date the curve is now pointed in another direction, almost perpendicular to the past one. It is asking a lot of them to admit the very methods they earlier used to succeed are not appropriate at present! But it is true, if this picture is at all like reality (remember, it is in n-dimensional space). If my claim that progress has not stopped miraculously at present, but rather there is probably an accelerating rate of progress, is true, then it will be even more true when you are in charge that:
What you did to become successful is likely to be counterproductive when applied at a later date.
A very good friend of mine was a great analog enthusiast and it was from him I learned a lot about analog computers when I acquired the management of the one at Bell Telephone Laboratories. When digital methods came in, he constantly emphasized the advantages, at that time, of the analog computers. Well, he was gradually squeezed out by his own behavior and fell back on other skills he had. But when I retired early to go to teaching, as I had long planned to do (since I felt old research people mainly get in the way of the young), he also retired. But I left with pleasant memories of Bell Telephone Laboratories, and later, in talking with him, I found his memories were not so pleasant!
If you do not keep up in your field, that is almost certainly what will happen to you. While living in California I have met and talked with a number of ex-Navy officers of the rank of Captain, and the stories they tell often reveal a degree of bitterness about their careers. How could it be otherwise? If you are passed over for an important (to you) promotion in an organization, then it will tend to color all the memories of an otherwise fine career and darken them. It is this social as well as economic consequence I care about, and why I am preaching this lesson—you must keep up, or else things will overtake you and may spoil the memories of your career.
I have used isolated stories many times in these lectures. They are illustrative of situations, and I know many other stories which would illustrate the same points. I began to formulate many of these “theories” long ago, and as time went on experience illustrated their truth many times over, though some turned out to be false and had to be abandoned. These are not absolute truths. They are summaries of many observations which tend to “prove” the points made. Of course, you can say I looked for confirmations, but being a scientist I tried also to look for falsifications, and in the face of counterevidence had to abandon some theories. When you think over many of the stories, they often have an element of “truth” based more on human traits than anything else. We are all human, but that does not prevent us from trying to modify our instincts, which were evolved over the long span of history. Civilization is merely a thin veneer we have put on top of our anciently derived instincts, but the veneer is what makes it possible for modern society to operate. Being civilized means, among other things, stopping your immediate response to a situation, and thinking whether it is or is not the appropriate thing to do. I am merely trying to make you more self-aware so you will be more “civilized” in your responses, and hence probably, but not certainly, be more successful in attaining the things you want.
In summary, I began by warning you about dealing with experts; but towards the end I am warning you about yourself when in your turn you are the expert. Please do not make the same foolish mistakes I did!
It has been my experience, as well as the experience of many others who have looked, that data is generally much less accurate than it is advertised to be. This is not a trivial point—we depend on initial data for many decisions, as well as for the input data for simulations which result in decisions. Since the errors are of so many kinds, and I have no coherent theory to explain them all, I have therefore to resort to isolated examples and generalities from them.
Let me start with the topic of life testing.
Life testing is increasingly important and increasingly difficult as we want more and more reliable components for larger and larger entire systems. One basic principle is accelerated life testing, meaning mainly that if I raise the temperature 17°C, then most, but not all, chemical reactions double their rate. There is also the idea that if I increase the working voltage, I will find some of the weaknesses sooner. Finally, for testing some integrated circuits, increasing the frequency of the clock pulses will find some weaknesses sooner. The truth is, all three combined are hardly a firm foundation to work from, but in reply to this criticism the experts say, “What else can we do, given the limitations of time and money?” More and more, the time gap between the scientific creation and the engineering development is so small there is no time to gain real-life testing experience with the new device before it is put into the field for widespread use. If you want to be certain, then you are apt to be obsolete.
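The temperature rule of thumb is, in effect, the Arrhenius model that accelerated life testing usually rests on (my gloss; Hamming does not name it). The acceleration factor between a use temperature and a stress temperature, both in kelvin, is

\[
AF \;=\; \exp\!\left[\frac{E_a}{k}\left(\frac{1}{T_{\text{use}}}-\frac{1}{T_{\text{stress}}}\right)\right],
\]

where $k$ is Boltzmann's constant and $E_a$ is the activation energy of the dominant failure mechanism. A doubling of the rate for a 17°C rise near room temperature (298 K to 315 K) corresponds to $E_a \approx 0.33$ eV; the “most, but not all” caveat is precisely the fact that $E_a$ differs from one failure mechanism to the next.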
Of course, there are other tests for other things besides those mentioned above. So far as I have seen, the basis of life testing is shaky, but there is nothing else available. I had long ago argued at Bell Telephone Laboratories that we should form a life testing department whose job is to prepare for the testing of the next device which is going to be invented, and not just test after the need arises. I got nowhere, though I made a few, fairly weak suggestions about how to start. There was not time in the area of life testing to do basic research—they were under too much pressure to get the needed results tomorrow. As the saying goes,
There is never time to do the job right, but there is always time to fix it later,
especially in computer software!
The question I leave with you is still, “How do you propose to test a device, or a whole piece of equipment, which is to be highly reliable, when all you have is less reliable test equipment, and with very limited time to test, and yet the device is to have a very long lifetime in the field?” That is a problem which will probably haunt you in your future, so you might as well begin to think about it now and watch for clues for rational behavior on your part when your time comes and you are on the receiving end of some life tests.
Let me turn now to some simpler aspects of measurements. For example, a friend of mine at Bell Telephone Laboratories, who was a very good statistician, felt some data he was analyzing was not accurate. His arguments with the department head that the data should be measured again got exactly nowhere, since the department head was sure his people were reliable, and furthermore the instruments had brass labels on them saying they were that accurate. Well, my friend came in one Monday morning and said he had left his briefcase on the railroad train going home the previous Friday and had lost everything. There was nothing else the department head could do but call for remeasurements, whereupon my friend produced the original records and showed how far off they were! It did not make him popular, but it did expose the inaccuracy of the measurements, which were going to play a vital role at a later stage.
The same statistician friend was once making a study for an outside company on the patterns of phone calling of their headquarters. The data was being recorded by exactly the same central office equipment which was placing the calls and writing the bills for making the calls. One day he chanced to notice one call was to a nonexistent central office! So he looked more closely, and found a very large percentage of the calls were being connected for some minutes to nonexistent central offices! The data was being recorded by the same machine which was placing the calls, but there was bad data anyway. You cannot even trust a machine to gather data about itself correctly!
My brother, who worked for many years at the Los Angeles air pollution department, once said to me they had found it necessary to take apart, reassemble, and recalibrate every new instrument they bought! Otherwise they would have endless trouble with accuracy, and never mind the claims made by the seller!
I once did a large inventory study for Western Electric. The raw data they supplied was for 18 months of inventory records on something like 100 different items in inventory. I asked the natural question of why I should believe the data was consistent—for example, could not the records show a withdrawal when there was nothing in inventory? They claimed they had thought of that and had in fact gone through the data and added a few pseudo-transactions so such things would not occur. Like a fool I believed them, and only late in the project did I realize there were still residual inconsistencies in the data, and hence I had first to find them, then eliminate them, and then run the data all over again. From that experience I learned never to process any data until I had first examined it carefully for errors. There have been complaints that I would take too long, but almost always I found errors, and when I showed the errors to them they had to admit I was wise in taking the precautions I did. No matter how sacred the data and urgent the answer, I have learned to pretest it for consistency and outliers at a minimum.
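What such a pretest can look like, as a minimal sketch with an invented record format and invented thresholds rather than anything from the Western Electric study:

```python
# A minimal sketch of the kind of pretest described above; the record format
# and thresholds are made up for illustration.

def pretest(transactions, opening_stock):
    """Flag impossible withdrawals and crude outliers before any analysis."""
    problems, stock = [], opening_stock
    quantities = [abs(q) for _, q in transactions]
    typical = sorted(quantities)[len(quantities) // 2]   # a rough typical size

    for i, (kind, qty) in enumerate(transactions):
        if kind == "withdraw" and qty > stock:
            problems.append((i, "withdrawal exceeds stock on hand"))
        if abs(qty) > 10 * typical:
            problems.append((i, "quantity is an outlier"))
        stock += qty if kind == "deposit" else -qty
    return problems

records = [("deposit", 50), ("withdraw", 20), ("withdraw", 45), ("deposit", 5000)]
print(pretest(records, opening_stock=0))
# -> flags record 2 (impossible withdrawal) and record 3 (outlier)
```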
This is not unusual. I very recently saw a table of measurements of Hubble’s constant (the slope of the line connecting the red shift with distance), which is fundamental to most of modern cosmology. Most of the values fell outside of the given errors announced for most of the other values.
By direct statistical measurement, therefore, the best physical constants in the tables are not anywhere near as accurate as they claim to be. How can this be? Carelessness and optimism are two major factors. Long meditation also suggests the present experimental techniques you are taught are also at fault and contribute to the errors in the claimed accuracies. Consider how you, in fact as opposed to theory, do an experiment. You assemble the equipment and turn it on, and of course the equipment does not function properly. So you spend some time, often weeks, getting it to run properly. Now you are ready to gather data, but first you fine-tune the equipment. How? By adjusting it so you get consistent runs! In simple words, you adjust for low variance; what else can you do? But it is this low-variance data you turn over to the statistician and is used to estimate the variability. You do not supply the correct data from the correct adjustments—you do not know how to do that—you supply the low-variance data, and you get from the statistician the high reliability you want to claim! That is common laboratory practice! No wonder the data is seldom as accurate as claimed.
I offer you Hamming’s rule:
90% of the time, the next independent measurement will fall outside the previous 90% confidence limits!
This rule is in fact a bit of an exaggeration, but stated that way it is a memorable rule to recall—most published measurement accuracies are not anywhere near as good as claimed. It is based on a lifetime of experience and represents later disappointments with claimed accuracies. I have never applied for a grant to make a properly massive study, but I have little doubt as to the outcome of such a study.
Another curious phenomenon you may meet is that in fitting data to a model, there are errors in both the data and the model. For example, a normal distribution may be assumed, but the tails may in fact be larger or smaller than the model predicts, and possibly no negative values can occur, although the normal distribution allows them. Thus there are two sources of error. As your ability to make more accurate measurements increases, the error due to the model becomes an increasing part of the error.
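A small sketch of the point with made-up data: the measurements below are strictly positive and skewed, yet the fitted model is a normal distribution, so part of the total error belongs to the model rather than to the data.

```python
# Sketch with made-up data: positive, skewed measurements fitted by a normal
# model; the fitted model even puts probability on impossible negative values.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.75, size=10_000)   # positive, heavy-tailed

mu, sigma = data.mean(), data.std()

# Probability mass the fitted normal assigns to values below zero.
p_negative = 0.5 * (1.0 + erf((0.0 - mu) / (sigma * sqrt(2.0))))
print(f"fitted normal puts {p_negative:.1%} of its mass on impossible values")
```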
I recall an experience I had while I was on the board of directors of a computer company. We were going to a new family of computers and had prepared very careful estimates of costs of all aspects of the new models. Then a salesman estimated that if the selling price were so much he could get orders for ten, if another price 15, and another 20 sales. His guesses, and I do not say they were wrong, were combined with the careful engineering data to make the decision on what price to charge for the new model! Much of the reliability of the engineering guesses was transferred to the sum, and the uncertainty of the salesman’s guesses was ignored. That is not uncommon in big organizations. Careful estimates are combined with wild guesses, and the reliability of the whole is taken to be the reliability of the engineering part. You may justly ask why bother with making the accurate engineering estimates when they are to be combined with other inaccurate guesses, but that is widespread practice in many fields!
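A back-of-the-envelope calculation, with numbers of my own choosing, shows why the practice is hollow. If the independent uncertainties add in quadrature, then

\[
\sigma_{\text{total}} \;=\; \sqrt{\sigma_{\text{eng}}^2 + \sigma_{\text{sales}}^2} \;\approx\; \sqrt{1^2 + 10^2} \;\approx\; 10.05,
\]

so an engineering estimate good to ±1 combined with a sales guess good to ±10 is, as a whole, good to about ±10; the careful part contributes almost nothing to the reliability of the total.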
My favorite example from Morgenstern's book on the accuracy of economic observations is the official figures on the gold flow from one country to another, as reported by both sides. The figures can differ at times by more than two to one! If they cannot get the gold flow right, what data do you suppose is right? I can see how electrical gear shipped to a third-world country might get labeled as medical gear because of different import duties, but gold is gold, and is not easily called anything else.
Morgenstern points out that at one time DuPont Chemical held about 23% of the General Motors stock. How do you suppose this appeared when the gross national product (gnp) figure was computed? Of course it was counted twice!
As an example I found for myself, there was a time, not too long ago, when the tax rules for reporting inventory holdings were changed, and as a result many companies changed their methods of inventory reporting to take advantage of the new rules, meaning they could now show smaller inventory and hence pay less tax. I watched in vain in the Wall Street Journal to see if this point was ever mentioned. No, so far as I saw, it never was! Yet inventory holdings are one of the main indices used to estimate the expectations of the manufacturers, whether we are headed up or down in the economy. The argument goes that when manufacturers think sales will go down they decrease inventory, and when they expect sales to go up they increase inventory so they will not miss some sales. That the legal rules for reporting inventory had changed, and that this change was part of what lay behind the measured numbers, was never mentioned, so far as I could see.
This is a problem in all time series. The definition of what is being measured is constantly changing. For perhaps the best example, consider poverty. We are constantly upgrading the level of poverty, hence it is a losing game trying to remove it—they will simply change the definition until there are enough people below the poverty level to continue the projects they manage! What is now called “poverty” is in many respects better than what the kings of England had not too long ago!
In a Navy a yeoman is not the same yeoman over the years, and a ship is not a ship, etc., hence any time series you study to find the trends of the Navy will have this extra factor to confound you in your interpretations. Not that you should not try to understand the situation using past data (and while doing it apply some sophisticated signal processing, Chapters 14–17), but there are still troubles awaiting you due to changing definitions which may never have been spelled out in any official records! Definitions have a habit of changing over time without any formal statement of this fact.
The forms of the various economic indices you see published regularly, including unemployment (which does not distinguish between the unemployed and the unemployable, but in my opinion should), were made up, usually, long ago. Our society has in recent years changed rapidly from a manufacturing to a service society, but neither Washington, DC, nor the economic indicators have realized this to any reasonable extent. Their reluctance to change the definitions of the economic indicators is based on the claim that a change, as indicated in the above paragraph, makes the past non-comparable to the present—better to have an irrelevant indicator than an inconsistent one, so they claim. Most of our institutions (and people) are slow to react to changes such as the shift to service from manufacturing, and even slower to ask themselves how what they were doing yesterday should be altered to fit tomorrow. Institutions and people prefer to go along smoothly, and hence lag far behind, than to make the effort to be reasonably abreast of the times. Institutions, like people, tend to move only when forced to.
If you add to the above the simple fact that most economic data is gathered for other purposes and is only incidentally available for the economic study made, and there are often strong reasons for falsifying the initial data which is reported, then you see why economic data is bad.
What can the government economists use for their basic data other than much of this inaccurate, systematically biased data? Yes, they may to a lesser or greater extent be aware of the biases, but they have no way of knowing how much the data is in error. So it should not surprise you that many economic predictions are seriously wrong. There is little else they can do, hence you should not put too much faith in their predictions.
In my experience, most economists are simply unwilling to discuss the basic inaccuracy in the economic data they use, and hence I have little faith in them as scientists. But who said economic science is a science? Only the economists!
If scientific and engineering data are not at all as accurate as they are said to be, by factors of five or more at times, and economic data can be worse, how do you suppose social science data fares? I have no comparable study of the whole field, but my little, limited experience does suggest it is not very good. Again, there may be nothing better available, but that does not mean what data is available is safe to use.
It should be clear I have given a good deal of attention to this matter of the accuracy of data during most of my career. Due to the attitudes of the experts I do not expect anything more than a slow improvement in the long future.
If the data is usually bad, and you find that you have to gather some data, what can you do to do a better job? First, recognize what I have repeatedly said to you: the human animal was not designed to be reliable; it cannot count accurately, it can do little or nothing repetitive with great accuracy. As an example, consider the game of bowling. All the bowler needs to do is throw the ball down the lane reliably every time. How seldom does the greatest expert roll a perfect game! Drill teams, precision flying, and such things are admired as they require the utmost in careful training and execution, and when examined closely leave a lot to be improved.
Second, you cannot gather a really large amount of data accurately. It is a known fact which is constantly ignored. It is always a matter of limited resources and limited time. The management will usually want a 100% survey when a small one consisting of a good deal less, say 1% or even 0.1%, will yield more accurate results! It is known, I say, but ignored. The telephone companies, in order to distribute the income to the various companies involved in a single long-distance phone call, used to take a very small, carefully selected sample, and on the basis of this sample they distributed the money among the partners. The same is now done by the airlines. It took them a long while before they listened, but they finally came to realize the truth that small samples carefully taken are better than large samples poorly done—better both in lower cost and in greater accuracy.
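To make the point concrete, here is a minimal simulation sketch, with invented numbers, of a small unbiased sample against a huge sample gathered with a slight systematic bias toward “yes” answers; the parameters are mine, chosen only for illustration.

import random

random.seed(1)
TRUE_P = 0.30  # the true proportion we are trying to estimate

def survey(n, bias=0.0):
    # each respondent answers "yes" with probability TRUE_P + bias
    hits = sum(random.random() < TRUE_P + bias for _ in range(n))
    return hits / n

careful_small = survey(1_000)               # unbiased but small
sloppy_large = survey(100_000, bias=0.05)   # huge but systematically biased
print(f"true proportion     : {TRUE_P:.3f}")
print(f"small careful sample: {careful_small:.3f}")
print(f"large biased sample : {sloppy_large:.3f}")

The small sample is off only by its sampling error, a percentage point or two, while the large sample converges confidently to the wrong answer.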
Third, much social data is obtained via questionnaires. But it is a well-documented fact that the way the questions are phrased, the way they are ordered in sequence, the people who ask them or come along and wait for them to be filled out, all have serious effects on the answers. Of course, in a simple black-and-white situation this does not apply, but when you make a survey the situation is generally murky, or else you would not have to make it. I regret that I did not keep a survey the American Mathematical Society once made of its members. I was so indignant at the questions, which were framed to get exactly the answers they wanted, that I sent it back with that accusation. How few mathematicians, faced with questions carefully led up to in each case, such as “Is there enough financial support for mathematics, enough for publications, enough for graduate scholarships, etc.?” would say there was more than enough money available? The Mathematical Society of course used the results to claim there was a need for more support for mathematics in all directions.
I recently filled out a long, important questionnaire (important in the consequent management actions which might follow). I filled it out as honestly as I could, but realized I was not a typical respondent. Further thought suggested the class of people being surveyed was not homogeneous at all, but rather was a collection of quite different subclasses, and hence any computed averages will apply to no group. It is much like the famous remark that the average American family has two and a fraction children, but of course no family has a fractional child! Averages are meaningful for homogeneous groups (homogeneous with respect to the actions that may later be taken), but for diverse groups averages are often meaningless. As earlier remarked, the average adult has one breast and one testicle, but that does not represent the average person in our society.
When the range of responses is highly skewed, it is now publicly admitted that the median is often preferable to the average (mean) as an indicator. Thus the median income and the median price of houses are now often published, rather than the average amounts.
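A quick numerical sketch of the point about skewed data follows; the household incomes are invented purely for illustration, and a single very large value is enough to drag the mean far from the typical case while barely moving the median.

import statistics

# invented household incomes; one very large value skews the mean upward
incomes = [32_000, 41_000, 45_000, 52_000, 58_000, 64_000, 71_000, 2_500_000]
print(f"mean income  : {statistics.mean(incomes):,.0f}")
print(f"median income: {statistics.median(incomes):,.0f}")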
Fourth, there is another aspect I urge you to pay attention to. I have said repeatedly that the presence of a high-ranking officer of an organization will change what is happening in the organization at that place and at that time, so while you are still low enough to have a chance, please observe for yourself how questionnaires are filled in. I had a clear demonstration of this effect when I was on the board of directors of a computer company. I saw underlings did what they thought would please me, but in fact angered me a good deal, though I could say nothing to them about it. Those under you will often do what they think you want, and often it is not at all what you want! I suggest, among other things, you will find that when headquarters in your organization sends out a questionnaire, those who think they will rate highly will more often than not promptly fill them out, and those who do not feel so will tend to delay, until there is a deadline, and then some low-level person will fill them out from hunches, without taking the measurements which were to be taken—it is too late to do it right, so send in what you can! What these “made-up” reports do to the reliability of the whole is anyone’s guess. It may make the results too high, too low, or even not change the results much. But it is from such surveys the top management must make their decisions—and if the data is bad, it is likely the decisions will be bad.
A favorite pastime of mine, when I read or hear about some data, is to ask myself how people could have gathered it—how their conclusions could be justified. For example, years ago when I was remarking on this point at a dinner party, a lovely widow said she could not see why data could not be gathered on any topic. After some moments of thought I replied, “How would you measure the amount of adultery per year on the Monterey Peninsula?” Well, how would you? Would you trust a questionnaire? Would you try to follow people? It seems difficult, and perhaps impossible, to make any reasonably accurate estimate of the amount of adultery per year. There are many other things like this which seem to be very hard to measure, and this is especially true in social relationships.
There is a clever proposed method whose effectiveness I do not know in practice. Suppose you want to measure the amount of murder which escapes detection. You interview people and tell them to toss a coin without anyone but themselves seeing the outcome, and then if it is heads they should claim they have committed a murder, while if it is tails they should tell the truth. In the arrangement there is no way anyone except themselves can know the outcome of the toss, hence no way they can be accused of murder if they say so. From a large sample the slight excess of murders above one-half gives the measure you want. But that supposes the people asked, and given protection, will in fact respond accurately. Variations on this method have been discussed widely, but a serious study to find the effectiveness is still missing, so far as I know.
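Here is a small sketch of the arithmetic behind the coin-toss scheme. Since half the respondents answer “yes” regardless of the truth, the fraction of “yes” answers should be one half plus half the true prevalence, so doubling the excess over one half recovers an estimate. The simulation below, with an assumed true rate, is only an illustration of the estimator, not a tested survey instrument.

import random

def randomized_response_estimate(true_p, n, seed=0):
    # heads -> answer "yes" regardless; tails -> answer truthfully
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        guilty = rng.random() < true_p
        heads = rng.random() < 0.5
        if heads or guilty:
            yes += 1
    # P(yes) = 1/2 + p/2, so p is estimated by 2 * (fraction of yes - 1/2)
    return 2.0 * (yes / n - 0.5)

print(f"estimated prevalence: {randomized_response_estimate(true_p=0.02, n=200_000):.4f}")

Note how large the sample must be before the slight excess over one half stands out from the sampling noise.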
In closing, you may have heard of the famous election in which the newspapers announced one man the winner of the presidency when in fact the other man won by a landslide. There is also the famous Literary Digest poll, whose sample was drawn largely from telephone directories and which was amazingly wrong—so far wrong that the Literary Digest folded soon after, some people say because of this faulty poll. It has been claimed that at that time the ownership of a telephone was correlated with wealth, and wealth with a political party, hence the error.
Surveys are not a job for an amateur to design, administer, and evaluate. You need expert advice (not just a run-of-the-mill statistician) whenever you get involved with questionnaires, but there seems to be little hope questionnaires can be avoided. More and more we want surveyed not mere facts about hard material things but social and other attitudes—and this is indeed very treacherous ground.
In summary, as you rise in your organization you will need more and more of this kind of information, more than was needed in the past, since we are becoming more socially oriented and subject to lawsuits over trivial things. You will be forced, again and again, to make surveys of the personal attitudes of people, and it is for these reasons I have spent so much time on the topic of unreliable data. You need reliable data to make reliable decisions, but you will seldom have it with any reliability!
Parables are often more effective than is a straight statement, so let me begin with a parable.
If, on the average campus, you asked a sample of professors what they were going to do in the next class hour, you would hear they were going to “teach partial fractions,” “show how to find the moments of a normal distribution,” “explain Young’s modulus and how to measure it,” etc. I doubt you would often hear a professor say, “I am going to educate the students and prepare them for their future careers.”
You may claim in both cases the larger aim was so well understood there was no need to mention it, but I doubt you really believe it. Most of the time each person is immersed in the details of one special part of the whole and does not think of how what they are doing relates to the larger picture. It is characteristic of most people to keep a myopic view of their work and seldom, if ever, connect it with the larger aims they will admit, when pressed hard, are the true goals of the system. This myopic view is the chief characteristic of a bureaucrat. To rise to the top you should have the larger view—at least when you get there.
Systems engineering is the attempt to keep at all times the larger goals in mind and to translate local actions into global results. But there is no single larger picture. For example, when I first had a computer under my complete control, I thought the goal was to get the maximum number of arithmetic operations done by the machine each day. It took only a little while before I grasped the idea that it was the amount of important computing, not the raw volume, that mattered. Later I realized it was not the computing for the mathematics department, where I was located, but the computing for the research division which was important. Indeed, I soon realized that to get the most value out of the new machines it would be necessary to get the scientists themselves to use the machine directly, so they would come to understand the possibilities computers offered for their work; less raw number crunching might then get done, but presumably more of the computing done would be valuable to Bell Telephone Laboratories. Still later I saw I should pay attention to all the needs of the Laboratories, and not just the research department. Then there was AT&T, and outside AT&T the country, the scientific and engineering communities, and indeed the whole world to be considered. Thus I had obligations to myself, to the department, to the division, to the company, to the parent company, to the country, to the world of scientists and engineers, and to everyone. There was no sharp boundary I could draw and simply ignore everything outside.
The obligations in each case were of (1) immediate importance, (2) longer-range importance, and (3) very long-term importance. I also realized that under (2) and (3) one of my functions in the research department was not so much to solve the existing problems as to develop the methods for solving problems, to expand the range of what could be done, and to educate others in what I had found so they could continue, extend, and improve my earlier efforts.
In systems engineering it is easy to say the right words, and many people have learned to say them when asked about systems engineering, but as in many sports such as tennis, golf, and swimming, it is hard to do the necessary things as a whole. Hence systems engineers are to be judged not by what they say but by what they produce. There are many people who can talk a good game but are not able to play one.
The first rule of systems engineering is:
If you optimize the components, you will probably ruin the system performance.
This is a very difficult point to get across. It seems so reasonable: if you make an isolated component better, then the whole system will be better—but this is not true; rather, the system performance will probably degrade! As a simple example, I was running a differential analyzer and was so successful in solving important problems that there was need for both a bigger one and a second one. Therefore we ordered a second one, which was to be connected with the first so the two could be either operated separately or together. They built a second model and wanted to make improvements, which I agreed to only if it would not interfere with the operation of the whole machine. Came the day of acceptance on the shop floor before dismantling and moving it to our location. I started to test it with the aid of a reluctant friend who claimed I was wasting time. The first test, and it failed miserably! The test was the classic one of solving the differential equation
y'' + y = 0, with the initial conditions y(0) = 1 and y'(0) = 0, whose solution is, of course, y = cos t. You then plot y(t) against y'(t) and you should get a circle. How well it closes on itself, loop after loop, is a measure of the accuracy.
So we tried the test with other components, and got the same result. My friend had to admit there was something seriously wrong, so we called in the people who constructed it and pointed out the flaw—which was so simple to exhibit they had to admit there was something wrong. They tinkered and tinkered while we watched, and finally my friend and I went to lunch together. When we came back they had located the trouble. They had indeed improved the amplifiers a great deal, but now the currents through the inadequate grounding were causing back-circuit leakage! They had merely to put in a much heavier copper grounding and all was well. As I said, the improvement of a component in such a machine, even where each component is apparently self-standing, still ruined the system performance! It is a trivial example, but it illustrates the point of the rule. Usually the effect of the component improvement is less dramatic and clear-cut, but equally detrimental to the performance of the whole system.
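For readers who want to see the closure test without an analog machine, here is a minimal digital sketch: integrate y'' = -y with a deliberately crude fixed-step method and watch how far the (y, y') point drifts from its start after each loop. The choice of Euler's method and the step size are arbitrary illustrations, not Hamming's setup.

import math

def euler_step(y, v, h):
    # one crude explicit Euler step for y' = v, v' = -y; its error accumulates visibly
    return y + h * v, v - h * y

def closure_gaps(h=0.001, periods=5):
    # start at y(0) = 1, y'(0) = 0 and record how far the (y, y') point is
    # from its starting place after each (approximate) period of 2*pi
    y, v = 1.0, 0.0
    steps_per_period = int(round(2 * math.pi / h))
    gaps = []
    for _ in range(periods):
        for _ in range(steps_per_period):
            y, v = euler_step(y, v, h)
        gaps.append(math.hypot(y - 1.0, v))
    return gaps

for loop, gap in enumerate(closure_gaps(), start=1):
    print(f"after loop {loop}: closure gap = {gap:.5f}")

A better integrator, like a better amplifier, shrinks the gap; the point of the story is that the whole machine, not any single component, determines how well the circle closes.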
You probably still do not believe the statement, so let me apply this rule to you. Most of you try to pass your individual courses by cramming at the end of the term, which is to a great extent counter-productive, as you well know, to the total education you need. You look at your problem as passing the courses one at a time, or a term at a time, but you know in your hearts that what matters is what you emerge with at the end, and what happens at each stage is not as important. During my last two undergraduate college years when I was at the University of Chicago, the rule was that at the end you had to pass a single exam based on nine courses in your major field, and another exam based on six in your minor field, and these were mainly what mattered, not what grades you got along the way. I, for the first time, came to understand what the system approach to education means. While taking any one course, it was not a matter of passing it, pleasing the professor, or anything like that; it was learning it so that at a later date, maybe two years later, I would still know the things which should be in the course.
Cramming is clearly a waste of time. You really know it is, but the behavior of most of you is a flat denial of this truth. So, as I said above, words mean little in judging a systems engineering job; it is what is produced that matters. The professors believe, as do those who are paying the bill for your education, and probably some of you also, that what is being taught will probably be very useful in your later careers, but you continue to optimize the components of the system to the detriment of the whole! Systems engineering is a hard trade to follow; it is so easy to get lost in the details! Easy to say; hard to do. This example should show you the reality of my remark that many people know the words but few can actually put them into practice when the time comes for action in the real world. Most of you cannot!
As another example of the effects of optimizing the components of a system, consider the teaching of the lower-level mathematics courses in college. Over the years we have optimized both the calculus course and linear algebra, and we have stripped out anything not immediately relevant to each course. As a result the teaching of mathematics, viewed as a whole, has large gaps.
And so it goes, large parts of any reasonable mathematical education being omitted in the urge to optimize the individual courses. Usually the inner structure of the calculus and the central role of the limit are glossed over as not essential.
All the proposed reformations of the standard calculus course that I have examined, and there are many, never begin by asking, “What is the total mathematical education, and what, therefore, should be in the calculus course?” They merely try to include computers, or some such idea, without examining the system of total mathematical education which the course should be a part of. The systems approach to education is not flourishing; rather, the enthusiasts of various aspects try to mold things to fit their local enthusiasms. The question “What is the total problem in which this part is to fit?” is, as in so many situations, simply regarded as too big, and hence the suboptimization of the courses goes on. Few people who set out to reform any system try first to find out the total system problem, but rather attack the first symptom they see. And, of course, what emerges is whatever it is, and is not what is needed.
I recently tried to think about the history of systems engineering—and just because a system is built, it does not follow that the builder had the system rather than the components in mind. The earliest system I recall reading about in its details is the Venetian arsenal in its heyday around 1200–1400. They had a production line, and as a new ship came down the line, the ropes, masts, sails, and finally the trained crew were right there when needed and the ship sailed away! At regular intervals another ship came out of the arsenal. It was an early “just in time” production line which included the people, properly trained, as well as the equipment.
The early railroads were surely systems, but it is not clear to me that the first builders thought of them as systems; more likely they tried to get each part optimized, and only after the whole was going did they realize there was a system to consider—how the parts would intermesh to attain a decent operating system.
I suspect it was the telephone company which first had to really face the problems of systems engineering. If decent service was to be supplied, then all the parts had to interconnect and work at a very high reliability per part. From the first the company provided a service, not just equipment. That is a big difference. If you merely construct something and leave it to others to keep it running, it is one thing; if you are also going to operate it as a service, then it is another thing entirely! Others had clearly faced small systems as a whole, but the telephone system was larger and more complex than anything up to that point. They also found, perhaps for the first time, that in expanding there is not an economy of scale but a diseconomy; each new customer must be connected with all the previous customers, and each new one is therefore a larger expense, hence the system must be very shrewdly designed.
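As a back-of-the-envelope sketch of that diseconomy (in the idealized picture where every customer is wired directly to every other; real telephone plants use switching offices, but the incremental-cost intuition is similar), the number of direct links among n customers grows as n(n - 1)/2, so each new customer costs more to connect than the one before.

# total links among n fully interconnected customers is n * (n - 1) / 2;
# the n-th customer alone adds n - 1 new links
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} customers: {n * (n - 1) // 2:>12,} links in all; "
          f"the last customer added {n - 1:>6,}")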
That brings up another point, which is now well recognized in software for computers but which applies to hardware too. Things change so fast that part of the system design problem is that the system will be constantly upgraded in ways you do not now know in any detail! Flexibility must be part of the modern design of things and processes. Flexibility built into the design means not only will you be better able to handle the changes which come after installation, but you will also be better able to absorb the small changes which inevitably arise both in the later stages of design and in the field installation of the system. I had not realized how numerous these field changes were until the early Nike field test at Kwajalein Island. We were installing it, and still there was a constant stream of field changes going out to them!
Thus rule two:
Part of systems engineering design is to prepare for changes so they can be gracefully made and still not degrade the other parts.
Rule three:
The closer you meet specifications, the worse the performance will be when overloaded.
The truth of this is obvious when building a bridge to carry a certain load; the slicker the design to meet the prescribed load, the sooner the collapse of the bridge when the load is exceeded. One sees this also in a telephone central office; when you design the system to carry the maximum load, then with a slight overload of traffic, performance degrades immediately. Hence good design generally includes the graceful decay of performance when the specifications are exceeded.
Westerman’s list shows clearly his breadth of vision, which arose from many years on both military projects and telephone systems problems.
He believes more in a group attack on systems engineering problems than in an individual attack, whereas I, from my limited experience in computing, where I had no one nearby to talk to about the proper use of computers, had to do it single-handed. Of course his problems were far more difficult than mine.
He believes specialists brought together to make a team are the basis of systems engineering, and between jobs they must go back to their specialties to maintain their expertise. Using the group too often to fight fires is detrimental in the long run, since then the individuals do not keep their skills honed up in their areas.
We both agree a systems engineering job is never done. One reason is that the presence of the solution changes the environment and produces new problems to be met. For example, while running the computing center in the early days I came to the belief that small problems were relatively more important than large ones; regular, dependable service was a desirable thing. So I instituted a one-hour period in each morning and each afternoon during which only three-minute (or less) problems were to be run (mainly program testing), and if you ran over five minutes you got off the machines no matter how much you had claimed you were practically finished. Well, people with ten-minute problems broke them up into three small pieces, with different people for each piece, and ran them under the rules—thus increasing the load in the input/output facilities. My solution’s very presence altered the system’s response. The optimal strategy for the individual was clearly opposed to the optimal strategy for the whole of the laboratories, and it is one of the functions of the systems engineer to block most of the local optimization of the individuals of the system and reach for the global optimization for the system.
A second reason the systems engineer’s design is never completed is that the solution offered to the original problem usually produces both deeper insight and dissatisfactions in the engineers themselves. Furthermore, while the design phase continually goes from proposed solution to evaluation and back again and again, there comes a time when this process of refinement must stop and the real problem be coped with—thus giving what they realize is, in the long run, a suboptimal solution.
Westerman believes, as I do, that while the client has some knowledge of his symptoms, he may not understand the real causes of them, and it is foolish to try to cure the symptoms only. Thus while the systems engineers must listen to the client, they should also try to extract from the client a deeper understanding of the phenomena. Therefore, part of the job of a systems engineer is to define, in a deeper sense, what the problem is and to pass from the symptoms to the causes.
Just as there is no definite system within which the solution is to be found, and the boundaries of the problem are elastic and tend to expand with each round of solution, so too there is often no final solution, yet each cycle of input and solution is worth the effort. A solution which does not prepare for the next round with some increased insight is hardly a solution at all.
I suppose the heart of systems engineering is the acceptance that there is neither a definite fixed problem nor a final solution, rather evolution is the natural state of affairs. This is, of course, not what you learn in school, where you are given definite problems which have definite solutions.
How, then, can the schools adapt to this situation and teach systems engineering, which, because of the elaboration of our society, becomes ever more important? The idea of a laboratory approach to systems engineering is attractive until you examine the consequences. The systems engineering described above depends heavily on the standard school teaching of definite techniques for solving definite problems. The new element is the formulation of a definite problem from the background of indefiniteness which is the basis of our society. We cannot omit the traditional training, and the schools have neither the time nor the resources, except in unusual cases, to take on the new topic of systems engineering. I suppose the best that can be done is regular references to how the classroom solutions we teach differ from the reality of systems engineering.
If you will look back on these chapters you will find a great deal of just this—the stories were often about systems engineering situations which were greatly simplified. I suppose I am a dedicated systems engineer and it is inevitable I will always lean in that direction. But let me say again, systems engineering must be built on a solid ground of classical simplification to definite problems with definite solutions. I doubt it can be taught ab initio.
Let me close with the observation that I have seen many, many solutions offered which solved the wrong problem correctly. In a sense systems engineering is trying to solve the right problem, perhaps a little wrongly, but with the realization that the solution is only temporary and later on, during the next round of design, these accepted faults can be caught, provided insight has been obtained. I said it before, but let me say it again: a solution which does not provide greater insight than you had when you began is a poor solution indeed, but it may be all that you can do given the time constraints of the situation. The deeper, long-term understanding of the nature of the problem must be the goal of the systems engineer, whereas the client always wants prompt relief from the symptoms of his current problem. Again, a conflict leading to a meta-systems engineering approach!
Systems engineering is indeed a fascinating profession, but one which is hard to practice. There is a great need for real systems engineers, as well as perhaps a greater need to get rid of those who merely talk a good story but cannot play the game effectively.
You may think the title means that if you measure accurately you will get an accurate measurement, and if not then not, but it refers to a much more subtle thing—the way you choose to measure things controls to a large extent what happens. I repeat the story
The current popular example of this effect is the use of the bottom line of the profit and loss statement every quarter to estimate how well a company is doing, which produces a company interested mainly in short-term profits and with little regard for long-term profits.
If in a rating system everyone starts out at 95%, then there is clearly little a person can do to raise their rating, but much which will lower the rating; hence the obvious strategy of the personnel is to play things safe, and thus eventually rise to the top. At the higher levels, much as you might want to promote for risk taking, the class of people from whom you may select is mainly conservative!
The rating system in its earlier stages may tend to remove exactly those you want at a later stage.
Were you to start with a rating system in which the average person rates around 50%, then it would be more balanced; and if you wanted to emphasize risk taking, then you might start at the initial rating of 20% or less, thus encouraging people to try to increase their ratings by taking chances, since there would be so little to lose if they failed and so much to gain if they succeeded. To get risk taking in an organization you must encourage a reasonable degree of it at the early stages, and promote for it, so that finally some risk-takers can emerge at the top.
Of the things you can choose to measure some are hard, firm measurements, such as height and weight, while some are soft, such as social attitudes. There is always a tendency to grab the hard, firm measurement, though it may be quite irrelevant as compared to the soft one, which in the long run may be much more relevant to your goals. Accuracy of measurement tends to get confused with relevance of measurement, much more than most people believe. That a measurement is accurate, reproducible, and easy to make does not mean it should be done; instead, a much poorer one which is more closely related to your goals may be much preferable. For example, in school it is easy to measure training and hard to measure education, and hence you tend to see on final exams an emphasis on the training part and a great neglect of the education part.
In giving a final exam in a course, say in the calculus, I can get almost any distribution of grades I want. If I could make up an exam which was uniformly hard, then each student would tend to either get all the answers right or all wrong. Hence I will get a distribution of grades which peaks up at both ends, Figure 29.1. If, on the contrary, I asked a few easy questions, many moderately hard, and a few very hard ones, I would get the typical normal distribution; a few at each end and most of the grades in the middle, Figure 29.2. It should be obvious that if I know the class, then I can get almost any distribution I want. Usually at the final exam time I am most worried about the pass-fail dividing point, and design the exam so I will have little doubt as to how to act, as well as have the hard evidence in case of a complaint.
If you regard giving grades in a course as a communication channel, then, as just noted, the equally frequent use of all the grades will communicate the maximum amount of information—whilst the typical use in graduate schools of mainly the two highest grades, A and B, greatly reduces the amount of information sent. I understand the Naval Academy uses rank in class, and in some sense this is the only defense against “grade inflation” and the failure to use the whole dynamic range of the scale uniformly, thus communicating the maximum amount of information, given a fixed alphabet for grades. The main fault with using rank as the grade is that by chance there may be all very good people in a particular class, but someone among them will have to be at the bottom!
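A small sketch of the channel view follows, using Shannon's entropy as the measure of information carried per grade; the two grade distributions are invented for illustration.

import math

def entropy_bits(counts):
    # Shannon entropy, in bits per grade, of a grade distribution
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

uniform = {"A": 20, "B": 20, "C": 20, "D": 20, "F": 20}
mostly_a_b = {"A": 55, "B": 40, "C": 4, "D": 1, "F": 0}
print(f"all grades used equally: {entropy_bits(uniform):.2f} bits per grade")
print(f"mostly A's and B's     : {entropy_bits(mostly_a_b):.2f} bits per grade")

Equal use of the five grades carries log2(5), about 2.3 bits per grade, while the compressed A-and-B habit carries roughly half that.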
There is also the matter of how you initially attract people to the field. It is easy to see in psychology that the people who enter the field are mixed up in their heads more than the average professor and average student in a college—it is not so much the courses that do this, though I suspect they help to mix the student up further, but the initial selection does it. Similarly, the hard and soft sciences have their attractions and repulsions based on initially perceived features of the fields, and not necessarily on the actual features of the field. Thus people tend to go into the fields which will favor their peculiarities, as they sense them, and then once in the field these features are often further strengthened. Result: poorly balanced, but highly specialized, people—which may often be necessary to succeed in the present situation.
In mathematics, and in computer science, a similar effect of initial selection happens. In the earlier stages of mathematics up through the calculus, as well as in computer science, grades are closely related to the ability to carry out a lot of details with high reliability. But later, especially in mathematics, the qualities needed to succeed change; it is the proving of theorems, the patterns of reasoning, and the ability to conjecture new results, new theorems, and new definitions which matter. Still later it is the ability to see a field as a whole, and not as a lot of fragments. But the grading process has earlier, to a great extent, removed many of those you might want, and who are indeed needed, at the later stage! It is very similar in computer science, where the ability to cope with the mass of programming details favors one kind of mind, one which is often negatively correlated with seeing the bigger picture.
There is also the vicious feature of promotion in most systems. At the higher levels the current members choose the next generation—and they tend strongly to select people like themselves, people with whom they will feel comfortable. The board of directors of a company has a strong control over the officers and next board members who are put up for election (the results of which are often more or less automatic). You tend to get inbreeding—but you also tend to get an organization personality. Hence the all too common method of promotion by self-selection at the higher levels of an organization has both good and bad features. This is still on the topic you get what you measure, as there is a definite matter of evaluation, and the criteria used, though unconscious, are still there.
In the distant past, to combat this inbreeding most mathematics departments (a topic I am more familiar with than with other departments) had a general rule that they did not employ their own graduates. The rule is not now widely applied so far as I can see—quite the contrary, there seems to be a tendency to hire their own graduates over outsiders. There have been several occasions when economics departments were so inbred the top management of the university had to step in and do the hiring over the professors’ dead bodies, as it were, in order to gain a reasonable balance of differing opinions in the university. The same has happened in psychology departments, in law, and no doubt in other departments.
As just mentioned, a rating system which allows those who are “in” to select the next generation has both good and bad features, and needs to be watched closely for too much inbreeding. Some inbreeding means a common point of view and more harmonious operation from day to day, but also it will probably not have great innovations in the future. I suspect in the future, where I believe change will be the normal state of things, this will become a more serious matter than it has been in the past—and it has definitely been a problem in the past!
I trust you realize I am not trying to be too censorious about things, rather I am trying to illustrate the topic of this chapter—you get what you measure. This is seldom thought about by people setting up a rating, measuring, or other schemes of recording things, and yet in the long run it has enormous effects on the entire system—usually in directions which they never thought about at all!
Although measuring is clearly bad when done poorly, there is no escape from making measurements, rating things, people, etc. Only one person can be the head of an organization at one time, and in the selection there has to be a reduction to a simple scale of rating so a comparison can be made. Never mind that humans are at least as complex as vectors, and probably even more complex than matrices or tensors of numbers; the complex human, plus the effect of the environment they operate in, must somehow be reduced to a simple measure which makes an ordered array of choices. This may be done internally in the mind, without the benefit of conscious thinking, but it must be done whether you believe in rating people or not—there is no escape in any society in which there are differences in rank, power to manage, or whatever other feature you wish. Even on a program of entertainment, there has to be a first and a last performer—all cannot be equally placed. You may hate to rate people, as I do, but it must be done regularly in our society, and in any society which is not exactly equal at all points this must happen very often. You may as well realize this and learn to do the job more effectively than most people do—they simply make a choice and go on, rather than giving the whole process a good deal of careful thought, as well as watching others doing it and learning from them.
By now you see, I hope, how the various scales of measurement affect what happens. They are fundamental, yet they normally receive very little attention. To strengthen what I have been saying, I will simply give you more examples of how the measurement scale affects the system.
Earthquakes are almost always measured on the Richter scale, which effectively uses the log of the estimated amount of energy in the earthquake. I am not saying this is the wrong measuring scale, but its effect is that you have a few really large earthquakes, 7 and 8, and lots of small ones, 1 and 2. Think about it. I do not know the distribution on Mother Nature’s scale, but I doubt she uses the Richter scale. Linear transformations, as from feet to meters, are not serious, but nonlinear scale transformations are another matter. Most of the time we measure stimuli applied to humans on a log scale, but for weight and height we use linear scales. Linear ones allow additivity easily, but for nonlinear scales you do not have this. For example, in measuring the size of a herd you are apt to count the number of animals in the herd. Thus you have additivity—adding two herds together gives the proper count of the combined herd. If you have a herd of 3 and add 3 that is one thing, but if you have a herd of 1,000 and add 3 it is quite another thing—hence the additivity of the number in the herd is not always the proper measure to use. In this case the percentage change might be more informative.
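A tiny numerical sketch of the herd example: the same absolute change of three animals is an enormous relative change for a small herd and a negligible one for a large herd, while on a log scale equal ratios, not equal counts, become equal steps.

import math

for herd in (3, 1_000):
    new = herd + 3
    print(f"herd of {herd:>5}: adding 3 is a {100 * 3 / herd:6.1f}% change, "
          f"a log10 step of {math.log10(new / herd):.4f}")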
How, then, do you decide which scale to use in measuring things? I have no easy answer. Indeed, I have the awful observation that while one scale of measurement is suitable for one kind of conclusion in a field, another scale of measurement may be more appropriate for some other kind of decision in exactly the same field! But how seldom is this recognized and used! Of course, you may observe that sometimes we quietly make a transformation when we apply a given formula, but which scale of measurement to use is a difficult thing to decide in any particular case. Much depends on the field and the existing theories, as well as the new theories you hope to find! All of which is not much help to you in any particular situation.
There is another matter I mentioned in an earlier chapter and must now come back to. It is the rapidity with which people respond to changes in a rating system. I told you how there was a constant battle between me and the users of the computer, with me trying to optimize the performance of the system as a whole and them trying to optimize their own use. Any change in the rating system you think will improve the system performance as a whole is apt not to work out well unless you have thought through the responses of the individuals to the change—they will certainly change their behavior. You have only to think of your own optimization of your careers, of how changes in the rating system in the past have altered some of your plans and strategies.
Some systems of measurement clearly have bad features, but tradition, and other niceties, keep them going. An example is the state of readiness of a branch of the military. In the Navy ships are inspected on a regular routine, one feature after another, and the skipper gets the ship and crew ready for each one, pretty much neglecting the others until they come up. The skipper scores high, to be sure. But when we run simulated war games, what is the true readiness of the fleet? Surely not what the reports say—as you can easily imagine. But what do we have to use? Of course we must use the reported figures—we would not be believed if we used other data! So we train people in war games to use an idealized fleet and not the real one! It is the same in business games; we train the executives to win in the simulated game, and not in the real world. I leave it to you to think about what you will do when you are in charge and want to know the true readiness of your organization. Will random inspections solve everything? No! But they would improve things a bit.
All organizations have this problem. You are now at the lower levels in your organization, and you can see for yourselves how things are reported and how the reports differ from reality—it will still be the same unless you, when you are in charge, change things drastically. The Air Force uses what are supposed to be random inspections, but as a retired Navy captain friend of mine once observed to me, every base commander has a radar and knows what is in the air, and if he is surprised by an inspection team then he must be a fool. But he has less time to prepare than for scheduled inspections, so presumably the inspection reports are closer to reality than when inspections only occur at times known far in advance. Yes, inspections are measurements, and you get what you measure. It is often only a little different in other organizations—the news of a coming measurement (inspection) gets out on the grapevine of gossip, and the receiver, while pretending to be surprised, has often prepared that very morning for it.
Another thing which is obvious, but seems necessary to mention: the popularity of a form of measurement has little relationship to its accuracy or relevance to the organization.
Still another thing to mention is that all up and down the organization each person is bending things so they themselves will look good—so they think! About the only thing which saves top management is that the various lower levels can each only bend things a bit, and often the various levels have different goals, and hence the many bendings of the truth tend to partially annul each other due to the weak law of large numbers. If the whole organization is working together to fool the top, there is little the top can do about it. When I was on a board of directors I was so conscious of this I frequently came either a day early or else stayed a day late and simply wandered around asking questions, looking, and asking myself if things were as reported. For example, once when inventory was very high due to the change in the line of computers we were producing, which forced us to have parts of both lines on hand at the same time, I walked along, suddenly turned towards the supply crib, and simply walked in. I then eyed things to decide if, in my own mind, there was any great discrepancy or the reported amounts were reasonably accurate.
Again, were the computing machines we were supposed to be shipping actually on the loading dock, or were they mythical—as has happened in many a company? Nosing around I found at the end of each quarter the machines to be shipped were really shipped, but often by the process of scavenging the later machines on the production line, and hence the next few weeks were spent in getting the scavenged machines back to proper state. I never could stop that bad habit of the employees, though I was on the board of directors! If you will but look around in your organization you will find lots of strange things which really should not happen, but are regarded as customary practice by the personnel.
Another strange thing that happens is that what at one level is regarded as one thing is differently regarded at a higher level. For example, it often happens that the evaluations of capability of the organization at one level are interpreted as probabilities at a higher level! Why does this happen? Simply because the lower level cannot deliver what the higher one wants, and hence delivers what it can do, and the higher level willfully, because it wants its numbers, chooses to alter the meaning of the reports.
I have already discussed the matter of life tests—what can be done and what is needed are not the same at all! At the moment we do not know how to deliver what is needed: reliability for years of operation at a high level of confidence for parts which were first delivered to us yesterday. That problem will not go away, but a lot can be done to design into things the needed reliability. One of my first problems at Bell Telephone Laboratories was the design of a series of concentric rings of copper and ceramic such that, for the chosen radii, as temperatures changed the ceramic would always be in compression and never in tension, where it has little strength. The design has a degree of reliability built into it! Too little has been done in this direction in my opinion, but as I remarked before when they said there was no time to do it: “There is never time to do the job right, but there is always time to fix things later.”
There are rating systems that have built into them a degree of human judgment—and that sounds good. But let me tell you a story which made a big impression on me. I had produced a computing machine method of evaluating the phase shifts of a signal from the gains measured at various frequencies, which replaced a hand method done by humans. I am not claiming it was better, only that the hand method could not do the new job when we passed from voice to TV bandwidths. A smart man said to me one day, “Before, when humans did things, we could not make further improvements because of the random human variations; now that you have removed the random element we can hope to learn things which were not apparent before.” Methods of rating that do not have human judgment have some advantages—but do not conclude I am against putting in an element of human judgment. Most formal methods are necessarily finite, and the complexity of reality is almost infinite, hence human judgment, wisely applied, is often a good thing—though, as just noted, in a way it stands in the path of further progress with its subjective aspects.
From all of this please do not conclude that measurement cannot be done—it clearly can—but the question of the relevance and effects of a form of measurement should be thought through as best you can before you go ahead with some new measurement in your organization. The inevitable changes that will come in the future, and the increasing power of computers to automatically monitor things, means many new measuring systems will come into use—ones you yourself may have to design, organize, and install. So let me tell you yet another story of the effect of measurement.
In computing, the programming effort is often measured by the number of lines of code—what easier measure is there? From the coder’s point of view there is absolutely no reason to try to clean up a piece of code; quite the contrary, to get a higher rating on the productivity scale there is every reason to leave the excess instructions in there—indeed, include a few “bells and whistles” if possible. That measure of software productivity, which is widely used, is one of the reasons why we have such bloated software systems these days. It is a counter-incentive to the production of the clean, compact, reliable coding we all want. Again, the measure used influences the result in ways which are detrimental to the whole system! It also establishes habits which at a later time are hard to remove.
When your turn comes to install a measuring system, or even comment on one someone else is using, try to think your way through to all the hidden consequences which will happen to the organization. Of course, in principle measurement is a good thing, but it can often cause more harm than good. I hope the message came through to you loud and clear:
You get what you measure.
I have given a talk with this title many times, and it turns out from discussions after the talk that I could just as well have called it “You and Your Engineering Career,” or even “You and Your Career.” But I left the word “research” in the title because that is what I have most studied.
From the previous chapters you have an adequate background for how I made the study, and I need not mention again the names of the famous people I have studied closely. The earlier chapters are, in a sense, just a great expansion, with much more detail, of the original talk. This chapter is, in a sense, a summary of the previous 29 chapters.
Why do I believe this talk is important? It is important because as far as I know each of you has but one life to lead, and it seems to me it is better to do significant things than to just get along through life to its end. Certainly near the end it is nice to look back at a life of accomplishments rather than a life where you have merely survived and amused yourself. Thus in a real sense I am preaching the messages that (1) it is worth trying to accomplish the goals you set yourself and (2) it is worth setting yourself high goals.
Again, to be convincing to you I will talk mainly about my own experience, but there are equivalent stories I could use involving others. I want to get you to the state where you will say to yourself, “Yes, I would like to do first-class work. If Hamming could, then why not me?” Our society frowns on those who say this too loudly, but I only ask you say it to yourself! What you consider first-class work is up to you; you must pick your goals, but make them high!
I will start psychologically rather than logically. The major objection cited by people against striving to do great things is the belief that it is all a matter of luck. I have repeatedly cited Pasteur’s remark, “Luck favors the prepared mind.”
Brains are nice to have, but many people who seem not to have great IQ-type brains have done great things.
So I helped Bill Pfann, taught him how to use the computer, how to get numerical solutions to his problems, and let him have all the machine time he needed. It turned out zone melting was just what we needed to purify materials for transistors, for example, and it has proved to be essential in many areas of work. He ended up with all the prizes in the field, much more articulate as his confidence grew, and the other day I found his old lab is now a part of a national monument! Ability comes in many forms, and on the surface the variety is great; below the surface there are many common elements.
Having disposed of the psychological objections of luck and the lack of high-IQ-type brains, let us go on to how to do great things. Among the important properties to have is the belief you can do important things. If you do not work on important problems, how can you expect to do important work? Yet direct observation and direct questioning of people show most scientists spend most of their time working on things they believe are not important and are not likely to lead to important things.
As an example, after I had been eating for some years with the physics table at the Bell Telephone Laboratories restaurant, fame, promotion, and hiring by other companies ruined the average quality of the people, so I shifted to the chemistry table in another corner of the restaurant. I began by asking what the important problems were in chemistry, then later what important problems they were working on, and finally one day said, “If what you are working on is not important and not likely to lead to important things, then why are you working on it?” After that I was not welcome and had to shift to eating with the engineers! That was in the spring, and in the fall one of the chemists stopped me in the hall and said, “What you said caused me to think for the whole summer about what the important problems are in my field, and while I have not changed my research it was well worth the effort.” I thanked him and went on—and noticed in a few months he was made head of the group. About ten years ago I saw he became a member of the National Academy of Engineering. No other person at the table did I ever hear of, and no other person was capable of responding to the question I had asked: “Why are you not working on and thinking about the important problems in your area?” If you do not work on important problems, then it is obvious you have little chance of doing important things.
The desire for excellence is an essential feature for doing great work. Without such a goal you will tend to wander like a drunken sailor. The sailor takes one step in one direction and the next in some independent direction. As a result the steps tend to cancel each other out, and the expected distance from the starting point is proportional to the square root of the number of steps taken. With a vision of excellence, and with the goal of doing significant work, there is a tendency for the steps to go in the same direction and thus go a distance proportional to the number of steps taken, which in a lifetime is a large number indeed. As noted before, Chapter 1, the difference between having a vision and not having a vision is almost everything, and doing excellent work provides a goal which is steady in this world of constant change.
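A short simulation sketch of the sailor's walk against the directed walk follows; the step counts and the number of trials are arbitrary choices for illustration.

import math
import random

def mean_random_walk_distance(n, trials=2000, seed=0):
    # n unit steps, each in an independent random direction in the plane
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = y = 0.0
        for _ in range(n):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            x += math.cos(theta)
            y += math.sin(theta)
        total += math.hypot(x, y)
    return total / trials

for n in (100, 400, 1600):
    print(f"{n:>5} steps: random walk ends about {mean_random_walk_distance(n):6.1f} away "
          f"(on the order of sqrt(n) = {math.sqrt(n):5.1f}); a directed walk ends {n} away")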
Thus what you consider to be good working conditions may not be good for you! There are many illustrations of this point. For example, working with one’s door closed lets you get more work done per year than if you had an open door, but I have observed repeatedly that later those with the closed doors, while working just as hard as others, seem to work on slightly the wrong problems, while those who keep their doors open get less work done but tend to work on the right problems! I cannot prove the cause-and-effect relationship; I can only observe the correlation. I suspect the open mind leads to the open door, and the open door tends to lead to the open mind; they reinforce each other.
This is related to another aspect of changing the problem. I was once solving on a digital computer the first really large simulation of a system of simultaneous differential equations, which at that time were the natural problem for an analog computer—but they had not been able to do it, and I was doing it on an IBM 701. The method of integration was an adaptation of the classical Milne’s method, and it was ugly to say the least. I suddenly realized that of course, being a military problem, I would have to file a report on how it was done, and every analog installation would go over it trying to object to what was actually being proved as against just getting the answers—I was showing convincingly that on some large problems, the digital computer could beat the analog computer on its own home ground. Realizing this, I saw the method of solution should be cleaned up, so I developed a new method of integration which had a nice theory, changed the method on the machine with a change of comparatively few instructions, and then computed the rest of the trajectories using the new formula. I published the new method and for some years it was in wide use and known as “Hamming’s method.” I do not recommend the method now that further progress has been made and the computers are different. To repeat the point I am making, I changed the problem from just getting answers to the realization I was demonstrating clearly for the first time the superiority of digital computers over the current analog computers, thus making a significant contribution to the science behind the activity of computing answers.
All these stories show that the conditions you tend to want are seldom the best ones for you—the interaction with harsh reality tends to push you into significant discoveries which otherwise you would never have thought about while doing pure research in a vacuum of your private interests.
In a sense my boss was saying intellectual investment is like compound interest: the more you do, the more you learn how to do, so the more you can do, etc. I do not know what compound interest rate to assign, but it must be well over 6%—one extra hour per day over a lifetime will much more than double the total output. The steady application of a bit more effort has a great total accumulation.
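The arithmetic of compounding is worth seeing once; the rates and spans below are illustrative choices, not Hamming's figures.

# a quantity growing at rate r per year multiplies by (1 + r) ** n over n years
for r in (0.06, 0.08, 0.12):
    for n in (10, 20, 40):
        print(f"at {r:.0%} per year for {n:>2} years: growth factor {(1 + r) ** n:7.1f}")

At 6% the factor over a 40-year career is about 10; at 12% it is over 90. Small differences in the rate of learning compound into enormous differences in what gets done.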
I strongly recommend taking the time, on a regular basis, to ask the larger questions, and not stay immersed in the sea of detail where almost everyone stays almost all of the time. These chapters have regularly stressed the bigger picture, and if you are to be a leader into the future, rather than a follower of others, I am now saying it seems to me to be necessary for you to look at the bigger picture on a regular, frequent basis for many years.
There is another trait of great people I must talk about—and it took me a long time to realize it. Great people can tolerate ambiguity; they can both believe and disbelieve at the same time. You must be able to believe your organization and field of research is the best there is, but also that there is much room for improvement! You can sort of see why this is a necessary trait. If you believe too much, you will not likely see the chances for significant improvements; if you do not believe enough, you will be filled with doubts and get very little done, with chances only for the 2%, 5%, and 10% improvements. I have not the faintest idea of how to teach the tolerance of ambiguity, both belief and disbelief at the same time, but great people do it all the time.
Most great people also have 10 to 20 problems they regard as basic and of great importance, and which they currently do not know how to solve. They keep them in their mind, hoping to get a clue as to how to solve them. When a clue does appear they generally drop other things and get to work immediately on the important problem. Therefore they tend to come in first, and the others who come in later are soon forgotten. I must warn you, however, that the importance of the result is not the measure of the importance of the problem. The three problems in physics—anti-gravity, teleportation, and time travel—are seldom worked on because we have so few clues as to how to start. A problem is important partly because there is a possible attack on it and not just because of its inherent importance.
Again, you should do your job in such a fashion that others can build on top of it. Do not in the process try to make yourself indispensable; if you do, then you cannot be promoted, because you will be the only one who can do what you are now doing! I have seen a number of times where this clinging to the exclusive rights to the idea has in the long run done much harm to the individual and to the organization. If you are to get recognition then others must use your results, adopt, adapt, extend, and elaborate them, and in the process give you credit for it. I have long held the attitude of telling everyone freely of my ideas, and in my long career I have had only one important idea “stolen” by another person. I have found people are remarkably honest if you are honest in your turn.
It is a poor workman who blames his tools. I have always tried to adopt the philosophy that I will do the best I can in the given circumstances, and after it is all over maybe I will try to see to it that things are better next time. This school is not perfect, but for each class I try to do as well as I can and not spend my effort trying to reform every small blemish in the system. I did change Bell Telephone Laboratories significantly, but did not spend much effort on trivial details. I let others do that if they wanted to—but I got on with the main task as I saw it. Do you want to be a reformer of the trivia of your old organization or a creator of the new organization? Pick your choice, but be clear which path you are going down.
I must come to the topic of “selling” new ideas. You must master three things to do this (Chapter 5): giving formal presentations, producing written reports, and mastering the art of informal presentations as they happen to occur.
All three are essential—you must learn to sell your ideas, not by propaganda, but by force of clear presentation. I am sorry to have to point this out; many scientists and others think good ideas will win out automatically and need not be carefully presented. They are wrong; many a good idea has had to be rediscovered because it was not well presented the first time, years before! New ideas are automatically resisted by the establishment, and to some extent justly. The organization cannot be in a continual state of ferment and change, but it should respond to significant changes.
Change does not mean progress, but progress requires change.
To master the presentation of ideas, while books on the topic may be partly useful, I strongly suggest you adopt the habit of privately critiquing all presentations you attend and also asking the opinions of others. Try to find those parts which you think are effective and which also can be adapted to your style. And this includes the gentle art of telling jokes at times. Certainly a good after-dinner speech requires three well-told jokes: one at the beginning, one in the middle to wake them up again, and the best one at the end so they will remember at least one thing you said!
You are likely to be saying to yourself you have not the freedom to work on what you believe you should when you want to. I did not either for many years—I had to establish the reputation on my own time that I could do important work, and only then was I given the time to do it. You do not hire a plumber to learn plumbing while trying to fix your trouble; you expect he is already an expert. Similarly, only when you have developed your abilities will you generally get the freedom to practice your expertise, whatever you choose to make it, including the expertise of “universality,” as I did. I have already discussed the gentle art of educating your bosses, so I will not go into it again. It is part of the job of those who are going to rise to the top. Along the way you will generally have superiors who are less able than you are, so do not complain, since how else could it be if you are going to end up at the top and they are not?
The unexamined life is not worth living.
In a sense, this has been a course a revivalist preacher might have given—repent your idle ways, and in the future strive for greatness as you see it. I claim it is generally easier to succeed than it at first seems! It seems to me at almost all times there is a halo of opportunities about everyone from which to select. It is your life you have to live, and I am only one of many possible guides you have for selecting and creating the style of the one life you have to live. Most of the things I have been saying were not said to me; I had to discover them for myself. I have now told you in some detail how to succeed, hence you have no excuse for not doing better than I did. Good luck!
A
ADA language, 55
a different product, 19
Aitken, H., 34
alphabet training, 295
anti-congruent triangles, 299
APL language, 54
ASCII code, 127
Aspect, A., 318
atomic bomb, 20, 34, 65, 236, 254, 277
B
back-of-the-envelope calculations, 5–7, 16, 25, 323
Backus, J., 47
Baker, W.O., 178
Bell, A.G., 275
brainstorming, 326
BTL analog computers, 178
C
“Can machines think?”, 77, 104
classical education, 296
Clippinger, D., 44
Club of Rome, 245
computer advantages, 14, 24, 106
constructivists, 305
continental drift, 334–335
D
databases, 70–71
decoding tree, 127
Dirac, P.A.M., 316
Dodgson, L.C., 307
E
Eddington, A.S., 177, 233, 373
education vs. training, 1
eigenvalue, 182, 191, 200, 211, 314, 364
Einstein, A., 11, 50, 79, 308, 313, 317, 319, 331, 338, 342, 387, 388
expert systems, 76
F
Fermi, E., 287
fifth-generation computers, 56
Ford, H., 11
formalists, 300
four-circle paradox, 119–120, 259–260
fundamentals, vii, xix, xxi, 1, 8–9, 55, 186, 222, 328, 367
G
Galileo, 21
garbage in, garbage out, 258
Gibbs’ inequality, 166
Gibbs phenomenon, 200, 205, 206, 207, 212–213
Gilbert, E.N., 146
Gödel’s theorem, 311
GPS language, 75
growth of knowledge, 5
Gulliver’s Travels, 65
H
Hamming code, 156
Hamming window, 209
Hawthorne effect, 287
Hermite, C., 304
Hilbert, D., 301–302
history, 10
Hollerith, H., 33
Hopper, G., 392
how a filter works, 200
Huskey, H., 44
Huxley, A., 286
I
information, 75, 101, 124, 125, 135, 163
interconnection costs, 18
ISBN, 147
J
K
Kaiser filter design, 225
Kaiser, J.F., 179, 213, 225, 293
Kane, J., 69
Kraft inequality, 131, 136, 167
Kuhn, T., 333
L
language, 51
learn from experience, 22, 87, 104, 233, 390
Learning to Learn, xxi
life testing, 345
LISP, 49
logical school of mathematics, 303
Los Alamos, 11, 20, 34, 35, 38, 57, 65, 82, 236, 266, 366
Lull, R., 64
M
Mathews, M., 92
median filters, 231
Mendel, G., 325
micromanagement, 22
Morse code, 126
N
NBS publication, 348
Newton, I., ix, 4, 11, 51, 79, 312, 315, 328, 388
Nike missile, 31, 185, 238, 264, 366, 372
O
originality, 86, 90, 104, 324, 378
P
parable of the old lady and the cathedral, 360
Pasteur, L., xx, 150, 161, 201, 328, 332, 387
Pfann, B., 388
Pierce, J.R., 92
Planck, M., 313
Platonic mathematics, 299
proper teaching, 288
psychological novelty, 100, 104
R
RAND, 75
S
St. Augustine, 321
sampling rate stories, 181
Schickard, W., 32
Schrödinger, E., 314
SDS 910 computer, 69
self-consciousness, 42, 81, 103, 318
Shannon, C.E., ix, x, 82, 125, 163, 165, 167, 172, 173, 175, 325, 387, 390, 391
shower story, 226
Slagle, J., 98
SOAP language, 44
Stonehenge, 30
strong focusing, 278
student’s future, xx
T
tennis simulation, 249
three-dimensional tic-tac-toe, 82
top-down programming, 51
transfer function, 182, 197, 213, 225
travelling wave tube, 240, 241
Tukey-Cooley algorithm, 217
Tukey, J., ix, xi, 179, 180, 181, 208, 210, 217, 220, 230, 329, 393
Turing test, 79
U
UFO, 287
uncertainty principle, 63, 126, 232, 316, 322
uniquely decodable, 127, 129, 136, 166
V
variable-length codes, 126
vision of your future, 12
vitalism, 80
volume of a sphere, 114
von Hann window, 216
von Neumann, J., ix, 19, 28, 33, 35, 39, 44, 317, 391
W
weighted sum codes, 146
Z
Zuse, K., 34