Worlds in a grain of silicon

New science from high-performance computing

A report for the Royal Society and the Association of British Science Writers by Mike Holderness, Jan 2002

Summary

The availability of high-performance computing (HPC) has radically changed many areas of science. This offers exciting opportunities for scientists - and a serious challenge for all who depend on scientific advice. Given issues like global climate change, that's everyone.

HPC promises to enable advances in fields ranging from cosmology through earth sciences to sub-atomic physics. Some are pure inquiries into the nature of the universe. Some are potentially very practical - like studying the agent that causes BSE.

Not only is HPC changing how science is done, but it continues to expand what science can be done. It is opening up areas that experimenters cannot reach - like the interior of our planet - and those where experiments could in principle be done, but probably should not be - like macroeconomics. In these cases and many others, the computer power now available to researchers enables modelling of the inaccessible or the inadvisable - "virtual experiments", if you like.

HPC also allows modelling of processes that are just too complicated to deal with any other way, like complex molecules "choosing" their shapes - and turbulent flow in fluids, which is enormously important to industry.

These models are no longer restricted to simple, generalising equations. They can increasingly deal with the fine grain of the world - including things, like climate, that brook no generalisation. Some go so far as to call this "the new science of complexity".

Our climate is the epitome of a complex system. Predictions of global climate change are based on gigantic computer models of the atmosphere and oceans. In deciding what to do about these the stakes are high, socially and economically.
Everyone involved and interested in those decisions needs to appreciate what modelling complex systems means - and why the climatologists really do need yet bigger computers, and more programmers.
"The singing Xmas card has roughly the same computing power as the entire world had in 1946" - Professor Marshall Stoneham A meeting of massively different mindsIt's not often that you find a cosmologist, a chemist and an economist in animated discussion of their very different sciences. The Royal Society's meeting "New Science from high- performance computing" on 24-25 October 2001 was as multi- disciplinary as they come. What the fourteen speakers had in common is that their work has been revolutionised by the availability of High-Performance Computing (HPC). To appreciate the scope of HPC's impact on how science is done, simply read the list of contributors with vastly simplified descriptions of what they spoke about:
None of this scientific work would have been possible with the methods of 50 years ago; most of it wouldn't have been practicable with the computers of five years ago; and all of it is preparing the ground for work that will only be practicable with the computers of five years' time. As the Chair, Professor Richard Catlow of the Royal Institution, said in his introduction, three overall themes emerge:
HPC is defined mostly by the ability to do huge numbers of calculations: one way of measuring a computer's performance is in "teraflops". That stands for "million million floating-point operations per second", where dividing 0.00123 by 2.19037 is an example of a "floating-point operation".

Several of the speakers suggest, though, that handling enormous quantities of data will be increasingly important. New astronomical satellites and high-energy physics laboratories now under construction are due to produce torrents of data. The files that will need to be stored and shared between laboratories for analysis are so huge that just as home computer users were getting used to "giga" meaning 1 000 000 000 (10^9 for short), HPC users had to deal with the "exabyte": 1 000 000 000 000 000 000 (10^18) bytes. (See Appendix 1 on naming very large numbers for definitions of the prefixes used below).

The most exciting part for a student of science, though, is the way that HPC opens up new areas for scientific study. This is all to do with the ability to model complexity.

The world is, as a rule, complex

It increasingly seems that non-complex systems - those that can be modelled with a set of equations that you run once - are in a minority in the world. A single planet orbiting a lone star is such a system - but the Universe is made of galaxies of stars and clusters of galaxies and superclusters of clusters. Physicists may be able to treat an isolated atom, to a close approximation, as a non-complex system - but we are, like almost everything we're interested in, made of very many atoms. (See Appendix 2 for some background to modelling and complexity.)

Physicists like to claim that biology is a subset of chemistry and chemistry is just a subset of physics. But the reason science has different disciplines, with radically different languages, is complexity. Physicists' equations for atoms work, but don't help much in understanding molecules.
Chemists' descriptions of molecules are too "low-level" to be of much use to cell biologists... and so on up to organisms, species, and cultures. Hence the interest in "ab initio" modelling mentioned in several of the talks. "Ab initio" means "from the beginning" or "from founding principles" - trying to leap one or more of these discipline boundaries by dealing with the complexity of atoms organised into molecules, and so on.

The growing power of computers

Computers are the salvation of scientists who need to model complexity because what computers are good at is doing simple calculations over and over again. And improving computer technology means that they have got better at it very rapidly: almost every speaker mentioned "Moore's Law". This is merely an observation by Gordon Moore, co-founder of computer chip manufacturer Intel, that roughly every 18 months a new generation of computer chips appears with double the processing power of the last generation. Argument rages (elsewhere) over when Moore's Law will run out.

For HPC, though doing each calculation faster helps a lot, the main thing is to do many calculations at once - and to work out how to write the software that encodes your scientific model so it will run on such a machine. There are very many ways of structuring a "massively parallel computer". Which one is best for a given model depends on the mathematical structure of the model.

To get a flavour, consider two extreme approaches. They differ in the way that the memory where they store input data, intermediate results and output is organised. At one extreme is a computer with many processors doing calculations on bits of the model at the same time - "in parallel" - and in which each of the processors has fast access to all of the memory.
You need such a machine if it's impossible to "decompose" your model or calculation into discrete chunks that have their own intermediate results and do not need to refer to each other's intermediate results in the course of a run. Examples of such calculations are the quantum mechanics simulator mentioned below or code-breaking a heavily-encrypted message (not dealt with in the meeting).

The other extreme is the "Beowulf cluster" - a sort of supercomputer built from a large number of off-the-shelf PCs. This was developed by a consortium of hackers - in the "good hacker, solves hard computer problems just because they're there" sense. Communication between PCs is very slow compared to the speed of each one's processor. So the cluster approach works best for models that are so totally "decomposable" that they can be farmed out in separate chunks to the PCs, where each chunk runs in isolation until it reports back with its bit of the result.

Climate modelling, for example, is somewhere in the middle of this range. It's fairly decomposable, in that a chunk of atmosphere over the Andes doesn't need to know immediately about the state of Lake Baikal - but adjacent chunks of model do need to communicate frequently.

The first talk dealt with the issue of how to design machines that are optimised for the scientific tasks in hand.

What do today's supercomputers look like?

"What do these HPC machines look like?" asked Professor Thom Dunning of the North Carolina Supercomputing Center. "Acres and acres of cabinets." He means this literally: the machine room for the next machine in the US Department of Energy's Accelerated Strategic Computing Initiative, code-named ASCI Q, will cover 40,000 square feet (3700 square metres). Now being built by Compaq at Los Alamos National Laboratory, ASCI Q will achieve 30 teraflops. It will have 11,968 powerful computer processors, 12 terabytes of memory and 600 terabytes of disk storage.
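To get a feel for what the ASCI Q figures quoted above mean per processor, here is a minimal arithmetic sketch. It assumes decimal prefixes (1 tera = 10^12) and an even split of flops and memory across processors, which is a simplification and not a statement about how the machine is really organised; the variable names are illustrative only.

```python
# Figures quoted in the text for ASCI Q (decimal prefixes assumed).
PEAK_FLOPS = 30e12       # 30 teraflops
PROCESSORS = 11_968
MEMORY_BYTES = 12e12     # 12 terabytes

# Assumption: resources are shared out evenly between processors.
flops_each = PEAK_FLOPS / PROCESSORS
memory_each = MEMORY_BYTES / PROCESSORS

print(f"{flops_each / 1e9:.2f} gigaflops per processor")
print(f"{memory_each / 1e9:.2f} gigabytes of memory per processor")
```

On these assumptions each processor contributes about 2.5 gigaflops and has about a gigabyte of memory: the machine's power comes from the number of processors, not from any one of them.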
As an example of the kind of design decisions that have to be made, Professor Dunning mentioned that organising the machine in smaller "nodes" could improve performance - but would double the floor-space it needed.

The Blue Gene machine being built by IBM is designed to achieve 1000 teraflops - 1 petaflop. It will have a million processors - each of its main silicon chips will carry 32 processors with 8 megabytes of memory on the chip, which makes access from the nearby processors very fast. The entire machine will occupy only 2000 square feet.

Software is crucial

"One of the things that puzzled me for a long time," Professor Dunning said, "was why the scientific and engineering communities were so slow in embracing parallel computing." The difficulty, it turned out, was rewriting their modelling software to run efficiently on hundreds or now thousands of processors. Climate modellers were having particular problems, often only using 5 per cent of the theoretical power of a highly parallel machine.

Professor Dunning is working at the Pacific Northwest National Laboratory to help solve this problem for chemists, with a software system called NWChem ("North-West Chem"). This aims to address the memory access problem using virtual "global array" technology - very fast memory near the processor whose contents are swapped with slower, distant and cheaper memory as rarely as possible.

A major lesson the NWChem team learned is that because the technology is changing so fast, inter-disciplinary teams can no longer consist of separate groups. "The applied mathematicians have to understand enough about theoretical chemistry to engage in conversations" about why the program works, and so on for all the many disciplines involved. They have, however, been able to structure the program in layers with pure computational science at the "bottom" closest to the machine and pure theoretical chemistry at the top. Over five years the program involved 100 person-years of effort.
What it does is allow theoretical chemists to examine "virtual molecules". The calculations "are sufficiently accurate that they rival what you can get out of good experiments." One result is the discovery that a mathematical technique still taught in university chemistry, "perturbation theory", does not in fact work.

Audience questions revealed a certain nostalgia for the famous Cray computer. It has little parallel processing power by modern standards. But the measure of memory access speed is the number of bytes that can be fetched while one floating-point operation is done: the latest Cray scored 12 bytes per flop, and ASCI White, a predecessor of ASCI Q, scores 0.67.

Investment in software that allowed scientists to use 10 per cent of a computer's theoretical peak performance instead of 5 per cent would certainly be as worthwhile as buying another machine and would very likely be cheaper. But it is often harder to find funding for software, especially when scientists are competing with the games industry for highly-skilled programmers.

How do we know when to believe the results of a computer model?

Scientists get excited about modelling the world from first principles. But if people's lives depended on their results, would they believe them? No, said Professor Marshall Stoneham FRS, of the Department of Physics at University College London in a wide-ranging presentation, and rightly not.

He has done a survey which asked people, among other things, whether they "would fly in a plane whose materials had been designed entirely by computer". Only one said yes - "and I suspect he didn't understand the question". This is linked to the problem of whether scientists are doing the calculations which are necessary, from the point of view of the science - or those which are convenient, because the code exists to do them.

In some fields, there is no alternative to modelling: "You can't predict what happens in 100 million years' time by waiting."
But when modelling replaces experiments as a way of predicting next year, the modellers face serious problems of credibility. When do simplifications work, and can results from small models be scaled up? Not always. Professor Stoneham pointed to the design of so-called "quantum dot" computing components, where an assembly of 201 atoms turns out to behave entirely differently to one of 200.

Scientists need to recognise that engineers often produce better answers, sooner, than do their theoretical studies. "There is nothing to be ashamed of in the word 'empirical'."

Better living through theoretical chemistry

Professor Jens Norskov is working at the Department of Physics, Technical University of Denmark on - among other things - developing catalysts for fuel cells. Fuel cells have the potential to supply cleaner energy for homes, cars and even mobile phones. The most promising variety combine hydrogen with oxygen on a catalyst - a substance whose surface can trap reactants, bringing them together to interact much faster than they would in free solution. The "exhaust" is pure water.

But there's a problem. The slightest trace of carbon monoxide "poisons" the catalyst - and carbon monoxide contamination is inevitable in all the practical ways of generating hydrogen.

So Professor Norskov is using computer simulations of the electronic structure of metals - similar in principle to NWChem - to investigate ways of modifying the platinum catalyst's surface. The model starts from the first principles of quantum physics, calculating the behaviour of the electrons around individual atoms. His goal is to shift the energy of the "d-states" of the electrons around atoms of platinum, by alloying it with small quantities of other metals. The computer model allows his team to investigate combinations of alloy metals much more quickly than a real-world laboratory could - and then check the most promising combinations with actual metal.
They find that the accuracy of their calculation of the energy of formation of alloys is as good as that of the experiments.

The next step is to use "genetic algorithms" to explore alloys of four metals chosen from 32. There are far too many of these to check them all. So they simulate a sub-set, and pick out those which have promising properties and are likely to be stable, relatively easily made and not too expensive; then check possible new alloys closely related to these; and repeat for further "generations". One result was a gold-nickel alloy which took three person-years to make. The chemists persevered largely because the model said their work was likely to be worthwhile.

Biologists need supercomputers too

A scientific audience in the UK should be more familiar with Bovine Spongiform Encephalopathy than those elsewhere, suggests Professor Valerie Daggett of the Department of Medicinal Chemistry at the University of Washington in Seattle - "we in the USA are in denial" about the disease, but "I think we are going to be dealing with this sort of thing in the near future".

BSE is a very unusual disease, being transmitted by a protein without any involvement of genetic material. The "prion protein" is a damaged form of a protein that is a normal part of the brain, folded into the wrong shape. Almost certainly, molecules of the damaged form bind to molecules of the normal protein and bend them, too, into the abnormal shape, starting a chain reaction. But how?

Professor Daggett is investigating the structure of the prion protein by calculating what happens as the individual amino-acid building-blocks swivel about the bonds that connect them like beads on a string. Simulation is essential because the conditions the protein is subjected to by the best current means of measuring its structure - Nuclear Magnetic Resonance - themselves change its structure. "We can get very high-resolution information using molecular dynamics," she said: "It doesn't mean it's right, but we can get it.
Then we have to check it experimentally."

The model works out the plausible shapes for the protein by calculating the energy of each possible position of each amino acid, taking account of stretching and twisting of the bond, electrostatic forces between the individual atoms in the amino acids, and so on - and making sure that the virtual amino acids aren't trying to be in the same place at the same time. The protein will spend most of the time in the lowest-energy configurations.

A crucial feature of the model is 25,000 virtual water molecules surrounding the part of the protein whose structure they know: much more computer power would be needed to include a more realistic amount of water. Much, much more is needed to model the fibres formed by multiple prion proteins and the water around them.

The "movie" of the protein produced by the model shows a frame for every ten nanoseconds of real time, runs for 50 frames, and takes a month to compute on a Beowulf cluster of 200 computers. Investigating each form of the protein involves running ten or more simulations because each produces a different "movie".

The work so far supports the theory that the acidity of the prion's environment is important to its damaging effect. Now the team wants to investigate interactions between the protein and antibodies which may, possibly, prevent it doing its damage.

Quests for accurate calculations of large molecular systems

The "particles" of a protein-folding model are (roughly speaking) whole amino acid molecules. If you want to model one of those starting from the fundamental quantum physics of its individual atoms, you have a problem. Every time you double the number of (non-hydrogen) atoms, the calculations get at least 64 times harder.

So Professor Keiji Morokuma of Emory University, Atlanta, described the "ONIOM" method, in which researchers identify the key part of a molecule - usually the part involved in a reaction - and model it in the greatest detail.
They model the parts that seem to just hang around the reactive centre using more approximate methods, and the "environment" of solvent molecules surrounding their molecule in less detail still.

Questions made it clear that there's a problem, especially when doing the useful task of modelling possible molecules that have not yet been synthesised to see what they would look like. The "hanger-on" parts of a molecule often gain energy or electric charge in the course of a reaction - so it's not always clear in advance which are the key parts. But when the method works, it seems to work well.

Reading your genes

The Human Genome Project is well-known for producing a large data file containing the sequence of the DNA that makes you you. Now comes the interesting bit: we have the text, but what does it mean?

The genes code for 30,000 (or more) proteins. It is the shape of each protein, more than its sequence of amino acids, that determines what it does in your body. An important way to decide where to start unravelling this problem is to detect how each of the tens of thousands of unknown proteins relates to other, well-studied proteins from our species or others.

So Dr Mark Swindells has worked on putting together a suite of programs that works out roughly how a protein may fold up into a three-dimensional structure, and hunts the entire database of 700,000 proteins for others with similar structures.

This task is highly decomposable - it essentially involves running the same set of 15 programs over and over again on discrete chunks of genome data. Some are processor-intensive, and some use large amounts of memory to store 3-D structure maps. But in order to find all the important relationships between proteins you have to compare each to every other in one huge calculation. Dr Swindells co-founded a company, Inpharmatica, to get funding to do this work.
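The size of that "each to every other" comparison can be worked out directly. The sketch below only counts unordered pairs among the 700,000 proteins quoted above; it says nothing about how long any single comparison takes, and it is not a description of Inpharmatica's actual software.

```python
from itertools import combinations
from math import comb

N_PROTEINS = 700_000  # database size quoted in the text

# Number of unordered pairs: n * (n - 1) / 2
pairs = comb(N_PROTEINS, 2)
print(f"{pairs:,} pairwise comparisons")

# The same idea on a toy database, listing the pairs explicitly.
toy_database = ["A", "B", "C", "D"]
toy_pairs = list(combinations(toy_database, 2))
print(toy_pairs)
```

Because the number of pairs grows with the square of the database size, doubling the number of proteins roughly quadruples the work.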
They're currently using a cluster of 1300 PCs, each with 1 gigabyte of memory - and it takes three months to run through the database and produce 200 gigabytes of output. They run the Linux operating system because it's free and that means they can afford twice as many processors. Specialised hardware to speed up their work does exist - "but it might take nine months to install, and for what it costs I can go down the Tottenham Court Road and buy another 200 PCs. That's the loser's way - but it works."

Data grids: infrastructure for data-intensive science

Last year, a few scientific experiments in high-energy physics generated a few hundred terabytes of data each. By 2005 several are expected to generate 10,000 terabytes - 10 petabytes - each. In 2010 the Large Hadron Collider experiment (LHC) will start spitting out 100 petabytes a year.

Other disciplines produce very large data-sets, for example the search for gravitational waves, astronomers' full-sky surveys - and models of how pollution disperses in estuaries. In medicine, digitised archives of X-rays and CAT scans could be several petabytes; one brain-scan can be 0.25 terabyte. Even epidemiologists are considering running simulations of the entire population of the planet to understand how diseases spread.

The LHC experiments will involve at least 2000 physicists in 30 countries. How are they going to get at the output of the experiment at CERN near Geneva, to analyse it? Professor Paul Avery of the Physics Department at the University of Florida described plans to build a global data grid, connecting national centres at speeds of 10 gigabits per second, and other projects.

A serious problem yet to be addressed, it became clear in questions, is how such huge quantities of scientific data will be archived after project funding runs out.
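A rough calculation shows why the data volumes and link speeds quoted above are a real design problem. This is a back-of-envelope sketch, assuming decimal prefixes, a single link running flat out with no protocol overhead, and a year of 365.25 days; none of these assumptions come from the talk.

```python
BYTES_PER_YEAR = 100e15        # 100 petabytes a year, as quoted for the LHC
LINK_BITS_PER_SECOND = 10e9    # one 10 gigabit-per-second link
SECONDS_PER_YEAR = 365.25 * 24 * 3600

bits_per_year = BYTES_PER_YEAR * 8

# Time for one fully used link to carry a single year's output.
years_to_send = bits_per_year / LINK_BITS_PER_SECOND / SECONDS_PER_YEAR

# Sustained rate needed just to keep pace with the experiment.
required_gbps = bits_per_year / SECONDS_PER_YEAR / 1e9

print(f"One link needs about {years_to_send:.1f} years per year of data")
print(f"Keeping pace needs about {required_gbps:.0f} gigabits per second")
```

On these assumptions a single 10 gigabit link could not ship a full copy of the data as fast as it is produced, which suggests why a grid would need to distribute subsets and share the analysis between centres.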
Virtual particle physics

Particle physicists have a problem. Quarks, which they believe to be the fundamental constituents of matter, cannot be observed on their own - the theory says that it would take infinite energy to separate the three quarks which make up a proton.

Computer simulation provides the only known way of relating what's observed in particle experiments to the fundamental fields which we believe underlie the structure of matter. These are described by the theory of Quantum Chromodynamics (QCD), so named because one of the properties of a quark is arbitrarily called "colour".

The QCD simulation is done on a four-dimensional space-time grid, or "lattice". As a measure of its difficulty, note that the obvious way to check a simulation is to repeat it with a grid that is twice as finely-spaced - and this turns out to take up to 1000 times as much calculation. So, even if the first simulation can be done on a PC, it takes 1000 PCs to check it.

The goal is to repeat simulations with different "guessed" values for the masses of the six quarks and the strength of their binding, until the output matches the thousands of experimentally measured particle interactions and decays. The requirement that all this experimental data is consistent with one particular set of quark mass values provides a way of falsifying QCD. This will take 100 to 1000 teraflops-years - which only a few years ago would have been an inconceivable project.

"We'll know we've finished when our results are accurate to the few percent level, and we've not falsified the theory," Professor Richard Kenway, of the Department of Physics and Astronomy at the University of Edinburgh, said. He described a project to build a system with up to 20,000 special-purpose processors - QCD on a chip. These will be connected in a six-dimensional torus - like a ring doughnut shape, but much more so.
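What "connected in a six-dimensional torus" means can be made concrete with a small sketch: each processor sits at a grid coordinate, and along every axis the grid wraps round, so the last processor is a neighbour of the first. The grid shape used here (4 processors along each of 6 axes) is purely hypothetical and is not taken from the Edinburgh design.

```python
def torus_neighbours(coord, shape):
    """Return the coordinates directly linked to `coord` on a torus.

    `shape` gives the number of processors along each axis; stepping
    off one edge wraps round to the opposite edge (the modulo below).
    """
    neighbours = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            moved = list(coord)
            moved[axis] = (moved[axis] + step) % size
            neighbours.append(tuple(moved))
    return neighbours


# Hypothetical machine: 4 processors along each of 6 axes (4096 in all).
shape = (4,) * 6
corner = (0,) * 6
links = torus_neighbours(corner, shape)
print(len(links), "direct neighbours")
print(links[0])  # wraps round: coordinate 0 - 1 becomes 3
```

With at least three processors along every axis, each processor has two neighbours per dimension, so twelve direct links in six dimensions. That keeps any two processors only a few hops apart without wiring everything to everything.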
By the end of 2003, they plan to have in Edinburgh a machine with a sustained processing speed of 5 teraflops, at a cost of $1 per sustained megaflop, a fifth of the likely cost of a commercial machine. Tests suggest that they will be able to use over 50% of the peak performance, thanks largely to an extremely fast interface between the processors and memory. The machine design could be scaled up to 100 teraflops performance or more - the drawback is that you have to do your own maintenance.

One tantalising question is whether these simulations will undermine our current picture of particle physics before the LHC experiment is switched on in 2006.

How bright is that laser?

Professor Ken Taylor of the Department of Applied Mathematics and Theoretical Physics at Queen's University Belfast described work on simulating lasers and exotic states of matter.

There were four good reasons for such simulations, he said. The first was an intellectual appeal: the "three-body" problem still can't be solved by a traditional "analytical" approach, so simulation is the only way to examine, for example, what goes on at an atomic level inside a laser that uses helium or any heavier element. The others were useful: simulation can help guide and interpret lab experiments; it can help with benchmarking the results of laboratory experiments; and it can develop the methods and skills to go on to further problems.

Experimentalists can measure the relative intensities of the lasers they build, but cannot measure the absolute amounts of energy they emit. Professor Taylor's simulations - which model the behaviour and interactions of individual helium atoms - provide the first absolute calibration of any high-intensity laser's output. The experimenters readily accepted the model's conclusion that their laser was 50% less intense than they had thought.

The team is now simulating very short-wave helium lasers that have not yet been built.
Results strongly suggest that these will behave very differently from existing lasers. They are also working on exotic states of helium - atomic clusters, and the Bose-Einstein Condensate, a new state of matter that appears only at temperatures very close to absolute zero.

“Simulation in desperation” - the hunt for a finer flame

Fluid dynamics deals with the behaviour of gases and liquids. The practical benefits of understanding in detail what happens when air flows over an aeroplane wing or when fuel burns in an engine would be enormous - but unfortunately the standard theory leaves out too much information.

So Dr Stewart Cant of the Department of Engineering, University of Cambridge reports that they use Direct Numerical Simulation (DNS) "out of sheer desperation - it's incredibly expensive but there's no other way to get the level of detail" - either to build models that can be verified against observation or those that provide a check for other, more simplified but more detailed models. DNS tracks the behaviour of small "cells" of gas or liquid individually, making minimal assumptions about them.

The Cambridge team has achieved higher resolution than any group anywhere else - a cube of 384 cells on each side (56,623,104 in all). The punchline of the engineers' joke about physicists' approach to the world is "consider a spherical cow" - but this field requires modelling some shapes which are nearly as complicated as a real cow. That record-breaking model takes thirty to forty hours to run on a 64-processor Hitachi computer, using 10 gigabytes of memory.

One feature of doing modelling for industry is that the clients are mostly interested in getting results overnight - "which is frustrating, but certainly drives us to improve the performance of our modelling software." An exception is a model of fuel-air mixing in power-generating gas turbines, which is now running continuously on a cluster of PCs.
"We make small changes to the geometry," Professor Cant reports, "and check their effects before anyone touches any metal." The group has gained a reputation for "mission impossible simulation" - but there are limits. When it comes to understanding flames - which could lead to cleaner-burning engines and less pollution - it would be nice to include the chemistry of combustion in each modelled cell. But there isn't the computer power in the world to do this in a model with three dimensions, and even the coming teraflop machines won't be powerful enough. "However much computing power you give us," Professor Cant says, "we can make good use of it." The ab initio simulation of the Earth's coreThe Earth's core is inaccessible. Our observation of it is restricted to listening to how it affects the vibrations from earthquakes on the other side of the planet. This shows that the outer core is a liquid down to 2800 km, that from there to the centre 6400 km down is a solid, and that it's largely iron. It would be really interesting to know more about what most of our planet is made of, what crystalline form it is in - and how it melts, and hence how it generates the magnetic field that shields us all from cosmic radiation. But calculations suggest that the pressure in the centre of the core is around 300 gigapascal. (Atmospheric pressure at sea level is, for comparison, about 100 kilopascal.) The highest sustained pressure achieved in a laboratory is 200 gigapascal - beyond that the diamond anvils used shatter, and we don't have anything harder to use. So we can't directly investigate how any material behaves down there. Simulation is the only way to investigate further. Professor David Price of the Department of Geological Sciences at University College London described the successes so far. 
Their first result was that the "close-packed hexagonal" crystal structure has lower energy at 300 gigapascal than the other possibilities - so that'll be the "phase" the iron is in at the centre of the Earth. Their second followed from simulating this material melting, thus obtaining the temperature at the liquid-solid boundary: between 5200 K and 5700 K - hotter than previously thought. They were rewarded with a half page in the Daily Mirror: "Core, what a scorcher".

The next step is to deduce what elements are alloyed with the iron. In the model, they can do virtual alchemy, changing selected iron "atoms" into sulphur, silicon or oxygen. Preliminary results suggest that the liquid contains 9% of either silicon or sulphur and 8% oxygen, and the solid 7.5% and 0.3% respectively. They now aim to investigate the effects of hydrogen, carbon and potassium, and to find a way of distinguishing silicon from sulphur.

These results are based on modelling 100 iron atoms at once from quantum-mechanical first principles. Simulating their behaviour for just 500 femtoseconds takes 12 hours on a Cray T3E computer. They'd like to run a model of twice as many atoms for 20 times as long, as a check...

A new language for economists

It would be very nice - for economists - if the Chair of the Federal Reserve were to set interest rates at random between zero and 20% for a month at a time, until they had gathered some experimental data on the effects. But, as Dr Jurgen Doornik of Nuffield College, University of Oxford, ruefully points out, this wouldn't be so good for other people and would probably cause outrage.

So simulation is suggested. But the economy is complex. Not only are there over 6 gigapeople interacting, but they often do so organised in groups. And not only are economies non-stationary - equilibrium models are wrong - but they undergo external events like privatisations and deprivatisations, and shocks, like wars...
Neither is economic forecasting like weather forecasting: "If you went to the Chancellor of the Exchequer and offered to forecast tomorrow's economy, he'd laugh you out of the door." He's interested in next year.

Economists find it as difficult as biologists do to get funds for HPC. They also have a problem with the difficulty of writing software to run simulations on parallel machines - not least because any students with an interest in this will get much more money in the City.

So Dr Doornik and colleagues at Nuffield, including Professor David Hendry, have designed a matrix programming language, Ox, for economics modelling. Applications range from estimating the value at risk in options portfolios to developing theories of market volatility by simulating candidate theories.

The models partition well, so they're considering running them on Beowulf-style clusters. Nuffield College has several hundred computers that are unused most of the time - and Stanford Business School, where Professor Doornik also teaches, has several thousand. These could be used to run models at night.

Climate change computing challenge

"One of the greatest scientific challenges of the 21st century," says Professor Alan O'Neill of the Department of Meteorology at the University of Reading, is "the construction of reliable prediction models of the complex Earth system". The ability to predict could have "significant societal and economic advantages," as he modestly put it.

A useful model must simulate the whole of the atmosphere, oceans, ice, biomass (including us), solar energy input, their complex interactions and many of their internal mechanisms, like atmospheric chemistry. And it must do so on a fine scale - "if you want to know how climate change will affect your potato growing," Professor O'Neill pointed out, "with current models, unless your field is 200 kilometres square, you're in trouble." Models detailed enough to represent the Welsh uplands at all are only just becoming feasible.
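To see why finer scale is so expensive, it helps to count how many grid squares are needed to tile the planet. The sketch below treats the Earth as a sphere of radius 6371 km and pretends the surface can be divided into equal square cells; the ten-times-finer spacing is chosen here purely for comparison, and vertical layers and time steps, which multiply the cost further, are ignored.

```python
from math import pi

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical Earth assumed
SURFACE_AREA_KM2 = 4 * pi * EARTH_RADIUS_KM ** 2


def surface_cells(cell_side_km):
    """Approximate number of square cells needed to cover the globe."""
    return SURFACE_AREA_KM2 / cell_side_km ** 2


coarse = surface_cells(200)  # the 200-kilometre squares quoted above
fine = surface_cells(20)     # a ten-times finer spacing, for comparison

print(f"200 km cells: about {coarse:,.0f}")
print(f" 20 km cells: about {fine:,.0f}")
print(f"ratio: {fine / coarse:.0f}x")
```

Making the cells ten times smaller on each side needs a hundred times as many of them, before any allowance for the shorter time steps that finer grids usually demand.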
To model the atmosphere in 20-kilometre "cells" will require teraflops of computer power. The chaotic nature of the system means that this will not produce definitive weather or climate forecasts. Running the model many times, however, will allow reasonable estimates of the probabilities of different outcomes. One fundamental question which cannot yet be answered because of lack of computer power is how stable the Gulf Stream is. This flow of warm water toward North-West Europe has "turned off" in the past, most recently about 10,000 years ago. This was probably connected with melting Arctic ice sheets injecting fresh water south of Greenland, and possibly has an element of chaotic fluctuation. If it happens again, London will have the climate of St Petersburg or Newfoundland. Modelling accurately in space is one problem: a colleague refers to current models including a "Gulf Smudge". Time is another: water in the giant ocean current system of which the Gulf Stream is a part takes about 500 years to complete a circuit of the globe, so a simulation must model this timescale. A current Cray T3E will run a model-year in a day or so: model-centuries per day are required. To provide an answer to this, and predictions of climate change with less uncertainty, climatologists need computers with fast memory access. They also need to handle large quantities of data - both from the models, and the weather satellite data against which they will be validating the models. The planned Japanese 40-teraflop computer, it was reported from the floor to the horror of many participants, will produce 100 petabytes of data a day when running climate models. Refining predictions of climate change is very necessary, and will require significant funding.

Simulating the formation of cosmic structure

Having heard about very practical simulation of the whole planet, the meeting considered the deep theory of the entire Universe. How can scientists test theories of Universe formation?
Obviously, any experiments would be infeasibly large and long-running. And the most distant galaxies that astronomers can see are 80% of the age of the universe. But the properties of galaxies and the structure of clusters of galaxies and super-clusters of clusters "encode the physics of the very early universe," said Professor Carlos Frenk of the Department of Physics, University of Durham. Running a simulation of the formation of such structures and comparing them to observed structures provides a check on the theory. The model is ab initio in that it starts from first principles of quantum physics and also in that it starts very soon after the Big Bang. The night before Professor Frenk's talk, the Anglo-Australian galaxy-mapping project completed the measurement of the "redshift" of 200,000 actual galaxies - from which their distances are inferred. These are the observations to which the model output is compared. Simulations also aim to falsify theories of the mysterious "dark matter" whose existence astronomers infer from the way galaxies rotate. Einstein proposed a force inherent in space-time, measured by his "cosmological constant". He later described this as "the biggest blunder of his life" - but cosmologists are currently suggesting that he was right the first time. So the simulations start from the very first quantum event - the Big Bang - and model its evolution over time. They produce a very good "prediction" of the fluctuations in the "microwave background" - which are relics of unevenness in the very early Universe, before there were stars or galaxies. They also map well to the observed clusters and super-clusters that developed out of this unevenness, and to the distribution of voids between them. So the theory is standing up to the model on the large scale. But no-one's managed to build a model that produces galaxies that resemble the ones observed, so there's still work to do.
APPENDIX 1: Naming very large and very small numbers

To save breath and ink, scientists use names for "multipliers" on quantities. A milligram of aspirin (0.001 g), a kilogram of apples (1000 grams), and a megabyte of hard disk space (1 000 000 bytes) are relatively familiar. The very small and very large multipliers are less so. This is the full set approved by the 19th Conférence Générale des Poids et Mesures in 1991.
In computing, these powers of 1000 are often used loosely for powers of 1024, which is a round number in binary arithmetic (2^10). The prefixes for these multipliers are allegedly mebi-, gibi-, tebi- etc, but not enough people know this for it to be useful if true.
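As a small illustration of how far the loose usage drifts, the sketch below compares each power of 1000 with the corresponding power of 1024. The binary prefix names follow the IEC convention (kibi-, mebi-, gibi-, tebi-, pebi-); the function name `prefix_gap` is invented for this example.

```python
# Decimal (SI) prefixes are powers of 1000; binary prefixes are powers of 1024.
SI = ["kilo", "mega", "giga", "tera", "peta"]
BINARY = ["kibi", "mebi", "gibi", "tebi", "pebi"]


def prefix_gap(n):
    """Return (1000**n, 1024**n, percentage by which the binary value is larger)."""
    decimal, binary = 1000 ** n, 1024 ** n
    return decimal, binary, 100 * (binary - decimal) / decimal


for n, (si, bi) in enumerate(zip(SI, BINARY), start=1):
    decimal, binary, gap = prefix_gap(n)
    print(f"{si:>4} = {decimal:>21,}   {bi} = {binary:>25,}   (+{gap:.1f}%)")
```

The gap is only 2.4% at the kilo- level but grows with every step, reaching nearly 10% by tera-, so the distinction matters more as the quantities get bigger.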
APPENDIX 2: What is scientific modelling and how does HPC help?

What scientists do is look at the world, and at the theories they have. The observations of the world that don't fit current theories, in particular, inspire new theories. A theory counts as a scientific theory if there's an observation - either from experiment or from the world doing its own thing - that would, if made, disprove it. So the theory that the Moon is made of green cheese is (in this rather specialised philosophical sense) scientific. It has also been tested, and found to be false - been there, played golf on that. The theory that the Moon smiles kindly on happy lovers is poetic, not scientific - what could falsify it?

How does modelling the world work?

The theory that any two massive objects are attracted to each other by a force - called gravity - that is proportional to the product of their masses and inversely proportional to the square of the distance between them is a scientific theory. Make some observations of the position of the Moon, plug them into the equations that summarise the theory, and you can predict when the Moon will rise on any date. Immediately, you can make "predictions" of when the Moon rose on any date in the past, and compare them with other observations of the past - different ones from those you used to calculate its orbit. If your version of the theory successfully predicts all past observations, you're ready to take bets on it predicting the future. If it doesn't, you need to go back and look at the numbers that you plugged into the theory - the masses of the Earth and Moon, the size of their orbit around each other, possible errors in your observations, and the fundamental constant that is the strength of the force of gravity - and to look for other forces. If no amount of re-calculating with the variables worked, it would be time to try a new theory. The theory of gravity does work (so long as the two masses are moving relatively slowly).
Every time anyone observes the time of Moon-rise, they have an opportunity to falsify it - and that's never happened. It got the Apollo missions to the Moon and back. We've tried very, very hard to falsify it, and failed. This commonly-accepted view of how scientific theory-making works explains why you won't get a plain "yes" or "no" out of a thoughtful scientist. But you can take the answer "the probability that the Moon will rise three times tomorrow is vanishingly small" as a "no" for everyday purposes. What the theory of gravity does is to provide a model of the motion of the Moon and the Earth around each other. It's a true classic among models, in that it sums up an enormous number of observations about planets, apples and rockets in some very simple equations - equations that can be worked out with a pencil and paper. Philosophers argued long and hard about whether "gravity" is part of the model or part of something called "reality". Then Einstein came along with a different theory, modelling what happens when the masses get close to the speed of light - using equations that are a bit more complicated but still doable with a pencil and paper - and they had more fun things to argue about. Scientists note that whatever theory you hold about what theories are, if you use the theory of gravity to launch a communications satellite then you will find that everyone with a dish agrees that, yes, CNN is on air as predicted and over-simplifying scientists' work as usual. (And how would you falsify a theory of theories or a model of models?) Scientists are more worried about the cases where models like the theory of gravity work, but don't help. There is good reason to believe that all the really interesting problems of the 21st century will be among these cases.

Enter complexity, computers - and chaos

A mathematician called Henri Poincaré sat down to work out - with a pencil and paper, since this was in 1890 - a model of three masses orbiting around each other.
He couldn't come up with a set of equations that described how they move in the long run. All he could do was solve the equations for how they move in a short time, then plug the resulting positions and velocities back into the equations and do it again. He laboriously did this for the first few dozen time steps. The result suggested that a planet orbiting a double star would wander weirdly and might at any time shoot off into space or graze the surface of a star. Worse, in theory, was the discovery that if you started the sequence of calculations again with very slightly different initial positions, you got wildly different results, and there was absolutely no way of predicting what set of results you'd get for any set of initial conditions. Poincaré had discovered chaos. This was disturbing, because the Earth and the Moon (for example) don't orbit each other in isolation: they're affected by the Sun, Saturn and Jupiter, even asteroids. Their orbit must, in theory, be chaotic too. But the effects are small, as you'll notice when the Moon rises on time tonight. So Poincaré's result was put aside by practical scientists as a curiosity, and by mathematicians as too hard. Fast-forward to the 1960s. Computers have arrived. It'd been clear for half a century that the weather can be described by a very large set of fairly simple equations. Every cubic kilometre, say, of air behaves fairly simply (by physicists' standards) and interacts fairly simply with adjoining "cells" of air and any land or sea beneath. To model climate - which is to weather as weather is to a raindrop - you "just" have to iterate the equations for all those cells, every few minutes for a century or more of "model time". In 1961, Edward Lorenz was running an enormously simplified climate model on an early computer at the Massachusetts Institute of Technology. He found that each time he re-started the sequence of calculations after the computer had broken down, he got wildly different results. 
He had re-discovered chaos. This time, computers provided scientists with a way of doing something with chaos, so they did. Before long, they'd dug out Poincaré's papers and run his calculations over many thousands of time-steps, producing not just a scatter of pencil points describing a few planetary orbit states but a strange and beautiful skein of lines with infinite detail. Chaos, as properly defined by scientists, is what you get when the repeated iteration of a simple model produces enormously complex results. All chaotic systems are complex, but there are other kinds of complex system too. "Complex" means "cannot be summed up in one go with simple equations" - which is the same as "has to be worked out step by step". Experiments on complex systems produce masses of data but no correct simple theory: turbulent flow in gases and liquids is a classic example. Many, like planetary orbits and climate, are not susceptible to experiment. Computer modelling is the only way to explore complex systems. In principle, things like climate models are no different to well-tried scientific models like gravity. The difference is that the quality of the results - the probability you can give that the answer is right - depends on how much computer power you have, and how ingenious your programmers are at making full use of it. |
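The step-by-step approach described in this appendix, and the sensitivity to starting values that Poincaré and Lorenz ran into, can be tried out in a few lines. The sketch below is not the model Lorenz was running in 1961: it uses the much smaller three-variable system he published in 1963, with the parameter values usually quoted for it, as a stand-in. The step size, starting point and size of the nudge are arbitrary choices for illustration, and the function names are made up for this example.

```python
# Sensitivity to initial conditions in the three-variable Lorenz (1963) system.
# Assumption: this small system stands in for the larger models in the text.


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Rates of change (dx/dt, dy/dt, dz/dt) for the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one time step (classical Runge-Kutta method)."""
    def shifted(base, rate, h):
        return tuple(b + h * r for b, r in zip(base, rate))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(offset=1e-8, dt=0.01, steps=4000):
    """Run two almost identical starts side by side; return their distance at each step."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + offset, 1.0, 1.0)  # same start, nudged by a tiny amount
    distances = []
    for _ in range(steps):
        # Feed each result back in as the starting point for the next step.
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        distances.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
    return distances


distances = separation_history()
for step in (100, 1000, 2000, 3000, 4000):
    print(f"t = {step * 0.01:5.1f}   separation = {distances[step - 1]:.3e}")
```

The two runs start out indistinguishable and stay close for a while, then drift apart until they bear no resemblance to each other. Running many such slightly different starts and counting the outcomes is the idea behind estimating probabilities from repeated model runs.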
| The document does not necessarily constitute the views of the Royal Society or of the Association of British Science Writers, and views expressed in it should not be attributed either to the Royal Society or to the Association. Copyright 2002 is held jointly by the Royal Society and the author. Moral rights are asserted. A license is hereby granted: (1) to all, to make copies for personal non-profit use only; and (2) to Fellows of the Royal Society, the Association of British Science Writers, the Medical Journalists' Association, and those receiving the document on paper through the Royal Society's press service, to make extensive quotations without payment and without reference to source. For other uses and to request further printed copies, please contact the Royal Society. |