Here’s a simple experiment one can actually try. Take a bag of M&M’s, and without peeking reach in and grab one. Eat it. Then grab another and return it to the bag with another one, from a separate bag, of the same colour. Give it a shake. I guarantee (and if you tell me how big your bag is I’ll have a bet on how long it’ll take) that your bag will end up containing only one colour. Every time. I can’t tell you which colour it will be, but fixation will happen.
This models the simple population process of Neutral Drift. Eating is death, duplication is reproduction, and the result is invariably a change in frequencies, right through to extinction of all but one type. You don’t have to alternate death and birth; choose any scheme you like short of peeking in the bag and being influenced by residual frequencies (i.e. frequency-dependent Selection), and you will end up with all one colour.
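For anyone who would rather not eat that many M&M’s, the experiment can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `run_bag`, the bag size and the colour list are made up for the example, and every sweet is equally likely to be grabbed.

```python
import random


def run_bag(colours, rng):
    """Eat one sweet, duplicate another, and repeat until one colour is left.

    colours: list of colour names, one entry per sweet in the bag.
    Returns (surviving colour, number of eat-and-duplicate rounds).
    """
    bag = list(colours)
    rounds = 0
    while len(set(bag)) > 1:
        bag.pop(rng.randrange(len(bag)))   # death: grab one without peeking and eat it
        parent = rng.choice(bag)           # grab another from what is left
        bag.append(parent)                 # birth: return it with a same-colour copy
        rounds += 1
    return bag[0], rounds


if __name__ == "__main__":
    # Hypothetical starting bag: 5 sweets of each of 6 colours.
    start = ["red", "orange", "yellow", "green", "blue", "brown"] * 5
    winner, rounds = run_bag(start, random.Random())
    print(winner, "took over after", rounds, "rounds")
```

Run it a few times: the winning colour changes from run to run, but the loop always ends.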
Is Chance a cause here? Well … yes, in a sense it is, in the form of sample error. Survival and reproduction are basically a matter of sampling the genes of the previous generation. More random samples are a distortion of the larger population than aren’t, so, inexorably, your future populations will move away from any prior makeup, increasing some at the expense of others till only one variant remains.
Selection is a consistent bias upon this basic process. If different colours also differed a little in weight, say, more of some would be at the bottom of the bag than others, so you’d be more likely to pick one type than another. In more trials, the type more likely to be picked would be picked more often, to express it somewhat tautologously. You’d get a sampling bias.
Both of these processes are random – or stochastic, to use the preferred term. In reality, they are variations of the same process, with continuously varying degrees of bias from zero upwards. It makes no sense to call selection nonrandom, unless by ‘random’ you mean unbiased. Where there is no bias, all is Drift. But turning up the selective heat does not eliminate drift – sample error – and so does not eliminate stochasticity.
With a source of new variation, these processes render evolution inevitable. Even with a brand new mutation, with no selective advantage whatsoever, 1/Nth of the time (where N is the population size) it will become the sole survivor. That’s the baseline. If there is a selective advantage, it will be more likely and quicker to fix, on the average. If at a selective disadvantage, it will be less likely and slower.
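The 1/N baseline, and the effect of a bias on it, can be checked with a rough simulation. This is a sketch under stated assumptions, not a definitive model: `variant_fixes`, `fixation_rate` and the `fitness` parameter are names I have invented, death is uniform, and selection is modelled only as a weighting on who gets duplicated.

```python
import random


def variant_fixes(n, fitness, rng):
    """Follow one new variant in a bag of n; return True if it takes over.

    fitness is the variant's relative chance of being picked to reproduce
    (1.0 means neutral). Death is always unbiased.
    """
    k = 1                                   # copies of the new variant
    while 0 < k < n:
        if rng.random() < k / n:            # death: uniform over the n members
            k -= 1
        others = (n - 1) - k                # non-variant members left after the death
        p_variant = k * fitness / (k * fitness + others)
        if rng.random() < p_variant:        # birth: weighted pick among the remaining n - 1
            k += 1
    return k == n


def fixation_rate(n, fitness, trials, rng):
    """Fraction of independent trials in which the single variant fixes."""
    return sum(variant_fixes(n, fitness, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random()
    for fitness in (0.67, 1.0, 1.5):
        print(fitness, fixation_rate(10, fitness, 4000, rng))
```

With `fitness` at 1.0 the rate should hover around 1/N (0.1 for a bag of 10); values above 1.0 should push it up and values below 1.0 should pull it down, but none of them makes the outcome certain.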
Conversely, without a source of new variation, all existing variation would be squeezed out of the population, and evolution would stop.
T’ain’t necessarily so!
In that paper I linked to on calculating generation interval in humans, it varied significantly from culture to culture; “primitive” hunter-gatherer societies and third world populations were sampled. There’s also a difference between paternal and maternal generation interval. We might think that, say, old-stone-age populations may have had shorter generation times than present-day ones.
Don’t understand the significance of that, phoodoo? Let me unpack it for you.
My grandmother had four children. My father was younger than his oldest sister by about 15 years. His sister’s grandson is the same age as me.
So in just two generations we have a difference of one generation.
If you go back N generations, you will find typical differences between generation numbers that scale as the square root of N. So tracing your ancestors 100 generations back will give discrepancies of plus or minus 10.
Coexistence of different generations in the Moran model (M&Ms) is a feature, not a bug.
No. They are all in the same time zone – they all exist at the same time. The tick of the clock is not determined by the generation time; generations occur against the background of a steadily ticking clock. Each individual traces a (slightly) different number of steps back to the original population, and therefore those steps are of different average time length.
“Yadda yadda yadda. Here are some numbers I pulled out my ass”. In this post, I gave an actual example of a typical variance and mean for an actual run. It does not accord with your intuition. And the reason is quite simple: it is vanishingly improbable that an individual will still be in the bag at the same time as its distant descendants. For g+1, quite probable, g+2, a bit less, g+3, a bit less still … it soon gets down to the negligible. Which is why the numbers I gave you – actual numbers from an actual run – do not accord with your intuition on variance.
You want more? I have loads.
The ‘tick’ of the clock is fundamentally the time taken to perform the computer operations to remove a member and replicate another. That is constant per member and constant per N members. Each approximate generation occupies a time period of N ticks, so g generations is a multiple – each ‘population generation’ takes about the same time to complete; each individual generation is a little longer or shorter than the mean. Which is entirely what you’d expect, because time-to-reproduction varies. It must do so, even for one individual – offspring are produced serially. This is, in that respect, just like a ‘real’ population. And certainly insufficiently unlike a ‘real’ one to be invalidated. You really think, if I introduced age-related mortality and a juvenile stage that the behaviour of the model would just disappear? You don’t think someone might have noticed this? Bio-Complexity are gagging for content.
In that example,
Here we have 1160 generations on average. We expect fluctuations in generation count of the order of the square root of that, which is 34. So generations should be roughly between 1160−34 = 1126 and 1160+34 = 1194. Lo and behold, that’s what Allan’s simulation yields.
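A rough way to look at generation counts directly is to give every M&M a counter that is one higher than its parent’s. The sketch below is my own illustration, not Allan’s code: `generation_counts`, the bag size and the number of ticks are all assumptions made for the example.

```python
import random
import statistics


def generation_counts(n, ticks, rng):
    """Return each survivor's generation number after `ticks` death/birth steps.

    Every member of the starting bag is generation 0; a copy is one
    generation later than the member it was copied from.
    """
    bag = [0] * n
    for _ in range(ticks):
        bag.pop(rng.randrange(len(bag)))    # one death per tick
        parent_generation = rng.choice(bag)
        bag.append(parent_generation + 1)   # one birth per tick
    return bag


if __name__ == "__main__":
    n, ticks = 50, 5000
    counts = generation_counts(n, ticks, random.Random())
    mean = statistics.mean(counts)
    print("ticks / n          :", ticks / n)
    print("mean generation    :", mean)
    print("min, max           :", min(counts), max(counts))
    print("square root of mean:", mean ** 0.5)
```

Tracing any survivor backwards, it was the newborn on a given tick with probability 1/N, so the mean should sit near ticks / N with fluctuations of the order of its square root. The spread among survivors of a single run can be narrower than that, since they only differ over the part of their ancestry they do not share.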
Well my wife's brother is younger than her nephew, and my grandmother married her step-uncle, giving my Dad half-cousins who are also his step-cousins once removed…I think.
Phoodoo comes up with his unique definition of “generation” = one iteration of Allan’s model by imagining that the other N – 1 M&M’s all die, leaving precisely one offspring. It’s a Wright-Fisher model, but the fecundity per generation is 1 + 1 / N, rather than the traditional 2. The other N – 1 M&M’s have all “aged” by one generation, in his mind. He thinks that one iteration corresponds to N births and N deaths, of which only one can have any effect whatsoever.
Pass the brain-bleach.
Most living cousins are not from the same generation as the people to which they are cousins. A cousin is someone with whom one shares an ancestor, often but far from always a child of an aunt or uncle, so if you go back far enough everyone is your cousin. But it's common to refer to people as your cousin if that person shares a known ancestor with you, usually a great-great-grandparent or someone more recent.
I have posted a new thread Counting generations of M&Ms. Let’s move the discussion of generation counting there.
You’ll need to find a different species to support your assertion. In any given salmon stream/river you will find precocious spawners (jacks/jennies @ 2 yr old) and you will also find adults in the 3, 4 or 5 yr old class with some rivers having salmon returning in up to 6 or 7 years in addition to the earlier year-classes.
My father was the youngest of 12, and he was 35 when I was born. I have had first cousins who were thirty years older than me. Maybe more. Not so many now.
My issue with the term “unguided macroevolution” is that when you step back and look at the general structure of nested hierarchy, it implies guidance, starting with small single cell organisms and branching up in a fairly orderly fashion into more and more complex organisms. Whereas an unguided random process would sometimes go up, sometimes down, sometimes sideways, sometimes backwards, may split here, may reunite here etc. The “tree of life” may no longer be the most appropriate metaphor for the current version of this nested hierarchy, but it still serves an analogous purpose. Would you go so far as to say the growth of a tree is unguided? Surely there are guiding principles at work.
Also, implying that my statement, that a design mechanism may be built into our cells, is analogous to angels pushing the moon and planets across the sky was unhelpful. Here’s a quote from the abstract of the 2008 article “Facilitated Variation: How Evolution Learns from Past Environments to Generalize to New Environments” by Parter, Kashtan, and Alon: “The key observation is that organisms are designed such that random genetic changes are channeled in phenotypic directions that are potentially useful.”
I hope this doesn’t come across as ungrateful, because I sincerely appreciate the time you and others, especially Allan Miller for his OP and subsequent comments, here at TSZ have taken to help shape my views and insist on critical thinking backed by research.
Thanks. I’ll reserve comment (I’m sure these ideas will come up on a future post) until I’ve actually had a chance to read the book. And, petrushka, I should have posited my own version of an important question instead of simply questioning yours. It’s off-topic, but I might go along the lines of How did organisms develop the ability to generate novelty?
That is a typical conflation of the term guided. The Earth is guided in its orbit by the Sun’s gravity, but nowadays, hardly anyone accounts for the angels who do the actual work. Guided as in a river in its channel, or guided as in the god Hapi causes the Nile to flood.
Integrated complexity at the beginning of life started at the floor, so there was nowhere to go but up. And change in complexity has certainly not been monotonic over evolutionary history. Parasites, for instance, often lose complexity by relying on their hosts for certain functions.
It’s easy to show in simulations that, in an oscillating environment, organisms will develop the ability to adapt to the changing environment. For instance, humans still have hair follicles over their bodies, so it is quite plausible they could evolve a thick fur, if the climate required it (absent clothes, of course).
Sorry I’m late here, but only today did I have time to play with Mr. Felsenstein’s program.
I set the population size to 100 in one case and 1000 in the other. For the initial frequency of allele A I used 0.99 and 0.999, as I supposed “a” to be a neutral random mutation of A, and I used migration rates between the ten independent populations of 0.001 and 0.0001.
I ran each condition twelve times, and in only one run did some populations get close to fixation of allele “a”.
Can anyone explain this to me?
It sounds like you are using our lab’s teaching program PopG. You have a high initial gene frequency of A, some migration among 10 populations of size 100 or of size 1000, and no mutation or selection.
With no selection, in this situation the probability of fixation of a would be its initial gene frequency. As the 10 populations are connected by migration, they will all fix for A or all fix for a. The probability of all fixing for a would be its initial gene frequency, which is 0.01 or 0.001.
It sounds as if that is what you see. So this is the expected result. Of the initial set of 10 populations, one copy of the gene ends up being the ancestor of every copy. The chance that it happens to be a copy with the a allele is then 0.01 (or 0.001).
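The rule that a neutral allele fixes with probability equal to its starting frequency can be checked with a stripped-down simulation. To be clear about the assumptions: this is not PopG’s code, it uses a single population with no migration, it counts gene copies rather than diploid individuals, and the names `a_allele_fixes` and `fixation_fraction` are invented for the example. The starting frequency in the usage lines is 0.2 rather than 0.01 only so that a short run gives a readable answer.

```python
import random


def a_allele_fixes(copies, start_count, rng):
    """Wright-Fisher style resampling of `copies` gene copies.

    Each generation, every new copy is drawn independently from the
    previous generation's allele frequency. Returns True if the tracked
    allele reaches 100%, False if it is lost.
    """
    count = start_count
    while 0 < count < copies:
        freq = count / copies
        count = sum(rng.random() < freq for _ in range(copies))
    return count == copies


def fixation_fraction(copies, start_count, trials, rng):
    """Fraction of independent runs in which the tracked allele fixes."""
    fixed = sum(a_allele_fixes(copies, start_count, rng) for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    # 4 of 20 copies carry the tracked allele, so the expected answer is about 0.2.
    print(fixation_fraction(20, 4, 2000, random.Random()))
```

With a starting frequency of 0.01, only about one run in a hundred would end with the rare allele fixed, which is why twelve runs will usually show none.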
Joe Felsenstein,
Thank you.
I proposed a list of at least 7-10 examples of unrealistic findings which could possibly occur in this kind of sampling. Not a single poster here has shown how many times these aberrations specifically occurred in, say, one round of complete fixation starting from zero and ending up with 100, or one round of 1000 (or how many times they occur in just a pop. of 10), without any alteration of the data whatsoever, and yet they summarize it by coming back here and saying, we averaged it, see, our program shows it is realistic.
So if you think they are going to acknowledge any deficiencies in any of their claims, I would say don’t hold your breath. I have given up on believing they have any desire to at all debate authentically.
Phoodoo: did you notice that Blas’s question was answered and that Blas acknowledged the answer?
That is simply not true. Here is Allan Miller’s comment addressing these issues. And that’s only one example. People were more than willing to address the issues you kept throwing up.
If you stick around, you can help me build whatever features will clarify this into the v2 of my toy. E.G. I intend to make available the complete history of some/all of the generations.
Tell me what statistics you want to see! A complete history of every change at every time step? Sure!
Just to note that I have been manfully struggling against WordPress’s useless table handling to provide at least some of the stats you requested from the sim. I don’t have to do this, and I am aware that, when and if the stats do appear, they won’t be enough/notwhatyouasked for. The issue is not, as you gracelessly insinuate, secretiveness, bluffing or disinterest, but simply time and my own desire to be thorough. It’s a probabilistic process, and therefore simple statistics from single runs do not convey the distribution adequately. Or, if you prefer, the answer is 42.
Our simulator is working fairly well at this point, and we can create all sorts of tables of results. For what is phoodoo looking specifically?
Hi, phoodoo! I don’t know if you saw it, but in the “Counting generations of M&Ms” thread, I proposed a model which I thought would not display any of the “unrealistic findings” of which you’ve been complaining. Rather than force you to go searching thru that thread to find it, I’ll just cut-and-paste the thing here:
1: In this model, each M&M has its own unique ID label (which, being unique, cannot ever be associated with more than one M&M in the same run); its own color (which it may or may not share with however-many other M&Ms); and a list of its ancestors.
2: Each time the model is run, the guy who’s doing it sets up the initial conditions:
2.i: Initial population. This is an integer. If phoodoo has any insights to give regarding the size of the initial population, we can take those insights into account when setting up the model; otherwise, let’s say the initial population can be anything from (picking arbitrary numbers out of my hat) 1 to 10,000.
2.ii: Number of colors in the initial population. Another integer, whose starting value can be anywhere from 2 to 10, except if phoodoo has anything to say about the number of colors, in which case we again take phoodoo’s insights into account when setting up the model.
2.iii: The distribution of colors. Are all colors evenly distributed, are there 4 times as many red M&Ms as there are green M&Ms, whatever. If you want to have 1 M&M of one color, and every other M&M be a second color shared in common, go for it.
2.iv: Birth rate. This is a percentage, from 0.0% up to 100%. The way it works in this model is, we completely ignore the concept of ‘generations’. On each ‘tick’ of the ‘clock’, the model ‘rolls a die’ separately for each individual M&M. With a birth rate of 10%, M&M #1 has a 10% chance of reproducing; M&M #2 has a 10% chance of reproducing; M&M #3 has a 10% chance of reproducing; and so on, for each member of the M&M ‘breeding population’.
2.v: Death rate. This, too, is a percentage, from 0.0% up to 100%. On each ‘tick’ of the ‘clock’, the model ‘rolls a die’ separately for each individual M&M. With a death rate of 10%, M&M #1 has a 10% chance of dying; M&M #2 has a 10% chance of dying; M&M #3 has a 10% chance of dying; and so on, for each member of the M&M ‘breeding population’.
2.vi: Number of cycles you want the model to, er, cycle through before it stops running. This is an integer. It’s also an upper limit, not an absolute number; if the entire ‘breeding population’ becomes the same color at some point before this number, that’s when the model stops running.
3: After the initial conditions are defined, the model creates the defined number of M&Ms; assigns each M&M a unique ID label; assigns each M&M a color at random, in accordance with the defined number of colors and the defined distribution; and assigns each M&M a blank ‘list’ of ancestors.
4: Once the initial set of M&Ms has been created/defined, the ‘clock’ starts ‘ticking’. The following things occur on each ‘tick’ of the ‘clock’:
4.i: The model stores the current list of M&Ms, including all their various ID labels, all their various colors, and all their various ancestries.
4.ii: The model rolls its ‘birth-rate dice’ for each M&M in the current list. For each M&M whose die-roll said yep, this one reproduced, the model creates a new M&M. This new M&M has a unique ID label. Whatever color the ‘parent’ M&M was, the ‘child’ M&M has that color. Whatever ancestor-list the ‘parent’ M&M had, add the ‘parent’ M&M’s ID label to that list, and the ‘child’ M&M has that extended list. Each newly-created ‘child’ M&M is added to a list of This Year’s Kids.
4.iii: The model rolls its ‘death-rate dice’ for each M&M in the current list. For each M&M whose die-roll said yep, this one’s dead, the model removes that M&M from the current list.
4.iv: The list of This Year’s Kids is appended to the current list of M&Ms.
5: After all the births and deaths of the entire ‘breeding population’ of M&Ms have been accounted for, the model checks to see how many different colors exist in said ‘breeding population’; if there’s only one surviving color, the model stops running.
6: If the model has reached the cycle-number that was defined by the user, the model stops running.
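The steps above can be sketched in Python roughly as follows. This is one possible reading of the description, with several assumptions of mine: the names `Sweet` and `run_model` are invented, rates are given as fractions (0.1 for 10%), the starting colors are set by exact counts rather than assigned at random, the stored history keeps ID labels only, and the run also stops if the whole population happens to die out.

```python
import itertools
import random
from dataclasses import dataclass, field


@dataclass
class Sweet:
    ident: int                                     # unique ID label (item 1)
    color: str
    ancestors: list = field(default_factory=list)  # ancestor IDs, oldest first


def run_model(color_counts, birth_rate, death_rate, max_cycles, rng):
    """Run the birth-rate/death-rate model; return (final population, history)."""
    ids = itertools.count(1)
    # Item 3: create the initial M&Ms with blank ancestor lists.
    population = [Sweet(next(ids), color)
                  for color, count in color_counts.items()
                  for _ in range(count)]
    history = []
    for _ in range(max_cycles):                    # item 6: upper limit on cycles
        # 4.i: store the current list (IDs only, to keep the sketch small).
        history.append([sweet.ident for sweet in population])
        # 4.ii: roll for births; each child extends its parent's ancestor list.
        kids = [Sweet(next(ids), parent.color, parent.ancestors + [parent.ident])
                for parent in population if rng.random() < birth_rate]
        # 4.iii: roll for deaths among the current list.
        population = [sweet for sweet in population if rng.random() >= death_rate]
        # 4.iv: append This Year's Kids.
        population.extend(kids)
        # Item 5: stop once at most one color survives (assumption: also if none do).
        if len({sweet.color for sweet in population}) <= 1:
            break
    return population, history


if __name__ == "__main__":
    final, history = run_model({"red": 20, "green": 20}, 0.1, 0.1, 2000, random.Random())
    print(len(history), "cycles;", len(final), "left;", {s.color for s in final})
```

Note that nothing here holds the population size steady: with equal birth and death rates it wanders, and with unequal rates it tends to grow or shrink.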
How does this model satisfy your concerns, phoodoo?
Blas’s question was not about the M&M simulation. It was about my population genetics simulation program PopG (which, by the way, simulates a diploid Wright-Fisher discrete-generations model).
Do you have some arguments about the validity of results from PopG?
Just a couple of observations (there is nothing to stop anyone building any model they wish).
The unique labelling is enough to satisfy the basic criterion – ‘can a single ‘black’ M&M fix’? If a unique label fixes, then its ‘colour’ (or whatever else represents phenotype) must do also. Creating sets with the same colour should be less satisfactory for phoodoo – the extreme is a single representative taking over entirely.
In some languages (e.g. VBA – thumbs nose at Patrick!) you can make your array open-ended (within language limits), and dimension it at runtime based on a parameter.
As I say, I’d make the number of ‘colours’ N – the population size – and actually use unique label as a proxy for colour. The population will pass through a stage (many stages) where it has created ‘pools’ of particular labels all by itself, by increasing the representation of some original members at the expense of others. The process essentially runs ‘in the dark’. We (and any potential selective agent) only see the colours – and labels – when we turn the lights on.
Phoodoo should be satisfied with nothing less than every colour unique, and nominating a single one as the mutant, if the impossibility of fixing a new mutation is a concern. But either way you can separate the colour table from the model – it does not need to keep track. Before you start, you can declare that 1-255 is green, 256-891 is red, etc. When you turn the lights on, and 213 is fixed, it’s green.
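Keeping the colour table outside the model can be as simple as a lookup that is only consulted when the lights go on. A minimal sketch, using the two ranges mentioned above; the names `UPPER_BOUNDS`, `COLOURS` and `colour_of` are mine, and labels beyond 891 are left undefined because the “etc.” is not spelled out.

```python
import bisect

# Last label belonging to each colour block, in ascending order.
UPPER_BOUNDS = [255, 891]
COLOURS = ["green", "red"]


def colour_of(label):
    """Translate a unique label into its declared colour."""
    index = bisect.bisect_left(UPPER_BOUNDS, label)
    if label < 1 or index == len(COLOURS):
        raise ValueError(f"no colour declared for label {label}")
    return COLOURS[index]


if __name__ == "__main__":
    print(colour_of(213))   # the label that fixed in the example above
```

The simulation itself only ever shuffles integers; the table never enters the loop, so it cannot influence which label wins.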
I feel that would be process-heavy to no effect. Distributing a birth rate evenly amongst the population resolves to generating the same 1/N chance per member as if you’d just thrown one N-sided die once to get the next breeder, but in a more fiddly manner. Anything else (if I’ve misinterpreted the method) is a kind of bias – individuals examined first have a different chance than those examined later***, because serial positional examination isn’t … ummm … random. It’s either 1/N or it’s biased.
*** meaning that if you always start from the same end, those offspring have entered the population and affect its size as seen by the later ones.
An evolutionist programmer with the courage of their convictions would not have a ‘halting’ parameter other than fixation of an original singleton! The argument is that fixation (of ancestry, if not of the unmutated version of that ancestor’s allele) is inevitable. Phoodoo seemed to be arguing, among other things, that certain runs should loop indefinitely.
In some languages *cough*Common Lisp*cough* you can make your array open-ended (within the resources available on your machine) and increase its dimensions on the fly.
In that, then, they are on a par! You can issue ReDim when you like, as often as you like. It’s about 10% slower than a fixed-dimension array, though, per operation upon it.
(I’m not, of course, holding vba up as the ideal tool – its random number and clock facilities make it particularly crap for this kind of analysis. I found some strange cyclic behaviour in the pseudorandom function on large iterations (even below the hex ‘ffffff’ at which it repeats the previous cycle precisely), which I shook it out of by some additional reseeding shenanigans. But I don’t have another language at home, and it would have the virtue (if phoodoo didn’t keep just staring at the ball when passed it) of being transferable to other home users.)
I think that Phoodoo understands how the simulation works. Xe thinks the simulation would be closer to reality if, instead of choosing one candy to reproduce in each trial, all the candies left in the bag reproduced, with one of them reproducing twice. Then each trial would result in a new generation in each lineage (except for the one that was eliminated). And that would make the definition of generation match the common usage of the word.
The original simulation allows for huge variations in the current generation numbers among the lineages. That is somewhat different from reality. Phoodoo thinks that is so different from reality that it makes the simulation inappropriate.
Xis version of the simulation keeps all the generation numbers the same for each lineage (each trial replaces all 10 candies). That is also different from reality.
I’m no expert, but I think the results of his version would be essentially the same as the results from the version described in the OP, so that it doesn’t really matter which version of the simulation you use (for purposes of demonstrating the inevitability of fixation due to drift).
On a slightly different aspect of the topic, even if a particular neutral mutation does not fix, it may remain in the population for some time, and therefore be available as a potentiating mutation for some later mutation, as in the citrate eating bacteria, right?
Walter Kloover,
I think more accurately it has the intuitive potential for huge variations. I ran 200 replicates of N=10000, counting actual generations, and recording mean, min and max for each run. The maximum variation between lineages was 7% of the mean, the minimum 0.5%. *** Sorting these spreads in ascending order gave a plot with a somewhat sigmoidal shape – the lowest 10 and the highest 30 had a steeper between-run gradient than the bulk (left-hand graph in the link below, plotting spread for each run). Having thus sorted, the narrowest spreads were associated with the longest runs (right-hand graph, mean generation count to fixation for each of those runs). One can debate cause-and-effect – is fixation quicker because the spreads are greater, or are the spreads greater because the runs are shorter? I’d say the latter.
***(Of course the spread of a run is too simple a statistic in itself. There will be a distribution about the mean; the spread is determined entirely by outliers).
I think that’s the case. What is described is the baseline process – you vary N as minimally as possible, 1 in and 1 out, and you apply a completely impartial method to locate both the next death and the next birth. This is the slowest it can run. If there is preference for some alleles over others, fixation occurs more quickly, whichever way you shift that balance. If you allow N to vary more widely, the rate is more conditioned by the minima. If you restrict breeding to a fraction of an individual’s residence time, again you reduce the effective population size (Ne).
That’s the idea, yes. Although drift and selection act against variation, mutation and simultaneous processes of fixation leave a standing pool of variation in any population. While it is around, it can lead to further change, and/or become adaptively favoured in itself.
I agree that there is no real difference between having one M&M replaced at a time, and a version that has that happen but also declares all the other M&Ms to have replaced themselves. It’s still the same Moran model. The transition probabilities are the same either way, so the behavior of the process is the same.
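To spell out what “the same transition probabilities” means for the bag as described in the OP (eat one, then duplicate one of the rest), here is a small worked calculation. The symbols are my own: N is the bag size and i is the current number of M&Ms of one particular colour.

\[
P(i \to i-1) = \frac{i}{N}\cdot\frac{N-i}{N-1}, \qquad
P(i \to i+1) = \frac{N-i}{N}\cdot\frac{i}{N-1}, \qquad
P(i \to i) = 1 - \frac{2\,i\,(N-i)}{N(N-1)}.
\]

The up and down steps are equally likely, so the expected count of a colour never changes, and the chance that it eventually takes over is just its current share, i/N. A single new variant therefore fixes with probability 1/N. Declaring that the untouched M&Ms have “replaced themselves” changes none of these terms.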
What would happen if you ran this somewhat like WEASEL, but without selection? On each virtual clock tick, every member of the population undergoes some probability of mutation. There would be no ambiguity about generations, because every member would be replaced (most by unmutated versions).
For this sort of Moran model, that would be one standard way of putting mutation into the model. You could alternatively mutate only the newborn M&M (of course with a correspondingly higher mutation rate). These two methods would not be exactly equivalent to each other, but they would be close for large N.
I don’t think a ‘neutral WEASEL’ would result in fixation, except trivially every generation. It is effectively a genome with 28 loci. Each ‘generation’ is a set of offspring – with a rather high mutation rate! – from which only one survives. N oscillates from 1 to however-many-offspring to 1 …
It’s an illustration of the power of hill-climbing, rather than population sampling.
I don’t think ‘generation’ is particularly ambiguous anyway, unless one thinks that every zygote-zygote interval should be of the same length. It’s a random variable, and hence a population of lineages will experience different counts in the same unit time.