How do I calculate the distribution of the Matt Colville way of rolling stats?


Specifically, the Matt Colville way of rolling stats is:

  1. Roll 4d6, drop the lowest value die for 1 stat;
  2. If this roll is lower than 8, reroll it;
  3. Repeat steps 1 and 2 until you have a set of 6 stats of 8 or higher;
  4. If there are not at least 2 values of 15 or higher in this set, drop it entirely and start over.

I've written some AnyDice code for calculating this process's distribution but I got stuck at this:

function: ROLL:n reroll BAD:s as REROLL:d {
  if ROLL = BAD { result: REROLL }
  result: ROLL
}
function: ROLL:d reroll BAD:s {
  loop I over {1..20} {
    ROLL: [ROLL reroll BAD as ROLL]
  }
  result: ROLL
}
X: [highest 3 of 4d6]
Y: 6 d[dX reroll {3..7}]
loop P over {1..6} {
 output P @ Y named "Ability [P]"
}

This gives me the probabilities for all my abilities individually, but does not take into account the discarding of the set if there are not at least 2 15s. How should I make it take that into account? (Or how do I calculate this distribution in another way?)

by Bogdan Ionică 17.09.2019 / 14:04

4 answers

The following AnyDice program will show you what the statistical distribution of ability score results for the Colville method looks like.

function: roll ROLL:n min MIN:n{
 if ROLL < MIN { result: d{} }
 result: ROLL
}

function: colville ARRAY:s INDEX:n {
  if (ARRAY >= 15) < 2 { result: d{} }
  result: INDEX@ARRAY
}

ROLL: [highest 3 of 4d6]
SCORE: [roll ROLL min 8]
ARRAY: 6dSCORE

output [colville ARRAY 1] named "Score 1"
output [colville ARRAY 2] named "Score 2"
output [colville ARRAY 3] named "Score 3"
output [colville ARRAY 4] named "Score 4"
output [colville ARRAY 5] named "Score 5"
output [colville ARRAY 6] named "Score 6"

The trick here is that we don't actually want to have to reroll anything, because recursive functions are expensive and take forever (plus there's a limit to how far Anydice will recurse). Fortunately we actually have a really neat shortcut we can use in the specific case of rerolling until we get a result that's in the range we actually want; we can use a function as a filter to check the value is in the desired range, which returns the input value if it is, or the so-called empty die, d {}, if it is not.

The result of the empty die is basically discarded when anydice calculates probabilities, so we are shown results based only on rolls which met our parameters; since we were just going to reroll anyway until we got a result that was in our range, this is statistically identical to actually rerolling (potentially forever).
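
To see why discarding is statistically the same as rerolling, write \$p(x)\$ for the chance that a single 4d6-drop-lowest roll comes up \$x\$, and \$q\$ for the chance that it comes up below 8 and has to be rerolled (these symbols are introduced here for illustration; they are not AnyDice notation). Summing over the number \$k\$ of rejected rolls that precede the kept one gives a geometric series:

\begin{equation*} P(\text{kept roll} = x) = \sum_{k=0}^{\infty} q^{k}\, p(x) = \frac{p(x)}{1-q} = P(\text{roll} = x \mid \text{roll} \ge 8), \qquad 8 \le x \le 18 \end{equation*}

That is the original distribution with the sub-8 outcomes removed and the remaining probabilities rescaled to sum to 1, which, per the description above, is the effect of returning the empty die.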

So we have two functions, one of which discards results for individual ability scores unless they are 8 or higher, one of which discards arrays of ability scores if there aren't two scores of 15 or more.

The other trick is that the latter function also takes an index and returns just one of those ability scores. Unfortunately we can't get AnyDice to return a sequence from a function, only a flat number, so we have to use the index to inspect the individual rolls. Fortunately the generated sequence is automatically sorted in descending order by default, so we can just iterate through each position to build a complete distribution.

That gives us a result that looks like this when graphed:

Colville stat distribution graph from anydice

This seems to agree perfectly with Ryan Thompson's R-based answer so I feel pretty confident I haven't messed up how this works anywhere.

17.09.2019 / 21:40

I went ahead and implemented this in R. You can see the code here: https://gist.github.com/DarwinAwardWinner/34dd19f302bd1ef24310f6098dc3218d

This code enumerates every possible roll of "4d6 drop lowest, reroll 7 or lower" in order to determine the exact probabilities of rolling each stat from 8 to 18. Then it uses these to compute the exact probability of rolling each possible set of 6 stats, rejecting stat sets without at least 2 stats 15 or higher. Interestingly, about 54% of stat rolls with all 8 or higher will not have 2 stats of 15 or higher, which means that for each stat roll, you have better than even odds of needing to reroll from scratch. Depending on how much your players enjoy rolling stats, this may be an advantage or a disadvantage.
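
The rejection figure can be checked without the full R script. The following is a minimal Python sketch, not the code from the gist: it enumerates the 1296 possible 4d6 rolls, keeps the totals of 8 or more, and treats the number of 15+ stats in a set as a binomial count. The variable names are my own.

```python
from itertools import product
from collections import Counter
from math import comb

# Count how many of the 6**4 = 1296 equally likely 4d6 rolls give each
# "drop the lowest die" total.
counts = Counter(sum(dice) - min(dice) for dice in product(range(1, 7), repeat=4))

# Rerolling anything below 8 is the same as keeping only the totals >= 8.
kept = {total: n for total, n in counts.items() if total >= 8}
kept_total = sum(kept.values())

# Chance that a single kept stat is 15 or higher.
p_high = sum(n for total, n in kept.items() if total >= 15) / kept_total

# A set is thrown away when it has zero or one stat of 15+ (binomial, n = 6).
p_discard = sum(comb(6, k) * p_high**k * (1 - p_high) ** (6 - k) for k in (0, 1))

print(f"P(stat >= 15)    = {p_high:.4f}")
print(f"P(set discarded) = {p_discard:.4f}")
```

The discard probability it prints should land at roughly 54%, in line with the figure quoted above.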

Here's a plot of the distributions of each stat. A is whichever stat rolled highest, B is the 2nd highest, and so on, with F being the lowest stat. The Y axis is the probability of rolling a certain number. For example, your 2nd highest stat has about a 57% chance of being a 15, and a 0% chance of being anything lower than that (by definition).

Stat distribution plot

We can also get some statistics on the distributions. \$Q_{25}\$ and \$Q_{75}\$ are the 25th and 75th percentiles.

\begin{array}{l|r r r r r r} \textbf{Ability} & \textbf{Min} & \boldsymbol Q_{25} & \textbf{Median} & \textbf{Mean} & \boldsymbol Q_{75} & \textbf{Max} \\ \hline \text{A} & 15 & 16 & 16 & 16.5 & 17 & 18 \\ \text{B} & 15 & 15 & 15 & 15.5 & 16 & 18 \\ \text{C} & 8 & 13 & 14 & 14.0 & 15 & 18 \\ \text{D} & 8 & 12 & 13 & 12.7 & 14 & 18 \\ \text{E} & 8 & 10 & 11 & 11.3 & 12 & 18 \\ \text{F} & 8 & 9 & 10 & 9.8 & 11 & 18 \\ \end{array}

Of course, my code computes the full distribution for all possible stat rolls, so if you're curious about other facets of the data, such as point buy value, feel free to run the code and experiment.
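
As an independent sanity check on the table above, the literal procedure can also be simulated. This is a rough Monte Carlo sketch in Python rather than the exact R computation; the seed and the number of trials are arbitrary choices of mine, so the means it prints are estimates that should only approximately match the table.

```python
import random

def roll_stat(rng):
    """Roll 4d6 drop lowest, rerolling until the result is at least 8."""
    while True:
        dice = [rng.randint(1, 6) for _ in range(4)]
        stat = sum(dice) - min(dice)
        if stat >= 8:
            return stat

def roll_set(rng):
    """Roll sets of six stats until one has at least two stats of 15+."""
    while True:
        stats = sorted((roll_stat(rng) for _ in range(6)), reverse=True)
        if stats[1] >= 15:  # sorted descending, so this checks the 2nd highest
            return stats

rng = random.Random(2019)  # fixed seed so the run is repeatable
trials = 20_000
sets = [roll_set(rng) for _ in range(trials)]

# Mean of each sorted position: A is the highest stat, F the lowest.
means = [sum(s[i] for s in sets) / trials for i in range(6)]
for label, mean in zip("ABCDEF", means):
    print(f"{label}: {mean:.2f}")
```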

17.09.2019 / 19:02

Through my methods, I found 5,236 unique dice pools that result from the Colville Stat Distribution. I've posted it here, as the length of the table violates Stack Exchange's hard limit on post length. You can review it yourself and run data processing on it if you want to examine some stats on it that haven't been provided here or in other answers.

How was this Generated?

We need to first generate an array for the 4d6 drop 1 roll. I detailed a general purpose method for generating these kinds of arrays here, and I'm going to use the same process, although I'm shortcutting past the details because I don't want to manually step through the whole process. Look at that post to see how this starts.

In short, we need to generate a 4d6 array that also preserves the die that was lowest for each sum. It'll look something like this:

\begin{array}{r|rr} \textbf{4d6 Drop 1 (pre drop)} & \textbf{Odds} \\ \hline \text{[4,1]} & 1 \\ \text{[5,1]} & 4 \\ \text{[6,1]} & 10 \\ \text{[7,1]} & 20 \\ \text{[8,1]} & 34 \\ \text{[9,1]} & 52 \\ \text{[10,1]} & 70 \\ \text{[11,1]} & 84 \\ \text{[12,1]} & 90 \\ \text{[13,1]} & 88 \\ \text{[14,1]} & 78 \\ \text{[15,1]} & 60 \\ \text{[16,1]} & 40 \\ \text{[17,1]} & 24 \\ \text{[18,1]} & 12 \\ \text{[19,1]} & 4 \\ \text{[8,2]} & 1 \\ \text{[9,2]} & 4 \\ \text{[10,2]} & 10 \\ \text{--Snip--} & \text{--Snip--} & \textit{... We need to conserve space...} \\ \text{[23,5]} & 4 \\ \text{[24,6]} & 1 \\ \end{array}

And then for each one we just subtract out that lowest roll, giving us the final roll.

\begin{array}{l|rr} \text{4d6 Drop 1} & \text{Odds} \\ \hline \text{[3]} & 1\\ \text{[4]} & 4\\ \text{[5]} & 10\\ \text{[6]} & 21\\ \text{[7]} & 38\\ \text{[8]} & 62\\ \text{[9]} & 91\\ \text{[10]} & 122\\ \text{[11]} & 148\\ \text{[12]} & 167\\ \text{[13]} & 172\\ \text{[14]} & 160\\ \text{[15]} & 131\\ \text{[16]} & 94\\ \text{[17]} & 54\\ \text{[18]} & 21\\ \end{array}

We simply chop off the results that are lower than 8. We always reroll when we encounter them, so the odds of the other results are unaffected.

\begin{array}{l|rr} \text{4d6 Drop 1 (≥8 only)} & \text{Odds} \\ \hline \text{[8]} & 62\\ \text{[9]} & 91\\ \text{[10]} & 122\\ \text{[11]} & 148\\ \text{[12]} & 167\\ \text{[13]} & 172\\ \text{[14]} & 160\\ \text{[15]} & 131\\ \text{[16]} & 94\\ \text{[17]} & 54\\ \text{[18]} & 21\\ \end{array}

Then we start multiplying this array against itself. My method involved multiplexing these numbers, but regardless of how you do it, you'll end up with something like this (after removing sets that do not contain at least two ≥15 rolls):

\begin{array}{l|r} \textbf{6x(4d6D1≥8)} & \textbf{Odds} \\ \hline \text{[15, 15, 8, 8, 8, 8]} & 3803650531440\\ \text{[16, 15, 8, 8, 8, 8]} & 5458674045120\\ \text{[17, 15, 8, 8, 8, 8]} & 3135834025920\\ \text{[18, 15, 8, 8, 8, 8]} & 1219491010080\\ \text{[16, 16, 8, 8, 8, 8]} & 1958455573440\\ \text{[17, 16, 8, 8, 8, 8]} & 2250140446080\\ \text{[18, 16, 8, 8, 8, 8]} & 875054617920\\ \textit{... You get the Idea} & \textit{Look at the link I posted} \\ \textit{There's 5236 rows of this} & \textit{above for the full set} \\ \end{array}

That, finally, is the entire distribution of rolls that can be gained from this method of rolling stats.
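
For anyone who wants to regenerate the table rather than download it, here is one way to do it in Python. It is a sketch of the counting argument, not necessarily how the table above was produced: each unordered pool is weighted by the product of its per-stat odds times the number of distinct orderings of the pool. The names `ODDS` and `pool_odds` are my own.

```python
from itertools import combinations_with_replacement
from collections import Counter
from math import factorial, prod

# Odds of each kept 4d6-drop-lowest result, copied from the ">=8 only" table.
ODDS = {8: 62, 9: 91, 10: 122, 11: 148, 12: 167, 13: 172,
        14: 160, 15: 131, 16: 94, 17: 54, 18: 21}

def pool_odds(pool):
    """Number of ordered ways to roll this unordered pool of six stats."""
    orderings = factorial(len(pool))
    for repeats in Counter(pool).values():
        orderings //= factorial(repeats)  # identical values are interchangeable
    return orderings * prod(ODDS[value] for value in pool)

# Every unordered pool of six stats, keeping those with at least two stats >= 15.
pools = {
    tuple(sorted(pool, reverse=True)): pool_odds(pool)
    for pool in combinations_with_replacement(sorted(ODDS), 6)
    if sum(value >= 15 for value in pool) >= 2
}

print(len(pools))                   # number of rows in the table
print(pools[(15, 15, 8, 8, 8, 8)])  # odds for the first row
```

The row count and the odds for the first rows can be compared directly against the table above.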

What can we learn from it?

Well, we could compare the total stat-point distribution against that of a normal 6x(4d6D1) set of rolled stats:

Posted as an image because I was overloading Mathjax

So it turns out the Colville Method gives a pretty considerable boost to the overall sum, raising the average from 73.468 to 79.867. This isn't surprising though: everything it discards (individual rolls below 8, any set that does not have at least two ≥15 rolls) is specifically a bad roll, so it naturally tends towards higher results.
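
The two averages can be reproduced from the odds tables alone, without the full 5,236-row dataset. The Python sketch below is my own shortcut, under the assumption that only the number of 15+ stats matters for acceptance: once that number is fixed, each stat is an independent draw from either the "high" (15 to 18) or "low" (8 to 14) part of the table, so the expected total follows from the expected number of high stats.

```python
from math import comb

# Odds out of 1296 for each 4d6-drop-lowest result, copied from the table above.
ODDS = {3: 1, 4: 4, 5: 10, 6: 21, 7: 38, 8: 62, 9: 91, 10: 122, 11: 148,
        12: 167, 13: 172, 14: 160, 15: 131, 16: 94, 17: 54, 18: 21}

def mean(odds):
    """Average result of a die described by a {value: odds} table."""
    return sum(v * n for v, n in odds.items()) / sum(odds.values())

plain_total = 6 * mean(ODDS)

# Split the kept results (>= 8) into "high" (15+) and "low" (8 to 14) stats.
kept = {v: n for v, n in ODDS.items() if v >= 8}
high = {v: n for v, n in kept.items() if v >= 15}
low = {v: n for v, n in kept.items() if v < 15}
p_high = sum(high.values()) / sum(kept.values())

# The number of high stats in a set is Binomial(6, p_high), conditioned on >= 2.
weights = {k: comb(6, k) * p_high**k * (1 - p_high) ** (6 - k) for k in range(2, 7)}
expected_high = sum(k * w for k, w in weights.items()) / sum(weights.values())

# Given the count of high stats, each stat is an independent draw from its group.
colville_total = expected_high * mean(high) + (6 - expected_high) * mean(low)

print(f"plain 4d6 drop lowest: {plain_total:.3f}")
print(f"Colville method:       {colville_total:.3f}")
```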

If you use the dataset I posted above, you're welcome to do your own analysis on it. I recommend heavy use of Microsoft Excel or some other spreadsheet software.

17.09.2019 / 20:57

Ignore the actual order things are done, use an order that's easier to calculate, and don't be scared to approximate

Instead of first rolling stats and then rerolling if there aren't two 15+s, we can achieve exactly the same result by first rolling two stats that must be 15+ and then rolling the rest 'normally'.

To do this in anydice, what we want to do is take the collection of possible outcomes that is what 'highest 3 of 4d6' means and just remove all the parts that are under 15.

The easiest way to do this is manually. Looking at the results of the aforementioned distribution, we can see that '15' has a 10.11% chance of occurring, '16' a 7.25% chance, '17' a 4.17% chance, and '18' a 1.62% chance. These odds are rounded to the hundredths place, but we are going to consider that level of error acceptable. A sequence with 1011 '15's, 725 '16's, 417 '17's, and 162 '18's, then, can function as a die that gives us our two best values.

Using repetition, we can populate a sequence by using the following code:

output {15:1011,16:725,17:417,18:162}

Next, we need to fix your code. It doesn't actually get you what you are looking for, I think, since it has an approximately infinitesimal chance of outputting numbers lower than 8. That may be fine with you, but we can also use truncation to get a (in my opinion) much cleaner and about equally accurate system for the remaining 4 ability scores:

output {8:478,9:702,10:941,11:1142,12:1289,13:1327,14:1235,15:1011,16:725,17:417,18:162}

You can do something like output [highest 1 of 6d {8:478,9:702,10:941,11:1142,12:1289,13:1327,14:1235,15:1011,16:725,17:417,18:162}] to confirm that it gives the same results.
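
The weights in the two sequences above do not have to be read off a chart by hand. As a small illustration (in Python rather than AnyDice, with helper names of my own), they can be derived from the exact 4d6-drop-lowest counts out of 1296 by converting each to a percentage, multiplying by 100 and rounding to a whole number:

```python
# Odds out of 1296 for each 4d6-drop-lowest result of 8 or more (exact counts).
COUNTS = {8: 62, 9: 91, 10: 122, 11: 148, 12: 167, 13: 172,
          14: 160, 15: 131, 16: 94, 17: 54, 18: 21}

def weights(values):
    """Percent chance times 100, rounded to a whole number, for each value."""
    return {v: round(COUNTS[v] / 1296 * 10_000) for v in values}

def as_anydice(seq):
    """Format a {value: weight} table as an AnyDice sequence literal."""
    return "{" + ",".join(f"{v}:{w}" for v, w in seq.items()) + "}"

print(as_anydice(weights(range(15, 19))))  # the 15+ die
print(as_anydice(weights(range(8, 19))))   # the 8+ die
```

Since only the ratios between weights matter, the raw counts (131, 94, 54 and 21 for the 15+ die) should work as weights too and would avoid the rounding error altogether.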

To look at each ability score, we can just pull the appropriate position from a set of rolls, remembering that the stats rolled on the 8+ die rather than the 15+ die can be no better than the 3rd highest roll of a six-roll sequence. So we end up with:

output [highest 1 of 2d{15:1011,16:725,17:417,18:162}] named "highest stat"
output 2 @ 2d{15:1011,16:725,17:417,18:162} named "2nd highest stat"

output 3@6d{8:478,9:702,10:941,11:1142,12:1289,13:1327,14:1235,15:1011,16:725,17:417,18:162} named "highest non-forced stat"
output 4@6d{8:478,9:702,10:941,11:1142,12:1289,13:1327,14:1235,15:1011,16:725,17:417,18:162} named "2nd highest non-forced stat"
output 5@6d{8:478,9:702,10:941,11:1142,12:1289,13:1327,14:1235,15:1011,16:725,17:417,18:162} named "2nd lowest stat"
output 6@6d{8:478,9:702,10:941,11:1142,12:1289,13:1327,14:1235,15:1011,16:725,17:417,18:162} named "lowest stat"

Which gives results within 1 percentage point of the analytic value [1] (approximately 10% error).

  1. Thanks to @Carcer for the analytic value program.
17.09.2019 / 22:50