Probability of a run of k successes in a sequence of n Bernoulli trials

13

I'm trying to find the probability of getting 8 trials in a row correct in a block of 25 trials. You have 8 total blocks (of 25 trials each) in which to get 8 trials correct in a row. The probability of getting any trial correct by guessing is 1/3. After getting 8 in a row correct, the blocks end (so getting more than 8 in a row correct is technically not possible). How would I go about finding the probability of this occurring? I've been thinking along the lines of using (1/3)^8 as the probability of getting 8 in a row correct. There are 17 possible chances to get 8 in a row in a block of 25 trials; if I multiply 17 possibilities * 8 blocks I get 136. Would 1-(1-(1/3)^8)^136 give me the probability of getting 8 in a row correct in this situation, or am I missing something fundamental here?

AcidNynex
1
I think the problem with the argument given is that the events considered are not independent. For example, consider a single block. If I tell you that (a) there is no run of eight starting at position 6, (b) there is no run starting at position 7 and (c) there is no run starting at position 8, what does that tell you about the probability of a run starting at positions, say, 9 through 15?
cardinal

Answers:

14

By keeping track of things you can get an exact formula.

Let $p = 1/3$ be the probability of success and $k = 8$ the number of successes in a row you want to count. These are fixed for the problem. The variable values are $m$, the number of trials left in the block, and $j$, the number of successive successes already observed. Write $f_{p,k}(j,m)$ for the chance of eventually achieving $k$ successes in a row before the $m$ trials are exhausted. We seek $f_{1/3,8}(0,25)$.

Suppose we have just seen our $j$th success in a row with $m > 0$ trials to go. The next trial is either a success, with probability $p$, in which case $j$ is increased to $j+1$; or else it is a failure, with probability $1-p$, in which case $j$ is reset to $0$. In either case, $m$ decreases by $1$. Whence

$$f_{p,k}(j,m) = p\,f_{p,k}(j+1,m-1) + (1-p)\,f_{p,k}(0,m-1).$$

As starting conditions we have the obvious results $f_{p,k}(k,m) = 1$ for $m \ge 0$ (i.e., we have already seen $k$ in a row) and $f_{p,k}(j,m) = 0$ for $k - j > m$ (i.e., there are not enough trials left to get $k$ in a row). It is now fast and straightforward (using dynamic programming or, because this problem's parameters are so small, recursion) to compute

$$f_{p,8}(0,25) = 18p^{8} - 17p^{9} - 45p^{16} + 81p^{17} - 36p^{18}.$$

When $p = 1/3$ this yields $80897/43046721 \approx 0.0018793$.
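The recursion and its starting conditions can be sketched in Python as a memoized function. This is only an illustration under stated assumptions: the names `run_probability` and `f` are made up for this sketch, and exact rational arithmetic via `fractions.Fraction` is a choice made here so the result can be compared with the fraction quoted above.

```python
from fractions import Fraction
from functools import lru_cache


def run_probability(p, k, m):
    """Chance of at least one run of k successes within m Bernoulli(p) trials."""

    @lru_cache(maxsize=None)
    def f(j, trials_left):
        if j == k:                 # already seen k in a row
            return Fraction(1)
        if k - j > trials_left:    # not enough trials left to complete a run
            return Fraction(0)
        # Next trial: a success extends the streak, a failure resets it to 0.
        return p * f(j + 1, trials_left - 1) + (1 - p) * f(0, trials_left - 1)

    return f(0, m)


exact = run_probability(Fraction(1, 3), k=8, m=25)
print(exact, float(exact))
```

Memoization keeps the number of evaluated states at most (k+1)(m+1), so the same function should remain usable for considerably larger blocks, although very large m would call for an iterative table rather than recursion.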

Relatively fast R code to simulate this is

hits8 <- function() {
    x <- rbinom(26, 1, 1/3)                # 26 draws: 25 Bernoulli trials plus a leading slot
    x[1] <- 0                              # ... which is set to 0 to get started with `diff`
    if(sum(x) >= 8) {                      # Are there at least 8 successes?
        max(diff(cumsum(x), lag=8)) >= 8   # Are there 8 successes in a row anywhere?
    } else {
        FALSE                              # Not enough successes for 8 in a row
    }
}
set.seed(17)
mean(replicate(10^5, hits8()))

After 3 seconds of calculation, the output is 0.00213. Although this looks high, it's only 1.7 standard errors off. I ran another $10^6$ iterations, yielding 0.001867: only 0.3 standard errors less than expected. (As a double-check, because an earlier version of this code had a subtle bug, I also ran 400,000 iterations in Mathematica, obtaining an estimate of 0.0018475.)

This result is less than one-tenth of the estimate $1 - (1 - (1/3)^8)^{136} \approx 0.0205$ in the question. But perhaps I have not fully understood it: another interpretation of "you have 8 total blocks ... to get 8 trials correct in a row" is that the answer being sought equals $1 - (1 - f_{1/3,8}(0,25))^8 = 0.0149358\ldots$.
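For readers who want to reproduce the two figures being compared, here is a small Python sketch. It takes the closed-form polynomial from this answer at face value and treats the 8 blocks as independent; the variable names are illustrative only.

```python
from fractions import Fraction

p = Fraction(1, 3)

# Single-block probability from the closed-form polynomial in this answer.
single_block = 18*p**8 - 17*p**9 - 45*p**16 + 81*p**17 - 36*p**18

# Estimate proposed in the question: 136 chances of (1/3)^8 treated as independent.
question_estimate = 1 - (1 - p**8) ** 136

# Alternative reading: at least one run among 8 independent blocks.
eight_blocks = 1 - (1 - single_block) ** 8

print(float(single_block))
print(float(question_estimate))
print(float(eight_blocks))
```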

whuber
13

While @whuber's excellent dynamic programming solution is well worth a read, its runtime is $O(k^2 m)$ with respect to the total number of trials $m$ and the desired run length $k$, whereas the matrix exponentiation method is $O(k^3 \log m)$. If $m$ is much larger than $k$, the following method is faster.

Both solutions treat the problem as a Markov chain whose states represent the number of correct trials at the end of the sequence so far, plus one state for having achieved the desired number of correct trials in a row. The transition matrix is such that a failure, with probability $1-p$, sends you back to state 0, and otherwise, with probability $p$, you advance to the next state (the final state is an absorbing state). Raising this matrix to the $n$th power, the value in the first row and last column is the probability of seeing $k = 8$ heads in a row. In Python:

import numpy as np

def heads_in_a_row(flips, p, want):
    a = np.zeros((want + 1, want + 1))
    for i in range(want):
        a[i, 0] = 1 - p
        a[i, i + 1] = p
    a[want, want] = 1.0
    return np.linalg.matrix_power(a, flips)[0, want]

print(heads_in_a_row(flips=25, p=1.0 / 3.0, want=8))

yields 0.00187928367413, as desired.
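To make the $\log m$ factor concrete, the sketch below spells out exponentiation by repeated squaring in plain Python with exact fractions. It is only an illustration: the helpers `mat_mul`, `mat_pow` and `run_probability_exact` are hypothetical names introduced here, not part of NumPy, and the chain is the same row-stochastic one built in `heads_in_a_row`.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, exponent):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    n = len(a)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # identity
    while exponent > 0:
        if exponent & 1:           # this bit of the exponent is set
            result = mat_mul(result, a)
        a = mat_mul(a, a)          # square for the next bit
        exponent >>= 1
    return result


def run_probability_exact(flips, p, want):
    """Same chain as heads_in_a_row, evaluated with exact arithmetic."""
    a = [[Fraction(0)] * (want + 1) for _ in range(want + 1)]
    for i in range(want):
        a[i][0] = 1 - p            # failure: back to state 0
        a[i][i + 1] = p            # success: advance one state
    a[want][want] = Fraction(1)    # absorbing state
    return mat_pow(a, flips)[0][want]


exact = run_probability_exact(25, Fraction(1, 3), 8)
print(exact, float(exact))
```

Each call to `mat_mul` costs on the order of $k^3$ multiplications, and the loop in `mat_pow` runs once per bit of the exponent, which is where the $k^3 \log m$ count comes from.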

Neil G
10

According to this answer, I will explain the Markov-Chain approach by @Neil G a bit more and provide a general solution to such problems in R. Let's denote the desired number of correct trials in a row by k, the number of trials as n and a correct trial by W (win) and an incorrect trial by F (fail). In the process of keeping track of the trials, you want to know whether you already had a streak of 8 correct trials and the number of correct trials at the end of your current sequence. There are 9 states (k+1):

A: We have not had 8 correct trials in a row yet, and the last trial was F.

B: We have not had 8 correct trials in a row yet, and the last two trials were FW.

C: We have not had 8 correct trials in a row yet, and the last three trials were FWW.

H: We have not had 8 correct trials in a row yet, and the last eight trials were FWWWWWWW.

I: We've had 8 correct trials in a row!

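The probability of moving to state B from state A is $p = 1/3$, and with probability $1 - p = 2/3$ we stay in state A. From state B, the probability of moving to state C is $1/3$, and with probability $2/3$ we move back to A. The same pattern holds for the remaining states. Once we are in state I, we stay there.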

From these probabilities we can construct a $9 \times 9$ transition matrix $M$ (as each column of $M$ sums to 1 and all entries are nonnegative, $M$ is called a left stochastic matrix):

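$$
M = \begin{pmatrix}
2/3 & 2/3 & 2/3 & 2/3 & 2/3 & 2/3 & 2/3 & 2/3 & 0 \\
1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/3 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1
\end{pmatrix}
$$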

Each column and row corresponds to one state. After $n$ trials, the entries of $M^n$ give the probability of getting from state $j$ (column) to state $i$ (row) in $n$ trials. The rightmost column corresponds to state I, and its only nonzero entry is the 1 in the lower right corner. This means that once we are in state I, the probability of staying in I is 1. We are interested in the probability of getting from state A to state I in $n = 25$ steps, which corresponds to the lower left entry of $M^{25}$ (i.e. $M^{25}_{91}$). All we have to do now is calculate $M^{25}$. We can do that in R with the matrix power function from the expm package:

library(expm)

k <- 8   # desired number of correct trials in a row
p <- 1/3 # probability of getting a correct trial
n <- 25  # Total number of trials 

# Set up the transition matrix M
M <- matrix(0, k+1, k+1)
M[ 1, 1:k ] <- (1-p)
M[ k+1, k+1 ] <- 1
for( i in 2:(k+1) ) {
  M[i, i-1] <- p
}

# Name the columns and rows according to the states (A-I)
colnames(M) <- rownames(M) <- LETTERS[ 1:(k+1) ]
round(M,2)

     A    B    C    D    E    F    G    H I
A 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0
B 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0
C 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0
D 0.00 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0
E 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0.00 0
F 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0
G 0.00 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0
H 0.00 0.00 0.00 0.00 0.00 0.00 0.33 0.00 0
I 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.33 1

# Calculate M^25

Mn <- M%^%n
Mn[ (k+1), 1 ]
[1] 0.001879284

The probability of getting from state A to state I in 25 steps is 0.001879284, as established by the other answers.
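The column convention used here (columns are "from" states, rows are "to" states) can also be checked without forming $M^{25}$ explicitly, by pushing a state vector through the chain one trial at a time. The following Python sketch assumes the same matrix layout as the R code above; the function name `reach_probability` is made up for illustration.

```python
def reach_probability(k, p, n):
    """Probability of being in the absorbing state after n trials, starting in A."""
    size = k + 1
    # Left-stochastic matrix: M[i][j] is the chance of moving from state j to state i.
    M = [[0.0] * size for _ in range(size)]
    for j in range(k):
        M[0][j] = 1 - p       # failure: back to state A (row 0)
        M[j + 1][j] = p       # success: advance to the next state
    M[k][k] = 1.0             # state I is absorbing

    # Every column must sum to 1 for M to be left stochastic.
    assert all(abs(sum(M[i][j] for i in range(size)) - 1) < 1e-12
               for j in range(size))

    v = [1.0] + [0.0] * k     # start in state A with certainty
    for _ in range(n):
        # One trial: new distribution is M times the old distribution.
        v = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
    return v[k]


prob = reach_probability(k=8, p=1/3, n=25)
print(prob)
```

After 25 steps the last component of the vector is the lower left entry of $M^{25}$, since the starting vector selects the column for state A.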

COOLSerdash
3

Here is some R code that I wrote to simulate this:

tmpfun <- function() {
     x <- rbinom(25, 1, 1/3)  
     rx <- rle(x)
     any( rx$lengths[ rx$values==1 ] >= 8 )
}

tmpfun2 <- function() {
    any( replicate(8, tmpfun()) )
}

mean(replicate(100000, tmpfun2()))

I am getting values a little smaller than your formula, so one of us may have made a mistake somewhere.
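For comparison, here is a rough Python counterpart of the simulation, written as a sketch rather than a port of the R code. The seed, the number of replications and the function names are arbitrary choices made for this illustration, and the estimate is a Monte Carlo value that will vary with them. Under the 8-block reading, it should land near the 0.0149358 quoted in whuber's answer.

```python
import random


def block_has_run(rng, trials=25, p=1/3, k=8):
    """Simulate one block and report whether it contains k successes in a row."""
    streak = 0
    for _ in range(trials):
        if rng.random() < p:
            streak += 1
            if streak >= k:      # the block ends as soon as the run is achieved
                return True
        else:
            streak = 0
    return False


def any_block_has_run(rng, blocks=8):
    """True if at least one of the blocks contains a run."""
    return any(block_has_run(rng) for _ in range(blocks))


rng = random.Random(17)          # arbitrary seed, used only for reproducibility
n_sims = 20_000                  # fewer replications than the R code, to keep it quick
estimate = sum(any_block_has_run(rng) for _ in range(n_sims)) / n_sims
std_error = (estimate * (1 - estimate) / n_sims) ** 0.5
print(estimate, std_error)
```

Printing the standard error alongside the estimate makes it easier to judge whether a gap from the exact value is more than simulation noise.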

Greg Snow
Does your function include trials where it is impossible to get 8 in a row right, e.g. where the "run" started on trial 20?
Michelle
Most likely me, my R simulation is giving me smaller values as well. I'm just curious if there is an algebraic solution to solve this as a simple probability issue in case someone disputes a simulation.
AcidNynex
1
I think this answer would be improved by providing the output you obtained so that it can be compared. Of course, including something like a histogram in addition would be even better! The code looks right to me at first glance. Cheers. :)
cardinal
3

Here is a Mathematica simulation for the Markov chain approach. Note that Mathematica indexes from 1, not 0:

M = Table[e[i, j] /. {
    e[9, 1] :> 0,
    e[9, 9] :> 1,
    e[_, 1] :> (1 - p),
    e[_, _] /; j == i + 1 :> p,
    e[_, _] :> 0
  }, {i, 1, 9}, {j, 1, 9}];

x = MatrixPower[M, 25][[1, 9]] // Expand

This would yield the analytical answer:

$$18p^{8} - 17p^{9} - 45p^{16} + 81p^{17} - 36p^{18}$$

Evaluating at $p = 1.0/3.0$

x /. p -> 1/3 // N

Will return 0.00187928

This can also be evaluated directly using builtin Probability and DiscreteMarkovProcess Mathematica functions:

Probability[k[25] == 9, Distributed[k, DiscreteMarkovProcess[1, M /. p -> 1/3]]] // N

Which will get us the same answer: 0.00187928

Hossam Karim