Throwing balls into bins, estimate a lower bound of its probability

14

This is not homework, though it looks like it. Any reference is welcome. :-)

Scenario: There are $n$ different balls and $n$ different bins (labeled from 1 to $n$, from left to right). Each ball is thrown independently and uniformly into the bins. Let $f(i)$ be the number of balls in the $i$-th bin. Let $E_i$ denote the following event.

For each $j\le i$, $\sum_{k\le j}{f(k)} \le j-1$

That is, the first $j$ bins (the leftmost $j$ bins) contain fewer than $j$ balls, for each $j\le i$.

Question: Estimate $\sum_{i<n}{Pr(E_i)}$ in terms of $n$, as $n$ goes to infinity. A lower bound is preferred. I don't think an easily computed formula exists.

Example: $\lim\limits_{n\to\infty}{Pr(E_1)}=\lim\limits_{n\to\infty}{(\frac{n-1}{n})^n}=\frac{1}{e}$. Note that $Pr(E_n)=0$.
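To get a feel for the quantities in the question, here is a minimal Monte Carlo sketch. The function name `estimate_prefix_probs`, the trial count, and the seed are illustrative choices, not part of the question. It estimates every $Pr(E_i)$ at once by throwing $n$ balls and recording the largest $i$ for which the prefix condition still holds.

```python
import random

def estimate_prefix_probs(n, trials=20000, seed=0):
    """Estimate Pr(E_i) for i = 1..n by simulation.

    Returns a list p with p[i] approximating Pr(E_i); p[0] is unused.
    """
    rng = random.Random(seed)
    hits = [0] * (n + 1)
    for _ in range(trials):
        counts = [0] * (n + 1)            # counts[k] = f(k); bins are 1-indexed
        for _ in range(n):
            counts[rng.randint(1, n)] += 1
        prefix = 0
        for j in range(1, n + 1):
            prefix += counts[j]
            if prefix > j - 1:            # condition fails at j, so E_i fails for all i >= j
                break
            hits[j] += 1                  # condition held for every j' <= j, so E_j occurred
    return [h / trials for h in hits]

probs = estimate_prefix_probs(20)
print("Pr(E_1) estimate:", probs[1], "exact:", (19 / 20) ** 20)
print("sum over i < n:", sum(probs[1:20]))
```

For $n = 20$ the estimate of $Pr(E_1)$ should land close to $(19/20)^{20}$, and the estimate of $Pr(E_n)$ is exactly zero because the first $n$ bins always hold all $n$ balls.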

My guess: I guess that $\sum_{i<n}{Pr(E_i)}=\ln n$ as $n$ goes to infinity, since I only considered the first $\ln n$ terms of the sum.

reference-request co.combinatorics pr.probability Peng Zhang

1

Looks like a subcase of the birthday problem ...

Gopi

@Gopi I cannot convince myself that my question is a restricted birthday problem. Can you explain it explicitly? Thanks a lot. Note: The restriction is on the sum of balls in the first $j$ bins, not on the number of balls in a specific bin.

Peng Zhang

Indeed, my mistake. After rereading the Wikipedia article on the birthday problem, I realized I was considering another problem that was adapted from the birthday problem.

Gopi

2

Some incorrect ideas ... So, think about how to encode a state: read the bins from left to right. If the first bin has $i$ balls, output a sequence of $i$ ones, followed by a 0. Do this for all bins from left to right. Your encoding seems to be that you are interested in the largest $i$ such that this binary string (which has $n$ zeros and $n$ ones) for the first time contains more ones than zeros. Now, let us take a leap of faith and generate the 0s and 1s with equal probability $1/2$. (This might be complete nonsense.) This problem is related to Catalan numbers and Dyck words. And...???

Sariel Har-Peled

44

I don't see in your definition why it matters that the balls are different. Also, the string interpretation takes into account the fact that the bins are different.

Sariel Har-Peled

11

EDIT: (2014-08-08) As Douglas Zare points out in the comments, the argument below, specifically the 'bridge' between the two probabilities, is incorrect. I don't see a straightforward way to fix it. I'll leave the answer here because I believe it still provides some intuition, but know that $\Pr(E_m) \le \prod_{l=1}^{m}\Pr(F_l)$ is not true in general.

This won't be a complete answer, but hopefully it will have enough content that you or someone more knowledgeable than me can finish it off.

Consider the probability that exactly $k$ balls fall into the first $l$ (of $n$) bins:

$\binom{n}{k} \left( \frac{l}{n} \right)^k \left(\frac{n-l}{n} \right)^{n-k}$

Call $F_l$ the event that fewer than $l$ balls fall into the first $l$ bins. Its probability is:

$\Pr(F_l) = \sum_{k=0}^{l-1} \binom{n}{k} \left( \frac{l}{n} \right)^k \left( \frac{n-l}{n} \right)^{n-k}$

The probability that the event $E_l$ above occurs is less than if we consider each of the $F_l$ events occurring independently and all at once. This gives us a bridge between the two:

$\begin{array}{lll} \Pr(E_m) & \le & \prod_{l=1}^m \Pr(F_l) \\ & = & \prod_{l=1}^m \left( \sum_{k=0}^{l-1} \binom{n}{k} \left( \frac{l}{n} \right)^k \left( \frac{n-l}{n} \right)^{n-k} \right) \\ & = & \prod_{l=1}^m F(l-1; n, \frac{l}{n} ) \end{array}$

Here $F(l-1; n, \frac{l}{n})$ is the cumulative distribution function of the binomial distribution with $p = \frac{l}{n}$. Reading a few lines down the Wikipedia page and noticing that $(l-1 \le p n)$, we can use Chernoff's inequality to get:

$\begin{array}{lll} \Pr(E_m) & \le & \prod_{l=1}^m \exp\left[ -\frac{1}{2l} \right] \\ & = & \exp\left[ - \frac{1}{2} \sum_{l=1}^m \frac{1}{l} \right] \\ & = & \exp\left[ - \frac{1}{2} H_m \right] \\ & \le & \exp\left[ -\frac{1}{2} \left( \frac{1}{2 m} + \ln(m) + \gamma \right) \right] \end{array}$

Here $H_m$ is the $m$'th harmonic number, $\gamma$ is the Euler-Mascheroni constant, and the inequality for $H_m$ is taken from the linked Wolfram MathWorld page.
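The Chernoff step can be sanity-checked numerically. The sketch below compares the exact binomial CDF $F(l-1; n, \frac{l}{n})$ with $\exp[-\frac{1}{2l}]$ for a few values of $l$; the helper name `binom_cdf` and the choice $n = 200$ are illustrative assumptions. It only checks the per-$l$ tail bound, not the product 'bridge' inequality.

```python
from math import comb, exp

def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n = 200
rows = []
for l in (1, 2, 5, 10, 50, 100):
    exact = binom_cdf(l - 1, n, l / n)    # Pr(F_l)
    bound = exp(-1 / (2 * l))             # Chernoff bound used in the derivation
    rows.append((l, exact, bound))
    print(l, exact, bound)
```

In each row the exact value should sit below the bound, and the gap is large, which hints at why the constants in the final estimate are loose.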

Not worrying about the $e^{-1/4m}$ factor, this finally gives us:

$\Pr(E_m) \le \frac{ e^{ -\gamma/2}}{\sqrt{m}}$

Below is a log-log plot of an average of 100,000 instances for $n=2048$ as a function of $m$, with the function $\frac{e^{ -\gamma/2}}{\sqrt{m}}$ also plotted for reference:

[Figure: log-log plot for $n=2048$ as a function of $m$]

While the constants are off, the form of the function looks to be correct.

Below is a log-log plot for varying $n$, with each point being the average of 100,000 instances as a function of $m$:

[Figure: log-log plot for varying $n$ as a function of $m$]

Finally, getting to the original question you wanted answered: since we know that $\Pr(E_m) \propto \frac{1}{\sqrt{m}}$, we have:

$\sum_{i<n} \Pr(E_i) \propto \sqrt{n}$

And as numerical verification, below is a log-log plot of the sum, $S$, versus the instance size, $n$. Each point represents the average of the sum over 100,000 instances. The function $x^{1/2}$ has been plotted for reference:

[Figure: log-log plot of the sum $S$ versus $n$]

While I see no direct connection between the two, the tricks and the final form of this problem have a lot in common with the birthday problem, as initially guessed in the comments.

user834

44

How do you get $Pr(E_2) \le Pr(F_1)\times Pr(F_2)$? For example, for $n=100$, I calculate that $Pr(E_2) = 0.267946 \gt 0.14761 = Pr(F_1)Pr(F_2).$ If you are told that the first bin is empty, does this make it more or less likely that the first two bins have at most $1$ ball? It's more likely, so $Pr(F_1)Pr(F_2)$ is an underestimate.

Douglas Zare

@DouglasZare, I've verified your calculations, you're correct. Serves me right for not being more rigorous.

user834
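The counterexample in the comment above can be reproduced directly. This is a small sketch, with the helper names `pr_F` and `pr_E2` chosen here for illustration; `pr_E2` uses the fact that $E_2$ means bin 1 is empty and bin 2 holds at most one ball.

```python
from math import comb

def pr_F(l, n):
    """Pr(F_l): fewer than l of the n balls land in the first l bins."""
    p = l / n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(l))

def pr_E2(n):
    """Pr(E_2): bin 1 is empty and bin 2 holds at most one ball."""
    none_in_first_two = ((n - 2) / n) ** n
    one_in_bin_two = n * (1 / n) * ((n - 2) / n) ** (n - 1)
    return none_in_first_two + one_in_bin_two

n = 100
product = pr_F(1, n) * pr_F(2, n)
print("Pr(E_2)          =", pr_E2(n))
print("Pr(F_1) Pr(F_2)  =", product)
```

For $n = 100$ the first number is larger than the second, in line with the values quoted in the comment, so the product is not an upper bound on $Pr(E_2)$.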

15

The answer is $\Theta(\sqrt{n})$ .

First, let's compute $\Pr(E_{n-1})$.

Let's suppose we throw $n$ balls into $n$ bins, and look at the probability that a bin has exactly $k$ balls in it. This probability comes from the Poisson distribution, and as $n$ goes to $\infty$ the probability that there are exactly $k$ balls in a given bin is $\frac{1}{e} \frac{1}{ k!}$ .

Now, let's look at a different way of distributing balls into bins. We throw a number of balls into each bin chosen from the Poisson distribution, and condition on the event that there are $n$ balls total. I claim that this gives exactly the same distribution as throwing $n$ balls into $n$ bins. Why? It is easy to see that the probability of having $k_j$ balls in the $j$th bin for every $j$ is proportional to $\prod_{j=1}^n \frac{1}{k_j!}$ in both distributions.
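The equivalence claimed here can be checked exactly for a small case. The sketch below assumes each bin gets an independent Poisson count with mean 1 (so the total is Poisson with mean $n$) and compares the conditioned probability with the multinomial probability for every configuration with $n = 4$; the function names are illustrative.

```python
from itertools import product
from math import exp, factorial, prod

def multinomial_prob(ks):
    """Pr of bin counts ks when len(ks) balls are thrown uniformly into len(ks) bins."""
    n = len(ks)
    return factorial(n) / (prod(factorial(k) for k in ks) * n**n)

def conditioned_poisson_prob(ks):
    """Pr of bin counts ks for independent Poisson(1) bins, given that the total is n."""
    n = len(ks)
    joint = prod(exp(-1) / factorial(k) for k in ks)   # independent Poisson(1) bins
    total_is_n = exp(-n) * n**n / factorial(n)         # Pr(Poisson(n) = n)
    return joint / total_is_n

n = 4
configs = [ks for ks in product(range(n + 1), repeat=n) if sum(ks) == n]
max_gap = max(abs(multinomial_prob(ks) - conditioned_poisson_prob(ks)) for ks in configs)
print(len(configs), "configurations, largest difference:", max_gap)
```

The largest difference should be at floating-point noise level, since both expressions reduce to $\frac{n!}{n^n \prod_j k_j!}$.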

So let's consider a random walk where at each step, you go from $t$ to $t+1-k$ with probability $\frac{1}{e}\frac{1}{k!}$ . I claim that if you condition on the event that this random walk returns to 0 after $n$ steps, the probability that this random walk always stays above $0$ is the probability that the OP wants to calculate. Why? The height of this random walk after $s$ steps is $s$ minus the number of balls in the first $s$ bins.

If we had chosen a random walk with a probability of $\frac{1}{2}$ of going up or down $1$ on each step, this would be the classical ballot problem, for which the answer is $\frac{1}{2(n-1)}$ . This is a variant of the ballot problem which has been studied (see this paper), and the answer is still $\Theta\left(\frac{1}{n}\right)$ . I don't know whether there is an easy way to compute the constant for the $\Theta\left(\frac{1}{n}\right)$ for this case.

The same paper shows that when the random walk is conditioned to end at height $k$ , the probability of always staying positive is $\Theta(k/n)$ as long as $k = O(\sqrt{n})$ . This fact will let us estimate $\Pr(E_s)$ for any $s$ .

I'm going to be a little handwavy for the rest of my answer, but standard probability techniques can be used to make this rigorous.

We know that as $n$ goes to $\infty$ , this random walk converges to a Brownian bridge, i.e., Brownian motion conditioned to start and end at $0$ . From general probability theorems, for $\epsilon n < s< (1-\epsilon)n$ , the random walk is roughly $\Theta(\sqrt{n})$ away from the $x$ -axis. In the case that it has height $t>0$ , the probability that it has stayed above $0$ for the entire time before $s$ is $\Theta(t/s)$ . Since $t$ is likely to be $\Theta(\sqrt{n})$ when $s = \Theta(n)$ , we have $\Pr(E_s) = \Theta(1/\sqrt{n})$ .

Peter Shor

4

[Edit 2014-08-13: Thanks to a comment by Peter Shor, I have changed my estimate of the asymptotic growth rate of this series.]

My belief is that $\sum_{i<n} \Pr(E_i)$ grows as $\sqrt{n}$ as $n\to\infty$ . I do not have a proof but I think I have a convincing argument.

Let $B_i = f(i)$ be a random variable that gives the number of balls in bin $i$ . Let $B_{i,j} = \sum_{k=i}^j B_k$ be a random variable that gives the total number of balls in bins $i$ through $j$ inclusive.

You can now write $\Pr(E_i) = \sum_{b<j} \Pr(E_j \wedge B_{1,j} = b) \Pr(E_i \mid E_j \wedge B_{1,j} = b)$ for any $j < i$ . To that end, let's introduce the functions $\pi$ and $g_i$ .

$\pi(j, k, b) = \Pr(B_j = k \mid B_{1,j-1} = b) = \binom{n-b}{k}\left(\frac{1}{n-j+1}\right)^k\left(\frac{n-j}{n-j+1}\right)^{n-b-k}$

$\begin{aligned} g_i(j, k, b) \; &= \Pr(E_i \wedge B_{j,i} \le k \mid E_{j-1} \wedge B_{1,j-1} = b) \\ &= \begin{cases} 0 & k < 0 \\ 1 & k \ge 0 \wedge j > i \\ \sum_{l=0}^{j-b-1} \pi(j, l, b) g_i(j + 1, k - l, b + l) & \mathrm{otherwise} \end{cases}\end{aligned}$

We can write $\Pr(E_i)$ in terms of $g_i$ :

$\Pr(E_i) = g_i(1, i - 1, 0)$

Now, it's clear from the definition of $g_i$ that

$\Pr(E_i) = \frac{(n-i)^{n-i+1}}{n^n}h_i(n)$

where $h_i(n)$ is a polynomial in $n$ of degree $i - 1$ . This makes some intuitive sense too; at least $n - i + 1$ balls will have to be put in one of the $(i+1)$ th through $n$ th bins (of which there are $n-i$ ).

Since we're only talking about $Pr(E_i)$ when $n\to\infty$ , only the lead coefficient of $h_i(n)$ is relevant; let's call this coefficient $a_i$ . Then

$\lim_{n\to\infty} \Pr(E_i) = \frac{a_i}{e^i}$

How do we compute $a_i$ ? Well, this is where I'll do a little handwaving. If you work out the first few $E_i$ , you'll see that a pattern emerges in the computation of this coefficient. You can write it as

$a_i = \mu_i(1, i-1, 0)$ where

$\mu_i(j, k, b) = \begin{cases} 0 & k < 0 \\ 1 & k \ge 0 \wedge j > i \\ \sum_{l = 0}^{j-b-1} \frac{1}{l!} \mu_i(j + 1, k - l, b+ l) & \mathrm{otherwise} \end{cases}$

Now, I wasn't able to derive a closed-form equivalent directly, but I computed the first 20 values of $Pr(E_i)$ :

i       a_i/e^i
1       0.367879
2       0.270671
3       0.224042
4       0.195367
5       0.175467
6       0.160623
7       0.149003
8       0.139587
9       0.131756
10      0.12511
11      0.119378
12      0.114368
13      0.10994
14      0.105989
15      0.102436
16      0.0992175
17      0.0962846
18      0.0935973
19      0.0911231
20      0.0888353
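The table can be reproduced from the recursion for $\mu_i$. The sketch below is one way to do it, not necessarily how the values above were produced: it uses exact fractions with memoization, and it assumes the base case of the recursion is $j > i$ (all $i$ bins processed), matching the definition of $g_i$. The function name `lead_coefficient` is illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import exp, factorial

def lead_coefficient(i):
    """a_i = mu_i(1, i - 1, 0), with base case j > i (assumed, as in g_i)."""
    @lru_cache(maxsize=None)
    def mu(j, k, b):
        if k < 0:
            return Fraction(0)
        if j > i:
            return Fraction(1)
        # l ranges over 0 .. j - b - 1: bin j may hold l balls while the prefix sum stays below j
        return sum((Fraction(1, factorial(l)) * mu(j + 1, k - l, b + l)
                    for l in range(j - b)), Fraction(0))
    return mu(1, i - 1, 0)

for i in range(1, 21):
    print(i, float(lead_coefficient(i)) / exp(i))
```

Working with `Fraction` keeps $a_i$ exact, so the only rounding happens in the final division by $e^i$.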

Now, it turns out that

$\DeclareMathOperator{\Pois}{Pois} \lim_{n\to\infty} \Pr(E_i) = \frac{i^i}{i! e^i} = \Pois(i; i)$

where $\Pois(i; \lambda)$ is the probability that a random variable $X$ has value $i$ when it's drawn from a Poisson distribution with mean $\lambda$ . Thus we can write our sum as

$\lim_{n\to\infty} \sum_{i=1}^n \Pr(E_i) = \sum_{x = 1}^{\infty} \frac{x^x}{x!e^x}$

Wolfram Alpha tells me this series diverges. Peter Shor points out in a comment that Stirling's approximation allows us to estimate $\Pr(E_i)$ :

$\lim_{n\to\infty} \Pr(E_x) = \frac{x^x}{x!e^x} \approx \frac{1}{\sqrt{2 \pi x}}$
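A quick numerical look at how good the Stirling estimate is. This sketch evaluates $\frac{x^x}{x!e^x}$ in log space (via `math.lgamma`) so that large $x$ does not overflow; the function name `poisson_at_mean` is an illustrative choice.

```python
from math import exp, lgamma, log, pi, sqrt

def poisson_at_mean(x):
    """x^x / (x! e^x), computed in log space to avoid overflow."""
    return exp(x * log(x) - lgamma(x + 1) - x)

for x in (1, 10, 100, 1000, 10000):
    exact = poisson_at_mean(x)
    approx = 1 / sqrt(2 * pi * x)
    print(x, exact, approx, exact / approx)
```

The ratio in the last column approaches 1 from below as $x$ grows, so the approximation slightly overestimates each term.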

Let

$\phi(x) = \frac{1}{\sqrt{2 \pi x}}$

Since

$\lim_{x\to\infty}\frac{\phi(x)}{\phi(x+1)} = 1$
$\phi(x)$ is decreasing
$\int_1^n \phi(x)dx \to \infty$ as $n \to \infty$

our series grows as $\int_1^n \phi(x) dx$ (See e.g. Theorem 2). That is,

$\sum_{i=1}^n Pr(E_i) = \Theta\left(\sqrt{n}\right)$

ruds

1

Wolfram Alpha is wrong. Use Stirling's formula. It says that $x^x/(x! e^x)\approx 1/\sqrt{2\pi x}$ .

Peter Shor

@PeterShor Thanks! I've updated the conclusion thanks to your insight, and now I am in agreement with the other two answers. It's interesting to me to see 3 quite different approaches to this problem.

ruds

4

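Exhaustively checking the first few terms (by examining all $n^n$ cases) and a bit of lookup shows that the answer is https://oeis.org/A036276 / $n^n$ . This implies that the answer is $\sim \sqrt{\frac{\pi}{8}}\, n^{\frac{1}{2}}$ , since $\frac{n!}{n^n} \sim \sqrt{2\pi n}\, e^{-n}$ and $\sum_{k=0}^{n-2}\frac{n^k}{k!} \sim \frac{e^n}{2}$ in the formula below.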

More exactly, the answer is:

$\frac{n!}{2 n^n} \sum_{k=0}^{n-2}\frac{n^k}{k!}$ and there is no closed-form answer.
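The exhaustive check described in this answer is easy to repeat for small $n$. The sketch below enumerates all $n^n$ throws and compares the resulting sum with the formula above, using exact fractions; the function names `closed_form` and `brute_force` are illustrative, and the enumeration is only practical for very small $n$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def closed_form(n):
    """n! / (2 n^n) * sum_{k=0}^{n-2} n^k / k!, as an exact fraction."""
    s = sum(Fraction(n**k, factorial(k)) for k in range(n - 1))
    return Fraction(factorial(n), 2 * n**n) * s

def brute_force(n):
    """Sum over i < n of Pr(E_i), by enumerating all n^n throws."""
    total = 0
    for throw in product(range(1, n + 1), repeat=n):   # throw[b] = bin chosen by ball b
        prefix = 0
        for j in range(1, n):
            prefix += throw.count(j)
            if prefix > j - 1:
                break
            total += 1                                 # E_j holds for this throw
    return Fraction(total, n**n)

for n in range(2, 7):
    print(n, closed_form(n), brute_force(n))
```

The two columns should agree exactly for each $n$ shown, for instance both give $\frac{1}{4}$ at $n = 2$.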

Haran

OEIS is pretty awesome

Thomas Ahle
