Suppose you have an array of size $n \ge 6$ that
algorithms
arrays
searching
darylnak
Answers:
You can create an additional array $B$ of size $n$. Initially set all of its elements to $0$. Then traverse the input array $A$ and increment $B[A[i]]$ by 1 for each $i$. After that, simply check $B$: iterate over $A$, and if $B[A[i]] > 1$ then $A[i]$ is repeated. This solves the problem in $O(n)$ time at the cost of $O(n)$ memory, and it works because your integers are between $1$ and $n-5$.
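As a rough illustration of the counting idea above, here is a minimal Python sketch. The function name `find_repeated` is made up for this example, and the final pass loops over the possible values rather than over the input, purely so that each repeated value is reported once.

```python
def find_repeated(a):
    """Return the values that occur more than once in a.

    Assumes (as in the answer above) that every value lies between 1 and
    len(a) - 5, so it can be used directly as an index into the counter array.
    """
    n = len(a)
    b = [0] * n                 # b[v] counts how often v occurs in a
    for value in a:
        b[value] += 1
    # Scan the counters once; index 0 is unused because values start at 1.
    return [v for v in range(1, n) if b[v] > 1]


print(find_repeated([2, 3, 5, 7, 9] + list(range(1, 11))))
```

Both passes are linear, and the counter array is the $O(n)$ extra memory the answer mentions.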
The solution in fade2black's answer is the standard one, but it uses $O(n)$ space. You can improve this to $O(1)$ space as follows:
This algorithm assumes the RAM machine model, in which basic arithmetic operations on $O(\log n)$-bit words take $O(1)$ time.
Another way to formulate this solution is along the following lines:
This solution shows that if we replace 5 by $d$, then we get (I believe) an $O(d^2 n)$ algorithm using $O(d^2)$ space, which performs $O(dn)$ arithmetic operations on integers of bit-length $O(d \log n)$, keeping at most $O(d)$ of these at any given time. (This requires careful analysis of the multiplications we perform, most of which involve one operand of length only $O(\log n)$.) It is conceivable that this can be improved to $O(dn)$ time and $O(d)$ space using modular arithmetic.
There's also a linear time and constant space algorithm based on partitioning, which may be more flexible if you're trying to apply this to variants of the problem that the mathematical approach doesn't work well on. This requires mutating the underlying array and has worse constant factors than the mathematical approach. More specifically, I believe the time and space costs in terms of the total number of values $n$ and the number of duplicates $d$ are $O(n \log d)$ and $O(d)$ respectively, though proving it rigorously will take more time than I have at the moment.
Algorithm
Start with a list of pairs, where the first pair is the range over the whole array, or $[(1, n)]$ if 1-indexed.
Repeat the following steps until the list is empty:
Cursory analysis of time complexity.
Steps 1 to 6 take $O(j - i)$ time, since finding the minimum and maximum and partitioning can be done in linear time.
Every pair $(i, j)$ in the list is either the first pair, $(1, n)$, or a child of some pair for which the corresponding subarray contains a duplicate element. There are at most $d \lceil \log_2 n + 1 \rceil$ such parents, since each traversal halves the range in which a duplicate can be, so there are at most $2d \lceil \log_2 n + 1 \rceil$ total when including pairs over subarrays with no duplicates. At any one time, the size of the list is no more than $2d$.
Consider the work to find any one duplicate. This consists of a sequence of pairs over an exponentially decreasing range, so the total work is the sum of the geometric sequence, or $O(n)$. This produces an obvious corollary that the total work for $d$ duplicates must be $O(nd)$, which is linear in $n$.
To find a tighter bound, consider the worst-case scenario of maximally spread out duplicates. Intuitively, the search takes two phases, one where the full array is being traversed each time, in progressively smaller parts, and one where the parts are smaller than $\frac{n}{d}$ so only parts of the array are traversed. The first phase can only be $\log d$ deep, so has cost $O(n \log d)$, and the second phase has cost $O(n)$ because the total area being searched is again exponentially decreasing.
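The numbered steps of the algorithm are not reproduced in the text above, so the following Python sketch is only one plausible reading of the partitioning idea, not necessarily the answer's exact procedure. It assumes the array contains every integer from 1 up to some maximum at least once (so a subarray whose length equals its value range cannot hold a duplicate). The function name `find_duplicates_by_partition` and the use of 0-based inclusive index pairs are choices made for this illustration.

```python
def find_duplicates_by_partition(arr):
    """Find repeated values by recursively partitioning arr in place.

    Assumption for this sketch: arr holds every integer in 1..m at least
    once, for some m <= len(arr). The array is mutated.
    """
    dups = []
    pending = [(0, len(arr) - 1)]          # list of (i, j) pairs, inclusive
    while pending:
        i, j = pending.pop()
        if i >= j:
            continue                        # fewer than two elements
        lo = min(arr[t] for t in range(i, j + 1))
        hi = max(arr[t] for t in range(i, j + 1))
        if lo == hi:
            dups.extend([lo] * (j - i))     # all equal: every copy but one is extra
            continue
        if hi - lo == j - i:
            continue                        # as many slots as values: no duplicates
        mid = (lo + hi) // 2
        k = i                               # values <= mid are moved to arr[i:k]
        for t in range(i, j + 1):
            if arr[t] <= mid:
                arr[k], arr[t] = arr[t], arr[k]
                k += 1
        pending.append((i, k - 1))
        pending.append((k, j))
    return sorted(dups)


print(find_duplicates_by_partition([2, 3, 5, 7, 9] + list(range(1, 11))))
```

Each pair costs time linear in its length, matching the $O(j - i)$ figure above, and a subarray is split further only when it still contains a duplicate.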
Leaving this as an answer because it needs more space than a comment gives.
You make a mistake in the OP when you suggest a method. Sorting a list and then traversing it takes $O(n \log n)$ time, not $O(n^2 \log n)$ time. When you do two things (that take $O(f)$ and $O(g)$ respectively) sequentially, the resulting time complexity is $O(f + g) = O(\max\{f, g\})$ (under most circumstances).
In order to multiply the time complexities, you need to be using a for loop. If you have a loop of length $f$ and for each value in the loop you do a function that takes $O(g)$, then you'll get $O(fg)$ time.
So, in your case you sort in $O(n \log n)$ and then traverse in $O(n)$, resulting in $O(n \log n + n) = O(n \log n)$. If for each comparison of the sorting algorithm you had to do a computation that takes $O(n)$, then it would take $O(n^2 \log n)$, but that's not the case here.
In case you're curious about my claim that $O(f + g) = O(\max\{f, g\})$, it's important to note that that's not always true. But if $f \in O(g)$ or $g \in O(f)$ (which holds for a whole host of common functions), it will hold. The most common time it doesn't hold is when additional parameters get involved and you get expressions like $O(2^c n + n \log n)$.
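To make the "sequential steps add, nested steps multiply" point concrete, here is a small Python sketch of the sort-then-traverse method discussed above. The name `duplicates_by_sorting` is invented for this example; the two phases run one after the other, so their costs add.

```python
def duplicates_by_sorting(values):
    """Return each repeated value once, found by sorting and scanning neighbours."""
    ordered = sorted(values)                      # phase 1: O(n log n)
    dups = []
    for prev, cur in zip(ordered, ordered[1:]):   # phase 2: O(n), runs after the sort
        if prev == cur and (not dups or dups[-1] != cur):
            dups.append(cur)
    return dups


print(duplicates_by_sorting([3, 1, 2, 3, 4, 1]))
```

The scan is not executed once per comparison inside the sort, which is why the total is $O(n \log n + n)$ rather than a product.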
There's an obvious in-place variant of the boolean array technique using the order of the elements as the store (where `arr[x] == x` for "found" elements). Unlike the partition variant that can be justified for being more general I'm unsure when you'd actually need something like this, but it is simple.
This just repeatedly puts `arr[idx]` at the location `arr[idx]` until you find that location already taken, at which point it must be a duplicate. Note that the total number of swaps is bounded by $n$ since each swap makes its exit condition correct.
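The code for this variant is not reproduced in the text above, so what follows is a hedged Python sketch of the swapping idea rather than the answer's own listing. It assumes the array holds every integer from 1 to `len(arr) - extras` at least once; because Python lists are 0-indexed, the "home" of value `v` is taken to be index `v - 1`, and only the last `extras` slots (which are nobody's home) are scanned. The function name and the `extras` parameter are assumptions of this sketch.

```python
def find_duplicates_by_swapping(arr, extras=5):
    """Rearrange arr in place so the last `extras` slots hold the surplus copies.

    Assumption for this sketch: arr contains every integer from 1 to
    len(arr) - extras at least once. Value v is "found" once arr[v - 1] == v.
    """
    n = len(arr)
    for idx in range(n - extras, n):
        # Keep sending the value at idx to its home until that home is
        # already occupied by an equal value, i.e. arr[idx] is a surplus copy.
        while arr[arr[idx] - 1] != arr[idx]:
            home = arr[idx] - 1
            arr[idx], arr[home] = arr[home], arr[idx]
    return sorted(arr[n - extras:])


print(find_duplicates_by_swapping(list(range(1, 11)) + [2, 3, 5, 7, 9]))
```

Every swap puts one more value into its home slot and never disturbs a slot that is already correct, which is the reason the total number of swaps stays bounded by the array length.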
`while` loop runs in constant time on average. Otherwise, this isn't a linear-time algorithm.
Subtract the values you have from the sum $\sum_{i=1}^{n} i = \frac{(n+1) \cdot n}{2}$.
So, after $\Theta(n)$ time (assuming arithmetic is $O(1)$, which it isn't really, but let's pretend) you have a sum $\sigma_1$ of 5 integers between 1 and $n$:
$$x_1 + x_2 + x_3 + x_4 + x_5 = \sigma_1$$
Supposedly, this is no good, right? You can't possibly figure out how to break this up into 5 distinct numbers.
Ah, but this is where it gets to be fun! Now do the same thing as before, but subtract the squares of the values from $\sum_{i=1}^{n} i^2$. Now you have:
$$x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 = \sigma_2$$
See where I'm going with this? Do the same for powers 3, 4 and 5 and you have yourself 5 independent equations in 5 variables. I'm pretty sure you can solve for $\vec{x}$.
Caveats: Arithmetic is not really $O(1)$. Also, you need a bit of space to represent your sums; but not as much as you would imagine - you can do most everything modularly, as long as you have, oh, $\lceil \log(5n^6) \rceil$ bits; that should do it.
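One way the five power-sum equations might be solved is sketched below in Python. This is an illustration under stated assumptions, not the answer's own procedure: it uses Newton's identities to turn the power sums into the coefficients of the polynomial whose roots are the unknowns, then tests every candidate value as a root. It assumes the five unknowns are distinct, and the helper `find_five_duplicates` additionally assumes the array holds each of $1, \dots, n-5$ once plus five distinct extra copies. Both function names are hypothetical, and Python's arbitrary-precision integers sidestep the word-size caveat.

```python
def recover_from_power_sums(p, candidates):
    """Recover m distinct integers from their power sums p[0..m-1].

    p[k - 1] must equal x_1**k + ... + x_m**k. Newton's identities give the
    elementary symmetric polynomials e_1..e_m, which are the coefficients of
    prod(t - x_i); every candidate is then tested as a root.
    """
    m = len(p)
    e = [1]                                    # e[0] = 1
    for k in range(1, m + 1):
        s = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(s // k)                       # k * e_k = s, so this is exact

    def poly(t):
        # t^m - e_1 t^(m-1) + e_2 t^(m-2) - ... + (-1)^m e_m
        return sum((-1) ** k * e[k] * t ** (m - k) for k in range(m + 1))

    return [t for t in candidates if poly(t) == 0]


def find_five_duplicates(a):
    """Assumes a holds 1..len(a)-5 once each plus five distinct extra copies."""
    m = len(a) - 5
    p = [sum(v ** k for v in a) - sum(i ** k for i in range(1, m + 1))
         for k in range(1, 6)]
    return recover_from_power_sums(p, range(1, m + 1))


print(find_five_duplicates([2, 3, 5, 7, 9] + list(range(1, 11))))
```

Testing every candidate keeps the root-finding step linear in $n$ and avoids any numerical solver.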
The easiest way to solve the problem is to create an array in which we count the appearances of each number in the original array, and then traverse all numbers from $1$ to $n-5$ and check whether each number appears more than once. The complexity of this solution in both memory and time is linear, or $O(N)$.
Map an array to `1 << A[i]` and then XOR everything together. Your duplicates will be the numbers where corresponding bit is off.
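A minimal Python sketch of the XOR suggestion follows, with its assumptions spelled out. A bit ends up off only when its value occurs an even number of times, so this illustration assumes every value from 1 to `max_value` occurs either exactly once or exactly twice; a value occurring three times would go undetected. The name `duplicates_by_xor` and the `max_value` parameter are invented for the example, and the mask itself needs about `max_value` bits, so this is not constant space.

```python
def duplicates_by_xor(values, max_value):
    """Return the values in 1..max_value whose bit is off after XOR-ing.

    Assumption for this sketch: every value in 1..max_value occurs exactly
    once or exactly twice, so "bit off" means "occurs twice".
    """
    mask = 0
    for v in values:
        mask ^= 1 << v          # toggles bit v; Python ints grow as needed
    return [v for v in range(1, max_value + 1) if not (mask >> v) & 1]


print(duplicates_by_xor([2, 3, 5, 7, 9] + list(range(1, 11)), 10))
```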
`collated[item].append(item)` runs in constant time. Is that really true?