if ive a 1000 songs,using shuffle whats the probability, on average, how many tracks will be played before a previous song plays again.
tricky probability Q, getting a few different answers, with some variation of accuracy!!
thanks

With the given information one can only assume a truly random shuffle of a single list of tracks with no other variables involved, so the probability is 1/1000 every time a new track is selected.

"Write a program to predict, on average, how many tracks will be played before any of the previously played tracks is repeated" sorry thats the actual wording of the Q.
i understand the 1/1000 chance then theres another 1/1000 chance of getting the song again next time, meaning 1/1000 X 1/1000 chance of getting the same song twice in a row using the shuffle.
not sure about the " how many tracks" part though!!

i understand the 1/1000 chance then theres another 1/1000 chance of getting the song again next time, meaning 1/1000 X 1/1000 chance of getting the same song twice in a row using the shuffle.

That's an analysis of not getting a SPECIFIC song twice in a row in two sequential playings. It's like saying "What are the odds of getting song number 17 twice in row straight away when I press play?". You don't care about a SPECIFIC song, and you don't care about WHEN they are repeated.

Probability is commonly counter-intuitive. Here's a take on the problem (and probability IS tricky - I stand by to be corrected...):


Given a first song X, what is the probability that it is not the same as any previous song? 1

Given that first song, what is the probability that the next song will not be the same? 999/1000.

What is the probability that the next song will not be either of the first two? 998/1000.

What is the probability that the next song will not be any of the first three? 997/1000.

So the probability that none of the first four are the same is:

1 * 999/1000 * 998/1000 * 997/1000 = 0.994

The question becomes how far do you have to go before the product of these probabilities is equal to or less than 0.5?

Just as an interesting value, the chance of playing 1000 songs and not getting ANY repeats is about 4 x 10^(-433), which is pretty small :) And of course, if you go for 1001 songs with no repeats, the final value to be multiplied by is 0/1000, so the probability of playing 1001 songs with no repeats is exactly zero, which is what we intuitively expect.


An alternative way of doing this would be to actually run the situation a million billion times. Get a decent random number generator, and start churning out value between 1 and 1000. Each time you get a number that you already had, make a note of how many numbers you had to generate and start again. Once you've done it a million billion times, you'll have a pretty accurate average answer.


As an aside, I understand that in the early days Apple's random play algorithm on its music players were truly random. People complained about them not being random, and Apple changed the algorithm to a more deterministic one that people felt was more random.

Get a decent random number generator

I would have to be a uniformly distributed random number generator right? Or is that part of the simulation, normal vs. uniform?

ah...probability,my enemy!!!was thinking of using the random number, and giving each song a number, and then recording how many songs it takes until repeat, since that'll take into account the 'average'. is there no simple (probability never is!!) calculation out there?

is there no simple (probability never is!!) calculation out there?

Yes and no. The equation you're looking for is this, solved for y: http://nuclear.warhead.org.uk/rosy/CodeCogsEqn.gif

This is a problem really well suited to computation, which is why it was chosen, I expect.

thinking of using the random number, and giving each song a number, and then recording how many songs it takes until repeat, since that'll take into account the 'average'.

You certainly could do it that way. Be sure to repeat a million billion times to get a good value :) As histrungalot says, make sure it's a uniform distribution (i.e. equal chance of every value coming up each time, not clustered around some central point).

only thing is the random number method will take an age!! and moschops, just how does that linked equation work exactly?

The big symbol on the left indicates a product. You can read it as:

The product, from x=1000 to x=y, of (x/1000), equals 0.5

You can expand it as:

1000/1000 * 999/1000 * 998/1000 * ..... * y/1000 = 0.5

Note that with an integer value of y you're unlikely to get exactly 0.5, so you have to find the value of y for which the value comes out closest and interpret that yourself.

The code to solve this is quite simple. Something along the lines of:

double product = 1;
double x = 1000;
while (product > 0.5)
{
  product = product * ( x / 1000);
  x--;
}

number_of_songs_until_probability_is_50_percent = 1000 - x;

I've not checked this, so be careful that it stops at the right place (isn't off by one or some such) and so forth, but it's this sort of thing.

well i'll try and work around this, and see what i come up with, thanks for the help, likely be back on tonight or tomorrow, if i encounter any problems! thanks

It might be quite neat to use both methods; you can use the multiple runs method to check the calculated answer, and produce neat little graphs and so forth showing how many runs it takes to get better and better answers.

well got sort of a briefing into this, and the answer is less than 50 songs! i need 2 prove this just!
was thinking of picking say song 9, then have a random number generator in loop and when song 9 comes out again, stop the loop and do another loop til the same happens, does that sound about right?

was thinking of picking say song 9, then have a random number generator in loop and when song 9 comes out again, stop the loop and do another loop til the same happens, does that sound about right?

No, that sounds wrong. That will be testing the chance of song 9 coming up twice, given that the first song is song 9. That's not the question. You're supposed to be checking the probability of ANY song coming up twice, and you don't care where it comes out first in the sequence (could be the first song out, could be the 10th song out, could be the 50th song out, you don't care, you'll just stop as soon as the most recent song matches ANY previous song).

If you used your suggestion, and the songs that came up were song 17, 17, 17, 17, 17, 17 (OMG that's the same song six times in a row!), 9, you would say that it took 7 loops to get the same song twice. Clearly not true.

I thought I explained this all pretty clearly. You can calculate the exact value using the equation I gave you, which is perfectly suited to computation. It will require a single loop construct.

To do it experimentally, STOP thinking about code first. STOP. Imagine you had a big bag of tiles, and on each tile is a number, from 1 to 1000. You are to pick one out, and read the number, and then put it back in, and then pick one out, and read the number, and then put it back in, over and over. You stop when you have seen the same number twice. On average, how many times will you pull a tile out of the bag?

THAT is what you are modelling. THIS is the actual difficulty in programming - understanding the problem and being able to think about it in a way that lends itself to computation; THEN comes translating that problem into computation using the syntax available to you.

More edit: I've STILL not actually done a decent check on that equation I gave earlier, so don't trust that too much!

I tried both, your calculation and running a test of 1000 songs randomly played 10 million times. The difference is ~1.3 songs. So, the question I have is "average number of songs played before there is a repeat" is the 50% mark?

Now that I've gone back to look at my equation, I realise that this is just the "Birthday Paradox" with a year of 1000 days, so to speak :)

Wikipedia has a good explanation and a number of equations that come out with the answer I got earlier, which is nice.

So, the question I have is "average number of songs played before there is a repeat" is the 50% mark?

Yes, I think so. There is a number of songs to play at which the probability of hearing a repeat is 50% (well, just over); the question becomes what is that number?

I've done calculation and experiment now, and they come out with the same value most of the time (I did use a better random number generator than the one I found in cstdlib, and even then I had to let it run a lot of times); every so often, the experiment is off by a couple. That's experimentation for you! :)

hopefully on monday il get starting this, and if you're willing moschops and other helpful users, to help me on it, if i encounter any difficulties or any questions that need answered. thanks

I'm late on this but here is my $0.02 anyway. Moschops is right that this is the same as the Birthday Problem.

However, this is NOT a C++ problem, or even a computer programming problem. It is a probability problem. The probability solution involves dividing the 1000 songs into 2 groups. In the first group is the song to be repeated, and the second group has all the other songs. Every time the song player selects a song it can either choose from the first group with probability 1/1000 or the second group with probability 999/1000. The specific problem seems to be to find the number of songs the player selects before it chooses from the first group twice.

This defines the binomial probability. The exact solution to this type of problem is Prob(first group selections equal to 2) = [n!/2!(n-2)!] times (1/1000) raised to power 2 times (999/1000) raised to power (n-2). Find the n that makes the left hand side equal to 1.0.

I think the prof should have assigned how to program the solution rather than find the solution.

In the first group is the song to be repeated, and the second group has all the other songs. Every time the song player selects a song it can either choose from the first group with probability 1/1000 or the second group with probability 999/1000. The specific problem seems to be to find the number of songs the player selects before it chooses from the first group twice.

This is an incorrect assessment of the problem. You have assessed the problem in which a specific song is repeated. This is the same as PJH's suggestion above of picked song 9 to be the repeated song. The question does not concern a specific song to be repeated. The question is about repeating any song.

im glad this Q has sparked a good debate! =]

This is an incorrect assessment of the problem. You have assessed the problem in which a specific song is repeated. This is the same as PJH's suggestion above of picked song 9 to be the repeated song. The question does not concern a specific song to be repeated. The question is about repeating any song.

Are you saying that the 2 groups are 1) played songs and 2) unplayed songs, so that after 2 different songs are selected the probabilities for the third selection is 2/1000 for group 1 and 998/1000 for group 2? And for the fourth song selection, given no repeat for the third selection, the probabilities are 3/1000 for group 1 and 997/1000 for group 2? Et cetera?

If so, then I did misunderstand the problem. It changes the complexity of the probability expression, but doesn't change the fundamental analysis. It is no longer a binomial probability problem. It is a combinations problem. With each selection, for there to be no repeat of a song, there must be 0 selections from the number of songs in group 1 (there is only 1 way to do that) and 1 selection from the remaining unplayed songs in group 2 (there are as many ways to do that as there are remaining unplayed songs). That is a joint probability. That joint probability expression needs to be iterated over the number of selections. The number of iterations it takes for the joint probability of continuing to select only unplayed songs to reach 0.0 is the answer.

Again, this tasks probability knowledge FAR more than programming knowledge.

You now have understood the situation. You still misunderstand the question.

The number of iterations it takes for the joint probability of continuing to select only unplayed songs to reach 0.0 is the answer.

No, the question is not how many songs must be played to guarantee that there will be a repeat. The answer to that is obviously 1000. That is not the question. The question is how many songs need to be played, on average, such that the played songs contain a repeat. The answer is 38.

This question requires a very basic understanding of probability and is an excellent programming question, as a simple loop lends itself very well to answering this question.

The answer is 38

Doing a simulation of 10 million times, the number comes out to be 39.3... Which is right in line with your calculation.

When I went out of my way to get a decent Mersenne twister and ran it a bajillion times, it (often) came out at 38; the probability gradient around that value is very shallow and I wasn't surprised to see it was variations about the value of 38 :)

yeah the answer is less than 50, so the 38-40 answers you are getting are right, il get a crack at monday at it! hoping you's will stick around if any problems come up for me.!

yeah the answer is less than 50, so the 38-40 answers you are getting are right

The answer doesn't have to be reached experimentally - it can be calculated exactly using that equation I put up, which is ideally suited to computation. Definitely do it that way first.

You now have understood the situation. You still misunderstand the question.

No, the question is not how many songs must be played to guarantee that there will be a repeat. The answer to that is obviously 1000. That is not the question. The question is how many songs need to be played, on average, such that the played songs contain a repeat. The answer is 38.

This question requires a very basic understanding of probability and is an excellent programming question, as a simple loop lends itself very well to answering this question.

Not according to the probability textbooks I read that give a more than very basic understanding of probability.

On the first draw of a song, the probability of repeating P(repeat) is 0.
On the second draw of a song, the probability of repeating given no prior repeat, P(repeat|no prior repeat), is 1/1000.
On the third draw of a song, the probability of repeating given no prior repeat, P(repeat|no prior repeat), is 2/1000.
On the fourth draw of a song, the probability of repeating given no prior repeat, P(repeat|no prior repeat), is 3/1000.

And so on.

These are conditional probabilities and are computed using the formula: P(A|B) = P(A and B) / P(B). A precise solution requires re-expressing the above conditional probabilities using that formula in order to get the unconditional probability of P(repeat). I don't want to get into that level of detail.

A less precise solution uses the logic that summing the probabilities for each draw until the sum is 1.0 will yield the number of draws necessary to ensure a repeated song. That means the numerators of the above fractions need to sum to 1,000.

A math trick is that the sum of whole numbers from 0 to n is given by the formula: sum = n(n+1)/2. So setting that formula equal to 1,000 and then solving for n will give a ballpark answer. n(n+1)/2 = 1,000 => n(n+1) = 2,000 => n(squared) + n - 2,000 = 0. This quadratic equation has roots close to -45 and +44 [(n-44) * (n+45) = 0.]

My ballpark answer is 44 draws.

My point is that it is not necessary to run a simulation with 10 million iterations. An analytical solution is attainable but it does require a FAR more than basic understanding of probability. Note that the analytical solution does not require any programming, though it is made EASIER by programming, and certainly complex scenarios could only be solved quickly through writing a program.

In a prior post, I stated setting the probability of no repeats to 0.0. Moschops, you belittled that. What you seemed to not grasp is the probability concept of complementarity. The probability of at least 1 repeated song is equal to 1.0 minus the probability of NO repeated songs. Sometimes it is easier and faster to find the probability of a value being 0.0 than the probability of a value being greater than 0.0.

Moschops is right that this is the same as the Birthday Problem

As you agree that this is the Birthday Problem, why do you dispute Moschops logic? And the simulation was to give experimental proof. This was a programming question originally.
And if you have P(B) which is no repeat at n songs, don't you have the answer?
And isn't the P(repeat on the nth song) independent of from P(no repeats of the nth-1 songs)?

The Wiki page on the Birthday problem presents a full explanation.

Perhaps I should just write out the numbers. This is very basic probability. The point is that the chance of having a repeat somewhere in the songs drawn is equal to 1 minus the chance of having no repeats. That is very, very simple. It's stated above by Nat who then wanders off into far more complicated territory.

P(repeat somewhere in all drawn so far) = 1 - P(no repeats in all drawn so far)

So, starting at the first draw:

P(repeat somewhere in all drawn so far on first draw) = 1 - (1000/1000) = 0

P(repeat somewhere in all drawn so far on second draw) = 1 - (1000/1000)*(999/1000) = 0.001

P(repeat somewhere in all drawn so far on third draw) = 1 - (1000/1000)*(999/1000)*(998/1000) = 0.002998

P(repeat somewhere in all drawn so far on fourth draw) = 1 - (1000/1000)*(999/1000)*(998/1000)*(997/1000) = 0.005989006

...

...

...

P(repeat somewhere in all drawn so far on 38th draw) = 1 - (1000/1000)*(999/1000)*(998/1000)*(997/1000) ... * (963/1000) = 0.509317

On the 38th draw, the probability of having a repeat somewhere in all the songs drawn so far exceeds 0.5

This can be knocked together with a simple while loop in a few minutes. We've pretty much done the OP's homework for him, but what the hell; it's all right there on Wiki's Birthday problem page anyway.

Now that I've gone back to look at my equation, I realise that this is just the "Birthday Paradox" with a year of 1000 days, so to speak :)

Wikipedia has a good explanation and a number of equations that come out with the answer I got earlier, which is nice.

So, the question I have is "average number of songs played before there is a repeat" is the 50% mark?

Yes, I think so. There is a number of songs to play at which the probability of hearing a repeat is 50% (well, just over); the question becomes what is that number?

No you are wrong. These are two completely different questions. The answer to average number of songs is given at the same wiki page, but way down below.

Do you see the difference?

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.