Here's an example: Say you want to know whether traffic on your street differs by day of the week. So, as an experiment, you count the cars that come down your street on Monday, Tuesday, and Wednesday for 6 weeks.
Here are your counts for weeks 1-6:
Monday: 66, 56, 72, 70, 71, 59
Tuesday: 54, 44, 56, 49, 64, 39
Wednesday: 45, 44, 43, 40, 50, 56
And here are your averages:
Monday: 65.7 cars
Tuesday: 51 cars
Wednesday: 46.3 cars
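If you want to check the averages yourself, here is a minimal Python sketch. The variable names (`counts`, `averages`) are just mine for illustration; the numbers are the counts listed above.

```python
from statistics import mean

# Car counts for weeks 1-6, copied from the lists above.
counts = {
    "Monday": [66, 56, 72, 70, 71, 59],
    "Tuesday": [54, 44, 56, 49, 64, 39],
    "Wednesday": [45, 44, 43, 40, 50, 56],
}

# Average per day, rounded to one decimal place.
averages = {day: round(mean(values), 1) for day, values in counts.items()}

for day, avg in averages.items():
    print(f"{day}: {avg} cars")
```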
What we are really asking here is whether the average number of cars that goes by on each day is different from the others. In short, are the averages different? Well, the more weeks we count, the more sure we can be of our averages. That is, if we counted for 2 weeks, our estimate wouldn't be as good as if we counted for 8 weeks.
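One rough way to put a number on "how sure" is the standard error of the average: the spread of the counts divided by the square root of the number of weeks. That measure is my addition, not something from the experiment itself. The sketch below compares the first 2 Monday counts with all 6, since we only have 6 weeks of data; `standard_error` is a helper name I made up.

```python
import math
from statistics import stdev

# Monday counts for weeks 1-6, from the lists above.
monday = [66, 56, 72, 70, 71, 59]

def standard_error(sample):
    """Sample standard deviation divided by sqrt(sample size)."""
    return stdev(sample) / math.sqrt(len(sample))

se_2_weeks = standard_error(monday[:2])  # only weeks 1 and 2
se_6_weeks = standard_error(monday)      # all six weeks

print(f"Standard error after 2 weeks: {se_2_weeks:.2f} cars")
print(f"Standard error after 6 weeks: {se_6_weeks:.2f} cars")
```

With these counts the 6-week figure comes out smaller. The spread itself is only an estimate, so a particular short run of weeks can look more precise than it really is.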
With these data from 6 weeks of counting, we can say that there is a 99.1% chance that the average number of cars on Tuesday is less than on Monday (I did a stats test). In turn, there is a 0.9% chance that our experiment is wrong. If we are wrong, then if you kept counting for more weeks, you would find either that there is no difference between Monday and Tuesday traffic, or that Tuesday actually has more traffic than Monday on average.
Now, comparing Monday and Wednesday, we can be even more sure (a 99.97% chance) that Wednesday's traffic is less than Monday's. But comparing Tuesday and Wednesday, we can only be 69.4% sure that Wednesday's traffic is less than Tuesday's. That's not much better than chance.
So we can be pretty sure that Monday's and Tuesday's traffic are different, and that Monday's and Wednesday's traffic are different, but we can't be so sure that Tuesday's and Wednesday's traffic are different.
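The stats test behind these percentages isn't named here. One plausible guess, and it is only a guess, is a two-sample Student's t-test with equal variances, reporting 1 minus the two-sided p-value as the "chance"; by my reckoning that lands close to the figures quoted. The sketch below uses only the Python standard library. The helper names (`t_pdf`, `two_sided_p`, `t_test`) are mine, and the p-value comes from a simple numerical integration rather than a stats package.

```python
import math
from statistics import mean, variance

monday = [66, 56, 72, 70, 71, 59]
tuesday = [54, 44, 56, 49, 64, 39]
wednesday = [45, 44, 43, 40, 50, 56]

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-t <= T <= t), by symmetry
    return 1 - central

def t_test(a, b):
    """Assumed test: equal-variance two-sample t-test. Returns (t, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / se
    return t, two_sided_p(t, df)

pairs = {
    "Monday vs Tuesday": (monday, tuesday),
    "Monday vs Wednesday": (monday, wednesday),
    "Tuesday vs Wednesday": (tuesday, wednesday),
}
results = {name: t_test(a, b) for name, (a, b) in pairs.items()}

for name, (t, p) in results.items():
    print(f"{name}: t = {t:.2f}, p = {p:.4f}, 'chance' = {100 * (1 - p):.2f}%")
```

If you have SciPy installed, `scipy.stats.ttest_ind` performs the same kind of test and is the more usual choice.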
When setting up an experiment, we might say that if our experiment predicts a difference with a 95% chance of being right, we'll accept that as a real difference. So we will call any result with a chance of 95% or more "statistically significant".
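Applying that 95% rule to the three comparisons above can be sketched like this. The percentages are the ones quoted earlier; the names `THRESHOLD`, `confidence`, and `significant` are just for illustration.

```python
# The cutoff we agreed on before looking at the results.
THRESHOLD = 95.0

# "Chance of being right" for each comparison, as quoted above.
confidence = {
    "Monday vs Tuesday": 99.1,
    "Monday vs Wednesday": 99.97,
    "Tuesday vs Wednesday": 69.4,
}

# A comparison counts as statistically significant at 95% or more.
significant = {pair: pct >= THRESHOLD for pair, pct in confidence.items()}

for pair, is_sig in significant.items():
    label = "statistically significant" if is_sig else "not significant"
    print(f"{pair}: {confidence[pair]}% -> {label}")
```

So Monday vs Tuesday and Monday vs Wednesday pass the cutoff, and Tuesday vs Wednesday does not.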