TIL: Benford's law: real-life numbers are not evenly distributed, 1 occurs 30% of the time

ooli@lemmy.world · 1 year ago

funnystuff97@lemmy.world · 1 year ago

Great video on Benford’s Law here. Matt goes into a good amount of detail outlining why this occurs, why it doesn’t always apply, and what it means if data does/doesn’t follow the Law.

PipedLinkBot@feddit.rocks · 1 year ago

Here is an alternative Piped link(s): https://piped.video/etx0k1nLn78

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source, check me out at GitHub.

blanketswithsmallpox@kbin.social · 1 year ago

Neat. Thanks for the share.

blazera@kbin.social · 1 year ago

What the heck is a real-life number?

agamemnonymous@sh.itjust.works · 1 year ago

An actual measured data point, as opposed to a randomly generated number. Also this principle applies specifically to the first digit. Overall the title is a complete mess.

Basically, when you gather a bunch of data points about real-world quantitative phenomena (e.g. town population, lake surface area, etc.), you find a characteristic distribution of leading digits: 1 is the most frequent, at something like 30%, and the frequency gradually decreases down to 9, the least frequent.
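
For reference, the usual statement of Benford's law is that the leading digit d (1 to 9) shows up with probability log10(1 + 1/d). A quick Python sketch that tabulates it (the function name is just for illustration):

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1-9), in base 10."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

This prints about 30.1% for 1, falling to about 4.6% for 9. The nine values add up to exactly 1, because the logs telescope to log10(10).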

This is called Benford’s Law, and it’s basically an emergent property of how orders of magnitude work. It’s useful for detecting fake data: if your data faker doesn’t know about it, they’ll generate numbers that look random but don’t follow this distribution.
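
A minimal sketch of that fake-data check, assuming a simple Pearson chi-square comparison between the observed leading-digit counts and Benford's expected counts. The function names and the two sample data sets are made up for illustration. Real forensic screening uses more than this one test.

```python
import math

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number, e.g. 0.045 -> 4, 1234 -> 1."""
    # Scientific notation puts the first significant digit first: "4.500000e-02"
    return int(f"{abs(x):e}"[0])

def benford_chi_square(values) -> float:
    """Pearson chi-square statistic of leading-digit counts against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        observed = digits.count(d)
        stat += (observed - expected) ** 2 / expected
    return stat

# Roughly 15.5 is the 5% critical value for 8 degrees of freedom (9 digits - 1).
CRITICAL_5PCT = 15.507

real_like = [2 ** k for k in range(1, 1001)]  # powers of 2 are known to follow Benford
fake_like = list(range(100, 1000))            # every leading digit equally common, like naive fake data

print(benford_chi_square(real_like) < CRITICAL_5PCT)  # True: consistent with Benford
print(benford_chi_square(fake_like) > CRITICAL_5PCT)  # True: flagged as suspicious
```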

affiliate@lemmy.world · 1 year ago

something that isn’t an imaginary life number

themoonisacheese@sh.itjust.works · 1 year ago

This is used to catch tax fraud. People who forge receipts tend to use random-looking numbers, which stand out as outliers, and they get caught that way.

Akasazh@feddit.nl · 1 year ago

Title needs some work. I would suggest:

TIL about Benford’s law: in many real-life datasets, the leading digit is ‘1’ about 30% of the time.

Also you could’ve included the wiki link in the post, so people could read up on what you just wrote:

https://en.wikipedia.org/wiki/Benford's_law

JCSpark · 1 year ago

This is a bit weird. I was just listening to Infinity 2 today (great book. Totally recommend), and there’s a section where the characters use Benford’s Law to prove reality. I then had to look it up myself.

Just a super weird coincidence…unless Lemmy is listening to me…

Ddhuud@lemmynsfw.com · edit-2 1 year ago

We are not listening to you Travis.

That had a 1 in a million chance, but I had to try.

Siegfried@lemmy.world · 1 year ago

It was worth the shot if you ask me, Michael

Amaltheamannen@lemmy.ml · 1 year ago

This is called the Baader–Meinhof phenomenon, or frequency illusion.

chickenhuggit@sh.itjust.works · 1 year ago

but… why?

bane_killgrind@lemmy.ml · 1 year ago

https://en.m.wikipedia.org/wiki/Benford's_law

Look at the logarithmic scale. This law is about sets of numbers found in the wild, and apparently the data sets they examined are spread evenly on a logarithmic scale. If you looked at the same numbers on a linear scale, they would get sparser and sparser as they grow in size.
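
Here is a small simulation of that point. It is an idealized sketch, because it assumes data spread exactly evenly on a log scale across whole powers of ten, and real data only approximates that. The code samples numbers that way and counts their leading digits, then does the same for numbers spread evenly on a linear scale:

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is reproducible
N = 100_000

def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:e}"[0])

# Evenly spread on a log scale: the exponent is uniform over six whole decades (1 to 1,000,000).
log_counts = Counter(leading_digit(10 ** random.uniform(0, 6)) for _ in range(N))

# Same range, but evenly spread on a linear scale instead.
lin_counts = Counter(leading_digit(random.uniform(1, 1_000_000)) for _ in range(N))

for d in range(1, 10):
    print(f"{d}: log-scale {log_counts[d] / N:.3f}  "
          f"linear {lin_counts[d] / N:.3f}  "
          f"Benford {math.log10(1 + 1 / d):.3f}")
```

The log-scale column should land close to the Benford column (about 0.30 for 1). The linear column hovers near 1/9, or about 0.11, for every digit.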

Pogbom@lemmy.world · 1 year ago

Cool! Now imagine I’ve got severe brain damage… can you explain that again?

Alien Nathan Edward@lemm.ee · 1 year ago

The further left you are in a number, the more likely it is that the digit will be small

sharpiemarker@feddit.de · 1 year ago

What a great explanation! Thanks for dumbing it down.

blanketswithsmallpox@kbin.social · 1 year ago

Ahhhh yes. That’s what my smooth brain needed. Dank ah.

ShakeThatYam@lemmy.world · 1 year ago

In real-life distributions, you are always going to have situations where the bigger leading digits fill up last, so they are less likely to show up. The best example of this is the population of cities. Among cities between 100k and 999k, you’ll have more cities in the 100k-300k range, because smaller cities are more common.

lunarul@lemmy.world · 1 year ago

Benford’s law is about the leading digit, so it doesn’t matter if the numbers are rounded or not.

bane_killgrind@lemmy.ml · 1 year ago

No problem!

So if you have a small amount of something, you’ll have maybe 2, maybe 3, or 4, or 5, or 6, 7, 8, 9, 10, 11, 12, 13, 14 or so. If you have a medium amount of something, the numbers might be 20, or 30 ish, or 40 ish, 50s, 60s, 70s, 80s, 90s, 100ish, 110ish, 120 or so, around 130. Larger amounts of stuff end up being 200ish, 300, 400, 500, 600, 700, 800, 900, 1000ish, around 1100, 1200 something, 1300

All the numbers I’ve mentioned are about evenly spaced on this logarithmic scale. You can see that a bunch of them start with 1 just because of how big we think they are! It turns out there is a math reason for this, instead of just being about the weird way humans think.

spaduf@lemmy.blahaj.zone · 1 year ago

Does anybody know if this is a feature of a decimal system?

Bolt@lemmy.world · 1 year ago

I think it’s a feature of all positional notation systems.

lunarul@lemmy.world · 1 year ago

The distribution shown in this post is for base 10, but Benford’s Law covers other bases too. The wiki article linked in another comment goes into detail on that.

Mouselemming@sh.itjust.works · 1 year ago

If you were in base 12 or something, it would still lean towards 1, but the percentage would be a little different.
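
To put numbers on that, the standard generalization is P(d) = log_b(1 + 1/d) for leading digit d in base b. A short Python sketch (the function name is made up):

```python
import math

def benford_in_base(base: int) -> dict[int, float]:
    """Benford's law in any base: P(d) = log_base(1 + 1/d) for d = 1 .. base-1."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

for base in (2, 10, 12):
    p1 = benford_in_base(base)[1]
    print(f"base {base}: leading 1 about {p1:.1%}")
```

This gives about 27.9% for a leading 1 in base 12, compared with 30.1% in base 10. In base 2 it gives 100%, because 1 is the only possible nonzero leading digit.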

davidgro@lemmy.world · 1 year ago

The percentages change. At the lower end, in binary every number that isn’t 0 itself starts with a 1.

This fact is actually used to save one bit in the format that computers usually use to store floating point (fractional instead of integer) numbers.
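
That format is IEEE 754. A 64-bit double stores 52 fraction bits, and for normal numbers an implicit leading 1 sits in front of them. Here is a sketch using Python's struct module that pulls a double apart and rebuilds it by adding the hidden 1 back. The helper names are hypothetical, and the sketch only handles normal numbers: zero, subnormals, infinities and NaN are encoded differently.

```python
import struct

def decode_double(x: float) -> tuple[int, int, int]:
    """Split a normal IEEE 754 double into (sign, biased exponent field, stored fraction bits)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF    # 11-bit exponent, biased by 1023
    fraction = bits & ((1 << 52) - 1)  # 52 stored bits; the leading 1 is NOT stored
    return sign, exponent, fraction

def rebuild(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble a normal double, adding back the implicit leading 1."""
    significand = 1 + fraction / 2 ** 52  # binary 1.fraction
    return (-1) ** sign * significand * 2.0 ** (exponent - 1023)

x = 6.25                      # 110.01 in binary = 1.1001 x 2^2
s, e, f = decode_double(x)
print(s, e - 1023, bin(f))    # stored fraction begins 1001..., the leading 1 is implicit
print(rebuild(s, e, f) == x)  # True
```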

J12@lemmy.world · 1 year ago

So if I rolled a 10-sided die 1000 times, 30% of those rolls would be a 1?

bane_killgrind@lemmy.ml · 1 year ago

J12@lemmy.world · 1 year ago

Thanks. Now I understand

ssboomman@lemm.ee · 1 year ago

From what I understand it works like this.

Let’s say you have a series of numbers that represent real-life data. In general, the first digit of these numbers will be a 1 about 30% of the time.

J12@lemmy.world · 1 year ago

Thanks, that makes sense. I must be missing a link or article on my client otherwise I would’ve read it lol

halvo317@sh.itjust.works · 1 year ago

Such as “1000 rolls”

idiomaddict@feddit.de · 1 year ago

It applies to situations with more than one order of magnitude being counted, such as d20 rolls, 55% of which will start with a 1.

halvo317@sh.itjust.works · 1 year ago

The “1” of the “1000” is the real life number. He didn’t pick “785” or “462”.

NotSpez@lemm.ee · 1 year ago

I don’t know why you’re being downvoted.

GenderNeutralBro@lemmy.sdf.org · edit-2 1 year ago

It works on things that operate on a logarithmic scale. It’s odd how many real-world things fit that mold even though they don’t intuitively seem like they would.

Another factor promoting it in real-world data sets is that they often have restricted ranges that favor lower numbers. Days of the month, for example, only go from 1 to 31. There’s only one way for the leading digit to be 4, but there are eleven ways for the leading digit to be 1.
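
The days-of-the-month count is easy to verify by brute force. A small Python sketch (the helper name is illustrative):

```python
from collections import Counter

def leading_digit_counts(values) -> Counter:
    """Count how often each first digit appears among positive integers."""
    return Counter(int(str(v)[0]) for v in values)

days = leading_digit_counts(range(1, 32))  # days of the month, 1 to 31
print(days[1], days[2], days[3], days[4])  # 11 11 3 1
```

Leading 1 covers 1 and 10-19, leading 2 covers 2 and 20-29, leading 3 covers only 3, 30 and 31, and every digit from 4 to 9 gets a single day.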

Another type of data includes values of varying ranges, which also favors lower leading numbers. Street numbers start at 1 and go up, ending at some point within a fairly large range in the real world. All of these ranges will have their fair share of leading 1s. They will NOT all have a fair share of leading 2s (what if it ended before 20?), and as you go up it gets progressively less likely. So if you took all street addresses, you’d expect to see more leading 1s than 9s.
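
A rough Python sketch of the street-number argument, using a made-up model rather than real address data: assume one street for every length from 1 to 999 houses, with each street numbered from 1 up to its length. The code then counts the leading digits of every house number across all of those streets.

```python
from collections import Counter

# Hypothetical model: one street per length from 1 to 999 houses,
# each street numbered 1, 2, ..., up to its length.
counts = Counter()
for street_length in range(1, 1000):
    for house in range(1, street_length + 1):
        counts[int(str(house)[0])] += 1

total = sum(counts.values())
for d in range(1, 10):
    print(f"{d}: {counts[d] / total:.1%}")
```

In this toy model about 19% of house numbers start with 1 and about 3% start with 9, and each digit is rarer than the one before. The exact shares depend on the assumed mix of street lengths, so this shows the direction of the effect rather than Benford's exact percentages.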

Your theoretical dice roll is not such a case. You would expect a uniform distribution of leading numbers. This would hold true with a 99-sided die as well.

GigglyBobble@kbin.social · 1 year ago

While that’s true, with a 10-sided die numbered 1 to 10, 20% of your rolls will start with a one (1 and 10), while every other digit only has a 10% chance.
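
The dice cases above can be checked exactly with a Python sketch. It assumes each die is fair and numbered 1 to N. A physical d10 is often numbered 0-9, which would change the answer.

```python
from collections import Counter
from fractions import Fraction

def die_leading_digit_shares(sides: int) -> dict[int, Fraction]:
    """Exact share of each leading digit for a fair die numbered 1..sides."""
    counts = Counter(int(str(face)[0]) for face in range(1, sides + 1))
    return {d: Fraction(counts[d], sides) for d in range(1, 10)}

print(die_leading_digit_shares(10)[1])             # 1/5   (20%)
print(die_leading_digit_shares(20)[1])             # 11/20 (55%)
print(set(die_leading_digit_shares(99).values()))  # {Fraction(1, 9)}: uniform
```

So a d10 gives 20% and a d20 gives 55% leading 1s, but a 99-sided die gives each leading digit exactly 11 faces out of 99, which is perfectly uniform.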

GenderNeutralBro@lemmy.sdf.org · 1 year ago

Oh, yes. Thanks for the correction!

ooli@lemmy.world · 1 year ago

No, it is a property of real-life things. It comes from the fact that most things in the real world don’t go over 30 or 300 very often, like the number of houses on a street.

monk@lemmy.unboiled.info · 1 year ago

Digits, not numbers.