TradingView
Steversteves
Sep 16, 2023 22:20

Cumulative Distribution of a Dataset [SS] 

NVIDIA Corporation (NASDAQ)

Description

This is the Cumulative Distribution of a Dataset indicator. It also calculates the kurtosis and skewness of a selected dataset and uses them to assess normality and distribution type.

What does it do, in practical terms?

In the simplest terms, it calculates the cumulative distribution function (CDF) of a user-defined dataset.

The cumulative distribution function (CDF) is a concept from statistics and probability: for any value x, it gives the probability that a random variable takes a value of x or less. In simpler terms, you can conceptualize the CDF as this:

Imagine you have a list of data, such as test scores of students in a class. The CDF helps you answer questions like, "What's the probability that a randomly chosen student scored 80 or less on the test?"

Or in our case, say we are in a strong uptrend or downtrend on a stock. The CDF can help us answer questions like, "Based on this current trend, what is the probability that the ticker will end up above price X or below price Y?"
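As a rough sketch of the idea (not the indicator's actual Pine Script source), an empirical CDF can be computed by counting how many observations sit at or below a level. The `closes` list, the `level` value and the function name `empirical_cdf` are all made up for illustration, and the indicator may well use a different estimator.

```python
def empirical_cdf(data, x):
    """Fraction of observations in `data` that are less than or equal to x."""
    if not data:
        raise ValueError("data must not be empty")
    return sum(1 for v in data if v <= x) / len(data)


# Hypothetical closing prices for the selected window (made-up numbers).
closes = [410.0, 422.5, 431.0, 445.0, 452.0, 460.5, 455.0, 448.0, 470.0, 481.5]

level = 450.0  # hypothetical price of interest
p_at_or_below = empirical_cdf(closes, level)
p_above = 1.0 - p_at_or_below  # complement: share of observations above the level

print(f"P(close <= {level}) = {p_at_or_below:.2f}")
print(f"P(close >  {level}) = {p_above:.2f}")
```

Note that this only describes how the values inside the chosen window were distributed; it says nothing about when a level might be reached.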

Within the indicator, you can manually assess a price of interest. Let's say, for NVDA, we want to know the probability NVDA goes above or below $450. We can enter $450 into the indicator and get this result:



Other functions:

  • Kurtosis and Skewness Functions:


In addition to calculating and plotting the CDF, we can also plot the kurtosis and skewness.



This can help you look for outlier periods where the distribution of your dataset changed. It can potentially alert you to when a stock is behaving abnormally and when it is more stable and evenly distributed.
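For readers who want to see what these two measures involve, here is a minimal sketch using the population (biased) moment formulas. That choice is an assumption: the indicator may use a sample-corrected variant, and the function names here are hypothetical.

```python
def _central_moment(data, k):
    """k-th central moment of the data, dividing by n (population form)."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** k for v in data) / n


def skewness(data):
    """Third standardized moment: 0 for symmetric data, > 0 for a long right tail."""
    m2 = _central_moment(data, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return _central_moment(data, 3) / m2 ** 1.5


def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution scores about 0."""
    m2 = _central_moment(data, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return _central_moment(data, 4) / m2 ** 2 - 3.0


# Made-up example: one large value drags the right tail out.
sample = [1.0, 1.0, 1.0, 1.0, 10.0]
print(skewness(sample), excess_kurtosis(sample))
```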

  • Tests of normality


The indicator will use the kurtosis and skewness to determine the normality of the dataset. The indicator is programmed to recognize up to 7 different distribution types and alert you to them and the implications they have in your overall assessment.
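The exact rules the indicator uses to classify distributions are not spelled out here, so the sketch below swaps in a standard, well-known alternative: the Jarque-Bera statistic, which also combines skewness and kurtosis into a single normality check. It is an illustration of the general approach, not the indicator's method, and the p-value relies on a chi-square approximation that is unreliable for small samples.

```python
import math


def jarque_bera(data):
    """Return (JB statistic, approximate p-value) for a normality check.

    JB = n/6 * (S^2 + K^2/4), with S the skewness and K the excess kurtosis.
    Under normality JB is approximately chi-square with 2 degrees of freedom,
    whose survival function is exp(-x/2).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    if m2 == 0:
        raise ValueError("statistic is undefined for constant data")
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Made-up data: fifty flat values and one extreme spike.
spiky = [0.0] * 50 + [100.0]
jb_stat, p = jarque_bera(spiky)
print(f"JB = {jb_stat:.1f}, p = {p:.3g}")  # a small p suggests non-normal data
```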

e.g. #1: AMC during the short squeeze:



e.g. #2: BA during the COVID crash:



  • Plotting the standardized Z-Score of the Distribution Dataset


You can also standardize the dataset by converting it into Z-Score format:
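As a rough illustration of what that standardization involves, each value is re-expressed as its distance from the dataset mean in units of standard deviation. This sketch assumes the population standard deviation; the indicator may use the sample version, and the numbers are made up.

```python
import math


def z_scores(data):
    """Convert each value to (value - mean) / population standard deviation."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / n)
    if std == 0:
        raise ValueError("z-scores are undefined for constant data")
    return [(v - mean) / std for v in data]


# Made-up values with mean 5 and population standard deviation 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_scores(values)
print(z)
```

A z-score of 2.0 means the value sits two standard deviations above the dataset mean, which makes readings comparable across tickers with very different price levels.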



  • Plot the raw CDF results




Two values are plotted, the green and the red. The green represents the probability of a ticker going higher than the current value. The red represents the probability of a ticker going lower than the current value.
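One way to picture the two plotted series, assuming they are complementary empirical probabilities (an assumption; the indicator's exact calculation is not shown here): for each value in the dataset, compute the share of the dataset at or below it, and one minus that share. The function name `cdf_series` is hypothetical.

```python
from bisect import bisect_right


def cdf_series(data):
    """Return (lower, higher) probability lists, one entry per value in `data`.

    lower[i]  = share of the dataset at or below data[i]   (the "red" line)
    higher[i] = share of the dataset above data[i]          (the "green" line)
    """
    ordered = sorted(data)
    n = len(ordered)
    # bisect_right counts how many sorted values are <= v without a full scan.
    lower = [bisect_right(ordered, v) / n for v in data]
    higher = [1.0 - p for p in lower]
    return lower, higher


# Made-up price path.
prices = [3.0, 1.0, 2.0, 4.0]
red, green = cdf_series(prices)
print(red)
print(green)
```

Sorting once and using binary search keeps the work at O(n log n) rather than rescanning the whole dataset for every value.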

Limitations

There are some limitations of the indicator which I think are important to point out. They are:

  • The indicator cannot tell you timelines; it can only tell you the general probability that data within the dataset will fall above or below a certain value.


The indicator cannot take into account projected periods of consolidation. A ticker can remain in a consolidation phase for a very long time. This would have the effect of stabilizing the probability in one direction (if there was a lot of downside room, it can normalize the data so that the extent of the downside probability is mitigated). Thus, it's important to use judgement and other methods to assess the likelihood that a stock will pull back or continue up, based on the overall probability.

  • The indicator is only looking at an individual dataset.


Using this indicator, you have to omit a large amount of data and look solely at a confined dataset. In a way, this actually improves the accuracy, but it can also be misleading, depending on the size and strength of the dataset chosen. It is important to balance your choice of dataset time period with such things as:


a) The strength of the uptrend or downtrend.
b) The length of the uptrend or downtrend.
c) The overall performance of the stock leading into the dataset time period.


And that is the indicator in a nutshell.

Hopefully you find it helpful and interesting. Feel free to leave questions, comments and suggestions below.

Safe trades everyone and take care!

Release notes

Had to update the table as there was a problem on certain distribution assessments with the two data tables overlapping. Final product:

Release notes

Quick little re-fix
Comments
Degen-Dynasty
excellent work friend and simple concepts that very few don't understand or even consider when at the end of the day are pretty much all that matter. What is likely or probable.... based on how things actually function in real time.
Adam-Szafranski
Hey my friend, FIRSTLY...you are absolutely AMAZING. Like seriously. Thank you for everything that you do. Just wanted to share this...for loops are the compiler KILLER. Since the runtime environment steps forward, you should just use variables declared with var (which keep their values as the bars progress), and when the time is between the times set by the user, start saving all the values that are needed into those var variables. That way you're not having to re-gather all the previous bars' price data on every updating real-time bar. Unless there's something that I'm not seeing in the code that doesn't permit you to go that route. It will also get rid of your max_bars_back issue. If you want to keep the loop, you should still change the loop's value from 5000 to 4997. That'll also fix the issue. And again, you are a wonderful Pine Script artist...I mean just WONDERFUL!!! Keep up the amazing work.
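(Editor's illustration of the suggestion above, written in Python rather than Pine Script.) The idea of carrying state forward bar by bar instead of re-looping over history can be sketched with Welford's running mean and variance update. The class name, the `(timestamp, price)` bar format and the window bounds are all hypothetical, and this is not the indicator's code.

```python
class RunningStats:
    """Accumulates mean and variance one observation at a time (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far."""
        return self._m2 / self.n if self.n else float("nan")


# Hypothetical bars as (timestamp, close) pairs, plus a user-chosen window.
bars = [(1, 9.0), (2, 2.0), (3, 4.0), (4, 4.0), (5, 4.0),
        (6, 5.0), (7, 5.0), (8, 7.0), (9, 9.0), (10, 1.0)]
window_start, window_end = 2, 9

stats = RunningStats()
for timestamp, close in bars:          # each bar is visited exactly once
    if window_start <= timestamp <= window_end:
        stats.update(close)            # state persists, like a `var` variable

print(stats.n, stats.mean, stats.variance)
```

Each new bar costs a constant amount of work, so nothing has to be re-gathered from earlier bars when the latest bar updates.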
ramtk2000
SteversSteve - One quick question: In the 'Cumulative Distribution Density' option, is the plotting of the 'Red' and 'Green' values for the price we have entered as input for 'Probability calculation', or for the current price of the asset? Thanks.
Steversteves
@ramtk2000, If you are selecting the "Plot the CDD" function then the plots are just the cumulative distribution density based on the price of the asset.
If you select calculate the probability, it will write the probability that the value falls above or below the dataset in the table itself, it doesn't plot anything, it just gives you the probability within the current distribution written on the table.
ramtk2000
@Steversteves, Great - Thanks Steve, What is your recommendation on time-frame to Set for Data-Set in a 5 Min chart? Or what logic we can use to set Data-Set time-frame to calculate the 'Probability'? Thanks again.
Steversteves
@ramtk2000, As a general rule, I always use TradingView's regression tool to identify the strongest, most current trend and set that as my data area :-).
jppiechocki
Steversteves- Is it possible to create a version of this indicator using the prior day's high (or low) as a starting point and the daily close as the ending point? Thanks for considering!
MWCLLC
Hello, I'm trying to use this indicator as it looks to be very helpful but am unable to get it to work. I get an ! Stating there are too many bars for the time frame. I've adjusted the values several times but can't seem to get it to work. I'm using TV mobile app
Steversteves
@MWCLLC, The error you are encountering is likely because you are trying to look at a period beyond what Pine Script will support. The max bars back it can look at is 5,000, but even that tends to throw errors sometimes, depending on the functions you are asking it to run.
Consider shortening your lookback timeframe, or increasing the timeframe you are looking at (if you are looking at 1 or 5 minute, consider 1 hour to 4 hour timeframes).
Let me know if this helps.
MWCLLC
@MWCLLC, @Steversteves, I've now tried every time frame, as well as changed the dates to very short and long. For some reason it keeps saying the study references too many candles in history, even if I use the daily chart and only a few days. I'm expecting this is an issue with using the mobile app