Hi, I have a question here:

Imagine we want to write a function that checks whether a given binary tree is a valid binary search tree (the tree is in the form of objects/embedded references). What is the best time complexity possible for such a function?

I'm new to this Big-Oh stuff; I'm not sure how to apply Big-Oh to algorithms and programs. If there are n nodes, to check if a binary tree is a BST, you will have to do n comparisons. So, if you put 4 more nodes into the tree, you will have to do 4 more comparisons. So would it have to be O(n)?
Thanks

Well, this is essentially a traversal problem. You need to verify that
the left child of the parent is less than the parent, and the right child
has to be greater than the parent. So essentially you will have to
traverse the whole binary tree. What's the usual traversal algorithm
that you are used to? What's its complexity?

thanks a lot for replying to my thread, I appreciate it.

I went ahead and coded this:

(in Python)

def is_bst(root):
    
    a = b = True
    
    if not(root.left or root.right):
        return True
    
    if root.left:
        a = root.left.data < root.data and is_bst(root.left)
        
    if root.right:
        b = root.right.data > root.data and is_bst(root.right)
        
    return a and b
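For reference, assume a node class along these lines (I haven't posted my actual class, but it's roughly this shape):

class Node:
    # Minimal node shape that is_bst expects: a data value plus
    # optional left/right child references.
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right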

This might not even be the "best possible time complexity" as the question states... But anyway, I have some issues with these "complexity measures". I'm currently in an introductory computer science/programming course, and we haven't really been through this rigorously at all. What are we even trying to measure? Sometimes I hear the number of "steps" a program has to execute, and sometimes I hear the amount of time it takes. But if we're to measure the time it takes, that would depend on a lot of things, wouldn't it? I don't know how long it takes for my program to call a function, or to access an object through a reference, etc.

Since you asked, here is another traversal algorithm, for printing in-order:

def print_bst(root):
    # In-order traversal: left subtree, this node, then right subtree.
    if root.left:
        print_bst(root.left)

    print(root.data)

    if root.right:
        print_bst(root.right)

Thanks a lot. Also, both of my functions assume that root is not None.

>> What are we even trying to measure?

You are trying to measure big O. Big O measures the growth rate of an algorithm. For example, say there is a function f(n), and that it follows the following inequality:

O( log(n) ) < f(n) < O(n)

The above inequality says that the function f(n) is bounded on top by
O(n); that means its worst performance is O(n). That O(n) is big O of n.

It approximates how fast or slow the performance of an algorithm is.

To start your algorithm, depth-first search is a well-known one.
It's not easy to evaluate its performance from scratch; that's why you can
cite what's already been proven. Take a look here and
read about its performance.

thanks, I understand big-O (I've seen it used in math), but I don't understand the actual analysis of "growth rate". What is the growth rate with respect to? For example, in my is_bst function, will my growth rate be with respect to the extra number of comparisons it would have to make as I increase the size of my tree? the number of recursive calls? thanks

>> What is the growth rate with respect to?

It's with respect to the number n, which is the size of the function.
As n gets bigger, say for an O(n^2) algorithm, it will take more time
to compute and finish.

>> You are trying to measure big O. Big O measures the growth rate of an algorithm. For example, say there is a function f(n), and that it follows the following inequality:
>>
>> O( log(n) ) < f(n) < O(n)
>>
>> The above inequality says that the function f(n) is bounded on top by
>> O(n); that means its worst performance is O(n). That O(n) is big O of n.

This is complete nonsense. You can't say "O(log(n)) < f(n)". That doesn't make any sense. It only makes sense to say that functions are less than or equal to big O bloboids, such as the statement, "f(n) <= O(log(n))".

def is_bst(root):
    
    a = b = True
    
    if not(root.left or root.right):
        return True
    
    if root.left:
        a = root.left.data < root.data and is_bst(root.left)
        
    if root.right:
        b = root.right.data > root.data and is_bst(root.right)
        
    return a and b

This algorithm is incorrect, by the way.
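For example, here is a tree it accepts even though it is not a BST (a sketch, assuming a Node class shaped like the one posted above):

# 6 sits in the left subtree of 5 but is greater than 5, so this is
# not a valid BST; is_bst only compares each child to its direct
# parent, so it never notices.
#
#       5
#      /
#     3
#      \
#       6
root = Node(5, left=Node(3, right=Node(6)))
print(is_bst(root))  # prints True -- wrong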

>> It's with respect to the number n, which is the size of the function.

Wat. No it's not.

>> thanks, I understand big-O (I've seen it used in math), but I don't understand the actual analysis of "growth rate". What is the growth rate with respect to?

The running time of the function is described with respect to some measurement of the "size" of the input. Often the measurement is the number of elements in the data structure, or the amount of memory used, or something with multiple variables. For example, a function that counts the number of spaces in a string would have a running time proportional to the length of the string. We'd say it's "O(n)", where it's implicit that n refers to the number of characters in the string.

What it really means, though, is that if we consider, for each value n, all the strings with length n, and then pick the string of that length for which our spacecount function has the worst performance, we could make a function out of that: T(n) = the maximum running time of spacecount(s) over all strings s for which length(s) = n.

And then T(n) is O(n) in the usual sense.
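Concretely, such a spacecount might look like this (a sketch; the name and behaviour are just the ones described above):

def spacecount(s):
    # One pass over the string, constant work per character, so the
    # running time is proportional to len(s): T(n) is O(n).
    count = 0
    for ch in s:
        if ch == ' ':
            count += 1
    return count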

And we use the length of the string as the measurement because that's one we can use to describe the running time. We couldn't use the number of spaces or the number of z's in the string, because we can't use that to get an upper bound on the running time. (You could use those measurements to get silly lower bounds on the running time, though, since the running time is proportional to the length of the string, and those values are less than the length of the string.) Often other measurements make sense to use. When you have two inputs s and t, you might use min{length(s), length(t)} or length(s)+length(t) or something.

Sometimes we use different measurements. For example, graphs often have two variables, V and E, the number of vertices and edges. Sometimes multivariable big O notation is used in that case, and sometimes the reasoning is done in terms of the size V+E or some other number.

The thing to note is that big O notation is always written with respect to some implicit variable. O(n^2) can be more explicitly written O_{n -> infinity}(n^2). That's just implicit, though. (You might be aware that other fields of math use big O notation in situations where the independent variable tends to zero, and then you'd write O_{h -> 0}(h^2) or something.)

When you see things like O(log(min{size(s), size(t)})), it gets a bit weird: that would be more explicitly written as:

O_{min{size(s),size(t)} -> infinity}(log(min{size(s), size(t)})).

You could also interpret that as using multivariable big O notation, as size(s) -> infinity and size(t) -> infinity. Which would you choose? It doesn't matter; the meaning would be the same.

>> This is complete nonsense. You can't say "O(log(n)) < f(n)".

Why not? Why can't a function be bounded by another function? For
example:

x < 2x < 3x

That's certainly true, isn't it? After all, that's why Big Omega was created.

>> Wat. No it's not.

Can you explain the difference between what I said and what
you said here: "measurement of the 'size' of the input. Often the measurement is the number of elements in the data structure, or the amount of memory used, or something with multiple variables."?

I just started learning about big O and its family of notations, so I'm new to this
as well. So forgive me if I said anything incorrect.

>> This is complete nonsense. You can't say "O(log(n)) < f(n)".

Why not? Why can't a function be bounded by another function?

Functions can be bounded by other functions, but there's no way to make sense of the notation you're using.

For starters, you never really defined what O(...) < f(n) means. You're just throwing notation around. It would be fine if the meaning of the notation could be inferred, but here it can't be. Does, in your example, the statement "O(log(n)) < f(n)" mean that f(n) is "not" O(log(n))? Or does it mean that f(n) is omega(log(n))? That's lowercase omega. Or does it mean that f(n) is not O(log(n)) and is Omega(log(n))? There's no natural way to interpret that notation. There is with "f(n) <= O(n)", though. It clearly means that f(n) is O(n), and can also be written "f(n) = O(n)". There's no other meaning it could have.

If you want to say that f(n) grows no faster than n and no slower than log(n), one way would be to say, "f(n) is O(n) and Omega(log(n))." Another way _might_ be to say "Omega(log(n)) <= f(n) <= O(n)". You could also say "Theta(log(n)) <= f(n) <= Theta(n)", and that would be understandable, because there's no way to misinterpret that statement: to say something is "less than or equal to" a Theta-notation expression can only carry the meaning of O notation. People usually don't say things like that, because it's gross, and because the reader would at first think they're hand-waving, so it's better to say "f(n) is O(n) and Omega(log(n))."
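For reference, the standard definitions, stated in the same plain notation as the rest of this thread (c and n0 are just constants introduced for the statement):

f(n) is O(g(n)) if there exist c > 0 and n0 such that f(n) <= c*g(n) for all n >= n0.
f(n) is Omega(g(n)) if there exist c > 0 and n0 such that f(n) >= c*g(n) for all n >= n0.
f(n) is Theta(g(n)) if both of the above hold.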

>> Can you explain the difference between what I said and what
>> you said here: "measurement of the 'size' of the input."

You said "the size of the function."

OK, I see. Thanks. Sometimes I know what I'm trying to say, but my
wording might be confusing or easily misinterpreted by others. Thanks for the correction.

>> This algorithm is incorrect, by the way.

Just to clarify for the original poster: one must check that, for each node, all the nodes in its left subtree are less than it, and all the nodes in its right subtree are greater than it. Remember that one only needs to check the tightest constraint imposed on each node, so an O(n) algorithm is still possible. Try it yourself first.

Thanks, I had no idea that this thread was still going...

And thanks, you're right, my function was incorrect. I guess I could just do an in-order traversal and check that the resulting list is sorted, something like the sketch below (would this be a worse algorithm, though?). And thanks for the algorithm analysis explanations; they will have to sink in with some practice.
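A sketch of that idea (is_bst_inorder and the helper are just names I made up; like my other functions, it assumes root is not None):

def inorder(root, out):
    # Append node values in in-order (left, node, right) order.
    if root.left:
        inorder(root.left, out)
    out.append(root.data)
    if root.right:
        inorder(root.right, out)

def is_bst_inorder(root):
    # A binary tree is a BST exactly when its in-order sequence is
    # strictly increasing (duplicates not allowed here, to match the
    # strict < and > checks above).
    values = []
    inorder(root, values)
    return all(values[i] < values[i + 1] for i in range(len(values) - 1))

Still one visit per node, so it's O(n) time, though it builds an O(n) list on the side.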

No, it is a correct algorithm, with the "best" time complexity (if implemented correctly). I was just thinking about altering your original algorithm to include the bounds on the node as a parameter, but the in-order traversal seems easier to implement. Roughly what I had in mind is sketched below.
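A sketch (treating None bounds as "unbounded" is just my own convention, and is_bst_bounded is a made-up name):

def is_bst_bounded(root, lo=None, hi=None):
    # Every node must lie strictly between the tightest bounds
    # inherited from its ancestors.
    if root is None:
        return True
    if lo is not None and root.data <= lo:
        return False
    if hi is not None and root.data >= hi:
        return False
    return (is_bst_bounded(root.left, lo, root.data)
            and is_bst_bounded(root.right, root.data, hi))

Each node is visited once with constant work, so this is O(n) as well, using only the recursion stack for extra space.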
