Why does this make sense in 3.1?

>>> a = b'\x01\x02'
>>> a[0]
1
>>> a[0:1]
b'\x01'
>>> a[0] == a[0:1]
False
>>> a = '\x01\x02'
>>> a[0]
'\x01'
>>> a[0:1]
'\x01'
>>> a[0] == a[0:1]
True

Shouldn't we get True for both comparisons?

No. b'\x01\x02' is a byte string in Python. The expression bs[x] means that you want the byte at position x in the byte string bs, while bs[a:b] means that you want the part of the byte string from a up to (but not including) b. Note the difference: indexing gives a byte (as a long), and slicing gives a byte string. bs[0] != bs[0:1] because they are different types.
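A quick way to see this in a Python 3 session (a minimal sketch; the name `bs` just follows the text above): indexing returns an int, slicing returns bytes, and you can convert in either direction when you want to compare like with like.

```python
bs = b'\x01\x02'

first_item = bs[0]      # indexing: a single byte, returned as an int
first_slice = bs[0:1]   # slicing: a bytes object of length 1

print(type(first_item), type(first_slice))  # <class 'int'> <class 'bytes'>

# Compare like with like: either unwrap the slice or wrap the int.
print(first_slice[0] == first_item)         # True
print(bytes([first_item]) == first_slice)   # True
print(first_item == first_slice)            # False: int vs bytes
```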

Is this in the documentation somewhere?

>>> a = b'\x01\x02'
>>> type(a[0])
<class 'int'>
>>> type(a[0:1])
<class 'bytes'>
>>> a = '\x01\x02'
>>> type(a[0])
<class 'str'>
>>> type(a[0:1])
<class 'str'>

Note that in the byte string version, a[0] and a[0:1] return different types, while in the regular string version, both return the same type. Why does it make sense to treat the two cases differently?

I would have expected Python to be more consistent.

Here's a similar example:

>>> s = "hello"
>>> type(s)
<class 'str'>
>>> b = s.encode()
>>> type(b)
<class 'bytes'>
>>> s[0]
'h'
>>> b[0]
104
>>> s[0:1]
'h'
>>> b[0:1]
b'h'

I happen to think this behavior is consistent (with indexing and slicing rules).

Here's why it makes sense:

bytes and str are not the same. They aren't even conceptually the same.

A bytes object (byte string) is a sequence of bytes (I assume you know what a byte is, and that it isn't a "character"). It is data, not text. A string, on the other hand, is a sequence of characters: it is text.
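To make the data-versus-text distinction concrete: the same text can turn into a different number of bytes depending on how it is encoded. A small sketch, with UTF-8 chosen only as an example encoding:

```python
text = 'café'                  # str: a sequence of characters
data = text.encode('utf-8')    # bytes: the encoded data

print(len(text))   # 4 characters
print(len(data))   # 5 bytes: 'é' takes two bytes in UTF-8
print(data)        # b'caf\xc3\xa9'

# Going back from data to text requires knowing the encoding.
print(data.decode('utf-8') == text)  # True
```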

As I mentioned earlier, indexing a byte string gives you a byte (it's a sequence of bytes, so this makes perfect sense). Since a byte is a numeric type (and not a character), what you get is a number (a long). Slicing a sequence with b[x:y] gives you the portion of the sequence from x up to (but not including) y as a new sequence. Note that slicing a sequence always gives a sequence. Why? Because you are asking for a portion of the sequence, not just a single element of it. This is why the result is a byte string and not a long. While the two results hold the same data (in essence, at least), their types (and representations) are completely different, and reasonably so. This is why b[0] != b[0:1] where b is a byte string.
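The same rule applies to any sequence. In this sketch, a list of ints and a bytes object behave identically under indexing and slicing: an element comes back as an int, and a slice comes back as a new sequence of the original type.

```python
numbers = [104, 105]   # a list of ints
data = b'hi'           # a bytes object: also a sequence of ints (104, 105)

# Indexing returns one element; slicing returns a sequence of the same type.
print(numbers[0], numbers[0:1])   # 104 [104]
print(data[0], data[0:1])         # 104 b'h'

# Iterating over bytes yields ints, just like the list.
print(list(data) == numbers)      # True
```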

str, on the other hand, is a sequence of characters (text). Indexing a str (which I'll call a string from now on) gives you a character (makes perfect sense again). However, Python has no separate character type: a character is simply a str of length 1, so indexing a string gives you another string. Slicing a string gives you a portion of the string as a new string. If you slice for just one element, you get a new string with just one element. This is why s[0] == s[0:1] where s is a string.
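A short sketch of the str side: both operations return a str, so the comparison succeeds. `ord()` and `chr()` appear only to connect a one-character string to the kind of integer value that indexing a byte string returns.

```python
s = 'hello'

ch = s[0]        # indexing: a str of length 1 (there is no separate char type)
piece = s[0:1]   # slicing: also a str of length 1

print(type(ch) is str, len(ch))  # True 1
print(ch == piece)               # True

# ord() and chr() convert between a one-character str and its code point.
print(ord(ch))                   # 104
print(chr(104) == ch)            # True
```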

One byte is a number, two bytes is a string of bytes. One str element is a string, two string elements is a string. If you accept that, it all makes sense. However, this behavior seems to have changed from 2.6 to 3.1.

The reason I got into this was trying to port a 2.6 app to 3.1. The code reads some bytes from a file, then examines the bytes.

f = open(filename, 'rb')
data = f.read(12)
f.close()
if data[0:2] == '\xFF\xD8':
    if data[2] == '\xFF' and data[6:10] == 'Exif':
        pass  # JPEG with an Exif header: examine it here

Note that both single bytes and the sequence of bytes are treated the same, and that they don't require any 'casting' for the comparisons.

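This works in 2.6, but fails in 3.1: the comparisons silently evaluate to False, because bytes never compare equal to str and data[2] is an int. 3.1 requires: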

if data[0:2] == b'\xFF\xD8':
    if data[2] == ord('\xFF') and data[6:10] == b'Exif':
        pass  # JPEG with an Exif header: examine it here

Note that the single byte is treated differently from the sequence, and that we need to add 'b' and 'ord'.
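One way to avoid the special case for a single byte is to slice a length-1 range instead of indexing: data[2:3] is a bytes object in Python 3 (and a str in 2.6, which also accepts b'' literals), so it can be compared to b'\xFF' directly. If you do want to index, comparing with an integer literal such as 0xFF reads more directly than ord('\xFF'). The function name `is_exif_jpeg` below is hypothetical, and the sketch, written for Python 3, assumes only the 12-byte header layout the code above is already testing.

```python
def is_exif_jpeg(header):
    """Return True if header starts like a JPEG file with an Exif (APP1) segment.

    header is assumed to be at least the first 12 bytes read in 'rb' mode.
    """
    # Slices are bytes objects even when they are one byte long,
    # so every comparison is bytes == bytes; no ord() needed.
    return (header[0:2] == b'\xFF\xD8'
            and header[2:3] == b'\xFF'
            and header[6:10] == b'Exif')


# Made-up sample header: SOI marker, APP1 marker, length bytes, then 'Exif\0\0'.
sample = b'\xFF\xD8\xFF\xE1\x00\x10Exif\x00\x00'
print(is_exif_jpeg(sample))                                # True
print(is_exif_jpeg(b'\x89PNG\r\n\x1a\n\x00\x00\x00\x0d'))  # False
```

With a real file, this would be called on the result of f.read(12) after opening the file in 'rb' mode.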

As an aside, haven't we eliminated longs in 3.1, so that all integers are of type int?

bytes, as they exist in Python 3.1, are conceptually new to Python (starting with version 3). This isn't to say that 8-bit strings didn't exist before, in the form of regular text strings. Note the distinction: each element of an 8-bit Python 2.x string (str) is treated as a character, not a byte, which yields the same slicing and indexing behavior as str in Python 3.1. Also note that bytes in Python 2.6 is just a synonym for str, so there is no "true" bytes type in that version. I think this was done to ease the transition to Python 3.

Responding to your aside: long wasn't really removed. In a way, int was removed and long was renamed to int. More accurately, int in Python 3 behaves like long in Python 2 and is implemented by the underlying PyLong_Type.
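A quick check in a Python 3 interpreter that there is only one integer type, regardless of magnitude, and that indexing bytes returns that same type:

```python
import sys

small = 1
huge = 2 ** 100           # far larger than any machine word

# In Python 3 both are the same type; there is no separate long.
print(type(small) is int, type(huge) is int)   # True True

# Indexing bytes therefore returns a plain int as well.
print(type(b'\xff'[0]) is int)                 # True

# sys.maxsize bounds container sizes and indices, not int values.
print(huge > sys.maxsize)                      # True
```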
