I wrote a Python script that uses the python-swiftclient module to connect to OpenStack Object Storage and upload some files to a container.

It works great when I upload a file that ends with the extension .gz. However, when it comes to the compressed file that ends with the extension .tar.bz2, I get an error about the 'TarFile' object having no attribute 'read' after running my script.

I've included the Python script and the error I got after running it. Please show me where I'm wrong; any assistance in solving this issue would be appreciated. I'm a beginner in Python.

from keystoneauth1 import session
from keystoneauth1.identity import v3
from swiftclient.client import Connection
from swiftclient.client import ClientException
import gzip
import io
import tarfile

# Create a password auth plugin
auth = v3.Password(auth_url='https://cloud.company.com:5000/v3/',
                   username='myaccount',
                   password='mypassword',
                   user_domain_name='Default',
                   project_name='myproject',
                   project_domain_name='Default')

# Create session
keystone_session = session.Session(auth=auth)

# Create swiftclient Connection
swift_conn = Connection(session=keystone_session)

# Container that will hold the backups
container = 'netbox-backups'

# Create a new object with the contents of the Netbox database backup
with gzip.open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as file:
    swift_conn.put_object(
        container,
        'object_netbox_2024-03-16.psql.gz',
        contents=file,
        content_type='application/gzip'
    )

# Confirm the presence of the object holding the Netbox database backup
obj1 = 'object_netbox_2024-03-16.psql.gz'
container = 'netbox-backups'
try:
    resp_headers = swift_conn.head_object(container, obj1)
    print("The object " + obj1 + " was successfully created")
except ClientException as e:
    if e.http_status == 404:
        print("The object " + obj1 + " was not found!")
    else:
        print("An error occurred checking for the existence of the object " + obj1)

# Create a new object with the contents of the compressed Netbox media backup
with tarfile.open('/var/backup/netbox_backups/netbox_media_2024-03-20.tar.bz2', mode='r:bz2') as file_tar_bz2:

    # Read the contents of the compressed Netbox media backup file
    file_contents = file_tar_bz2.read()

    # Create a file-like object from the contents of the compressed Netbox media backup file
    my_file_like_object = io.BytesIO(file_contents)

    # Upload the returned contents to the OpenStack Object Storage container
    swift_conn.put_object(
        container,
        'object_netbox_media_2024-03-20.tar.bz2',
        contents=my_file_like_object,
        content_type='application/x-tar'
    )

# Confirm the presence of the object holding the compressed Netbox media backup
obj2 = 'object_netbox_media_2024-03-20.tar.bz2'
container = 'netbox-backups'
try:
    resp_headers = swift_conn.head_object(container, obj2)
    print("The object " + obj2 + " was successfully created")
except ClientException as e:
    if e.http_status == 404:
        print("The object " + obj2 + " was not found!")
    else:
        print("An error occurred checking for the existence of the object " + obj2)

Below is the error I got after running the script.

Traceback (most recent call last):
  File "/opt/scripts/netbox_backups_transfer.py", line 57, in <module>
    file_contents = file_tar_bz2.read()
AttributeError: 'TarFile' object has no attribute 'read'

First, let's prepare some tar files using different compression schemes for demo purposes.

$ cat foo_1.txt 
This is file 1
$ cat foo_2.txt 
This is file 2
This is file two
This is file too

# Three tar files, two compressed and one uncompressed for reference
$ tar -j -c -f foo.tar.bz2 foo_1.txt foo_2.txt 
$ tar -z -c -f foo.tar.gz foo_1.txt foo_2.txt 
$ tar -c -f foo.tar foo_1.txt foo_2.txt

$ file foo.tar.bz2 foo.tar.gz foo.tar
foo.tar.bz2: bzip2 compressed data, block size = 900k
foo.tar.gz:  gzip compressed data, from Unix, original size modulo 2^32 10240
foo.tar: POSIX tar archive (GNU)

# tar understands the contents of all three formats
$ tar tf foo.tar.bz2
foo_1.txt
foo_2.txt
$ tar tf foo.tar.gz
foo_1.txt
foo_2.txt
$ tar tf foo.tar
foo_1.txt
foo_2.txt

# The file sizes; note how much larger the uncompressed tar file is.
$ ls -l foo.tar.gz foo.tar.bz2 foo.tar
-rw-rw-r-- 1 sc sc 181 Mar 23 08:06 foo.tar.bz2
-rw-rw-r-- 1 sc sc 170 Mar 23 08:07 foo.tar.gz
-rw-rw-r-- 1 sc sc 10240 Mar 23 08:26 foo.tar

When reading, the gzip Python library only does decompression. It knows nothing of the structure of the archive inside; it just gives you the decompressed bytes.

>>> import gzip
>>> gz = gzip.open('foo.tar.gz','rb')
>>> bytes = gz.read()
>>> print(len(bytes))
10240
>>> print(str(bytes)[:80])
b'foo_1.txt\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\

Notice that the length of the data is the same as the uncompressed tar file.

with gzip.open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as file:
    swift_conn.put_object(
        container,
        'object_netbox_2024-03-16.psql.gz',
        contents=file,
        content_type='application/gzip'
    )

So what you're actually presenting to put_object is the decompressed stream.
Is it being re-compressed when you say "content_type='application/gzip'"?
It might be worth comparing the local and remote file sizes, to check that it is actually storing a compressed version.
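
One way to do that comparison (a sketch, with a hypothetical helper name; python-swiftclient's head_object returns the response headers as a dict with lowercase keys, so the object's stored size is under 'content-length'):

```python
import os

def sizes_match(local_path, resp_headers):
    """Compare a local file's size against the 'content-length' header
    from swift_conn.head_object(container, obj)."""
    local_size = os.path.getsize(local_path)
    remote_size = int(resp_headers['content-length'])
    return local_size == remote_size
```

If the remote size is much larger than the local .gz file, you know the decompressed stream was stored.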

The tarfile Python library does understand tar files, and gives you a richer set of functions for dealing with them.

>>> tf1 = tarfile.open('foo.tar.bz2',mode='r')  # let it figure out the compression
>>> print(tf1.getnames())
['foo_1.txt', 'foo_2.txt']
>>> tf2 = tarfile.open('foo.tar.gz',mode='r')   # let it figure out the compression
>>> print(tf2.getnames())
['foo_1.txt', 'foo_2.txt']

In particular, it knows what each member file is called, and can handle the contents of the tarfile on a per-member basis.
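
For example, a self-contained sketch (building a tiny .tar.bz2 in memory rather than on disk): there is no TarFile.read(); you iterate the members and read each one via extractfile():

```python
import io
import tarfile

# Build a small .tar.bz2 in memory (stand-ins for foo_1.txt / foo_2.txt)
members = {'foo_1.txt': b'This is file 1\n',
           'foo_2.txt': b'This is file 2\n'}
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:bz2') as tf:
    for name, data in members.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

# Read it back one member at a time
buf.seek(0)
contents = {}
with tarfile.open(fileobj=buf, mode='r:bz2') as tf:
    for info in tf.getmembers():
        contents[info.name] = tf.extractfile(info).read()

print(contents)  # {'foo_1.txt': b'This is file 1\n', 'foo_2.txt': b'This is file 2\n'}
```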

If you don't actually care about the contents of the file (you're just making a backup), you can use a regular open().
Here you can see that it just reads the compressed size.

>>> rawfile = open('foo.tar.bz2',mode='rb')
>>> rawbytes = rawfile.read()
>>> print(len(rawbytes))
181

In other words, treat every file the same way:

with open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as file:
    swift_conn.put_object(
        container,
        'object_netbox_2024-03-16.psql.gz',
        contents=file,
        content_type='application/octet-stream'
    )

with open('/var/backup/netbox_backups/netbox_media_2024-03-20.tar.bz2', 'rb') as file:
    swift_conn.put_object(
        container,
        'object_netbox_media_2024-03-20.tar.bz2',
        contents=file,
        content_type='application/octet-stream'
    )

It kinda depends on what swift_conn.put_object does with content_type.

Good day, Salem. My apologies for taking so long to reply to your suggestion.

I refactored my code to read the contents of the tar.bz2 file and pass them as a file-like object to 'put_object', and also to change the content type for the file transfer to "application/octet-stream". The first file was sent through to object storage, but the tar file couldn't be sent, and I got an error about the 'NoneType' object having no attribute 'read'.

Please see below the attempt I made and the error which occurred afterward.

# Create a new object with the contents of the compressed Netbox media backup
with tarfile.open("/var/backup/netbox_backups/netbox_media_2024-03-24.tar.bz2", "r:bz2") as file_tar_bz2:

    # Go over each file in the tar archive...
    for file_info in file_tar_bz2:

        if file_info.isreg():
            # Read the contents...
            logger.info(f"Is regular file: {file_info.name}")
            file_contents = file_tar_bz2.extractfile(file_info).read()

        elif file_info.isdir():
            # Read the contents...
            logger.info(f"Is directory: {file_info.name}")
            file_contents = file_tar_bz2.extractfile(file_info).read()

        elif file_info.issym():
            # Read the contents...
            logger.info(f"Is symbolic link: {file_info.name}")
            file_contents = file_tar_bz2.extractfile(file_info).read()

        elif file_info.islnk():
            # Read the contents...
            logger.info(f"Is hard link: {file_info.name}")
            file_contents = file_tar_bz2.extractfile(file_info).read()

        else:
            logger.info(f"Is something else: {file_info.name}. Skip it")
            continue

        # Create a file-like object from the contents...
        file_like_object = io.BytesIO(file_contents)

        # Upload the returned contents to Swift,
        # using the name of the file in the archive as the object name...
        swift_conn.put_object(
            container,
            file_info.name,
            contents=file_like_object,
            content_type='application/octet-stream'
        )

Below is the error

  File "/opt/scripts/netbox_backups_transfer.py", line 69, in <module>
    file_contents = file_tar_bz2.extractfile(file_info).read()
AttributeError: 'NoneType' object has no attribute 'read'

I don't understand why you need to extract all the files from the compressed tar.bz2 just to upload it as a backup.

Also, line 69 is now meaningless, as you've posted only a snippet of the code.

Before the error, what was the last logger.info message?
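
For what it's worth, the tarfile docs explain that AttributeError directly: TarFile.extractfile() returns None for members that are neither regular files nor links, so calling .read() on the result for a directory entry fails exactly like that. A minimal sketch:

```python
import io
import tarfile

# Build an archive in memory containing a directory and a regular file
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tf:
    dir_info = tarfile.TarInfo('mydir')
    dir_info.type = tarfile.DIRTYPE
    tf.addfile(dir_info)

    data = b'hello'
    file_info = tarfile.TarInfo('mydir/hello.txt')
    file_info.size = len(data)
    tf.addfile(file_info, io.BytesIO(data))

buf.seek(0)
results = {}
with tarfile.open(fileobj=buf, mode='r') as tf:
    for member in tf:
        extracted = tf.extractfile(member)
        # Directories (and other non-file members) come back as None
        results[member.name] = None if extracted is None else extracted.read()

print(results)  # {'mydir': None, 'mydir/hello.txt': b'hello'}
```

So calling extractfile(...).read() only for members where file_info.isreg() is true avoids the crash.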

Good day, Salem. My apologies for taking so long to reply to your suggestion.

I wrote the Python code to upload the .gz file from my local machine to the OpenStack object store using the following documentation: https://docs.openstack.org/python-swiftclient/latest/client-api.html.
Below is the code I wrote

from keystoneauth1 import session
from keystoneauth1.identity import v3
from swiftclient.client import Connection, logger
from swiftclient.client import ClientException
import gzip

# Create a password auth plugin
auth = v3.Password(
    auth_url='https://cloud.company.com:5000/v3/',
    username='myaccount',
    password='mypassword',
    user_domain_name='Default',
    project_name='myproject',
    project_domain_name='Default'
)

# Create session
keystone_session = session.Session(auth=auth)

# Create swiftclient Connection
swift_conn = Connection(session=keystone_session)

# Create a new container
container = 'object-backups'
swift_conn.put_container(container)
res_headers, containers = swift_conn.get_account()
if container in [c['name'] for c in containers]:
    print("The container " + container + " was created!")

# Create a new object with the contents of Netbox database backup
with gzip.open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as f:
    # Read the contents...
    file_gz_content = f.read()

    # Upload the returned contents to the Swift Object Storage container
    swift_conn.put_object(
        container,
        "object_netbox_2024-06-16.psql.gz",
        contents=file_gz_content,
        content_type='application/gzip'
    )

# Confirm the presence of the object holding the Netbox database backup
obj1 = 'object_netbox_2024-06-16.psql.gz'
container = 'object-backups'
try:
    resp_headers = swift_conn.head_object(container, obj1)
    print("The object " + obj1 + " was successfully created")
except ClientException as e:
    if e.http_status == 404:
        print("The object " + obj1 + " was not found!")
    else:
        print("An error occurred checking for the existence of the object " + obj1)

The file gets uploaded successfully. However, if I download the file from the object store and try to decompress it, I get the following error:

# gzip -d object_netbox_2024-06-16.psql.gz 

gzip: sanbox_nb01_netbox_2024-06-16.psql.gz: not in gzip format

What should I do to ensure the file gets stored in the Object Storage in the same format and size as the file on my local machine?

Any assistance will be appreciated.

Yours sincerely

How did you open the file to save the download?
Did you use 'wb' mode in the open?
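
For reference, a sketch of a byte-exact save (the helper name is mine): get_object returns a (headers, body) tuple, and the 'wb' is the important part. The Swift call itself is commented out here since it needs a live connection.

```python
def save_download(obj_bytes, dest_path):
    # 'wb' is essential: text mode can mangle a compressed byte stream
    with open(dest_path, 'wb') as out:
        out.write(obj_bytes)

# Against Swift it would look something like:
# resp_headers, obj_bytes = swift_conn.get_object(container, 'object_netbox_2024-06-16.psql.gz')
# save_download(obj_bytes, '/tmp/object_netbox_2024-06-16.psql.gz')
```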

file_gz_content = f.read()

Verify that the length of this is the whole file.

swift_conn.put_object

https://docs.openstack.org/python-swiftclient/latest/swiftclient.html#swiftclient.client.put_object
One of the parameters is response_dict. It would be an idea to use this to see if anything unusual is happening with the upload.

In general, go through the API in detail and wherever extra status/error information can be returned, you should be at least retrieving this information to help you with debugging.

gzip: sanbox_nb01_netbox_2024-06-16.psql.gz: not in gzip format

Use a hex editor or hex dump to see what the byte stream actually looks like (and compare it with a known good .gz file).
How badly damaged does it look?
For example, if you forgot the 'wb' mode when writing the file, I'd expect it to be "mostly" OK with some bad characters.
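
As a quick first check before reaching for a hex editor: every gzip stream starts with the magic bytes 0x1f 0x8b, so you can sanity-check a file with a couple of lines (the helper name is mine):

```python
GZIP_MAGIC = b'\x1f\x8b'

def looks_like_gzip(path):
    # A gzip stream always begins with the two-byte magic number 1f 8b
    with open(path, 'rb') as f:
        return f.read(2) == GZIP_MAGIC
```

If this returns False on the downloaded file but True on the local one, the bytes were altered somewhere in the round trip.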

Maybe something tried to be 'helpful' by automatically decompressing it for you when you downloaded it, so what you're seeing is an already decompressed file.
