David Moreau Simard


Swift and Object Storage

Swift is OpenStack’s Object Storage project.

At a very high level, I like to present Object Storage as a filesystem accessible through a set of APIs, often directly over HTTP.

Object Storage backends are usually built from the ground up to be resilient, fault tolerant and highly available, and to provide mechanisms that ensure data redundancy and security.

Object Storage is the secret sauce that hides behind interfaces such as Dropbox, Google Drive or Microsoft OneDrive.

Ceph is another open source project with Object Storage at its core. Ceph natively provides a way to create, format and mount block devices out of the box; Swift, however, does not.

This is great, and I’m a fan of Ceph myself, but what if there was a way to mount a Swift object store as a filesystem?

Let’s take a closer look at S3QL which allows you to do just that.

If you don’t have an object storage service, I’ve written a basic guide on finding the right provider.

What’s S3QL?

Straight from S3QL’s repository:

S3QL is a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X.

S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.

Sounds pretty neat, right? Let’s see if we can get it installed and test it out.

Installing S3QL

Thankfully, S3QL is packaged for most distributions, which is welcome since it is otherwise somewhat complex to install.

I’ll be testing this under Ubuntu 14.04 (Trusty), so it’s really as simple as adding the PPA and installing S3QL:

add-apt-repository ppa:nikratio/s3ql
apt-get update && apt-get -y install s3ql
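To confirm that the package actually put S3QL’s tools on your PATH, a small check can help. `require_tools` is a hypothetical helper written for this post and is not part of S3QL. It only relies on the shell’s built-in `command -v`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of S3QL): succeed only if every named command is on PATH.
# Any missing command is reported on stderr.
require_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage after installing the package:
#   require_tools mkfs.s3ql mount.s3ql umount.s3ql && echo "S3QL tools found"
```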

Setting it up

Authentication

First, create the file in which you’ll store authentication details:

mkdir -p ~/.s3ql && install -b -m 600 /dev/null ~/.s3ql/authinfo2

Open up ~/.s3ql/authinfo2 with your favorite editor and fill in your Swift authentication information in the following format:

[swift]
backend-login: tenant:username
backend-password: password
storage-url: swiftks://keystone.example.org/<region>:<container>
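If you would rather skip the editor, a heredoc can create the file and fill in its contents in one step. The sketch below reuses the placeholder values above, which you need to replace with your own. It also **overwrites** any existing authinfo2. `AUTHINFO_DIR` is a variable introduced here only so that the location can be overridden.

```shell
#!/usr/bin/env bash
# Sketch: create ~/.s3ql/authinfo2 with owner-only permissions in one step.
# The credentials below are placeholders; this OVERWRITES an existing authinfo2.
set -e

AUTHINFO_DIR="${AUTHINFO_DIR:-$HOME/.s3ql}"
AUTHINFO="$AUTHINFO_DIR/authinfo2"

mkdir -p "$AUTHINFO_DIR"
# umask 077 inside the subshell gives a newly created file mode 600
(
  umask 077
  cat > "$AUTHINFO" <<'EOF'
[swift]
backend-login: tenant:username
backend-password: password
storage-url: swiftks://keystone.example.org/<region>:<container>
EOF
)
# umask does not change a file that already existed, so tighten it explicitly
chmod 600 "$AUTHINFO"
```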

I did not need to specify that the Keystone endpoint uses HTTPS (port 443), nor to include the /v2.0 (or /v2.0/tokens) portion of the authentication URL.

The <region> value depends on your provider and on where your data is located.

The <container> is the bucket in which the filesystem will reside. It must be created beforehand; s3ql won’t create it for you.
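To make the shape of the storage URL explicit, here is a small parser. As far as I can tell from S3QL’s documentation, swiftks URLs take the form `swiftks://<hostname>[:<port>]/<region>:<container>[/<prefix>]`. The function `parse_swiftks_url` is hypothetical and simply assumes that shape. It does not contact Keystone or validate anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a swiftks storage URL into its parts.
# Assumed shape: swiftks://<host>[:<port>]/<region>:<container>[/<prefix>]
parse_swiftks_url() {
  local rest host path region container
  rest="${1#swiftks://}"           # drop the scheme
  host="${rest%%/*}"               # host[:port], everything before the first slash
  path="${rest#*/}"                # region:container[/prefix]
  region="${path%%:*}"             # text before the first colon
  container="${path#*:}"           # text after the first colon...
  container="${container%%/*}"     # ...minus any optional /prefix
  printf 'host=%s region=%s container=%s\n' "$host" "$region" "$container"
}

parse_swiftks_url "swiftks://keystone.example.org/region:s3ql"
# prints: host=keystone.example.org region=region container=s3ql
```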

Creating the Swift container for s3ql

To create an empty container with swiftclient, you can use swift post <container>:

# Create an empty container called "s3ql" if it doesn't exist
$ swift post s3ql
$ swift stat s3ql
       Account: AUTH_e8217e83ef32427bb4e4d217f1390ab4
     Container: s3ql
       Objects: 0
         Bytes: 0
      Read ACL:
     Write ACL:
       Sync To:
      Sync Key:
 Accept-Ranges: bytes
        Server: nginx
    Connection: keep-alive
   X-Timestamp: 1412040138.20837
    X-Trans-Id: tx3e99ab36fdf44fa3b197f-00542a07bc
  Content-Type: text/html; charset=UTF-8
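Before running mkfs.s3ql, it can be handy to confirm that the container exists and is still empty. The sketch below defines a hypothetical helper, `container_is_empty`, which reads the `Objects:` line of `swift stat` output in the format shown above. If `swift stat` fails (for example because the container does not exist), no `Objects:` line is printed and the check fails.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: succeed only if the container exists and holds 0 objects.
# Relies on the "Objects: N" line printed by `swift stat <container>`.
container_is_empty() {
  local objects
  objects=$(swift stat "$1" 2>/dev/null | awk -F': *' '/^ *Objects:/ {print $2}')
  [ "$objects" = "0" ]
}

# Usage:
#   container_is_empty s3ql && mkfs.s3ql swiftks://keystone.example.org/region:s3ql
```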

Initializing the filesystem

To initialize the filesystem, run mkfs.s3ql with your storage URL as its argument:

$ mkfs.s3ql swiftks://keystone.example.org/region:s3ql
Before using S3QL, make sure to read the user's guide, especially
the 'Important Rules to Avoid Loosing Data' section.
Enter encryption password:
Confirm encryption password:
Generating random encryption key...
Creating metadata tables...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...
Wrote 155 bytes of compressed metadata.

We can tell that s3ql created some objects:

$ swift list s3ql
s3ql_metadata
s3ql_passphrase
s3ql_passphrase_bak1
s3ql_passphrase_bak2
s3ql_passphrase_bak3
s3ql_seq_no_1
$ swift stat s3ql
       Account: AUTH_e8217e83ef32427bb4e4d217f1390ab4
     Container: s3ql
       Objects: 6
         Bytes: 791
      Read ACL:
     Write ACL:
       Sync To:
      Sync Key:
 Accept-Ranges: bytes
        Server: nginx
    Connection: keep-alive
   X-Timestamp: 1412040138.20837
    X-Trans-Id: txddafca8cbda244f5b5f1e-00542a0944
  Content-Type: text/plain; charset=utf-8

Mounting the filesystem

s3ql provides the mount.s3ql utility, which is pretty straightforward to use:

$ mkdir /mnt/s3ql
$ mount.s3ql swiftks://keystone.example.org/region:s3ql /mnt/s3ql/
Using 4 upload threads.
Autodetected 4052 file descriptors available for cache entries
Enter file system encryption passphrase:
Using cached metadata.
Setting cache size to 37584 MB
Mounting filesystem...
$ df -h
Filesystem                                  Size  Used Avail Use% Mounted on
/dev/vda1                                    50G  1.1G   46G   3% /
none                                        4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                        991M   12K  991M   1% /dev
tmpfs                                       201M  352K  200M   1% /run
none                                        5.0M     0  5.0M   0% /run/lock
none                                       1001M     0 1001M   0% /run/shm
none                                        100M     0  100M   0% /run/user
swiftks://keystone.example.org/region:s3ql  1.0T     0  1.0T   0% /mnt/s3ql
$ ls -al /mnt/s3ql/
total 0
drwx------ 1 root root 0 Sep 29 21:32 lost+found

Cool.
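One gotcha: if mount.s3ql fails, /mnt/s3ql is still an ordinary directory, and anything written there silently lands on the local disk. A hypothetical `is_mounted` helper can guard scripts against that. It is Linux-specific because it reads /proc/mounts. For simplicity, it also does not handle mount points containing spaces, which /proc/mounts escapes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): succeed if the directory is listed as a mount point
# in /proc/mounts. Paths with spaces are escaped there (\040) and are not handled.
is_mounted() {
  local dir
  dir=$(cd "$1" 2>/dev/null && pwd -P) || return 1   # absolute, symlink-free path
  awk -v d="$dir" '$2 == d {found = 1} END {exit !found}' /proc/mounts
}

# Usage before writing to the filesystem:
#   is_mounted /mnt/s3ql || { echo "S3QL is not mounted on /mnt/s3ql" >&2; exit 1; }
```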

Trying it out

$ dd if=/dev/zero of=/mnt/s3ql/bigfile.bin bs=1024k count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 80.1757 s, 13.4 MB/s

So, pretty much as fast as my upload speed will go?
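ISPs usually quote upload speeds in megabits per second, while dd reports decimal megabytes (10^6 bytes) per second. A quick awk conversion of the figures above puts both in the same unit. As the rest of this post shows, though, very little of this data actually went over the wire during the dd run, so this number does not measure upload bandwidth.

```shell
#!/usr/bin/env bash
# Convert the dd result above (bytes and seconds) into MB/s and Mbit/s.
# dd's "MB" is decimal: 1 MB = 10^6 bytes.
bytes=1073741824
seconds=80.1757
rate=$(awk -v b="$bytes" -v s="$seconds" 'BEGIN {
  mb_per_s = b / s / 1e6
  printf "%.1f MB/s = %.0f Mbit/s", mb_per_s, mb_per_s * 8
}')
echo "$rate"   # 13.4 MB/s = 107 Mbit/s
```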

What does it look like on Swift’s side ?

$ swift list s3ql --lh
1.7K 2014-09-30 01:47:25 s3ql_data_1
 792 2014-09-30 01:48:45 s3ql_data_2
 155 2014-09-30 01:32:34 s3ql_metadata
 132 2014-09-30 01:32:34 s3ql_passphrase
 132 2014-09-30 01:32:34 s3ql_passphrase_bak1
 132 2014-09-30 01:32:34 s3ql_passphrase_bak2
 132 2014-09-30 01:32:34 s3ql_passphrase_bak3
 108 2014-09-30 01:32:35 s3ql_seq_no_1
 108 2014-09-30 01:39:42 s3ql_seq_no_2
3.3K

Oh, what’s this? Where’s my 1GB file? Nothing is being uploaded in the background either.

After unmounting and re-mounting the filesystem, there is still nothing new in Swift... but my file is there, so it has to be stored somewhere, right?

$ ls -alh /mnt/s3ql/
total 1.0G
-rw-r--r-- 1 root root 1.0G Sep 29 21:48 bigfile.bin
drwx------ 1 root root    0 Sep 29 21:32 lost+found

Pleasant surprises

That’s right, s3ql provides data compression and de-duplication out of the box. My 1GB file is nothing but zeroes, so every block is identical and only needs to be stored once, and that one block compresses extremely well.
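To see why an all-zero file is nearly free to store, the sketch below reproduces both effects with standard tools. gzip and sha256sum are stand-ins, chosen only because they are commonly installed. S3QL does its own compression and block hashing internally, so the exact sizes will differ. The 10 MiB block size is an arbitrary choice for this demo.

```shell
#!/usr/bin/env bash
# Sketch: why a file full of zeroes costs almost nothing to store.
# gzip and sha256sum stand in for S3QL's own compression and block hashing.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

size=$((10 * 1024 * 1024))   # 10 MiB sample / block size (arbitrary for this demo)

# 1) Compression: zeroes shrink to a few KB, random data does not shrink at all
zero_gz=$(head -c "$size" /dev/zero    | gzip -c | wc -c)
rand_gz=$(head -c "$size" /dev/urandom | gzip -c | wc -c)
echo "10 MiB of zeroes -> $zero_gz bytes gzipped"
echo "10 MiB of random -> $rand_gz bytes gzipped"

# 2) De-duplication: cut 30 MiB of zeroes into 10 MiB blocks and hash each block;
#    identical blocks share a hash, so only one copy would need storing
head -c $((3 * size)) /dev/zero | split -b "$size" - "$work/blk_"
files=("$work"/blk_*)
blocks=${#files[@]}
unique=$(sha256sum "${files[@]}" | awk '{print $1}' | sort -u | wc -l)
echo "$blocks blocks, $unique unique"
```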

Let’s download an ISO file and see what happens:

$ wget http://releases.ubuntu.com/14.04.1/ubuntu-14.04.1-server-amd64.iso -O /mnt/s3ql/ubuntu-14.04.1-server-amd64.iso
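Reading the file back and comparing it against a published checksum is a simple end-to-end test that data survives the trip through S3QL. Ubuntu publishes SHA256SUMS files alongside its images. The `verify_sha256` helper below is hypothetical and assumes the usual `<hash> *<name>` or `<hash>  <name>` line format. The SHA256SUMS location in the usage comment is also an assumption, namely that the file sits next to the ISO.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check one file against a SHA256SUMS-style list.
# $1 = sums file, $2 = file to verify (looked up in the list by its basename)
verify_sha256() {
  local sums="$1" file="$2" name expected actual
  name=$(basename "$file")
  # Lines look like "<hash> *<name>" or "<hash>  <name>"; strip the optional '*'
  expected=$(awk -v n="$name" '{f = $2; sub(/^\*/, "", f)} f == n {print $1; exit}' "$sums")
  [ -n "$expected" ] || { echo "no entry for $name in $sums" >&2; return 2; }
  actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$expected" = "$actual" ]
}

# Usage (assumed SHA256SUMS location, next to the ISO):
#   wget http://releases.ubuntu.com/14.04.1/SHA256SUMS -O /tmp/SHA256SUMS
#   verify_sha256 /tmp/SHA256SUMS /mnt/s3ql/ubuntu-14.04.1-server-amd64.iso && echo OK
```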

Once the download completed, I could watch the used space grow slowly until it accounted for most of the ISO’s size:

$ df -h
Filesystem                                  Size  Used Avail Use% Mounted on
/dev/vda1                                    50G  1.7G   46G   4% /
none                                        4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                        991M   12K  991M   1% /dev
tmpfs                                       201M  352K  200M   1% /run
none                                        5.0M     0  5.0M   0% /run/lock
none                                       1001M     0 1001M   0% /run/shm
none                                        100M     0  100M   0% /run/user
swiftks://keystone.example.org/region:s3ql  1.0T  194M  1.0T   1% /mnt/s3ql

$ df -h
Filesystem                                  Size  Used Avail Use% Mounted on
/dev/vda1                                    50G  1.7G   46G   4% /
none                                        4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                        991M   12K  991M   1% /dev
tmpfs                                       201M  352K  200M   1% /run
none                                        5.0M     0  5.0M   0% /run/lock
none                                       1001M     0 1001M   0% /run/shm
none                                        100M     0  100M   0% /run/user
swiftks://keystone.example.org/region:s3ql  1.0T  586M  1.0T   1% /mnt/s3ql

Querying Swift, I could finally see some objects:

$ swift stat s3ql
       Account: AUTH_e8217e83ef32427bb4e4d217f1390ab4
     Container: s3ql
       Objects: 72
         Bytes: 582715279
      Read ACL:
     Write ACL:
       Sync To:
      Sync Key:
 Accept-Ranges: bytes
        Server: nginx
    Connection: keep-alive
   X-Timestamp: 1412040138.20837
    X-Trans-Id: tx26300dd0f2bf411da158a-00542a10bb
  Content-Type: text/plain; charset=utf-8
  
$ swift list --lh s3ql
1.7K 2014-09-30 01:47:25 s3ql_data_1
10.0M 2014-09-30 02:01:09 s3ql_data_10
9.3M 2014-09-30 02:01:23 s3ql_data_11
10.0M 2014-09-30 02:01:29 s3ql_data_12
[...]
9.9M 2014-09-30 02:01:06 s3ql_data_8
9.3M 2014-09-30 02:01:00 s3ql_data_9
 375 2014-09-30 01:54:18 s3ql_metadata
 155 2014-09-30 01:54:17 s3ql_metadata_bak_0
 155 2014-09-30 01:54:17 s3ql_metadata_bak_0_tmp$oentuhuo23986konteuh1062$
 375 2014-09-30 01:54:15 s3ql_metadata_new
 375 2014-09-30 01:54:17 s3ql_metadata_tmp$oentuhuo23986konteuh1062$
 132 2014-09-30 01:32:34 s3ql_passphrase
 132 2014-09-30 01:32:34 s3ql_passphrase_bak1
 132 2014-09-30 01:32:34 s3ql_passphrase_bak2
 132 2014-09-30 01:32:34 s3ql_passphrase_bak3
 108 2014-09-30 01:32:35 s3ql_seq_no_1
 108 2014-09-30 01:39:42 s3ql_seq_no_2
 108 2014-09-30 01:54:59 s3ql_seq_no_3
555M
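The `Bytes:` figure from swift stat and the 555M total from swift list --lh describe the same data in different units. The hypothetical `container_usage` helper below converts the former into MiB (2^20 bytes), which makes the comparison easy to check: 582715279 bytes is about 555.7 MiB. It assumes the `swift stat` output format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise `swift stat <container>` as "<objects> objects, <size> MiB".
container_usage() {
  local stats
  stats=$(swift stat "$1") || return 1   # fail instead of printing zeroes
  printf '%s\n' "$stats" | awk -F': *' '
    /^ *Objects:/ {objects = $2}
    /^ *Bytes:/   {bytes = $2}
    END {printf "%d objects, %.1f MiB\n", objects, bytes / 1048576}'
}

# Usage:
#   container_usage s3ql
```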

In conclusion

It works, and it works pretty well from what I’ve seen so far.

I’ve recently written a post on how you could encrypt your backups and send them to Swift object storage with duplicity.

s3ql also encrypts your data with a passphrase, preventing third parties from peeking at it.

I still think duplicity is great since it really keeps track of your backups and is very efficient in doing so.

No great use case for s3ql comes to mind right now, but I’m sure users will find some as object storage becomes cheaper and cheaper.