Tech-oriented notes to self and lessons learned
Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup. Glacier is a cold storage archival solution, meaning the stored data is not available for immediate retrieval: you first need to request retrieval of the data, and access time can vary from minutes to several hours, depending on the service level you choose.
While cold storage may feel cumbersome at first, it also has its advantages. No one will be able to accidentally modify important, archived files. It’s also possible to prevent deletion altogether, if needed.
Glacier is designed for use cases in which retrievals are infrequent and exceptional, and data will be stored for extended periods of time.
If you haven’t worked with AWS services or Glacier before, it’s helpful to learn a few concepts first:
AWS Region – a named set of AWS resources in the same geographical area. Regions are completely isolated from each other, so when you view your resources, you’ll only see the resources tied to the region you’ve specified. In Glacier terms, stored data is bound to a particular region. Glacier storage prices vary across regions.
Vault – a container for storing data in the form of archives. An unlimited number of archives can be stored in a vault. Vaults and their contents are available only in the region where they were created. Access permissions, notifications and compliance controls are configured on vault level.
Archive – an archive can be any data, such as a photo, video, or document, and is the base unit of storage in Amazon Glacier. Each archive has a unique ID and an optional description. You may upload a single file as an archive, but your costs will be lower if you aggregate your data. Archives stored in Amazon Glacier are immutable, i.e. archives can be uploaded, downloaded and deleted, but cannot be edited or overwritten as they can be with services like Dropbox.
(Vault) Inventory – the AWS Console will show you a list of vaults, but not a list of vault contents, or inventory. An inventory needs to be requested separately, and fulfilling the request can take several hours.
Job – retrieving an archive or vault inventory (list of archives) are asynchronous operations in Amazon Glacier. You first initiate a job, and then download the job output after Amazon Glacier completes the job. With Amazon Glacier, your data retrieval requests are queued and jobs will take hours to complete.
Notification-configuration – because jobs take time to complete, Amazon Glacier supports a notification mechanism to notify you when a job is complete. You can configure a vault to send notification to an Amazon Simple Notification Service (Amazon SNS) topic when jobs complete. You can specify one SNS topic per vault in the notification configuration.
More info on these concepts can be found here: Amazon Glacier data model
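To make the notification-configuration concept concrete, here's a rough sketch of setting it with the AWS CLI instead of the Console. The vault name, topic ARN and account are illustrative placeholders, not values from this article's setup; the command requires valid AWS credentials.

```shell
# Point the vault's notification configuration at an SNS topic, so Glacier
# publishes a message when archive or inventory retrieval jobs complete.
# ("-" as account-id means the account of the configured credentials.)
aws glacier set-vault-notifications --account-id - --vault-name test-vault-1 \
  --vault-notification-config '{
    "SNSTopic": "arn:aws:sns:us-west-2:123456789012:glacier-jobs",
    "Events": ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"]
  }'
```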
Amazon Glacier can be used with the AWS CLI, but it's quite clumsy to use, especially for archive uploads. Some backup tools support Glacier-based storage, but the ones I came across didn't seem suited for server-side backups or programmatic use. The AWS Console allows you to e.g. create and configure vaults, but archive operations are not supported.
Glacier client is a simple tool I created for working with Amazon Glacier. It was designed to support both interactive use (with the Scala REPL) and programmatic use from Scala or Java, and it's well suited for server-side use. Glacier client is built on the Amazon AWS SDK for Java.
The code can be found on GitHub: https://github.com/marko-asplund/glacier-client
To use Glacier you first need to set up an AWS user account and the required permissions in the AWS Console.
Some operations, such as creating a vault inventory or preparing an archive for download, are performed asynchronously. Setting up notifications will be helpful with these operations: you need to enable notifications on the vault and configure a corresponding SNS topic in the AWS Console.
The simplest way to set up Glacier client authorisation is to configure the “default credential profiles file”, as described in Working with AWS Credentials.
The profiles file is a text file with a simple format, so you can set it up with just a text editor by following the instructions on the above-mentioned page.
You can also set up the file with the AWS CLI by invoking the “aws configure” command, as described in AWS CLI configure options.
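For reference, a minimal default credential profiles file (typically located at ~/.aws/credentials) looks like the following; the key values here are AWS's documented example placeholders, not real credentials:

```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```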
To run glacier-client, you need to have Git, sbt and a Java JRE installed.
git clone https://github.com/marko-asplund/glacier-client.git
cd glacier-client
Start up the Scala REPL with sbt:
~/glacier-backup-cli (master ✔) ᐅ sbt console
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/marko/glacier-backup-cli/project
[info] Loading settings from build.sbt ...
[info] Set current project to glacier-backup-cli (in build file:/Users/marko/glacier-backup-cli/)
[info] Starting scala interpreter...
Welcome to Scala 2.11.11 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151).
Type in expressions for evaluation. Or try :help.
List the names of available AWS regions
scala> fi.markoa.glacier.GlacierClient.regions
res0: Array[String] = Array(us-gov-west-1, us-east-1, us-east-2, us-west-1, us-west-2, eu-west-1, eu-west-2, eu-central-1, ap-south-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, sa-east-1, cn-north-1, ca-central-1)
Create a Glacier client that connects to the us-west-2 region
scala> val c = fi.markoa.glacier.GlacierClient("us-west-2")
c: fi.markoa.glacier.GlacierClient = fi.markoa.glacier.GlacierClient@11b6e34a
Create a new vault. The ID (or ARN) of the newly created vault is returned.
scala> c.createVault("test-vault-1")
res1: String = /429963740182/vaults/test-vault-1
List all vaults in the region. A sequence of vault objects is returned; in this case it includes just the vault we created above. Note that with vault operations the results are visible immediately.
scala> c.listVaults
res2: Seq[fi.markoa.glacier.Vault] = ArrayBuffer(Vault(arn:aws:glacier:us-west-2:429963740182:vaults/test-vault-1,test-vault-1,2017-11-19T08:18:38.990Z,None,0,0))
Now we’re ready to upload an archive into the vault:
scala> c.uploadArchive("test-vault-1", "my backup archive", "my-backup.zip")
TransferStarted: transfer started
TransferProgress: transfer progress: 5% (bytes: 516096)
TransferProgress: transfer progress: 10% (bytes: 1024000)
TransferProgress: transfer progress: 15% (bytes: 1540096)
TransferProgress: transfer progress: 20% (bytes: 2048000)
TransferProgress: transfer progress: 25% (bytes: 2564096)
TransferProgress: transfer progress: 30% (bytes: 3072000)
...
TransferProgress: transfer progress: 90% (bytes: 9216000)
TransferProgress: transfer progress: 95% (bytes: 9732096)
TransferProgress: transfer progress: 100% (bytes: 10240000)
TransferCompleted: transfer completed
res3: fi.markoa.glacier.Archive = Archive(WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw,Some(my-backup.zip),0c5dc86251d157e29cfadb04ac615426600a4e1177a8ac2c1134d895378b3acd,10240000,Some(my backup archive))
Note that Glacier doesn’t maintain an up-to-date list of vault contents – a list of contents needs to be requested explicitly, and preparing it can take a very long time. For this reason Glacier client stores a local catalogue of archives per vault. Vault contents can be listed as follows:
scala> c.catListArchives("test-vault-1")
res4: Seq[fi.markoa.glacier.Archive] = ArraySeq(Archive(WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw,Some(my-backup.zip),0c5dc86251d157e29cfadb04ac615426600a4e1177a8ac2c1134d895378b3acd,10240000,Some(my backup archive)))
Archives need to be prepared prior to retrieval, and preparation can take several hours. For this reason it’s often more convenient to retrieve them asynchronously: 1) you request archive retrieval, and 2) after Glacier has finished preparing the archive, you download it.
scala> c.prepareArchiveRetrieval("test-vault-1", "WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw")
res1: Option[String] = Some(h479o4kxdawFsho0POzQAznw6e6beampFAIBYuI7s41O_HmzqqWsg2qk2vL2Lw_4MOsI1VFarvokz7NXczBq0CrwPKzv)
The archive retrieval request is added to the vault’s list of jobs. You can list unfinished jobs as follows:
scala> c.listJobs("test-vault-1")
res4: Seq[fi.markoa.glacier.Job] = ArrayBuffer(Job(h479o4kxdawFsho0POzQAznw6e6beampFAIBYuI7s41O_HmzqqWsg2qk2vL2Lw_4MOsI1VFarvokz7NXczBq0CrwPKzv,arn:aws:glacier:us-west-2:429963740182:vaults/test-vault-1,ArchiveRetrieval,null,2017-11-19T09:00:34.339Z,InProgress,null,None,Some(WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw)))
Notice the InProgress status. Once archive preparation has finished, the job list will look something like this:
scala> c.listJobs("test-vault-1")
res8: Seq[fi.markoa.glacier.Job] = ArrayBuffer(Job(h479o4kxdawFsho0POzQAznw6e6beampFAIBYuI7s41O_HmzqqWsg2qk2vL2Lw_4MOsI1VFarvokz7NXczBq0CrwPKzv,arn:aws:glacier:us-west-2:429963740182:vaults/test-vault-1,ArchiveRetrieval,null,2017-11-19T09:00:34.339Z,Succeeded,Succeeded,Some(2017-11-19T12:52:38.363Z),Some(WREjqj2BItYhI5BGV7mdJGsDl3oztPvpvVh_hngm5SWqJkOd5jnLipLyYy2KkM74-3mkt85nUjI4a_hcQZhtLnQF03K0sv2Bc97BYEwYQ7M4O_lmtgrCTuGCyAEEiuQmCFfRSnBkTw)))
Setting up notifications relieves you from having to periodically poll job completion status: you’ll receive a notification when a job completes. Notifications can be set up via the AWS Console.
A prepared archive can then be downloaded from Glacier using the retrieval job ID:
scala> c.downloadPreparedArchive("test-vault-1", "h479o4kxdawFsho0POzQAznw6e6beampFAIBYuI7s41O_HmzqqWsg2qk2vL2Lw_4MOsI1VFarvokz7NXczBq0CrwPKzv", "my-backup-restored.zip")
TransferStarted: transfer started
TransferProgress: transfer progress: 5% (bytes: 520869)
TransferProgress: transfer progress: 10% (bytes: 1025701)
TransferProgress: transfer progress: 15% (bytes: 1547941)
TransferProgress: transfer progress: 20% (bytes: 2052773)
TransferProgress: transfer progress: 25% (bytes: 2575013)
TransferProgress: transfer progress: 30% (bytes: 3079845)
...
TransferProgress: transfer progress: 90% (bytes: 9228965)
TransferProgress: transfer progress: 95% (bytes: 9736869)
TransferProgress: transfer progress: 100% (bytes: 10240000)
TransferCompleted: transfer completed
That’s it for the basic operations!
Other tasks Glacier client lets you do include deleting vaults, requesting vault inventories (lists of the archives a vault contains), downloading inventories and deleting archives.
Transport Layer Security (TLS) has long been the way to provide privacy, integrity and, optionally, authentication for TCP/IP traffic on the internet. TLS relies on public key infrastructure and trusted third parties (Certificate Authorities) to vouch for the communicating parties’ identities, which can complicate its use for the impatient software engineer.
Acquiring an X.509 server certificate from an established Certificate Authority (CA) is usually somewhat of a hassle. Certificate Authorities vouch for the legitimacy of the server certificate and the vouched party’s right to use it. In order to gain a level of trust in the vouched party, Certificate Authorities use different kinds of authentication methods, which from a developer perspective add to the delivery delay, cost and effort.
As a result, for non-production or internal uses, self-signed certificates are often used instead of third-party-signed certificates. Self-signed certificates can be good for ad-hoc use, but initially no shared chain of trust exists between the server and its clients. This means that for each client you either have to disable certificate validation or make the certificate trusted.
If you find yourself generating lots of self-signed certificates, a better solution may be to use a CA with a fast turnaround time. These are usually also the ones with an automated authentication process, and hence a lower cost. Examples include commercial operators such as gandi.net, as well as non-profits like Let’s Encrypt, which also allows certificate requests to be fulfilled using an automated process. It’s a tradeoff between security and fast turnaround / ease of use: faster turnaround usually translates to less scrutiny in the authentication process.
Customer- or partner-facing systems often warrant using a certificate from an established CA. But sometimes, particularly for internal purposes, when you need a fast turnaround as well as control over the certificate issuance process (in order to automate it), the best solution may be to run your own CA. Running your own certificate authority is not technically difficult. Keeping the system secure, on the other hand, will be hard work.
The big risk is that if the private key associated with your CA certificate is compromised, a malicious party can create and sign new certificates with your CA certificate. Likewise, if the private key associated with an issued certificate is compromised, a malicious party can intercept communication to/from the certificate owner or impersonate the owner. When a software-based crypto implementation is used, your best bet for securing your keys is applying operating system hardening practices. The full data lifecycle (including backups!) should be considered when planning how to secure your keys. For an added level of security, specialised crypto hardware can store private keys in a secure manner.
The OpenSSL project provides a basic tool for running your own CA, in the form of the ‘openssl ca’ command. OpenSSL also includes the CA.pl wrapper script for making certificate management easier to use.
While Linux distributions and macOS ship with OpenSSL, it’s only newer versions of both OpenSSL and CA.pl that support generating certificates using just command line tools and with no user interaction during the process. The simplest way to update the script is to download it from the project’s GitHub repository: https://github.com/openssl/openssl/blob/master/apps/CA.pl.in
Make sure you get commit 022696cab014ffb94c8ef0bfc79c8955b9970eb6 or newer (not part of a released version at the time of writing). An older version of the openssl command should suffice. (It seems that ‘openssl ca’ has supported the required passin parameter for a while, but this was only documented in commit 16e1b281b2e16ff6deb8ca431dfc5743de31d0e2.)
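Before issuing any certificates, the CA hierarchy itself needs to be initialised. As a rough sketch (the exact prompts vary with the OpenSSL version, and the command is interactive by default):

```shell
# Create the CA directory structure (./demoCA by default), generate the CA
# key pair and a self-signed CA certificate. CA.pl prompts for the CA key
# passphrase and the CA's distinguished name.
./CA.pl -newca
```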
With CA.pl you can automate certificate generation with a simple shell script like the following:
cn=$1
pass=`openssl rand -hex 18`

pushd `dirname $CA_ROOT`
echo "issuing certificate for $cn"

# create certificate request
SUBJECT="/C=FI/L=Helsinki/O=Practicing techie/CN=$cn/emailAddress=firstname.lastname@example.org"
OPENSSL=$OPENSSL $CAPL -newreq -extra-req "-passout pass:$pass -subj '$SUBJECT'"

# sign certificate request
OPENSSL=$OPENSSL $CAPL -sign -extra-ca "-passin file:$CA_ROOT/ca-cert-passphrase.txt -batch"
if [ "$?" -ne 0 ]; then
  echo "FATAL: failed to sign, aborting"
  exit 1
fi

# export private key unencrypted, archive files
$OPENSSL rsa -in newkey.pem -out newkey-nodes.pem -passin pass:$pass
mkdir -p $CERT_BASE/$cn
mv new*.pem $CERT_BASE/$cn
See my issue_certificate.sh Gist for more details.
When you need to secure TCP/IP traffic in a non-production environment or a non-customer-facing system, running your own CA can be a very viable solution. Using certificates signed by a third party can prove burdensome and costly when your architecture is based on microservices and TLS is used to secure the traffic; in this case running your own CA to create certificates can help. Other uses include securing traffic between system management or monitoring software and remote agents (e.g. Nagios, JMX etc.).
OpenSSL tools can make automating certificate creation and signing quite easy. Keeping the CA keys secure will require considerable effort, so you need to carefully weigh the risks.
A file system running out of space doesn’t always have to be a major catastrophe. The problem can be resolved quite easily if you happen to be running Linux and were foresighted enough to create your file system on an LVM volume. If you also happen to have extra storage space available, the file system can likely be extended in a fairly straightforward manner. Despite the procedure’s relative simplicity, it’s quite easy to forget the details if you perform it infrequently. So, this time I’m writing it down.
In order to extend an LVM-based file system, a new block device needs to be allocated to LVM. This can be done by adding a new physical disk, partitioning an existing disk or adding a virtual disk. Typically, a reboot is required for Linux to detect a new physical or virtual disk.
Once the new block device is detected, you need to identify its device path. This information can be obtained e.g. with the lsblk or sfdisk commands. If you’ve run out of disk space you know which file system is causing the problems, but you also need to determine the LVM logical volume and volume group underlying that file system. These can be determined from the output of df and lvdisplay (pvscan, vgscan and lvscan may also sometimes be helpful). In the snippet below we assume the block device path to be /dev/sdb, the volume group ubuntu-vg and the logical volume /dev/ubuntu-vg/root.
After learning these parameters, the LVM-based file system can be extended as follows:
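With the assumptions above (/dev/sdb, volume group ubuntu-vg, logical volume /dev/ubuntu-vg/root, and an ext4 file system, which resize2fs can grow online), the procedure is roughly the following; all commands must be run as root:

```shell
# register the new block device as an LVM physical volume
pvcreate /dev/sdb
# add the new physical volume to the existing volume group
vgextend ubuntu-vg /dev/sdb
# grow the logical volume by all the free space now available in the volume group
lvextend -l +100%FREE /dev/ubuntu-vg/root
# finally, grow the file system itself to fill the logical volume
# (resize2fs for ext2/3/4; use xfs_growfs instead for XFS)
resize2fs /dev/ubuntu-vg/root
```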
Recently, I bumped into a tool called Asciinema, which allows recording and playback of terminal sessions. This differs from video in that the terminal session’s input and output are recorded in textual form, allowing you to e.g. copy-paste commands during playback. To try out the tool, I made a recording of extending a file system.