Planet Goerzen

John's Blog: Mud, Airplanes, Arduino, and Fun

The last few weeks have been pretty hectic in their way, but I’ve also had the chance to take some time off work to spend with family, which has been nice.

Memorial Day: breakfast and mud

For Memorial Day, I decided it would be nice to have a cookout for breakfast rather than for dinner. So we all went out to the fire ring. Jacob and Oliver helped gather kindling for the fire, while Laura chopped up some vegetables. Once we got a good fire going, I cooked some scrambled eggs in a cast iron skillet, mixed with meat and veggies. Mmm, that was tasty.

Then we all just lingered outside. Jacob and Oliver enjoyed playing with the cats, and the swingset, and then… water. They put the hose over the slide and made a “water slide” (more of a mud slide, maybe).

IMG_7688

Then we got out the water balloon fillers they had gotten recently, and they loved filling up water balloons. All in all, we all just enjoyed the outdoors for hours.

MVI_7738

Flying to Petit Jean, Arkansas

Somehow, neither Laura nor I have ever really been to Arkansas. We figured it was about time. I had heard wonderful things about Petit Jean State Park from other pilots: it’s rather unique in that it has a small airport right in the park, a feature left over from when Winthrop Rockefeller owned much of the mountain.

And what a beautiful place it was! Dense forests with wonderful hiking trails, dotted with small streams, bubbling springs, and waterfalls all over; a nice lake, and a beautiful lodge to boot. Here was our view down into the valley at breakfast in the lodge one morning:

IMG_7475

And here’s a view of one of the trails:

IMG_7576

The sunset views were pretty nice, too:

IMG_7610

And finally, the plane we flew out in, parked all by itself on the ramp:

IMG_20160522_171823

It was truly a relaxing, peaceful, re-invigorating place.

Flying to Atchison

Last weekend, Laura and I decided to fly to Atchison, KS. Atchison is one of the oldest cities in Kansas, and has quite a bit of history to show off. It was fun landing at the Amelia Earhart Memorial Airport in a little Cessna, and then going to three museums and finding lunch too.

Of course, there is the Amelia Earhart Birthplace Museum, which is a beautifully-maintained old house along the banks of the Missouri River.

IMG_20160611_134313

I was amused to find this hanging in the county historical society museum:

IMG_20160611_153826

One fascinating find is a Regina Music Box, popular in the late 1800s and early 1900s. It operates on the same principles as the cylindrical music boxes you might have seen, but uses a disc instead. I am particularly impressed with the effort that would have gone into developing these discs in the pre-computer era, since the holes at the outer edge of the disc move faster than the inner ones. It would certainly take a lot of careful calculation to produce one of these. I found this one in the Cray House Museum:

VID_20160611_151504
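To put rough numbers on the disc problem: at a constant rotation speed, a point at radius r covers an arc of ω·r·t in time t, so the same musical interval needs proportionally more space on an outer track. Here is a small Python sketch of that arithmetic. The radii and the rotation period are made-up illustrative values, not measurements of a real Regina disc.

```python
import math


def arc_length_mm(radius_mm: float, seconds: float, rotation_period_s: float) -> float:
    """Length of track at radius_mm that passes a fixed point in `seconds`."""
    angular_speed = 2 * math.pi / rotation_period_s  # radians per second
    return angular_speed * radius_mm * seconds


# Hypothetical disc: one revolution per minute, two notes half a second apart.
inner = arc_length_mm(radius_mm=50, seconds=0.5, rotation_period_s=60)
outer = arc_length_mm(radius_mm=150, seconds=0.5, rotation_period_s=60)
print(f"inner track spacing: {inner:.2f} mm")
print(f"outer track spacing: {outer:.2f} mm")
```

A track three times farther from the center needs three times the spacing for the same timing, and someone had to work that out by hand for every note.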

An Arduino Project with Jacob

One day, Jacob and I got going with an Arduino project. He wanted flashing blue lights for his “police station”, so we disassembled our previous Arduino project, put a few things on the breadboard, I wrote some code, and there we go. Then he noticed an LCD in my Arduino kit. I hadn’t ever gotten around to using it yet, and of course he wanted it immediately. So I looked up how to connect it, found an API reference, and dusted off my C skills (that was fun!) to program a scrolling message on it. Here is Jacob showing it off:

VID_20160614_074802.mp4

John's Blog: How git-annex replaces Dropbox + encfs with untrusted providers

git-annex has been around for a long time, but I just recently stumbled across some of the work Joey has been doing on it. This post isn’t about its traditional roots in git or all the features it has for partial copies of large data sets, but rather about its Dropbox-like live syncing capabilities. It takes a bit to wrap your head around, because git-annex is just a little different from everything else. It’s sort of like a different-colored smell.

The git-annex wiki has a lot of great information — both low-level reference and a high-level 10-minute screencast showing how easy it is to set up. I found I had to sort of piece together the architecture between those levels, so I’m writing this all down hoping it will benefit others that are curious.

If you just want to use it, you don’t need to know all this. But I like to understand how my tools work.

Overview

git-annex lets you set up a live syncing solution that requires no central provider at all, or can be used with a completely untrusted central provider. Depending on your usage pattern, this central provider could require only a few MBs of space even for repositories containing gigabytes or terabytes of data that is kept in sync.

Let’s take a look at the high-level architecture of the tool. Then I’ll illustrate how it works with some scenarios.

Three Layers

Fundamentally, git-annex takes layers that are all combined in Dropbox and separates them out. There is the storage layer, which stores the literal data bytes that you are interested in. git-annex indexes the data in storage by a hash. There is metadata, which is for things like a filename-to-hash mapping and revision history. And then there is an optional layer, which is live signaling used to drive the real-time syncing.

git-annex has several modes of operation, and the one that enables live syncing is called the git-annex assistant. It runs as a daemon, and is available for Linux/POSIX platforms, Windows, Mac, and Android. I’ll be covering it here.

The storage layer

The storage layer simply is blobs of data. These blobs are indexed by a hash, and can be optionally encrypted at rest at remote backends. git-annex has a large number of storage backends; some examples include rsync, a remote machine with git-annex on it that has ssh installed, WebDAV, S3, Amazon Glacier, removable USB drive, etc. There’s a huge list.
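To make "blobs indexed by a hash" concrete, here is a toy content-addressed store in Python. It is only a sketch of the idea: the class and method names are my own, and git-annex's real key format and on-disk layout are different.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: each blob is looked up by the hash of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its key; identical content always gets the same key."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Return the bytes stored under key (raises KeyError if absent)."""
        return self._blobs[key]

    def has(self, key: str) -> bool:
        return key in self._blobs

    def drop(self, key: str) -> None:
        """Remove the content; the key itself stays valid as an identifier."""
        self._blobs.pop(key, None)


store = BlobStore()
key = store.put(b"vacation photo bytes")
print(key[:16], store.has(key))
```

Because the key depends only on the content, the store needs no knowledge of file names; that is left to the metadata layer.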

One of the git-annex features is that each client knows the state of each storage repository, as well as the capability set of each storage repository. So let’s say you have a workstation at home and a laptop you take with you to work or the coffee shop. You’d like changes on one to be instantly recognized on another. With something like Dropbox or OwnCloud, every file in the set you want synchronized has to reside on a server in the cloud. With git-annex, it can be configured such that the server in the cloud only contains a copy of a file until every client has synced it up, at which point it gets removed. Think about it – that is often what you want anyhow, so why maintain an unnecessary copy after it’s synced everywhere? (This behavior is, of course, configurable.) git-annex can also avoid storing in the cloud entirely if the machines are able to reach each other directly at least some of the time.
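The "keep it in the cloud only until everyone has it" rule can be sketched in a few lines of Python. This is a simplified model, not git-annex's implementation: the function name, the repository names, and the shape of the `locations` mapping are all assumptions for illustration.

```python
def can_drop_from_transfer(key: str, locations: dict[str, set[str]], clients: set[str]) -> bool:
    """True once every client repository is recorded as holding `key`."""
    return clients <= locations.get(key, set())


clients = {"workstation", "laptop"}
locations = {"abc123": {"cloud", "workstation"}}

# The laptop has not synced yet, so the cloud copy must stay.
print(can_drop_from_transfer("abc123", locations, clients))

# After the laptop fetches the file, the cloud copy becomes redundant.
locations["abc123"].add("laptop")
print(can_drop_from_transfer("abc123", locations, clients))
```

The decision is only possible because every participant tracks which repositories hold which content.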

The metadata layer

Metadata about your files includes a mapping from the file names to the storage location (based on hashes), change history, and information about the status of each machine that participates in the syncing. On your clients, git-annex stores this using git. This detail is very useful to some, and irrelevant to others.
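As a rough mental model, the metadata boils down to a few small tables. The Python sketch below is an illustration only; the class, its fields, and the sample key are hypothetical, and the real thing lives in git rather than in dictionaries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metadata:
    files: dict[str, str] = field(default_factory=dict)            # file name -> content key
    locations: dict[str, set[str]] = field(default_factory=dict)   # content key -> repos holding it
    history: list[str] = field(default_factory=list)               # simple change log

    def add(self, filename: str, key: str, repo: str) -> None:
        """Record that `filename` refers to `key`, and that `repo` holds the content."""
        self.files[filename] = key
        self.locations.setdefault(key, set()).add(repo)
        self.history.append(f"add {filename}")

    def rename(self, old: str, new: str) -> None:
        """A rename touches only the name mapping; no content moves anywhere."""
        self.files[new] = self.files.pop(old)
        self.history.append(f"rename {old} -> {new}")


meta = Metadata()
meta.add("notes.txt", "key-1", "laptop")
meta.rename("notes.txt", "ideas.txt")
print(meta.files, meta.locations, meta.history)
```

This is why the metadata can be tiny even when the data is huge: it holds names, keys, and locations, never the file contents.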

Some of the git-annex storage backends can support only storage (S3, for instance). Some can support both storage and metadata (rsync, ssh, local drives, etc.) You can even configure a backend to support only metadata (more on why that may be useful in a bit). When you are working with a git-backed repository for git-annex, it can hold data, metadata, or both.
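One way to picture these capabilities is as two flags per backend. In the hypothetical Python sketch below (the names and the `can_sync` helper are mine, not part of git-annex), a set of remotes is only usable for syncing if at least one can carry data and at least one can carry metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remote:
    name: str
    stores_data: bool
    stores_metadata: bool


def can_sync(remotes: list[Remote]) -> bool:
    """A working setup needs some path for data and some path for metadata."""
    has_data = any(r.stores_data for r in remotes)
    has_metadata = any(r.stores_metadata for r in remotes)
    return has_data and has_metadata


s3 = Remote("s3-bucket", stores_data=True, stores_metadata=False)
shell = Remote("small-shell-account", stores_data=False, stores_metadata=True)

print(can_sync([s3]))         # data only: nothing carries the metadata
print(can_sync([s3, shell]))  # split setup: data in S3, metadata on the shell account
```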

So, to have a working sync system, you must have a way to transport both the data and the metadata. The transport for the metadata is generally rsync or git, but it can also be XMPP, in which case git changesets are basically wrapped up in XMPP presence messages. Joey says, however, that there are some known issues with XMPP servers sometimes dropping or reordering messages, so he doesn’t encourage that method currently.

The live signaling layer

So once you have your data and metadata, you can already do syncs via git annex sync --content. But the real killer feature here will be automatic detection of changes, both locally and on the remote. To do that, you need some way of live signaling. git-annex supports two methods.
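If you want to trigger a manual sync from a script rather than by hand, something like the following Python sketch would do. It assumes git-annex is installed and that `repo_dir` points at an already-initialized repository; the helper names are my own.

```python
import subprocess


def sync_command(with_content: bool = True) -> list[str]:
    """Build the argument list for a manual git-annex sync."""
    cmd = ["git", "annex", "sync"]
    if with_content:
        cmd.append("--content")  # also transfer file contents, not just metadata
    return cmd


def run_sync(repo_dir: str, with_content: bool = True) -> int:
    """Run the sync inside repo_dir and return the command's exit status."""
    return subprocess.run(sync_command(with_content), cwd=repo_dir).returncode


print(" ".join(sync_command()))
```

Calling `run_sync("/path/to/annex")` from cron would give you periodic syncing, though not the instant change detection described next.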

The first requires ssh access to a remote machine where git-annex is installed. In this mode of operation, when the git-annex assistant fires up, it opens up a persistent ssh connection to the remote and runs the git-annex-shell over there, which notifies it of changes to the git metadata repository. When a change is detected, a sync is initiated. This is considered ideal.

A substitute can be XMPP, and git-annex actually converts git commits into a form that can be sent over XMPP. As I mentioned above, there are some known reliability issues with this and it is not the recommended option.

Encryption

When it comes to encryption, you generally are concerned about all three layers. In an ideal scenario, the encryption and decryption happens entirely on the client side, so no service provider ever has any details about your data.

The live signaling layer is encrypted pretty trivially; the ssh sessions are, of course, encrypted and TLS support in XMPP is pervasive these days. However, this is not end-to-end encryption; those messages are decrypted by the service provider, so a service provider could theoretically spy on metadata, which may include change times and filenames, though not the contents of files themselves.

The data layer also can be encrypted very trivially. In the case of the “dumb” backends like S3, git-annex can use symmetric encryption or a gpg keypair and all that ever shows up on the server are arbitrarily-named buckets.

You can also use a gcrypt-based git repository. This can cover both data and metadata and, if the target also has git-annex installed, the live signaling layer. Using a gcrypt-based git repository for the metadata and live signaling is the only way to accomplish live syncing with 100% client-side encryption.

All of these methods are implemented in terms of gpg, and can support symmetric or public-key encryption.

It should be noted here that the current release versions of git-annex need a one-character patch in order to fix live syncing with a remote using gcrypt. For those of you running jessie, I recommend the version in jessie-backports, which is presently 5.20151208. For your convenience, I have compiled an amd64 binary that can drop in over /usr/bin/git-annex if you have this version. You can download it and a gpg signature for it. Note that you only need this binary on the clients; the server can use the version from jessie-backports without issue.

Putting the pieces together: some scenarios

Now that I’ve explained the layers, let’s look at how they fit together.

Scenario 1: Central server

In this scenario, you might have a workstation and a laptop that sync up with each other by way of a central server that also has a full copy of the data. This is the scenario that most closely resembles Dropbox, box, or OwnCloud.

Here you would basically follow the steps in the git-annex assistant screencast: install git-annex on a server somewhere, and point your clients to it. If you want full end-to-end encryption, I would recommend letting git-annex generate a gpg keypair for you, which you would then need to copy to both your laptop and workstation (but not the server).

Every change you make locally will be synced to the server, and then from the server to your other PC. All three systems would be configured in the “client” transfer group.

Scenario 1a: Central server without a full copy of the data

In this scenario, everything is configured the same except the central server is configured with the “transfer” transfer group. This means that the actual data synced to it is deleted after it has been propagated to all clients. Since git-annex can verify which repository has received a copy of which data, it can easily enough delete the actual file content from the central server after it has been copied to all the clients. Many people use something like Dropbox or OwnCloud as a multi-PC syncing solution anyhow, so once the files have been synced everywhere, it makes sense to remove them from the central server.

This is often a good deal for people. There are some obvious downsides that are sometimes relevant. For instance, to add a third sync client, it must be able to initially copy down from one of the existing clients. Or, if you intend to access the data from a device such as a cell phone where you don’t intend for it to have a copy of all data all the time, you won’t have as convenient a way to download your data.

Scenario 1b: Split data/metadata central servers

Imagine that you have a shell or rsync account on some remote system where you can run git-annex, but don’t have much storage space. Maybe you have a cheap VPS or shell account somewhere, but it’s just not big enough to hold your data.

The answer to this would be to use this shell or rsync account for the metadata, but put the data elsewhere. You could, for instance, store the data in Amazon S3 or Amazon Glacier. These backends aren’t capable of storing the git-annex metadata, so all you need is a shell or rsync account somewhere to sync up the metadata. (Or, as below, you might even combine a fully distributed approach with this.) Then you can have your encrypted data pushed up to S3 or some such service, which presumably will grow to whatever size you need.

Scenario 2: Fully distributed

Like git itself, git-annex does not actually need a central server at all. If your different clients can reach each other directly at least some of the time, that is good enough. Of course, a given client will not be able to do fully automatic live sync unless it can reach at least one other client, so changes may not propagate as quickly.

You can simply set this up by making ssh connections available between your clients. git-annex assistant can automatically generate appropriate ~/.ssh/authorized_keys entries for you.

Scenario 2a: Fully distributed with multiple disconnected branches

You can even have a graph of connections available. For instance, you might have a couple machines at home and a couple machines at work with no ability to have a direct connection between them (due to, say, firewalls). The two machines at home could sync with each other in real-time, as could the two machines at work. git-annex also supports things like USB drives as a transport mechanism, so you could throw a USB drive in your pocket each morning, pop it into one client at work, and poof – both clients are synced up over there. Repeat when you get home in the evening, and you’re synced there. The USB drive’s repository can, of course, be in the “transfer” group so data is automatically deleted from it once it’s been synced everywhere.

Scenario 3: Hybrid

git-annex can support LAN sync even if you have a central server. If your laptop, say, travels around but is sometimes on the same LAN as your PC, git-annex can easily sync directly between the two when they are reachable, saving a round-trip to the server. You can assign a cost to each remote, and git-annex will always try to sync first to the lowest-cost path that is available.
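The cost-based choice is easy to picture as "cheapest remote that is currently reachable". Here is a minimal Python sketch of that selection; the remote names and cost numbers are made up for illustration, and git-annex's own logic is more involved.

```python
from typing import Optional


def pick_remote(costs: dict[str, int], reachable: set[str]) -> Optional[str]:
    """Return the lowest-cost remote that is reachable right now, or None."""
    candidates = [name for name in costs if name in reachable]
    return min(candidates, key=lambda name: costs[name], default=None)


costs = {"desktop-on-lan": 100, "central-server": 200}

# At home, both are reachable, so the LAN peer wins.
print(pick_remote(costs, {"desktop-on-lan", "central-server"}))

# At the coffee shop, only the server is reachable.
print(pick_remote(costs, {"central-server"}))
```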

Drawbacks of git-annex

There are some scenarios where git-annex with the assistant won’t be as useful as one of the more traditional instant-sync systems.

The first and most obvious one is if you want to access the files without the git-annex client. For instance, many of the other tools let you generate a URL that you can email to people, and then they can download files without any special client software. This is not directly possible with git-annex. You could, of course, make something like a public_html directory be managed with git-annex, but it wouldn’t provide things like obfuscated URLs, password-protected sharing, time-limited sharing, etc. that you get with other systems. While you can share your repositories with others that have git-annex, you can’t share individual subdirectories; for a given repository, it is all or nothing.

The Android client for git-annex is a pretty interesting thing: it is mostly a small POSIX environment, providing a terminal, git, gpg, and the same web interface that you get on a standalone machine. This means that the git-annex Android client is fully functional compared to a desktop one. It also has a quick setup process for syncing off your photos/videos. On the other hand, the integration with the Android ecosystem is poor compared to most other tools.

Other git-annex features

git-annex has a lot to offer besides the git-annex assistant. Besides the things I’ve already mentioned, any given git-annex repository — including your client repository — can have a partial copy of the full content. Say, for instance, that you set up a git-annex repository for your music collection, which is quite large. You want some music on your netbook, but don’t have room for it all. You can tell git-annex to get or drop files from the netbook’s repository without deleting them remotely. git-annex has quite a few ways to automate and configure this, including making sure that at least a certain number of copies of a file exist in your git-annex ecosystem.
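The "at least a certain number of copies" rule amounts to a check before any drop. The Python sketch below is a simplified model with hypothetical names: dropping from one repository is allowed only if enough copies remain elsewhere.

```python
def safe_to_drop(key: str, repo: str, locations: dict[str, set[str]], numcopies: int) -> bool:
    """Allow dropping `key` from `repo` only if at least `numcopies` other copies exist."""
    others = locations.get(key, set()) - {repo}
    return len(others) >= numcopies


locations = {"album-key": {"netbook", "home-server"}}

# One other copy exists (home-server), so with numcopies=1 the netbook may drop it.
print(safe_to_drop("album-key", "netbook", locations, numcopies=1))

# Requiring two other copies blocks the drop.
print(safe_to_drop("album-key", "netbook", locations, numcopies=2))
```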

Conclusion

I initially started looking at git-annex due to the security issues with encfs, and the difficulty of setting up ecryptfs in this way. (I had been layering encfs atop OwnCloud.) git-annex certainly ticks the box for me security-wise, and obviously anything encrypted with encfs wasn’t going to be shared with others anyhow. I’ll be using git-annex more in the future, I’m sure.

John's Blog: That was satisfying

It’s been a while due to all sorts of other stuff going on. Nice to see this clogging my inbox again:

screenshot

It really is satisfying to close bugs!

Flickr Photos: IMG_20160525_160419

prairiecode posted a photo:

IMG_20160525_160419

Goodbye to our cabin.

Flickr Photos: IMG_20160525_104441

prairiecode posted a photo:

IMG_20160525_104441

The lake, with a view towards the airport

Flickr Photos: IMG_7660

prairiecode posted a photo:

IMG_7660

A Beth'el that Laura spotted

Flickr Photos: IMG_7657

prairiecode posted a photo:

IMG_7657

The Natural Bridge
