Posts from December 2017
There are topics we research once in a while to confirm that they're still bad ideas.
I jokingly recall that my dad tries cooking eggplant once every two years to see if he still dislikes it. The last time I was around, we both ate a single slice of a rather appealing looking Eggplant Parmesan, looked at each other, and threw the rest of the tray away.
Today, that mistake is git-submodules. It makes sense that we'd want to have code that can be used together but that is upgraded at different rates! And, in all honesty, there are plenty of solutions for this kind of problem that don't demand submodules. A typical ruby solution would be to build a gem and either publish it on rubygems, or point the Gemfile to a known git repo; I imagine most other packaging systems offer a similar solution.
Back at Demyst, we used a monorepo, and while I can't decide offhand if that was the right decision, it was skill-appropriate for our situation, and we made it work.
For a time we considered using submodules, but our CTO waved me off and so I didn't do more than some perfunctory research into them at the time.
Today, I thought I'd look at them again, and (as you do), I ended up reading git-scm.
(For context, I'm also knee-deep in writing the book at the moment.)
Looking at the description of submodules, there's … nothing at all worrying here. As one person on HackerNews described it, "so the problem with submodules is that they don't work if you don't use them right."
While writing the book recently, I wondered how to explain to readers how to get over the biggest hump in algebra. And it occurred to me: most problems with git are the same problems people face when learning algebra.
Consider this: algebra is a way of describing highly abstract situations using symbolic representations. We wish to accomplish some task, so we convert the relevant information to symbols, then manipulate those symbols until we recognize that we've achieved a result. We then translate the result back into the language of the problem.
Students who never grasp algebra tend to view it as a collection of recipes. They find a formula that seems to fit their situation, plug numbers in, manipulate by rote, and hope for the best.
I think we've all met developers who do this with git. (Some of us are these developers.) The trouble in both situations is that we don't know what we're trying to accomplish; the canonical "developer who doesn't understand git" treats commits as a secretary treats saving in Microsoft Word: something to be done periodically to ensure you don't lose work.
Their whole workflow is git add .; git commit -m "Update", and all that fits in their model of the world is "this is a way to save my progress and make it available to other developers."
Which can be fine, but it means that in a typical repo the value of using git at all can be lost in the mix1.
For math students and developers, the solutions are probably similar. Motivate the core abstractions, then practice applying them until a) they can see that the abstractions relate to their real-world problems and b) the actions they take mean things and aren't just arcane spells.
The syntax of algebra, and the syntax of git, can be mistaken for the goal itself, when in truth the syntax is only a means; the goal is abstract, and without a pre-existing understanding of what you're trying to accomplish, both skills will always seem like magic.
I for one am a huge fan of patch mode. Even when I'm editing files that span concerns, I'll use git add -p to triage the changes into multiple thematically related commits so that when it comes time to review the history of the codebase, I can remember why I did that particular stupid thing. ↩
One victory and a couple setbacks.
I've been thinking about moving to AWS for some time, which caused me to change my billing on Dreamhost to month-to-month. (Given that I did this probably two years ago, I'm not sure the economics have really worked out.)
Per my previous journal entry, I am considering moving to a static frontend/app backend. It's just and proper that I establish my motivations and desired outcomes for this configuration:
- The ostensible purpose of all this is to have my computing resources meet some standard for stability, reliability, responsiveness and usefulness.
- Stability and reliability are a function of delivering on DevOps skills, i.e. setting up automation and so forth.
- Responsiveness means no free heroku dynos, and no Wordpress slowness.
- Usefulness means again, no Wordpress slowness, and providing a lot of the features I sketched out in yesterday's brainstorming. Basically, the personal goal here is to have all this heterogeneous content available and browsable in a sensible way.
Now, the professional goals here are a bit orthogonal. This configuration should act as proof-of-concept for a number of infrastructure skills that I intend to profit from through work. So the idea of hacking together something that just gets the job done satisfies personal goals but doesn't satisfy the larger professional goals.
For the time being, journaling in sublime and pushing via scp will have to suffice, then.
Meanwhile, as much as I like the terms of DreamHost's happy hosting, there are some downsides. Although it looks like you can get away with installing a ~recent version of ruby and running whatever rails version you want, their knowledge base makes clear that they'd prefer you don't on shared hosting. I can move up to a VPS (which would allow me to gain some of the account controls I wanted—more on this later), at the cost of $15/month.
Here, I come to a fork in the road. A lot of the employers I've talked to are ~early stage enough that they get by running on heroku. While I'll admit heroku is a bit of a mystery to me (beyond their relatively painless ruby deploy process) I'm not sure that's the basket I want to put my eggs in.
Meanwhile, there are a couple other options. I could go with AWS because of its practical applications, or for instance DigitalOcean or Google Compute. I think I'll end up on AWS but it's probably worth considering all the alternatives.
If I go with AWS, the temptation is to proof-of-concept setting up overkill architecture to gain experience with as many technologies as I can. That's kind of the point, but I can foresee there being a lot of barriers to success there if I'm not careful. (Also, I don't much like Amazon, the company.)
Ultimately, which provider I choose doesn't much matter, as long as I don't rely on the crutches of one-click installs and GUIs to configure everything.
Finally, there's a lot of variability in terms of pricing, and I should really look into that a bit more before I commit to a particular infrastructure.
All this said, I have some concrete goals in mind. First, keep this site (ed: static site hosted with DreamHost) resolving as long as possible.
Second, set up an acceptable dev environment, and begin automating everything. Figure out setups/teardowns and immutable infrastructure, backups, etc. before committing significant data to it. This approach has the advantage of saving costs while the system isn't running fully.
Third, establish migration tools to move the data from middleman to the new platform.
Fourth, test everything and then close down shop and re-route DNS to the new server.
I alluded to a victory in the first paragraph. Well, it turns out that for a site like this one, rails offers pretty turn-key page caching that gets served directly from nginx or apache. Score.
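Since Rails 4, this feature has lived in the separate actionpack-page_caching gem rather than in Rails itself. The idea is that the first render of a page gets written to a file under public/ whose path mirrors the URL. On later requests, nginx or apache can hand out that file without waking the app.

What follows is a minimal plain-Ruby illustration of that path mapping, not the gem's code. The names cache_path_for and write_page_cache are hypothetical, and the gem's exact file layout may differ.

```ruby
require "fileutils"

# Hypothetical helper (not the gem's API): map a request path to the file
# a page cache would write under the public directory.
def cache_path_for(request_path, public_dir)
  path = request_path.split("?").first.to_s # ignore any query string
  path = "/index" if path.empty? || path == "/"
  path = path.chomp("/")                     # "/posts/" -> "/posts"
  path += ".html" unless File.extname(path) == ".html"
  File.join(public_dir, path)
end

# Write the rendered HTML once; later requests for the same URL can be
# served straight from disk by the web server.
def write_page_cache(request_path, html, public_dir)
  file = cache_path_for(request_path, public_dir)
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, html)
  file
end
```

With files laid out this way, an nginx try_files directive can look for the cached file before proxying to Rails. In practice, the fiddly part is expiring those files when a post changes.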
- Easy deployment
- Backups, in transferable format, of post content
- Total ownership of content
- Post by email
- Post by web
- Formatting for short posts, versus long form writing
- Easy configuration/additions without scp (e.g. git push/commit hooks)
- Low server overhead—it's a damned static site 99% of the time; it shouldn't require an m4.large
- Article permalinks regardless of underlying categorization/content changes
- Presentation of heterogeneous data
- A way to share more than just text (e.g. images)
- Maintain pagination without incurring excessive backend load
- Portfolio elements—running/presenting arbitrary code, separate from blog content/formatting (e.g. cssris)
- Easy import/export
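The pagination item is cheap to satisfy if listing pages are computed at build time instead of per request, so that each page becomes its own static file.

Here is a minimal Ruby sketch of that. It assumes a /page/N/index.html URL layout, which is an arbitrary choice and not something Middleman or Rails prescribes. The paginate helper is hypothetical.

```ruby
# Hypothetical build-time paginator: each listing page is rendered once and
# written to its own static file, so paging never touches a backend.
def paginate(posts, per_page)
  raise ArgumentError, "per_page must be positive" unless per_page.positive?

  slices = posts.each_slice(per_page).to_a
  slices = [[]] if slices.empty? # always emit a first page, even if empty
  total = slices.size

  slices.each_with_index.map do |slice, index|
    number = index + 1
    {
      number: number,
      path: number == 1 ? "/index.html" : "/page/#{number}/index.html",
      posts: slice,
      prev_page: number > 1 ? number - 1 : nil,
      next_page: number < total ? number + 1 : nil
    }
  end
end
```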
I've achieved the goal of having something I control, but there's a weakness here: I haven't found an easy workflow that lets me get short thoughts out quickly and painlessly. Yesterday I wanted to write about a half dozen topics that didn't merit a full journal entry, but didn't have an easy way to do it. I ended up just opening a new note on my phone and brain dumping, which isn't the worst thing ever.
I'm trying to avoid using a database, but a lot of the stuff I want to do is most easily solved with access to a database. A forum of this scale doesn't merit a backend, or at least strongly benefits from being statically generated. On my feature list, however, are a lot of things that become easier if I'm running a real server, or at least building the site off something that has a database available.
… Hmm, now that I think of it, that's not a bad solution. The ideal admin tooling suggests the need for a first class backend, but I can easily use that to generate a static site that gets the benefits of both worlds.
Spitballing here, there are a couple decent ways to get a simple, high speed personal website running in the hosting environment I have available.
- Run a tiny app for one user that has reasonable login protections. When I hit a button, dump some form of the content into a format that a static site generator understands, and have that generator deploy the content to my preferred host (could be S3, could be similar to the current configuration).
- Wire up a tiny blog app that caches aggressively when not signed in. I've seen some of this magic in a previous life; the nginx config used would serve assets directly from nginx instead of routing through rails, boosting app speed. If there's a clean way to have rails do that automatically, I might go that route.
- Use normal app-level caching and hope for the best. I don't much like this one because I would have to do a lot of testing ahead of time to ensure that I could serve hundreds or thousands of requests for the same page without hitting server bottlenecks.
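For the first option, the "dump" step can be as small as writing each post out as Markdown with YAML front matter, an input format that both Middleman and Jekyll read.

This is a minimal sketch, assuming a post record with title, date, tags and body fields. The Post struct and front_matter_document are hypothetical, not part of any framework.

```ruby
require "date"
require "yaml"

# Hypothetical post record; a real app would load this from its database.
Post = Struct.new(:title, :published_on, :tags, :body, keyword_init: true)

# Render one post as Markdown with a YAML front matter block, the format
# static site generators such as Middleman and Jekyll read as input.
def front_matter_document(post)
  meta = {
    "title" => post.title,
    "date" => post.published_on.iso8601, # stored as a plain string
    "tags" => post.tags
  }
  # YAML.dump already opens with "---\n"; add the closing delimiter.
  "#{YAML.dump(meta)}---\n\n#{post.body}\n"
end
```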
Taking a step in any of these directions will change the calculus I've been using so far. Notably, it's been easy (or, relatively easy) to bulk import/edit preexisting posts to work with this tech stack, but having a database at a distance will make it harder. Since I should be doing more original writing and fewer migrations as time goes on, the balance of burden will shift.
Looking back at my decisions up to this point, the following facts are true:
- I'm happy that I've kept data, configuration, and code separate so far; it will make any migration to a different platform easy. Many existing frameworks for producing static sites encourage too much commingling of these things.
- The mere existence of this blog has been a great boon, in the sense that I'm writing at all.
- The current form of this site is essentially an MVP: I've got posts on a screen that can be viewed on the internet. In that sense, it's a win so far. It looks like any path forward will demand more backend work and server orchestration, which was half the purpose of this exercise in the first place. I can't say, then, that I'm upset with this state of affairs.
I can't overstate how good having this bootstrapped site up and running has been for my mental well-being. I feel lately like I'm overflowing with ideas.
We came back from a brief trip to Orlando with over 700 new photos in tow. I dutifully uploaded them to Photos.app1, and realized that it's almost impossible to find anything.
In fairness, Apple does have some automated tooling to make looking for specific photos easier. Faces, events, and upload groups make finding a specific event and flipping through photos for projects or just recollection simple, but there are some flaws. It's not clear to me if they got rid of the feature to flip through all unnamed faces or if I just can't find it—either way, it reflects poorly on Apple2, and is just another marker of their trend toward entropy.
So I'm sitting at my computer, looking at over 16,000 photos, trying to make sense of them. What ends up working, and what I spent Sunday evening doing, is making a smart album that only selects files that don't belong to an album, then triaging. As I observe natural collections of files, a taxonomy emerges: some things are vacations or events (happening in a constrained time and place), some are of people (e.g. my wife, or the cat), and the remainder have conceptual boundaries, for the most part.
Beyond normal tourism, the way I use my camera is to capture moments of interest ("memories"), interesting locations ("explorations"), or to record information where writing would be too slow (e.g. snapping a photo of a serial number, recording damage to the apartment). Triaging gave me the opportunity to filter some obvious duds/outdated information (I don't need to know the model number of my refrigerator two apartments ago), and to realize that this specific taxonomy predates this current push.
Notably, when I used to upload photos to Facebook, the only way I could get a handle on them (and find them in the future) was to split them along similar axes.
I'm going to have some trouble making sure I'm not repeating myself here, but the broad purposes of organization are as follows:
- Finding trends in existing material
- Relocating specific material later on
- Reminding yourself what exists
- Extracting specific information (e.g. the date an event happened)
- Finding a general case of a specific collection
The overall goal, of course, is to do productive work, so we need to impose just enough order on the system to allow it to work for us.
Consider a search engine. You might search for some restaurant in general (general case), or a particular restaurant (specific information). You might want to look for where restaurants cluster so you can go to an area with a lot of options, where you can search in person (finding trends). You might want to recall the name of that little place you visited two years ago (relocating information), or you might want to find that area you used to go to all the time (reminding yourself what exists).
I've noticed that Google has become good at a subset of these and bad at the rest. Notably, Google is pretty good at finding general information, but if you're looking for something you know exists and the information is a bit stale (e.g. hasn't been reposted or updated in a couple years), or if the search term you're using is specific but close enough to a more common term that it auto-corrects to something else, you might struggle to find "your thing." I just experienced this situation when looking for a specific comic that used to appear on reddit all the time3.
This is where local caches come into play. I have layers and layers of data that I've been trying to keep organized; photos are just one. The app-centered model is a bit odd for some of the uses I have in mind. Case in point: sometimes (as mentioned above) the easiest way to quickly record something for later consumption/digestion is to take a screenshot of text. But this ends in a nightmare taxonomy, as follows.
- When I just take a screenshot, that ends up in my photo stream and can be triaged quickly (it's hard to mistake a page of text for a photo of a person, even at thumbnail level). I have both a phone and a tablet, so I have to keep track of two separate streams.
- Further, I have computers, and those screenshots go straight to desktop, so I have to deliberately aggregate them somewhere so that I can find what I'm looking for quickly.
- Sometimes I use the "highlight" function in my eReader4, and those notes are stored… in my eReader, I guess? There's probably some way to set up a workflow to move those notes somewhere useful, but keep in mind the ultimate goal is to have stuff available for later use, so too much overhead defeats the purpose5.
- Sometimes I copy relevant text to a note, or one of several note taking apps, and if the text becomes unsearchable some day (altogether too many pages go missing, either taken down, lost, or changed beyond recognition), then the source won't be available for contextualization.
- And, most pathologically of all, sometimes I hand-write notes.
So, sorting through fodder is a matter of paging through screenshots, photos in my library, text files, and scraps of paper, trying to find some specific thing.
I don't think there's a need to solve this case in particular, but it's worth highlighting what a struggle it can be, and that the struggle has relevance to real-world scenarios6. It's not worth solving fully, but I want to have at least a first-pass handle on it so that I have a chance of finding something that I at one time thought worth remembering.
Remember, the purpose of all this is to make things do work for you. If you spend so much time organizing that you never address any projects, you haven't won.
A final note, and justification for the "victory" tag: a bunch of files that I thought I'd misplaced, containing a lot of business ideas and so forth, were actually filed away on my NAS.
One day I'll need to talk about how hard Apple's app naming conventions have made googling for tech support… ↩
I checked some help docs; seems it was possibly an inadvertent removal in High Sierra. ↩
In fact, for reasons I can't quite determine, Google has gotten much worse at finding all comics, even when I would have sworn that the same search terms would yield the results I seek not even a year ago. I'm not sure if this is a result of a change in Google or the pages themselves. ↩
Marvin, an iOS app that allows you to make clippings of text without any of the asinine copyright hurdles that the Kindle inflicts on you. ↩
Spending a day figuring this workflow out when I only read at most a book a week, and can easily recall that the thing I saw was in a book, seems rather pointless. On the other hand… for the sake of just having fodder available to inspire writing, it might be in my interest. ↩
Off the top of my head, legal discovery. ↩
No matter how good my memory is, I forget things.
The things I forget tend to fit into two categories: that which I need an artifact to recall, and that which sounds foreign, even when I see evidence of it.
Dresden Codak addressed one face of this in a poignant comic, about future memories. The upshot of it, though, is that I think I have a good memory mostly because I've forgotten things I forgot… my bias is showing.
Some things I recall better than others, perhaps because I obsess over them (reading and re-reading stuff from my Facebook timeline or what have you), but there's clear evidence that without pictures, documents, objects, souvenirs, and so forth, my brain prunes out or otherwise makes inaccessible whole portions of my life.
For instance: I can recall, broadly, what I was doing in 2011 only by remembering where I lived at that time, and then thinking of the sorts of things I did in that place. But otherwise the entire year is a blur.
I don't think this problem is going to get less acute over time. Journaling is a way, then, to mitigate some of these effects.
Bruno laughed around the stem of his pipe. “Yes, make it work. Clever lad. Alas, I fear I'm not up to the task. These old chalkboards are getting white.”
“Chalkboards. Blackboards. Ah, what do you children know?” The cloud around him thickened with his huffing, and he waved it away. “In the tradition-heavy wilds of Catalonia, where I cut my first set of teeth, the last vestiges of the stone age lingered very nearly until the rise of the Queendom. A chalkboard was a slab of hard, dark slate onto which you would scribble with little cylinders of soft, white chalk. Really! We had one in every classroom, every kitchen. You'd erase the board with a rag, you see, and write in a new batch of lessons or chores or ingredients. But sometimes you'd misplace the rag, and you'd have to scribble around the margins of what you'd already written. If you let this go on long enough, eventually the board would get so white with scribbles that you couldn't read it anymore. And so we learned: too much knowledge is as bad as none at all. We forget how to forget.”
–Wil McCarthy, To Crush the Moon
Phase 1: Deleting/archiving presence elsewhere
A big part of this thrust to centralize … myself, for lack of a better word, is getting all the bits and pieces together in one place.
There have been a variety of platforms over the years. Some have died—I don't think I stored anything important on orkut—but on those that still exist, you can see the entropy.
For years I've been convinced of the value of holding onto your data in formats that will stand the test of time. Nevertheless, it's amazing to me how much entropy has struck in places where there has been platform continuity for over ten years. Embedded tags (nominally HTML, but obviously parsed somehow by the internal tooling) and other metadata have bit-rotted until everything is nearly unreadable.
After some effort, I have archives in local storage of almost everything I've ever posted online, and the old versions are "deleted"1 off those platforms.
With some exceptions. Facebook is a tough nut to crack; I needed to register as a developer to even begin to download my post graph entries, and after toying with it for a bit I'm not sure that I'm ready to go all the way down that rabbit-hole. It looks like I'd have to run a local server in order to scrape an effective copy of my data from them, since their "archive" tool is god-awful for someone who shared as many links as me.
Phase 2: Triage
A lot of the stuff I've written down can go back up in due time. A lot of it should never see the light of day.
This leads to a question of, what's the point of a blog? I know that a lot of what I write now will seem embarrassing by the standards of future me, but this process seems asymptotic. Meanwhile, angsty blog posts from when I was 15 contain … embarrassing turns of phrase, among other things, that don't shed light on who I am now.
Broadly speaking, I don't believe in deleting things. On reddit2, if I was blatantly wrong, and someone pointed it out, I'd always leave my posts up as a matter of principle, and to provide context for people who came by after.
But does that mean I have to put everything back up? I doubt it.
What's interesting about this place is that it is, by intent, a place to focus and refine my thoughts. The ideas I've gotten the most mileage out of are the ones I write down in a place I also read. So, some turn of phrase or tiny stub in a page of a notebook I constantly flip through worms its way into memory via what is essentially spaced repetition.
This suggests that the real goal is to find the kernels that reflect deeper truths… and then consolidate them into something wiki-like.
Further, blog entries outside my journals should be considered transient. This suggests another layer of metadata, for posts that don't hold interest because they've been superseded by something more refined or more correct. Not just for currently outdated posts, but for stuff that's fresh now that'll seem stupid in two or ten years.
Do I need to own up to everything I've ever said? That way lies madness, surely. But reading and consolidating is probably in the cards, which suggests something wiki-like is on the medium term road map.
Phase 3: Editing
For the stuff that's worth posting, editing is necessary. I may end up going through and doing this cleanup on everything, but the first-order changes involve more or less the following:
- Remove useless metadata (e.g. date_gmt, added by Wordpress)
- Clean up remaining metadata
- Change formatting to markdown
- Resolve obvious typos
- Add appropriate tags and categories
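The first item on that list is one of the few parts that automates cleanly.

Here is a minimal sketch, assuming each post is a text file that opens with a YAML front matter block delimited by --- lines. The names strip_metadata and USELESS_KEYS are hypothetical. Of the keys in USELESS_KEYS, only date_gmt comes from the list above.

```ruby
require "date"
require "yaml"

# Front-matter keys treated as noise. date_gmt comes from the list above;
# anything else you add here is your own judgment call.
USELESS_KEYS = %w[date_gmt].freeze

# A leading block delimited by "---" lines.
FRONT_MATTER = /\A---[ \t]*\n(.*?)\n---[ \t]*\n/m

# Return the file's text with the listed front-matter keys removed.
# Text without a front matter block is returned unchanged.
def strip_metadata(text, drop: USELESS_KEYS)
  match = FRONT_MATTER.match(text)
  return text unless match

  meta = YAML.safe_load(match[1], permitted_classes: [Date, Time]) || {}
  cleaned = meta.reject { |key, _value| drop.include?(key.to_s) }
  # Re-dumping normalizes quoting/timestamp formats of the surviving keys.
  "#{YAML.dump(cleaned)}---\n#{match.post_match}"
end
```

Because the surviving keys are re-serialized, their quoting and timestamp formatting may change. It's worth diffing the output before trusting it.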
The point of this blog, again, is organizing the data, so having effective cross-links is the first part of that. As mentioned before, I'll probably need some sort of "archival" tag, to indicate to readers that what they're reading is historical reference3.
This is a highly manual task. An automated tool might sound good at first blush, but the data sources are heterogeneous, and the formatting each one demanded differs so much that re-establishing the correct context in markdown syntax requires me to re-read and hand-tweak everything.
In theory I'm not against this, because I want to re-read and consolidate as many of my old thoughts as I can, but 1) it's a lot of content and 2) it's going to screw up my plan to get permalinks running. I don't think date+slug is an effective permalink schema for content where date might not be the most important aspect of what I'm doing, but without a database there aren't really any unique primary keys to work against. I'll have to chew on this for a while.
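To make the trade-off concrete, here is what a date+slug scheme amounts to, next to one possible database-free alternative. The alternative mints an opaque ID once and stores it in the post's own front matter, so the ID never depends on fields that get edited.

This is a sketch under those assumptions. The names slugify, date_slug_permalink, ensure_permalink_id and the permalink_id key are all hypothetical, not Middleman features.

```ruby
require "date"
require "securerandom"

# Lowercase ASCII words joined by hyphens: "Phase 1: Triage" -> "phase-1-triage".
def slugify(title)
  title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# Date+slug scheme: the URL is derived from two fields that editing can change.
def date_slug_permalink(title, date)
  format("/%04d/%02d/%02d/%s/", date.year, date.month, date.day, slugify(title))
end

# Hypothetical alternative: mint an opaque ID once, store it in the post's
# front matter hash, and reuse it forever after.
def ensure_permalink_id(meta)
  meta["permalink_id"] ||= SecureRandom.hex(4) # 8 hex characters
  meta
end

def id_permalink(meta)
  "/p/#{meta.fetch("permalink_id")}/"
end
```

A date+slug URL changes whenever the title or date is corrected during editing. The stored ID survives both, at the cost of URLs that say nothing about their content.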
What's the point of all this effort? Well, I have bits and pieces all over, and gathering them is part of a process to better define my identity and make my thinking more coherent.
Journaling is more of an "in the moment" process that reflects where I am at a given point in time. Organizing, as an umbrella, is about finding a kernel of myself and building on it logically.
Who knows if these platforms actually delete old content, though. ↩
… speaking of sites I wrote content for but never archived… However, so much of what I wrote on reddit only makes sense in the context of the thread it's in, so maybe it's okay that I don't have that. ↩
And perhaps add an obvious note, and cross-links to newer versions? ↩
When I graduated from AppAcademy, I hacked up a stand-in portfolio website using turn-key Wordpress provided by my hosting company and a slightly tweaked version of a theme I kind of liked.
Well, over time I realized I didn't much like Wordpress. The toolkit seemed robust, but so slow, to the point that I would get frustrated playing with settings. I can sort of tolerate that sort of thing, except for the following:
- It's a personal home, not something for work, so comfort and familiarity is supreme.
- Comfort, for me, is derived largely from things like responsiveness. If there is ever any visible lag in typing, I get subtly frustrated, and that frustration mounts.
- The plugins I most wanted to use were flaky at best, and I was not and am not inclined to learn PHP to get (for instance) an email-to-blog portal working.
- It was subtly messing up some raw entries when they got written to database, leading to subtle rendering artifacts that were hard to fix due to the slowness mentioned above.
- I realized the theme I had landed on was flaky in its own way, and I didn't much want to debug someone else's idea of what a good layout looks like.
All credit to the authors of WP, and free WP themes, but they're just not for me, not for this.
So I had been vowing to re-write my blog to something I would enjoy, and fiddled around with a couple things over the years. It never was a priority, so I tinkered back and forth between rails and static site generators, like Jekyll.
I'm going to elide a huge amount of history and research, but basically I realized a hand-written rails CMS was a total waste of time, and the half-measures—existing rails CMS apps and engines—were inappropriate for my goals. I made a fair shot at using Jekyll, too, but liquid is way too constraining1 for a personal site where I have total control over the build process.
Meanwhile, I'd gotten sick of Facebook et al.'s controls. I pay for hosting, I pay for registration, and I know how to manage my own content; I don't need to be someone else's revenue source. So I downloaded as much of my own content from other sites as possible and archived it locally, and I'm in the process of shutting down my social media presence. This is my home now.
This, then, is my blog. I have a backlog of topics and to-dos I want to address, so there should be a real burst of new topics as I have time to put things down. I'm not entirely sure what this is going to become, but it's lightweight (static site generated using Middleman), and it's mine.
As I understand it, liquid is intended for things like storefronts where you don't want the store owners having the ability to break things or to access a full interpreter via templates. I don't care, so it's not right for my needs. ↩