Website Circle Meeting - February 14, 2018
Attendees: chreekat, iko, JazzyEagle, mray, MSiep
- Share my (chreekat’s) priorities
- Talk about the “process” that wolftune is so excited about
- Website feature roadmap
- chreekat: I’ve been talking to Charles (an advisor who wolftune and I converse with on occasion), who prompted me to make a list and discuss important priorities
- most of my list is ops-related tasks: storing ssh and tls credentials (secrets), some way to deploy the secrets, and backups
- when I get into website features, I get distracted/unsure on what to talk about, so I’m going to start on secrets
- our bus factor is really bad. I believe you want a high bus factor, i.e. a lot of people would have to be run over by a bus before the project goes down
- re: secrets, I’d like something more fleshed out than a PGP-encrypted text file. I need to launch/provision that. the doc below lists the steps to make that happen
- tests - there are some holes in our architecture right now. one of the biggest holes, which we were just discussing in irc, is the stripe libs
- JazzyEagle, fr33domlover and mitchellsalad are working on it
- I will not be doing other feature development until the tests are in place
- after tests, we need web and email notifications. we need an independent component, worked on independently
- “work with designers and PMs to define work needed to fulfill business needs” is an ongoing process. I think that’s where I hand off to MSiep
- after adding notifications, it’s up to people building the product what features to implement
- another question I missed earlier is whether we’re going to iterate on the frontend dev process?
- we don’t need it to launch alpha/beta, it’s not a critical path (though it sounds like other people are going to work on it)
- re: new process, who knows if it’ll work in reality. one thing I feel good about though: wolftune feels like he has delegated, and I think this is what he needs
- this goes back to MSiep being the one driving the critical path. comments about what we covered so far?
- MSiep: I’m glad to hear about backups, if we’re going to roll this out to people, people should be able to trust our system
- JazzyEagle: I’m not happy that we’re like, starting over again, but I do see the positive sides of what you wanted to get in place
- I hope we don’t scrap things again, and that this momentum keeps going forward
- chreekat: I’m optimistic about the current direction. I’m also avoiding pushing back against having any process as it’s not my accountability
- MSiep, how do you feel about being the driver of the critical path?
- MSiep: other than my bandwidth fluctuations, I’m okay. I’ll be busy in a few weeks, but hope I can stay involved
- chreekat: it may help if you load up the plate, which will add a buffer
- I was hoping to talk about what the roadmap is, but I’m not in charge of this, so I have nothing to say. my agenda is done at this point.
- one idea I just remembered: maybe we could resume metrics at meetings, e.g. checklist of “did we do backups?”
- JazzyEagle: I have a question. I’m not sure where I can help. are there tasks that can be doled out to me, or should I wait?
- chreekat: waiting right now. are you familiar with processor pipelines? (the ability of doing things faster at different stages)
- currently the pipeline is empty. as MSiep loads up the pipeline, there will be more backend tasks to do
- another idea: 3-way wireframe with a 3-way team: visual designer, frontend and backend development
- basically a buddy system, if one person is doing something, the others can be aware
- particularly with open source projects, we can’t tell people to do things, we need to start with people being engaged before handing out tasks. 3-way ownership.
- mray: will there be more meetings of this style?
- chreekat: I’m open to having another meeting next week. we can start checking in on the pipeline and buddy system. I’ll send an email
Next Action: email from chreekat to broader circle re: critical path and personal priorities
My priorities, my plan
One thing that would raise our bus factor (a good thing) is better management of operational secrets. These are things like backups, logins, account passphrases, and certificates. The primary goal is to have a secure location to store secrets that can be accessed during disaster recovery. Secondarily, we want secrets stored somewhere that will permit automatic access by systems that have the proper credentials. The plan for implementation is as follows:
- Prototype an access mechanism — probably using a combination of Vault, PGP, and ssh keys.
- Implement the access mechanism
- Document the setup
- Implement a NixOps module that instantiates a server with the access mechanism
- Research non-AWS, non-US hosting options for an off-site server.
- Provision an off-site server and install the access mechanism on it
- Populate the secrets
- Distribute access to admins
- Schedule disaster recovery tests (system loss, personnel loss)
Once secret storage is in place, we’ll be able to breathe a collective sigh of relief. We will also be able to begin developing more automated deployment procedures. A ‘staging’ secrets server could be used to test deployments.
The next critical component for the Snowdrift system is safe, automatic backups. This will give us actual disaster recovery capability. Losing system data could significantly set back the project’s adoption and chances for success. Backups will reduce everyone’s stress levels by mitigating that eventuality.
Backups will be stored as secrets. Thus, access to backups will use the secrets system designed above. Backups will be automated by creating systemd timer units. Using systemd requires that the app server be upgraded, which it desperately needs anyway. Upgrading the server will allow us to flex the secrets system: all secrets needed to set up the server can be noted and stored. Ideally, the systemd module would have only append-only push access to the secret store. The plan:
- Implement systemd timer units for backups; test locally
- Implement a NixOps module that instantiates an app server that includes the backups timer-unit
- Provision a new app server and migrate to it
- Document the secrets needed for the app server
- Install a backups timer unit on the aux and discourse servers as well
- Schedule a disaster recovery test (database corruption)
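To make the append-only idea concrete, here is a minimal sketch of the kind of script a backups timer unit might run. It is written in Python purely for illustration. The function name `store_backup`, the file naming scheme, and the use of a local directory as the "store" are all assumptions, not part of the plan above. A real setup would enforce append-only access on the secret store itself rather than in the client.

```python
import datetime
import pathlib
import tempfile


def store_backup(store: pathlib.Path, payload: bytes,
                 now: datetime.datetime) -> pathlib.Path:
    """Write one backup under a timestamped name and refuse to overwrite.

    Hypothetical helper: `store` stands in for the secret store, and
    `payload` stands in for a database dump.
    """
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"backup-{now:%Y%m%dT%H%M%SZ}.dump"
    # Mode "xb" raises FileExistsError if the file already exists,
    # so a run can add a new backup but never replace an old one.
    with open(target, "xb") as handle:
        handle.write(payload)
    return target


def demo() -> list:
    """Store two backups in a temporary directory and list their names."""
    with tempfile.TemporaryDirectory() as tmp:
        store = pathlib.Path(tmp) / "backups"
        first = datetime.datetime(2018, 2, 14, 3, 0, 0)
        second = first + datetime.timedelta(days=1)
        store_backup(store, b"dump one", first)
        store_backup(store, b"dump two", second)
        return sorted(p.name for p in store.iterdir())


if __name__ == "__main__":
    print(demo())
```

Each run only ever creates a new file, which mirrors the goal that the app server can push backups but cannot alter earlier ones. The disaster recovery test in the last step would then exercise the read side with separate credentials.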
Now we have backups and secrets. The world is our oyster. As a last step before leaving ops aside for a while, I will update the system architecture diagram, which is currently a mobile phone photograph of a hand-drawn diagram. (It’s pretty sweet, tho.)
With the critical ops components in place, coding work can continue.
- Enable more website interaction tests by mocking out 3rd party components (email, Stripe).
- Add notification system (needs email mock)
- Hand off to MSiep
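The mocking step in the list above can be sketched as follows. This is an illustrative Python example, not the project's code. The names `Mailer`, `FakeMailer`, and `notify` are hypothetical, and the message format is made up. The point is only that code which sends email depends on a small interface, so tests can substitute a fake that records messages instead of sending them.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Mailer(Protocol):
    """Hypothetical interface that the notification code depends on."""

    def send(self, to: str, subject: str, body: str) -> None: ...


@dataclass
class FakeMailer:
    """Test double: records messages in memory instead of sending them."""

    sent: List[Tuple[str, str, str]] = field(default_factory=list)

    def send(self, to: str, subject: str, body: str) -> None:
        self.sent.append((to, subject, body))


def notify(mailer: Mailer, recipient: str, event: str) -> None:
    """Send one notification about `event` through whichever mailer is given."""
    mailer.send(recipient, f"Notification: {event}", f"Event: {event}")


if __name__ == "__main__":
    fake = FakeMailer()
    notify(fake, "user@example.org", "test event")
    print(fake.sent)
```

The same pattern would apply to the payment processor: the website code talks to an interface, production wires in the real client, and interaction tests wire in a fake.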