#ktistec 75 hashtags

I just released v2.0.0-10 of Ktistec. I expect this to be the last pre-release before v2.0.0.

As ironic as it sounds, the Fediverse doesn't feel very federated. ActivityPub, in particular, doesn't account for the real topology of the Fediverse—large groups of users clustered together on large server instances. (Or maybe it does, and this is a feature, not a bug!) Exchanges are largely actor to actor, and large servers create the illusion of "a Fediverse" by pooling their local actors' aggregate inbound and outbound activity.

The consequence of this is that running a single-user instance can feel lonely.

This release finally tries to address that. Hashtags and threads are the backbone of expressed interests and conversations in the Fediverse. Ktistec now lets you follow hashtags and threads, and will proactively (but gently) pull relevant content into your server. Most of the changes in the last year revolve around making this work well.

The rest of the changes are less visible:

  • Substantial reduction in build times and memory required to build.
  • Substantial reduction in database size (if you care to shrink it) and improved query performance.
  • Substantial reduction in the time it takes to run tests.
  • Tons and tons of refactoring.

You can see all of the changes here.

(So that it's clear, I have a massive amount of respect for anyone who builds software and gives it away for free. None of the decisions I've made with Ktistec should be taken as personal criticism of anyone else in this space!)

#ktistec

my current canary for build resource utilization is a low end cloud server.  when builds start to fail it's time to optimize.

more on the last round of build optimizations for ktistec, shortly.

#ktistec #optimization

The Cost of Small Methods

Ktistec uses a template engine for its views.

View templates are transformed into Crystal code that generates HTML when executed. As you'd expect, the template language allows you to use string interpolation syntax (e.g. #{expression}) for dynamic values.

To ensure expression is only evaluated once, and to limit the scope of the temporary variable holding the evaluated value of expression, I originally bound the value to the variable using Object#tap (commit 5e1bf19e). The generated code looked something like:

(expression).tap do |__value__|
   <template code that uses expression>
end

Non-captured blocks in Crystal are always inlined, so the code above should be equivalent to the following (sacrificing local scope):

__value__ = (expression)
<template code that uses expression>

Functionally, they are equivalent. But operationally, not so much! With Object#tap, the Ktistec executable is about 1% larger (36823603 bytes vs. 36526307 bytes) and builds take about 20% longer (23 seconds vs. 19 seconds, generally).
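For reference, the relative differences can be worked out directly from the numbers quoted above. This is only arithmetic on those figures, written in Python rather than Crystal for convenience; the variable and function names are made up for the illustration.

```python
# Figures quoted in the post: executable size in bytes, build time in seconds.
size_with_tap = 36_823_603
size_without_tap = 36_526_307
build_with_tap = 23
build_without_tap = 19


def percent_increase(new, old):
    """Relative increase of `new` over `old`, as a percentage."""
    return 100 * (new - old) / old


size_increase = percent_increase(size_with_tap, size_without_tap)
build_increase = percent_increase(build_with_tap, build_without_tap)

# Roughly 0.8% for the executable and 21% for the build time.
print(f"executable: +{size_increase:.2f}%  build: +{build_increase:.1f}%")
```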

In total, view templates represent about 6% of the Ktistec executable by size, so it doesn't surprise me that there's a measurable impact when I make changes to the template engine, but wow...! I can almost live with the size of the executable, but the build time...!

The cost has to be the method call.

What I'm looking for is something like let in Scheme. The following macro comes close, but doesn't limit scope quite the same way:

macro let(expr, &block)
  {{block.args.first}} = ({{expr}})
  {{block.body}}
end

I may have to live with the macro. I tried to implement let as a method with the annotation @[AlwaysInline], but there was no improvement over the original.

The template engine is a fork of Slang—which I've been evolving to be more Slim-compatible.

#ktistec #crystallang

I replaced five indexes* on the relationships table with two**, improved query performance in at least one case, and cut the size of the database down by 11.4% (98MB).

Lessons (finally) learned:

  1. You can have too many indexes. At best, this makes the database larger. In at least one case, however, this caused the query planner to pick a less effective index, which resulted in worse performance.
  2. Unless you understand the data well, it is hard to know what indexes you are going to need up front. For example, on the relationships table, it's clear in retrospect that an index on the to_iri column has better selectivity than an index on the from_iri column—and, in fact, no index is even necessary on the from_iri column. For reasons of symmetry, I created both when I created the table. I'll go so far as to say, don't even create indexes until/unless you can analyze actual data. (Aside: the SQLite3 function likelihood is an excellent way to hint about that data to the query planner.)
  3. Ordering results by the automatically assigned, monotonically increasing id primary key gives the same order as ordering by something like created_at (as long as rows are inserted in chronological order), so order by id and save yourself an index on created_at.
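Lessons 2 and 3 can be checked against SQLite's own query plans. The sketch below uses Python's standard-library sqlite3 module rather than Crystal, purely so that it is self-contained. The table is a cut-down, hypothetical version of relationships, the rows are synthetic, and the 0.9 probability passed to likelihood is an assumed value, not one taken from Ktistec.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relationships (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonically increasing
        type TEXT NOT NULL,
        from_iri TEXT NOT NULL,
        to_iri TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    CREATE INDEX idx_relationships_type ON relationships (type ASC);
    CREATE INDEX idx_relationships_to_iri ON relationships (to_iri ASC);
""")

# Synthetic rows, inserted in chronological order.
conn.executemany(
    "INSERT INTO relationships (type, from_iri, to_iri, created_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Follow",
         f"https://a.example/actors/{i}",
         f"https://b.example/actors/{i % 10}",
         f"2024-01-01T00:00:{i:02d}Z")
        for i in range(50)
    ],
)


def plan(sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


target = "https://b.example/actors/3"
by_id = "SELECT id FROM relationships WHERE to_iri = ? ORDER BY id DESC"
by_created_at = (
    "SELECT id FROM relationships WHERE to_iri = ? ORDER BY created_at DESC"
)

ids_by_id = conn.execute(by_id, (target,)).fetchall()
ids_by_created_at = conn.execute(by_created_at, (target,)).fetchall()
plan_by_id = plan(by_id, (target,))
plan_by_created_at = plan(by_created_at, (target,))

# likelihood() tells the planner that `type = ?` matches most rows, so it is
# a poor filter compared with `to_iri = ?`. The second argument must be a
# literal constant between 0.0 and 1.0.
hinted = (
    "SELECT id FROM relationships "
    "WHERE likelihood(type = ?, 0.9) AND to_iri = ? ORDER BY id DESC"
)
hinted_ids = conn.execute(hinted, ("Follow", target)).fetchall()

print("ORDER BY id:        ", plan_by_id)
print("ORDER BY created_at:", plan_by_created_at)
print("with likelihood():  ", plan(hinted, ("Follow", target)))
```

Both orderings return the same rows in the same order, but only the created_at version needs an extra sorting step (reported as a temporary B-tree), because every SQLite index on an ordinary table implicitly ends with the rowid. The hint changes the planner's estimates only; the result set is unchanged.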

#ktistec #sqlite #optimization

* The original five:

CREATE INDEX idx_relationships_type_from_iri_created_at
    ON relationships (type ASC, from_iri ASC, created_at DESC);
CREATE INDEX idx_relationships_from_iri_created_at_type
    ON relationships (from_iri ASC, created_at DESC, type ASC);
CREATE INDEX idx_relationships_type_to_iri
    ON relationships (type ASC, to_iri ASC);
CREATE INDEX idx_relationships_to_iri_type
    ON relationships (to_iri ASC, type ASC);
CREATE INDEX idx_relationships_type_id
    ON relationships (type ASC, id ASC);


** The final two:

CREATE INDEX idx_relationships_type
    ON relationships (type ASC);
CREATE INDEX idx_relationships_to_iri
    ON relationships (to_iri ASC);
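To get a feel for how much space the extra indexes cost, the following sketch builds the same synthetic table twice, once with the original five indexes and once with the final two, and compares the resulting file sizes. It is written in Python with the standard-library sqlite3 module rather than Crystal. The column types, row count, and data are assumptions made for the illustration, so the measured saving will not match the 11.4% seen on the real database.

```python
import os
import sqlite3
import tempfile

# Hypothetical, cut-down schema; the real table may differ.
SCHEMA = """
CREATE TABLE relationships (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    from_iri TEXT NOT NULL,
    to_iri TEXT NOT NULL,
    created_at TEXT NOT NULL
);
"""

ORIGINAL_FIVE = """
CREATE INDEX idx_relationships_type_from_iri_created_at
    ON relationships (type ASC, from_iri ASC, created_at DESC);
CREATE INDEX idx_relationships_from_iri_created_at_type
    ON relationships (from_iri ASC, created_at DESC, type ASC);
CREATE INDEX idx_relationships_type_to_iri
    ON relationships (type ASC, to_iri ASC);
CREATE INDEX idx_relationships_to_iri_type
    ON relationships (to_iri ASC, type ASC);
CREATE INDEX idx_relationships_type_id
    ON relationships (type ASC, id ASC);
"""

FINAL_TWO = """
CREATE INDEX idx_relationships_type
    ON relationships (type ASC);
CREATE INDEX idx_relationships_to_iri
    ON relationships (to_iri ASC);
"""


def build(path, index_sql, n_rows=5000):
    """Create a database at `path`, fill it, and return its size in bytes."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA + index_sql)
    conn.executemany(
        "INSERT INTO relationships (type, from_iri, to_iri, created_at) "
        "VALUES (?, ?, ?, ?)",
        (
            ("Follow" if i % 2 else "Notification",
             f"https://a.example/actors/{i}",
             f"https://b.example/actors/{i % 100}",
             f"2024-01-01T00:00:00.{i:06d}Z")
            for i in range(n_rows)
        ),
    )
    conn.commit()
    conn.close()
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    size_five = build(os.path.join(tmp, "five.db"), ORIGINAL_FIVE)
    size_two = build(os.path.join(tmp, "two.db"), FINAL_TWO)

saving = 100 * (size_five - size_two) / size_five
print(f"five indexes: {size_five} bytes, two indexes: {size_two} bytes "
      f"({saving:.1f}% smaller)")
```

On an existing database, the space is only returned to the file system after dropping the indexes and running VACUUM.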

pushing a boatload of small improvements and fixes to main that i've been running myself for the last couple weeks... there are many ways a request to another activity pub server can fail—ktistec does a much better job of logging those failures, among other things...

#ktistec

i added code to log slow queries in ktistec and it's already paying dividends. most are obviously missing indexes and it's great to fix them, but the latest example—which is missing an index—is querying a table that only has one row (in my single user instance). should that table need an index on that column? i mean, just return that row...

fwiw, a slow query is currently anything that takes longer than 50msec. i wonder if that is tight enough...?
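A minimal sketch of slow-query logging, in Python with the standard-library sqlite3 module rather than Crystal. This is not how Ktistec implements it: the function name, the accounts table, and the log format are all hypothetical, and only the 50 msec threshold comes from the post.

```python
import sqlite3
import time

SLOW_QUERY_THRESHOLD_MS = 50.0  # the threshold mentioned in the post


def timed_query(conn, sql, params=(), threshold_ms=SLOW_QUERY_THRESHOLD_MS,
                log=None):
    """Run a query; record it in `log` if it took longer than threshold_ms."""
    started = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    if log is not None and elapsed_ms > threshold_ms:
        log.append((round(elapsed_ms, 3), sql))
    return rows


# Hypothetical one-row table, like the single-user case described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO accounts (username) VALUES ('example')")

query = "SELECT id FROM accounts WHERE username = ?"

# With the 50 msec threshold, this query is normally too fast to be logged.
slow_log = []
result = timed_query(conn, query, ("example",), log=slow_log)

# A negative threshold flags every query, which is handy for checking
# that the logging path itself works.
everything = []
timed_query(conn, query, ("example",), threshold_ms=-1.0, log=everything)

print(result, slow_log, everything)
```

Logging the SQL text together with the elapsed time makes it easy to feed each entry to EXPLAIN QUERY PLAN afterwards and see whether an index is actually missing.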

#ktistec

one thing ktistec related that i haven't had the time for is working on build and deployment tools. there are a bunch of outstanding requests—and a few PRs—for docker builds, packaged deployments for various hosting environments, etc.

if you're interested in contributing, let me know. you only have to agree to maintain them—i won't be able to.

#ktistec

epiktistes memory metrics

well, i've run epiktistes long enough without restarting to have some confidence about memory performance. while there is an extended period of growing memory usage, the server does settle down after about 15 days.

#ktistec

i've been following hashtags for a while now. i turn off shares (boosts) and replies so they don't appear in my timeline (there's too much sharing going on out there), but then follow a handful of hashtags (like #woodworking and #crystallang and #boardgame) to see more of what i like!

#ktistec

Epiktistes Memory Statistics

I've been tracking epiktistes inbox/outbox traffic and memory statistics (as reported by the Boehm garbage collector) for a while. There's always a consistent increase in both heap size and free memory—to the point where reported free memory is greater than the originally allocated heap—though the difference between the two doesn't appear constant over time. At the moment, heap seems to have plateaued but (pessimistically) I don't expect it to remain flat.

Given relatively flat traffic, the growth in free memory is surprising. I haven't had a chance to investigate, but based on what I understand about Boehm the increase in free memory could be due to increased fragmentation, and the growing difference between heap and free memory could be due to the conservative nature of the garbage collection algorithm.

Or there could be legitimate memory leaks. I did find one, months ago, which was the result of caching SQL prepared statements (in a Hash) and poor practice constructing queries in a couple places, which led to nearly linear growth in cached statements. The difference then was that heap growth was much more consistent, which is not what I see here, now.
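One way a statement cache can grow like that is when values are baked into the SQL text instead of being passed as parameters, so that every distinct value produces a distinct cache key. The post does not say exactly how the queries were constructed, so treat this as an assumed scenario. The sketch is in Python rather than Crystal, and StatementCache and the objects table are hypothetical stand-ins.

```python
import sqlite3


class StatementCache:
    """Toy stand-in for a cache of prepared statements, keyed by SQL text."""

    def __init__(self, conn):
        self.conn = conn
        self.statements = {}

    def query(self, sql, params=()):
        # A real cache would store a prepared statement here; the SQL text
        # stands in for it, since only the number of keys matters.
        self.statements.setdefault(sql, sql)
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, iri TEXT)")
conn.executemany(
    "INSERT INTO objects (iri) VALUES (?)",
    [(f"https://example.com/objects/{i}",) for i in range(100)],
)

# Value interpolated into the SQL text: one cache entry per distinct id.
leaky = StatementCache(conn)
for i in range(1, 101):
    leaky.query(f"SELECT iri FROM objects WHERE id = {i}")

# Value passed as a parameter: a single cache entry, however many ids.
bounded = StatementCache(conn)
for i in range(1, 101):
    bounded.query("SELECT iri FROM objects WHERE id = ?", (i,))

print(len(leaky.statements), len(bounded.statements))
```

With interpolation the cache grows linearly with the number of distinct values queried, which matches the nearly linear growth described above. With placeholders it stays at one entry per query shape.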

#ktistec #crystallang