#ktistec 198 hashtags

Todd Sundsted
Release v3.6.0 of Ktistec

It is said that there are only two hard things in computer science: cache invalidation and naming things. The story goes: you have something that is expensive to compute, so you compute it once and then you cache it and use the cached value in the future. But the inputs to that computation change, and so the cached value grows stale. You have to decide when and how to recompute that value.

In Ktistec, presenting accurate tag counts is expensive because not every tagged post counts. Posts are deleted, actors are blocked. My own drafts don't count, but when they're published they do. A post tagged with the same hashtag more than once must count as one. And tag cardinality is not uniform: #3dprinting has hundreds of thousands of posts, while other tags have one or two. Even with indexes, there is no single query that counts all cases in an acceptable amount of time.

So I reached for a cache: I counted once and then cached the count. Because I didn't want to maintain adjustments from every place in the code that changed something that touched the count, I settled for eventual consistency and recomputed counts after every server restart.

As it turns out, that's not good enough. On a server with reasonable traffic, an event that affects some tag's count happens every few hours. Days or weeks later there is significant drift. Worse, the implementation didn't recompute on first read; it recomputed on first write (when a new tagged object arrived).

This release fixes all that. Counts are still eventually consistent, but a regular background task now recomputes all of them, so they really are eventually consistent. Care was taken in constructing the query to keep database (read) locking to ~100-200msec.
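A minimal sketch of the counting rules described above, written in Python with SQLite purely for illustration. The schema, table names, and column names are all assumptions and are not Ktistec's actual schema. It uses a single straightforward query and does not attempt the lock-time tuning mentioned in the post.

```python
import sqlite3

# Hypothetical schema for illustration only; Ktistec's real tables differ.
SCHEMA = """
CREATE TABLE actors (id INTEGER PRIMARY KEY, blocked INTEGER NOT NULL DEFAULT 0);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    actor_id INTEGER NOT NULL REFERENCES actors(id),
    published INTEGER NOT NULL DEFAULT 1,  -- 0 while the post is a draft
    deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tags (post_id INTEGER NOT NULL REFERENCES posts(id), name TEXT NOT NULL);
CREATE TABLE tag_counts (name TEXT PRIMARY KEY, count INTEGER NOT NULL);
"""

# COUNT(DISTINCT ...) makes a post tagged twice with the same hashtag count once.
COUNT_QUERY = """
SELECT t.name, COUNT(DISTINCT t.post_id)
  FROM tags AS t
  JOIN posts AS p ON p.id = t.post_id
  JOIN actors AS a ON a.id = p.actor_id
 WHERE p.deleted = 0 AND p.published = 1 AND a.blocked = 0
 GROUP BY t.name
"""


def reconcile_tag_counts(db):
    """Recompute every cached tag count from the source tables."""
    # Read phase: compute fresh counts without touching the cache table.
    fresh = dict(db.execute(COUNT_QUERY).fetchall())
    # Write phase: replace the cached counts in one transaction.
    with db:
        db.execute("DELETE FROM tag_counts")
        db.executemany(
            "INSERT INTO tag_counts (name, count) VALUES (?, ?)", fresh.items()
        )
    return fresh
```

A background task would call `reconcile_tag_counts` on a timer. Between runs the cached counts can drift, which is the eventual consistency the post describes.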

Is it better? Yes! Is it perfect? Probably not. Cache invalidation is hard.

Here's the full changelog for this release:

Added

  • Background task to reconcile tag statistics.

Fixed

  • Prevent model hook callbacks from interleaving.
  • Add spacing between content and the sticky footer.

Changed

  • Replace Semantic UI with Fomantic UI.
  • Cache the PURL and GoToSocial JSON-LD contexts.
  • Reduce database lock time when reconciling tags.
  • Block npm dependency install scripts.

Removed

  • The unused idx_relationships_type database index.

In the next release, I'm going to fix a few bugs in the Mastodon-compatible API. These require an internal redesign, so I've held off until a few other things were out of the way. And I'm turning my attention to reading and better tools for surfacing and finding interesting content.

#ktistec #crystallang #activitypub #fediverse

Todd Sundsted

I need to use a prefix to namespace status IDs vs. boost IDs in the #ktistec Mastodon-compatible API. In Mastodon, a boost is just a status and they share the same ID namespace. Ktistec predates its Mastodon-compatible API, so statuses and boosts are maintained in different tables. I wanted to use an emoji (✍️ vs. 📣) to distinguish them, but that breaks too many clients.

Shame...
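A minimal sketch of prefix-based ID namespacing, in Python for illustration. The prefixes `s` and `b` and the function names are hypothetical; the post does not say which prefixes Ktistec uses.

```python
# Hypothetical prefixes; the post does not say which ones Ktistec uses.
PREFIXES = {"status": "s", "boost": "b"}
KINDS = {prefix: kind for kind, prefix in PREFIXES.items()}


def encode_id(kind, row_id):
    """Build an API-facing ID from the source table kind and the row ID."""
    return f"{PREFIXES[kind]}{row_id}"


def decode_id(api_id):
    """Split an API-facing ID back into (kind, row ID)."""
    prefix, digits = api_id[:1], api_id[1:]
    if prefix not in KINDS or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"unrecognized ID: {api_id!r}")
    return KINDS[prefix], int(digits)
```

With this scheme, row 7 in the statuses table and row 7 in the boosts table map to different API IDs, so a client can treat both as plain statuses.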

Todd Sundsted
Release v3.5.0 of Ktistec

I really enjoy optimization. Release v3.5.0 of Ktistec doesn't drop significant new features, but it does deliver a ~15% smaller executable and significantly faster queries on anonymous endpoints. The two are intertwined.

The size reduction comes from replacing a poorly designed, custom rules engine with a materialized view layer that uses SQL to define membership in a collection. The rules engine worked well enough but required a lot of supporting code to present rules as a DSL (domain-specific language) over the domain objects in Ktistec. The driving realization was that SQL is a DSL, membership in a collection is just a query, and domain objects are just rows. Voilà!

Query performance improvements came from using this new view layer to materialize two very popular but expensive-to-query views: the instance's public timeline and public hashtag pages. Because both are public pages they receive more traffic than internal pages.

The problem with the original queries was that performance was not uniform. Querying for posts with popular tags was okay. Querying for posts with sparse tags was very slow. I could have added more indexes, but that's its own cost. After the change, endpoints all respond in a consistent ~10msec timeframe and the CPU barely registers when a crawler hits. (I don't want to make things easier for bots, but I don't want to pay a tax for their activity either. Ask me about my new nginx configuration.)
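A minimal sketch of the idea that membership in a collection is just a query, in Python with SQLite for illustration. The schema and names are assumptions. The sketch rebuilds the whole view on each refresh, which is a simplification; the post does not describe how Ktistec keeps its views up to date.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    visible INTEGER NOT NULL,
    published_at INTEGER NOT NULL
);
CREATE TABLE public_timeline (
    post_id INTEGER PRIMARY KEY,
    published_at INTEGER NOT NULL
);
CREATE INDEX idx_public_timeline
    ON public_timeline (published_at DESC, post_id DESC);
"""

# The membership rule is plain SQL: which rows belong to the collection.
MEMBERSHIP = "SELECT id, published_at FROM posts WHERE visible = 1"


def refresh_public_timeline(db):
    """Materialize the membership query into its own table."""
    with db:
        db.execute("DELETE FROM public_timeline")
        db.execute(
            f"INSERT INTO public_timeline (post_id, published_at) {MEMBERSHIP}"
        )


def read_public_timeline(db, limit=20):
    """Read the newest members from the small, pre-filtered table."""
    rows = db.execute(
        "SELECT post_id FROM public_timeline "
        "ORDER BY published_at DESC, post_id DESC LIMIT ?",
        (limit,),
    )
    return [row[0] for row in rows]
```

Reads touch only the materialized table, so their cost no longer depends on how selective the membership rule is against the full posts table.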

Here is the full changelog:

Added

  • Lightweight probe endpoint for authenticated sessions.
  • max-id and min-id pagination links on web pages.

Fixed

  • Correct the notifications collection's JSON representation.
  • Accept both single-value and array forms of JSON-LD properties.
  • Handle variation in schema.org property mapping.

Changed

  • Faster timeline, public, hashtag, and notification collections.
  • Adjust the layout of actor profile properties.

Removed

  • The school dependency; replaced by activity processors and materialized views.
  • The openssl_ext dependency; vendored in.

There are still a few slow queries. In the next release I'm going to see if I can get everything under 10msec, and maybe release a new feature, too. 🚀

#ktistec #crystallang #activitypub #fediverse

Todd Sundsted

I just finished working on improvements to #ktistec that cut about 15% off the built executable size, and speed up some of the more common public queries by 2x to 5x (they were already fast, so this is headroom).

It does this by replacing a poorly designed, feature-poor, custom rules engine with a materialized view layer that uses SQL as its DSL (domain-specific language).

I am about to smoke test it on my own site. If it's not available, well, you know why! 😀

Todd Sundsted
Release v3.4.1 of Ktistec

This release fixes a small number of bugs found in recent releases.

The full changelog:

Fixed

  • Prevent runaway recursion when handling filtered posts.
  • Ensure profile header and header_static images are always present.
  • Render the inline replies collection for local objects.
  • Exclude blocked actors from object statistics and notifications.

Changed

  • Return 410 Gone instead of 404 Not Found for missing actors.

Removed

  • Tag counts on public pages.

This release fixes a hard-to-exploit but potentially server-crashing bug. If you're running v3.3.9 or v3.4.0, you should upgrade.

#ktistec #crystallang #activitypub #fediverse

Todd Sundsted
Release v3.4.0 of Ktistec

The biggest change in release v3.4.0 of Ktistec is cursor-based pagination for all web-navigable collections (timeline, notifications, etc.). Offset-based pagination will be removed completely in the next release.

Offset-based (e.g. page/size) pagination works well on collections that don't change. But what does "the second page" contain in a dynamic timeline? Support for cursor-based pagination is required by the Mastodon-compatible API, and it has been a desirable feature for quite a while in its own right.
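A minimal sketch of the difference, in Python with SQLite for illustration. The `posts` table and the function names are assumptions; `max_id` follows the Mastodon convention of returning items with IDs lower than the cursor.

```python
import sqlite3

# Hypothetical single-column table for illustration only.
SCHEMA = "CREATE TABLE posts (id INTEGER PRIMARY KEY);"


def page_by_offset(db, page, size):
    """Page N is defined by position, so it shifts when new rows arrive."""
    rows = db.execute(
        "SELECT id FROM posts ORDER BY id DESC LIMIT ? OFFSET ?",
        (size, (page - 1) * size),
    )
    return [row[0] for row in rows]


def page_by_cursor(db, size, max_id=None):
    """The next page is defined relative to the last item already seen."""
    if max_id is None:
        rows = db.execute(
            "SELECT id FROM posts ORDER BY id DESC LIMIT ?", (size,)
        )
    else:
        rows = db.execute(
            "SELECT id FROM posts WHERE id < ? ORDER BY id DESC LIMIT ?",
            (max_id, size),
        )
    return [row[0] for row in rows]
```

If a new post arrives after the first page is fetched, the offset-based second page repeats the last item of the first page, while the cursor-based second page continues exactly where the reader left off.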

While updating queries to paginate by cursor, I also made performance improvements to the queries themselves, as mentioned elsewhere. Scrapers and bots have already adapted, sort of. I now see odd hybrid requests in the log like /tags/xyz?page=7&min_id=123. Overall CPU usage under normal load is now sitting at 0-1%.

Here is the full changelog for the release:

Added

  • Cursor-based pagination for web-navigable collections. (fixes #122)
  • Mastodon-compatible API: /api/v1/timelines/tag/:hashtag endpoint.

Fixed

  • Negative replies count when viewing a post that is also a reply.
  • Order cached actors' posts by published rather than id.

Changed

  • Report 401 and 403 as distinct errors in Ktistec::Network.get.

Removed

  • Unused paginated query methods.

Enjoy!

#ktistec #crystallang #activitypub #fediverse

Todd Sundsted

while replacing page/size pagination with cursor-based pagination throughout ktistec, i took the opportunity to optimize queries. various changes, like leveraging the natural sort order of existing indexes, improved performance across the board by about 10x. that number is a little bit misleading: queries that took ~10msec now take less than 1msec, but that isn't much in absolute terms. still, it moves the bottleneck!

#ktistec

Todd Sundsted
Release v3.3.9 of Ktistec

Release v3.3.9 of Ktistec continues the security hardening work from recent releases, with further progress on the Mastodon-compatible API.

Of note: all network connections now go through a new Ktistec::Network module. This allows Ktistec to limit the size of HTTP bodies it reads, on both inbound and outbound requests, and ensures it only opens connections to valid remote IP addresses.
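A minimal sketch of the two checks described above, in Python for illustration. The function names and the 1 MiB limit are hypothetical, and treating "valid" as "globally routable" is an assumption; the post does not give Ktistec's exact rules.

```python
import io
import ipaddress

MAX_BODY_BYTES = 1024 * 1024  # hypothetical limit, not Ktistec's actual value


def is_allowed_address(address):
    """Accept only globally routable addresses (assumed meaning of "valid").

    Rejects loopback, private, and link-local addresses. The check should be
    applied to the address the connection actually uses, not to a separate
    DNS lookup, or a changed DNS answer could slip past it.
    """
    return ipaddress.ip_address(address).is_global


def read_limited(stream, limit=MAX_BODY_BYTES):
    """Read a body from a file-like object, refusing anything over the limit."""
    # Ask for one byte more than allowed so an oversized body is detectable.
    body = stream.read(limit + 1)
    if len(body) > limit:
        raise ValueError(f"body exceeds {limit} bytes")
    return body
```

For example, `is_allowed_address("127.0.0.1")` is false, and `read_limited(io.BytesIO(b"abcd"), limit=3)` raises `ValueError`. A real network stream may return data in several short reads, so production code would loop until the limit or end of stream is reached.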

Here's the full changelog:

Added

  • New Mastodon-compatible APIs.

Fixed

  • Close DNS rebinding window for outbound HTTP requests.
  • Limit the size of HTTP bodies the server reads.
  • Sanitize RSS feed output to prevent CDATA breakout.
  • Destroy all sessions and access tokens on account termination.

Changed

  • Ensure all GET and POST requests utilize Ktistec::Network.
  • Process local recipients in-process in inbox/outbox activity processors.

As always, it's worth upgrading for the security fixes!

#ktistec #crystallang #activitypub #fediverse

Todd Sundsted

I don't have a large number of followers, but a recent reply to a relatively short thread (< 10 total posts) resulted in 247 HTTP GETs in the first 100 seconds after the post. Only 29 of those were requests for the object. 218 were requests for the object's replies, which surprised me. Why do servers poll for replies within the first 100 seconds? Mean response time was 481μs, well under 1ms. Peak throughput hit 20 req/s.

#ktistec