#ktistec 198 hashtags

It is said that there are only two hard things in computer science: cache invalidation and naming things. The story goes: you have something that is expensive to compute, so you compute it once and then you cache it and use the cached value in the future. But the inputs to that computation change, and so the cached value grows stale. You have to decide when and how to recompute that value.
In Ktistec, presenting accurate tag counts is expensive because not every tagged post counts. Posts are deleted, actors are blocked. My own drafts don't count, but when they're published they do. A post tagged with the same hashtag more than once must count as one. And tag cardinality is not uniform: #3dprinting has hundreds of thousands of posts, while other tags have one or two. Even with indexes, there is no single query that counts all cases in an acceptable amount of time.
So I reached for a cache, counted once and then cached the count. Because I didn't want to maintain adjustments from every place in the code that changed something that touched the count, I settled for eventual consistency and recomputed counts after every server restart.
As it turns out, that's not good enough. On a server with reasonable traffic, an event that affects some tag's count happens every few hours. Days or weeks later there is significant drift. Worse, the implementation didn't recompute on first read; it recomputed on first write (when a new tagged object arrived).
This release fixes all that. Counts are still eventually consistent, but all counts are recomputed in a regular background task, so they really are eventually consistent, and care was taken in constructing the query to minimize database (read) locking to ~100-200msec.
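A minimal sketch of the recount idea, in Python with SQLite. The schema, table names, and the `recompute_tag_counts` function are hypothetical and exist only for illustration; they are not Ktistec's actual tables or code, and the sketch makes no attempt to reproduce the locking behavior described above. It shows the counting rules from the text: deleted posts, drafts, and posts by blocked actors are excluded, and a post tagged twice with the same hashtag counts once.

```python
import sqlite3

# Hypothetical schema for illustration; not Ktistec's real tables.
SCHEMA = """
CREATE TABLE actors (id INTEGER PRIMARY KEY, blocked INTEGER DEFAULT 0);
CREATE TABLE objects (
    id INTEGER PRIMARY KEY,
    actor_id INTEGER,
    published INTEGER DEFAULT 1,  -- 0 means the post is still a draft
    deleted INTEGER DEFAULT 0
);
CREATE TABLE tags (object_id INTEGER, name TEXT);
CREATE TABLE tag_counts (name TEXT PRIMARY KEY, count INTEGER NOT NULL);
"""

# COUNT(DISTINCT ...) makes a post tagged twice with one hashtag count once.
RECOUNT = """
INSERT INTO tag_counts (name, count)
SELECT t.name, COUNT(DISTINCT t.object_id)
FROM tags t
JOIN objects o ON o.id = t.object_id
JOIN actors a ON a.id = o.actor_id
WHERE o.deleted = 0 AND o.published = 1 AND a.blocked = 0
GROUP BY t.name
"""


def recompute_tag_counts(conn):
    """Rebuild every cached count in one transaction."""
    with conn:
        # Clearing first means tags whose count fell to zero disappear.
        conn.execute("DELETE FROM tag_counts")
        conn.execute(RECOUNT)


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO actors VALUES (?, ?)", [(1, 0), (2, 1)])
    conn.executemany(
        "INSERT INTO objects (id, actor_id, published, deleted) VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 0), (2, 1, 1, 0), (3, 1, 0, 0), (4, 1, 1, 1), (5, 2, 1, 0)],
    )
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [(1, "crystal"), (2, "crystal"), (2, "crystal"), (3, "crystal"),
         (4, "crystal"), (5, "crystal"), (5, "blockedonly")],
    )
    conn.commit()
    return conn
```

A scheduler would call `recompute_tag_counts` at a regular interval; between runs, readers see the last cached value, which is what makes the counts eventually consistent.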
Is it better? Yes! Is it perfect? Probably not. Cache invalidation is hard.
Here's the full changelog for this release:
Added
Fixed
Changed
Removed
idx_relationships_type database index.
In the next release, I'm going to fix a few bugs in the Mastodon-compatible API. These require an internal redesign, so I've held off until a few other things were out of the way. And I'm turning my attention to reading and better tools for surfacing and finding interesting content.

I need to use a prefix to namespace status IDs vs. boost IDs in the #ktistec Mastodon-compatible API. In Mastodon, a boost is just a status and they share the same ID namespace. Ktistec predates its Mastodon-compatible API, so statuses and boosts are maintained in different tables. I wanted to use an emoji (✍️ vs. 📣) to distinguish them, but that breaks too many clients.
Shame...
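A minimal sketch of what prefix-based namespacing could look like. The prefixes `s` and `b` and both function names are hypothetical; the post does not say which prefix Ktistec actually uses. The idea is that one string ID in the API maps back to a (table, row ID) pair.

```python
# Hypothetical prefixes; the real choice is not stated in the post.
PREFIXES = {"status": "s", "boost": "b"}
KINDS = {prefix: kind for kind, prefix in PREFIXES.items()}


def encode_id(kind, row_id):
    """Turn a (kind, row ID) pair into one API-facing string ID."""
    return f"{PREFIXES[kind]}{row_id}"


def decode_id(api_id):
    """Recover the kind and row ID from an API-facing string ID."""
    prefix, digits = api_id[:1], api_id[1:]
    if prefix not in KINDS or not digits.isdigit():
        raise ValueError(f"unrecognized ID: {api_id!r}")
    return KINDS[prefix], int(digits)
```

With this scheme, status 42 and boost 42 become `s42` and `b42`, so rows from two tables can share one ID space without colliding.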

I really enjoy optimization. Release v3.5.0 of Ktistec doesn't drop significant new features, but it does deliver a ~15% smaller executable and significantly faster queries on anonymous endpoints. The two are intertwined.
The size reduction comes from replacing a poorly designed, custom rules engine with a materialized view layer that uses SQL to define membership in a collection. The rules engine worked well enough but required a lot of supporting code to present rules as a DSL (domain-specific language) over the domain objects in Ktistec. The driving realization was that SQL is a DSL, membership in a collection is just a query, and domain objects are just rows. Voilà!
Query performance improvements came from using this new view layer to materialize two very popular but expensive-to-query views: the instance's public timeline and public hashtag pages. Because both are public pages they receive more traffic than internal pages.
The problem with the original queries was that performance was not uniform. Querying for posts with popular tags was okay. Querying for posts with sparse tags was very slow. I could have added more indexes, but that's its own cost. After the change, endpoints all respond in a consistent ~10msec timeframe and the CPU barely registers when a crawler hits. (I don't want to make things easier for bots, but I don't want to pay a tax for their activity either. Ask me about my new nginx configuration.)
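To make "membership in a collection is just a query" concrete, here is a minimal sketch in Python with SQLite. The `objects` and `collection_members` tables, the `DEFINITIONS` mapping, and the `refresh` and `page` functions are all assumptions made for illustration; they do not describe Ktistec's actual view layer. A SQL query defines which rows belong to a collection, the result is stored in a membership table, and reads then hit only that table.

```python
import sqlite3

# Hypothetical: each collection is defined by a query that returns object IDs.
DEFINITIONS = {
    "public-timeline": "SELECT id FROM objects WHERE visible = 1 AND deleted = 0",
}

SCHEMA = """
CREATE TABLE objects (id INTEGER PRIMARY KEY, visible INTEGER, deleted INTEGER);
CREATE TABLE collection_members (
    collection TEXT NOT NULL,
    object_id INTEGER NOT NULL,
    PRIMARY KEY (collection, object_id)
);
"""


def refresh(conn, name):
    """Re-materialize one collection from its defining query."""
    with conn:
        conn.execute("DELETE FROM collection_members WHERE collection = ?", (name,))
        conn.execute(
            "INSERT INTO collection_members (collection, object_id) "
            "SELECT ?, id FROM (" + DEFINITIONS[name] + ")",
            (name,),
        )


def page(conn, name, limit=20):
    """Read members newest-first; the primary key already covers this lookup."""
    rows = conn.execute(
        "SELECT object_id FROM collection_members "
        "WHERE collection = ? ORDER BY object_id DESC LIMIT ?",
        (name, limit),
    )
    return [row[0] for row in rows]


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [(1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 1, 1)],
    )
    conn.commit()
    return conn
```

The cost of the expensive query is paid once at refresh time, so every read costs about the same regardless of how sparse or popular the collection is. A production layer would more likely update membership incrementally than rebuild it wholesale as this sketch does.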
Here is the full changelog:
Added
max-id and min-id pagination links on web pages.
Fixed
Changed
Removed
school dependency; replaced by activity processors and materialized views.
openssl_ext dependency; vendored in.
There are still a few slow queries. In the next release I'm going to see if I can get everything under 10msec, and maybe release a new feature, too. 🚀

I just finished working on improvements to #ktistec that cut about 15% off the built executable size, and speed up some of the more common public queries by 2x to 5x (they were already fast, so this is headroom).
It does this by replacing a poorly designed, feature-poor, custom rules engine with a materialized view layer that uses SQL as its DSL (domain-specific language).
I am about to smoke test it on my own site. If it's not available, well, you know why! 😀


This release fixes a small number of bugs found in recent releases.
The full changelog:
Fixed
header and header_static images are always present.
replies collection for local objects.
Changed
Removed
This release fixes a hard-to-exploit but potentially server-crashing bug. If you're running v3.3.9 or v3.4.0, you should upgrade.

The biggest change in release v3.4.0 of Ktistec is cursor-based pagination for all web-navigable collections (timeline, notifications, etc.). Offset-based pagination will be removed completely in the next release.
Offset-based (e.g. page/size) pagination works well on collections that don't change. But what does "the second page" contain in a dynamic timeline? Support for cursor-based pagination is required by the Mastodon-compatible API, but it has also been a desirable feature in its own right for quite a while.
While updating queries to paginate by cursor, I also made performance improvements to the queries themselves, as mentioned elsewhere. Scrapers and bots have already adapted, sort of. I now see odd hybrid requests in the log like /tags/xyz?page=7&min_id=123. Overall CPU usage under normal load is now sitting at 0-1%.
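The difference is easiest to see on a timeline that changes between two page loads. The sketch below uses Python with SQLite; the `posts` table and the function names are hypothetical, and it pages by `id` purely for simplicity, which may not match the ordering Ktistec uses.

```python
import sqlite3


def make_timeline(n):
    """Create an in-memory timeline holding posts 1..n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (id, body) VALUES (?, ?)",
        [(i, f"post {i}") for i in range(1, n + 1)],
    )
    return conn


def page_by_offset(conn, page, size=3):
    """Page/size pagination: positions shift when new rows arrive."""
    rows = conn.execute(
        "SELECT id FROM posts ORDER BY id DESC LIMIT ? OFFSET ?",
        (size, (page - 1) * size),
    )
    return [row[0] for row in rows]


def page_by_cursor(conn, max_id=None, size=3):
    """Cursor pagination: return posts strictly older than max_id."""
    if max_id is None:
        rows = conn.execute(
            "SELECT id FROM posts ORDER BY id DESC LIMIT ?", (size,)
        )
    else:
        rows = conn.execute(
            "SELECT id FROM posts WHERE id < ? ORDER BY id DESC LIMIT ?",
            (max_id, size),
        )
    return [row[0] for row in rows]
```

If a reader loads the first page of a six-post timeline (posts 6, 5, 4) and a seventh post arrives before they click "next", `page_by_offset(conn, 2)` returns 4, 3, 2 and repeats post 4, while `page_by_cursor(conn, max_id=4)` returns 3, 2, 1.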
Here is the full changelog for the release:
Added
/api/v1/timelines/tag/:hashtag endpoint.
Fixed
published rather than id.
Changed
Ktistec::Network.get.
Removed
Enjoy!

while replacing page/size pagination with cursor-based pagination throughout ktistec, i took the opportunity to optimize queries. various changes (like leveraging the natural sort order of existing indexes) improved performance across the board by about 10x. that number is a little bit misleading: queries that took ~10msec now take less than 1msec, but that isn't much in absolute terms. still, it moves the bottleneck!
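One way to check whether a query rides an index's natural sort order is to look at the query plan. The sketch below assumes SQLite and uses a hypothetical `tags` table and index; it is not taken from Ktistec. When the ORDER BY matches the index order, SQLite reads rows in index order and needs no separate sort step; when it doesn't, the plan reports a temporary B-tree for sorting.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


def needs_sort(plan):
    """True if SQLite has to sort rows itself instead of using index order."""
    return any("TEMP B-TREE" in step for step in plan)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (name TEXT, object_id INTEGER, created_at INTEGER);
CREATE INDEX idx_tags_name_object ON tags (name, object_id);
""")

# Matches the index: equality on name, then ordered by object_id.
BY_INDEX_ORDER = (
    "SELECT object_id FROM tags WHERE name = ? "
    "ORDER BY object_id DESC LIMIT 10"
)

# Orders by a column the index does not cover, so a sort step is required.
BY_OTHER_COLUMN = (
    "SELECT object_id FROM tags WHERE name = ? "
    "ORDER BY created_at DESC LIMIT 10"
)

for sql in (BY_INDEX_ORDER, BY_OTHER_COLUMN):
    print(needs_sort(query_plan(conn, sql, ("crystal",))), sql)
```

Avoiding the sort step matters most with LIMIT: the database can stop after reading the first few index entries instead of collecting and sorting every matching row.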

Release v3.3.9 of Ktistec continues the security hardening work from recent releases, with further progress on the Mastodon-compatible API.
Of note: all network connections now go through a new Ktistec::Network module. This allows Ktistec to limit the size of HTTP bodies it reads, on both inbound and outbound requests, and ensures it only opens connections to valid remote IP addresses.
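A minimal sketch of the two checks described above, written in Python rather than Crystal. The names `is_allowed_address`, `read_limited`, `BodyTooLarge`, and the 1 KiB limit are hypothetical and say nothing about how Ktistec::Network is actually implemented. The first function rejects addresses that are not globally routable (loopback, private ranges, link-local); the second reads a body in chunks and gives up as soon as a size limit is exceeded.

```python
import io
import ipaddress

MAX_BODY_BYTES = 1024  # hypothetical limit, chosen only for the example


class BodyTooLarge(Exception):
    """Raised when a body exceeds the configured size limit."""


def is_allowed_address(address):
    """Allow only globally routable IP addresses."""
    return ipaddress.ip_address(address).is_global


def read_limited(stream, limit=MAX_BODY_BYTES, chunk_size=256):
    """Read a body in chunks, failing as soon as it exceeds the limit."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise BodyTooLarge(f"body exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    print(is_allowed_address("127.0.0.1"))        # loopback is rejected
    print(len(read_limited(io.BytesIO(b"x" * 100))))
```

Reading in chunks, rather than trusting a Content-Length header, means an oversized or unbounded body is cut off without first being buffered in full. In a real client the address check has to run against the address the socket actually connects to, after DNS resolution, not just the hostname.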
Here's the full changelog:
Added
Fixed
Changed
Ktistec::Network.
As always, it's worth upgrading for the security fixes!

I don't have a large number of followers, but a recent reply to a relatively short thread (< 10 total posts) resulted in 247 HTTP GETs in the first 100 seconds after the post. Only 29 of those were requests for the object. 218 were requests for the object's replies, which surprised me. Why do servers poll for replies within the first 100 seconds? Mean response time was 481μs, well under 1ms. Peak throughput hit 20 req/s.