At KubeCon Europe a very good chunk of booths were observability stacks. Everyone was claiming they're better than the competitors (with some of the just justifying themselves by saying "it's written in Rust).
Having dealt with Prometheus (+Thanos) / Grafana / OTEL and other stacks (e.g: custom solution on ClickHouse, Victoria{Metrics,Logs}, Jaeger/Tempo, Loki, ...) and even cloud ones (Google's Monarch rebranded as Prometheus)... what's your selling point? This to me seems like yet another way to re-invent the wheel.
If it's just for running locally, okay, fine, but when it comes to production (where the stack really matters) at scale, you end up with lots of tradeoffs and approaches.
Why is this one a winning one compared to the overwhelming "competition"? Seems like we're re-inventing the wheel for the 100th time instead of focusing on unifying the efforts in making the existing solutions better. Thankfully we now have OTEL, so at least the interoperability part is somewhat solved (or mitigated)
I have tried to self host grafana (loki prom and alloy) as o11y stack for prepbook.app. This is hard. I have a bsc in cs not that it says something. I managed to do it eventually, after some research. It was not plug and play in any way. The docs kept saying this solution is not production ready even. I couldn't find the production guide, only the "forget about self hosting and simply pay for us hosting this". After I deployed it the UX was so abrasive my partner won't even try to go into it to figure out a problem. It was a few months ago. Since then new solutions have arrived and I'm waiting to have the time to migrate. I saw PostHog have a solution but I prefer something I could self host and completely own.
I thought how come no one is trying to solve this problem. It looks like it's just a matter of time.
With that being said, my experience can be very skewed since prepbook is a passion project running on a VPS with essentially 0 scale. All I care about is the UX of the stack, not scale. Just for context.
I was looking into this just yesterday. So the Loki + … comparison is a bit off in the Open Source space. The main ones are Signoz and ClickStack in this space. Both using ClickHouse as the database. Heavy compared to something like Loki, but they are OTEL native and not log monitoring. So not in the same category.
I used Signoz + Clickstack on a vibe coded Go server project a few weeks ago. I just made codex figure out how to setup signoz + dependencies via docker compose. I even got it to pre-populate signoz with dashboards. It wasn't too bad. The whole thing runs with a few GB. I tried to cover metrics, tracing, and logging at the same time. This is not a production ready setup but you need to trade off cost vs. utility here. If it's useful enough, that could justify extra cost.
I have a background in having done a lot of stuff on the Elastic stack related to this; including setting up a big Elastic Fleet based stack for one client at some point. It might not be the cheapest, but it does provide awesome filtering and querying capabilities. However, a lot of teams that use it don't really know how to tap into that capability so it tends to be overengineered for what it does in the end. And the extra, underutilized complexity is why a lot of teams are wary of dealing with that stack.
Storing the data is the easy part but what's the point if you can't run queries against it and produce dashboards and diagnostic tools that actually help you? Prometheus/grafana or older graphite type setups tend to be compromises where you get lots of data but are then limited on the querying front or the number of metrics. The tradeoff is always between scale and querying flexibility. If you store tens/hundreds of GB of telemetry per day, you need a way to make sense of it. Clickhouse seems to be quite good at scaling and querying. It's basically a column database. I don't have direct experience with Loki.
But in the end, all that power only matters if people actually use it. And, again, in my experience teams tend not to. They tend to have a lot of unrealized aspirations around their tools and infrastructure. If it's just a dumping ground for data + a few simplistic dashboards, optimize for that. A lot of that data is actually only kept for compliance/auditing reasons. For that, querying is usually a secondary concern and it's OK if queries take a bit longer and are less powerful.
At KubeCon Europe a very good chunk of booths were observability stacks. Everyone was claiming they're better than the competitors (with some of the just justifying themselves by saying "it's written in Rust).
Having dealt with Prometheus (+Thanos) / Grafana / OTEL and other stacks (e.g: custom solution on ClickHouse, Victoria{Metrics,Logs}, Jaeger/Tempo, Loki, ...) and even cloud ones (Google's Monarch rebranded as Prometheus)... what's your selling point? This to me seems like yet another way to re-invent the wheel.
If it's just for running locally, okay, fine, but when it comes to production (where the stack really matters) at scale, you end up with lots of tradeoffs and approaches.
Why is this one a winning one compared to the overwhelming "competition"? Seems like we're re-inventing the wheel for the 100th time instead of focusing on unifying the efforts in making the existing solutions better. Thankfully we now have OTEL, so at least the interoperability part is somewhat solved (or mitigated)
I have tried to self host grafana (loki prom and alloy) as o11y stack for prepbook.app. This is hard. I have a bsc in cs not that it says something. I managed to do it eventually, after some research. It was not plug and play in any way. The docs kept saying this solution is not production ready even. I couldn't find the production guide, only the "forget about self hosting and simply pay for us hosting this". After I deployed it the UX was so abrasive my partner won't even try to go into it to figure out a problem. It was a few months ago. Since then new solutions have arrived and I'm waiting to have the time to migrate. I saw PostHog have a solution but I prefer something I could self host and completely own.
I thought how come no one is trying to solve this problem. It looks like it's just a matter of time.
With that being said, my experience can be very skewed since prepbook is a passion project running on a VPS with essentially 0 scale. All I care about is the UX of the stack, not scale. Just for context.
how can you claim in the readme "no per-language vendor SDK" and then link to a list of per-language client SDKs?
Aren't they two different things? Vendor SDKs to get the data in, client SDKs as an option to get the data out?
I was looking into this just yesterday. So the Loki + … comparison is a bit off in the Open Source space. The main ones are Signoz and ClickStack in this space. Both using ClickHouse as the database. Heavy compared to something like Loki, but they are OTEL native and not log monitoring. So not in the same category.
I used Signoz + Clickstack on a vibe coded Go server project a few weeks ago. I just made codex figure out how to setup signoz + dependencies via docker compose. I even got it to pre-populate signoz with dashboards. It wasn't too bad. The whole thing runs with a few GB. I tried to cover metrics, tracing, and logging at the same time. This is not a production ready setup but you need to trade off cost vs. utility here. If it's useful enough, that could justify extra cost.
I have a background in having done a lot of stuff on the Elastic stack related to this; including setting up a big Elastic Fleet based stack for one client at some point. It might not be the cheapest, but it does provide awesome filtering and querying capabilities. However, a lot of teams that use it don't really know how to tap into that capability so it tends to be overengineered for what it does in the end. And the extra, underutilized complexity is why a lot of teams are wary of dealing with that stack.
Storing the data is the easy part but what's the point if you can't run queries against it and produce dashboards and diagnostic tools that actually help you? Prometheus/grafana or older graphite type setups tend to be compromises where you get lots of data but are then limited on the querying front or the number of metrics. The tradeoff is always between scale and querying flexibility. If you store tens/hundreds of GB of telemetry per day, you need a way to make sense of it. Clickhouse seems to be quite good at scaling and querying. It's basically a column database. I don't have direct experience with Loki.
But in the end, all that power only matters if people actually use it. And, again, in my experience teams tend not to. They tend to have a lot of unrealized aspirations around their tools and infrastructure. If it's just a dumping ground for data + a few simplistic dashboards, optimize for that. A lot of that data is actually only kept for compliance/auditing reasons. For that, querying is usually a secondary concern and it's OK if queries take a bit longer and are less powerful.
I agree. The sentiment applies to most analytics. People who setup analytics are not the same as end users.
I'm partial to open observe, especially because in Ruby the OTEL stuff isn't great for metrics and logs yet.
I also run open observe at home, but I can't help but feel that the interface could use some... sparkle, and the mobile experience kinda sucks.
But you can't beat the excellent price and performance. Does what I need and much more
There's a few contenders in self-hostable otel:
- ClickStack (ex HyperDX) - SigNoz - Traceway - a few more
does someone has enough feedback on those to be able to tell which one works best?
This looks cool
Funny, the first thing I look for for infra projects like these is to find out if it's written in Go. At that point, my confidence level is increased.
Here's something better than that:
https://github.com/plausible/analytics
Elixir.
Why is it better? On the internet it is not enough to just say something. You need to deliver some facts and / or a comparison. Please try it.