[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$pC4ZxlFnru":3,"blog-post-shrinking-redis-cache-with-msgp-and-zstd-in-golang":4},"email-kctowspj",{"content":5,"created_at":6,"description":7,"id":8,"keywords":9,"reading_time":13,"slug":14,"title":15,"updated_at":16,"ok":17},"# Shrinking Redis cache with msgp and zstd in Golang.\n\nIf you are storing structured data in Redis using `encoding\u002Fjson`, you might be surprised by how much memory you are wasting. JSON is readable, easy to debug, and universally supported. It's also bloated. Field names repeat on every single record, numbers are stored as ASCII text instead of binary, and boolean values take 4-5 bytes instead of 1.\n\nIn my previous article [High-performance Golang struct optimizations: Paddings and Alignments](https:\u002F\u002Fgozman.space\u002Fblog\u002Fhigh-performance-golang-struct-optimizations-paddings-and-alignments), I showed how reordering struct fields can save 25% of RAM. This time, we are going after the other side of the equation: how the data is serialized before it hits Redis.\n\nI will compare four approaches: plain JSON, [MessagePack via tinylib\u002Fmsgp](https:\u002F\u002Fgithub.com\u002Ftinylib\u002Fmsgp), JSON compressed with [zstd via klauspost\u002Fcompress](https:\u002F\u002Fgithub.com\u002Fklauspost\u002Fcompress), and msgp + zstd combined. We care mostly about stored size here. CPU cost matters too, but for cache serialization, the bottleneck is almost always memory. 
The results weren't what I expected.\n\n## The struct.\n\nI will reuse the struct from my [previous article](https:\u002F\u002Fgozman.space\u002Fblog\u002Fhigh-performance-golang-struct-optimizations-paddings-and-alignments), with `msg` tags added for msgp code generation:\n```go\n\u002F\u002Fgo:generate go tool msgp\n\ntype NestedLayout struct {\n\tID    int64 `msg:\"id\"`\n\tPhone int64 `msg:\"phone\"`\n\tAge   int32 `msg:\"age\"`\n}\n\ntype Layout struct {\n\tBalanceInCents int64        `msg:\"balance_in_cents\"`\n\tIdempotencyKey int64        `msg:\"idempotency_key\"`\n\tKey            float64      `msg:\"key\"`\n\tUser           NestedLayout `msg:\"user\"`\n\tAreaID         int32        `msg:\"area_id\"`\n\tCreatedAt      int32        `msg:\"created_at\"`\n\tUpdatedAt      int32        `msg:\"updated_at\"`\n\tID             uint32       `msg:\"id\"`\n\tStatus         uint16       `msg:\"status\"`\n\tIsActive       bool         `msg:\"is_active\"`\n\tIsSpecial      bool         `msg:\"is_special\"`\n\tIsMigrated     bool         `msg:\"is_migrated\"`\n\tTenantID       int8         `msg:\"tenant_id\"`\n}\n\n\u002F\u002F Layouts is a named slice type.\n\u002F\u002F msgp cannot generate methods on anonymous slices like []*Layout,\n\u002F\u002F so we need this named type for codegen to work.\ntype Layouts []*Layout\n```\nOne thing to note here: `msgp` requires a named slice type to generate `MarshalMsg` and `UnmarshalMsg` for collections. You can't just pass `[]*Layout` to msgp's codegen. The named type `Layouts` solves this and lets you marshal the whole array in one call.\n\nAfter adding these tags, install msgp as a tool dependency and run code generation:\n```bash\ngo get -tool github.com\u002Ftinylib\u002Fmsgp@latest\ngo generate .\u002F...\n```\nThis produces `*_gen.go` files with `MarshalMsg`, `UnmarshalMsg`, and `Msgsize()` methods for both types. 
No reflection, no runtime overhead for field lookup.\n\n## Why msgp and not protobuf or other codecs.\n\nI know what you are thinking. Why not protobuf? I wrote about [pitfalls of using Protobuf for Kafka](https:\u002F\u002Fgozman.space\u002Fblog\u002Flong-term-pitfalls-of-using-protobuf-for-apache-kafka) before, and some of those concerns apply to caching too: schema management, versioning, and the requirement to maintain `.proto` files separately from your Go structs. For a cache layer, I want something that works directly with existing Go types.\n\nmsgp generates code from Go struct tags. No separate schema files, no extra compilation step beyond `go generate`. The generated code is fast because it produces direct binary encoding with no reflection, similar to how protobuf works at runtime but without the schema overhead.\n\nOther options like `gob` are Go-specific and not particularly compact. `encoding\u002Fbinary` needs manual marshaling. msgp sits right where I want it: generates fast code from existing structs, no schema files required.\n\n## The encoder and decoder.\n\nThe zstd encoder and decoder are safe for concurrent use but expensive to create, so initialize them once at the package level:\n```go\nimport \"github.com\u002Fklauspost\u002Fcompress\u002Fzstd\"\n\nvar (\n\tzstdEncoder, _ = zstd.NewWriter(nil, zstd.WithEncoderLevel(zstd.SpeedFastest))\n\tzstdDecoder, _ = zstd.NewReader(nil)\n)\n```\nI use `zstd.SpeedFastest` here because we are optimizing for cache throughput. The compression ratio difference between fastest and default level is small for structured data like this, but the CPU savings are noticeable at high request rates.\n\nNow here are the four serialization functions we will benchmark:\n```go\n\u002F\u002F 1. Plain JSON\nfunc encodeJSON(data Layouts) ([]byte, error) {\n\treturn json.Marshal(data)\n}\n\n\u002F\u002F 2. msgp only\nfunc encodeMsgp(data *Layouts) ([]byte, error) {\n\treturn data.MarshalMsg(nil)\n}\n\n\u002F\u002F 3. 
JSON + zstd\nfunc encodeJSONZstd(data Layouts) ([]byte, error) {\n\tjsonBytes, err := json.Marshal(data)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\treturn zstdEncoder.EncodeAll(jsonBytes, nil), nil\n}\n\n\u002F\u002F 4. msgp + zstd\nfunc encodeMsgpZstd(data *Layouts) ([]byte, error) {\n\tmsgpBytes, err := data.MarshalMsg(nil)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\treturn zstdEncoder.EncodeAll(msgpBytes, nil), nil\n}\n```\nAnd decoding:\n```go\nfunc decodeJSON(b []byte) (Layouts, error) {\n\tvar result Layouts\n\treturn result, json.Unmarshal(b, &result)\n}\n\nfunc decodeMsgp(b []byte) (Layouts, error) {\n\tvar result Layouts\n\t_, err := result.UnmarshalMsg(b)\n\treturn result, err\n}\n\nfunc decodeJSONZstd(b []byte) (Layouts, error) {\n\tdecompressed, err := zstdDecoder.DecodeAll(b, nil)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tvar result Layouts\n\treturn result, json.Unmarshal(decompressed, &result)\n}\n\nfunc decodeMsgpZstd(b []byte) (Layouts, error) {\n\tdecompressed, err := zstdDecoder.DecodeAll(b, nil)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tvar result Layouts\n\t_, err = result.UnmarshalMsg(decompressed)\n\treturn result, err\n}\n```\n\n## Benchmarks.\n\nLet's create a test dataset of 1000 records with realistic data and measure the output size and encoding speed for each approach:\n```go\nfunc generateTestData(n int) Layouts {\n\tdata := make(Layouts, n)\n\tfor i := range n {\n\t\tdata[i] = &Layout{\n\t\t\tBalanceInCents: int64(i*100 + 42),\n\t\t\tIdempotencyKey: int64(1000000 + i),\n\t\t\tKey:            float64(i) * 1.337,\n\t\t\tUser: NestedLayout{\n\t\t\t\tID:    int64(i + 1),\n\t\t\t\tPhone: 15551234567,\n\t\t\t\tAge:   25 + int32(i%40),\n\t\t\t},\n\t\t\tAreaID:     int32(i % 50),\n\t\t\tCreatedAt:  1700000000 + int32(i),\n\t\t\tUpdatedAt:  1700000000 + int32(i) + 3600,\n\t\t\tID:         uint32(i + 1),\n\t\t\tStatus:     uint16(i % 5),\n\t\t\tIsActive:   i%2 == 0,\n\t\t\tIsSpecial:  i%7 == 0,\n\t\t\tIsMigrated: i%3 == 
0,\n\t\t\tTenantID:   int8(i % 10),\n\t\t}\n\t}\n\treturn data\n}\n\nfunc BenchmarkEncode(b *testing.B) {\n\tdata := generateTestData(1000)\n\n\tb.Run(\"JSON\", func(b *testing.B) {\n\t\tfor b.Loop() {\n\t\t\t_, _ = encodeJSON(data)\n\t\t}\n\t})\n\n\tb.Run(\"Msgp\", func(b *testing.B) {\n\t\tfor b.Loop() {\n\t\t\t_, _ = encodeMsgp(&data)\n\t\t}\n\t})\n\n\tb.Run(\"JSON+Zstd\", func(b *testing.B) {\n\t\tfor b.Loop() {\n\t\t\t_, _ = encodeJSONZstd(data)\n\t\t}\n\t})\n\n\tb.Run(\"Msgp+Zstd\", func(b *testing.B) {\n\t\tfor b.Loop() {\n\t\t\t_, _ = encodeMsgpZstd(&data)\n\t\t}\n\t})\n}\n```\nAnd a separate test to print the actual byte sizes, which is really the number we care about:\n```go\nfunc TestOutputSize(t *testing.T) {\n\tdata := generateTestData(1000)\n\n\tjsonBytes, _ := encodeJSON(data)\n\tmsgpBytes, _ := encodeMsgp(&data)\n\tjsonZstdBytes, _ := encodeJSONZstd(data)\n\tmsgpZstdBytes, _ := encodeMsgpZstd(&data)\n\n\tt.Logf(\"JSON:       %d bytes\", len(jsonBytes))\n\tt.Logf(\"Msgp:       %d bytes\", len(msgpBytes))\n\tt.Logf(\"JSON+Zstd:  %d bytes\", len(jsonZstdBytes))\n\tt.Logf(\"Msgp+Zstd:  %d bytes\", len(msgpZstdBytes))\n}\n```\nHere are the results from my machine (Go 1.25, Apple M4 Pro). Sizes first, since that is the whole point:\n```bash\n=== RUN   TestOutputSize\n    t_test.go:  JSON:       256243 bytes\n    t_test.go:  Msgp:       189709 bytes\n    t_test.go:  JSON+Zstd:  22914 bytes\n    t_test.go:  Msgp+Zstd:  27517 bytes\n--- PASS: TestOutputSize (0.01s)\n```\nJSON produces ~256 KB for 1000 records. msgp alone drops that to ~190 KB, a **26% reduction**. And here is where it gets interesting: JSON+zstd compresses down to ~23 KB, but msgp+zstd lands at ~27.5 KB. **The msgp+zstd combination is larger than JSON+zstd.** That wasn't what I expected.\n\n## Why msgp+zstd is bigger than JSON+zstd.\n\nThis surprised me at first, but it makes sense once you think about how zstd works. zstd is a dictionary-based compressor. 
It finds repeated byte sequences and replaces them with short back-references. JSON is full of exactly that kind of redundancy: the field names `\"balance_in_cents\":`, `\"idempotency_key\":`, `\"is_active\":` repeat verbatim for every record in the array. With 1000 records, the string `\"balance_in_cents\"` appears 1000 times. zstd sees that pattern and after the first occurrence, each repetition costs almost nothing.\n\nmsgp shrinks that redundancy, but it does not remove it. The generated code encodes each struct as a MessagePack map, so the `msg` tag names still appear in every record; they are just a few bytes cheaper than their quoted JSON counterparts. Once zstd deduplicates the repeated names in both formats, msgp's head start mostly evaporates. What remains is the payload, and there JSON compresses better: numbers are ASCII digits drawn from only ten symbols, and neighboring timestamps share long common prefixes, easy wins for zstd's match finding and entropy coding, while msgp's binary values are already dense bytes with little redundancy left. Compact input gives the compressor less to work with, and you end up with a slightly larger compressed blob.\n\nSo the tradeoff is not as simple as \"stack both optimizations for maximum savings.\" If your goal is strictly the smallest possible cache footprint and you can afford the CPU cost, JSON+zstd actually wins on size. But that is not the full picture.\n\n## The speed side of things.\n\nHere are the encoding benchmarks:\n```bash\nBenchmarkEncode\u002FJSON-14          4254       274337 ns\u002Fop     262699 B\u002Fop     2 allocs\u002Fop\nBenchmarkEncode\u002FMsgp-14         37448        31810 ns\u002Fop     221185 B\u002Fop     1 allocs\u002Fop\nBenchmarkEncode\u002FJSON+Zstd-14     1960       557230 ns\u002Fop     591831 B\u002Fop     3 allocs\u002Fop\nBenchmarkEncode\u002FMsgp+Zstd-14     3906       300849 ns\u002Fop     417836 B\u002Fop     2 allocs\u002Fop\n```\nmsgp encoding is **8.6x faster** than JSON. But JSON+zstd, the size winner, is also the **slowest** option at 557 us\u002Fop, more than twice as slow as plain JSON. You pay for JSON's reflection-based marshaling first, then zstd's compression pass on top of a much larger input buffer.\n\nmsgp+zstd at 300 us\u002Fop is almost **2x faster** than JSON+zstd while producing a comparable cache size (27.5 KB vs 22.9 KB). 
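\n\nThe per-op difference is easier to feel as CPU budget. A back-of-the-envelope sketch, using the measured encode times above and a hypothetical load of 2000 encodes per second (the rate is my assumption, not a measurement):\n```go\npackage main\n\nimport \"fmt\"\n\n\u002F\u002F cpuCores converts a per-op encode cost into CPU-seconds consumed\n\u002F\u002F per wall-clock second at a given request rate.\nfunc cpuCores(encodesPerSec, microsPerOp float64) float64 {\n\treturn encodesPerSec * microsPerOp \u002F 1e6\n}\n\nfunc main() {\n\t\u002F\u002F 557 us\u002Fop and 300 us\u002Fop are the measured encode times.\n\tfmt.Printf(\"JSON+zstd: %.2f cores\\n\", cpuCores(2000, 557)) \u002F\u002F 1.11 cores\n\tfmt.Printf(\"msgp+zstd: %.2f cores\\n\", cpuCores(2000, 300)) \u002F\u002F 0.60 cores\n}\n```\nAt that rate, JSON+zstd alone is already more than a full core spent on nothing but encoding. 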
For a cache layer handling thousands of requests per second, that speed difference matters more than 4.6 KB.\n\nAnd msgp alone at 31 us\u002Fop is almost **9x faster** than plain JSON. If your cache data fits comfortably in Redis at 190 KB per user instead of 256 KB, that might be all you need, and you avoid the zstd dependency entirely.\n\nThe auto-generated msgp benchmarks confirm the per-struct performance as well:\n```bash\nBenchmarkMarshalMsgLayout-14     32107621     36.08 ns\u002Fop    224 B\u002Fop    1 allocs\u002Fop\nBenchmarkAppendMsgLayout-14      76366846     15.95 ns\u002Fop      0 B\u002Fop    0 allocs\u002Fop\nBenchmarkUnmarshalLayout-14      18507248     64.46 ns\u002Fop      0 B\u002Fop    0 allocs\u002Fop\n```\n36 nanoseconds to marshal a single struct with 13 fields and a nested sub-struct. Zero allocations on the append path. Generated code with no reflection does well here.\n\n## So which one should you pick?\n\nIt depends on what you are constrained by. Here is how I think about it:\n\nIf **Redis memory is the bottleneck** and you need the absolute smallest stored size, use JSON+zstd. ~23 KB per 1000 records, 91% reduction from plain JSON. You pay for it with 557 us per encode operation, which might be fine if your write rate is low.\n\nIf **CPU and latency matter** and you want fast serialization with a decent size reduction, use msgp alone. ~190 KB per 1000 records, 26% reduction, but 8.6x faster encoding with a single allocation. No compression dependency.\n\nIf you want a **balance between size and speed**, use msgp+zstd. ~27.5 KB per 1000 records, 89% reduction from plain JSON (close to JSON+zstd's 91%) at nearly 2x the encoding speed. This is probably the right default for most applications that need to optimize cache size without introducing a CPU bottleneck.\n\nWhat does this mean in real numbers? If you are caching data for 100,000 active users with 1000 records each, JSON will eat ~24.4 GB of Redis memory. 
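\n\nThat projection is just multiplication over the measured payload sizes. A quick sketch (computed in GiB, so the rounding lands slightly below the approximate figures in the text):\n```go\npackage main\n\nimport \"fmt\"\n\n\u002F\u002F gib projects a per-user payload size to a total across all users, in GiB.\nfunc gib(bytesPerUser, users int) float64 {\n\treturn float64(bytesPerUser) * float64(users) \u002F (1 << 30)\n}\n\nfunc main() {\n\tconst users = 100_000\n\t\u002F\u002F Sizes per 1000-record payload, from TestOutputSize above.\n\tfmt.Printf(\"JSON:      %.1f GiB\\n\", gib(256243, users))\n\tfmt.Printf(\"JSON+zstd: %.1f GiB\\n\", gib(22914, users))\n\tfmt.Printf(\"msgp+zstd: %.1f GiB\\n\", gib(27517, users))\n}\n```\n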
JSON+zstd brings that to ~2.2 GB. Msgp+zstd lands at ~2.6 GB, but encodes twice as fast. Either way, that is the difference between needing a large Redis cluster and getting by with a single instance.\n\n## A note on the zstd singleton.\n\nThe first thing the klauspost\u002Fcompress\u002Fzstd documentation warns against is calling `zstd.NewWriter` or `zstd.NewReader` per request. Both are designed to be reused and are safe for concurrent access. Creating them is expensive because zstd initializes internal lookup tables and allocates buffers on construction. If you put `zstd.NewWriter(nil)` inside your request handler, you will burn CPU on initialization that has nothing to do with your actual data.\n```go\n\u002F\u002F NOT this per request\nfunc handleRequest(data []byte) []byte {\n\tencoder, _ := zstd.NewWriter(nil) \u002F\u002F expensive, don't do this\n\tdefer encoder.Close()\n\treturn encoder.EncodeAll(data, nil)\n}\n```\nA cleaner approach is to wrap both into a struct that you initialize once and inject where needed:\n```go\ntype ZstdCompressor struct {\n\tencoder *zstd.Encoder\n\tdecoder *zstd.Decoder\n}\n\nfunc NewZstdCompressor() (*ZstdCompressor, error) {\n\tencoder, err := zstd.NewWriter(nil, zstd.WithEncoderLevel(zstd.SpeedFastest))\n\tif err != nil {\n\t\treturn nil, fmt.Errorf(\"zstd: failed to initialize zstd encoder: %w\", err)\n\t}\n\n\tdecoder, err := zstd.NewReader(nil)\n\tif err != nil {\n\t\t_ = encoder.Close()\n\t\treturn nil, fmt.Errorf(\"zstd: failed to initialize zstd decoder: %w\", err)\n\t}\n\n\treturn &ZstdCompressor{\n\t\tencoder: encoder,\n\t\tdecoder: decoder,\n\t}, nil\n}\n\nfunc (z *ZstdCompressor) Compress(src []byte) []byte {\n\treturn z.encoder.EncodeAll(src, nil)\n}\n\nfunc (z *ZstdCompressor) Decompress(src []byte) ([]byte, error) {\n\tresult, err := z.decoder.DecodeAll(src, nil)\n\tif err != nil {\n\t\treturn nil, fmt.Errorf(\"zstd: failed to decompress: %w\", err)\n\t}\n\n\treturn result, nil\n}\n```\nThis way you 
handle initialization errors properly instead of swallowing them with `_`, and the compressor can be passed around as a dependency.\n\n## A generic cache helper.\n\nIf you want to use this approach across different types, you can write a generic helper using Go's type constraints. The trick is the two-type-parameter pattern: Go generics cannot call pointer-receiver interface methods on `T` directly, so you need a constraint that ties the pointer type to the value type.\n```go\ntype MsgpCodec[T any] interface {\n\tMarshalMsg([]byte) ([]byte, error)\n\tUnmarshalMsg([]byte) ([]byte, error)\n\t*T\n}\n\nfunc CacheSet[T any, PT MsgpCodec[T]](\n\tctx context.Context,\n\trdb *redis.Client,\n\tkey string,\n\tvalue *T,\n\tttl time.Duration,\n) error {\n\tencoded, err := PT(value).MarshalMsg(nil)\n\tif err != nil {\n\t\treturn err\n\t}\n\tcompressed := zstdEncoder.EncodeAll(encoded, nil)\n\treturn rdb.Set(ctx, key, compressed, ttl).Err()\n}\n\nfunc CacheGet[T any, PT MsgpCodec[T]](\n\tctx context.Context,\n\trdb *redis.Client,\n\tkey string,\n) (*T, error) {\n\tval, err := rdb.Get(ctx, key).Bytes()\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tdecompressed, err := zstdDecoder.DecodeAll(val, nil)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tresult := PT(new(T))\n\t_, err = result.UnmarshalMsg(decompressed)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\treturn (*T)(result), nil\n}\n```\nThe call site looks a bit verbose because of Go's generics syntax:\n```go\nerr := CacheSet[Layouts, *Layouts](\n\tctx, rdb, \"user:123:accounts\", &accounts, 10*time.Minute,\n)\n```\nIt's not pretty, but it is type safe and you only write the serialization logic once.\n\n## When this does not help.\n\nBefore you get excited about 90%+ reductions, I need to be clear about what kind of data benefits from this approach. It only makes sense for serialized JSON stored in Redis. If you are caching a plain string, a boolean flag, a counter, or any other scalar value, there is nothing to optimize. 
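\n\nYou can see the inflation directly. A minimal sketch with stdlib `compress\u002Fgzip` standing in for zstd (the library differs, but every framed compressor adds headers and checksums that dwarf a tiny payload):\n```go\npackage main\n\nimport (\n\t\"bytes\"\n\t\"compress\u002Fgzip\"\n\t\"fmt\"\n)\n\n\u002F\u002F gzipSize returns the gzip-compressed size of b, frame included.\nfunc gzipSize(b []byte) int {\n\tvar buf bytes.Buffer\n\tw := gzip.NewWriter(&buf)\n\tw.Write(b)\n\tw.Close()\n\treturn buf.Len()\n}\n\nfunc main() {\n\tpayload := []byte(\"John\") \u002F\u002F a 4-byte scalar cache value\n\n\t\u002F\u002F Framing overhead dominates: the \"compressed\" value comes out\n\t\u002F\u002F several times larger than the 4-byte original.\n\tfmt.Printf(\"raw: %d bytes, compressed: %d bytes\\n\", len(payload), gzipSize(payload))\n}\n```\n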
Redis already stores those efficiently. A `SET user:123:name \"John\"` is 4 bytes of payload. Running it through msgp + zstd would make it larger, not smaller, because of the encoding headers and compression frame overhead.\n\nThe wins come from structured data: arrays of objects, nested JSON documents, anything where `encoding\u002Fjson` adds repeated field names, numbers spelled out as text, and structural characters like `{`, `}`, `[`, `]`, `:`, and `,`. The more records in your array and the more fields in your struct, the bigger the savings. A single flat struct with 3 fields will barely compress. A thousand records with 13 fields each will compress dramatically, as we saw in the benchmarks above.\n\nSo keep your regular `SET`\u002F`GET` for simple values. This approach is specifically for the cases where you are serializing Go structs or slices of structs into Redis as JSON blobs.\n\n## Caveats.\n\nA few gotchas before you switch everything to msgp + zstd.\n\nFirst, **debugging gets harder**. You can't just `redis-cli GET` a key and read the value anymore. You need a small tool to decode and decompress the data. For development and debugging, I would recommend keeping a fallback to JSON or at least having a CLI utility that can decode your cached values.\n\nSecond, **msg tags on every field are mandatory**. Without them, msgp falls back to Go field names for serialization keys. This works until someone renames a struct field and silently breaks deserialization of all existing cached data. Use explicit `msg:\"field_name\"` tags and treat them like database column names: once set, they should not change.\n\nThird, **versioning your cache keys** is good practice when changing serialization formats. If you switch from JSON to msgp, old cached values will fail to decode. Use a versioned key prefix like `v2:user:123:accounts` so old and new formats can coexist during rollout.\n\nFourth, **zstd compression ratio depends on your data**. 
Repetitive data like arrays of similar structs compresses well. A single small struct with unique values might not shrink much, and you will pay the CPU cost for no benefit. Test with your actual data before committing.\n\nFifth, **this works because our struct uses simple built-in types**. All fields here are `int64`, `int32`, `float64`, `bool`, and so on. msgp knows how to serialize those out of the box. If your struct contains fields from external libraries, like `decimal.Decimal` from shopspring\u002Fdecimal or `uuid.UUID` from google\u002Fuuid, msgp won't know how to encode them. You would need to implement the `msgp.Marshaler` and `msgp.Unmarshaler` interfaces on those types yourself, or convert the fields to primitive types before serialization (for example, storing a Decimal as a string or as cents in `int64`). Not a dealbreaker, but worth knowing before you adopt msgp for structs with non-trivial field types.\n\n## When to bother with this.\n\nSame as with struct padding optimizations: most applications do not need this. If your Redis usage is well within limits and you are not worried about memory costs, plain JSON works fine and is easier to debug.\n\nBut if you are running into Redis memory limits, paying for oversized instances, or caching data for millions of users, a 90% size reduction is hard to ignore. Whether you pick JSON+zstd for maximum compression or msgp+zstd for the speed\u002Fsize balance depends on your workload. Measure both on your actual data. 
The answer might surprise you, as it surprised me.\n> For how to read these benchmarks, and what else you can optimize in a Go application, see my article [Optimization Odyssey: pprof-ing & Benchmarking Golang App](https:\u002F\u002Fgozman.space\u002Fblog\u002Foptimization-odyssey-profiling-and-benchmarking-golang-app-with-pprof).\n\nSources:\n- tinylib\u002Fmsgp: https:\u002F\u002Fgithub.com\u002Ftinylib\u002Fmsgp\n- klauspost\u002Fcompress\u002Fzstd: https:\u002F\u002Fgithub.com\u002Fklauspost\u002Fcompress\n- MessagePack specification: https:\u002F\u002Fmsgpack.org\u002F\n- Zstandard RFC 8878: https:\u002F\u002Fdatatracker.ietf.org\u002Fdoc\u002Fhtml\u002Frfc8878","2026-04-03T15:04:53.553823Z","JSON serialization in Redis is convenient but wasteful. I benchmarked four approaches for cached Go structs: plain JSON, msgp, JSON+zstd, and msgp+zstd. All compressed options hit ~90% size reduction, but the results surprised me: msgp+zstd wasn't the smallest, because zstd thrives on the very redundancy that makes JSON bloated. Real benchmarks, tradeoffs, and a generic cache helper inside.",8,[10,11,12],"golang","optimization","caching",685,"shrinking-redis-cache-with-msgp-and-zstd-in-golang","Shrinking Redis cache with msgp and zstd in Golang","2026-04-04T08:31:18.03766Z",true]