I am implementing a ZeroMQ message listener service in Go, and I am struggling to find the most idiomatic pattern for processing those messages and storing them in a DB using goroutines. I expect to receive around 15 messages a second. Below is a basic snippet without any goroutines at all.
```go
func main() {
	subscriber := connect()
	defer subscriber.Close()
	listenForMessages(subscriber)
}

// implementation of connect() omitted

func listenForMessages(subscriber *zmq4.Socket) {
	for {
		msg, err := subscriber.RecvBytes(0)
		if err != nil {
			log.Fatal(err)
		}
		process(msg)
	}
}

func process(msg []byte) {
	// process the message and write to the DB
}
```
As someone who has done a lot of JavaScript, I tend to think in async/await terms, but I understand that's the wrong paradigm for Go. Given that, my knee-jerk reaction was to run the function `process` as a goroutine and be done with it, like below.
```go
func main() {
	subscriber := connect()
	defer subscriber.Close()
	listenForMessages(subscriber)
}

// implementation of connect() omitted

func listenForMessages(subscriber *zmq4.Socket) {
	for {
		msg, err := subscriber.RecvBytes(0)
		if err != nil {
			log.Fatal(err)
		}
		go process(msg)
	}
}

func process(msg []byte) {
	// process the message and write to the DB
}
```
This seemed reasonable because each incoming message I receive from ZeroMQ is processed independently, and the only thing I will need to worry about later is DB transactions and locking (which I will look into separately, unless it is relevant to this question).
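For reference, one variant of "just spawn a goroutine per message" that I have seen suggested is to bound the number of in-flight goroutines with a semaphore channel, so a burst cannot create unbounded concurrent DB writes. Here is a self-contained toy of that idea (no ZeroMQ; `handle`, `maxInFlight`, and the send loop are stand-ins I made up, not anything from my real code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// processed counts handled messages; it stands in for the DB write.
var processed atomic.Int64

// handle simulates processing one message (the real version would
// parse the payload and write it to the DB).
func handle(msg []byte) {
	processed.Add(1)
}

func main() {
	// sem bounds how many process goroutines run at once, so a burst
	// cannot open an unbounded number of DB connections.
	const maxInFlight = 8
	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup

	for i := 0; i < 100; i++ { // stand-in for the ZeroMQ receive loop
		msg := []byte(fmt.Sprintf("msg-%d", i))
		sem <- struct{}{} // blocks once maxInFlight goroutines are running
		wg.Add(1)
		go func(m []byte) {
			defer wg.Done()
			defer func() { <-sem }()
			handle(m)
		}(msg)
	}
	wg.Wait()
	fmt.Println("processed:", processed.Load())
}
```

This keeps the "one goroutine per message" shape but caps concurrency, which seems like a middle ground between my two versions.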
Now, I have been reading that I should instead create some worker goroutines beforehand and pass the `msg` to them via channels, like so. (Note that I have tried to keep it brief, so there may be undeclared variables.)
```go
func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	messageChannel := make(chan MessageData, 100)
	var wg sync.WaitGroup

	numWorkers := runtime.NumCPU()
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func(workerID int) {
			defer wg.Done()
			process(ctx, messageChannel, workerID)
		}(i)
	}

	// omitting some other channels related to handling graceful exits for brevity

	subscriber := connect()
	defer subscriber.Close()
	wg.Add(1)
	go func() {
		defer wg.Done()
		listenForMessages(ctx, subscriber, messageChannel)
	}()

	wg.Wait()
}

// implementation of connect() omitted

func listenForMessages(ctx context.Context, subscriber *zmq4.Socket, messageChannel chan<- MessageData) {
	for {
		select {
		case <-ctx.Done():
			// Context cancelled, exit the goroutine
			log.Println("Message listener shutting down")
			return
		default:
			msg, err := subscriber.RecvBytes(0)
			if err != nil {
				log.Fatal(err)
			}
			select {
			case messageChannel <- MessageData{rawMessage: msg}:
				log.Println("Received and queued message")
			case <-ctx.Done():
				// handle more signals for graceful shutdown
				return
			}
		}
	}
}

func process(ctx context.Context, messageChannel <-chan MessageData, workerID int) {
	for {
		select {
		case <-ctx.Done():
			// Context cancelled, exit the goroutine
			log.Printf("Worker %d shutting down", workerID)
			return
		case msgData, ok := <-messageChannel:
			if !ok {
				// Channel closed, exit the goroutine
				log.Printf("Worker %d: message channel closed, shutting down", workerID)
				return
			}
			_ = msgData
			// process the message and write to the DB
		}
	}
}
```
So, for a 6-core hyperthreaded processor, there will be 12 workers. A separate goroutine, apart from the main one, handles the listening and sends each received message to the channel that all 12 workers are waiting(?) on. This looks overly complex, and I am wary that it might lead to race conditions or other issues, especially when writing to the DB. I have read that spawning a new goroutine for every message can lead to a huge number of goroutines, but isn't Go able to handle thousands of goroutines? At the same time, I don't know enough Go to judge whether this is the correct way to achieve what I want.
As an aside, I am also unsure whether the channel should be unbuffered or have a buffer size. As mentioned at the beginning, I will be receiving around 15 messages a second, with possibly higher bursts, and I would not want to drop messages.
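To convince myself what the buffer actually buys me, I wrote this small toy: sends into a buffered channel do not block until the buffer is full, so (if I understand correctly) a buffer of 100 would let the listener absorb a burst while the workers catch up, whereas an unbuffered channel would make every `RecvBytes` wait for a free worker:

```go
package main

import (
	"fmt"
)

func main() {
	// A buffer of 100 lets the sender queue up to 100 messages before
	// any receiver has run; at a steady ~15 msg/s with workers keeping
	// up, the buffer would normally sit near empty.
	ch := make(chan []byte, 100)

	for i := 0; i < 100; i++ {
		ch <- []byte("burst") // does not block while the buffer has room
	}
	fmt.Println("queued without blocking:", len(ch)) // prints 100

	// The 101st send would block here until a receive frees a slot;
	// with an unbuffered channel, even the first send would have blocked.
}
```

So buffered vs unbuffered seems to be a question of how large a burst I want to absorb before the listener itself starts blocking, not of whether messages get dropped (channel sends never drop, they block).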