In January, Stuart Marks published a blog post titled “Processing Large Files in Java” as a response to a post by Paige Niedringhaus about “Using Java to Read Really, Really Large Files”. There, Niedringhaus reports her experience with JavaScript in solving a “coding challenge” in which a “very large file” has to be processed and four specific questions have to be answered about it. After solving the challenge in JavaScript, Niedringhaus moved forward and successfully implemented a solution in Java as well, curious about the language and how to approach the problem there.
This article starts where Marks left off and tries to improve the performance of the code further, until we hit the wall.
Objectives
In this article we discover how to implement a solution to the challenge in Go and then, over the course of 9 revisions, increase performance by 650%, going from 25 seconds down to below 4 seconds. That is more than a 20x improvement over the challenge’s original version at 84 seconds. We’ll also have a look at the performance-limiting aspects of this challenge.
Performance is our priority for now. Marks and others pointed out that code quality can be equally important. That’s very true, and Niedringhaus’s first implementation was “just” about solving the challenge, not about performance; that became a focus only when other solutions were available to her. For a program that runs for such a long time, nearly one and a half minutes, it is valid to measure its performance and then relate it back to the code that led to it. With that measurement we can at least quantify one quality of the code.
But let’s first get an overview of the challenge, have a short look at two existing implementations, and then create a first port of the fastest of those implementations to Go.
The Challenge
The mission statement of the challenge reads:
c1 – Write a program that will print out the total number of lines in the file.
c2 – Notice that the 8th column contains a person’s name. Write a program that loads in this data and creates an array with all name strings. Print out the 432nd and 43243rd names.
c3 – Notice that the 5th column contains a form of date. Count how many donations occurred in each month and print out the results.
c4 – Notice that the 8th column contains a person’s name. Create an array with each first name. Identify the most common first name in the data and how many times it occurs.
(we label the tasks c1–c4 here to make them referenceable later)
The dataset to be processed is freely available data about “Campaign Finance Election Contributions by Individuals” provided by the U.S. Federal Election Commission; it is about 3.3GB in size uncompressed.
Baseline – 7 Iterations in Java
In his article, Marks takes the opportunity to analyse Niedringhaus’ Java implementation and “focus on changing aspects of the computation to improve runtime performance”, and also to “present some interesting opportunities for optimizations and use of newer APIs”. Overall, Marks establishes a performance baseline for his environment, transforms the given Java program using his knowledge of Java, and reduces the program’s runtime by a factor of about 3.4x, from 108 seconds down to 32 seconds, in the course of seven iterations.
After reading Marks’ article, I started to think about how to implement a solution to the challenge in Go and how far we could take it in terms of performance.
Before we start, we set ourselves a baseline by taking the existing implementations given by Niedringhaus and Marks and running them in our environment, a 2017 MacBook Pro (Model 14,3, i7-7920HQ) laptop.
Author | Implementation | Duration [s] | VM Version
---|---|---|---
Niedringhaus | Java – BufferedReader | 84 | openjdk version “11.0.2” 2019-01-15
Marks | Java – Variation7 | 23 | openjdk version “11.0.2” 2019-01-15
The dataset was taken on the 4th of January 2019 and has 18'245'416 lines of ASCII text data. Its SHA1 sum is: 0093f547b28f6c897c81bc3865b28a723a497847.
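If you want to verify a downloaded copy against these two figures, both can be recomputed in a single pass. A minimal sketch (the file name is assumed):

package main

import (
	"bufio"
	"crypto/sha1"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("itcont.txt") // file name assumed
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	h := sha1.New()
	r := bufio.NewReader(io.TeeReader(f, h)) // hash the bytes while counting lines
	lines := 0
	for {
		s, err := r.ReadString('\n')
		if len(s) > 0 {
			lines++ // also counts a last line without a trailing newline
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
	}
	fmt.Printf("lines: %d, sha1: %x\n", lines, h.Sum(nil))
}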
Check also Implementations of Others, which were done after Marks’ initial post and after I started with this article. Those articles are interesting for comparing runtimes, but also for the differences in methods, libraries and languages used in their implementations.
9 Revisions in Go
After setting up the baseline for performance comparisons, we port Marks’ fastest implementation and then go through nine revisions of Go code. During these revisions we’ll find interesting aspects that influence performance, in terms of runtime as well as memory consumption and behaviour.
I won’t hide that we’re driven by plain performance interest at first: making this thing as fast as possible. On the way down to low single-digit seconds we take different roads and techniques, only to get to a point where we “hit the wall” and gain some surprising insights into what’s happening there. Stay tuned!
Revision 0 – A Go Port
readfile0.go is a straightforward port of Variation7 from Java to Go. We use no external libraries throughout this article. For certain constructs, like Java’s Stream utilities, we have no direct library or utility at all, so we have to implement them using Go’s builtin types or the standard library.
For this challenge the given solution can be divided into three sections
- opening the file and defining variables and lists
- reading the file and processing it line by line
- calculating and reporting what was found
In the first section, Lines 17–28, we define all lists and variables needed to collect data in section 2. Then we use Go’s bufio.Scanner to read and tokenize the given file line by line. We also use a regexp here to parse first names in the 8th column of each line.
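As a quick standalone illustration (the sample name is assumed), this is what the pattern extracts from a name in “LAST, FIRST MIDDLE” form:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Same pattern as in readfile0.go: capture the first token after the comma.
	firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")

	if m := firstNamePat.FindAllStringSubmatch("PEREZ, JOHN A", 1); len(m) > 0 {
		fmt.Println(m[0][1]) // prints: JOHN
	}
}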
The 2nd section, Lines 31–48, first parses the whole name from column 8 using strings.SplitN and then appends it to the names list. If we find a first name with the regexp, it is also collected into the firstNames list. Last, we take the 5th column, parse it as a year-month string (“2017-10”), and append it to the dates list.
After all lines are processed, the third section, Lines 50–83, analyses and reports according to the challenge’s tasks (c1–c4). Reporting c1, the total number of lines of the processed file, is easy. Then, as c2 states, we have to report three names at the indexes 0, 432 and 43243. Next, a frequency table of the donations has to be built per month over the years.
The Java implementation uses a sequence of stream-processing utilities (java.util.stream.Collectors.groupingBy(…)) that we don’t have in Go to get the donation date frequencies. Instead, we use a map of strings to ints, count up donation date occurrences by iterating over the dates list, and report them according to c3. Last, for c4, the most common donator’s first name has to be reported. Similar to the donation dates, we use a map of strings to integers and iterate over the first names found in section two. During that process we keep track of the most common name and its count, which then gets reported.
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "path/filepath"
9 "regexp"
10 "strings"
11 "time"
12)
13
14func main() {
15 start := time.Now()
16
17 file, err := os.Open(os.Args[1])
18 if err != nil {
19 log.Fatal(err)
20 }
21 defer file.Close()
22
23 firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
24 names := make([]string, 0)
25 firstNames := make([]string, 0)
26 dates := make([]string, 0)
27 commonName := ""
28 commonCount := 0
29
30 scanner := bufio.NewScanner(file)
31 for scanner.Scan() {
32 text := scanner.Text()
33
34 // get all the names
35 split := strings.SplitN(text, "|", 9) // 10.95
36 name := strings.TrimSpace(split[7])
37 names = append(names, name)
38
39 // extract first names
40 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
41 firstNames = append(firstNames, matches[0][1])
42 }
43
44 // extract dates
45 chars := strings.TrimSpace(split[4])[:6]
46 date := chars[:4] + "-" + chars[4:6]
47 dates = append(dates, date)
48 }
49
50 // report c2: names at index
51 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
52 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
53 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
54 fmt.Printf("Name time: %v\n", time.Since(start))
55
56 // report c1: total number of lines
57 fmt.Printf("Total file line count: %v\n", len(names))
58 fmt.Printf("Line count time: : %v\n", time.Since(start))
59
60 // report c3: donation frequency
61 dateMap := make(map[string]int)
62 for _, date := range dates {
63 dateMap[date] += 1
64 }
65 for k, v := range dateMap {
66 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
67 }
68 fmt.Printf("Donations time: : %v\n", time.Since(start))
69
70 // report c4: most common firstName
71 nameMap := make(map[string]int)
72 ncount := 0 // new count
73 for _, name := range firstNames {
74 ncount = nameMap[name] + 1
75 nameMap[name] = ncount
76 if ncount > commonCount {
77 commonName = name
78 commonCount = ncount
79 }
80 }
81
82 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
83 fmt.Printf("Most common name time: %v\n", time.Since(start))
84 fmt.Fprintf(os.Stderr, "revision: %v, runtime: %v\n", filepath.Base(os.Args[0]), time.Since(start))
85}
17 file, err := os.Open(os.Args[1])
18 if err != nil {
19 log.Fatal(err)
20 }
21 defer file.Close()
22
23 firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
24 names := make([]string, 0)
25 firstNames := make([]string, 0)
26 dates := make([]string, 0)
27 commonName := ""
28 commonCount := 0
31 for scanner.Scan() {
32 text := scanner.Text()
33
34 // get all the names
35 split := strings.SplitN(text, "|", 9) // 10.95
36 name := strings.TrimSpace(split[7])
37 names = append(names, name)
38
39 // extract first names
40 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
41 firstNames = append(firstNames, matches[0][1])
42 }
43
44 // extract dates
45 chars := strings.TrimSpace(split[4])[:6]
46 date := chars[:4] + "-" + chars[4:6]
47 dates = append(dates, date)
48 }
50 // report c2: names at index
51 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
52 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
53 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
54 fmt.Printf("Name time: %v\n", time.Since(start))
55
56 // report c1: total number of lines
57 fmt.Printf("Total file line count: %v\n", len(names))
58 fmt.Printf("Line count time: : %v\n", time.Since(start))
59
60 // report c3: donation frequency
61 dateMap := make(map[string]int)
62 for _, date := range dates {
63 dateMap[date] += 1
64 }
65 for k, v := range dateMap {
66 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
67 }
68 fmt.Printf("Donations time: : %v\n", time.Since(start))
69
70 // report c4: most common firstName
71 nameMap := make(map[string]int)
72 ncount := 0 // new count
73 for _, name := range firstNames {
74 ncount = nameMap[name] + 1
75 nameMap[name] = ncount
76 if ncount > commonCount {
77 commonName = name
78 commonCount = ncount
79 }
80 }
81
82 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
83 fmt.Printf("Most common name time: %v\n", time.Since(start))
Source code is presented from now on always in its entire length, accompanied by snippets of lines and also a diff file. Sometimes, and where applicable, lines are highlighted. Unfortunately the highlighter I use here has a bug, and Go code sometimes comes out slightly wrongly indented, especially after line comments.
Revision 0 – Performance
Let’s run this first revision and see how it performs.
$ go run rev0/readfile0.go itcont.txt
Most common name time: 24.976828s
$
Just below 25 seconds, that’s interesting! Let’s see how the output compares to Variation7.
1Name: PEREZ, JOHN A at index: 0
2Name: MILESKI, JOHN A at index: 432
3Name: COX, JOHN MARTIN at index: 43243
4Name time: 29.957748306s
5Total file line count: 18245416
6Line count time: 29.957770809s
7Donations per month and year: 2018-10 and donation count: 2615902
8Donations per month and year: 2018-04 and donation count: 3033610
9Donations per month and year: 2018-06 and donation count: 496569
10Donations per month and year: 2018-02 and donation count: 765317
11Donations per month and year: 2017-08 and donation count: 298740
12Donations per month and year: 2017-12 and donation count: 486447
13Donations per month and year: 2018-07 and donation count: 1105070
14Donations per month and year: 2018-09 and donation count: 1264597
15Donations per month and year: 2018-11 and donation count: 302025
16Donations per month and year: 2017-02 and donation count: 36367
17Donations per month and year: 2017-03 and donation count: 78664
18Donations per month and year: 2017-10 and donation count: 585708
19Donations per month and year: 2018-05 and donation count: 533304
20Donations per month and year: 2018-01 and donation count: 1252689
21Donations per month and year: 2017-07 and donation count: 768051
22Donations per month and year: 2017-09 and donation count: 377544
23Donations per month and year: 2018-03 and donation count: 271267
24Donations per month and year: 2017-06 and donation count: 180699
25Donations per month and year: 2017-11 and donation count: 346019
26Donations per month and year: 2017-01 and donation count: 21
27Donations per month and year: 2017-05 and donation count: 197636
28Donations per month and year: 2017-04 and donation count: 254810
29Donations per month and year: 2018-12 and donation count: 2259957
30Donations per month and year: 2018-08 and donation count: 733865
31Donations per month and year: 2019-01 and donation count: 538
32Donations time: 30.339466074s
33The most common first name is: JOHN and it occurs: 475350 times.
34Most common name time: 33.686980274s
1Name: PEREZ, JOHN A at index: 0
2Name: MILESKI, JOHN A at index: 432
3Name: COX, JOHN MARTIN at index: 43243
4Name time: 22136ms
5Total file line count: 18245416
6Line count time: 22152ms
7Donations per month and year: 2018-04 and donation count: 3033610
8Donations per month and year: 2018-05 and donation count: 533304
9Donations per month and year: 2018-06 and donation count: 496569
10Donations per month and year: 2018-07 and donation count: 1105070
11Donations per month and year: 2017-10 and donation count: 585708
12Donations per month and year: 2018-01 and donation count: 1252689
13Donations per month and year: 2017-11 and donation count: 346019
14Donations per month and year: 2018-02 and donation count: 765317
15Donations per month and year: 2017-12 and donation count: 486447
16Donations per month and year: 2018-03 and donation count: 271267
17Donations per month and year: 2018-08 and donation count: 733865
18Donations per month and year: 2018-09 and donation count: 1264597
19Donations per month and year: 2017-03 and donation count: 78664
20Donations per month and year: 2017-04 and donation count: 254810
21Donations per month and year: 2017-05 and donation count: 197636
22Donations per month and year: 2017-06 and donation count: 180699
23Donations per month and year: 2018-11 and donation count: 302025
24Donations per month and year: 2019-01 and donation count: 538
25Donations per month and year: 2018-12 and donation count: 2259957
26Donations per month and year: 2017-01 and donation count: 21
27Donations per month and year: 2017-02 and donation count: 36367
28Donations per month and year: 2018-10 and donation count: 2615902
29Donations per month and year: 2017-07 and donation count: 768051
30Donations per month and year: 2017-08 and donation count: 298740
31Donations per month and year: 2017-09 and donation count: 377544
32Donations time: 22781ms
33The most common first name is: JOHN and it occurs: 475350 times.
34Most common name time: 23881ms
We get the same output regarding format and content. And with 25 seconds we’re about 8% slower than Variation7.
Let’s close this section on the first revision for now. We will go through several steps of optimization (and sometimes non-optimization) and try to get as low as possible from a performance perspective.
At the end I’ll present performance data for all revisions in detail. For each revision, its best runtime will be noted so we get an indication of where we are. Besides plain runtime, we’ll also inspect the total memory allocation as well as the allocated heap and system memory.
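As a side note on how such figures can be obtained: Go exposes them via runtime.MemStats. A minimal sketch, assuming we sample at the end of a run (the measurement setup for the tables later on may differ):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("total allocated: %v bytes\n", m.TotalAlloc) // cumulative bytes allocated for heap objects
	fmt.Printf("heap in use:     %v bytes\n", m.HeapAlloc)  // bytes of allocated, not yet freed heap objects
	fmt.Printf("from system:     %v bytes\n", m.Sys)        // total bytes obtained from the OS
}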
A Plan to Go Forward
With Revision 0 we are 8% slower than the runtime of Variation7. One can say that readfile0.go is a fairly simple and readable Go program. It is implemented in one function, like its predecessors, and has a simple linear, non-parallel structure. There is no explicit parallelism or concurrency going on during its runtime (actually the GC is doing its work concurrently, but that’s out of our control for now).
Concurrency in Go – When Go entered the attention of the programming language and computing communities, it was presented as a simple language oriented towards programming efficiency. It also had features that prepared it for the then-coming multi-core environments. Coming from Google, and with the concepts of goroutines and channels baked right into the language (inspired by Tony Hoare’s Communicating Sequential Processes, CSP), it was ready to be a language for a network-computing oriented audience (the term “cloud” had not been coined yet). Concurrency is and was a thing for Go, not parallelism. Concurrency in Go is more a tool for structural composition than for doing stuff in parallel, even if it can be used to do that.
Why talk about concurrency and parallelism here? Go newcomers sometimes reach for Go’s goroutines and channels just to get better performance. Programmers in other languages do the same and “go parallel” using multi-threading to distribute work to their multicore CPUs. There is nothing wrong with doing more work at the same time, but it can lead to other problems when the communication overhead kicks in and limits the intended performance gain. With that written, we will now go ahead and somewhat naively apply Go’s concurrency features to our problem; and see what happens.
Revision 1 – A Naive Performance Improvement
If we think about the structure of our solution to the challenge, where could we improve performance? We read lines of data, process them, store them in lists and then process those to produce a report. So if we try to run these parts of our program in parallel, why not use Go’s goroutines and let them communicate through channels? This is what we try to do with Revision 1 – readfile1.go:
- start a goroutine reading from the three channels namesC, firstNamesC, datesC and appending to the lists
- for each line, start a goroutine that parses 3 fields and sends them over one of these three channels
- wait until all goroutines are finished
- report according to the challenge’s rules (c1-c4)
After reading a line of the dataset, we start a goroutine (Lines 59-76) where we parse the three fields we’re interested in (name, first name, donation date) and send those fields over to a concurrently running goroutine (Lines 34-54) which collects those fields and appends them to their corresponding lists.
59 go func() {
60 // get all the names
61 split := strings.SplitN(text, "|", 9)
62 name := strings.TrimSpace(split[7])
63 namesC <- name
64
65 // extract first names
66 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
67 firstNamesC <- matches[0][1]
68 } else {
69 wg.Add(-1)
70 }
71
72 // extract dates
73 chars := strings.TrimSpace(split[4])[:6]
74 date := chars[:4] + "-" + chars[4:6]
75 datesC <- date
76 }()
34 go func() {
35 for {
36 select {
37 case n, ok := <-namesC:
38 if ok {
39 names = append(names, n)
40 wg.Done()
41 }
42 case fn, ok := <-firstNamesC:
43 if ok {
44 firstNames = append(firstNames, fn)
45 wg.Done()
46 }
47 case d, ok := <-datesC:
48 if ok {
49 dates = append(dates, d)
50 wg.Done()
51 }
52 }
53 }
54 }()
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "regexp"
9 "strings"
10 "sync"
11 "time"
12)
13
14func main() {
15 start := time.Now()
16 file, err := os.Open(os.Args[1])
17 if err != nil {
18 log.Fatal(err)
19 }
20 defer file.Close()
21
22 firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
23 names := make([]string, 0)
24 firstNames := make([]string, 0)
25 dates := make([]string, 0)
26 commonName := ""
27 commonCount := 0
28 scanner := bufio.NewScanner(file)
29 namesC := make(chan string)
30 firstNamesC := make(chan string)
31 datesC := make(chan string)
32 wg := sync.WaitGroup{}
33
34 go func() {
35 for {
36 select {
37 case n, ok := <-namesC:
38 if ok {
39 names = append(names, n)
40 wg.Done()
41 }
42 case fn, ok := <-firstNamesC:
43 if ok {
44 firstNames = append(firstNames, fn)
45 wg.Done()
46 }
47 case d, ok := <-datesC:
48 if ok {
49 dates = append(dates, d)
50 wg.Done()
51 }
52 }
53 }
54 }()
55
56 for scanner.Scan() {
57 text := scanner.Text()
58 wg.Add(3)
59 go func() {
60 // get all the names
61 split := strings.SplitN(text, "|", 9)
62 name := strings.TrimSpace(split[7])
63 namesC <- name
64
65 // extract first names
66 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
67 firstNamesC <- matches[0][1]
68 } else {
69 wg.Add(-1)
70 }
71
72 // extract dates
73 chars := strings.TrimSpace(split[4])[:6]
74 date := chars[:4] + "-" + chars[4:6]
75 datesC <- date
76 }()
77 }
78 wg.Wait()
79 close(namesC)
80 close(firstNamesC)
81 close(datesC)
82
83 // report c2: names at index
84 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
85 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
86 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
87 fmt.Printf("Name time: %v\n", time.Since(start))
88
89 // report c1: total number of lines
90 fmt.Printf("Total file line count: %v\n", len(names))
91 fmt.Printf("Line count time: %v\n", time.Since(start))
92
93 // report c3: donation frequency
94 dateMap := make(map[string]int)
95 for _, date := range dates {
96 dateMap[date] += 1
97 }
98 for k, v := range dateMap {
99 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
100 }
101 fmt.Printf("Donations time: %v\n", time.Since(start))
102
103 // report c4: most common firstName
104 nameMap := make(map[string]int)
105 nameCount := 0 // new count
106 for _, name := range firstNames {
107 nameCount = nameMap[name] + 1
108 nameMap[name] = nameCount
109 if nameCount > commonCount {
110 commonName = name
111 commonCount = nameCount
112 }
113 }
114 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
115 fmt.Printf("Most common name time: %v\n", time.Since(start))
116}
1--- rev0/readfile0.go 2019-03-30 01:35:01.000000000 +0100
2+++ rev1/readfile1.go 2019-03-30 01:29:26.000000000 +0100
3@@ -7,6 +7,7 @@
4 "os"
5 "regexp"
6 "strings"
7+ "sync"
8 "time"
9 )
10
11@@ -24,26 +25,60 @@
12 dates := make([]string, 0)
13 commonName := ""
14 commonCount := 0
15-
16 scanner := bufio.NewScanner(file)
17- for scanner.Scan() {
18- text := scanner.Text()
19-
20- // get all the names
21- split := strings.SplitN(text, "|", 9)
22- name := strings.TrimSpace(split[7])
23- names = append(names, name)
24-
25- // extract first names
26- if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
27- firstNames = append(firstNames, matches[0][1])
28+ namesC := make(chan string)
29+ firstNamesC := make(chan string)
30+ datesC := make(chan string)
31+ wg := sync.WaitGroup{}
32+
33+ go func() {
34+ for {
35+ select {
36+ case n, ok := <-namesC:
37+ if ok {
38+ names = append(names, n)
39+ wg.Done()
40+ }
41+ case fn, ok := <-firstNamesC:
42+ if ok {
43+ firstNames = append(firstNames, fn)
44+ wg.Done()
45+ }
46+ case d, ok := <-datesC:
47+ if ok {
48+ dates = append(dates, d)
49+ wg.Done()
50+ }
51+ }
52 }
53+ }()
54
55- // extract dates
56- chars := strings.TrimSpace(split[4])[:6]
57- date := chars[:4] + "-" + chars[4:6]
58- dates = append(dates, date)
59+ for scanner.Scan() {
60+ text := scanner.Text()
61+ wg.Add(3)
62+ go func() {
63+ // get all the names
64+ split := strings.SplitN(text, "|", 9)
65+ name := strings.TrimSpace(split[7])
66+ namesC <- name
67+
68+ // extract first names
69+ if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
70+ firstNamesC <- matches[0][1]
71+ } else {
72+ wg.Add(-1)
73+ }
74+
75+ // extract dates
76+ chars := strings.TrimSpace(split[4])[:6]
77+ date := chars[:4] + "-" + chars[4:6]
78+ datesC <- date
79+ }()
80 }
81+ wg.Wait()
82+ close(namesC)
83+ close(firstNamesC)
84+ close(datesC)
85
86 // report c2: names at index
87 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
We decouple the part where we read lines of text from the parts where these lines are processed and the data is collected. The decoupling is done by issuing goroutines and connecting them back using channels. When laid out onto multiple CPU cores, the following could happen when reading lines and processing them concurrently on other cores:
CPU-0: [read line1][read line2][read line3]
CPU-1: [process line1] [process line3]
CPU-2: [process line2]
which is obviously more efficient than
CPU-0: [read line1][process line1][read line2][process line2][read line3][process line3]
Let’s run Revision 1 and see how it performs.
Performance
Running Revision 1 we get
$ go run rev1/readfile1.go itcont.txt
Most common name time: 732s
$
That’s over 12 minutes! I ran it a 2nd time and it took even longer. What is happening here?
We fired off one goroutine that receives names, first names and dates from three channels, and then, for every line of text parsed, we spun off another goroutine that sends over the name, first name and date. That’s over 18 million goroutines feeding one other goroutine through three channels, and therefore up to 3 x 18 million = 54 million messages over those three channels.
send name -> [ nameC ] --\
send firstName -> [ firstNameC ] --- -> select() one -> [ append one to a list ]
send date -> [ datesC ] --/
While we ran Revision 1, the CPU cores were highly saturated with “work”. For every line we fire up a goroutine that then gets scheduled and distributed over all available cores. But the overhead of communication, in relation to the small amount of work of parsing a line into three data fields (name, first name, date), is high, and we end up waiting over 12 minutes for completion.
There is one detail about channels that also plays against us. Channels, at least unbuffered ones (capacity 0), block when messages are sent to them: a send blocks until the receiving end reads the message. The for-select construct seen here is a well-known pattern to select over multiple channels and receive from whichever of them is ready.
This way, the sending goroutines have to wait until the draining goroutine receives their message, be it a name, a first name or a date. This produces a substantial amount of contention.
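To make that blocking behaviour concrete, here is a minimal standalone sketch (not part of any revision): a send on an unbuffered channel does not complete until a receiver takes the value.

package main

import (
	"fmt"
	"time"
)

func main() {
	c := make(chan string) // unbuffered: capacity 0

	go func() {
		time.Sleep(time.Second) // simulate a receiver that is busy elsewhere
		<-c                     // receiving is what releases the sender
	}()

	start := time.Now()
	c <- "hello" // blocks here until the goroutine above receives
	fmt.Printf("send took %v\n", time.Since(start)) // roughly one second
}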
Go Tracer
Let’s run Revision 1 with Go’s tracer enabled and see what happens with all those goroutines.
$ ./bin/rev1 data/indiv18/itcont_1e6.txt 2> rev1.trace
Most common name time: 9.761422905s
$ go tool trace rev1.trace
2019/03/21 22:15:28 Parsing trace...
2019/03/21 22:16:29 Splitting trace...
2019/03/21 22:18:47 Opening browser. Trace viewer is listening on http://127.0.0.1:58572
$
To shorten the runtime a bit, we run a subset of 1 million lines of the full dataset while tracing what happens in this revision. It runs for just under 10 seconds. If we extrapolated that to the full dataset, we would expect about 18 x 10s = 180s, but it seems that over the whole dataset the contention we experience gets a whole lot worse.
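Note that the trace data ends up in rev1.trace via the 2> stderr redirection above, which suggests the binary starts the execution tracer itself. A minimal sketch of how this can be done with the standard runtime/trace package; treat this as an assumption about rev1’s actual instrumentation:

package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	// Write the execution trace to stderr so it can be redirected
	// into a file: ./bin/rev1 data/indiv18/itcont_1e6.txt 2> rev1.trace
	if err := trace.Start(os.Stderr); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... the actual line processing from readfile1.go goes here ...
}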
We inspect the first 10 seconds, during which a million lines of data are processed and about 3 million messages are sent over the three channels. We can’t display the whole tracing time in the Go tracer as it has too many data points; the browser used to visualise it would just blow up in memory. Instead, the tracer provides links to consecutive segments of about 100-300ms, depending on the trace’s density.
What we see in these tracer logs are two long-lived goroutines, one reading lines and starting goroutines and one receiving messages, plus a million line-parsing goroutines that get split up into three pieces whenever they send a message to a channel. When we pick one of the ~120ms segments at the beginning of the trace, we see exactly that. The goroutine G1 issues syscalls to read the file and then starts the line-processing goroutines. Then there is a goroutine G6 that represents the for-select loop draining messages from one of these channels and appending the data to one of the lists.
Picking a tracer segment from the middle or end of the trace log, we see that the file-reading goroutine is not running anymore and G6 is just there to pick up messages from all the waiting goroutines (G6 waits in runtime.selectgo()). All CPU cores are saturated, mostly with waiting to hand over messages from these waiting goroutines. This then sums up to the 12 minutes we measured. (For a more detailed description of the tracer, see Appendix B4.)
If we are right and senders have to wait for the receiving end to read their messages, we should also find that goroutines are split up into three pieces most of the time. They split whenever they send a message to one of the three channels, because Go schedules goroutines at points where it is very likely that a goroutine will block and another goroutine could be scheduled to run.
62 go func() {
63 // get all the names
64 split := strings.SplitN(text, "|", 9) // 10.95
65 name := strings.TrimSpace(split[7])
66 namesC <- name
67
68 // extract first names
69 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
70 firstNamesC <- matches[0][1]
71 } else {
72 wg.Add(-1)
73 }
74
75 // extract dates
76 chars := strings.TrimSpace(split[4])[:6]
77 date := chars[:4] + "-" + chars[4:6]
78 datesC <- date
79 }()
At each of these send points, Go parks the goroutine if the receiving end is not ready to receive the value. And there is a high probability that the receiving end is not ready for our very current value: other goroutines are doing the same, and the one receiving goroutine G6 has to select among all the other waiting goroutines.
Revision 1 – Conclusion
I think it was a silly idea to kick off a goroutine for every line we process, and even sillier to then send three times that number of messages over three channels to another goroutine. But even if silly, it’s an instructive example of what not to do. And for that, it was worthwhile and educational to see what happens.
Revision 2 – Channels Reduced
As we learned quite quickly, blindly applying Go’s concurrency utilities to a problem does not help. With Revision 2 – readfile2.go – we try to reduce the overhead of communication over the three channels. We introduce a type named entry that combines the name, first name and donation date into a struct. We also embed a reference to a sync.WaitGroup (https://golang.org/pkg/sync/#WaitGroup), which gives us some bookkeeping of where we are with our concurrent processing of such entries (Lines 30-37).
Next, we allocate a slice of strings to buffer a chunk of lines and then process them using a goroutine. We try to find a good chunk size for this and select 64k lines. So instead of starting a new goroutine for every line, we do that for every 64k lines. Each line is then parsed into an entry struct and sent to a collecting goroutine over a single channel (Lines 55-85).
30 type entry struct {
31 firstName string
32 name string
33 date string
34 wg *sync.WaitGroup
35 }
36 entries := make(chan entry)
37 wg := sync.WaitGroup{}
55 linesChunkLen := 64 * 1024
56 lines := make([]string, 0, 0)
57 for scanner.Scan() {
58 line := scanner.Text()
59 lines = append(lines, line)
60 if len(lines) == linesChunkLen {
61 wg.Add(len(lines))
62 process := lines
63 go func() {
64 for _, text := range process {
65 // get all the names
66 e := entry{wg: &wg}
67 split := strings.SplitN(text, "|", 9)
68 name := strings.TrimSpace(split[7])
69 e.name = name
70
71 // extract first names
72 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
73 e.firstName = matches[0][1]
74 }
75 // extract dates
76 chars := strings.TrimSpace(split[4])[:6]
77 e.date = chars[:4] + "-" + chars[4:6]
78 entries <- e
79 }
80 }()
 81                lines = make([]string, 0, linesChunkLen)
82 }
83 }
84 wg.Wait()
85 close(entries)
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "regexp"
9 "strings"
10 "sync"
11 "time"
12)
13
14func main() {
15 start := time.Now()
16 file, err := os.Open(os.Args[1])
17 if err != nil {
18 log.Fatal(err)
19 }
20 defer file.Close()
21
22 firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
23 names := make([]string, 0)
24 firstNames := make([]string, 0)
25 dates := make([]string, 0)
26 commonName := ""
27 commonCount := 0
28 scanner := bufio.NewScanner(file)
29
30 type entry struct {
31 firstName string
32 name string
33 date string
34 wg *sync.WaitGroup
35 }
36 entries := make(chan entry)
37 wg := sync.WaitGroup{}
38
39 go func() {
40 for {
41 select {
42 case entry, ok := <-entries:
43 if ok {
44 if entry.firstName != "" {
45 firstNames = append(firstNames, entry.firstName)
46 }
47 names = append(names, entry.name)
48 dates = append(dates, entry.date)
49 entry.wg.Done()
50 }
51 }
52 }
53 }()
54
55 linesChunkLen := 64 * 1024
56 lines := make([]string, 0, 0)
57 for scanner.Scan() {
58 line := scanner.Text()
59 lines = append(lines, line)
60 if len(lines) == linesChunkLen {
61 wg.Add(len(lines))
62 process := lines
63 go func() {
64 for _, text := range process {
65 // get all the names
66 e := entry{wg: &wg}
67 split := strings.SplitN(text, "|", 9)
68 name := strings.TrimSpace(split[7])
69 e.name = name
70
71 // extract first names
72 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
73 e.firstName = matches[0][1]
74 }
75 // extract dates
76 chars := strings.TrimSpace(split[4])[:6]
77 e.date = chars[:4] + "-" + chars[4:6]
78 entries <- e
79 }
80 }()
 81                lines = make([]string, 0, linesChunkLen)
82 }
83 }
84 wg.Wait()
85 close(entries)
86
87 // report c2: names at index
88 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
89 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
90 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
91 fmt.Printf("Name time: %v\n", time.Since(start))
92
93 // report c1: total number of lines
94 fmt.Printf("Total file line count: %v\n", len(names))
95 fmt.Printf("Line count time: %v\n", time.Since(start))
96
97 // report c3: donation frequency
98 dateMap := make(map[string]int)
99 for _, date := range dates {
100 dateMap[date] += 1
101 }
102 for k, v := range dateMap {
103 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
104 }
105 fmt.Printf("Donations time: %v\n", time.Since(start))
106
107 /* report c4: most common firstName */
108 nameMap := make(map[string]int)
109 nameCount := 0
110 for _, name := range firstNames {
111 nameCount = nameMap[name] + 1
112 nameMap[name] = nameCount
113 if nameCount > commonCount {
114 commonName = name
115 commonCount = nameCount
116 }
117 }
118 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
119 fmt.Printf("Most common name time: %v\n", time.Since(start))
120}
1--- rev1/readfile1.go 2019-03-30 17:23:07.000000000 +0100
2+++ rev2/readfile2.go 2019-03-30 17:24:00.000000000 +0100
3@@ -26,59 +26,63 @@
4 commonName := ""
5 commonCount := 0
6 scanner := bufio.NewScanner(file)
7- namesC := make(chan string)
8- firstNamesC := make(chan string)
9- datesC := make(chan string)
10+
11+ type entry struct {
12+ firstName string
13+ name string
14+ date string
15+ wg *sync.WaitGroup
16+ }
17+ entries := make(chan entry)
18 wg := sync.WaitGroup{}
19
20 go func() {
21 for {
22 select {
23- case n, ok := <-namesC:
24- if ok {
25- names = append(names, n)
26- wg.Done()
27- }
28- case fn, ok := <-firstNamesC:
29- if ok {
30- firstNames = append(firstNames, fn)
31- wg.Done()
32- }
33- case d, ok := <-datesC:
34+ case entry, ok := <-entries:
35 if ok {
36- dates = append(dates, d)
37- wg.Done()
38+ if entry.firstName != "" {
39+ firstNames = append(firstNames, entry.firstName)
40+ }
41+ names = append(names, entry.name)
42+ dates = append(dates, entry.date)
43+ entry.wg.Done()
44 }
45 }
46 }
47 }()
48
49+ linesChunkLen := 64 * 1024
50+ lines := make([]string, 0, 0)
51 for scanner.Scan() {
52- text := scanner.Text()
53- wg.Add(3)
54- go func() {
55- // get all the names
56- split := strings.SplitN(text, "|", 9)
57- name := strings.TrimSpace(split[7])
58- namesC <- name
59-
60- // extract first names
61- if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
62- firstNamesC <- matches[0][1]
63- } else {
64- wg.Add(-1)
65- }
66-
67- // extract dates
68- chars := strings.TrimSpace(split[4])[:6]
69- date := chars[:4] + "-" + chars[4:6]
70- datesC <- date
71- }()
72+ line := scanner.Text()
73+ lines = append(lines, line)
74+ if len(lines) == linesChunkLen {
75+ wg.Add(len(lines))
76+ process := lines
77+ go func() {
78+ for _, text := range process {
79+ // get all the names
80+ e := entry{wg: &wg}
81+ split := strings.SplitN(text, "|", 9)
82+ name := strings.TrimSpace(split[7])
83+ e.name = name
84+
85+ // extract first names
86+ if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
87+ e.firstName = matches[0][1]
88+ }
89+ // extract dates
90+ chars := strings.TrimSpace(split[4])[:6]
91+ e.date = chars[:4] + "-" + chars[4:6]
92+ entries <- e
93+ }
94+ }()
95+ lines = make([]string, 0, linesChunkLen)
96+ }
97 }
98 wg.Wait()
99- close(namesC)
100- close(firstNamesC)
101- close(datesC)
102+ close(entries)
103
104 // report c2: names at index
105 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
Performance
$ go run rev2/readfile2.go itcont.txt
Most common name time: 18.28508803s
$
With 18.2 seconds for this revision, we’re 6.8 seconds below our first revision. That’s a whopping -27%, or 1.36x.
The trace log shows what we might expect. A goroutine reads lines of data and hands them over in chunks, where they are parsed; for each line an entry of our three data fields is composed and then, entry by entry, sent over a channel to a collecting goroutine. We’re now concurrent, but the CPU cores are still not really working for us.
Revision 3 – Multiple Entries to Send
With readfile3.go, let’s reduce the overhead of communication over channels even further: for every 64k chunk of lines, we collect the same number of entries into a slice and send that over the entries channel entriesC. Our channel is now a channel of slices of entries (Lines 30-83).
30 type entry struct {
31 firstName string
32 name string
33 date string
34 wg *sync.WaitGroup
35 }
36 entriesC := make(chan []entry)
37 wg := sync.WaitGroup{}
38 go func() {
39 for {
40 select {
41 case entries, ok := <-entriesC:
42 if ok {
43 for _, entry := range entries {
44 if entry.firstName != "" {
45 firstNames = append(firstNames, entry.firstName)
46 }
47 names = append(names, entry.name)
48 dates = append(dates, entry.date)
49 entry.wg.Done()
50 }
51 }
52 }
53 }
54 }()
55 linesChunkLen := 64 * 1024
56 lines := make([]string, 0, 0)
57 scanner.Scan()
58 for {
59 lines = append(lines, scanner.Text())
60 willScan := scanner.Scan()
61 if len(lines) == linesChunkLen || !willScan {
62 toProcess := lines
63 wg.Add(len(toProcess))
64 go func() {
65 entries := make([]entry, 0, len(toProcess))
66 for _, text := range toProcess {
67 // get all the names
68 e := entry{wg: &wg}
69 split := strings.SplitN(text, "|", 9)
70 name := strings.TrimSpace(split[7])
71 e.name = name
72 // extract first names
73 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
74 e.firstName = matches[0][1]
75 }
76 // extract dates
77 chars := strings.TrimSpace(split[4])[:6]
78 e.date = chars[:4] + "-" + chars[4:6]
79 entries = append(entries, e)
80 }
81 entriesC <- entries
82 }()
83 lines = make([]string, 0, linesChunkLen)
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "regexp"
9 "strings"
10 "sync"
11 "time"
12)
13
14func main() {
15 start := time.Now()
16 file, err := os.Open(os.Args[1])
17 if err != nil {
18 log.Fatal(err)
19 }
20 defer file.Close()
21
22 firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
23 names := make([]string, 0)
24 firstNames := make([]string, 0)
25 dates := make([]string, 0)
26 commonName := ""
27 commonCount := 0
28 scanner := bufio.NewScanner(file)
29
30 type entry struct {
31 firstName string
32 name string
33 date string
34 wg *sync.WaitGroup
35 }
36 entriesC := make(chan []entry)
37 wg := sync.WaitGroup{}
38
39 go func() {
40 for {
41 select {
42 case entries, ok := <-entriesC:
43 if ok {
44 for _, entry := range entries {
45 if entry.firstName != "" {
46 firstNames = append(firstNames, entry.firstName)
47 }
48 names = append(names, entry.name)
49 dates = append(dates, entry.date)
50 entry.wg.Done()
51 }
52 }
53 }
54 }
55 }()
56
57 linesChunkLen := 64 * 1024
58 lines := make([]string, 0, 0)
59 scanner.Scan()
60 for {
61 lines = append(lines, scanner.Text())
62 willScan := scanner.Scan()
63 if len(lines) == linesChunkLen || !willScan {
64 toProcess := lines
65 wg.Add(len(toProcess))
66 go func() {
67 entries := make([]entry, 0, len(toProcess))
68 for _, text := range toProcess {
69 // get all the names
70 entry := entry{wg: &wg}
71 split := strings.SplitN(text, "|", 9)
72 name := strings.TrimSpace(split[7])
73 entry.name = name
74
75 // extract first names
76 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
77 entry.firstName = matches[0][1]
78 }
79 // extract dates
80 chars := strings.TrimSpace(split[4])[:6]
81 entry.date = chars[:4] + "-" + chars[4:6]
82 entries = append(entries, entry)
83 }
84 entriesC <- entries
85 }()
86 lines = make([]string, 0, linesChunkLen)
87 }
88 if !willScan {
89 break
90 }
91 }
92 wg.Wait()
93 close(entriesC)
94
95 // report c2: names at index
96 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
97 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
98 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
99 fmt.Printf("Name time: %v\n", time.Since(start))
100
101 // report c1: total number of lines
102 fmt.Printf("Total file line count: %v\n", len(names))
103 fmt.Printf("Line count time: %v\n", time.Since(start))
104
105 // report c3: donation frequency
106 dateMap := make(map[string]int)
107 for _, date := range dates {
108 dateMap[date] += 1
109 }
110 for k, v := range dateMap {
111 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
112 }
113 fmt.Printf("Donations time: %v\n", time.Since(start))
114
115 // report c4: most common firstName
116 nameMap := make(map[string]int)
117 nameCount := 0 // new count
118 for _, name := range firstNames {
119 nameCount = nameMap[name] + 1
120 nameMap[name] = nameCount
121 if nameCount > commonCount {
122 commonName = name
123 commonCount = nameCount
124 }
125 }
126 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
127 fmt.Printf("Most common name time: %v\n", time.Since(start))
128}
1--- rev2/readfile2.go 2019-03-30 19:06:05.000000000 +0100
2+++ rev3/readfile3.go 2019-03-30 19:38:57.000000000 +0100
3@@ -33,20 +33,22 @@
4 date string
5 wg *sync.WaitGroup
6 }
7- entries := make(chan entry)
8+ entriesC := make(chan []entry)
9 wg := sync.WaitGroup{}
10
11 go func() {
12 for {
13 select {
14- case entry, ok := <-entries:
15+ case entries, ok := <-entriesC:
16 if ok {
17- if entry.firstName != "" {
18- firstNames = append(firstNames, entry.firstName)
19+ for _, entry := range entries {
20+ if entry.firstName != "" {
21+ firstNames = append(firstNames, entry.firstName)
22+ }
23+ names = append(names, entry.name)
24+ dates = append(dates, entry.date)
25+ entry.wg.Done()
26 }
27- names = append(names, entry.name)
28- dates = append(dates, entry.date)
29- entry.wg.Done()
30 }
31 }
32 }
33@@ -54,35 +56,41 @@
34
35 linesChunkLen := 64 * 1024
36 lines := make([]string, 0, 0)
37- for scanner.Scan() {
38- line := scanner.Text()
39- lines = append(lines, line)
40- if len(lines) == linesChunkLen {
41- wg.Add(len(lines))
42- process := lines
43+ scanner.Scan()
44+ for {
45+ lines = append(lines, scanner.Text())
46+ willScan := scanner.Scan()
47+ if len(lines) == linesChunkLen || !willScan {
48+ toProcess := lines
49+ wg.Add(len(toProcess))
50 go func() {
51- for _, text := range process {
52+ entries := make([]entry, 0, len(toProcess))
53+ for _, text := range toProcess {
54 // get all the names
55- e := entry{wg: &wg}
56+ entry := entry{wg: &wg}
57 split := strings.SplitN(text, "|", 9)
58 name := strings.TrimSpace(split[7])
59- e.name = name
60+ entry.name = name
61
62 // extract first names
63 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
64- e.firstName = matches[0][1]
65+ entry.firstName = matches[0][1]
66 }
67 // extract dates
68 chars := strings.TrimSpace(split[4])[:6]
69- e.date = chars[:4] + "-" + chars[4:6]
70- entries <- e
71+ entry.date = chars[:4] + "-" + chars[4:6]
72+ entries = append(entries, entry)
73 }
74+ entriesC <- entries
75 }()
76 lines = make([]string, 0, linesChunkLen)
77 }
78+ if !willScan {
79+ break
80+ }
81 }
82 wg.Wait()
83- close(entries)
84+ close(entriesC)
85
86 // report c2: names at index
87 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
88@@ -106,7 +114,7 @@
89
90 // report c4: most common firstName
91 nameMap := make(map[string]int)
92- nameCount := 0 // new count
93+ nameCount := 0
94 for _, name := range firstNames {
95 nameCount = nameMap[name] + 1
96 nameMap[name] = nameCount
97@@ -117,4 +125,4 @@
98 }
99 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
100 fmt.Printf("Most common name time: %v\n", time.Since(start))
101-}
102+}
103\ No newline at end of file
Performance
$ go run ./rev3/readfile3.go itcont.txt
Most common name time: 10.66832747s
$
Here we go: 10.66 seconds. We’re down another 8 seconds, or -42%, which is quite a lot. We see fewer goroutines running in general. The CPU cores are still well used, with 6-7 goroutines constantly in state Running. Revision 3 is what we planned to do: reading lines and handing them over in chunks to be processed on dedicated goroutines. With this plan, we got a reduction of about 60% from our port, Revision 0.
Revision 4 – Hang in on a Mutex
In this revision – readfile4.go – we still build up chunks of parsed lines as entries, but we don’t hand them over a channel to get them appended to our three lists. Instead, we inline the list appending into our loop and protect this code section with a sync.Mutex. Here we remove one instance of resource contention: instead of having chunks of entries sent over a channel and processed on the other end by a collecting goroutine, we wait, through a call to mutex.Lock(), until we can enter the section of code that appends the parsed entries to the three lists (Lines 65-74).
65 mutex.Lock()
66 for _, entry := range entries {
67 if entry.firstName != "" {
68 firstNames = append(firstNames, entry.firstName)
69 }
70 names = append(names, entry.name)
71 dates = append(dates, entry.date)
72 }
73 wg.Add(-len(entries))
74 mutex.Unlock()
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "regexp"
9 "strings"
10 "sync"
11 "time"
12)
13
14func main() {
15 start := time.Now()
16 file, err := os.Open(os.Args[1])
17 if err != nil {
18 log.Fatal(err)
19 }
20 defer file.Close()
21
22 firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
23 names := make([]string, 0)
24 firstNames := make([]string, 0)
25 dates := make([]string, 0)
26 commonName := ""
27 commonCount := 0
28 scanner := bufio.NewScanner(file)
29
30 type entry struct {
31 firstName string
32 name string
33 date string
34 }
35 mutex := sync.Mutex{}
36 wg := sync.WaitGroup{}
37
38 linesChunkLen := 64 * 1024
39 lines := make([]string, 0, 0)
40 scanner.Scan()
41 for {
42 lines = append(lines, scanner.Text())
43 willScan := scanner.Scan()
44 if len(lines) == linesChunkLen || !willScan {
45 toProcess := lines
46 wg.Add(len(toProcess))
47 go func() {
48 entries := make([]entry, 0, len(toProcess))
49 for _, text := range toProcess {
50 // get all the names
51 entry := entry{}
52 split := strings.SplitN(text, "|", 9)
53 name := strings.TrimSpace(split[7])
54 entry.name = name
55
56 // extract first names
57 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
58 entry.firstName = matches[0][1]
59 }
60 // extract dates
61 chars := strings.TrimSpace(split[4])[:6]
62 entry.date = chars[:4] + "-" + chars[4:6]
63 entries = append(entries, entry)
64 }
65 mutex.Lock()
66 for _, entry := range entries {
67 if entry.firstName != "" {
68 firstNames = append(firstNames, entry.firstName)
69 }
70 names = append(names, entry.name)
71 dates = append(dates, entry.date)
72 }
73 wg.Add(-len(entries))
74 mutex.Unlock()
75 }()
76 lines = make([]string, 0, linesChunkLen)
77 }
78 if !willScan {
79 break
80 }
81 }
82 wg.Wait()
83
84 // report c2: names at index
85 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
86 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
87 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
88 fmt.Printf("Name time: %v\n", time.Since(start))
89
90 // report c1: total number of lines
91 fmt.Printf("Total file line count: %v\n", len(names))
92 fmt.Printf("Line count time: %v\n", time.Since(start))
93
94 // report c3: donation frequency
95 dateMap := make(map[string]int)
96 for _, date := range dates {
97 dateMap[date] += 1
98 }
99 for k, v := range dateMap {
100 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
101 }
102 fmt.Printf("Donations time: %v\n", time.Since(start))
103
104 // report c4: most common firstName
105 nameMap := make(map[string]int)
106 nameCount := 0
107 for _, name := range firstNames {
108 nameCount = nameMap[name] + 1
109 nameMap[name] = nameCount
110 if nameCount > commonCount {
111 commonName = name
112 commonCount = nameCount
113 }
114 }
115 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
116 fmt.Printf("Most common name time: %v\n", time.Since(start))
117}
1--- rev3/readfile3.go 2019-03-30 19:38:57.000000000 +0100
2+++ rev4/readfile4.go 2019-03-30 19:38:28.000000000 +0100
3@@ -31,29 +31,10 @@
4 firstName string
5 name string
6 date string
7- wg *sync.WaitGroup
8 }
9- entriesC := make(chan []entry)
10+ mutex := sync.Mutex{}
11 wg := sync.WaitGroup{}
12
13- go func() {
14- for {
15- select {
16- case entries, ok := <-entriesC:
17- if ok {
18- for _, entry := range entries {
19- if entry.firstName != "" {
20- firstNames = append(firstNames, entry.firstName)
21- }
22- names = append(names, entry.name)
23- dates = append(dates, entry.date)
24- entry.wg.Done()
25- }
26- }
27- }
28- }
29- }()
30-
31 linesChunkLen := 64 * 1024
32 lines := make([]string, 0, 0)
33 scanner.Scan()
34@@ -67,7 +48,7 @@
35 entries := make([]entry, 0, len(toProcess))
36 for _, text := range toProcess {
37 // get all the names
38- entry := entry{wg: &wg}
39+ entry := entry{}
40 split := strings.SplitN(text, "|", 9)
41 name := strings.TrimSpace(split[7])
42 entry.name = name
43@@ -81,7 +62,16 @@
44 entry.date = chars[:4] + "-" + chars[4:6]
45 entries = append(entries, entry)
46 }
47- entriesC <- entries
48+ mutex.Lock()
49+ for _, entry := range entries {
50+ if entry.firstName != "" {
51+ firstNames = append(firstNames, entry.firstName)
52+ }
53+ names = append(names, entry.name)
54+ dates = append(dates, entry.date)
55+ }
56+ wg.Add(-len(entries))
57+ mutex.Unlock()
58 }()
59 lines = make([]string, 0, linesChunkLen)
60 }
61@@ -90,7 +80,6 @@
62 }
63 }
64 wg.Wait()
65- close(entriesC)
66
67 // report c2: names at index
68 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
As we can see here, we don’t need a channel to hand our data over for collection. Our goroutine can wait, with its collected data, on a mutex until the mutex is free. The goroutine is just parked in memory until it is scheduled to proceed; Go is well prepared for such a scenario, as explained in sync/mutex.go. Instead of the data waiting to get drained from a channel and then appended to the lists, our processing goroutine waits until it can append the parsed data to those lists itself.
Performance
$ go run ./rev4/readfile4.go itcont.txt
Most common name time: 10.31209465s
$
The CPU cores are still well used, as we have about 6 goroutines running at a time. With this revision, the compositional structure that the explicit channel gave the code is gone. We didn’t gain much, but we keep this revision’s changes for now.
Revision 5 – Inside the Loop
When we inspect the run times, we can see that different sections take different amounts of time.
Donations per month and year: 2018-08 and donation count: 733865
Donations per month and year: 2018-10 and donation count: 2615902
Donations per month and year: 2017-05 and donation count: 197636
Donations time: : 10.371920253s
The most common first name is: JOHN and it occurs: 475350 times.
Most common name time: 12.296942248s
Here, between the donations time and the common name time, we take about 2 seconds to range over the firstNames slice and find the most common name. Let’s try to bring both of them, the donation frequency table and the common name counting, into our parsing loop. So after we have parsed our lines into entries, while appending to the lists, we update the date frequencies and the common name counting right inside the loop (Lines 68-81). This way we don’t have to loop over those two lists afterwards and should gain some seconds – readfile5.go
68 for _, entry := range entries {
69 if entry.firstName != "" {
70 firstNames = append(firstNames, entry.firstName)
71 nameCount := nameMap[entry.firstName]
72 nameMap[entry.firstName] = nameCount + 1
73 if nameCount+1 > commonCount {
74 commonName = entry.firstName
75 commonCount = nameCount + 1
76 }
77 }
78 names = append(names, entry.name)
79 dates = append(dates, entry.date)
80 dateMap[entry.date]++
81 }
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "regexp"
9 "strings"
10 "sync"
11 "time"
12)
13
14func main() {
15 start := time.Now()
16 file, err := os.Open(os.Args[1])
17 if err != nil {
18 log.Fatal(err)
19 }
20 defer file.Close()
21
22 firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
23 names := make([]string, 0)
24 firstNames := make([]string, 0)
25 dates := make([]string, 0)
26 commonName := ""
27 commonCount := 0
28 scanner := bufio.NewScanner(file)
29 nameMap := make(map[string]int)
30 dateMap := make(map[string]int)
31
32 type entry struct {
33 firstName string
34 name string
35 date string
36 }
37 mutex := &sync.Mutex{}
38 wg := sync.WaitGroup{}
39
40 linesChunkLen := 64 * 1024
41 lines := make([]string, 0, linesChunkLen)
42 scanner.Scan()
43 for {
44 lines = append(lines, scanner.Text())
45 willScan := scanner.Scan()
46 if len(lines) == linesChunkLen || !willScan {
47 toProcess := lines
48 wg.Add(len(toProcess))
49 go func() {
50 entries := make([]entry, 0, len(toProcess))
51 for _, text := range toProcess {
52 // get all the names
53 entry := entry{}
54 split := strings.SplitN(text, "|", 9)
55 name := strings.TrimSpace(split[7])
56 entry.name = name
57
58 // extract first names
59 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
60 entry.firstName = matches[0][1]
61 }
62 // extract dates
63 chars := strings.TrimSpace(split[4])[:6]
64 entry.date = chars[:4] + "-" + chars[4:6]
65 entries = append(entries, entry)
66 }
67 mutex.Lock()
68 for _, entry := range entries {
69 if entry.firstName != "" {
70 firstNames = append(firstNames, entry.firstName)
71 nameCount := nameMap[entry.firstName]
72 nameMap[entry.firstName] = nameCount + 1
73 if nameCount+1 > commonCount {
74 commonName = entry.firstName
75 commonCount = nameCount + 1
76 }
77 }
78 names = append(names, entry.name)
79 dates = append(dates, entry.date)
80 dateMap[entry.date]++
81 }
82 wg.Add(-len(entries))
83 mutex.Unlock()
84 }()
85 lines = make([]string, 0, linesChunkLen)
86 }
87 if !willScan {
88 break
89 }
90 }
91 wg.Wait()
92
93 // report c2: names at index
94 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
95 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
96 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
97 fmt.Printf("Name time: %v\n", time.Since(start))
98
99 // report c1: total number of lines
100 fmt.Printf("Total file line count: %v\n", len(names))
101 fmt.Printf("Line count time: %v\n", time.Since(start))
102
103 // report c3: donation frequency
104 for k, v := range dateMap {
105     fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
106 }
107 fmt.Printf("Donations time: %v\n", time.Since(start))
108
109 // report c4: most common firstName
110 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
111 fmt.Printf("Most common name time: %v\n", time.Since(start))
112}
1--- rev4/readfile4.go 2019-03-30 19:38:28.000000000 +0100
2+++ rev5/readfile5.go 2019-03-30 20:12:21.000000000 +0100
3@@ -26,17 +26,19 @@
4 commonName := ""
5 commonCount := 0
6 scanner := bufio.NewScanner(file)
7+ nameMap := make(map[string]int)
8+ dateMap := make(map[string]int)
9
10 type entry struct {
11 firstName string
12 name string
13 date string
14 }
15- mutex := sync.Mutex{}
16+ mutex := &sync.Mutex{}
17 wg := sync.WaitGroup{}
18
19 linesChunkLen := 64 * 1024
20- lines := make([]string, 0, 0)
21+ lines := make([]string, 0, linesChunkLen)
22 scanner.Scan()
23 for {
24 lines = append(lines, scanner.Text())
25@@ -66,9 +68,16 @@
26 for _, entry := range entries {
27 if entry.firstName != "" {
28 firstNames = append(firstNames, entry.firstName)
29+ nameCount := nameMap[entry.firstName]
30+ nameMap[entry.firstName] = nameCount + 1
31+ if nameCount+1 > commonCount {
32+ commonName = entry.firstName
33+ commonCount = nameCount + 1
34+ }
35 }
36 names = append(names, entry.name)
37 dates = append(dates, entry.date)
38+ dateMap[entry.date]++
39 }
40 wg.Add(-len(entries))
41 mutex.Unlock()
42@@ -92,26 +101,12 @@
43 fmt.Printf("Line count time: %v\n", time.Since(start))
44
45 // report c3: donation frequency
46- dateMap := make(map[string]int)
47- for _, date := range dates {
48- dateMap[date] += 1
49- }
50 for k, v := range dateMap {
51 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
52 }
53 fmt.Printf("Donations time: %v\n", time.Since(start))
54
55 // report c4: most common firstName
56- nameMap := make(map[string]int)
57- nameCount := 0
58- for _, name := range firstNames {
59- nameCount = nameMap[name] + 1
60- nameMap[name] = nameCount
61- if nameCount > commonCount {
62- commonName = name
63- commonCount = nameCount
64- }
65- }
66 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
67 fmt.Printf("Most common name time: %v\n", time.Since(start))
68 }
69\ No newline at end of file
Performance
$ go run ./rev5/readfile5.go
Most common name time: 8.781626047s
$
And we gain another 1.5 seconds and break the 10-second mark for the first time.
What's happening here? I suspect that the code section where we append parsed data to our lists does not fully use the time available to it, relative to everything that's going on elsewhere. Instead of wasting this time during parsing and then serially looping over the lists to calculate the frequency table and the most common name, we place this code inside the loop.
One might wonder how well what we did in Revision 5 scales with possible future requirements. In a future challenge, or with changing requirements, placing this code inside the loop could have negative consequences regarding performance. A reasonable point. But we're here to perform, and these numbers don't lie when it comes to quantifying performance. I don't say that I support this in all instances in terms of code quality, but we're still having some fun here, right?
Revision 6 – Reuse Allocated Memory
Until now we followed no methodical approach to improving our performance. Some claim (Marks in his article, too) to have enough experience to come up with good enough ideas to improve the performance of a short enough piece of code. And to be honest, I did the same throughout these iterations; without profiling, I applied a method or a language or library feature on my own judgement or out of curiosity. Brendan Gregg defined this as the Streetlight Anti-Method in his book Systems Performance: Enterprise and the Cloud. Trying to improve performance by applying things you have heard might work is exactly that.
Streetlight Anti-Method – This method is actually the absence of a deliberate methodology. The user analyzes performance by choosing observability tools that are familiar, found on the Internet, or just at random to see if anything obvious shows up. This approach is hit or miss and can overlook many types of issues.
– Gregg, B. Systems Performance: Enterprise and the Cloud. Prentice Hall, 2013
So let's use a sync.Pool, just out of curiosity about what happens (Lines 39-53). In our code we allocate slices of lines and slices of entries, in which we collect lines of data and then entries of parsed fields. Allocating these 64k-sized slices over and over again might have a negative effect on our performance, right? With a sync.Pool we can reuse allocated slices, making re-allocation unnecessary. Besides the allocation of the slices themselves, whenever we append an element to a slice whose capacity is not sufficient, Go grows the slice by a factor of 2 for capacities below 1024 and by 1.25 above – readfile6.go. Until now we allocated the line- and entry-slices with a starting capacity of 0; it might be beneficial to allocate them with the linesChunkLen capacity, to which we will grow for sure during the processing of a 64k chunk of lines (Lines 64-102).
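For readers new to sync.Pool, here is a minimal standalone sketch of its mechanics (illustrative only; the revision's actual pool code follows in the excerpt and listing below). New is only called when the pool has nothing to hand out; Put returns a slice for reuse by a later Get.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	allocated := int64(0)
	pool := sync.Pool{New: func() interface{} {
		atomic.AddInt64(&allocated, 1) // count how often we really allocate
		return make([]string, 0, 4)
	}}

	for i := 0; i < 3; i++ {
		s := pool.Get().([]string)[:0] // reuse the underlying array, reset length
		s = append(s, "a line")
		pool.Put(s) // hand the slice back for the next Get
	}
	// Typically prints 1: New ran only for the first Get, the later
	// Gets reused the slice that was put back.
	fmt.Println("allocations via New:", atomic.LoadInt64(&allocated))
}
```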
39 linesChunkLen := 64 * 1024
40 linesChunkPoolAllocated := int64(0)
41 linesPool := sync.Pool{New: func() interface{} {
42 lines := make([]string, 0, linesChunkLen)
43 atomic.AddInt64(&linesChunkPoolAllocated, 1)
44 return lines
45 }}
46 lines := linesPool.Get().([]string)[:0]
47
48 entriesPoolAllocated := int64(0)
49 entriesPool := sync.Pool{New: func() interface{} {
50 entries := make([]entry, 0, linesChunkLen)
51 atomic.AddInt64(&entriesPoolAllocated, 1)
52 return entries
53 }}
64 go func() {
65 entries := entriesPool.Get().([]entry)[:0]
66 for _, text := range linesToProcess {
67 // get all the names
68 entry := entry{}
69 split := strings.SplitN(text, "|", 9)
70 name := strings.TrimSpace(split[7])
71 entry.name = name
72
73 // extract first names
74 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
75 entry.firstName = matches[0][1]
76 }
77 // extract dates
78 chars := strings.TrimSpace(split[4])[:6]
79 entry.date = chars[:4] + "-" + chars[4:6]
80 entries = append(entries, entry)
81 }
82 mutex.Lock()
83 for _, entry := range entries {
84 if entry.firstName != "" {
85 firstNames = append(firstNames, entry.firstName)
86 nameCount := nameMap[entry.firstName]
87 nameMap[entry.firstName] = nameCount + 1
88 if nameCount+1 > commonCount {
89 commonName = entry.firstName
90 commonCount = nameCount + 1
91 }
92 }
93 names = append(names, entry.name)
94 dates = append(dates, entry.date)
95 dateMap[entry.date]++
96 }
97 entriesPool.Put(entries)
98 linesPool.Put(linesToProcess)
99 wg.Add(-len(entries))
100 mutex.Unlock()
101 }()
102 lines = linesPool.Get().([]string)[:0]
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "regexp"
9 "strings"
10 "sync"
11 "sync/atomic"
12 "time"
13)
14
15func main() {
16 start := time.Now()
17 file, err := os.Open(os.Args[1])
18 if err != nil {
19 log.Fatal(err)
20 }
21 defer file.Close()
22
23 firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
24 names := make([]string, 0)
25 firstNames := make([]string, 0)
26 dates := make([]string, 0)
27 commonName := ""
28 commonCount := 0
29 scanner := bufio.NewScanner(file)
30 nameMap := make(map[string]int)
31 dateMap := make(map[string]int)
32
33 type entry struct {
34 firstName string
35 name string
36 date string
37 }
38
39 linesChunkLen := 64 * 1024
40 linesChunkPoolAllocated := int64(0)
41 linesPool := sync.Pool{New: func() interface{} {
42 lines := make([]string, 0, linesChunkLen)
43 atomic.AddInt64(&linesChunkPoolAllocated, 1)
44 return lines
45 }}
46 lines := linesPool.Get().([]string)[:0]
47
48 entriesPoolAllocated := int64(0)
49 entriesPool := sync.Pool{New: func() interface{} {
50 entries := make([]entry, 0, linesChunkLen)
51 atomic.AddInt64(&entriesPoolAllocated, 1)
52 return entries
53 }}
54 mutex := &sync.Mutex{}
55 wg := sync.WaitGroup{}
56
57 scanner.Scan()
58 for {
59 lines = append(lines, scanner.Text())
60 willScan := scanner.Scan()
61 if len(lines) == linesChunkLen || !willScan {
62 linesToProcess := lines
63 wg.Add(len(linesToProcess))
64 go func() {
65 entries := entriesPool.Get().([]entry)[:0]
66 for _, text := range linesToProcess {
67 // get all the names
68 entry := entry{}
69 split := strings.SplitN(text, "|", 9)
70 name := strings.TrimSpace(split[7])
71 entry.name = name
72
73 // extract first names
74 if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
75 entry.firstName = matches[0][1]
76 }
77 // extract dates
78 chars := strings.TrimSpace(split[4])[:6]
79 entry.date = chars[:4] + "-" + chars[4:6]
80 entries = append(entries, entry)
81 }
82 mutex.Lock()
83 for _, entry := range entries {
84 if entry.firstName != "" {
85 firstNames = append(firstNames, entry.firstName)
86 nameCount := nameMap[entry.firstName]
87 nameMap[entry.firstName] = nameCount + 1
88 if nameCount+1 > commonCount {
89 commonName = entry.firstName
90 commonCount = nameCount + 1
91 }
92 }
93 names = append(names, entry.name)
94 dates = append(dates, entry.date)
95 dateMap[entry.date]++
96 }
97 entriesPool.Put(entries)
98 linesPool.Put(linesToProcess)
99 wg.Add(-len(entries))
100 mutex.Unlock()
101 }()
102 lines = linesPool.Get().([]string)[:0]
103 }
104 if !willScan {
105 break
106 }
107 }
108 wg.Wait()
109
110 // report c2: names at index
111 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
112 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
113 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
114 fmt.Printf("Name time: %v\n", time.Since(start))
115
116 // report c1: total number of lines
117 fmt.Printf("Total file line count: %v\n", len(names))
118 fmt.Printf("Line count time: %v\n", time.Since(start))
119
120 // report c3: donation frequency
121 for k, v := range dateMap {
122 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
123 }
124 fmt.Printf("Donations time: %v\n", time.Since(start))
125
126 // report c4: most common firstName
127 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
128 fmt.Printf("Most common name time: %v\n", time.Since(start))
129}
1--- rev5/readfile5.go 2019-03-31 15:30:24.000000000 +0200
2+++ rev6/readfile6.go 2019-03-31 15:29:30.000000000 +0200
3@@ -8,6 +8,7 @@
4 "regexp"
5 "strings"
6 "sync"
7+ "sync/atomic"
8 "time"
9 )
10
11@@ -34,21 +35,35 @@
12 name string
13 date string
14 }
15+
16+ linesChunkLen := 64 * 1024
17+ linesChunkPoolAllocated := int64(0)
18+ linesPool := sync.Pool{New: func() interface{} {
19+ lines := make([]string, 0, linesChunkLen)
20+ atomic.AddInt64(&linesChunkPoolAllocated, 1)
21+ return lines
22+ }}
23+ lines := linesPool.Get().([]string)[:0]
24+
25+ entriesPoolAllocated := int64(0)
26+ entriesPool := sync.Pool{New: func() interface{} {
27+ entries := make([]entry, 0, linesChunkLen)
28+ atomic.AddInt64(&entriesPoolAllocated, 1)
29+ return entries
30+ }}
31 mutex := &sync.Mutex{}
32 wg := sync.WaitGroup{}
33
34- linesChunkLen := 64 * 1024
35- lines := make([]string, 0, linesChunkLen)
36 scanner.Scan()
37 for {
38 lines = append(lines, scanner.Text())
39 willScan := scanner.Scan()
40 if len(lines) == linesChunkLen || !willScan {
41- toProcess := lines
42- wg.Add(len(toProcess))
43+ linesToProcess := lines
44+ wg.Add(len(linesToProcess))
45 go func() {
46- entries := make([]entry, 0, len(toProcess))
47- for _, text := range toProcess {
48+ entries := entriesPool.Get().([]entry)[:0]
49+ for _, text := range linesToProcess {
50 // get all the names
51 entry := entry{}
52 split := strings.SplitN(text, "|", 9)
53@@ -79,10 +94,12 @@
54 dates = append(dates, entry.date)
55 dateMap[entry.date]++
56 }
57+ entriesPool.Put(entries)
58+ linesPool.Put(linesToProcess)
59 wg.Add(-len(entries))
60 mutex.Unlock()
61 }()
62- lines = make([]string, 0, linesChunkLen)
63+ lines = linesPool.Get().([]string)[:0]
64 }
65 if !willScan {
66 break
Performance
$ go run ./rev6/readfile6.go
Most common name time: 9.235948927s
$
There seems to be no immediate benefit in using a sync.Pool to reuse previously allocated slices. Comparing the memory allocation (TotalAlloc) of Revisions 5 and 6, we see about 1.1GiB less memory allocated during the runtime:
rev5: Alloc = 7072 MiB TotalAlloc = 17216 MiB Sys = 8007 MiB NumGC = 22
rev6: Alloc = 5155 MiB TotalAlloc = 16125 MiB Sys = 7679 MiB NumGC = 27
So there is a benefit, but we don't measure it in terms of runtime performance. We even lose 0.5 seconds of runtime for the fastest measured time. But if we look closer, the average runtimes of Revisions 5 and 6 are very similar:
# | Min[s] | Avg[s] | SD[s] | TotalAlloc [MiB] |
---|---|---|---|---|
rev5 | 8.781626047 | 9.765141717 | 0.425628359 | 17216 |
rev6 | 9.235948927 | 9.651679988 | 0.421454038 | 16125 |
We keep this optimisation for the next round.
Revision 7 – Regexp Usage
readfile7.go – Regexp engine implementations are precious machines. Some say Go's regexp implementation is not the fastest, and Go's performance in the field vs. Java – about 3x slower in this benchmarksgame – supports this opinion.
So let's have a look at our regexp; they are always interesting. I'm not an expert in the matter, but let's try to improve here a bit.
We used
// FindAllStringSubmatch is the 'All' version of FindStringSubmatch; it
// returns a slice of all successive matches of the expression, as defined by
// the 'All' description in the package comment.
// A return value of nil indicates no match.
func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string
which seems to be a bad choice for what we actually need. What we're looking for is the first word after the last name, separated by a comma and a whitespace.
"PEREZ, JOHN A"
"DEEHAN, WILLIAM N"
"WATJEN, THOMAS R."
"SABOURIN, JAMES"
Here we’re looking for JOHN
, WILLIAM
, THOMAS
and JAMES
. We also have a capturing group in the regexp which brings a performance penalty generally.
Typically, non-capturing groups perform better than capturing groups, because they require less allocation of memory, and do not make a copy of the group match
– StackOverflow 33243292
1--- rev6/readfile6.go 2019-03-31 17:20:27.000000000 +0200
2+++ rev7/readfile7.go 2019-03-31 17:25:05.000000000 +0200
3@@ -20,7 +20,7 @@
4 }
5 defer file.Close()
6
7- firstNamePat := regexp.MustCompile(", \\s*([^, ]+)")
8+ firstNamePat := regexp.MustCompile(", \\s*[^, ]+")
9 names := make([]string, 0)
10 firstNames := make([]string, 0)
11 dates := make([]string, 0)
12@@ -71,8 +71,8 @@
13 entry.name = name
14
15 // extract first names
16- if matches := firstNamePat.FindAllStringSubmatch(name, 1); len(matches) > 0 {
17- entry.firstName = matches[0][1]
18+ if matched := firstNamePat.FindString(name); matched != "" {
19+ entry.firstName = matched[2:]
20 }
21 // extract dates
22 chars := strings.TrimSpace(split[4])[:6]
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "regexp"
9 "strings"
10 "sync"
11 "sync/atomic"
12 "time"
13)
14
15func main() {
16 start := time.Now()
17 file, err := os.Open(os.Args[1])
18 if err != nil {
19 log.Fatal(err)
20 }
21 defer file.Close()
22
23 firstNamePat := regexp.MustCompile(", \\s*[^, ]+")
24 names := make([]string, 0, 0)
25 firstNames := make([]string, 0, 0)
26 dates := make([]string, 0, 0)
27 commonName := ""
28 commonCount := 0
29 scanner := bufio.NewScanner(file)
30 nameMap := make(map[string]int)
31 dateMap := make(map[string]int)
32
33 type entry struct {
34 firstName string
35 name string
36 date string
37 }
38
39 linesChunkLen := 64 * 1024
40 linesChunkPoolAllocated := int64(0)
41 linesPool := sync.Pool{New: func() interface{} {
42 lines := make([]string, 0, linesChunkLen)
43 atomic.AddInt64(&linesChunkPoolAllocated, 1)
44 return lines
45 }}
46 lines := linesPool.Get().([]string)[:0]
47
48 entriesPoolAllocated := int64(0)
49 entriesPool := sync.Pool{New: func() interface{} {
50 entries := make([]entry, 0, linesChunkLen)
51 atomic.AddInt64(&entriesPoolAllocated, 1)
52 return entries
53 }}
54 mutex := &sync.Mutex{}
55 wg := sync.WaitGroup{}
56
57 scanner.Scan()
58 for {
59 lines = append(lines, scanner.Text())
60 willScan := scanner.Scan()
61 if len(lines) == linesChunkLen || !willScan {
62 linesToProcess := lines
63 wg.Add(len(linesToProcess))
64 go func() {
65 entries := entriesPool.Get().([]entry)[:0]
66 for _, text := range linesToProcess {
67 // get all the names
68 entry := entry{}
69 split := strings.SplitN(text, "|", 9)
70 name := strings.TrimSpace(split[7])
71 entry.name = name
72
73 // extract first names
74 if matched := firstNamePat.FindString(name); matched != "" {
75 entry.firstName = matched[2:]
76 }
77 // extract dates
78 chars := strings.TrimSpace(split[4])[:6]
79 entry.date = chars[:4] + "-" + chars[4:6]
80 entries = append(entries, entry)
81 }
82 mutex.Lock()
83 for _, entry := range entries {
84 if entry.firstName != "" {
85 firstNames = append(firstNames, entry.firstName)
86 nameCount := nameMap[entry.firstName]
87 nameMap[entry.firstName] = nameCount + 1
88 if nameCount+1 > commonCount {
89 commonName = entry.firstName
90 commonCount = nameCount + 1
91 }
92 }
93 names = append(names, entry.name)
94 dates = append(dates, entry.date)
95 dateMap[entry.date]++
96 }
97 entriesPool.Put(entries)
98 linesPool.Put(linesToProcess)
99 wg.Add(-len(entries))
100 mutex.Unlock()
101 }()
102 lines = linesPool.Get().([]string)[:0]
103 }
104 if !willScan {
105 break
106 }
107 }
108 wg.Wait()
109
110 // report c2: names at index
111 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
112 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
113 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
114 fmt.Printf("Name time: %v\n", time.Since(start))
115
116 // report c1: total number of lines
117 fmt.Printf("Total file line count: %v\n", len(names))
118 fmt.Printf("Line count time: %v\n", time.Since(start))
119
120 // report c3: donation frequency
121 for k, v := range dateMap {
122 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
123 }
124 fmt.Printf("Donations time: %v\n", time.Since(start))
125
126 // report c4: most common firstName
127 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
128 fmt.Printf("Most common name time: %v\n", time.Since(start))
129}
We don’t need the capturing group as we don’t have to reference to the found group later. So lets get rid of them and also use the simpler regexp.FindString()
// FindString returns a string holding the text of the leftmost match in s of the regular
// expression. If there is no match, the return value is an empty string,
// but it will also be empty if the regular expression successfully matches
// an empty string. Use FindStringIndex or FindStringSubmatch if it is
// necessary to distinguish these cases.
func (re *Regexp) FindString(s string) string
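A quick standalone comparison of the two approaches on one of the sample names (an illustrative sketch, not part of the revision's code):

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	name := "PEREZ, JOHN A"

	// Revision 6: capturing group plus the 'All' variant.
	withGroup := regexp.MustCompile(`, \s*([^, ]+)`)
	if m := withGroup.FindAllStringSubmatch(name, 1); len(m) > 0 {
		fmt.Println(m[0][1]) // JOHN
	}

	// Revision 7: no capturing group, simple FindString.
	noGroup := regexp.MustCompile(`, \s*[^, ]+`)
	if matched := noGroup.FindString(name); matched != "" {
		fmt.Println(matched[2:]) // JOHN – skip the leading comma and space, as readfile7.go does
	}
}
```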
Performance
$ go run ./rev7/readfile7.go
Most common name time: 8.155115627s
$
With the simpler regexp we are down another second.
Revision 8 – No Regexp
Next, we eliminate our regular expression. Splitting a string by a comma and then finding the first word cannot be that complicated. Let's do it manually – readfile8.go.
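To make the intended extraction concrete before the diff, here is the same logic as a small standalone sketch using the sample names from above (illustrative; like the original, it assumes a name containing ", "):

```go
package main

import (
	"fmt"
	"strings"
)

// firstName mirrors the manual extraction rules of readfile8.go.
func firstName(name string) string {
	if name == "" {
		return ""
	}
	startOfName := strings.Index(name, ", ") + 2
	first := name[startOfName:]
	if endOfName := strings.Index(first, " "); endOfName >= 0 {
		first = first[:endOfName]
	}
	if strings.HasSuffix(first, ",") {
		first = strings.Replace(first, ",", "", -1)
	}
	return first
}

func main() {
	for _, n := range []string{"PEREZ, JOHN A", "DEEHAN, WILLIAM N", "SABOURIN, JAMES"} {
		fmt.Printf("%q -> %q\n", n, firstName(n)) // JOHN, WILLIAM, JAMES
	}
}
```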
1--- rev7/readfile7.go 2019-03-31 17:25:05.000000000 +0200
2+++ rev8/readfile8.go 2019-03-31 17:32:46.000000000 +0200
3@@ -5,7 +5,6 @@
4 "fmt"
5 "log"
6 "os"
7- "regexp"
8 "strings"
9 "sync"
10 "sync/atomic"
11@@ -20,7 +19,6 @@
12 }
13 defer file.Close()
14
15- firstNamePat := regexp.MustCompile(", \\s*[^, ]+")
16 names := make([]string, 0)
17 firstNames := make([]string, 0)
18 dates := make([]string, 0)
19@@ -71,8 +69,16 @@
20 entry.name = name
21
22 // extract first names
23- if matched := firstNamePat.FindString(name); matched != "" {
24- entry.firstName = matched[2:]
25+ if name != "" {
26+ startOfName := strings.Index(name, ", ") + 2
27+ if endOfName := strings.Index(name[startOfName:], " "); endOfName < 0 {
28+ entry.firstName = name[startOfName:]
29+ } else {
30+ entry.firstName = name[startOfName : startOfName+endOfName]
31+ }
32+ if strings.HasSuffix(entry.firstName, ",") {
33+ entry.firstName = strings.Replace(entry.firstName, ",", "", -1)
34+ }
35 }
36 // extract dates
37 chars := strings.TrimSpace(split[4])[:6]
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "strings"
9 "sync"
10 "sync/atomic"
11 "time"
12)
13
14func main() {
15 start := time.Now()
16 file, err := os.Open(os.Args[1])
17 if err != nil {
18 log.Fatal(err)
19 }
20 defer file.Close()
21
22 names := make([]string, 0)
23 firstNames := make([]string, 0)
24 dates := make([]string, 0)
25 commonName := ""
26 commonCount := 0
27 scanner := bufio.NewScanner(file)
28 nameMap := make(map[string]int)
29 dateMap := make(map[string]int)
30
31 type entry struct {
32 firstName string
33 name string
34 date string
35 }
36
37 linesChunkLen := 64 * 1024
38 linesChunkPoolAllocated := int64(0)
39 linesPool := sync.Pool{New: func() interface{} {
40 lines := make([]string, 0, linesChunkLen)
41 atomic.AddInt64(&linesChunkPoolAllocated, 1)
42 return lines
43 }}
44 lines := linesPool.Get().([]string)[:0]
45
46 entriesPoolAllocated := int64(0)
47 entriesPool := sync.Pool{New: func() interface{} {
48 entries := make([]entry, 0, linesChunkLen)
49 atomic.AddInt64(&entriesPoolAllocated, 1)
50 return entries
51 }}
52 mutex := &sync.Mutex{}
53 wg := sync.WaitGroup{}
54
55 scanner.Scan()
56 for {
57 lines = append(lines, scanner.Text())
58 willScan := scanner.Scan()
59 if len(lines) == linesChunkLen || !willScan {
60 linesToProcess := lines
61 wg.Add(len(linesToProcess))
62 go func() {
63 entries := entriesPool.Get().([]entry)[:0]
64 for _, text := range linesToProcess {
65 // get all the names
66 entry := entry{}
67 split := strings.SplitN(text, "|", 9)
68 name := strings.TrimSpace(split[7])
69 entry.name = name
70
71 // extract first names
72 if name != "" {
73 startOfName := strings.Index(name, ", ") + 2
74 if endOfName := strings.Index(name[startOfName:], " "); endOfName < 0 {
75 entry.firstName = name[startOfName:]
76 } else {
77 entry.firstName = name[startOfName : startOfName+endOfName]
78 }
79 if strings.HasSuffix(entry.firstName, ",") {
80 entry.firstName = strings.Replace(entry.firstName, ",", "", -1)
81 }
82 }
83 // extract dates
84 chars := strings.TrimSpace(split[4])[:6]
85 entry.date = chars[:4] + "-" + chars[4:6]
86 entries = append(entries, entry)
87 }
88 mutex.Lock()
89 for _, entry := range entries {
90 if entry.firstName != "" {
91 firstNames = append(firstNames, entry.firstName)
92 nameCount := nameMap[entry.firstName]
93 nameMap[entry.firstName] = nameCount + 1
94 if nameCount+1 > commonCount {
95 commonName = entry.firstName
96 commonCount = nameCount + 1
97 }
98 }
99 names = append(names, entry.name)
100 dates = append(dates, entry.date)
101 dateMap[entry.date]++
102 }
103 entriesPool.Put(entries)
104 linesPool.Put(linesToProcess)
105 wg.Add(-len(entries))
106 mutex.Unlock()
107 }()
108 lines = linesPool.Get().([]string)[:0]
109 }
110 if !willScan {
111 break
112 }
113 }
114 wg.Wait()
115
116 // report c2: names at index
117 fmt.Printf("Name: %s at index: %v\n", names[0], 0)
118 fmt.Printf("Name: %s at index: %v\n", names[432], 432)
119 fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
120 fmt.Printf("Name time: %v\n", time.Since(start))
121
122 // report c1: total number of lines
123 fmt.Printf("Total file line count: %v\n", len(names))
124 fmt.Printf("Line count time: %v\n", time.Since(start))
125
126 // report c3: donation frequency
127 for k, v := range dateMap {
128 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
129 }
130 fmt.Printf("Donations time: %v\n", time.Since(start))
131
132 // report c4: most common firstName
133 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
134 fmt.Printf("Most common name time: %v\n", time.Since(start))
135}
Performance
$ go run ./rev8/readfile8.go
Most common name time: 6.644675587s
$
Another 1.5 seconds down. That's awesome, isn't it?
To peel off one second from the processing of those 18.2 million lines of data, we have to save about 55ns for every line (1 second / 18.2 million lines ≈ 55ns). Since we are down to a few seconds of runtime, even these small improvements pay off. In this case we improved by over 20%.
Revision 9 – Reduce the Garbage
When we look at how much memory our program allocates during its runtime, we are down to 11.2GiB now. That's 3.4 times the size of the file we read through. For sure we have to read at least every byte of our file, but we allocate over three times that amount of memory. Besides involving the memory management of the Go runtime, these allocations also mean that Go's garbage collector has to clean up memory that is allocated only temporarily and is not part of the result lists.
rev8: Alloc = 5788 MiB TotalAlloc = 11217 MiB Sys = 7116 MiB NumGC = 18
With the next step – readfile9.go – we try to reduce the allocated memory, or to not allocate memory at all where it isn't needed. First, we get rid of our names list; we report the requested indexes right in the loop. We also no longer parse the donation date as a string and, most importantly, we don't slice and combine it into a dash-separated year-month string. Instead we parse the year and month part of the string as an integer. That fits our requirements well: we could later still reformat the few dozen or so dates into a format that is easier to read, but the challenge does not require a specific format, so we leave them in this more efficient representation.
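A small sketch of this date handling (assuming, as the slicing in the revisions implies, that the date column starts with YYYYMM):

```go
package main

import (
	"fmt"
	"strconv"
)

func main() {
	raw := "2019010312" // hypothetical start of the date column
	// store the year-month as a single int, e.g. 201901
	date, _ := strconv.Atoi(raw[:6])
	// reformat only when reporting, if a readable form is ever needed
	fmt.Printf("%d-%02d\n", date/100, date%100) // 2019-01
}
```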
1package main
2
3import (
4 "bufio"
5 "fmt"
6 "log"
7 "os"
8 "strconv"
9 "strings"
10 "sync"
11 "sync/atomic"
12 "time"
13)
14
15func main() {
16 start := time.Now()
17 file, err := os.Open(os.Args[1])
18 if err != nil {
19 log.Fatal(err)
20 }
21 defer file.Close()
22
23 commonName := ""
24 commonCount := 0
25 scanner := bufio.NewScanner(file)
26 nameMap := make(map[string]int)
27 dateMap := make(map[int]int)
28
29 namesCounted := false
30 namesCount := 0
31 fileLineCount := int64(0)
32
33 type entry struct {
34 firstName string
35 name string
36 date int
37 }
38
39 linesChunkLen := 64 * 1024
40 linesChunkPoolAllocated := int64(0)
41 linesPool := sync.Pool{New: func() interface{} {
42 lines := make([]string, 0, linesChunkLen)
43 atomic.AddInt64(&linesChunkPoolAllocated, 1)
44 return lines
45 }}
46 lines := linesPool.Get().([]string)[:0]
47
48 entriesPoolAllocated := int64(0)
49 entriesPool := sync.Pool{New: func() interface{} {
50 entries := make([]entry, 0, linesChunkLen)
51 atomic.AddInt64(&entriesPoolAllocated, 1)
52 return entries
53 }}
54 mutex := &sync.Mutex{}
55 wg := sync.WaitGroup{}
56
57 scanner.Scan()
58 for {
59 lines = append(lines, scanner.Text())
60 willScan := scanner.Scan()
61 if len(lines) == linesChunkLen || !willScan {
62 linesToProcess := lines
63 wg.Add(len(linesToProcess))
64 go func() {
65 atomic.AddInt64(&fileLineCount, int64(len(linesToProcess)))
66 entries := entriesPool.Get().([]entry)[:0]
67 for _, text := range linesToProcess {
68 // get all the names
69 entry := entry{}
70 split := strings.SplitN(text, "|", 9)
71 entry.name = strings.TrimSpace(split[7])
72
73 // extract first names
74 if entry.name != "" {
75 startOfName := strings.Index(entry.name, ", ") + 2
76 if endOfName := strings.Index(entry.name[startOfName:], " "); endOfName < 0 {
77 entry.firstName = entry.name[startOfName:]
78 } else {
79 entry.firstName = entry.name[startOfName : startOfName+endOfName]
80 }
81 if cs := strings.Index(entry.firstName, ","); cs > 0 {
82 entry.firstName = entry.firstName[:cs]
83 }
84 }
85 // extract dates
86 entry.date, _ = strconv.Atoi(split[4][:6])
87 entries = append(entries, entry)
88 }
89 linesPool.Put(linesToProcess)
90 mutex.Lock()
91 for _, entry := range entries {
92 if len(entry.firstName) != 0 {
93 nameCount := nameMap[entry.firstName] + 1
94 nameMap[entry.firstName] = nameCount
95 if nameCount > commonCount {
96 commonCount = nameCount
97 commonName = entry.firstName
98 }
99 }
100 if namesCounted == false {
101 if namesCount == 0 {
102 fmt.Printf("Name: %s at index: %v\n", entry.name, 0)
103 } else if namesCount == 432 {
104 fmt.Printf("Name: %s at index: %v\n", entry.name, 432)
105 } else if namesCount == 43243 {
106 fmt.Printf("Name: %s at index: %v\n", entry.name, 43243)
107 namesCounted = true
108 }
109 namesCount++
110 }
111 dateMap[entry.date]++
112 }
113 mutex.Unlock()
114 entriesPool.Put(entries)
115 wg.Add(-len(entries))
116 }()
117 lines = linesPool.Get().([]string)[:0]
118 }
119 if !willScan {
120 break
121 }
122 }
123 wg.Wait()
124
125 // report c2: names at index
126 fmt.Printf("Name time: %v\n", time.Since(start))
127
128 // report c1: total number of lines
129 fmt.Printf("Total file line count: %v\n", fileLineCount)
130 fmt.Printf("Line count time: %v\n", time.Since(start))
131
132 // report c3: donation frequency
133 for k, v := range dateMap {
134 fmt.Printf("Donations per month and year: %v and donation count: %v\n", k, v)
135 }
136 fmt.Printf("Donations time: %v\n", time.Since(start))
137
138 // report c4: most common firstName
139 fmt.Printf("The most common first name is: %s and it occurs: %v times.\n", commonName, commonCount)
140 fmt.Printf("Most common name time: %v\n", time.Since(start))
141}
1--- rev8/readfile8.go 2019-03-31 17:32:46.000000000 +0200
2+++ rev9/readfile9.go 2019-03-31 18:58:16.000000000 +0200
3@@ -5,6 +5,7 @@
4 "fmt"
5 "log"
6 "os"
7+ "strconv"
8 "strings"
9 "sync"
10 "sync/atomic"
11@@ -19,19 +20,20 @@
12 }
13 defer file.Close()
14
15- names := make([]string, 0)
16- firstNames := make([]string, 0)
17- dates := make([]string, 0)
18 commonName := ""
19 commonCount := 0
20 scanner := bufio.NewScanner(file)
21 nameMap := make(map[string]int)
22- dateMap := make(map[string]int)
23+ dateMap := make(map[int]int)
24+
25+ namesCounted := false
26+ namesCount := 0
27+ fileLineCount := int64(0)
28
29 type entry struct {
30 firstName string
31 name string
32- date string
33+ date int
34 }
35
36 linesChunkLen := 64 * 1024
37@@ -60,50 +62,57 @@
38 linesToProcess := lines
39 wg.Add(len(linesToProcess))
40 go func() {
41+ atomic.AddInt64(&fileLineCount, int64(len(linesToProcess)))
42 entries := entriesPool.Get().([]entry)[:0]
43 for _, text := range linesToProcess {
44 // get all the names
45 entry := entry{}
46 split := strings.SplitN(text, "|", 9)
47- name := strings.TrimSpace(split[7])
48- entry.name = name
49+ entry.name = strings.TrimSpace(split[7])
50
51 // extract first names
52- if name != "" {
53- startOfName := strings.Index(name, ", ") + 2
54- if endOfName := strings.Index(name[startOfName:], " "); endOfName < 0 {
55- entry.firstName = name[startOfName:]
56+ if entry.name != "" {
57+ startOfName := strings.Index(entry.name, ", ") + 2
58+ if endOfName := strings.Index(entry.name[startOfName:], " "); endOfName < 0 {
59+ entry.firstName = entry.name[startOfName:]
60 } else {
61- entry.firstName = name[startOfName : startOfName+endOfName]
62+ entry.firstName = entry.name[startOfName : startOfName+endOfName]
63 }
64- if strings.HasSuffix(entry.firstName, ",") {
65- entry.firstName = strings.Replace(entry.firstName, ",", "", -1)
66+ if cs := strings.Index(entry.firstName, ","); cs > 0 {
67+ entry.firstName = entry.firstName[:cs]
68 }
69 }
70 // extract dates
71- chars := strings.TrimSpace(split[4])[:6]
72- entry.date = chars[:4] + "-" + chars[4:6]
73+ entry.date, _ = strconv.Atoi(split[4][:6])
74 entries = append(entries, entry)
75 }
76+ linesPool.Put(linesToProcess)
77 mutex.Lock()
78 for _, entry := range entries {
79- if entry.firstName != "" {
80- firstNames = append(firstNames, entry.firstName)
81- nameCount := nameMap[entry.firstName]
82- nameMap[entry.firstName] = nameCount + 1
83- if nameCount+1 > commonCount {
84+ if len(entry.firstName) != 0 {
85+ nameCount := nameMap[entry.firstName] + 1
86+ nameMap[entry.firstName] = nameCount
87+ if nameCount > commonCount {
88+ commonCount = nameCount
89 commonName = entry.firstName
90- commonCount = nameCount + 1
91 }
92 }
93- names = append(names, entry.name)
94- dates = append(dates, entry.date)
95+ if namesCounted == false {
96+ if namesCount == 0 {
97+ fmt.Printf("Name: %s at index: %v\n", entry.name, 0)
98+ } else if namesCount == 432 {
99+ fmt.Printf("Name: %s at index: %v\n", entry.name, 432)
100+ } else if namesCount == 43243 {
101+ fmt.Printf("Name: %s at index: %v\n", entry.name, 43243)
102+ namesCounted = true
103+ }
104+ namesCount++
105+ }
106 dateMap[entry.date]++
107 }
108+ mutex.Unlock()
109 entriesPool.Put(entries)
110- linesPool.Put(linesToProcess)
111 wg.Add(-len(entries))
112- mutex.Unlock()
113 }()
114 lines = linesPool.Get().([]string)[:0]
115 }
116@@ -114,13 +123,10 @@
117 wg.Wait()
118
119 // report c2: names at index
120- fmt.Printf("Name: %s at index: %v\n", names[0], 0)
121- fmt.Printf("Name: %s at index: %v\n", names[432], 432)
122- fmt.Printf("Name: %s at index: %v\n", names[43243], 43243)
123 fmt.Printf("Name time: %v\n", time.Since(start))
124
125 // report c1: total number of lines
126- fmt.Printf("Total file line count: %v\n", len(names))
127+ fmt.Printf("Total file line count: %v\n", fileLineCount)
128 fmt.Printf("Line count time: %v\n", time.Since(start))
129
130 // report c3: donation frequency
To be honest, our small 81-line Go program of Revision 0 has mutated into a little monster of 141 lines of code. But let's measure its performance now.
Performance
$ go run ./rev9/readfile9.go
Most common name time: 3.880034076s
$
With these last changes we are below 4 seconds. That's over 20 seconds less than where we started and over 80 seconds less than the initial solution. What were we doing for those 80 seconds?
With this revision we're able to process 4.7 million lines per second. Revision 0 did 0.73 million lines per second, and the original solution to the challenge less than a third of that: 0.217 million lines/second.
rev9: Alloc = 81 MiB TotalAlloc = 6590 MiB Sys = 205 MiB NumGC = 129
When we look at memory allocation, we now allocate about 10GiB less memory over the runtime than the early revisions, and about half of the previous revision. We also see that the GC runs 129 times vs. the ~20-30 times of all other revisions. Third, heap allocation stays constant at about 120MiB, while the memory obtained from the OS is only 205MiB (see Appendix B2). At the end, Revision 9 had only 81MiB allocated on the heap.
With these memory metrics we can conclude that runtime is also a function of memory allocation and its management by the garbage collector. Across the benchmark runs for Revisions 0 to 9, Revision 9 also has the smallest variance in runtime, with a standard deviation of only 0.07 seconds, and therefore a very consistent and stable runtime.
From Revision 0 to 9 – Results
To find the maximum performance, I ran each revision 20 times on my laptop. I uninstalled my antivirus scanner (Sophos), as it influenced the results significantly. I also turned off background services to get consistent results: macOS's Spotlight indexer, my backup daemon (Arq) and Dropbox were all turned off.
The following table shows for each relevant revision (we omitted rev1 for obvious reasons):
- Min, Avg and SD: the minimum, average and standard deviation of the runtime in seconds.
- TotalAlloc: cumulative bytes allocated for heap objects.
- Alloc: bytes of allocated heap objects.
- NumGC: the number of completed GC cycles.
- LPS: the normalized throughput in million lines per second.
- n-1/n: the performance gain relative to the previous revision.
# | Min[s] | Avg[s] | SD[s] | TotalAlloc [MiB] | Alloc [MiB] | NumGC [1] | LPS[1e6/s] | n-1/n [1] |
---|---|---|---|---|---|---|---|---|
rev0 | 24.976 | 26.083 | 0.821 | 16100 | 5144 | 33 | 0.730493 | 1 |
rev2 | 18.285 | 19.509 | 1.400 | 16376 | 7360 | 19 | 0.997830 | 1.36 |
rev3 | 10.668 | 11.366 | 0.451 | 17360 | 4677 | 24 | 1.710241 | 1.71 |
rev4 | 10.312 | 11.495 | 0.541 | 17221 | 7107 | 23 | 1.769322 | 1.03 |
rev5 | 8.781 | 9.765 | 0.425 | 17216 | 7072 | 22 | 2.077680 | 1.17 |
rev6 | 9.235 | 9.651 | 0.421 | 16125 | 5155 | 27 | 1.975478 | 0.95 |
rev7 | 8.155 | 8.531 | 0.228 | 11148 | 5713 | 17 | 2.237297 | 1.13 |
rev8 | 6.644 | 7.039 | 0.296 | 11217 | 5788 | 18 | 2.745870 | 1.22 |
rev9 | 3.880 | 4.102 | 0.0709 | 6590 | 81 | 129 | 4.702385 | 1.71 |
Figure 1 shows the runtime data from the previous table graphically as a box-and-whisker plot. It illustrates the progress we've made from the initial port in Revision 0 through the following revisions up to Revision 9.
Conclusion – ‘Hitting the Wall’
We clearly hit a wall here with Revision 9. A runtime of 3.88 seconds for these 18.2 million lines of data is kind of impressive compared to the other versions. I didn't expect to get down to this number.
So, why do we hit the wall here? In Going Further (Appendix A1) we discover that plainly reading all lines takes nearly the same amount of time as reading and processing them: about 3.56 seconds (see below for details). Knowing this, we have to conclude that we can't get below these 3.56 seconds as long as we can't read the file faster. This is where we hit the wall, and this is the benchmark for our challenge.
That said, I think we have two options to go even faster, perhaps only by a few fractions of a second. First, we have to check whether we can read the file faster. We haven't done any optimisation of the file reading and just used a bufio.Scanner to read lines. Second, we still allocate twice the amount of memory we read from the data file. These two points might end in the same final optimisation, because reading the file faster perhaps means allocating less memory. We can't read less than these 3.3GiB, or perhaps we can? There is a schema for the FEC CSV file, and we might jump right into the portion of our file where we know the data is located. See Exercises for more on this.
That's an interesting challenge! Any more variations? As a rule of thumb, when you optimize an algorithm, there's a factor of 10 to find
— José Paumard (@JosePaumard) January 13, 2019
We produced a little monster of code and we ran it to an interesting point. Even if this challenge is not a serious thing, it shows what can be done – at all costs! Performance is one aspect of code quality. Quality in terms of readability, maintainability and robustness against changing requirements is another, and similarly important, one.
But imagine, regardless of this silly challenge, a task whose performance has an impact on real things: people's or animals' lives, money, or any other matter of real value. Then every second might count, and performance at any cost can be justified, even if the code looks awful, right?
That might be true. But you also have to take into account that, once you've delivered this optimised and highly performant code, you have to ensure that any change to it doesn't endanger anyone depending on it. So think about this when optimising at all costs. A further change might harm others after you have left the company or the project; if such a change ends in 'SHTF', you might have to come back to clean it all up.
Exercises
Here are some exercises to go further. These are points I was unable to integrate into this article.
- Read the file concurrently from two goroutines (hint: use an io.SectionReader; a starting-point sketch follows after this list) => what happens? Why is this a good/bad idea for a SSD and/or a HDD?
- Try to achieve (nearly) zero-copy of the strings read from the file and then use them in the resulting lists / frequency tables. "Reading" the file with scanner.Bytes() instead of scanner.Text() only takes 2 seconds. Try to use that to break the wall. How can you do it?
- Make Variation7 in Java as fast as our Revision 9 in Go; that should be possible, as there is no magic happening here.
- Revision 2 has a bug, fix it!
- We violated c4 of the challenge and eliminated the name lists. Try to implement that back with as little performance impact as possible.
- Disable the GC, what happens? Why?
- Write a Revision 10 in idiomatic Go; is it still "fast"?
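For the first exercise, a starting-point sketch (assumptions: the file path comes as the first argument; we split at the middle byte and ignore that the boundary may fall mid-line – handling that is part of the exercise):

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"os"
)

func countLines(r io.Reader, out chan<- int) {
	scanner := bufio.NewScanner(r)
	n := 0
	for scanner.Scan() {
		n++
	}
	out <- n
}

func main() {
	file, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	info, err := file.Stat()
	if err != nil {
		log.Fatal(err)
	}
	half := info.Size() / 2

	// ReadAt (used by io.SectionReader) is safe for concurrent use on an *os.File.
	out := make(chan int, 2)
	go countLines(io.NewSectionReader(file, 0, half), out)
	go countLines(io.NewSectionReader(file, half, info.Size()-half), out)
	fmt.Println("lines (approximate):", <-out+<-out)
}
```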
Appendix A
A1 – Going Further?
To get an indication of where we are now performance-wise, we can do a few things. We can measure how long it takes just to read the 18.2 million lines of data, without any processing of the lines. With that we get a baseline for the bare minimum of work: no processing at all. We would obviously not get below this number, right?
# remove the processing goroutine and..
$ go run ./rev9/readfile9.go
Most common name time: 3.568166257s
$
It takes 3.56 seconds to just read the lines with scanner.Text() and do nothing with them. Compare that to our best run of 3.88 seconds: we don't process 18 million lines in the remaining 0.28 seconds. Instead, the lines are processed nearly at the same time as we read them, handed off in chunks to processing goroutines running concurrently. That is most probably the wall we hit here. The disk, or in our performance runs the OS disk cache, is our limiting factor.
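Such a read-only baseline might look like the following sketch (the measured number above came from readfile9.go with the processing goroutine removed; this standalone version illustrates the idea):

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"time"
)

func main() {
	start := time.Now()
	file, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	count := 0
	for scanner.Scan() {
		_ = scanner.Text() // copies the scanner's buffer into a new string
		count++
	}
	fmt.Printf("read %d lines in %v\n", count, time.Since(start))
}
```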
When we look at the memory behaviour
Alloc = 0 MiB TotalAlloc = 3540 MiB Sys = 68 MiB NumGC = 932
it seems we copy our data file of 3.3GB in size about once within scanner.Text(). Later, in Appendix B2, we'll see that all other revisions allocate at least two times this size. scanner.Text() copies the byte buffer from the io.Reader to form a string and returns it. How much memory do we allocate using solely scanner.Bytes()? Let's see:
Most common name time: 2.392783786s
Alloc = 0 MiB TotalAlloc = 0 MiB Sys = 66 MiB NumGC = 0
None! Here we are. I assume that if we want to go below 3.88 seconds, this is the angle of attack to break this limit.
A2 – Concurrency Variations
With Go it's easy to limit the CPU cores available to a Go program. It's a matter of setting the GOMAXPROCS environment variable before running the compiled program.
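The value can also be queried or set from inside the program via runtime.GOMAXPROCS; a tiny sketch (an argument of 0 queries the current setting without changing it):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Prints the current GOMAXPROCS value, e.g. the number of CPU cores
	// or whatever the environment variable was set to.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```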
go with 1 CPU Core
$ GOMAXPROCS=1 go run ./rev9/readfile9.go itcont.txt
Most common name time: 12.006801212s
$
and with 2 CPU Cores
$ GOMAXPROCS=2 go run ./rev9/readfile9.go itcont.txt
Most common name time: 6.409553051s
$
we take about half of that.
$ GOMAXPROCS=3 go run ./rev9/readfile9.go itcont.txt
Most common name time: 4.5708305s
$
we are right at about 4.5 seconds. Think about what that means: one of the other implementations had a runtime of about 12.5 seconds, but, as far as I can see, without any concurrent or parallel processing of what was read from the file. Does this correlate with our measurement of 12s here?
Appendix B
B1 – Get the Code
https://github.com/marcellanz/file-read-challenge-go
B1.1 – Other Implementations
- Philippe Marschall, A Reimplementation using Eclipse Collections and marschall/mini-csv
- Marschall reports a runtime of “about 10 seconds” on an Intel Core i7-7700T (desktop). Variation7 ran in about 18-19 seconds on his machine; with 23s on my MacBook Pro (i7-7920HQ) that is a factor of about 25%, so Marschall’s 10 seconds might correspond to about 12.5 seconds on our machine.
- Gabriele Bianchet-David, in Rust, runs for about 1-2 minutes
- Morgen Peschke, Variations in Scala, ran on my Laptop with
- Scala StdLib – 37.103s
- Akka Streams – 53.507s
- Java StdLib – 63.921s
B1.2 – Other Implementations, After Publication
- 2019-04-04, Denis posted this after my blog post. He is using his “ldetool”, a generator of fast log file parsers written in Go, and it runs the challenge in "revision: mineislarger, runtime: 1.837077615s" and, with the disk cache cleared, in "revision: mineislarger, runtime: 2.812756462s" – in just over 430 lines of code, impressive!
- 2019-05-08, Ben E. C. Boyter published a nice blog post, starting from this article (he took Exercise #3), and worked through different and interesting angles to find better performance with Java.
B2 – Memory Allocation
$ GODEBUG=gctrace=1 go run ./revX/readfileX.go
rev0: gc 33 @22.877s 2%: 0.007+1321+0.043 ms clock, 0.057+15/1990/29+0.34 ms cpu, 6544->6544->4198 MB, 6556 MB goal, 8 P
rev1: gc 18 @21.921s 45%: 8.0+14368+9.0 ms clock, 64+73938/24847/0+72 ms cpu, 4322->4447->3709 MB, 4380 MB goal, 8 P
rev2: gc 19 @13.833s 7%: 0.88+1100+0.40 ms clock, 7.0+2780/2199/1576+3.2 ms cpu, 7171->7261->4510 MB, 7355 MB goal, 8 P
rev3: gc 24 @12.147s 14%: 193+747+0.059 ms clock, 1547+2410/1493/722+0.47 ms cpu, 6901->7065->4586 MB, 7228 MB goal, 8 P
rev4: gc 24 @11.941s 15%: 475+629+0.068 ms clock, 3807+1342/1256/1823+0.54 ms cpu, 6762->6819->4340 MB, 6995 MB goal, 8 P
rev5: gc 22 @8.404s 11%: 0.027+473+0.057 ms clock, 0.21+1117/942/389+0.46 ms cpu, 6804->6954->4171 MB, 7100 MB goal, 8 P
rev6: gc 22 @12.260s 12%: 185+694+0.12 ms clock, 1481+2120/1386/1136+1.0 ms cpu, 6635->6778->4401 MB, 6998 MB goal, 8 P
rev7: gc 17 @9.341s 9%: 72+819+0.34 ms clock, 583+1958/787/514+2.7 ms cpu, 5928->5929->4380 MB, 5929 MB goal, 8 P
rev8: gc 18 @6.637s 7%: 0.019+425+0.079 ms clock, 0.15+303/843/892+0.63 ms cpu, 5639->5761->4200 MB, 5807 MB goal, 8 P
rev9: gc 129 @4.391s 3%: 0.006+3.6+0.044 ms clock, 0.052+2.8/6.1/1.9+0.35 ms cpu, 130->131->49 MB, 133 MB goal, 8 P
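To help read these traces: the Go runtime documents the gctrace line format roughly as

```
gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # P
```

where gc # is the GC number, @#s the time since program start, #% the percentage of time spent in GC since program start, the clock/cpu triples cover the stop-the-world sweep termination, concurrent mark-and-scan, and stop-the-world mark termination phases, #->#-># MB the heap size at GC start, at GC end, and the live heap, # MB goal the goal heap size, and # P the number of processors used.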
# using runtime.ReadMemStats()
rev0: Alloc = 5144 MiB TotalAlloc = 16100 MiB Sys = 7279 MiB NumGC = 33
rev2: Alloc = 7360 MiB TotalAlloc = 16376 MiB Sys = 9360 MiB NumGC = 19
rev3: Alloc = 4677 MiB TotalAlloc = 17360 MiB Sys = 7944 MiB NumGC = 24
rev4: Alloc = 7107 MiB TotalAlloc = 17221 MiB Sys = 8007 MiB NumGC = 23
rev5: Alloc = 7072 MiB TotalAlloc = 17216 MiB Sys = 8007 MiB NumGC = 22
rev6: Alloc = 5155 MiB TotalAlloc = 16125 MiB Sys = 7679 MiB NumGC = 27
rev7: Alloc = 5713 MiB TotalAlloc = 11148 MiB Sys = 6983 MiB NumGC = 17
rev8: Alloc = 5788 MiB TotalAlloc = 11217 MiB Sys = 7116 MiB NumGC = 18
rev9: Alloc = 81 MiB TotalAlloc = 6590 MiB Sys = 205 MiB NumGC = 129
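The lines above can be produced with a small helper along these lines (a sketch; the article's exact reporting code is not shown here):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	const MiB = 1024 * 1024
	// Same fields and format as the per-revision lines above.
	fmt.Printf("Alloc = %v MiB TotalAlloc = %v MiB Sys = %v MiB NumGC = %v\n",
		m.Alloc/MiB, m.TotalAlloc/MiB, m.Sys/MiB, m.NumGC)
}
```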
B3 – Turn off Spotlight
# to turn off Spotlight on MacOS
$ sudo mdutil -a -i off
# and turn it on again
$ sudo mdutil -a -i on
B4 – GC Tracer in Detail
Let’s explain the diagram in detail; it will help us later to understand what happens in the other revisions.
What we see here are the first few hundred milliseconds of Revision 1. There are two main rows, STATS and PROCS. Under STATS, Goroutines shows runnable and running goroutines, Heap shows the allocated heap memory together with the NextGC target, and Threads shows the running OS threads (yes, Go uses threads; goroutines are scheduled over N OS threads, where N is usually the number of CPU cores). The second row shows GC activity, and then Proc n shows the processors the goroutines are scheduled on.
Starting from the top, under Goroutines we see kind of blue waves right at the times the GC is running, 7 times with increasing duration. Waves 4 and 5 have about 5k goroutines in Runnable state, but they don't run here, as the GC seems to block them a bit. The 6th and 7th waves have 10k and just over 12k runnable goroutines. The GC is not the limiting factor in our run, as we'll see later; relative to the total runtime, the GC runtime is irrelevant.
Starting in Proc 0, there is a goroutine named G1 runtime.main; it is our main function (even though not started as a goroutine itself, it is a goroutine). G1 runs small spikes of syscall.read() system calls, which can be seen as orange blocks below the G1 goroutine. Here our program reads data from the file using bufio.Scanner and its scanner.Text() method. G1 buffers lines of data and then fires off goroutines to parse the lines for names, first names and donation dates, which is exactly what we planned to do.
These goroutine names are so silly because we have a one-function implementation with no names given for the goroutines.
G6 main.main.func1 is our for-select block, where we receive the three parsed data fields and then append them to the three lists. All other processors have thousands of goroutines running. We can see the behaviour we expected: G1 reads lines of data, fires off new goroutines to parse them, and the parsed fields are then fed back over the three channels. Our CPU cores seem to be fully saturated with “workload”. Unfortunately it's the wrong kind of work they're doing: a lot of coordination, sending and receiving messages over three channels, besides some time to parse the lines.