Interface to libxml2, with DOM interface

iamyeswc/libxml2

 
 

libxml2 Memory Leak Issue and Resolution

This repository addresses a memory leak issue encountered while using the libxml2 library to parse XSD files in a Go application.

Background

A cron job runs every two hours and sends requests covering multiple domains to a service. The service queries the DNS records for these domains and retrieves the corresponding SVGs for validation. During validation, the XSD functions of the libxml2 library parse the template schema, and each SVG retrieved via the DNS records is validated against that schema.

However, Grafana showed the service's memory usage increasing in steps every two hours, in line with the cron schedule, which suggested a memory leak in the code.
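Memory allocated by C code through cgo is not counted in Go's runtime.MemStats, so a growth pattern like this is easier to confirm from the process's resident set size. The following is a minimal sketch, assuming a Linux host with /proc available; the helper names parseVmRSS and currentRSS are hypothetical and not part of this repository.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the resident set size (in kB) from the text of a
// /proc/<pid>/status file. Hypothetical helper for illustration.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["VmRSS:", "12345", "kB"]
		if len(fields) < 2 {
			return 0, fmt.Errorf("malformed VmRSS line: %q", line)
		}
		return strconv.ParseInt(fields[1], 10, 64)
	}
	return 0, fmt.Errorf("VmRSS not found")
}

// currentRSS reads the RSS of the running process (Linux only).
func currentRSS() (int64, error) {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return 0, err
	}
	return parseVmRSS(string(data))
}

func main() {
	before, err := currentRSS()
	if err != nil {
		fmt.Println("cannot read RSS:", err)
		return
	}
	// Call the suspected code path repeatedly here (for example, the
	// schema parsing step) and compare the RSS afterwards.
	after, err := currentRSS()
	if err != nil {
		fmt.Println("cannot read RSS:", err)
		return
	}
	fmt.Printf("RSS before=%d kB after=%d kB\n", before, after)
}
```

If the RSS keeps climbing across iterations while the Go heap stays flat, the growth is likely on the C side, which is what Valgrind can then pinpoint.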

The leak was ultimately traced to the xsd.Parse function in the libxml2 Go package. The debugging steps used to identify it were as follows:

Debug Process

  1. Install Valgrind: On Ubuntu, the installation command is:

    sudo apt update
    sudo apt install valgrind
  2. Build the Leak Test: Navigate to the memoryLeak/leak folder in the current repository and run:

    go build leak.go
  3. Run Valgrind to Identify Memory Leaks: Execute the following command:

    valgrind --leak-check=full --track-origins=yes ./leak

    Results:

     ==2354== 290,200 (752 direct, 289,448 indirect) bytes in 1 blocks are definitely lost in loss record 226 of 230
     ==2354==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
     ==2354==    by 0x488D323: xmlNewParserCtxt (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.10)
     ==2354==    by 0x48A531C: xmlCreateMemoryParserCtxt (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.10)
     ==2354==    by 0x4AF2ED: _cgo_6d363e99d30e_Cfunc_xmlCreateMemoryParserCtxt (cgo-gcc-prolog:391)
     ==2354==    by 0x471DC3: runtime.asmcgocall.abi0 (asm_amd64.s:923)
     ==2354==    by 0xC0000061BF: ???
     ==2354==    by 0x470189: runtime.systemstack.abi0 (asm_amd64.s:514)
     ==2354==    by 0x1FFEFFFD17: ???
     ==2354==    by 0x47479E: runtime.newproc.abi0 (<autogenerated>:1)
     ==2354==    by 0x470084: runtime.mstart.abi0 (asm_amd64.s:395)
     ==2354==    by 0x47000E: runtime.rt0_go.abi0 (asm_amd64.s:358)
     ==2354==
     ==2354== 1,196,126 (176 direct, 1,195,950 indirect) bytes in 1 blocks are definitely lost in loss record 230 of 230
     ==2354==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
     ==2354==    by 0x48B0E69: xmlNewDoc (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.10)
     ==2354==    by 0x498DB50: xmlSAX2StartDocument (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.10)
     ==2354==    by 0x48A7B55: xmlParseDocument (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.10)
     ==2354==    by 0x48A8397: ??? (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.10)
     ==2354==    by 0x4AF33D: _cgo_6d363e99d30e_Cfunc_xmlCtxtReadMemory (cgo-gcc-prolog:416)
     ==2354==    by 0x471DC3: runtime.asmcgocall.abi0 (asm_amd64.s:923)
     ==2354==    by 0xC0000061BF: ???
     ==2354==    by 0x470189: runtime.systemstack.abi0 (asm_amd64.s:514)
     ==2354==    by 0x1FFEFFFD17: ???
     ==2354==    by 0x47479E: runtime.newproc.abi0 (<autogenerated>:1)
     ==2354==    by 0x470084: runtime.mstart.abi0 (asm_amd64.s:395)
    
    

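    Both records shown are "definitely lost": 290,200 bytes rooted at a parser context allocated by xmlNewParserCtxt (called from xmlCreateMemoryParserCtxt), and 1,196,126 bytes rooted at a document allocated by xmlNewDoc (reached through xmlCtxtReadMemory and xmlParseDocument). These functions are only where the memory is allocated; the leak arises because the caller never frees the parser context or the document they return.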
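When a Valgrind log is long, it can help to total the "definitely lost" records automatically. The sketch below is one way to do that in Go; the function name sumDefinitelyLost and the regular expression are assumptions written against the record headers shown above, not a general Memcheck parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// lostRe matches Memcheck loss-record headers such as
// "==2354== 290,200 (752 direct, 289,448 indirect) bytes in 1 blocks are definitely lost ...".
// The "(... direct, ... indirect)" part is optional because some records omit it.
var lostRe = regexp.MustCompile(`^==\d+==\s+([\d,]+) (?:\([^)]*\) )?bytes in [\d,]+ blocks are definitely lost`)

// sumDefinitelyLost adds up the byte counts of all "definitely lost"
// records found in a Valgrind log.
func sumDefinitelyLost(log string) (int64, error) {
	var total int64
	for _, line := range strings.Split(log, "\n") {
		m := lostRe.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue
		}
		n, err := strconv.ParseInt(strings.ReplaceAll(m[1], ",", ""), 10, 64)
		if err != nil {
			return 0, err
		}
		total += n
	}
	return total, nil
}

func main() {
	// The two record headers from the run above.
	log := `==2354== 290,200 (752 direct, 289,448 indirect) bytes in 1 blocks are definitely lost in loss record 226 of 230
==2354== 1,196,126 (176 direct, 1,195,950 indirect) bytes in 1 blocks are definitely lost in loss record 230 of 230`
	total, err := sumDefinitelyLost(log)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(total) // prints 1486326
}
```

The LEAK SUMMARY lines are deliberately not matched, so the total is not double-counted when a full log is passed in.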

Code Analysis

The xsd.Parse function is defined as follows:

func Parse(buf []byte, options ...Option) (*Schema, error) {
    sptr, err := clib.XMLSchemaParse(buf, options...)
    if err != nil {
        return nil, errors.Wrap(err, "failed to parse input")
    }
    return &Schema{ptr: sptr}, nil
}

The XMLSchemaParse function:

func XMLSchemaParse(buf []byte, options ...option.Interface) (uintptr, error) {
	var uri string
	var encoding string
	var coptions int
	//nolint:forcetypeassert
	for _, opt := range options {
		switch opt.Name() {
		case option.OptKeyWithURI:
			uri = opt.Value().(string)
		}
	}

	docctx := C.xmlCreateMemoryParserCtxt((*C.char)(unsafe.Pointer(&buf[0])), C.int(len(buf)))
	if docctx == nil {
		return 0, errors.New("error creating doc parser")
	}

	var curi *C.char
	if uri != "" {
		curi = C.CString(uri)
		defer C.free(unsafe.Pointer(curi))
	}

	var cencoding *C.char
	if encoding != "" {
		cencoding = C.CString(encoding)
		defer C.free(unsafe.Pointer(cencoding))
	}

	doc := C.xmlCtxtReadMemory(docctx, (*C.char)(unsafe.Pointer(&buf[0])), C.int(len(buf)), curi, cencoding, C.int(coptions))
	if doc == nil {
		return 0, errors.Errorf("failed to read schema from memory: %v",
			xmlCtxtLastErrorRaw(uintptr(unsafe.Pointer(docctx))))
	}

	parserCtx := C.xmlSchemaNewDocParserCtxt((*C.xmlDoc)(unsafe.Pointer(doc)))
	if parserCtx == nil {
		return 0, errors.New("failed to create parser")
	}
	defer C.xmlSchemaFreeParserCtxt(parserCtx)

	s := C.xmlSchemaParse(parserCtx)
	if s == nil {
		return 0, errors.New("failed to parse schema")
	}

	return uintptr(unsafe.Pointer(s)), nil
}

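XMLSchemaParse allocates a parser context with C.xmlCreateMemoryParserCtxt but never releases it; to avoid the leak, xmlFreeParserCtxt must be called once the context is no longer needed. Likewise, C.xmlCtxtReadMemory returns an xmlDoc structure that is never freed; it should be released with C.xmlFreeDoc(doc) when it is no longer required. The only cleanup in the function is the deferred C.xmlSchemaFreeParserCtxt call, so both allocations are lost on the success path and on the later error returns. These two allocations correspond to the two "definitely lost" records in the Valgrind output.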

Solutions and Validation

Method 1: Fixing the Leak in a Forked Library

Since the upstream libxml2 Go package repository is now read-only and has not addressed this issue, I forked the official code and implemented a fix. To verify it, navigate to the memoryLeak/fixLeak folder in this repository and run:

go build fixLeak.go

Then, use Valgrind to check for memory leaks again:

valgrind --leak-check=full --track-origins=yes ./fixLeak

Results:

goroutine 1 [running]:
main.main()
        /mnt/k/iamyeswc/code/project/libxml2/memoryLeak/fixLeak/fixLeak.go:47 +0xa9
==996==
==996== HEAP SUMMARY:
==996==     in use at exit: 1,824 bytes in 6 blocks
==996==   total heap usage: 19 allocs, 13 frees, 76,576 bytes allocated
==996==
==996== 304 bytes in 1 blocks are possibly lost in loss record 1 of 2
==996==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==996==    by 0x40149DA: allocate_dtv (dl-tls.c:286)
==996==    by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==996==    by 0x4A16322: allocate_stack (allocatestack.c:622)
==996==    by 0x4A16322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==996==    by 0x4B0090: _cgo_try_pthread_create (gcc_libinit.c:154)
==996==    by 0x4B0293: _cgo_sys_thread_start (gcc_linux_amd64.c:69)
==996==    by 0x471DC3: runtime.asmcgocall.abi0 (asm_amd64.s:923)
==996==    by 0xC0000061BF: ???
==996==    by 0x470189: runtime.systemstack.abi0 (asm_amd64.s:514)
==996==    by 0x1FFEFFFD07: ???
==996==    by 0x47479E: runtime.newproc.abi0 (<autogenerated>:1)
==996==    by 0x470084: runtime.mstart.abi0 (asm_amd64.s:395)
==996==    by 0x47000E: runtime.rt0_go.abi0 (asm_amd64.s:358)
==996==
==996== 1,520 bytes in 5 blocks are possibly lost in loss record 2 of 2
==996==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==996==    by 0x40149DA: allocate_dtv (dl-tls.c:286)
==996==    by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==996==    by 0x4A16322: allocate_stack (allocatestack.c:622)
==996==    by 0x4A16322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==996==    by 0x4B0090: _cgo_try_pthread_create (gcc_libinit.c:154)
==996==    by 0x4B0293: _cgo_sys_thread_start (gcc_linux_amd64.c:69)
==996==    by 0x471DFC: runtime.asmcgocall.abi0 (asm_amd64.s:951)
==996==
==996== LEAK SUMMARY:
==996==    definitely lost: 0 bytes in 0 blocks
==996==    indirectly lost: 0 bytes in 0 blocks
==996==      possibly lost: 1,824 bytes in 6 blocks
==996==    still reachable: 0 bytes in 0 blocks
==996==         suppressed: 0 bytes in 0 blocks
==996==
==996== For lists of detected and suppressed errors, rerun with: -s
==996== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

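Valgrind now reports 0 bytes definitely lost and 0 bytes indirectly lost, so the two large records from the original run are gone. The remaining 1,824 bytes are only "possibly lost" and are allocated in pthread_create via _cgo_try_pthread_create, that is, during thread setup by the cgo runtime rather than by libxml2.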

Method 2: Using xsd.ParseFromFile

An alternative is to avoid the package's xsd.Parse() function and call xsd.ParseFromFile() instead. Navigate to the memoryLeak/useParseFromFile folder and run:

go build useParseFromFile.go

Then, validate memory usage with Valgrind:

valgrind --leak-check=full --track-origins=yes ./useParseFromFile

Results:

==4386== HEAP SUMMARY:
==4386==     in use at exit: 18,018 bytes in 194 blocks
==4386==   total heap usage: 33,246 allocs, 33,052 frees, 3,049,458 bytes allocated
==4386==
==4386== 17 bytes in 1 blocks are definitely lost in loss record 53 of 192
==4386==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4386==    by 0x4ADE37: _cgo_6d363e99d30e_Cfunc__Cmalloc (_cgo_export.c:32)
==4386==    by 0x471DC3: runtime.asmcgocall.abi0 (asm_amd64.s:923)
==4386==    by 0xC0000061BF: ???
==4386==    by 0x470189: runtime.systemstack.abi0 (asm_amd64.s:514)
==4386==    by 0x1FFEFFFCF7: ???
==4386==    by 0x47479E: runtime.newproc.abi0 (<autogenerated>:1)
==4386==    by 0x470084: runtime.mstart.abi0 (asm_amd64.s:395)
==4386==    by 0x47000E: runtime.rt0_go.abi0 (asm_amd64.s:358)
==4386==
==4386== 304 bytes in 1 blocks are possibly lost in loss record 189 of 192
==4386==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4386==    by 0x40149DA: allocate_dtv (dl-tls.c:286)
==4386==    by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==4386==    by 0x4A16322: allocate_stack (allocatestack.c:622)
==4386==    by 0x4A16322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==4386==    by 0x4AF730: _cgo_try_pthread_create (gcc_libinit.c:154)
==4386==    by 0x4AF933: _cgo_sys_thread_start (gcc_linux_amd64.c:69)
==4386==    by 0x471DC3: runtime.asmcgocall.abi0 (asm_amd64.s:923)
==4386==    by 0xC0000061BF: ???
==4386==    by 0x470189: runtime.systemstack.abi0 (asm_amd64.s:514)
==4386==    by 0x1FFEFFFCF7: ???
==4386==    by 0x47479E: runtime.newproc.abi0 (<autogenerated>:1)
==4386==    by 0x470084: runtime.mstart.abi0 (asm_amd64.s:395)
==4386==    by 0x47000E: runtime.rt0_go.abi0 (asm_amd64.s:358)
==4386==
==4386== 912 bytes in 3 blocks are possibly lost in loss record 191 of 192
==4386==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4386==    by 0x40149DA: allocate_dtv (dl-tls.c:286)
==4386==    by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==4386==    by 0x4A16322: allocate_stack (allocatestack.c:622)
==4386==    by 0x4A16322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==4386==    by 0x4AF730: _cgo_try_pthread_create (gcc_libinit.c:154)
==4386==    by 0x4AF933: _cgo_sys_thread_start (gcc_linux_amd64.c:69)
==4386==    by 0x471DFC: runtime.asmcgocall.abi0 (asm_amd64.s:951)
==4386==
==4386== LEAK SUMMARY:
==4386==    definitely lost: 17 bytes in 1 blocks
==4386==    indirectly lost: 0 bytes in 0 blocks
==4386==      possibly lost: 1,216 bytes in 4 blocks
==4386==    still reachable: 16,785 bytes in 189 blocks
==4386==         suppressed: 0 bytes in 0 blocks
==4386== Reachable blocks (those to which a pointer was found) are not shown.
==4386== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==4386==
==4386== For lists of detected and suppressed errors, rerun with: -s
==4386== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

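Valgrind reports only 17 bytes definitely lost, in a single block allocated through cgo's _Cfunc__Cmalloc, plus 1,216 bytes possibly lost during thread setup (pthread_create). The parser context and document records seen with xsd.Parse do not appear, so this method avoids the significant leak.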

Conclusion

This repository documents the memory leak encountered with libxml2 during XSD parsing in a Go application, along with the debugging steps and the two workarounds. In the Valgrind runs above, both the forked fix and xsd.ParseFromFile eliminate the large "definitely lost" records produced by xsd.Parse.

Feel free to explore the repository for further details and implementations.
