Genie is hosted by Hepforge, IPPP Durham

Opened 3 years ago

Closed 2 months ago

#60 closed defect (fixed)

gspl2root seg faults when run with the -w option

Reported by: Hignight
Owned by: sdennis
Priority: minor
Milestone:
Component: Event Formats and Converters
Version:
Keywords: gspl2root
Cc:

Description

I have encountered a bug that causes gspl2root to segfault when it is run with the -w (print graphs to PostScript file) option. Without -w there is no issue, so the problem is specific to producing the PostScript output. The exact command I used was:

gspl2root -f xsec_splines_14_1000010010.xml -p 14 -t 1000010010 -w

but I can replicate the issue with any particle/target/XML file I have.

I have tried several things, such as commenting out the function SaveGraphsToRootFile(), and the segfault still occurs. By adding comments into the code I figured out that the program crashes right after return 0, which suggests to me a problem during cleanup at exit: something has already been freed, but the program still tries to delete it. Running with GDB seems to support this hypothesis: the program segfaults during the call of the AlgFactory destructor, specifically when the COHKinematicsGenerator destructor is called. The GDB log is appended to the end of this report.

I have been able to replicate the problem with both GENIE 2.6.4 and GENIE 2.8.0 (the only two versions of GENIE I have installed). Both versions are installed on RHEL 6.3 and SL 6.5, and both operating systems show the same issue. The GENIE builds used gcc 4.4.7 and gcc 4.4.8 and are linked to ROOT versions 5.34.{18,19,20}. All builds are linked to LHAPDF 5.9.1, LOG4CPP 1.1, PYTHIA 6.4.24, and GSL 1.16.


GDB log:

Starting program: /mnt/home/hignight/GENIE/lamp-master/GENIE_2_8/bin/gspl2root -f xsec_splines_14_1000010010.xml -p 14 -t 1000010010 -w
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 9192.

Program received signal SIGSEGV, Segmentation fault.
0x00000000016394e0 in ?? ()
Missing separate debuginfos, use: debuginfo-install freetype-2.3.11-6.el6_2.9.x86_64 glibc-2.12-1.80.el6_3.5.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.9-33.el6_3.2.x86_64 libcom_err-1.41.12-12.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libgfortran-4.4.6-4.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 libxml2-2.7.6-4.el6_2.4.x86_64 openssl-1.0.0-25.el6_3.1.x86_64 pcre-7.8-4.el6.x86_64 zlib-1.2.3-27.el6.x86_64
#0 0x00000000016394e0 in ?? ()
#1 0x00002aaaab79ddc0 in genie::COHKinematicsGenerator::~COHKinematicsGenerator (this=0xf9fce0, __in_chrg=<value optimized out>) at COHKinematicsGenerator.cxx:72
#2 0x00002aaaab79de24 in genie::COHKinematicsGenerator::~COHKinematicsGenerator (this=0xf9fce0, __in_chrg=<value optimized out>) at COHKinematicsGenerator.cxx:73
#3 0x00002aaaaacee690 in genie::AlgFactory::~AlgFactory (this=0xf462e0, __in_chrg=<value optimized out>) at AlgFactory.cxx:67
#4 0x00002aaaaacee758 in genie::AlgFactory::~AlgFactory (this=0xf462e0, __in_chrg=<value optimized out>) at AlgFactory.cxx:73
#5 0x00002aaaaacefb8b in genie::AlgFactory::Cleaner::~Cleaner (this=0x2aaaaaf0c728, __in_chrg=<value optimized out>) at /mnt/home/hignight/GENIE/lamp-master/GENIE_2_8/src/Algorithm/AlgFactory.h:82
#6 0x00002aab387e3db2 in exit () from /lib64/libc.so.6
#7 0x00002aab387ccce4 in __libc_start_main () from /lib64/libc.so.6
#8 0x0000000000405a19 in _start ()
Breakpoint 1 at 0x2aaaab79dd75: file COHKinematicsGenerator.cxx, line 70. (2 locations)
The program being debugged has been started already.
Start it from the beginning? (y or n)
Starting program: /mnt/home/hignight/GENIE/lamp-master/GENIE_2_8/bin/gspl2root -f xsec_splines_14_1000010010.xml -p 14 -t 1000010010 -w
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 9206.

Breakpoint 1, genie::COHKinematicsGenerator::~COHKinematicsGenerator (this=0xf9fce0, __in_chrg=<value optimized out>) at COHKinematicsGenerator.cxx:73
73 }

Breakpoint 1, genie::COHKinematicsGenerator::~COHKinematicsGenerator (this=0xf9fce0, __in_chrg=<value optimized out>) at COHKinematicsGenerator.cxx:70
70 COHKinematicsGenerator::~COHKinematicsGenerator()
72 if(fEnvelope) delete fEnvelope;

Program received signal SIGSEGV, Segmentation fault.
0x00000000016394e0 in ?? ()
A debugging session is active.

Inferior 1 [process 9203] will be killed.

Quit anyway? (y or n)

Change History (3)

comment:1 Changed 2 years ago by candreop

  • Version 2.8.0 deleted

comment:2 Changed 2 months ago by sdennis

  • Owner changed from candreop to sdennis
  • Status changed from new to assigned

I just ran into this myself, and I think I understand the cause, if not how to fix it in a sane way.

The cause is some ROOT voodoo. When called with -w, this program makes a TCanvas, which means ROOT makes a TApplication object. For whatever reason, this TApplication assumes ownership of the fEnvelope object (which is a TF2 on the heap) in COHKinematicsGenerator and deletes it when the program begins to exit, leading to a segfault when our object (which *should* own the TF2) tries to delete it.

We could hack a fix in by only deleting that object (and any other TFx members) if gApplication==0, but it seems like there has to be a better way.

comment:3 Changed 2 months ago by sdennis

  • Resolution set to fixed
  • Status changed from assigned to closed

I worked out a relatively sane solution: just removing the function from the list returned by gROOT->GetListOfFunctions().

This is apparently a fairly common issue: a bunch of stuff has been added to the TF1 interface in ROOT6 to get around it (every constructor has an extra AddToGlobal() argument and there's a static method to change the default behaviour). Since we still support ROOT5, I had to fix it in this slightly more ad-hoc way.
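For readers unfamiliar with the ownership problem described in comments 2 and 3, the idea behind the fix can be sketched in plain C++ (C++11) without ROOT. This is only an illustration, not the actual GENIE change: MockFunction, MockGlobalList and MockGenerator are hypothetical stand-ins for the TF2, the list behind gROOT->GetListOfFunctions() and COHKinematicsGenerator, and the sketch assumes that the global list deletes whatever is still registered when it is torn down. In the real code the corresponding call would presumably be something like gROOT->GetListOfFunctions()->Remove(fEnvelope), which has not been checked against the committed fix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a ROOT function object such as a TF2.
// It counts destructions so the ownership behaviour can be checked.
struct MockFunction {
    static int destroyed;
    ~MockFunction() { ++destroyed; }
};
int MockFunction::destroyed = 0;

// Hypothetical stand-in for the global list of functions.
// Assumption: at teardown it deletes every object still registered.
class MockGlobalList {
public:
    void Add(MockFunction* f) { items_.push_back(f); }
    void Remove(MockFunction* f) {
        items_.erase(std::remove(items_.begin(), items_.end(), f),
                     items_.end());
    }
    std::size_t Size() const { return items_.size(); }
    void Teardown() {
        for (MockFunction* f : items_) delete f;
        items_.clear();
    }
private:
    std::vector<MockFunction*> items_;
};

// Hypothetical stand-in for COHKinematicsGenerator, which owns its envelope.
class MockGenerator {
public:
    explicit MockGenerator(MockGlobalList& list)
        : envelope_(new MockFunction) {
        list.Add(envelope_);     // mimics the implicit global registration
        list.Remove(envelope_);  // the fix: unregister so only we own it
    }
    ~MockGenerator() { delete envelope_; }
    MockGenerator(const MockGenerator&) = delete;
    MockGenerator& operator=(const MockGenerator&) = delete;
private:
    MockFunction* envelope_;
};

// Replays the exit order from the ticket: the global list is torn down
// first, then the generator is destroyed. Returns the number of times a
// MockFunction was destroyed.
int DestructionsAfterExitSequence() {
    MockFunction::destroyed = 0;
    MockGlobalList list;
    {
        MockGenerator gen(list);
        list.Teardown();  // nothing is registered, so nothing is deleted
    }                     // generator destructor deletes the envelope
    return MockFunction::destroyed;
}
```

Calling DestructionsAfterExitSequence() should report a single destruction. Without the Remove call in the constructor, the same sequence would delete the envelope twice, which is the kind of double delete the backtrace above points to.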
