Okay, so... this is from the developer documentation. It only partially answers my question, but it's a start:
“Time values are expressed as a rational number of seconds with a 64-bit numerator and a 32-bit denominator. Frame rates for NTSC-compatible media, for example, use a frame duration of ‘1001/30000s’ (29.97 fps) or ‘1001/60000s’ (59.94 fps). If a time value is equal to a whole number of seconds, the fraction may be reduced into whole seconds (for example, ‘5s’).”
I found this by googling “fcpxml audio tag” and clicking on the PDF.
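To check my reading of that, here's a rough Python sketch that parses those time strings into exact fractions of a second. The function names `parse_time` and `format_time` are my own, not anything from the FCPXML docs. I'm also assuming the grammar is just `<int>s` or `<int>/<int>s`, that a sign is allowed on the numerator, and that "64-bit" and "32-bit" mean signed integer ranges. The quote doesn't actually say any of that, so treat it as a guess.

```python
import re
from fractions import Fraction

# Assumed grammar (not confirmed by the docs): "<int>s" or "<int>/<int>s",
# e.g. "5s" or "1001/30000s". The optional leading minus is also an assumption.
_TIME_RE = re.compile(r"(-?\d+)(?:/(\d+))?s")

# Assumption: numerator is a signed 64-bit integer, denominator is a
# positive value that fits in a signed 32-bit integer.
_NUM_MIN, _NUM_MAX = -(2**63), 2**63 - 1
_DEN_MAX = 2**31 - 1


def parse_time(value: str) -> Fraction:
    """Parse a rational time string such as '1001/30000s' into seconds."""
    match = _TIME_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a rational time value: {value!r}")
    num = int(match.group(1))
    # A whole-second value like "5s" has no denominator, so treat it as /1.
    den = int(match.group(2)) if match.group(2) is not None else 1
    if not (_NUM_MIN <= num <= _NUM_MAX):
        raise ValueError(f"numerator out of 64-bit range: {num}")
    if not (1 <= den <= _DEN_MAX):
        raise ValueError(f"denominator out of 32-bit range: {den}")
    return Fraction(num, den)


def format_time(seconds: Fraction) -> str:
    """Format seconds as a time string, writing whole seconds as 'Ns'."""
    if seconds.denominator == 1:
        return f"{seconds.numerator}s"
    return f"{seconds.numerator}/{seconds.denominator}s"


if __name__ == "__main__":
    frame = parse_time("1001/30000s")
    print(frame, float(1 / frame))       # frame duration and frames per second
    print(parse_time("5s"))              # whole seconds
    print(format_time(Fraction(10, 2)))  # reduces to whole seconds
```

Using `Fraction` keeps the NTSC math exact: `1 / parse_time("1001/30000s")` is `30000/1001`, about 29.97 fps, with no float rounding. One thing to watch is that `Fraction` always reduces to lowest terms. Three frames at 1001/30000s comes back out of `format_time` as `1001/10000s`, which is the same duration but not the frame-based timescale you started with. If a file needs to keep its original denominator, you'd have to carry that separately.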