Your idea to do the clipping in software when the TA's user clipping won't do is probably the best approach. Doing clipping with modifier volumes complicates basically all rendering code, while doing it manually only adds complexity where clipping actually needs to occur. It would also be the more efficient method in terms of fillrate, since you save the PVR from having to draw invisible pixels.
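For axis-aligned (non-rotated) sprites, software clipping could look roughly like the sketch below. This is not code from the demos, and quad_t, rect_t and clip_quad are made-up names for illustration, not KallistiOS types. It trims the quad to the clip rectangle and moves the texture coordinates by the same proportion, so the visible part of the texture stays where it was.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not part of KallistiOS. */
typedef struct { float x0, y0, x1, y1; } rect_t;   /* clip bounds in pixels */
typedef struct {
    float x0, y0, x1, y1;   /* screen-space corners (x0 < x1, y0 < y1) */
    float u0, v0, u1, v1;   /* texture coordinates at those corners */
} quad_t;

/* Clips an axis-aligned quad to `clip` in place.
   Returns false if nothing is visible, so the caller can skip the draw. */
static bool clip_quad(quad_t *q, const rect_t *clip)
{
    /* Degenerate quads have no area (and would divide by zero below). */
    if (q->x1 <= q->x0 || q->y1 <= q->y0)
        return false;

    /* Fully outside the clip rectangle. */
    if (q->x1 <= clip->x0 || q->x0 >= clip->x1 ||
        q->y1 <= clip->y0 || q->y0 >= clip->y1)
        return false;

    /* Texture coordinate change per pixel along each axis. */
    float du = (q->u1 - q->u0) / (q->x1 - q->x0);
    float dv = (q->v1 - q->v0) / (q->y1 - q->y0);

    /* Pull each edge that sticks out back to the clip boundary. */
    if (q->x0 < clip->x0) { q->u0 += (clip->x0 - q->x0) * du; q->x0 = clip->x0; }
    if (q->x1 > clip->x1) { q->u1 -= (q->x1 - clip->x1) * du; q->x1 = clip->x1; }
    if (q->y0 < clip->y0) { q->v0 += (clip->y0 - q->y0) * dv; q->y0 = clip->y0; }
    if (q->y1 > clip->y1) { q->v1 -= (q->y1 - clip->y1) * dv; q->y1 = clip->y1; }

    return true;
}
```

Rotated sprites would need a general polygon clipper instead, since their edges are no longer parallel to the clip rectangle.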
Most 3D hardware during the Dreamcast's time frame either did clipping in software inside the driver, or required the programmer to do it manually, depending on the graphics library (a MiniGL would do clipping in the driver, while something like Glide would require manual clipping). The PVR was unusual for its time in being able to do XY clipping completely in hardware (with the 32-pixel limitation). You still have to do near-Z clipping in software, though.
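To give an idea of what software near-Z clipping involves, here is a minimal sketch that clips a convex polygon against a single near plane (one pass of Sutherland-Hodgman). It assumes view-space coordinates where z grows with distance and a vertex is visible when z >= near_z; your engine's convention may well be the opposite. vec3_t and clip_near are hypothetical names, and a real version would also interpolate UVs and colors with the same t.

```c
/* Hypothetical vertex type for illustration; positions only. */
typedef struct { float x, y, z; } vec3_t;

/* Clips convex polygon in[0..n-1] against the plane z = near_z,
   keeping the part with z >= near_z (assumed convention).
   `out` must have room for n + 1 vertices. Returns the new vertex count,
   which is 0 when the whole polygon is behind the plane. */
static int clip_near(const vec3_t *in, int n, float near_z, vec3_t *out)
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        vec3_t a = in[i];
        vec3_t b = in[(i + 1) % n];   /* edge runs from a to b */
        int a_in = a.z >= near_z;
        int b_in = b.z >= near_z;

        /* Keep the start vertex if it is on the visible side. */
        if (a_in)
            out[count++] = a;

        /* If the edge crosses the plane, emit the crossing point. */
        if (a_in != b_in) {
            float t = (near_z - a.z) / (b.z - a.z);
            out[count].x = a.x + t * (b.x - a.x);
            out[count].y = a.y + t * (b.y - a.y);
            out[count].z = near_z;
            count++;
        }
    }

    return count;
}
```

A triangle with one vertex behind the plane comes out as a quad, so the result has to be submitted as a strip or split back into triangles.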
When I was working on a Dreamcast/PC 2D engine a long time ago, I was just going to make it part of the spec that camera boundaries had to be a multiple of 32 pixels. I was too lazy to work around it.
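A multiple-of-32 rule like that is easy to check or enforce with a few helpers. The sketch below is only an illustration with made-up names (TILE_SIZE, is_tile_aligned, snap_down, snap_up), and it assumes non-negative pixel coordinates.

```c
/* The PVR's user clipping works in 32-pixel steps. */
#define TILE_SIZE 32

/* Returns nonzero if px sits exactly on a 32-pixel boundary. */
static int is_tile_aligned(int px)
{
    return (px % TILE_SIZE) == 0;
}

/* Rounds px down to the previous 32-pixel boundary (px >= 0 assumed). */
static int snap_down(int px)
{
    return (px / TILE_SIZE) * TILE_SIZE;
}

/* Rounds px up to the next 32-pixel boundary (px >= 0 assumed). */
static int snap_up(int px)
{
    return ((px + TILE_SIZE - 1) / TILE_SIZE) * TILE_SIZE;
}
```

Snapping the left/top edge down and the right/bottom edge up gives the smallest aligned rectangle that still covers the requested area; snapping the other way gives the largest one that fits inside it.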
I've included two demos of how to use the user clipping feature and how to use modifier volumes to clip.
The controls for both demos are the same:
Pressing A prints the PVR render time to the console (I was curious what kind of speed differences there would be for different clipping methods)
Holding B disables clipping
Holding X does a more complex clipping example (for user clipping, it does 4 way split screen, for modifier clipping it does a wavy split)
Holding Y changes the demo from displaying rotating sprites to randomly placed sprites
The code is designed to be pretty simple, but I feel like the modifier clipping example doesn't really explain how everything works unless you already understand modifier volumes. The comments explain what is happening but not why, and I don't really feel like writing a theory of operation for it...
Since the examples are meant to be simple, not everything is done in the most efficient way (e.g. it uses pvr_prim to draw instead of directly using the store queues).