There are some large files, and in most setups (including CI runners) we
have multiple cores available. Use xargs to run multiple parallel
uncrustify jobs rather than one large one. Just hardcode 4 jobs and 4
files at the same time for now.
We can avoid having multiple device feature-check functions now and
just rely on a few.
Add uncrustify config to properly handle begin/end deprecation macros.