I briefly investigated Content Defined Chunking (CDC) recently, after HuggingFace brought out 'Optimized Parquet', which involves CDC and page indexes. When I tried implementing CDC for Polars, though, I found that the Polars Parquet writer already handles page alignment well (much better than PyArrow, in fact), so it wasn't really a priority to develop.
Nevertheless, it was interesting to read about it, and I took some notes in the accompanying blog post.
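
For anyone who hasn't met the idea before, here is a rough sketch of content-defined chunking over raw bytes. It is not how 'Optimized Parquet', Polars or PyArrow do it; the gear-style rolling hash, the `chunk` function and the `MIN_SIZE`, `MAX_SIZE` and `MASK` values are all assumptions picked for illustration. The point is only that chunk boundaries are chosen from the content itself, so a small edit disturbs the chunks near it rather than every chunk after it.

```python
import random

# Illustrative parameters only; not taken from any real implementation.
MIN_SIZE = 16          # never cut a chunk shorter than this
MAX_SIZE = 256         # always cut once a chunk reaches this size
MASK = (1 << 6) - 1    # cut when the low 6 bits of the hash are zero

# Fixed pseudo-random table: one 32-bit value per possible byte.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen by a gear-style rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the low bits, so the cut test only
        # looks at the most recent few bytes of content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


# Demo: insert a few bytes at the front and count the chunks that survive.
_data_rng = random.Random(1)
data = bytes(_data_rng.getrandbits(8) for _ in range(4096))
edited = b"new!" + data

original_chunks = chunk(data)
shared = set(original_chunks) & set(chunk(edited))
print(f"{len(shared)} of {len(original_chunks)} original chunks reappear after the edit")
```

With fixed-size chunks, the same four-byte insertion would shift every later boundary and no chunk would match. Here the boundaries move with the content, so the chunker should fall back into step shortly after the edit and the later chunks deduplicate as before.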