Commit 06fd2da
ARROW-6077: [C++][Parquet] Build Arrow "schema tree" from Parquet schema to help with nested data implementation
Introduces auxiliary internal `SchemaManifest` and `SchemaField` data structures.
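To make the idea of a "schema tree" concrete, here is a minimal sketch, not the structures added by this patch. The names `SketchField`, `SketchManifest`, `Leaf`, `Group`, and `MakeExampleManifest` are hypothetical, as is the choice to index leaves by top-level field. It assumes each tree node mirrors one Arrow field, that only leaves carry a Parquet column index, and that the manifest precomputes a lookup from leaf column to the top-level field containing it.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified schema-tree node: one per Arrow field.
struct SketchField {
  std::string name;
  std::vector<SketchField> children;  // empty for leaves
  int column_index;                   // >= 0 only for leaves (a Parquet column)

  bool is_leaf() const { return column_index >= 0; }
};

SketchField Leaf(std::string name, int column_index) {
  return SketchField{std::move(name), {}, column_index};
}

SketchField Group(std::string name, std::vector<SketchField> children) {
  return SketchField{std::move(name), std::move(children), -1};
}

// Hypothetical manifest: top-level fields plus a leaf -> top-level lookup.
struct SketchManifest {
  std::vector<SketchField> fields;
  std::unordered_map<int, int> leaf_to_top_level;

  // Build the lookup table once, after `fields` has been filled in.
  void Index() {
    leaf_to_top_level.clear();
    for (int i = 0; i < static_cast<int>(fields.size()); ++i) {
      Visit(fields[i], i);
    }
  }

  // Distinct top-level fields needed to read the given leaf columns,
  // in first-seen order.
  std::vector<int> TopLevelFieldsFor(const std::vector<int>& leaves) const {
    std::vector<int> out;
    for (int leaf : leaves) {
      auto it = leaf_to_top_level.find(leaf);
      assert(it != leaf_to_top_level.end() && "unknown leaf column index");
      bool seen = false;
      for (int f : out) seen = seen || (f == it->second);
      if (!seen) out.push_back(it->second);
    }
    return out;
  }

 private:
  void Visit(const SketchField& node, int top_level) {
    if (node.is_leaf()) leaf_to_top_level[node.column_index] = top_level;
    for (const SketchField& child : node.children) Visit(child, top_level);
  }
};

// Example schema: a: int32, b: struct<x: int64, y: list<utf8>>
SketchManifest MakeExampleManifest() {
  SketchManifest m;
  m.fields.push_back(Leaf("a", 0));
  m.fields.push_back(
      Group("b", {Leaf("x", 1), Group("y", {Leaf("item", 2)})}));
  m.Index();
  return m;
}
```

With a tree like this, a column projection becomes a lookup rather than a schema rewrite: selecting leaves 1 and 2 in the example resolves to the single top-level field `b`.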
This also permits dictionary-encoded subfields in a slightly more principled way: the dictionary type is now resolved once, which removes the `FixSchema` hacks that were there before. I rewrote the nested schema conversion logic to be, hopefully, slightly easier to follow, though it could still use some work. I added comments in the code to explain the 3 different styles of list encoding.
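The commit message does not name the three list encoding styles. Assuming they are the one-, two-, and three-level Parquet list layouts, the distinction can be sketched as below. `Node`, `Repetition`, `ListEncoding`, and `ClassifyListEncoding` are hypothetical names, not the `parquet::schema` API, and the classifier deliberately ignores the backward-compatibility special cases that real readers need for legacy two-level lists of structs.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Repetition { kRequired, kOptional, kRepeated };

// Hypothetical, simplified Parquet schema node.
struct Node {
  std::string name;
  Repetition repetition;
  bool is_group;
  bool is_list_annotated;      // group carries the LIST annotation
  std::vector<Node> children;  // empty for primitive nodes
};

enum class ListEncoding { kNotAList, kOneLevel, kTwoLevel, kThreeLevel };

// Simplified classification of how a list is laid out in the schema:
//   one-level:   repeated T name
//   two-level:   group name (LIST) { repeated T item }
//   three-level: group name (LIST) { repeated group list { T element } }
ListEncoding ClassifyListEncoding(const Node& node) {
  assert(node.is_group || node.children.empty());  // primitives have no children

  // One-level: a bare repeated field, with no enclosing LIST group.
  if (node.repetition == Repetition::kRepeated) return ListEncoding::kOneLevel;

  // Otherwise a list must be a LIST-annotated group with one repeated child.
  if (!node.is_group || !node.is_list_annotated || node.children.size() != 1) {
    return ListEncoding::kNotAList;
  }
  const Node& repeated = node.children[0];
  if (repeated.repetition != Repetition::kRepeated) {
    return ListEncoding::kNotAList;
  }

  // Three-level: the repeated child is a group wrapping a single element.
  // (Assumption: no legacy naming rules are applied here.)
  if (repeated.is_group && repeated.children.size() == 1) {
    return ListEncoding::kThreeLevel;
  }

  // Two-level: the repeated child is itself the element.
  return ListEncoding::kTwoLevel;
}
```

The practical difference is where nullability can live: only the three-level form has separate places to mark both the list and its elements as optional.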
There are a couple of API changes:
* The `FileReader::GetSchema(indices, &schema)` method has been removed. The way that "projected" schemas were being constructed was pretty hacky, and this function is non-essential to the operation of the class. I had to remove bindings in the GLib and R libraries for this function, but as far as I can tell these bindings were non-essential to operation, and were added only because the function was there to wrap.
* Added `FileWriter::Make` factory method, making constructor private
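The second change follows a common C++ pattern: a static `Make` function becomes the only way to obtain an instance, so fallible setup can report an error instead of leaving a half-built object. The sketch below illustrates the pattern only. `SketchWriter`, its `path` argument, and the boolean-plus-error-string return are assumptions for illustration and do not reflect the real `FileWriter::Make` signature.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical writer class illustrating a factory with a private constructor.
class SketchWriter {
 public:
  // The only way to create a SketchWriter. On failure, returns false,
  // fills *error, and leaves *out untouched.
  static bool Make(const std::string& path, std::unique_ptr<SketchWriter>* out,
                   std::string* error) {
    assert(out != nullptr && error != nullptr);
    if (path.empty()) {
      *error = "path must not be empty";
      return false;
    }
    // `new` is used because std::make_unique cannot reach a private constructor.
    std::unique_ptr<SketchWriter> writer(new SketchWriter(path));
    if (!writer->Init(error)) return false;
    *out = std::move(writer);
    return true;
  }

  const std::string& path() const { return path_; }
  bool initialized() const { return initialized_; }

 private:
  explicit SketchWriter(std::string path) : path_(std::move(path)) {}

  // Fallible setup that a constructor could not report without exceptions.
  bool Init(std::string* error) {
    if (path_.back() == '/') {
      *error = "path names a directory";
      return false;
    }
    initialized_ = true;
    return true;
  }

  std::string path_;
  bool initialized_ = false;
};
```

Callers then write `SketchWriter::Make("out.parquet", &writer, &error)` and check the result; direct construction no longer compiles, so no caller can skip the initialization step.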
This patch was pretty unpleasant to do -- it removes some hacky functions used to create Arrow fields with leaf nodes trimmed. There is little functional change; the aim is a cleaner structure on which to build full-fledged nested data reading.
Next, I'm going to see through the user-facing dictionary-encoding functionality in Python.
Closes #4971 from wesm/parquet-arrow-schema-tree and squashes the following commits:
e1f19c0 <Wes McKinney> Code review feedback
e2c117a <Wes McKinney> Factor out list nesting into helper function
Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
1 parent e4febfb commit 06fd2da
23 files changed
Lines changed: 1408 additions & 1429 deletions
File tree
- c_glib
  - parquet-glib
  - test/parquet
- cpp/src/parquet
  - arrow
- python/pyarrow
- r
  - R
  - src