DEV Community

Arijit Ghosh

Cannot subset ranges from another range set

Ca21chr2_C_albicans_SC5314 2159343 2228327 Ca21chr2_C_albicans_SC5314 636587 638608
Ca21chr2_C_albicans_SC5314 5286 50509 Ca21chr2_C_albicans_SC5314 634021 636276
Ca21chr2_C_albicans_SC5314 1886545 1900975 Ca21chr2_C_albicans_SC5314 610758 613544
Ca21chr2_C_albicans_SC5314 1919115 1930649 Ca21chr2_C_albicans_SC5314 606248 608308
Ca21chr2_C_albicans_SC5314 590278 603163 Ca21chr2_C_albicans_SC5314 1554724 1556511
Ca21chr2_C_albicans_SC5314 267403 279993 Ca21chr2_C_albicans_SC5314 1547799 1548998
Ca21chr2_C_albicans_SC5314 1611869 1622753 Ca21chr2_C_albicans_SC5314 1519257 1520960
Ca21chr2_C_albicans_SC5314 1479229 1490747 Ca21chr2_C_albicans_SC5314 1514712 1516178
Ca21chr2_C_albicans_SC5314 157814 166956 Ca21chr2_C_albicans_SC5314 897896 900774
Ca21chr2_C_albicans_SC5314 2148223 2149627 Ca21chr2_C_albicans_SC5314 890821 892818
Ca21chr2_C_albicans_SC5314 1041578 1051493 Ca21chr2_C_albicans_SC5314 588237 589598
Ca21chr2_C_albicans_SC5314 736894 745664 Ca21chr2_C_albicans_SC5314 557079 558713
Ca21chr2_C_albicans_SC5314 618550 627903 Ca21chr2_C_albicans_SC5314 7510 8043
Ca21chr2_C_albicans_SC5314 1116919 1125425 Ca21chr2_C_albicans_SC5314 922654 924717
Ca21chr2_C_albicans_SC5314 1262940 1271939 Ca21chr2_C_albicans_SC5314 1778986 1779687
Ca21chr2_C_albicans_SC5314 288630 296284 Ca21chr2_C_albicans_SC5314 795730 798201
Ca21chr2_C_albicans_SC5314 1250513 1258731 Ca21chr2_C_albicans_SC5314 766651 768309
Ca21chr2_C_albicans_SC5314 1499806 1508334 Ca21chr2_C_albicans_SC5314 763501 765159
Ca21chr2_C_albicans_SC5314 98269 105803 Ca21chr2_C_albicans_SC5314 758203 758733
Ca21chr2_C_albicans_SC5314 1604362 1611315 Ca21chr2_C_albicans_SC5314 700893 702539

This is a snippet of my data. For each range defined by columns 5 and 6, I want to find out whether it is a subset of (falls entirely within) any range defined by columns 2 and 3. The ranges in columns 2 and 3 are longer than those in columns 5 and 6. The script has to scan through all of the ranges in columns 2 and 3 for every range in columns 5 and 6, not just the one on the same row. How do I do this? Is there an awk, sed, R, or Python script that can do it? I am sorry if I did not follow the forum's rules; this is my first time using it.
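To make the question concrete, here is a minimal Python sketch of the check I have in mind. It is only an illustration under some assumptions that may not hold for the real data: the file is whitespace-delimited, columns 1 and 4 are sequence names that must match for a range to count as contained, and the coordinates are inclusive. The function names `parse_rows` and `find_containing` are made up for this example, and the three sample rows are copied from the snippet above.

```python
# Sample rows copied from the data snippet (whitespace-delimited, 6 columns).
SAMPLE = """\
Ca21chr2_C_albicans_SC5314 2159343 2228327 Ca21chr2_C_albicans_SC5314 636587 638608
Ca21chr2_C_albicans_SC5314 5286 50509 Ca21chr2_C_albicans_SC5314 634021 636276
Ca21chr2_C_albicans_SC5314 618550 627903 Ca21chr2_C_albicans_SC5314 7510 8043
"""


def parse_rows(text):
    """Return two lists of (name, start, end): columns 1-3 and columns 4-6."""
    outer, inner = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or incomplete lines
        outer.append((fields[0], int(fields[1]), int(fields[2])))
        inner.append((fields[3], int(fields[4]), int(fields[5])))
    return outer, inner


def find_containing(outer, inner):
    """For every inner range, list the outer ranges that fully contain it.

    Assumptions: names must match, and coordinates are inclusive.
    This compares every inner range against every outer range (O(n * m)).
    """
    results = []
    for name, start, end in inner:
        lo, hi = min(start, end), max(start, end)
        hits = [
            (o_start, o_end)
            for o_name, o_start, o_end in outer
            if o_name == name
            and min(o_start, o_end) <= lo
            and hi <= max(o_start, o_end)
        ]
        results.append(((name, start, end), hits))
    return results


if __name__ == "__main__":
    outer_ranges, inner_ranges = parse_rows(SAMPLE)
    for query, hits in find_containing(outer_ranges, inner_ranges):
        print(query, "contained in", hits if hits else "nothing")
```

The all-against-all comparison is the simplest way to express "scan columns 2 and 3 in totality", and it should be fine for a few thousand rows. For much larger files, sorting the ranges or using an interval tree would probably be faster.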
