I looked at your eval framework. I have used a similar subset/superset result-set matching approach in some of my own research. One word of caution: result-set matching cannot prove semantic equivalence, so you may want to evaluate against multiple database instances to reduce false positives. False positives are particularly prevalent when gold queries return scalar values or empty result sets.
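To make the concern concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `orders` schema, both queries, and the helper names (`make_db`, `results_match`, `matches_on_all`) are hypothetical and only for illustration; it also uses plain exact matching rather than your subset/superset logic, so treat it as a simplification rather than a description of your framework.

```python
import sqlite3

# Hypothetical schema and queries, chosen only to illustrate the point.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
GOLD = "SELECT COUNT(*) FROM orders WHERE status = 'shipped'"
PREDICTED = "SELECT COUNT(*) FROM orders WHERE amount > 100"  # different meaning


def make_db(rows):
    """Create an in-memory database instance populated with (status, amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO orders (status, amount) VALUES (?, ?)", rows)
    return conn


def results_match(conn, gold_sql, predicted_sql):
    """Order-insensitive exact comparison of the two result sets on one instance."""
    gold = sorted(conn.execute(gold_sql).fetchall())
    pred = sorted(conn.execute(predicted_sql).fetchall())
    return gold == pred


def matches_on_all(instances, gold_sql, predicted_sql):
    """Accept the prediction only if it matches the gold query on every instance."""
    return all(results_match(c, gold_sql, predicted_sql) for c in instances)


# Instance A: the only shipped order is also the only one above 100,
# so both scalar counts coincide by accident.
db_a = make_db([("shipped", 150.0), ("pending", 50.0)])
# Instance B: a different data distribution that separates the two predicates.
db_b = make_db([("shipped", 20.0), ("shipped", 30.0), ("pending", 500.0)])

single_instance_verdict = results_match(db_a, GOLD, PREDICTED)
multi_instance_verdict = matches_on_all([db_a, db_b], GOLD, PREDICTED)
print(single_instance_verdict, multi_instance_verdict)
```

On instance A alone the wrong query is accepted because both counts happen to agree; adding instance B exposes the difference. Extra instances lower the false-positive rate but still do not prove equivalence.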
Are you planning on submitting SQLCoder-34b to other NL-to-SQL benchmarks such as Spider or its derivatives?