阿里巴巴论文-m6米乐安卓版下载

m6米乐安卓版下载-米乐app官网下载

阿里巴巴论文_can llm already serve as a database interfacea big bench for large-scale database groundedtext-to-sqls.pdf

通讯员

20页

0次

1小时前

免费下载

can llm already serve as a database interface?

a big bench for large-scale database grounded

text-to-sqls

jinyang li

1,♣‡

, binyuan hui

2,♣

, ge qu

1,♣

, binhua li

, jiaxi yang

, bowen li

, bailin wang

bowen qin

, rongyu cao

, ruiying geng

, nan huo

, xuanhe zhou

, chenhao ma

guoliang li

, kevin c.c. chang

6†

, fei huang

, reynold cheng

1†

, yongbin li

2†

the university of hong kong

damo academy, alibaba group

tsinghua university

massachusetts institute of technology

the chinese university of hong kong (shenzhen)

university of illinois at urbana-champaign

jl0725@connect.hku.hk, ckcheng@cs.hku.hk

binyuan.hby@alibaba-inc.com

abstract

text-to-sql parsing, which aims at converting natural language instructions into

executablesqls,hasgainedincreasingattentioninrecentyears.inparticular,

codex and chatgpt have shown impressive results in this task.however, most of

the prevalent benchmarks, i.e., spider, and wikisql, focus on database schema

with few rows of database contents leaving the gap between academic studyand

real-world applications.to mitigate this gap, we presentbird,a big benchmark

for large-scale database grounded in text-to-sqltasks, containing 12,751 pairs

oftext-to-sqldataand95databaseswithatotalsizeof33.4gb,spanning

37professionaldomains.ouremphasisondatabasevalueshighlightsthenew

challenges of dirty database contents, external knowledge between nl questions

and database contents, and sql efﬁciency, particularly in the context of massive

databases.tosolvethese problems,text-to-sqlmodelsmustfeaturedatabase

value comprehensioninaddition tosemanticparsing.the experimentalresults

demonstratethesigniﬁcanceofdatabasevaluesingeneratingaccuratetext-to-sqls

for big databases.furthermore, even themost popular and effectivetext-to-sql

models, i.e.chatgpt, only achieve 40.08%in execution accuracy, which is still far

from the human result of 92.96%, proving that challenges still stand.besides, we

also provide an efﬁciency analysis to offer insights into generating text-to-efﬁcient-

sqlsthatarebeneﬁcialto industries.webelievethatbirdwillcontributeto

advancingreal-world applications of text-to-sql research.the leaderboard and

source code are available:https://bird-bench.github.io/.

1introduction

text-to-sqlparsing [

], whichfocuses ontransforming naturallanguage intosql

queries, has attracted signiﬁcant research interests from both academia and industry.this attention

stemsfromitspotentialto empowernon-expertdataanalystsinautomatically extractingdesired

information from ubiquitous relational databases using natural language.recent advances in neural

♣equal contribution.

‡

work done during an intern at alibaba damo academy.

†

corresponding authors.

arxiv:2305.03111v2 [cs.cl] 30 may 2023

external knowledge reasoning

large and realistic database values

sqlexecutionefficiency

what is the winning rate of boston celtics in 2000?

selectcount(won) / ((count(won)count(lose))

from teams where team_name = ‘boston celtics’

and year = 2000;

external knowledge:

winning rate = # won / (# won # lose)

what is the average salary of the worst performing managers?

selectavg(cast(replace(substr(t1.salary, 4), ',', '') asreal)) from

last_name

milgrom

… …

em_id

0000

… …

employees

us$57,500.00

first_name

salary

2222

6543

adams

wood

milgrom

sandy

emily

… …

us$19,500.00

us$69,000.00

… …

reasoned database:

employee ast1 joinposition ast2 ont1.positionid =t2.positionid

wheret1.performance = 'poor' andt2.positiontitle = 'manager'

among the coaches who have served more than 2 nba teams, during

which coach‘s period of coaching, a team has the least numbers of

games lost in the post-season games?

select coachid from coaches where lgid='nba’ and post_wins !=0

sql

: normal semantic parser

and post_losses !=0 and coachid in

(select coachid from coaches where lgid='nba’ group by coachid

having count(tmid)>=2) order by post_losses asc limit 1 ;

run time: 22.4s

what is the average salary of the worst performing managers?

selectavg(cast(replace(substr(t1.salary, 4), ',', '') asreal)) from

last_name

milgrom

… …

em_id

0000

… …

employees

us$57,500.00

first_name

salary

2222

6543

adams

wood

santa

sandy

emily

… …

us$19,500.00

us$69,000.00

… …

reasoned database:

employee ast1 joinposition ast2 ont1.positionid =t2.positionid

wheret1.performance = 'poor' andt2.positiontitle = 'manager'

sql

: efficient semantic parser

select coachid from coaches where lgid=‘nba’ and post_wins !=0

and post_losses !=0 and exists (select 1 from coaches as coaches1

where (coaches1.lgid=‘nba’) and (coaches.coachid=coaches1.coachid)

group by coaches1.coachidhaving count(coaches1.tmid) >= 2

order by null ) order by coaches.post_lossesasc limit 1

run time: 4.0s

how many accounts are eligible for loans in new yo rk city?

the condition of loans is that

thetype of the account should

be “owner”.

selectcount(*) from account where account.type

= ‘owner’ andcity = ‘ny’;

external knowledge:

(a).

(b).

(c).

how many accounts are eligible for loans in new yo rk city?

the condition of loans is that

thetype of the account should

be “owner”.

selectcount(*) from account where account.type

= ‘owner’ and disp_id = ‘ny’;

external knowl edge:

list account id who chooses weekly issue issuance statement?

‘poplatek tydne’ stands

for weekly issuance.

select account_id from account where account.frequency

= ‘poplatek tydne‘;

external knowl edge:

how many accounts are eligible for loans in new yo rk city?

the condition of loans is that

thetype of the account should

be “owner”.

selectcount(*) from account where account.type

= ‘owner’ and disp_id = ‘ny’;

external knowledge:

list account id who chooses weekly issue issuance statement?

‘poplatek tydne’ stands

for weekly issuance.

select account_id from account where account.frequency

= ‘poplatek tydne‘;

external knowledge:

what is the average salaryof the worst performing managers?

figure 1:examples ofchallenges in our bird benchmark.1) databases containvalues of noisy data

types[

].inthe leftexample,the average salarycouldbe fetchedbyprocessing the

data type from string (

text

in sqlite) to ﬂoat (

real

in sqlite) after deleting the special tokens,

"us$"

and

","

.2) external knowledge and reasoning are required.in the middle example, models

must handle that only

"owner"

accounts are eligible for loans.3) query execution efﬁciency needs

to be considered.in the right example, the adoption of more efﬁcient sql queries leads to signiﬁcant

gains in speed, which is of great value in industries.

models, including those basedon large language models (llms),have ledto impressive performance

on existing benchmarks such as spider [

] and wikisql [

].for instance, the execution accuracy

of the top-performing model in spider leaderboardhas increased from 53.5% [

] to 85.3% [

] over

the past three years.thelatest sota parser [

] in spider beneﬁts from the powerful understanding

and coding capabilities of the large language model (llm), and such excellent performance leads us

to ask a question:canllmalreadyserveasadatabaseinterface?

theanswerisno,asshowninfigure.1, wediscoveredthatcurrentstate-of-the-artmodels stillstruggle

togeneralizetomorerealisticsituationscharacterizedbylargedatabasesizesandnoisycontent.

besides,themysterieshiddenbehindthehugedatabasevaluesrequireexternalknowledgeand

reasoning to reveal.furthermore, existing benchmarks do not account for sql execution efﬁciency,

whichholds signiﬁcantpracticalimportanceinreal-lifeapplications,notablyinthecaseoflarge

databases.motivated by these observations, we aimto develop a new text-to-sql benchmark that

better represents real-life scenarios and narrows the gap between experimental and practical settings.

in this work, we propose bird, a big bench for large-scale database grounded in text-to-sqls

for real-world applications.bird contains complex 12,751 examples of querying information over

95 big databaseswithatotalsizeof 33.4gbspanning37professionaldomains.for training, we

collected 80open-sourcerelational databases fromreal analysis platforms(kaggle, relation.vit);

forevaluation,wecurated15additionalrelationaldatabases.giventhesedatabases,werelyon

crowdsourcingtocollectnaturallanguageinstructionsandthecorrespondingsqls.first,our

databaseexperts createadescriptionﬁleexplainingallcolumnnames,abbreviatedvalues,value

types, and externalknowledge for each database to helpannotators better understand the database

contents.thenwehireandtrainnativespeakerstoaskquestionsfacingthesedatabasesonone

side; on theotherside,asqlannotationteamconsistingofdataengineersanddatabase students

is recruited to generate sqls to answer questions.to accommodate efﬁciency, we propose a new

metric valid efﬁciency score (ves) to evaluate the efﬁciency of generated sqls in addition to the

standard execution accuracy.to the best of our knowledge, bird is the ﬁrst text-to-sql benchmark

to incorporate efﬁciency, promoting more efﬁcient query methods within the context of massive and

noisy database contents.

we evaluate theperformance of state-of-the-arttext-to-sql parsers usingtwo popularmethodologies:

ﬁne-tuning with t5 [

], and in-context learning with large language models (llms) such as codex

[

](

code-davinci-002

)andchatgpt[

](

gpt-3.5-turbo

).ourexperimentalresults

revealthatthecurrentmodelsstruggletogeneralizewell.speciﬁcally,thespidersotamodel,

which depends solelyon the database schema,achieves executionaccuracies of only 25.88%and

28.95% on the development andtest sets, respectively.in comparison, the performance still lags far

behind human performance,which we also provide inthis benchmark.we encourage furtherresearch

to address the more realistic settings presented in this benchmark.

of 20

免费下载

关注

阿里巴巴论文-m6米乐安卓版下载

评论