阿里巴巴论文-m6米乐安卓版下载

m6米乐安卓版下载-米乐app官网下载
阿里巴巴论文_can llm already serve as a database interfacea big bench for large-scale database groundedtext-to-sqls.pdf
4
20页
0次
1小时前
免费下载
can llm already serve as a database interface?
a big bench for large-scale database grounded
text-to-sqls
jinyang li
1,♣‡
, binyuan hui
2,
, ge qu
1,
, binhua li
2
, jiaxi yang
2
, bowen li
3
, bailin wang
4
,
bowen qin
2
, rongyu cao
2
, ruiying geng
2
, nan huo
1
, xuanhe zhou
3
, chenhao ma
5
,
guoliang li
3
, kevin c.c. chang
6
, fei huang
2
, reynold cheng
1
, yongbin li
2
1
the university of hong kong
2
damo academy, alibaba group
3
tsinghua university
4
massachusetts institute of technology
5
the chinese university of hong kong (shenzhen)
6
university of illinois at urbana-champaign
jl0725@connect.hku.hk, ckcheng@cs.hku.hk
binyuan.hby@alibaba-inc.com
abstract
text-to-sql parsing, which aims at converting natural language instructions into
executablesqls,hasgainedincreasingattentioninrecentyears.inparticular,
codex and chatgpt have shown impressive results in this task.however, most of
the prevalent benchmarks, i.e., spider, and wikisql, focus on database schema
with few rows of database contents leaving the gap between academic studyand
real-world applications.to mitigate this gap, we presentbird,a big benchmark
for large-scale database grounded in text-to-sqltasks, containing 12,751 pairs
oftext-to-sqldataand95databaseswithatotalsizeof33.4gb,spanning
37professionaldomains.ouremphasisondatabasevalueshighlightsthenew
challenges of dirty database contents, external knowledge between nl questions
and database contents, and sql efficiency, particularly in the context of massive
databases.tosolvethese problems,text-to-sqlmodelsmustfeaturedatabase
value comprehensioninaddition tosemanticparsing.the experimentalresults
demonstratethesignificanceofdatabasevaluesingeneratingaccuratetext-to-sqls
for big databases.furthermore, even themost popular and effectivetext-to-sql
models, i.e.chatgpt, only achieve 40.08%in execution accuracy, which is still far
from the human result of 92.96%, proving that challenges still stand.besides, we
also provide an efficiency analysis to offer insights into generating text-to-efficient-
sqlsthatarebeneficialto industries.webelievethatbirdwillcontributeto
advancingreal-world applications of text-to-sql research.the leaderboard and
source code are available:https://bird-bench.github.io/.
1introduction
text-to-sqlparsing [
57
,
52
,
53
,
2
,
54
,
37
], whichfocuses ontransforming naturallanguage intosql
queries, has attracted significant research interests from both academia and industry.this attention
stemsfromitspotentialto empowernon-expertdataanalystsinautomatically extractingdesired
information from ubiquitous relational databases using natural language.recent advances in neural
equal contribution.
work done during an intern at alibaba damo academy.
corresponding authors.
arxiv:2305.03111v2 [cs.cl] 30 may 2023
external knowledge reasoning
large and realistic database values
sqlexecutionefficiency
what is the winning rate of boston celtics in 2000?
selectcount(won) / ((count(won)count(lose))
from teams where team_name = boston celtics
and year = 2000;
external knowledge:
winning rate = # won / (# won # lose)
what is the average salary of the worst performing managers?
selectavg(cast(replace(substr(t1.salary, 4), ',', '') asreal)) from
last_name
milgrom
… …
em_id
0000
… …
employees
us$57,500.00
first_name
salary
2222
6543
adams
wood
milgrom
sandy
emily
… …
us$19,500.00
us$69,000.00
… …
reasoned database:
employee ast1 joinposition ast2 ont1.positionid =t2.positionid
wheret1.performance = 'poor' andt2.positiontitle = 'manager'
among the coaches who have served more than 2 nba teams, during
which coach‘s period of coaching, a team has the least numbers of
games lost in the post-season games?
select coachid from coaches where lgid='nba’ and post_wins !=0
sql
1
: normal semantic parser
and post_losses !=0 and coachid in
(select coachid from coaches where lgid='nba’ group by coachid
having count(tmid)>=2) order by post_losses asc limit 1 ;
run time: 22.4s
what is the average salary of the worst performing managers?
selectavg(cast(replace(substr(t1.salary, 4), ',', '') asreal)) from
last_name
milgrom
… …
em_id
0000
… …
employees
us$57,500.00
first_name
salary
2222
6543
adams
wood
santa
sandy
emily
… …
us$19,500.00
us$69,000.00
… …
reasoned database:
employee ast1 joinposition ast2 ont1.positionid =t2.positionid
wheret1.performance = 'poor' andt2.positiontitle = 'manager'
sql
2
: efficient semantic parser
select coachid from coaches where lgid=‘nba’ and post_wins !=0
and post_losses !=0 and exists (select 1 from coaches as coaches1
where (coaches1.lgid=‘nba’) and (coaches.coachid=coaches1.coachid)
group by coaches1.coachidhaving count(coaches1.tmid) >= 2
order by null ) order by coaches.post_lossesasc limit 1
run time: 4.0s
(a).
(b).
(c).
how many accounts are eligible for loans in new yo rk city?
the condition of loans is that
thetype of the account should
be “owner”.
selectcount(*) from account where account.type
= ‘owner’ and disp_id = ‘ny’;
external knowl edge:
list account id who chooses weekly issue issuance statement?
‘poplatek tydne’ stands
for weekly issuance.
select account_id from account where account.frequency
= ‘poplatek tydne;
external knowl edge:
how many accounts are eligible for loans in new yo rk city?
the condition of loans is that
thetype of the account should
be “owner”.
selectcount(*) from account where account.type
= ‘owner’ and disp_id = ‘ny’;
external knowledge:
list account id who chooses weekly issue issuance statement?
‘poplatek tydne’ stands
for weekly issuance.
select account_id from account where account.frequency
= ‘poplatek tydne;
external knowledge:
what is the average salaryof the worst performing managers?
figure 1:examples ofchallenges in our bird benchmark.1) databases containvalues of noisy data
types[
14
,
24
,
19
,
32
].inthe leftexample,the average salarycouldbe fetchedbyprocessing the
data type from string (
text
in sqlite) to float (
real
in sqlite) after deleting the special tokens,
"us$"
and
","
.2) external knowledge and reasoning are required.in the middle example, models
must handle that only
"owner"
accounts are eligible for loans.3) query execution efficiency needs
to be considered.in the right example, the adoption of more efficient sql queries leads to significant
gains in speed, which is of great value in industries.
models, including those basedon large language models (llms),have ledto impressive performance
on existing benchmarks such as spider [
55
] and wikisql [
60
].for instance, the execution accuracy
of the top-performing model in spider leaderboardhas increased from 53.5% [
61
] to 85.3% [
35
] over
the past three years.thelatest sota parser [
35
] in spider benefits from the powerful understanding
and coding capabilities of the large language model (llm), and such excellent performance leads us
to ask a question:canllmalreadyserveasadatabaseinterface?
theanswerisno,asshowninfigure.1, wediscoveredthatcurrentstate-of-the-artmodels stillstruggle
togeneralizetomorerealisticsituationscharacterizedbylargedatabasesizesandnoisycontent.
besides,themysterieshiddenbehindthehugedatabasevaluesrequireexternalknowledgeand
reasoning to reveal.furthermore, existing benchmarks do not account for sql execution efficiency,
whichholds significantpracticalimportanceinreal-lifeapplications,notablyinthecaseoflarge
databases.motivated by these observations, we aimto develop a new text-to-sql benchmark that
better represents real-life scenarios and narrows the gap between experimental and practical settings.
in this work, we propose bird, a big bench for large-scale database grounded in text-to-sqls
for real-world applications.bird contains complex 12,751 examples of querying information over
95 big databaseswithatotalsizeof 33.4gbspanning37professionaldomains.for training, we
collected 80open-sourcerelational databases fromreal analysis platforms(kaggle, relation.vit);
forevaluation,wecurated15additionalrelationaldatabases.giventhesedatabases,werelyon
crowdsourcingtocollectnaturallanguageinstructionsandthecorrespondingsqls.first,our
databaseexperts createadescriptionfileexplainingallcolumnnames,abbreviatedvalues,value
types, and externalknowledge for each database to helpannotators better understand the database
contents.thenwehireandtrainnativespeakerstoaskquestionsfacingthesedatabasesonone
side; on theotherside,asqlannotationteamconsistingofdataengineersanddatabase students
is recruited to generate sqls to answer questions.to accommodate efficiency, we propose a new
metric valid efficiency score (ves) to evaluate the efficiency of generated sqls in addition to the
standard execution accuracy.to the best of our knowledge, bird is the first text-to-sql benchmark
to incorporate efficiency, promoting more efficient query methods within the context of massive and
noisy database contents.
we evaluate theperformance of state-of-the-arttext-to-sql parsers usingtwo popularmethodologies:
fine-tuning with t5 [
38
], and in-context learning with large language models (llms) such as codex
[
6
](
code-davinci-002
)andchatgpt[
33
](
gpt-3.5-turbo
).ourexperimentalresults
revealthatthecurrentmodelsstruggletogeneralizewell.specifically,thespidersotamodel,
which depends solelyon the database schema,achieves executionaccuracies of only 25.88%and
28.95% on the development andtest sets, respectively.in comparison, the performance still lags far
behind human performance,which we also provide inthis benchmark.we encourage furtherresearch
to address the more realistic settings presented in this benchmark.
2
of 20
免费下载
【米乐app官网下载的版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
top250 周榜 月榜
网站地图