Elasticsearch:自定义得分
Elasticsearch的自定义打分策略。
bool
bool查询的语法就是query关键字下嵌套一个
bool
。
bool query的逻辑比较简单:
- must/should:其下的查询分纳入总分;
- filter/must_not:其下的查询分重置为0;
所以bool的总分就是所有must
和should
下的子查询的分值加和。
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final
_score
for each document.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
POST _search
{
"query": {
"bool" : {
"must" : {
"term" : { "user.id" : "kimchy" }
},
"filter": {
"term" : { "tags" : "production" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tags" : "env1" } },
{ "term" : { "tags" : "deployed" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
nested
nested查询的语法就是query关键字下嵌套一个
nested
。
先创建一个含有nested field的索引:一个manager下有多个employee,大家都只有name和age
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
PUT nested-properties
{
"mappings": {
"dynamic": "strict",
"properties": {
"first": {
"type": "text"
},
"second": {
"type": "text"
},
"manager": {
"properties": {
"age": { "type": "integer" },
"name": { "type": "text" }
}
},
"employees": {
"type": "nested",
"properties": {
"age": { "type": "integer" },
"name": { "type": "text" }
}
}
}
}
}
放三条数据:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
PUT nested-properties/_doc/1
{
"first": "hello",
"second": "world",
"manager": {
"name": "Alice White",
"age": 30
},
"employees": [
{
"name": "John Smith",
"age": 34
},
{
"name": "Peter Brown",
"age": 26
}
]
}
# object类型可以双重赋值?只是因为作为数组存储了
PUT nested-properties/_doc/2
{
"first": "hello",
"second": "kugou",
"manager": {
"name": "Henry White",
"age": 30
},
"manager.name": "Bob White",
"manager.age": 40,
"employees": [
{
"name": "John Smith",
"age": 34
},
{
"name": "Peter Brown",
"age": 26
}
]
}
# update
POST nested-properties/_update/1
{
"doc": {
"employees": [
{
"name": "liuhaibo",
"age": 12345
},
{
"name": "pikachu",
"age": 1234
},
{
"name": "pikachu",
"age": 4321
},
{
"name": "pikachu",
"age": 1111
}
]
}
}
PUT nested-properties/_doc/3
{
"first": "hello",
"second": "kugou",
"manager": {
"name": "Henry White",
"age": 30
},
"manager.name": "Bob White",
"manager.age": 40,
"employees": [
{
"name": "Liuhaibo",
"age": 12345
},
{
"name": "pikachu",
"age": 1234
}
]
}
nested query的计分本身比较简单:查询到子文档后,使用score_mode
把子文档的得分(默认是avg
,求平均)汇总为父文档得分即可。
parent-child
父子文档有三种查询方式:
根据子文档查询父文档(类似nested query)的语法就是query关键字下嵌套一个
has_child
;根据父文档查询子文档的语法就是query关键字下嵌套一个
nested
;查询一个父文档的所有子文档的语法就是query关键字下嵌套一个
parent_id
。
has_child
和nested
类似,所以都使用score_mode
将子文档得分叠加为父文档得分,不同的是它的默认值为none
,默认不拿子文档给父文档计分。
has_parent
使用score
将父文档得分叠加为子文档得分,默认为false
,不计分。
function score
function_score查询的语法就是query关键字下嵌套一个
function_score
。
至此,所有上述的单独查询的计分规则都很简单。但是,如果我们想自定义查询的得分,多半需要使用function score查询来对得分进行多重修改,事情就变得复杂起来。
function score查询本身稍显复杂:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
GET /_search
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5",
"functions": [
{
"filter": { "match": { "test": "bar" } },
"random_score": {},
"weight": 23
},
{
"filter": { "match": { "test": "cat" } },
"weight": 42
}
],
"max_boost": 42,
"score_mode": "max",
"boost_mode": "multiply",
"min_score": 42
}
}
}
它主要由两部分组成,所以得分主要来源于两部分:
- query的得分:function_score下可以指定一个query,代表本次打分涉及到的文档;
- functions:函数得分。可以指定不止一个function,使用函数进行额外计分。
- 每个函数也可以指定一个filter,代表打分的文档范围,可以和上边的query查询范围不同。因为是filter,所以不计分;
- 具体打分函数有多种,可以实现相对复杂的公式;
- 每个函数的得分可以通过
weight
独立加权;
最后要考虑的是怎么把这两部分得分合并起来:
score_mode
:用于合并函数之间的得分,默认是multiply
。函数总得分可以使用max_boost
修正一下,作为得分上限;boost_mode
:用于合并query和函数总得分之间的得分,默认是multiply
。最终总得分可以通过min_score
限制一下,低于该得分的文档被抛弃;
听起来已经很复杂了,但是如果再叠加上nested查询或parent-child查询,理解起来会更麻烦,有一种无限套娃的感觉。
和nested结合
以上面的nested索引为例。当前共有三条文档:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "nested-properties",
"_id": "2",
"_score": 1,
"_source": {
"first": "hello",
"second": "kugou",
"manager": {
"name": "Henry White",
"age": 30
},
"manager.name": "Bob White",
"manager.age": 40,
"employees": [
{
"name": "John Smith",
"age": 34
},
{
"name": "Peter Brown",
"age": 26
}
]
}
},
{
"_index": "nested-properties",
"_id": "1",
"_score": 1,
"_source": {
"first": "hello",
"second": "world",
"manager": {
"name": "Alice White",
"age": 30
},
"employees": [
{
"name": "liuhaibo",
"age": 12345
},
{
"name": "pikachu",
"age": 1234
},
{
"name": "pikachu",
"age": 4321
},
{
"name": "pikachu",
"age": 1111
}
]
}
},
{
"_index": "nested-properties",
"_id": "3",
"_score": 1,
"_source": {
"first": "hello",
"second": "kugou",
"manager": {
"name": "Henry White",
"age": 30
},
"manager.name": "Bob White",
"manager.age": 40,
"employees": [
{
"name": "Liuhaibo",
"age": 12345
},
{
"name": "pikachu",
"age": 1234
}
]
}
}
]
}
}
我们要实现如下自定义打分:
- 找到manager名为alice的文档;
- 选择它下面名为pikachu的employee;
- 所有的employee以age作为打分依据,但是要经过一些函数变换;
- manager也以age作为打分依据,也要经过一些函数变换;
- 最终合并manager和employee的得分,作为文档总得分;
因为得分点过多,为了搞清楚每一个分值来自于哪里,可以使用explain API查看具体某一个文档的详细得分信息。
function_score + nested
先看怎么给nested文档打分。
文档1有三个pikachu嵌套子文档,age分别为1234、4321、1111。
查到一个二者结合进行打分的回答。但是如果这么组合,会发现得分很奇怪:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
GET nested-properties/_explain/1
{
"query": {
"function_score": {
"query": {
"nested": {
"path": "employees",
"query": {
"bool": {
"must": [
{
"match": {
"employees.name": "pikachu"
}
}
]
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "employee.age",
"factor": 1.1,
"modifier": "square",
"missing": 1
}
}
]
}
}
}
解释详情如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
{
"_index": "nested-properties",
"_id": "1",
"matched": true,
"explanation": {
"value": 0.9134444,
"description": "sum of:",
"details": [
{
"value": 0.91344434,
"description": "function score, product of:",
"details": [
{
"value": 0.7549127,
"description": "Score based on 3 child docs in range from 3 to 6, best match:",
"details": [
{
"value": 0.7549127,
"description": "weight(employees.name:pikachu in 1) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.7549127,
"description": "score(freq=1.0), computed as boost * idf * tf from:",
"details": [
{
"value": 2.2,
"description": "boost",
"details": []
},
{
"value": 0.6931472,
"description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details": [
{
"value": 4,
"description": "n, number of documents containing term",
"details": []
},
{
"value": 8,
"description": "N, total number of documents with field",
"details": []
}
]
},
{
"value": 0.49504948,
"description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details": [
{
"value": 1,
"description": "freq, occurrences of term within document",
"details": []
},
{
"value": 1.2,
"description": "k1, term saturation parameter",
"details": []
},
{
"value": 0.75,
"description": "b, length normalization parameter",
"details": []
},
{
"value": 1,
"description": "dl, length of field",
"details": []
},
{
"value": 1.25,
"description": "avgdl, average length of field",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 1.21,
"description": "min of:",
"details": [
{
"value": 1.21,
"description": "field value function: square(doc['employee.age'].value?:1.0 * factor=1.1)",
"details": []
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": []
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "FieldExistsQuery [field=_primary_term]",
"details": []
}
]
}
]
}
}
function score得分为0.91344434,主要由两部分组成:
- 第一部分是bool里must下使用term query查询pikachu,所以term的得分(tfidf)保留了下来,一共0.7549127;
- 第二部分得分为1.21,打分公式为
none(doc['employee.age'].value?:1.0 * factor=1.0)
=1.21;
第二部分为什么是1.21?employee.age是有值的,但是似乎没有用到。很明显,1.21=(1*1.1)^2,所以没有用employee.age的实际值,而是用了默认值1(missing=1)。为什么?
看到这个回答才发现,上面的回答有误,查询的时候把嵌套层级搞错了。function score内部的query为nested查询,也就是说function score查询在nested查询外面,在nested查询外面是不可能访问到nested属性的。所以上面的查询实际是在查完nested子文档得分之后,再拿父文档的employee.age计算的分,但是父文档只有manager.age,哪来的employee.age?因此使用了missing value=1。
调一下两个查询的嵌套层级,整个function_score查询放到nested的query里,在nested query里面才能用里面的field打分:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
GET nested-properties/_explain/1
{
"query": {
"nested": {
"path": "employees",
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"match": {
"employees.name": "pikachu"
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "employees.age",
"factor": 1,
"modifier": "none",
"missing": 1
}
}
]
}
}
}
}
}
因为不关心tfidf的得分,只是想做个过滤,所以这次使用了filter。同时employee.age的打分规则也简化一下,不乘以1.1再平方了,就是用原始值。
结果答案更奇怪了,零分:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
{
"_index": "nested-properties",
"_id": "1",
"matched": true,
"explanation": {
"value": 0,
"description": "Score based on 3 child docs in range from 3 to 6, best match:",
"details": [
{
"value": 0,
"description": "sum of:",
"details": [
{
"value": 0,
"description": "function score, product of:",
"details": [
{
"value": 0,
"description": "ConstantScore(employees.name:pikachu)^0.0",
"details": []
},
{
"value": 1234,
"description": "min of:",
"details": [
{
"value": 1234,
"description": "field value function: none(doc['employees.age'].value?:1.0 * factor=1.0)",
"details": []
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": []
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "_nested_path:employees",
"details": []
}
]
}
]
}
]
}
}
显然,0分来自filter。仔细想想function score的计分方式:默认是query的得分乘以functions的得分。所以乘完得分也为0了。
因此应该给function score设置"boost_mode": "sum"
,求和:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
GET nested-properties/_explain/1
{
"query": {
"nested": {
"path": "employees",
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"match": {
"employees.name": "pikachu"
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "employees.age",
"factor": 1,
"modifier": "none",
"missing": 1
}
}
],
"boost_mode": "sum"
}
}
}
}
}
依然不太对:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
{
"_index": "nested-properties",
"_id": "1",
"matched": true,
"explanation": {
"value": 2222,
"description": "Score based on 3 child docs in range from 3 to 6, best match:",
"details": [
{
"value": 4321,
"description": "sum of:",
"details": [
{
"value": 4321,
"description": "sum of",
"details": [
{
"value": 0,
"description": "ConstantScore(employees.name:pikachu)^0.0",
"details": []
},
{
"value": 4321,
"description": "min of:",
"details": [
{
"value": 4321,
"description": "field value function: none(doc['employees.age'].value?:1.0 * factor=1.0)",
"details": []
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": []
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "_nested_path:employees",
"details": []
}
]
}
]
}
]
}
}
"Score based on 3 child docs in range from 3 to 6, best match:"
说明explain api只给了其中分最高的一个子文档的得分,4321,符合预期,但是最终得分为2222,而非4321+1234+1111。再仔细看,2222是三个数的均值……想起来了,nested查询默认给所有子文档求平均得分,作为父文档的总得分。
所以给所有子文档得分求和,修改nested查询的属性,"score_mode": "sum"
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
GET nested-properties/_explain/1
{
"query": {
"nested": {
"path": "employees",
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"match": {
"employees.name": "pikachu"
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "employees.age",
"factor": 1,
"modifier": "none",
"missing": 1
}
}
],
"boost_mode": "sum"
}
},
"score_mode": "sum"
}
}
}
这次得分对了,三个pikachu的age和叠加,值为6666。
叠加父文档得分
再叠加父属性(manager.age)怎么搞?
显然要再嵌套一层!为了防止绕晕,先把最外层的查询框架写出来:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
GET nested-properties/_search
{
"query": {
"function_score": {
"query": {
},
"functions": [
{
"field_value_factor": {
"field": "manager.age"
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
这次把属性也先设置好:
- score_mode:functions之间的分数叠加方式;
- boost_mode:query score和function score之间的分数叠加方式;
然后把上面的子查询嵌入新的function_score的query处:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
GET nested-properties/_explain/1
{
"query": {
"function_score": {
"query": {
"nested": {
"path": "employees",
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"match": {
"employees.name": "pikachu"
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "employees.age",
"factor": 1,
"modifier": "none",
"missing": 0
}
}
],
"boost_mode": "sum"
}
},
"score_mode": "sum"
}
},
"functions": [
{
"field_value_factor": {
"field": "manager.age"
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
得分详情:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
{
"_index": "nested-properties",
"_id": "1",
"matched": true,
"explanation": {
"value": 6696,
"description": "sum of:",
"details": [
{
"value": 6696,
"description": "sum of",
"details": [
{
"value": 6666,
"description": "Score based on 3 child docs in range from 3 to 6, best match:",
"details": [
{
"value": 4321,
"description": "sum of:",
"details": [
{
"value": 4321,
"description": "sum of",
"details": [
{
"value": 0,
"description": "ConstantScore(employees.name:pikachu)^0.0",
"details": []
},
{
"value": 4321,
"description": "min of:",
"details": [
{
"value": 4321,
"description": "field value function: none(doc['employees.age'].value?:0.0 * factor=1.0)",
"details": []
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": []
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "_nested_path:employees",
"details": []
}
]
}
]
}
]
},
{
"value": 30,
"description": "min of:",
"details": [
{
"value": 30,
"description": "field value function: none(doc['manager.age'].value * factor=1.0)",
"details": []
},
{
"value": 3.4028235e+38,
"description": "maxBoost",
"details": []
}
]
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "FieldExistsQuery [field=_primary_term]",
"details": []
}
]
}
]
}
}
只看最外层的function score:query是nested query,得分6666,functions得分30,经过sum最终为6696。
重要的是分层!
那么怎么实现一开始说的过滤manager.name=alice呢?把上面的查询放到bool查询里即可:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
GET nested-properties/_search
{
"query": {
"bool": {
"filter": [
{
"match": {
"manager.name": "alice"
}
}
],
"must": [
{
"function_score": {
"query": {
"nested": {
"path": "employees",
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"match": {
"employees.name": "pikachu"
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "employees.age",
"factor": 1,
"modifier": "none",
"missing": 0
}
}
],
"boost_mode": "sum"
}
},
"score_mode": "sum"
}
},
"functions": [
{
"field_value_factor": {
"field": "manager.age"
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
]
}
}
}
相当于有两个独立的子查询,一个在过滤alice,另一个在计算每一条父文档的得分。
但是如果思路不同,还能写出其他的查询方式。比如先过滤alice,计算每一个父文档的所有子文档得分,最终再和父文档的manager.age做加权:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
GET nested-properties/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"match": {
"manager.name": "alice"
}
}
],
"must": [
{
"nested": {
"path": "employees",
"query": {
"function_score": {
"query": {
"terms": {
"employees.name": [
"pikachu"
]
}
},
"functions": [
{
"field_value_factor": {
"field": "employees.age",
"factor": 1,
"modifier": "none",
"missing": 0
}
}
],
"score_mode": "sum",
"boost_mode": "multiply"
}
},
"score_mode": "sum"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "manager.age"
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
得分也是6696。仔细看看,它和第一个查询的区别在于,把内层function_score
查询的functions
函数拎出来放到了外层的function_score
里,对于得分来说是等价的,但是对应了两种不同的思路,当然两种思路的效果也是等价的。
上面的查询还修改了另外一个地方:内层function_score
的query从bool+filter改成了terms,所以query的得分从0分变成了1分,相应的,function_score
的boost_mode
从sum改为了multiply,得分不变。
terms查询和term查询不同:前者得分为1,后者得分为tfidf。
terms查询的得分为1是因为它是一个布尔查询,只要有一个匹配的项,就会将该文档返回为匹配文档,因此每个匹配项的得分都被认为是相同的,并且都为1。因为terms查询的得分仅仅反映了文档是否匹配查询条件,而不反映匹配程度的高低。因此,在terms查询中,文档的相关性得分与查询无关。
所以不想用tfidf的话,用terms不错。
和parent-child结合
道理和nested是一样的,无论叠加什么函数,重要的是分清把谁放在谁里面:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
GET /manager_employee/_search
{
"query":{
"bool":{
"filter":[
{
"term":{
"manager.platform":"platform1"
}
}
],
"must":[
{
"function_score":{
"query":{
"has_child":{
"type":"employee",
"score_mode":"sum",
"query":{
"bool":{
"should":[
{
"function_score":{
"query":{
"match_phrase":{
"employee.title.":"halo infinite"
}
},
"boost":10
}
},
{
"function_score":{
"query":{
"match_phrase":{
"employee.description.":"halo infinite"
}
},
"boost":1
}
},
{
"function_score":{
"query":{
"match_phrase":{
"employee.tags.eng":"halo infinite"
}
},
"boost":1
}
}
],
"filter":[
{
"range":{
"employee.timestamp":{
"gte":1640966400000
}
}
}
],
"minimum_should_match":1
}
}
}
},
"script_score":{
"script":"_score * Math.log(3 + doc['manager.age'].value)"
},
"score_mode":"multiply"
}
}
]
}
}
}
综上,使用function_score写查询的关键是:头脑清晰。先外后内,一层层写出每一个查询,再拼接起来。尤其是注意区分nested/has_child本身的分值叠加方式和function score查询的分值叠加方式,不要混为一谈。
翻译成Java代码
怎么把上面的查询用Java client写出来……查询本身就很复杂了,这要是再翻译成代码……
开摆,交给ChatGPT!
我有一个这样的索引:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
PUT nested-properties
{
"mappings": {
"dynamic": "strict",
"properties": {
"first": {
"type": "text"
},
"second": {
"type": "text"
},
"manager": {
"properties": {
"age": { "type": "integer" },
"name": { "type": "text" }
}
},
"employees": {
"type": "nested",
"properties": {
"age": { "type": "integer" },
"name": { "type": "text" }
}
}
}
}
}
把下面的elasticsearch查询使用Java api写,代码怎么写
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
GET nested-properties/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"match": {
"manager.name": "alice"
}
}
],
"must": [
{
"nested": {
"path": "employees",
"query": {
"function_score": {
"query": {
"terms": {
"employees.name": [
"pikachu"
]
}
},
"functions": [
{
"field_value_factor": {
"field": "employees.age",
"factor": 1,
"modifier": "none",
"missing": 0
}
},
{
"field_value_factor": {
"field": "employees.age",
"factor": 1,
"modifier": "none",
"missing": 0
}
}
],
"score_mode": "sum",
"boost_mode": "multiply"
}
},
"score_mode": "sum"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "manager.age"
}
},
{
"field_value_factor": {
"field": "manager.age"
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
以下是使用Java API编写的等效Elasticsearch查询代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.NestedQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermsQueryBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;
// 创建RestHighLevelClient实例
RestHighLevelClient client = new RestHighLevelClient();
// 构建查询请求
SearchRequest searchRequest = new SearchRequest("nested-properties");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// 构建function_score查询
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
// 添加manager.name的filter
MatchQueryBuilder managerNameQuery = QueryBuilders.matchQuery("manager.name", "alice");
queryBuilder.filter(managerNameQuery);
// 构建nested查询
NestedQueryBuilder nestedQuery = QueryBuilders.nestedQuery("employees",
QueryBuilders.functionScoreQuery(
QueryBuilders.termsQuery("employees.name", "pikachu"),
new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{
new FunctionScoreQueryBuilder.FilterFunctionBuilder(
QueryBuilders.fieldValueFactorFunction("employees.age")
.factor(1)
.modifier(FunctionScoreQueryBuilder.Modifier.NONE)
.missing(0)
),
new FunctionScoreQueryBuilder.FilterFunctionBuilder(
QueryBuilders.fieldValueFactorFunction("employees.age")
.factor(1)
.modifier(FunctionScoreQueryBuilder.Modifier.NONE)
.missing(0)
)
}
)
.scoreMode(FunctionScoreQuery.ScoreMode.SUM)
.boostMode(CombineFunction.MULTIPLY)
);
queryBuilder.must(nestedQuery);
// 添加manager.age的function
queryBuilder.addFunction(
new FieldValueFactorFunctionBuilder("manager.age")
.factor(1)
.modifier(FieldValueFactorFunction.Modifier.NONE)
);
queryBuilder.addFunction(
new FieldValueFactorFunctionBuilder("manager.age")
.factor(1)
.modifier(FieldValueFactorFunction.Modifier.NONE)
);
// 设置查询条件和排序
sourceBuilder.query(queryBuilder);
sourceBuilder.sort(new FieldSortBuilder("_score").order(SortOrder.DESC));
// 将查询源添加到查询请求
searchRequest.source(sourceBuilder);
// 执行查询请求
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
// 处理查询结果
// ...
// 关闭RestHighLevelClient连接
client.close();
请注意,您需要根据您的Elasticsearch集群配置和Java客户端版本进行适当的设置和错误处理。此代码仅作为示例,并可能需要根据您的实际环境进行调整。
不错,大致是能用的。技术确实改变了世界,至少改变了我写代码的方式。