ClickHouse函数 2.字符串函数

clickhouse

发布日期: 2021-08-13

1. 长度计算 length

-- 按照实际长度计算
SELECT length('hello world'), length('你好');
┌─length('hello world')─┬─length('你好')─┐
│                    11 │              6 │
└───────────────────────┴────────────────┘

-- lengthUTF8
SELECT lengthUTF8('hello world') ,lengthUTF8('你好');
┌─lengthUTF8('hello world')─┬─lengthUTF8('你好')─┐
│                        11 │                  2 │
└───────────────────────────┴────────────────────┘

2. isValidUTF8 检测 UTF8 编码

检查字符串是否为有效的UTF-8编码，是则返回1，否则返回0。

SELECT isValidUTF8('hello world'),isValidUTF8('你好');
┌─isValidUTF8('hello world')─┬─isValidUTF8('你好')─┐
│                          1 │                   1 │
└────────────────────────────┴─────────────────────┘

SELECT ;
SELECT toValidUTF8('\x61\xF0\x80\x80\x80b');

3. 为空判断 empty

判断字符串是否为空，空为1 ，不为空为0

select empty('hello world'),empty('');
┌─empty('hello world')─┬─empty('')─┐
│                    0 │         1 │
└──────────────────────┴───────────┘

4. 非空判断 notEmpty

判断字符串是否不为空，不为空 1，为空 0

select notEmpty('hello world'),notEmpty(''),notEmpty(NULL);
┌─notEmpty('hello world')─┬─notEmpty('')─┬─notEmpty(NULL)─┐
│                       1 │            0 │ ᴺᵁᴸᴸ           │
└─────────────────────────┴──────────────┴────────────────┘

5. 转小写 lower

字母全部小写

SELECT lower('hello WORLD'),lowerUTF8('hello WORLD');
┌─lower('hello WORLD')─┬─lowerUTF8('hello WORLD')─┐
│ hello world          │ hello world              │
└──────────────────────┴──────────────────────────┘

6. 转大写 upperUTF8

SELECT upper('hello WORLD'),upperUTF8('hello WORLD');
┌─upper('hello WORLD')─┬─upperUTF8('hello WORLD')─┐
│ HELLO WORLD          │ HELLO WORLD              │
└──────────────────────┴──────────────────────────┘

SELECT upper('abcd123--'),--字母全部大写（将字符串中的ASCII转换为大写。）
lowerUTF8('abcd123-/*\8asd-\\'), -- abcd123-/*8asd-\
upperUTF8('abcd123--'), -- ABCD123--

7. 字符反转 reverse

SELECT reverse('hello world'), reverseUTF8('hello world');
┌─reverse('hello world')─┬─reverseUTF8('hello world')─┐
│ dlrow olleh            │ dlrow olleh                │
└────────────────────────┴────────────────────────────┘

8. 字符串定义 format

SELECT format('{1} {0} {1}', 'World', 'Hello'); 
┌─format('{1} {0} {1}', 'World', 'Hello')─┐
│ Hello World Hello                       │
└─────────────────────────────────────────┘

SELECT format('{0} {0} {1} {1}', 'one', 'two'); 
┌─format('{0} {0} {1} {1}', 'one', 'two')─┐
│ one one two two                         │
└─────────────────────────────────────────┘

9. 字符串拼接 concat

SELECT concat('Hello',' ','World', '!');
┌─concat('Hello', ' ', 'World', '!')─┐
│ Hello World!                       │
└────────────────────────────────────┘

10. 字符串拼接 concatAssumeInjective

与concat相同，
区别在于，需要保证concat(s1, s2, s3) -> s4是单射的，它将用于GROUP BY 的优化。

SELECT concatAssumeInjective('Hello',' ','World', '!');
┌─concatAssumeInjective('Hello', ' ', 'World', '!')─┐
│ Hello World!                                      │
└───────────────────────────────────────────────────┘

-- 用于 group by 优化
SELECT concat(key1, key2), sum(value) FROM key_val GROUP BY concatAssumeInjective(key1, key2);

11. 字符串拼接 appendTrailingCharIfAbsent

appendTrailingCharIfAbsent(str, c)
如果 ‘ str ‘ 字符串非空并且末尾不包含 ‘ c ‘ 字符，则将 ‘ c ‘ 字符附加到末尾。

SELECT appendTrailingCharIfAbsent('hello','o'),appendTrailingCharIfAbsent('hello','y');
┌─appendTrailingCharIfAbsent('hello', 'o')─┬─appendTrailingCharIfAbsent('hello', 'y')─┐
│ hello                                    │ helloy                                   │
└──────────────────────────────────────────┴──────────────────────────────────────────┘

12. 数组合并成字符串 arrayStringConcat

SELECT arrayStringConcat(['one','two','three']); 
┌─arrayStringConcat(['one', 'two', 'three'])─┐
│ onetwothree                                │
└────────────────────────────────────────────┘

SELECT arrayStringConcat(['one','two','three'], '-');
┌─arrayStringConcat(['one', 'two', 'three'], '-')─┐
│ one-two-three                                   │
└─────────────────────────────────────────────────┘

13. 字符串截取 substring

截取指定位置字符串，
返回以 ‘ offset ‘ 位置为开头，长度为 ‘ length ‘ 的子串
‘ offset ‘ 从1开始（与标准SQL相同）
‘ offset ‘ 和 ‘ length ‘ 参数必须是常量。

SELECT substring('abcdefg', 1, 3),substring('你好世界', 1, 3);
┌─substring('abcdefg', 1, 3)─┬─substring('你好世界', 1, 3)─┐
│ abc                        │ 你                          │
└────────────────────────────┴─────────────────────────────┘

SELECT substringUTF8('abcdefg', 1, 3),substringUTF8('你好世界', 1, 3);
┌─substringUTF8('abcdefg', 1, 3)─┬─substringUTF8('你好世界', 1, 3)─┐
│ abc                            │ 你好世                          │
└────────────────────────────────┴─────────────────────────────────┘

14. 字符串截取 mid

SELECT mid('abcdefg', 1, 3),mid('你好世界', 1, 3);
┌─substring('abcdefg', 1, 3)─┬─substring('你好世界', 1, 3)─┐
│ abc                        │ 你                          │
└────────────────────────────┴─────────────────────────────┘

15. 字符串截取 substr

SELECT substr('abcdefg', 1, 3),substr('你好世界', 1, 3);
┌─substring('abcdefg', 1, 3)─┬─substring('你好世界', 1, 3)─┐
│ abc                        │ 你                          │
└────────────────────────────┴─────────────────────────────┘

16. 使用正则获取字符串 extract

extract(haystack, pattern)
使用正则表达式截取字符串。
如果 ‘ haystack ‘ 与 ‘ pattern ‘ 不匹配，则返回空字符串。

SELECT extract('HELLO2 world3', '[0-9]'),extract('HELLO world', '[0-9]');
┌─extract('HELLO2 world3', '[0-9]')─┬─extract('HELLO world', '[0-9]')─┐
│ 2                                 │                                 │
└───────────────────────────────────┴─────────────────────────────────┘

17. 使用正则获取字符串 extractAll

extract(haystack, pattern)
同extract，获取匹配的全部字符串

SELECT extractAll('HELLO2 world3', '[0-9]'),extractAll('HELLO world', '[0-9]');

┌─extractAll('HELLO2 world3', '[0-9]')─┬─extractAll('HELLO world', '[0-9]')─┐
│ ['2','3']                            │ []                                 │
└──────────────────────────────────────┴────────────────────────────────────┘

18. 字符串编码转换 convertCharset

convertCharset(s, from, to)
返回从 ‘ from ‘ 中的编码转换为 ‘ to ‘ 中的编码的字符串 ‘ s ‘ 。

SELECT convertCharset('hello', 'UTF8','Unicode');
┌─convertCharset('hello', 'UTF8', 'Unicode')─┐
│ ��hello                                   │
└────────────────────────────────────────────┘

SELECT convertCharset('hello', 'Unicode', 'UTF8');
┌─convertCharset('hello', 'Unicode', 'UTF8')─┐
│ 桥汬�                                      │
└────────────────────────────────────────────┘

SELECT convertCharset('hello', 'Unicode', 'ASCII');
┌─convertCharset('hello', 'Unicode', 'ASCII')─┐
│                                             │
└─────────────────────────────────────────────┘

SELECT convertCharset('hello', 'UTF8','UTF8');
┌─convertCharset('hello', 'UTF8', 'UTF8')─┐
│ hello                                   │
└─────────────────────────────────────────┘

19. 字符串转 base64 编码 base64Encode

SELECT base64Encode('hello world');
┌─base64Encode('hello world')─┐
│ aGVsbG8gd29ybGQ=            │
└─────────────────────────────┘

20. base64 编码转字符串 base64Decode

使用base64将字符串解码成原始字符串
如果转换失败将抛出异常

select base64Decode('aGVsbG8gd29ybGQ=');
┌─base64Decode('aGVsbG8gd29ybGQ=')─┐
│ hello world                      │
└──────────────────────────────────┘

base64 编码转字符串 tryBase64Decode

使用base64将字符串解码成原始字符串。
但如果出现错误，将返回空字符串。

select tryBase64Decode('aGVsbG8gd29ybGQ=');
┌─tryBase64Decode('aGVsbG8gd29ybGQ=')─┐
│ hello world                         │
└─────────────────────────────────────┘

select tryBase64Decode('aGVsbG8gd29ybGQ=11');
┌─tryBase64Decode('aGVsbG8gd29ybGQ=11')─┐
│                                       │
└───────────────────────────────────────┘

21. 判断指定字符开头 startsWith

startWith(s, prefix)
返回是否以指定的前缀开头。
如果字符串以指定的前缀开头，则返回1，否则返回0。

select startsWith('hello world','h'),startsWith('hello world','s');
┌─startsWith('hello world', 'h')─┬─startsWith('hello world', 's')─┐
│                              1 │                              0 │
└────────────────────────────────┴────────────────────────────────┘

22. 判断指定字符结尾 endsWith

endsWith(s, suffix)
返回是否以指定的后缀结尾。
如果字符串以指定的后缀结束，则返回1，否则返回0

SELECT endsWith('hello world', 'd'),endsWith('hello world', 's');
┌─endsWith('hello world', 'd')─┬─endsWith('hello world', 's')─┐
│                            1 │                            0 │
└──────────────────────────────┴──────────────────────────────┘

23. 删除左侧空字符 trimLeft

trimLeft(s)
返回一个字符串，用于删除左侧的空白字符


select trimLeft(' a b c ') as str , startsWith(str,'a');

┌─str────┬─startsWith(trimLeft(' a b c '), 'a')─┐
│ a b c  │                                    1 │
└────────┴──────────────────────────────────────┘

SELECT
trimLeft(' sdfdgs'), -- sdfdgs
('abcd '), -- abcd
trimBoth(' abcd '); -- abcd

24. 删除右侧空字符 trimRight

trimRight(s)
返回一个字符串，用于删除右侧的空白字符

select trimRight(' a b c ') as str , endsWith(str,'c');
┌─str────┬─endsWith(trimRight(' a b c '), 'c')─┐
│  a b c │                                   1 │
└────────┴─────────────────────────────────────┘

25. 删除两侧空字符 trimRight

trimBoth(s)
返回一个字符串，用于删除左侧和右侧的空白字符

select trimBoth(' a b c ') as str ,startsWith(str,'a'), endsWith(str,'c');
┌─str───┬─startsWith(trimBoth(' a b c '), 'a')─┬─endsWith(trimBoth(' a b c '), 'c')─┐
│ a b c │                                    1 │                                  1 │
└───────┴──────────────────────────────────────┴────────────────────────────────────┘

26. 字符串拆分 splitByChar

splitByChar(separator, string)
‘ separator ‘ 必须为仅包含一个字符的字符串常量。
‘ string ‘ 要被拆分的字符串
返回拆分后的子串的数组

-- 如果分隔符出现在字符串的开头或结尾，或者如果有多个连续的分隔符，则将在对应位置填充空的子串。
select splitByChar('l','hello world');
┌─splitByChar('l', 'hello world')─┐
│ ['he','','o wor','d']           │
└─────────────────────────────────┘

-- 当字符串中不包含 separator 时报错
select splitByChar('z','hello world');
┌─splitByChar('z', 'hello world')─┐
│ ['hello world']                 │
└─────────────────────────────────┘

27. 字符串拆分 splitByString

splitByString(separator, string)
用法与splitByChar相同
它使用多个字符的字符串作为分隔符。
该字符串必须为非空

select splitByString('l','hello world');
┌─splitByString('l', 'hello world')─┐
│ ['he','','o wor','d']             │
└───────────────────────────────────┘

select splitByString('lo','hello world');
┌─splitByString('lo', 'hello world')─┐
│ ['hel',' world']                   │
└────────────────────────────────────┘

28. 拆分连续字符 alphaTokens

alphaTokens(s)
从范围a-z和A-Z中选择连续字节的子字符串
返回子字符串数组

SELECT alphaTokens('ab1cd2ef3gh');
┌─alphaTokens('ab1cd2ef3gh')─┐
│ ['ab','cd','ef','gh']      │
└────────────────────────────┘

29. 字符串替换 replace

replace(str, pattern, replacement)
在 ‘ str ‘ 字符串中匹配所有的 ‘ pattern ‘ 字符替代成 ‘ replacement ‘ 字符

select replace('hello world', 'l', 'L') AS replace;
┌─replace─────┐
│ heLLo worLd │
└─────────────┘

30. 字符串替换 replaceAll

同 replace

select replaceAll('hello world', 'l', 'L') AS replace;
┌─replace─────┐
│ heLLo worLd │
└─────────────┘

字符串替换 replaceOne

replaceOne(haystack, pattern, replacement)
替换匹配到的字符串
用 ‘ replacement ‘ 子串替换 ‘ haystack ‘ 中与 ‘ pattern ‘ 子串第一个匹配的匹配项
‘pattern’和‘replacement’必须是常量。


select replaceOne('hello world', 'l', '-L-') AS replaceOne;
┌─replaceOne────┐
│ he-L-lo world │
└───────────────┘

replaceRegexpOne('hed1234544', '4', '*') AS replaceRegexpOne,-- hed123*544

31. 正则替换 replaceRegexpOne

-- 赋值字符串10次
SELECT replaceRegexpOne('hello world', '.*', '\\0\\0\\0') AS res;
┌─res───────────────────────────────┐
│ hello worldhello worldhello world │
└───────────────────────────────────┘

32. 正则替换 replaceRegexpAll

-- 与replaceRegexpOne相同，但会替换所有出现的匹配项。例如：
SELECT replaceRegexpAll('hello world', '.', '\\0\\0\\0') as res;
┌─res───────────────────────────────┐
│ hhheeellllllooo   wwwooorrrlllddd │
└───────────────────────────────────┘

33. 搜索字符串出现位置 pasition

pasition(str1, str2)
显示 str2 在 str1 的第一个出现的位置

SELECT POSITION('hello world', 'e');
┌─position('hello world', 'e')─┐
│                            2 │
└──────────────────────────────┘

34. 搜索字符串出现位置 positionUTF8

select positionUTF8('你好','好') AS positionUTF8;
┌─positionUTF8─┐
│            2 │
└──────────────┘

35. 不区分大小写搜索字符串出现位置 positionCaseInsensitive

select positionCaseInsensitive('hello world','ll') AS positionCaseInsensitive;
┌─positionCaseInsensitive─┐
│                       3 │
└─────────────────────────┘

36. 定位 locate

select locate('hello world','ll');
┌─position('hello world', 'll')─┐
│                             3 │
└───────────────────────────────┘

37. 模糊匹配 like

like()
注意大写敏感。
‘ % ‘表示任何字节数（包括零字符）
‘ _ ‘表示任何一个字节

select like('hello world', '%lo%'),like('hello', '_ell_');
┌─like('hello world', '%lo%')─┬─like('hello', '_ell_')─┐
│                           1 │                      1 │
└─────────────────────────────┴────────────────────────┘

38. 模糊匹配 not like

select notLike('hello world', '%lo%'),notLike('hello', '_ell_');
┌─notLike('hello world', '%lo%')─┬─notLike('hello', '_ell_')─┐
│                              0 │                         0 │
└────────────────────────────────┴───────────────────────────┘

hnbian

https://www.hnbian.cn/posts/92b75b18.html

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源 hnbian !

olap clickhouse

ClickHouse函数汇总

2021-08-20 clickhouse

olap clickhouse

记录 Ambari 安装 Agent 异常

2021-08-12 ambari

ambari exception others

ClickHouse函数 2.字符串函数

1. 长度计算 length

2. isValidUTF8 检测 UTF8 编码

3. 为空判断 empty

4. 非空判断 notEmpty

5. 转小写 lower

6. 转大写 upperUTF8

7. 字符反转 reverse

8. 字符串定义 format

9. 字符串拼接 concat

10. 字符串拼接 concatAssumeInjective

11. 字符串拼接 appendTrailingCharIfAbsent

12. 数组合并成字符串 arrayStringConcat

13. 字符串截取 substring

14. 字符串截取 mid

15. 字符串截取 substr

16. 使用正则获取字符串 extract

17. 使用正则获取字符串 extractAll

18. 字符串编码转换 convertCharset

19. 字符串转 base64 编码 base64Encode

20. base64 编码 转 字符串 base64Decode

base64 编码 转 字符串 tryBase64Decode

21. 判断指定字符开头 startsWith

22. 判断指定字符结尾 endsWith

23. 删除左侧空字符 trimLeft

24. 删除右侧空字符 trimRight

25. 删除两侧空字符 trimRight

26. 字符串拆分 splitByChar

27. 字符串拆分 splitByString

28. 拆分连续字符 alphaTokens

29. 字符串替换 replace

30. 字符串替换 replaceAll

字符串替换 replaceOne

31. 正则替换 replaceRegexpOne

32. 正则替换 replaceRegexpAll

33. 搜索字符串出现位置 pasition

34. 搜索字符串出现位置 positionUTF8

35. 不区分大小写搜索字符串出现位置 positionCaseInsensitive

36. 定位 locate

37. 模糊匹配 like

38. 模糊匹配 not like

20. base64 编码转字符串 base64Decode

base64 编码转字符串 tryBase64Decode