A Python Calling Problem Caused by the SQLite Version

Problem

While running the OpenStack functional tests, two test cases would not pass:

* nova.tests.functional.db.test_resource_provider.ResourceClassTestCase.test_create_duplicate_id_retry
* nova.tests.functional.db.test_resource_provider.ResourceClassTestCase.test_create_duplicate_id_retry_failing

Debugging traced the failure to the following code:

    # /opt/stack/queens/nova/nova/objects/resource_provider.py
    def create(self):
        if 'id' in self:
            raise exception.ObjectActionError(action='create',
                                              reason='already created')
        if 'name' not in self:
            raise exception.ObjectActionError(action='create',
                                              reason='name is required')
        if self.name in fields.ResourceClass.STANDARD:
            raise exception.ResourceClassExists(resource_class=self.name)

        if not self.name.startswith(fields.ResourceClass.CUSTOM_NAMESPACE):
            raise exception.ObjectActionError(
                action='create',
                reason='name must start with ' +
                       fields.ResourceClass.CUSTOM_NAMESPACE)

        updates = self.obj_get_changes()
        # There is the possibility of a race when adding resource classes, as
        # the ID is generated locally. This loop catches that exception, and
        # retries until either it succeeds, or a different exception is
        # encountered.
        retries = self.RESOURCE_CREATE_RETRY_COUNT
        while retries:
            retries -= 1
            try:
                rc = self._create_in_db(self._context, updates)
                self._from_db_object(self._context, self, rc)
                break
            except db_exc.DBDuplicateEntry as e:
                # NOTE: e.columns is empty here, so execution falls straight
                # through to the exception raised below.
                if 'id' in e.columns:
                    # Race condition for ID creation; try again
                    continue
                # The duplication is on the other unique column, 'name'. So do
                # not retry; raise the exception immediately.
                raise exception.ResourceClassExists(resource_class=self.name)
        else:
            # We have no idea how common it will be in practice for the retry
            # limit to be exceeded. We set it high in the hope that we never
            # hit this point, but added this log message so we know that this
            # specific situation occurred.
            LOG.warning("Exceeded retry limit on ID generation while "
                        "creating ResourceClass %(name)s",
                        {'name': self.name})
            msg = _("creating resource class %s") % self.name
            raise exception.MaxDBRetriesExceeded(action=msg)

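To make the failure mode concrete, here is a minimal sketch of the same retry decision, reduced to plain Python. The exception classes and the helpers create_with_retry, always_duplicate and outcome are hypothetical stand-ins written for this illustration, not the real Nova or oslo.db objects. It shows that an empty column list is treated exactly like a duplicate name, so the retry path is never taken.

```python
class DBDuplicateEntry(Exception):
    """Hypothetical stand-in for oslo_db.exception.DBDuplicateEntry."""

    def __init__(self, columns=None):
        super(DBDuplicateEntry, self).__init__("duplicate entry")
        self.columns = columns or []


class ResourceClassExists(Exception):
    """Hypothetical stand-in for Nova's ResourceClassExists."""


class MaxDBRetriesExceeded(Exception):
    """Hypothetical stand-in for Nova's MaxDBRetriesExceeded."""


def create_with_retry(create_in_db, retry_count=3):
    """Retry only when the duplicate is reported on the 'id' column."""
    retries = retry_count
    while retries:
        retries -= 1
        try:
            return create_in_db()
        except DBDuplicateEntry as e:
            if 'id' in e.columns:
                continue  # ID race: try again
            # Anything else, including an empty column list, is treated
            # as a duplicate name and is not retried.
            raise ResourceClassExists()
    raise MaxDBRetriesExceeded()


def always_duplicate(columns):
    """Build a fake DB call that always reports a duplicate on `columns`."""
    def _create():
        raise DBDuplicateEntry(columns)
    return _create


def outcome(columns):
    """Return the name of the exception the retry loop ends with."""
    try:
        create_with_retry(always_duplicate(columns))
    except Exception as exc:
        return type(exc).__name__
    return "created"


print(outcome(['id']))  # retries until the limit is exhausted
print(outcome([]))      # empty columns: no retry at all
```

With columns == ['id'] the loop retries and finally raises MaxDBRetriesExceeded; with an empty list it raises ResourceClassExists on the first attempt, which is consistent with both retry tests failing.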
Next, look at the implementation of db_exc.DBDuplicateEntry:

    # /usr/lib/python2.7/site-packages/oslo_db/exception.py
    class DBDuplicateEntry(DBError):
        """Duplicate entry at unique column error.

        Raised when made an attempt to write to a unique column the same entry
        as existing one. :attr: `columns` available on an instance of the
        exception and could be used at error handling::

           try:
               instance_type_ref.save()
           except DBDuplicateEntry as e:
               if 'colname' in e.columns:
                   # Handle error.

        :kwarg columns: a list of unique columns have been attempted to write
                        a duplicate entry.
        :type columns: list
        :kwarg value: a value which has been attempted to write. The value
                      will be None, if we can't extract it for a particular
                      database backend. Only MySQL and PostgreSQL 9.x are
                      supported right now.
        """
        def __init__(self, columns=None, inner_exception=None, value=None):
            # Normally, raising DBDuplicateEntry reports the conflicting
            # columns so that the caller can easily decide what to do next.
            self.columns = columns or []
            self.value = value
            super(DBDuplicateEntry, self).__init__(inner_exception)

The conflicting columns are generated here:

    # /opt/stack/queens/nova/.tox/functional/lib/python2.7/site-packages/oslo_db/sqlalchemy/exc_filters.py
    @filters("sqlite", sqla_exc.IntegrityError,
             (r"^.*columns?(?P<columns>[^)]+)(is|are)\s+not\s+unique$",
              r"^.*UNIQUE\s+constraint\s+failed:\s+(?P<columns>.+)$",
              r"^.*PRIMARY\s+KEY\s+must\s+be\s+unique.*$"))
    def _sqlite_dupe_key_error(integrity_error, match, engine_name,
                               is_disconnect):
        """Filter for SQLite duplicate key error.

        note(boris-42): In current versions of DB backends unique constraint
        violation messages follow the structure:

        sqlite:
        1 column - (IntegrityError) column c1 is not unique
        N columns - (IntegrityError) column c1, c2, ..., N are not unique

        sqlite since 3.7.16:
        1 column - (IntegrityError) UNIQUE constraint failed: tbl.k1
        N columns - (IntegrityError) UNIQUE constraint failed: tbl.k1, tbl.k2

        sqlite since 3.8.2:
        (IntegrityError) PRIMARY KEY must be unique

        """
        columns = []
        # NOTE(ochuprykov): We can get here by last filter in which there are no
        #                   groups. Trying to access the substring that matched by
        #                   the group will lead to IndexError. In this case just
        #                   pass empty list to exception.DBDuplicateEntry
        try:
            columns = match.group('columns')
            columns = [c.split('.')[-1] for c in columns.strip().split(", ")]
        except IndexError:
            pass

        raise exception.DBDuplicateEntry(columns, integrity_error)

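The effect of the three patterns can be checked in isolation. The sketch below copies the regular expressions from the filter above and applies them with re.match in order; the helper extract_columns is a hypothetical name for this illustration, and the way oslo.db actually dispatches to its filters may differ in detail.

```python
import re

# Patterns copied from the sqlite filter quoted above.
SQLITE_DUPE_PATTERNS = (
    r"^.*columns?(?P<columns>[^)]+)(is|are)\s+not\s+unique$",
    r"^.*UNIQUE\s+constraint\s+failed:\s+(?P<columns>.+)$",
    r"^.*PRIMARY\s+KEY\s+must\s+be\s+unique.*$",
)


def extract_columns(message):
    """Return the conflicting column names, or None if nothing matches."""
    for pattern in SQLITE_DUPE_PATTERNS:
        match = re.match(pattern, message)
        if match is None:
            continue
        try:
            raw = match.group('columns')
        except IndexError:
            # The last pattern has no 'columns' group, so nothing
            # can be extracted from the message.
            return []
        return [c.split('.')[-1] for c in raw.strip().split(", ")]
    return None


old_msg = "(sqlite3.IntegrityError) PRIMARY KEY must be unique"
new_msg = ("(sqlite3.IntegrityError) UNIQUE constraint failed: "
           "resource_classes.id")

print(extract_columns(old_msg))  # → []
print(extract_columns(new_msg))  # → ['id']
```

The old message only matches the third pattern, which captures no column names, so the resulting DBDuplicateEntry carries an empty columns list.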
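No conflicting columns are produced because the error string returned by the underlying DB engine does not fit the column-capturing patterns above; it only matches the last pattern, which has no columns group. For example:

    2013-05-20 (wrong):   ('(sqlite3.IntegrityError) PRIMARY KEY must be unique',)
    2019-04-16 (correct): ('(sqlite3.IntegrityError) UNIQUE constraint failed: resource_classes.id',)

This is a SQLite3 version mismatch. Nova does not pin a SQLite3 version anywhere, so the problem has to be fixed by hand.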
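A quick way to see which message a given environment produces is to trigger a duplicate primary key in an in-memory database. This is a small sketch using only the standard sqlite3 module; the two-column resource_classes table is a simplified, hypothetical schema, not the real Nova one.

```python
import sqlite3


def duplicate_key_message():
    """Insert the same primary key twice and return the error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE resource_classes "
            "(id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute(
            "INSERT INTO resource_classes (id, name) VALUES (1, 'CUSTOM_A')")
        try:
            conn.execute(
                "INSERT INTO resource_classes (id, name) "
                "VALUES (1, 'CUSTOM_B')")
        except sqlite3.IntegrityError as exc:
            return str(exc)
        return None
    finally:
        conn.close()


# sqlite_version is the version of the SQLite library loaded at run time,
# not the version of the Python sqlite3 module.
print("SQLite library version:", sqlite3.sqlite_version)
print("Duplicate-key message:", duplicate_key_message())
```

Running this inside the tox virtualenv should show whether the interpreter there sees the old or the new message format.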

Solution

Build and upgrade SQLite3 manually:

    wget https://www.sqlite.org/2019/sqlite-autoconf-3280000.tar.gz
    tar -xvf sqlite-autoconf-3280000.tar.gz
    cd sqlite-autoconf-3280000
    mkdir /opt/sqlite3
    ./configure --prefix=/opt/sqlite3
    make && make install

Upgrading SQLite3 alone still did not fix the problem. What matters here is how Python calls into a C shared library (so file), and that is the heart of the fix.

* First, locate the SQLite3 Python client (API):

    $ python -c "import sqlite3; print(sqlite3.__file__)"
    /usr/lib64/python2.7/sqlite3/__init__.pyc

* Look at the SQLite3 API implementation and find the statement that imports the so library:

    # /usr/lib64/python2.7/sqlite3/dbapi2.py
    from _sqlite3 import *

* Find the location of the _sqlite3 so library:

    $ python -c 'import _sqlite3; print(_sqlite3)'
    <module '_sqlite3' from '/opt/stack/queens/nova/.tox/functional/lib64/python2.7/lib-dynload/_sqlite3.so'>

* List the shared libraries that _sqlite3.so depends on:

    $ ldd /opt/stack/queens/nova/.tox/functional/lib64/python2.7/lib-dynload/_sqlite3.so
        linux-vdso.so.1 =>  (0x00007ffc4defb000)
        libsqlite3.so.0 => /lib64/libsqlite3.so.0 (0x00007f708ba42000)
        libpython2.7.so.1.0 => /lib64/libpython2.7.so.1.0 (0x00007f708b676000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f708b45a000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f708b08d000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f708ae77000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f708ab75000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f708a971000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f708a76e000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f708bf62000)

* The obvious candidate to look at first is libsqlite3.so.0:

    $ ls -alh /lib64/libsqlite3.so.0
    lrwxrwxrwx. 1 root root 19 May 14 05:13 /lib64/libsqlite3.so.0 -> libsqlite3.so.0.8.6
    $ ls -alh /lib64/libsqlite3.so.0.8.6
    -rwxr-xr-x. 1 root root 5.1M Jun 4 05:51 /lib64/libsqlite3.so.0.8.6

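The same lookup can be done from inside the running interpreter. The sketch below is Linux-specific and rests on an assumption: it reads /proc/self/maps after importing sqlite3 and lists every mapped file whose name contains "sqlite". The helper name loaded_sqlite_libraries is hypothetical. On builds where SQLite is compiled statically into the extension, only _sqlite3 itself (or nothing) will show up.

```python
import os


def loaded_sqlite_libraries(maps_path="/proc/self/maps"):
    """List mapped shared objects whose file name contains 'sqlite'."""
    import sqlite3  # noqa: F401  (importing loads the C extension)

    if not os.path.exists(maps_path):
        # Not a Linux-style /proc filesystem; nothing to inspect.
        return []

    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address perms offset dev inode [pathname]
            fields = line.split(None, 5)
            if len(fields) < 6:
                continue
            path = fields[5].strip()
            if "sqlite" in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


for path in loaded_sqlite_libraries():
    print(path)
```

If the output lists /lib64/libsqlite3.so.0.8.6 (or its /usr/lib64 equivalent) rather than a file under /opt/sqlite3, the interpreter is still using the system library.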
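At this point it is clear why upgrading SQLite3 did not fix the problem: the shared library that the Python program actually loads had not been updated. All that is needed is to replace the old so file with the newly installed one.

    mv /usr/lib64/libsqlite3.so.0.8.6 /usr/lib64/libsqlite3.so.0.8.6.bk
    cp /opt/sqlite3/lib/libsqlite3.so.0.8.6 /usr/lib64/libsqlite3.so.0.8.6
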
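After swapping the file, a fresh Python process should report the new library version. The following is a minimal check, assuming the target is 3.28.0 (inferred from the tarball name sqlite-autoconf-3280000); the helper runtime_sqlite_at_least and the constant EXPECTED are hypothetical names for this sketch.

```python
import sqlite3

# Assumption: 3.28.0 corresponds to sqlite-autoconf-3280000.
EXPECTED = (3, 28, 0)


def runtime_sqlite_at_least(minimum):
    """True if the SQLite library loaded at run time is >= `minimum`."""
    return tuple(sqlite3.sqlite_version_info) >= tuple(minimum)


print("loaded SQLite:", sqlite3.sqlite_version)
print("upgrade visible to Python:", runtime_sqlite_at_least(EXPECTED))
```

The check has to run in a new process, because a process that already loaded the old library keeps using it.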
Finally

To close, here is the SQLite3 commit that changed the message. The issue stems from this commit, which was introduced in version 3.8.2:

    ...
    commit eb743f01b125bebd8736ceb2873b69f27721b0ae
    Author: D. Richard Hipp <[email protected]>
    Date:   Tue Nov 5 13:33:55 2013 +0000

        Standardize the error messages generated by constraint failures to a
        format of "$TYPE constraint failed: $DETAIL". This involves many
        changes to the expected output of test cases.
    ...

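The main takeaway from this fix concerns the calling relationship between a Python program and a C program. When the two communicate through a shared library (so file) rather than over TCP, pay attention to how the C program's files are installed on Linux. Upgrading the C program does not automatically take effect in the Python program; the bridge between them (the library file that is actually loaded) has to be upgraded as well.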
